Hi guys
A bit of a curve ball here ... I have a document (Word) that contains a series
of instructions in sections and subsections (and sub-subsections). There are
350 pages of them.
I need to translate these instructions into something that can be processed
automatically, so I have used the Command pattern to set up a set of
commands that correspond to the various instructions in the document.
I have started to enter the instructions into an XML file, which I can
deserialise into my command hierarchy. However, transcribing 350 pages into
an XML document is tedious, time-consuming and error-prone. Because I have
sections and subsections, my XML file is quite wide, as well as very long. I
use XMLSpy to edit the file, but I am forever scrolling backwards and
forwards, up and down, cutting and pasting, and losing my place.
Does anyone have any thoughts on how I might improve the situation, make my
file more maintainable, and perhaps automate the process somehow?
My first thought is to write a simple program to maintain the XML file, but
that could take just as long as entering the data.
Any thoughts very welcome.
TIA
Charles
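For readers following along, here is a minimal sketch of the kind of setup described above: nested XML sections deserialised into a command hierarchy. It is written in Python for brevity rather than VB.NET, and the element names (`section`, `command`), the class names and the sample instructions are all made up for illustration. Charles's actual schema is not shown in the thread.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Command:
    """Leaf command: a single instruction (hypothetical shape)."""
    text: str

    def execute(self, log):
        log.append(self.text)


@dataclass
class Section:
    """Composite command: a section holding commands and subsections."""
    title: str
    children: list = field(default_factory=list)

    def execute(self, log):
        # Run the children in document order.
        for child in self.children:
            child.execute(log)


def load(element):
    """Recursively turn <section>/<command> elements into command objects."""
    if element.tag == "command":
        return Command(text=(element.text or "").strip())
    if element.tag == "section":
        section = Section(title=element.get("title", ""))
        section.children = [load(child) for child in element]
        return section
    raise ValueError(f"unexpected element: {element.tag}")


# Invented sample data with three levels of nesting.
SAMPLE = """
<section title="1">
  <command>Open valve A</command>
  <section title="1.1">
    <command>Check pressure</command>
    <section title="1.1.1">
      <command>Record reading</command>
    </section>
  </section>
</section>
"""

root = load(ET.fromstring(SAMPLE.strip()))
log = []
root.execute(log)
print(log)
```

Each extra level of subsection adds a level of XML nesting, which is why a hand-edited file gets wide as well as long.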
Jul 21 '05
Hi Jay
It is a create-once, update-occasionally file. Unfortunately, the creation
and maintenance of the document is outside my control, and at 650 pages
(last count) it is unlikely that the client will change it now to fit some
template that I might define.
I am currently looking at creating a VB.NET program to iterate through the
document extracting the bits I need, and perhaps changing them to be more
consistent. You mention VBA script: is that for a specific reason (as
opposed to VB.NET) or does it not matter especially?
Charles
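The extraction pass described above could be sketched roughly as follows, assuming the program can already pull a flat list of (heading level, text) pairs out of the document. It is written in Python for brevity rather than VB.NET. The element names (`document`, `section`, `instruction`), the convention that level 0 means body text, and the sample items are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_outline(items):
    """Nest a flat list of (heading_level, text) pairs into <section> elements.

    heading_level 0 means body text; it is attached as an <instruction>
    to the most recently opened section.
    """
    root = ET.Element("document")
    stack = [(0, root)]  # (level, element) for the currently open sections
    for level, text in items:
        if level == 0:
            ET.SubElement(stack[-1][1], "instruction").text = text
            continue
        # Close any open sections at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section", title=text)
        stack.append((level, section))
    return root


# Invented sample: headings at levels 1-3 with body text in between.
items = [
    (1, "1 Start-up"),
    (0, "Switch on the pump."),
    (2, "1.1 Checks"),
    (0, "Check the pressure."),
    (3, "1.1.1 Readings"),
    (0, "Record the reading."),
    (1, "2 Shutdown"),
    (0, "Switch off the pump."),
]

outline = build_outline(items)
print(ET.tostring(outline, encoding="unicode"))
```

The stack holds the chain of currently open sections, so a new heading pops everything at its own level or deeper before attaching itself. The nesting then comes from the heading levels and does not have to be typed by hand.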
"Jay B. Harlow [MVP - Outlook]" <Ja************ @msn.com> wrote in message
news:%2******** ********@TK2MSF TNGP14.phx.gbl. ..
Charles,
> The resultant file, with no transform, was 9 Mb.
That is where I would do what Thug & I suggested: first use a VBA script to automate cleaning up the document, getting it closer to a "nicer" XML format. Then save it, then possibly apply an XSLT, then process it....
Is this document a one-time thing or is it going to be ongoing?
If it's ongoing, I would seriously consider defining a template in Word that helps enforce the format required.
Hope this helps
Jay
"Charles Law" <bl***@nowhere. com> wrote in message news:e%******** ********@TK2MSF TNGP09.phx.gbl. ..
Hi Jay
I noticed the Save As XML so tried it (I have just moved from Word XP to 2003). The resultant file, with no transform, was 9 Mb. I then tried to load it into XMLSpy and after about 10 minutes of a blank window it GPF'ed on me :-(
I think you have probably hit on something, though I don't know XSLT well enough to know how to start transforming the file. From what I could make of the file after loading it in Notepad, it contains a tremendous amount of bloat: for example, formatting and layout information that I just don't need. I really only want the structure, after the first pass anyway. Then I could set about translating the text into something more formal. Also, this translation process will be a one-off, or at most occasional when the document changes. It will be the cut-down, formal XML file that my program will read at start-up.
Thanks for the suggestion. I will look into it further.
Cheers.
Charles
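As a rough illustration of keeping "only the structure", the sketch below reads Word's XML output and keeps just each paragraph's style name and plain text, discarding run-level formatting. It uses Python, not the XSLT route suggested in the thread. The namespace URI and the `w:p` / `w:pPr` / `w:pStyle` / `w:t` element names are what Word 2003's "Save As XML" format is understood to use, but treat them as assumptions and check them against the root element of the real file. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for Word 2003 "Save As XML" output; verify against the
# actual file before relying on it.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}


def simplify(wordml_text):
    """Return (style, text) pairs, one per non-empty paragraph."""
    root = ET.fromstring(wordml_text)
    result = []
    for para in root.iter(f"{{{W}}}p"):
        style_el = para.find("w:pPr/w:pStyle", NS)
        # Paragraphs without an explicit style are reported as "Normal".
        style = style_el.get(f"{{{W}}}val") if style_el is not None else "Normal"
        # Join the text of every run, ignoring bold/italic/etc. properties.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        if text.strip():
            result.append((style, text))
    return result


# Tiny invented sample in the assumed format.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t>1 Start-up</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Switch on </w:t></w:r>
      <w:r><w:rPr><w:i/></w:rPr><w:t>the pump.</w:t></w:r>
    </w:p>
    <w:p/>
  </w:body>
</w:wordDocument>"""

pairs = simplify(SAMPLE)
print(pairs)
```

For a file much larger than the 9 Mb mentioned above, `ET.iterparse` could replace `ET.fromstring` so the whole tree need not be held in memory at once.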
"Jay B. Harlow [MVP - Outlook]" <Ja************ @msn.com> wrote in message news:uz******** ********@TK2MSF TNGP10.phx.gbl. .. Charles, Which version of Word?
Later versions of Word (XP, 2003, not sure about 2000) support saving as an XML file.
I would then consider passing Word's XML file to an XSLT transform to "simplify" the document, then read this "simplified" XML in my program...
Looking at the help for Word 2003, you might be able to define an XML Schema that you could attach to your Word document to replace parts of the Word document with XML tags. I would think with some effort you might be able to automate replacing parts of the document with tags, which may eliminate the need for the XSLT transform.
Note: I've used Xml in Word very minimally.
Hope this helps Jay
Charles,
The VBA script runs within Word; VB.NET would drive Word.
If the VBA script is going to be doing a lot, then it may run faster than
VB.NET will, as VBA runs in-process inside Word, while VB.NET (normally)
drives Word out-of-process through COM Interop.
If the script is only going to be one or two routines, I find doing it
directly in Word is easier than creating a VB.NET program to do it,
especially if the routine is only going to be used once.
If the problem looks like it could benefit from OO, then I start with VB.NET
to leverage OO. If the problem looks like it will simply be one or two
routines & a couple of loops, I leave it as VBA.
Of course, if the routine needs to be used often, in that it is tied to a
specific VB.NET program, then it's generally easier to make it part of the
VB.NET program even though it's only one or two routines...
Using "Tools - Upgrade Visual Basic 6 Code" I've converted VBA code to
VB.NET code.
Hope this helps
Jay
Jay
Thanks for the clarification.
Charles