Looking for Ideas: Translating Document into Commands in XML

Hi guys

A bit of a curve ball here ... I have a document (Word) that contains a series
of instructions in sections and subsections (and sub-subsections). There are
350 pages of them.

I need to translate these instructions into something that can be processed
automatically, so I have used the Command pattern to set up a set of
commands that correspond to the various instructions in the document.

I have started to enter the instructions into an XML file, which I can
deserialise into my command hierarchy. However, transcribing 350 pages into
an XML document is tedious, time-consuming and error-prone. Because I have
sections and subsections, my XML file is quite wide, as well as very long. I
use XMLSpy to edit the file, but I am forever scrolling backwards and
forwards, up and down, cutting and pasting, and losing my place.

Does anyone have any thoughts on how I might improve the situation, make my
file more maintainable, and perhaps automate the process somehow?

My first thought is to write a simple program to maintain the XML file, but
that could take just as long as entering the data.

Any thoughts very welcome.

TIA

Charles
Jul 21 '05
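
For readers following along, here is a minimal sketch of what "deserialise into my command hierarchy" might look like. The thread never shows the real classes or file layout, so everything here is an assumption: the `<section>` and `<step>` element names, the `title` attribute, and the `Command`, `Step` and `Section` classes are all hypothetical, and Python stands in for the VB.NET the poster is using.

```python
import xml.etree.ElementTree as ET


class Command:
    """Hypothetical base class for the Command pattern."""

    def execute(self, log):
        raise NotImplementedError


class Step(Command):
    """Leaf command: a single instruction from the document."""

    def __init__(self, text):
        self.text = text

    def execute(self, log):
        log.append(self.text)


class Section(Command):
    """Composite command: runs its children in document order."""

    def __init__(self, title, children):
        self.title = title
        self.children = children

    def execute(self, log):
        log.append(f"[{self.title}]")
        for child in self.children:
            child.execute(log)


def build(element):
    """Recursively turn a <section> or <step> element into a Command."""
    if element.tag == "step":
        return Step((element.text or "").strip())
    if element.tag == "section":
        children = [build(child) for child in element]
        return Section(element.get("title", ""), children)
    raise ValueError(f"unknown element: {element.tag}")


# Made-up sample data, only to show the nesting.
SAMPLE = """
<section title="1 Start-up">
  <step>Open valve A</step>
  <section title="1.1 Checks">
    <step>Read gauge B</step>
  </section>
</section>
"""


def run_sample():
    """Deserialise the sample and execute it, returning the log."""
    log = []
    build(ET.fromstring(SAMPLE.strip())).execute(log)
    return log
```

Because the sections nest recursively, the same `build` function handles subsections and sub-subsections without special cases, which is one reason a nested file gets "wide" as well as long.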
Hi Jay

It is a create once, update occasionally file. Unfortunately, the creation
and maintenance of the document is outside my control, and at 650 pages
(last count) it is unlikely that the client will change it now to fit some
template that I might define.

I am currently looking at creating a VB.NET program to iterate through the
document extracting the bits I need, and perhaps changing them to be more
consistent. You mention VBA script: is that for a specific reason (as
opposed to VB.NET) or does it not matter especially?

Charles
"Jay B. Harlow [MVP - Outlook]" wrote:
Charles,
> The resultant file, with no transform, was 9 Mb.

That is where what Thug & I suggested comes in: use a VBA script to
automate cleaning up the document first, getting it closer to a "nicer"
XML format. Then save it, then possibly apply an XSLT, then process
it....

Is this document a one time thing or is it going to be ongoing?

If it's ongoing I would seriously consider defining a template in Word that
helps enforce the format required.

Hope this helps
Jay

"Charles Law" wrote:
Hi Jay

I noticed the Save As XML so tried it (I have just moved from Word XP to
2003). The resultant file, with no transform, was 9 Mb. I then tried to
load it into XMLSpy and after about 10 minutes of a blank window it
GPF'ed on me :-(

I think you have probably hit on something though, but I don't know XSLT
well enough to know how to start with transforming the file. From what I
could make of the file after loading it in Notepad, it contains a
tremendous amount of bloat. For example, formatting and layout
information that I just don't need. I really only want the structure,
after the first pass anyway. Then I could set about translating the text
into something more formal. Also, this translation process will be a
one-off, or at most occasional when the document changes. It will be the
cut-down, formal XML file that my program will read at start-up.

Thanks for the suggestion. I will look into it further.

Cheers.

Charles
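
A rough sketch of the "structure only" first pass described above: walk the saved Word XML and keep just each paragraph's style name and text, discarding run formatting and layout. This is only an illustration. It assumes the Word 2003 WordprocessingML namespace and the `w:p`, `w:pPr`, `w:pStyle`, `w:r` and `w:t` element names, and the sample document is made up. Python is used here in place of XSLT.

```python
import xml.etree.ElementTree as ET

# Assumed: the namespace used by Word 2003 "Save As XML" files.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}


def extract_paragraphs(wordml_text):
    """Return (style, text) pairs, dropping formatting and layout markup."""
    root = ET.fromstring(wordml_text)
    result = []
    for para in root.iter(f"{{{W}}}p"):
        # The paragraph style, if any, lives in w:pPr/w:pStyle/@w:val.
        style_el = para.find("w:pPr/w:pStyle", NS)
        style = style_el.get(f"{{{W}}}val") if style_el is not None else None
        # Join the text of every run, ignoring bold, italic and so on.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        if text.strip():
            result.append((style, text.strip()))
    return result


# Made-up sample: one heading and one body paragraph split across two runs.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Start-up</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Open </w:t></w:r>
      <w:r><w:rPr><w:i/></w:rPr><w:t>valve A</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""
```

Calling `extract_paragraphs(SAMPLE)` yields one pair per non-empty paragraph. Paragraphs with no explicit style come back with `None`, which in this sketch stands for body text.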
"Jay B. Harlow [MVP - Outlook]" wrote:
Charles,
Which version of Word?

Later versions of Word (XP, 2003, not sure about 2000) support saving as
an XML file.

I would then consider passing Word's XML file to an XSLT transform to
"simplify" the document, then read this "simplified" XML in my
program...

Looking at the help for Word 2003, you might be able to define an XML
Schema that you could attach to your Word document and use to replace
parts of the document with XML tags. I would think with some effort you
might be able to automate replacing parts of the document with tags, which
may eliminate the need for the XSLT transform.

Note: I've used Xml in Word very minimally.

Hope this helps
Jay
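
One way to picture the "simplified" XML is as nested sections rebuilt from the heading levels. The sketch below is an assumption-heavy illustration, not anything from the thread: the input is a flat list of hypothetical `(level, text)` pairs (level 0 for body text, 1 to 3 for headings), and the `<document>`, `<section>` and `<step>` output names are invented. It is written in Python, not as an XSLT transform.

```python
import xml.etree.ElementTree as ET


def nest(items):
    """Build nested <section> elements from flat (level, text) pairs.

    Level 0 marks body text; levels 1, 2, 3... mark headings.
    """
    root = ET.Element("document")
    stack = [(0, root)]  # (heading level, element) for each open section
    for level, text in items:
        if level == 0:
            # Body text becomes a step inside the innermost open section.
            step = ET.SubElement(stack[-1][1], "step")
            step.text = text
            continue
        # Close any open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section", title=text)
        stack.append((level, section))
    return root


# Made-up sample: two top-level sections, the first with a subsection.
ITEMS = [
    (1, "Start-up"),
    (0, "Open valve A"),
    (2, "Checks"),
    (0, "Read gauge B"),
    (1, "Shutdown"),
    (0, "Close valve A"),
]
```

`ET.tostring(nest(ITEMS), encoding="unicode")` then gives a compact file with no formatting markup, which is the kind of cut-down input a start-up loader could read.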

Jul 21 '05 #11
Charles,
The VBA script runs within Word; VB.NET would drive Word from outside.

If the VBA script is going to be doing a lot, then it may run faster than
VB.NET will, as VBA runs in-process inside Word, while a VB.NET program
(normally) automates Word out-of-process through COM Interop.

If the script is only going to be one or two routines, I find doing it
directly in Word is easier than creating a VB.NET program to do it,
especially if the routine is only going to be used once.

If the problem looks like it could benefit from OO then I start with VB.NET
to leverage OO. If the problem looks like it will simply be one or two
routines & a couple of loops, I leave it as VBA.

Of course if the routine needs to be used often, in that it is tied to a
specific VB.NET program, then it's generally easier to make it part of the
VB.NET program even though it's only one or two routines...

Using "Tools - Upgrade Visual Basic 6 Code" I've converted VBA code to
VB.NET code.

Hope this helps
Jay

Jul 21 '05 #12
Jay

Thanks for the clarification.

Charles
Jul 21 '05 #13
