XML and Word Docs

An article at http://news.com.com/2100-1012-991694.html?tag=fd_top states:
"XML [in Office 2003] would allow easier interchange of data generated in
Office documents with back-end systems or existing Web services."

As part of an Access 2000 application, I have to continually parse Word
documents and store the parsings in Access tables using Automation to
control Word and parse the document. Is there a way that XML would help with
that?

Thanks.
Nov 13 '05 #1
Check out the articles on WML, which is the type of XML that Word uses.
(WML = Word processing eXtensible Markup Language, for what it's
worth.)

I've played a bit with WML and Access, and I can't say it's any
better (or worse) than writing or reading HTML or any other form of
XML, but it's interesting (and a bit of fun) to write a Word document
without having to use RTF or have Word installed anywhere. In
addition, WML is a *heck* of a lot easier to work with than RTF!

Google "WML" and you'll probably find a bunch of information on it.
That's how I started, but sadly, it appears that I didn't keep the
links. I think I started out at xml.org, but don't quote me on that.

Nov 13 '05 #2
Thanks, Chuck. Probably wouldn't help too much with my parsing issue,
though, right?

Nov 13 '05 #3
Not a bit actually. Users are unlikely to save their Word documents as
WML files just so you can parse them.

Nov 13 '05 #4
I reckon it depends on your Word documents. If they are highly structured
then XML could help. If they are basically unstructured then XML will not
help.

The RSS schema is a useful example. I sometimes create RSS files in Word.
RSS files can be fairly weakly structured if long passages of text are
embedded between <description></description> tags. XML is still useful for
me because the description tag corresponds one-to-one with a column in my
database.

Assuming your documents pass the structure test, the key thing is whether
you can control the document creation process. If you can get the authors to
create their documents in XML, parsing it is much easier and more robust than
parsing regular text. You can use an XML schema to enforce validity (any XML
parser already checks well-formedness); you can use types from the System.Xml
namespace in the .NET Framework class library; and you can use XSLT to
transform from one format to another.

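The one-to-one mapping between an RSS description element and a database column, as described in this post, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python, with the standard-library `sqlite3` module standing in for the Access table, and the sample feed, the `items` table and the `load_items` helper are all made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS fragment; a real feed would be read from a file saved by the author.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <description>Long, loosely structured text goes here.</description>
    </item>
    <item>
      <title>Second item</title>
      <description>Another passage of free text.</description>
    </item>
  </channel>
</rss>"""


def load_items(xml_text, conn):
    """Insert one row per <item>, mapping each child element to a column."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, description TEXT)")
    root = ET.fromstring(xml_text)
    rows = [
        (item.findtext("title", default=""), item.findtext("description", default=""))
        for item in root.iter("item")
    ]
    conn.executemany("INSERT INTO items (title, description) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
inserted = load_items(RSS_SAMPLE, conn)
stored = conn.execute("SELECT title, description FROM items ORDER BY rowid").fetchall()
```

Because each element name lines up with a column name, the free text inside the description needs no parsing at all; it is simply stored as it is.
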
Nov 13 '05 #5
I don't quite agree. I get the impression that the docx format will make
parsing of unstructured documents easier, if only by making it easier to
bring a heavy-duty regex engine to bear. That said, "easier" may just
mean the difference between impossible and not-quite-so-impossible<g>.
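
To make the regex idea concrete, here is a minimal sketch in Python of what "bringing a regex engine to bear" might look like once paragraph text is available as plain strings. The paragraph contents, the field labels and the `extract_fields` helper are hypothetical; real documents would need patterns tuned to their own wording.

```python
import re

# Hypothetical paragraph text, as it might look after being pulled out of a document.
PARAGRAPHS = [
    "Invoice No: 10423",
    "Please find the details of your order below.",
    "Date: 2005-06-29",
    "Total due: 118.50",
]

# "Label: value" at the start of a paragraph; the label set is an assumption.
FIELD_RE = re.compile(r"^(Invoice No|Date|Total due):\s*(.+?)\s*$")


def extract_fields(paragraphs):
    """Return {label: value} for every paragraph that matches FIELD_RE."""
    fields = {}
    for text in paragraphs:
        match = FIELD_RE.match(text)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = extract_fields(PARAGRAPHS)
```

Paragraphs that match no pattern are skipped, which is where the "not-quite-so-impossible" part comes in: anything the patterns miss still has to be handled by hand.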

--
John Nurick [Microsoft Access MVP]

Please respond in the newsgroup and not by email.
Nov 13 '05 #6

I have to agree with you, John. But then again, as the original poster
mentioned, it won't help him out a bit in his application since the
users probably won't be saving their documents as WML files. And
unless they're using Word 2003 or later, they won't be saving
WML documents at all!

So, what's the point? Everyone needs to upgrade? <sigh> And what if
they _are_ using Word 2003? How's that going to help Neil out in
getting the data into an Access table?

Nov 13 '05 #7
I'm a bit confused by WML in this context: Wireless Markup Language??

I think the important point here is that Word 2003 can save 2 types of XML:
there is the default specialized sort (docx??), which looks incomprehensible
if you view it in Notepad and is thus difficult to parse, and then there is the
nice and simple, highly parsable standard sort that you get when you create the
document with an imported schema template and tick the 'Save data only' option
when you save.

If Neil controls the doc creation process, the data is structured and
everyone uses Word 2003, he can make it as easy for users to create
standard XML files as native Word format docs. XML data and relational data
are interchangeable (though I'm not too familiar with the specific
capabilities of Access).

If none of the tests succeed he has to stick with old-fashioned regex
parsing of regular text. The specialized form of XML cannot help because the
markup is document-smart, not content-smart.

-richard

Nov 13 '05 #8

WML = "Word processing eXtensible Markup" or Word processing XML.

By the way, you don't need a "template" to create a WML file. You need
a DTD, but (just as with HTML) there is a default set, and there's the
one from Microsoft that is automatically referenced on save, just as
when you save a Word document as HTML.

Nov 13 '05 #9
Please excuse my clumsy terminology, I'm not familiar with the Word SDK. I
think what I described as the 'native' sort of XML is called
WordprocessingML in the SDK, which I guess is another term for "Word
processing eXtensible Markup".

Thinking about it a bit more, I think WPML does help the slicing and dicing
process a bit when document content is semantically unstructured. It gives
you the option of using XML tools and techniques for intelligent parsing:
you are not compelled to use the Word SDK to hunt for patterns in document
elements such as paragraphs and formatting that provide the clues for the
existence of interesting data fragments.
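
As a rough sketch of that idea, the Python snippet below walks the paragraphs of a WordprocessingML file with a generic XML parser instead of the Word object model, collecting each paragraph's text and whether any run in it is bold. The namespace URI is the one I understand Word 2003 to use, but treat it as an assumption and check it against the root element of a real file; the sample document and the `paragraphs` helper are invented for the example and far simpler than real output.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Word 2003 WordprocessingML; verify against a real file.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written, heavily simplified sample; real files carry far more markup.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Customer</w:t></w:r></w:p>
    <w:p><w:r><w:t>Acme </w:t></w:r><w:r><w:t>Ltd</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs(xml_text):
    """Yield (text, has_bold_run) for each w:p, joining the w:t text of its runs."""
    root = ET.fromstring(xml_text)
    for p in root.iter(f"{{{W}}}p"):
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        has_bold_run = p.find(f".//{{{W}}}rPr/{{{W}}}b") is not None
        yield text, has_bold_run


result = list(paragraphs(SAMPLE))
```

Formatting such as bold is exactly the kind of clue mentioned above: a bold paragraph might mark a label, with the interesting value in the paragraph that follows.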

I have suddenly discovered that I have a similar business need to the
original post. It's time I boned up on the Word SDK / WPML.

- Richard

Nov 13 '05 #10
Hi Chuck,

Actually, you don't need a DTD, you need a schema...

> By the way, you don't need a "template" to create a WML file. You need
> a DTD, but (just as with HTML) there is a default set, and there's the
> one from Microsoft that is automatically referenced on save, just as
> when you save a word document as HTML.


Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :-)

Nov 13 '05 #11
