
XML and Word Docs

An article at http://news.com.com/2100-1012-991694.html?tag=fd_top states:
"XML [in Office 2003] would allow easier interchange of data generated in
Office documents with back-end systems or existing Web services."

As part of an Access 2000 application, I have to continually parse Word
documents and store the parsings in Access tables using Automation to
control Word and parse the document. Is there a way that XML would help with
that?

Thanks.
Nov 13 '05 #1
10 Replies


Check out the articles on WML, which is the type of XML that Word uses.
(WML = Word processing eXtensible Markup Language, for what it's
worth.)

I've played a bit with WML and Access, and I can't say as it's any
better (or worse) than writing or reading HTML or any other form of
XML, but it's interesting (and a bit of fun) to write a Word document
without having to use RTF or having Word installed anywhere. In
addition, WML is a *heck* of a lot easier to work with than RTF!

Google "WML" and you'll probably find a bunch of information on it.
That's how I started, but sadly, it appears that I didn't keep the
links. I think I started out at xml.org, but don't quote me on that.

Nov 13 '05 #2

Thanks, Chuck. Probably wouldn't help too much with my parsing issue,
though, right?

"Chuck Grimsby" <c.*******@worldnet.att.net> wrote in message
news:11**********************@z14g2000cwz.googlegroups.com...
> Check out the articles on WML, which is the type of XML that Word uses.
> (WML = Word processing eXtensible Markup Language, for what it's
> worth.)
>
> I've played a bit with WML and Access, and I can't say as it's any
> better (or worse) than writing or reading HTML or any other form of
> XML, but it's interesting (and a bit of fun) to write a Word document
> without having to use RTF or having Word installed anywhere. In
> addition, WML is a *heck* of a lot easier to work with than RTF!
>
> Google "WML" and you'll probably find a bunch of information on it.
> That's how I started, but sadly, it appears that I didn't keep the
> links. I think I started out at xml.org, but don't quote me on that.

Nov 13 '05 #3

Not a bit actually. Users are unlikely to save their Word documents as
WML files just so you can parse them.

Nov 13 '05 #4

I reckon it depends on your Word documents. If they are highly structured
then XML could help. If they are basically unstructured then XML will not
help.

The RSS schema is a useful example. I sometimes create RSS files in Word.
RSS files can be fairly weakly structured if long passages of text are
embedded between <description></description> tags. XML is still useful for
me because the description element corresponds one-to-one with a column in my
database.
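
For what it's worth, here is a minimal sketch of that one-to-one mapping in Python. Everything in it is an assumption for illustration: the sample feed, the `items` table and the `load_items` helper are made up, and an in-memory SQLite database stands in for the Access table. Note that RSS 2.0 spells the element `description` in lowercase.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be read from a file.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <description>A long, loosely structured passage of text.</description>
    </item>
    <item>
      <title>Second item</title>
      <description>Another passage.</description>
    </item>
  </channel>
</rss>"""


def load_items(xml_text, conn):
    """Copy each <item> into one row; <description> maps to one column."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (title TEXT, description TEXT)"
    )
    rows = [
        (
            item.findtext("title", default=""),
            item.findtext("description", default=""),
        )
        for item in root.iter("item")
    ]
    conn.executemany(
        "INSERT INTO items (title, description) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


# SQLite in memory is only a stand-in for the real database.
conn = sqlite3.connect(":memory:")
count = load_items(RSS_SAMPLE, conn)
```

The free text inside each description stays unparsed; the XML only tells you where it starts and stops, which is often all the database needs.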

Assuming your documents pass the structure test, the key thing is whether
you can control the document creation process. If you can get the authors to
create their documents in XML, parsing them is much easier and more robust
than parsing regular text. You can use an XML schema to enforce validity (any
XML parser already checks well-formedness); you can use types from the
System.Xml namespace in the .NET Framework class library; and you can use
XSLT to transform from one format to another.


"Neil" <no****@nospam.net> wrote in message
news:fU*************@newsread1.news.pas.earthlink.net...
> An article at http://news.com.com/2100-1012-991694.html?tag=fd_top states:
> "XML [in Office 2003] would allow easier interchange of data generated in
> Office documents with back-end systems or existing Web services."
>
> As part of an Access 2000 application, I have to continually parse Word
> documents and store the parsings in Access tables using Automation to
> control Word and parse the document. Is there a way that XML would help
> with that?
>
> Thanks.

Nov 13 '05 #5

I don't quite agree. I get the impression that the docx format will make
parsing of unstructured documents easier, if only by making it easier to
bring a heavy-duty regex engine to bear. That said, "easier" may just
mean the difference between impossible and not-quite-so-impossible<g>.

On Wed, 29 Jun 2005 19:44:06 +0100, "Richard P" <or**@community.nospam>
wrote:
> I reckon it depends on your Word documents. If they are highly structured
> then XML could help. If they are basically unstructured then XML will not
> help.
>
> The RSS schema is a useful example. I sometimes create RSS files in Word.
> RSS files can be fairly weakly structured if long passages of text are
> embedded between <description></description> tags. XML is still useful for
> me because the description element corresponds one-to-one with a column in
> my database.
>
> Assuming your documents pass the structure test, the key thing is whether
> you can control the document creation process. If you can get the authors to
> create their documents in XML, parsing them is much easier and more robust
> than parsing regular text. You can use an XML schema to enforce validity (any
> XML parser already checks well-formedness); you can use types from the
> System.Xml namespace in the .NET Framework class library; and you can use
> XSLT to transform from one format to another.


--
John Nurick [Microsoft Access MVP]

Please respond in the newsgroup and not by email.
Nov 13 '05 #6


I have to agree with you, John. But then again, as the original poster
mentioned, it won't help him out a bit in his application since the
users probably won't be saving their documents as WML files. And
unless they're using a version above Word 2000, they won't be saving
WML documents at all!

So, what's the point? Everyone needs to upgrade? <sigh> And what if
they _are_ using Word 2003? How's that going to help Neil out in
getting the data into an Access table?

John Nurick wrote:
> I don't quite agree. I get the impression that the docx format will make
> parsing of unstructured documents easier, if only by making it easier to
> bring a heavy-duty regex engine to bear. That said, "easier" may just
> mean the difference between impossible and not-quite-so-impossible<g>.



Nov 13 '05 #7

I'm a bit confused by WML in this context - Wireless Markup Language??

I think the important point here is that Word 2003 can save 2 types of XML:
there is the default specialized sort (docx??), which looks incomprehensible
if you view it in Notepad and is thus difficult to parse, and then there is
the nice and simple, highly parsable standard sort you get when you create
the document with an imported schema template and click the 'Save data only'
option when you save.

If Neil controls the doc creation process, the data is structured and
everyone uses Word 2003, he can make it as easy for users to create
standard XML files as native Word format docs. XML data and relational data
are interchangeable (though I'm not too familiar with the specific
capabilities of Access).
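
A rough sketch of why data-only XML maps so readily onto rows and columns, in Python. The `Contacts`/`Contact` element names and both helper functions are hypothetical; a real file saved from Word would use whatever element names (and probably a namespace) the attached schema defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-only file: one repeating record element,
# whose child elements play the role of columns.
DATA_ONLY_SAMPLE = """<Contacts>
  <Contact><Name>Ann</Name><Phone>555-0100</Phone></Contact>
  <Contact><Name>Bob</Name><Phone>555-0101</Phone><Email>bob@example.com</Email></Contact>
</Contacts>"""


def records_to_rows(xml_text, record_tag):
    """Turn every <record_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        rows.append({child.tag: (child.text or "").strip() for child in record})
    return rows


def column_names(rows):
    """Collect the column names in first-seen order across all rows."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


rows = records_to_rows(DATA_ONLY_SAMPLE, "Contact")
columns = column_names(rows)
```

Each dict is one candidate table row, and `columns` is the field list you would need in the target table; optional elements simply come through as missing keys.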

If none of the tests succeed he has to stick with old-fashioned regex
parsing of regular text. The specialized form of XML cannot help because the
markup is document-smart, not content-smart.

-richard

"Chuck Grimsby" <c.*******@worldnet.att.net> wrote in message
news:11*********************@g47g2000cwa.googlegroups.com...
>
> I have to agree with you, John. But then again, as the original poster
> mentioned, it won't help him out a bit in his application since the
> users probably won't be saving their documents as WML files. And
> unless they're using a version above Word 2000, they won't be saving
> WML documents at all!
>
> So, what's the point? Everyone needs to upgrade? <sigh> And what if
> they _are_ using Word 2003? How's that going to help Neil out in
> getting the data into an Access table?

Nov 13 '05 #8


WML = "Word processing eXtensible Markup" or Word processing XML.

By the way, you don't need a "template" to create a WML file. You need
a DTD, but (just as with HTML) there is a default set, and there's the
one from Microsoft that is automatically referenced on save, just as
when you save a Word document as HTML.

Richard P wrote:
> I'm a bit confused by WML in this context - Wireless Markup Language??
>
> I think the important point here is that Word 2003 can save 2 types of XML:
> there is the default specialized sort (docx??), which looks incomprehensible
> if you view it in Notepad and is thus difficult to parse, and then there is
> the nice and simple, highly parsable standard sort you get when you create
> the document with an imported schema template and click the 'Save data only'
> option when you save.
>
> If Neil controls the doc creation process, the data is structured and
> everyone uses Word 2003, he can make it as easy for users to create
> standard XML files as native Word format docs. XML data and relational data
> are interchangeable (though I'm not too familiar with the specific
> capabilities of Access).
>
> If none of the tests succeed he has to stick with old-fashioned regex
> parsing of regular text. The specialized form of XML cannot help because the
> markup is document-smart, not content-smart.
>
> -richard


Nov 13 '05 #9

Please excuse my clumsy terminology; I'm not familiar with the Word SDK. I
think what I described as the 'native' sort of XML is called
WordprocessingML in the SDK, which I guess is another term for "Word
processing eXtensible Markup".

Thinking about it a bit more, I think WPML does help the slicing and dicing
process a bit when document content is semantically unstructured. It gives
you the option of using XML tools and techniques for intelligent parsing:
you are not compelled to use the Word SDK to hunt for patterns in document
elements such as paragraphs and formatting that provide the clues for the
existence of interesting data fragments.
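
As a minimal sketch of that idea, the Python below pulls paragraph text out of a Word 2003 XML file with a stock XML parser and then hunts for "Label: value" patterns with a regex. The sample document, the `paragraphs` and `find_fields` helpers and the "Label: value" convention are all assumptions for illustration; the namespace URI is the one Word 2003 uses for WordprocessingML. Word often splits a phrase across several runs, which is why the text of each paragraph is joined before matching.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of Word 2003 WordprocessingML (the w: prefix).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical, heavily trimmed document; real files carry far more markup.
SAMPLE = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Invoice No: </w:t></w:r><w:r><w:t>12345</w:t></w:r></w:p>
    <w:p><w:r><w:t>Customer: Acme Ltd</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs(xml_text):
    """Return the plain text of each w:p, joining the w:t runs inside it."""
    root = ET.fromstring(xml_text)
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        result.append(text)
    return result


def find_fields(paras):
    """Pick out 'Label: value' paragraphs (an assumed convention)."""
    pattern = re.compile(r"^\s*([^:]+):\s*(.+?)\s*$")
    fields = {}
    for para in paras:
        match = pattern.match(para)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields


fields = find_fields(paragraphs(SAMPLE))
```

Nothing here needs Word to be installed or automated; the same loop could also inspect formatting elements if bold or styled text is what marks the interesting fragments.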

I have suddenly discovered that I have a similar business need to the
original post. It's time I boned up on the Word SDK / WPML.

- Richard

"Chuck Grimsby" <c.*******@worldnet.att.net> wrote in message
news:11**********************@g47g2000cwa.googlegroups.com...
>
> WML = "Word processing eXtensible Markup" or Word processing XML.
>
> By the way, you don't need a "template" to create a WML file. You need
> a DTD, but (just as with HTML) there is a default set, and there's the
> one from Microsoft that is automatically referenced on save, just as
> when you save a Word document as HTML.

Nov 13 '05 #10


Hi Chuck,

Actually, you don't need a DTD, you need a schema...
> By the way, you don't need a "template" to create a WML file. You need
> a DTD, but (just as with HTML) there is a default set, and there's the
> one from Microsoft that is automatically referenced on save, just as
> when you save a Word document as HTML.


Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question or
reply in the newsgroup and not by e-mail :-)

Nov 13 '05 #11
