Please excuse my clumsy terminology; I'm not familiar with the Word SDK. I
think what I described as the 'native' sort of XML is called
WordprocessingML in the SDK, which I guess is another term for "Word
processing eXtensible Markup".
Thinking about it a bit more, I think WPML does help the slicing and dicing
process a bit when document content is semantically unstructured. It gives
you the option of using XML tools and techniques for intelligent parsing:
you are not compelled to use the Word SDK to hunt for patterns in document
elements such as paragraphs and formatting that provide the clues for the
existence of interesting data fragments.
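To make that concrete, here is a minimal sketch of the idea using only Python's standard XML library rather than the Word SDK. The sample document, the field names and the "bold paragraph is a label, next paragraph is its value" rule are all my own assumptions for illustration; the namespace is the one Word 2003 writes when saving as XML. A real document will be messier (for example, bold can also be switched off explicitly on a run, which this ignores).

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003's "save as XML" format (WordprocessingML).
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hypothetical sample standing in for a saved Word 2003 XML document.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Invoice number</w:t></w:r></w:p>
    <w:p><w:r><w:t>INV-</w:t></w:r><w:r><w:t>1042</w:t></w:r></w:p>
    <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Customer</w:t></w:r></w:p>
    <w:p><w:r><w:t>Acme Ltd</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs(root):
    """Yield (text, is_bold) per w:p; is_bold means every run carries w:b."""
    for p in root.iter(f"{{{W}}}p"):
        runs = p.findall("w:r", NS)
        # A paragraph's text may be split across several runs, so join them.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        is_bold = bool(runs) and all(
            r.find("w:rPr/w:b", NS) is not None for r in runs
        )
        yield text, is_bold


def labelled_fields(root):
    """Assumed rule: a bold paragraph is a label, the next plain one its value."""
    fields, label = {}, None
    for text, is_bold in paragraphs(root):
        if is_bold:
            label = text
        elif label is not None:
            fields[label] = text
            label = None
    return fields


fields = labelled_fields(ET.fromstring(SAMPLE))
print(fields)
```

The point is only that the formatting clues (here, bold runs) are ordinary elements you can query, so the pattern hunting can be done outside Word.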
I have suddenly discovered that I have a similar business need to the
original post. It's time I boned up on the Word SDK / WPML.
- Richard
"Chuck Grimsby" <c.grimsby@worldnet.att.net> wrote in message
news:1120122771.761085.243640@g47g2000cwa.googlegroups.com...[color=blue]
>
> WML = Word processing XML.
>
> By the way, you don't need a "template" to create a WML file. You need
> a DTD, but (just as with HTML) there is a default set, and there's the
> one from Microsoft that is automatically referenced on save, just as
> when you save a Word document as HTML.
>
>
> Richard P wrote:[color=green]
>> I'm a bit confused by WML in this context - Wireless Markup Language??
>>
>> I think the important point here is that Word 2003 can save 2 types of
>> XML: there is the default specialized sort (docx??), which looks
>> incomprehensible if you view it in Notepad and is thus difficult to
>> parse, and then there is the nice and simple, highly parsable standard
>> sort you get when you create the document with an imported schema
>> template and click the 'Save data only' option when you save.
>>
>> If Neil controls the doc creation process, the data is structured and
>> everyone uses Word 2003, he can make it as easy for users to create
>> standard XML files as native Word format docs. XML data and relational
>> data are interchangeable (though I'm not too familiar with the specific
>> capabilities of Access).
>>
>> If none of the tests succeed he has to stick with old-fashioned regex
>> parsing of regular text. The specialized form of XML cannot help because
>> the markup is document-smart, not content-smart.
>>
>> -richard
>>
>> "Chuck Grimsby" <c.grimsby@worldnet.att.net> wrote in message
>> news:1120082181.628198.62790@g47g2000cwa.googlegroups.com...[color=darkred]
>> >
>> > I have to agree with you, John. But then again, as the original poster
>> > mentioned, it won't help him out a bit in his application since the
>> > users probably won't be saving their documents as WML files. And
>> > unless they're using a version above Word 2000, they won't be saving
>> > WML documents at all!
>> >
>> > So, what's the point? Everyone needs to upgrade? <sigh> And what if
>> > they _are_ using Word 2003? How's that going to help Neil out in
>> > getting the data into an Access table?
>> >
>> >
>> > John Nurick wrote:
>> >> I don't quite agree. I get the impression that the docx format will
>> >> make parsing of unstructured documents easier, if only by making it
>> >> easier to bring a heavy-duty regex engine to bear. That said, "easier"
>> >> may just mean the difference between impossible and
>> >> not-quite-so-impossible<g>.
>> >>
>> >> On Wed, 29 Jun 2005 19:44:06 +0100, "Richard P"
>> >> <orix@community.nospam> wrote:
>> >> >I reckon it depends on your Word documents. If they are highly
>> >> >structured then XML could help. If they are basically unstructured
>> >> >then XML will not help.
>> >> >The RSS schema is a useful example. I sometimes create RSS files in
>> >> >Word. RSS files can be fairly weakly structured if long passages of
>> >> >text are embedded between <Description></Description> tags. XML is
>> >> >still useful for me because the Description tag corresponds
>> >> >one-to-one with a column in my database.
>> >> >Assuming your documents pass the structure test, the key thing is
>> >> >whether you can control the document creation process. If you can
>> >> >get the authors to create their documents in XML, parsing it is much
>> >> >easier and more robust than parsing regular text. You can use XML
>> >> >Schema to enforce validity and well-formedness; you can use types
>> >> >from the Xml namespace in the framework class library; and you can
>> >> >use XSLT to transform from one format to another.
>> >
>> >> >"Neil" <nospam@nospam.net> wrote in message
>> >> >news:fURfe.167$r7.48@newsread1.news.pas.earthlink.net...
>> >> >> An article at http://news.com.com/2100-1012-991694.html?tag=fd_top
>> >> >> states:
>> >> >> "XML [in Office 2003] would allow easier interchange of data
>> >> >> generated in Office documents with back-end systems or existing
>> >> >> Web services."
>> >> >> As part of an Access 2000 application, I have to continually parse
>> >> >> Word documents and store the parsings in Access tables using
>> >> >> Automation to control Word and parse the document. Is there a way
>> >> >> that XML would help with that?
>> >[/color][/color]
>[/color]
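
For the structured case in the quoted thread, where each element in a "data only" XML file corresponds one-to-one with a database column, the load step can be quite small. The following is only a rough sketch under assumptions of my own: the element names (Items, Item, Title, Description) are a made-up schema, and SQLite stands in for the Access table simply so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical "data only" XML, as might be saved from a document that
# was created against a custom schema. Element names are illustrative.
SAMPLE = """<Items>
  <Item><Title>First post</Title><Description>Long passage of text.</Description></Item>
  <Item><Title>Second post</Title><Description>Another passage.</Description></Item>
</Items>"""

COLUMNS = ("Title", "Description")  # each child element maps to one column


def rows_from_xml(xml_text):
    """Return one tuple per Item, with a value (or None) for each column."""
    root = ET.fromstring(xml_text)
    return [
        tuple(item.findtext(col) for col in COLUMNS)
        for item in root.iter("Item")
    ]


def load(conn, rows):
    """Insert the rows into a table whose columns mirror the element names."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (Title TEXT, Description TEXT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stands in for the Access table
rows = rows_from_xml(SAMPLE)
load(conn, rows)
stored = conn.execute(
    "SELECT Title, Description FROM items ORDER BY rowid"
).fetchall()
print(stored)
```

No regex is involved at all here, which is the whole attraction when the authors can be persuaded to work against a schema.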