Hey guys,
I currently have a 100-page Word document filled with various
"articles". These articles are delimited by the style of the text
(i.e., Heading 1 for the various titles). The articles will then be
converted into HTML and saved. I want to write a parser in VB.NET
that uses the Word object model, and I was wondering how this could
be achieved. The problem I am running into is that I cannot test
whether the selected text is of a certain style. My code so far in
VB.NET:
Dim dc As Word.Document
Dim w As Word.Application
w = New Word.Application()
Dim arguments As [String]() = Environment.GetCommandLineArgs()
dc = w.Documents.Open("c:\static\test.doc")
w.Visible = True
w.Selection.EndKey(Word.WdUnits.wdStory)
w.Selection.HomeKey(Word.WdUnits.wdStory, Word.WdKey.wdKeyShift)
Dim count As Integer
count = w.Selection.Range.ComputeStatistics(Word.WdStatistic.wdStatisticLines)
Dim i As Integer
For i = 0 To 2
    w.Selection.HomeKey(Word.WdUnits.wdLine)
    If w.Selection.Style = "Heading 3" Then
        MsgBox("heading 1")
    End If
    w.Selection.MoveDown(Word.WdUnits.wdLine, 1)
Next
I currently have something like this in a macro in the Word document.
Note that this is just a prototype. saveHTML simply copies and pastes the
selected text from one document to a new document and saves that
content as an HTML file, because Microsoft Word cannot save a
highlighted selection as HTML:
Sub ParseDocument()
    Selection.HomeKey Unit:=wdStory
    Selection.EndKey Unit:=wdStory
    Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
    count1 = Selection.Range.ComputeStatistics(Statistic:=wdStatisticLines)
    For x = 0 To count1
        Selection.HomeKey Unit:=wdLine
        If Selection.Style = "Heading 3" Then
            saveHTML (file)
        ElseIf Selection.Style = "Heading 2" Then
            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            z = Selection.Text
        End If
        If Selection.Style = "Heading 1" Then
            file = path + y
            saveHTML (file)
            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            y = Selection.Text
            y = Replace(y, Chr(10), "")
            y = Replace(y, Chr(13), "")
            y = Replace(y, Chr(12), "")
            y1 = y
        End If
        Selection.MoveDown Unit:=wdLine, count:=1
    Next
    file = path + y
    saveHTML (file)
End Sub
Any help is appreciated, thanks guys
Steve
Hi Steve,
The easiest way to check the styles in the document is to iterate through
each paragraph in the document, rather than each line. I believe that a
style applies to the entire paragraph, so it shouldn't be possible for
different lines in the same paragraph to have different styles.
Also, although it's not too important, it's usually preferable to use the
Range object instead of the Selection object when using the Word object
model. The Range object gives you more options and is invisible
(it doesn't cause screen flicker).
Here's some quick code that iterates through each paragraph and prints the
style name:
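Sub Test()
    Dim p As Word.Paragraph
    For Each p In w.ActiveDocument.Paragraphs
        ' In VB.NET the Style property comes back as an Object, so cast it to read the name.
        Debug.WriteLine(CType(p.Style, Word.Style).NameLocal & ": " & p.Range.Text)
    Next p
End Sub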
Does this help? If not, let me know more specifically where you're having
trouble.
You might try this article:
http://www.devx.com/dotnet/Article/17358
Although it transforms Word documents to XML based on styles, it should be
straightforward to change it to output HTML instead.
Alternatively, consider outputting XML and using a stylesheet to transform
the result to HTML. The resulting XML document is reusable, and you'll find
that approach a lot more flexible when you want to make changes to the HTML.
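As a rough illustration of the XML-plus-stylesheet route suggested above, here is a minimal VB.NET sketch. The XML shape is an assumption made up for this example (the article, heading and para element names are hypothetical, not the format produced by the linked article). It uses the standard XslCompiledTransform class to turn that XML into HTML.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module XmlToHtmlSketch
    Sub Main()
        ' Hypothetical XML for one article; the element names are assumptions.
        Dim articleXml As String =
            "<article>" &
            "<heading>Heading1</heading>" &
            "<para>para1</para>" &
            "<para>para2</para>" &
            "</article>"

        ' A small stylesheet that maps the assumed elements to HTML.
        Dim stylesheet As String =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" &
            "<xsl:output method='html' indent='yes'/>" &
            "<xsl:template match='/article'>" &
            "<html><body>" &
            "<h1><xsl:value-of select='heading'/></h1>" &
            "<xsl:for-each select='para'><p><xsl:value-of select='.'/></p></xsl:for-each>" &
            "</body></html>" &
            "</xsl:template>" &
            "</xsl:stylesheet>"

        ' Compile the stylesheet from the in-memory string.
        Dim transform As New XslCompiledTransform()
        Using styleReader As XmlReader = XmlReader.Create(New StringReader(stylesheet))
            transform.Load(styleReader)
        End Using

        ' Apply it to the article XML and print the resulting HTML.
        Using inputReader As XmlReader = XmlReader.Create(New StringReader(articleXml))
            Using output As New StringWriter()
                transform.Transform(inputReader, Nothing, output)
                Console.WriteLine(output.ToString())
            End Using
        End Using
    End Sub
End Module
```

Changing the HTML layout then only means editing the stylesheet. The code that extracts the articles from Word stays untouched.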
Hey Robert,
Thanks for the quick reply and assistance. This doesn't help me out
too much. I tried this code, but what it simply does is parse the
document paragraph by paragraph, which doesn't necessarily work in
my situation. For example, say I have a document that looks like
this:
Heading1
para1
para2
para3
Heading3
Heading2
para1
para2
Heading1
para1
para2
para3
The parsing macro I wrote before simply parses this document line by
line, looking for the styles Heading 1 and Heading 3. When it
comes across a new heading, I do a Selection.HomeKey back to the
beginning of the document, cut that article out, paste it into a
new document, and save that article as HTML. So for the
first article I would save the following under the filename Heading1:
Heading1
para1
para2
para3
The next article would be (filename would be Heading3_Heading2):
Heading3
Heading2
para1
para2
and so on...
What I am thinking of doing in VB.NET is using the Selection.Find
command first on the "Heading 1" style, then cutting that entire "article"
out and pasting it into a new document. Then I would do another
Selection.Find, this time on "Heading 3", cut and paste that article into
a new document, and finally save it as HTML. Is there a more efficient or
elegant way of doing this? Thanks for your time, guys.
Thanks in advance
Steve
Well, there isn't a single best way to do this. In my experience, the Find
object is a little difficult to use. My approach would be a bit
different: iterate through each paragraph, test the paragraph's style
using the Style property, and use a Range object to copy a story. Here's
some pseudocode:
Dim StoryStart, StoryEnd As Integer
Dim CurrentParagraph As Word.Paragraph
Dim StoryRange As Word.Range
' Iterate through each paragraph in the document.
For Each CurrentParagraph In w.ActiveDocument.Paragraphs
    If CurrentParagraph.Style = (the start of a new story) Then
        ' Define a range object for the previous story and copy it to the clipboard
        StoryEnd = NextParagraph.Start - 1
        StoryRange = w.ActiveDocument.Range(CObj(StoryStart), CObj(StoryEnd))
        StoryRange.Copy
        (Paste the story into a new document)
        ' Reset the StoryStart counter to the start of the next story
        StoryStart = CurrentParagraph.Start
    End If
Next CurrentParagraph
This is just air code, but hopefully gives you the idea. The key line is
the "If" test -- you need to insert some code to detect whether the
paragraph is the start of a new story.
Hope this helps.
Correction... the line
StoryEnd = NextParagraph.Start - 1
should be
StoryEnd = CurrentParagraph.Start - 1
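Putting the correction together with the pseudocode, a fuller sketch might look like the following. It assumes a reference to the Microsoft.Office.Interop.Word assembly and that articles begin at "Heading 1" or "Heading 3" paragraphs, as in Steve's example. The helper names StyleNameOf, StartsArticle and FindArticleRanges are made up for illustration. Two details differ from the air code. The paragraph position is read through Range.Start. And, as far as I know, the end position of a Word range is exclusive, so the start of the next heading is used directly as the end of the previous story. The sketch also adds the last article, which the loop alone would miss.

```vbnet
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module ArticleSplitterSketch
    ' Returns the style name of a paragraph. The Style property is typed
    ' as Object in the interop assembly, so it is cast before reading NameLocal.
    Function StyleNameOf(ByVal p As Word.Paragraph) As String
        Return CType(p.Style, Word.Style).NameLocal
    End Function

    ' Assumption: an article starts at a "Heading 1" or "Heading 3" paragraph.
    Function StartsArticle(ByVal p As Word.Paragraph) As Boolean
        Dim styleName As String = StyleNameOf(p)
        Return styleName = "Heading 1" OrElse styleName = "Heading 3"
    End Function

    ' Walks the paragraphs once and returns one Range per article.
    Function FindArticleRanges(ByVal doc As Word.Document) As List(Of Word.Range)
        Dim articles As New List(Of Word.Range)()
        Dim storyStart As Integer = -1   ' -1 means no article has started yet

        For Each p As Word.Paragraph In doc.Paragraphs
            If StartsArticle(p) Then
                ' Close the previous article at the start of this heading.
                If storyStart >= 0 Then
                    articles.Add(doc.Range(CObj(storyStart), CObj(p.Range.Start)))
                End If
                storyStart = p.Range.Start
            End If
        Next

        ' The last article runs to the end of the document.
        If storyStart >= 0 Then
            articles.Add(doc.Range(CObj(storyStart), CObj(doc.Content.End)))
        End If
        Return articles
    End Function
End Module
```

Each returned range can then be copied with Range.Copy and pasted into a new document for saving as HTML, as described above. No Selection object is involved, so nothing moves on screen while the document is scanned.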