
Word Parsing based on Selection.Style

Hey guys,

I currently have a 100-page Word document filled with various
"articles". These articles are delimited by the style of the text
(i.e., Heading 1 for the various titles). The articles will then be
converted into HTML and saved. I want to write a parser in VB.NET
that uses the Word object model, and was wondering how this could be
achieved. The problem I am running into is that I cannot test whether
the selected text is of a certain style. My code so far in VB.NET:

Dim dc As Word.Document
Dim w As Word.Application
w = New Word.Application()

Dim arguments As [String]() = Environment.GetCommandLineArgs()
dc = w.Documents.Open("c:\static\test.doc")
w.Visible = True

w.Selection.EndKey(Word.WdUnits.wdStory)
w.Selection.HomeKey(Word.WdUnits.wdStory, Word.WdKey.wdKeyShift)

Dim count As Integer
count = w.Selection.Range.ComputeStatistics(Word.WdStatistic.wdStatisticLines)

Dim i As Integer

For i = 0 To 2
    w.Selection.HomeKey(Word.WdUnits.wdLine)

    If w.Selection.Style = "Heading 3" Then
        MsgBox("heading 1")
    End If

    w.Selection.MoveDown(Word.WdUnits.wdLine, 1)
Next

I currently have something like this in a macro in the Word document.
Note this is just a prototype: saveHTML simply copies and pastes the
selected text from one document into a new document and saves that
content as an HTML file, because Microsoft Word is unable to
"save to HTML" a highlighted selection:

Sub ParseDocument()
    Selection.HomeKey Unit:=wdStory
    Selection.EndKey Unit:=wdStory
    Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
    count1 = Selection.Range.ComputeStatistics(Statistic:=wdStatisticLines)

    For x = 0 To count1
        Selection.HomeKey Unit:=wdLine

        If Selection.Style = "Heading 3" Then
            saveHTML (file)
        ElseIf Selection.Style = "Heading 2" Then
            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            z = Selection.Text
        End If

        If Selection.Style = "Heading 1" Then
            file = path + y
            saveHTML (file)

            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            y = Selection.Text

            y = Replace(y, Chr(10), "")
            y = Replace(y, Chr(13), "")
            y = Replace(y, Chr(12), "")

            y1 = y
        End If

        Selection.MoveDown Unit:=wdLine, count:=1
    Next

    file = path + y
    saveHTML (file)
End Sub

Any help is appreciated, thanks guys
Steve
Nov 20 '05 #1
Hi Steve,

The easiest way to check the styles in the document is to iterate through
each paragraph in the document, rather than each line. I believe that a
paragraph style (such as Heading 1) applies to the entire paragraph, so it
shouldn't be possible for different lines in the same paragraph to have
different paragraph styles.

Also, although it's not too important, it's usually preferable to use the
Range object instead of the Selection object when using the Word object
model -- the Range object gives you greater options, and is invisible
(doesn't cause screen flicker.)

Here's some quick code that iterates through each paragraph and prints the
style name:

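Sub Test()
    Dim p As Word.Paragraph

    ' From VB.NET, p.Style comes back as an Object. Cast it to Word.Style
    ' and read NameLocal to get the style name as a string.
    For Each p In w.ActiveDocument.Paragraphs
        Debug.WriteLine(CType(p.Style, Word.Style).NameLocal & ": " & p.Range.Text)
    Next p
End Sub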

Does this help? If not, let me know more specifically where you're having
trouble.
Nov 20 '05 #2
You might try this article. Although it transforms Word documents to XML
based on styles, it will be trivial to change it to output HTML instead.
Alternatively, consider outputting XML and using a stylesheet to transform
the result to HTML--the resulting XML document is reusable, and you'll find
that's a lot more flexible when you want to make changes to the HTML.

http://www.devx.com/dotnet/Article/17358
Nov 20 '05 #3
Hey Robert,

Thanks for the quick reply and assistance. This doesn't help me out
too much: I tried this code out, but all it does is parse the
document paragraph by paragraph, which doesn't necessarily work in
my situation. For example, say I have a document which looks like
this:

Heading1

para1
para2
para3

Heading3
Heading2

para1
para2

Heading1

para1
para2
para3

The parsing macro I wrote before simply parses this document line by
line, looking for the styles Heading 1 and Heading 3. When it comes
across a new heading, I do a Selection.HomeKey back to the beginning
of the document, cut that article out, paste it into a new document,
and save that article as HTML. So basically the first article would
be saved with the filename Heading1:

Heading1

para1
para2
para3

The next article would be (filename would be Heading3_Heading2):

Heading3
Heading2

para1
para2

and so on...
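
The naming scheme above could be sketched roughly as follows. This is only an illustration, not tested against a real document: CleanHeading and BuildFileName are made-up helper names, the ".html" extension is an assumption, and the characters stripped are the same three the macro removes (Chr(10), Chr(13), Chr(12)).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module ArticleNames

    ' Removes the line feed, paragraph mark and page break characters that
    ' come along with a heading's text (the same three the macro strips).
    Function CleanHeading(ByVal text As String) As String
        Return text.Replace(vbLf, "").Replace(vbCr, "").Replace(vbFormFeed, "").Trim()
    End Function

    ' Hypothetical helper: joins the cleaned headings with "_" and swaps any
    ' character that is not legal in a file name for "-".
    Function BuildFileName(ByVal ParamArray headings() As String) As String
        Dim parts As New List(Of String)
        For Each heading As String In headings
            Dim cleaned As String = CleanHeading(heading)
            If cleaned.Length > 0 Then parts.Add(cleaned)
        Next

        Dim name As String = String.Join("_", parts.ToArray())
        For Each bad As Char In Path.GetInvalidFileNameChars()
            name = name.Replace(bad, "-"c)
        Next
        Return name & ".html"   ' the extension is an assumption
    End Function

    Sub Main()
        ' Heading text as Word returns it, with the trailing paragraph mark.
        Console.WriteLine(BuildFileName("Heading1" & vbCr))
        Console.WriteLine(BuildFileName("Heading3" & vbCr, "Heading2" & vbCr))
    End Sub

End Module
```

Keeping the name building separate from the Word calls means it can be tried out in a plain console project without Word installed.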

What I am thinking of doing in VB.NET is using the Selection.Find
command, first on the "Heading 1" style: cut that entire "article" out
and paste it into a new document. Then do another Selection.Find, now
on "Heading 3", cut and paste that article into a new document, and
finally save it as HTML. Is there a more efficient/elegant way
of doing this? Thanks for your time guys.

Thanks in advance
Steve
Nov 20 '05 #4
Well, there isn't a single best way to do this. In my experience, the Find
object is a little difficult to use. My approach would be a bit
different -- iterating through each paragraph, testing the paragraph's style
using the Style property, and using a Range object to copy a story. Here's
some pseudo code:

Dim StoryStart, StoryEnd As Integer
Dim CurrentParagraph As Word.Paragraph
Dim StoryRange As Word.Range

' Iterate through each paragraph in the document.
For Each CurrentParagraph In w.ActiveDocument.Paragraphs

    If CurrentParagraph.Style = (the start of a new story) Then
        ' Define a range object for the previous story and copy it to the clipboard
        StoryEnd = NextParagraph.Start - 1
        StoryRange = w.ActiveDocument.Range(CObj(StoryStart), CObj(StoryEnd))
        StoryRange.Copy
        (Paste the code into a new document)
        ' Reset the StoryStart counter to the start of the next story
        StoryStart = CurrentParagraph.Start
    End If

Next CurrentParagraph

This is just air code, but hopefully gives you the idea. The key line is
the "If" test -- you need to insert some code to detect whether the
paragraph is the start of a new story.
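
As a rough sketch of that "If" test and the boundary bookkeeping around it, here is a version that runs on plain data instead of a live Word document. Everything in it is an assumption for illustration: ParaInfo is a made-up stand-in for a paragraph, IsStoryStart assumes a story begins at a "Heading 1" or "Heading 3" paragraph, and the offsets in Main are invented. In Word the offsets would come from Paragraph.Range.Start and Paragraph.Range.End.

```vbnet
Imports System
Imports System.Collections.Generic

Module StorySplitter

    ' Hypothetical stand-in for a Word paragraph: its style name plus the
    ' character offsets where it starts and ends.
    Class ParaInfo
        Public StyleName As String
        Public StartPos As Integer
        Public EndPos As Integer

        Sub New(ByVal styleName As String, ByVal startPos As Integer, ByVal endPos As Integer)
            Me.StyleName = styleName
            Me.StartPos = startPos
            Me.EndPos = endPos
        End Sub
    End Class

    ' Assumption: a story begins at a "Heading 1" or "Heading 3" paragraph.
    Function IsStoryStart(ByVal styleName As String) As Boolean
        Return styleName = "Heading 1" OrElse styleName = "Heading 3"
    End Function

    ' Returns one {start, end} pair of offsets per story.
    Function FindStories(ByVal paragraphs As List(Of ParaInfo)) As List(Of Integer())
        Dim stories As New List(Of Integer())
        Dim storyStart As Integer = -1   ' -1 means no story has started yet
        Dim lastEnd As Integer = 0

        For Each p As ParaInfo In paragraphs
            If IsStoryStart(p.StyleName) Then
                ' Close the previous story, if any, where this heading begins.
                If storyStart >= 0 Then stories.Add(New Integer() {storyStart, p.StartPos})
                storyStart = p.StartPos
            End If
            lastEnd = p.EndPos
        Next

        ' The final story has no heading after it, so close it here.
        If storyStart >= 0 Then stories.Add(New Integer() {storyStart, lastEnd})
        Return stories
    End Function

    Sub Main()
        ' Invented offsets for a small document with three stories.
        Dim doc As New List(Of ParaInfo)
        doc.Add(New ParaInfo("Heading 1", 0, 9))
        doc.Add(New ParaInfo("Normal", 9, 15))
        doc.Add(New ParaInfo("Heading 3", 15, 24))
        doc.Add(New ParaInfo("Heading 2", 24, 33))
        doc.Add(New ParaInfo("Normal", 33, 39))
        doc.Add(New ParaInfo("Heading 1", 39, 48))
        doc.Add(New ParaInfo("Normal", 48, 54))

        For Each story As Integer() In FindStories(doc)
            Console.WriteLine(String.Format("{0} to {1}", story(0), story(1)))
        Next
    End Sub

End Module
```

Two details differ from the air code: nothing is emitted for the very first heading, because no story is open yet, and the last story is closed after the loop, since no later heading will trigger it. Each pair is what you would hand to the document's Range method before copying.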

Hope this helps.
Nov 20 '05 #5
Correction... the line

StoryEnd = NextParagraph.Start - 1

Should be

StoryEnd = CurrentParagraph.Start - 1
Nov 20 '05 #6
