Word Parsing based on Selection.Style

Hey guys,

I currently have a 100-page Word document filled with various
"articles". These articles are delimited by the style of the text
(i.e. Heading 1 for the various titles). These articles will then be
converted into HTML and saved. I want to write a parser in VB.NET
that uses the Word object model, and I was wondering how this
could be achieved. The problem I am running into is that I cannot
test whether the selected text is of a certain style. My code so far
in VB.NET:

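Dim dc As Word.Document
Dim w As Word.Application
w = New Word.Application()

Dim arguments As [String]() = Environment.GetCommandLineArgs()
dc = w.Documents.Open("c:\static\test.doc")
w.Visible = True

' Select the whole document. The second HomeKey argument is the
' movement type, so it takes wdExtend rather than a WdKey value.
w.Selection.EndKey(Word.WdUnits.wdStory)
w.Selection.HomeKey(Word.WdUnits.wdStory, Word.WdMovementType.wdExtend)

Dim count As Integer
count = w.Selection.Range.ComputeStatistics(Word.WdStatistic.wdStatisticLines)

' Collapse back to an insertion point at the start of the document.
w.Selection.HomeKey(Word.WdUnits.wdStory)

Dim i As Integer

' Visit every line, not just the first three.
For i = 0 To count - 1
    w.Selection.HomeKey(Word.WdUnits.wdLine)

    ' Selection.Style returns a Style object rather than a string,
    ' so compare its name instead of the object itself.
    If CType(w.Selection.Style, Word.Style).NameLocal = "Heading 3" Then
        MsgBox("Heading 3")
    End If

    w.Selection.MoveDown(Word.WdUnits.wdLine, 1)
Next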

I currently have something like this in a macro in the Word document.
Note that this is just a prototype: saveHTML simply copies the selected
text from one document, pastes it into a new document, and saves that
content as an HTML file, because Microsoft Word is unable to "save as
HTML" only a highlighted selection:

Sub ParseDocument()
    ' Select the whole document so the line count covers all of it.
    Selection.HomeKey Unit:=wdStory
    Selection.EndKey Unit:=wdStory
    Selection.HomeKey Unit:=wdStory, Extend:=wdExtend
    count1 = Selection.Range.ComputeStatistics(Statistic:=wdStatisticLines)

    ' saveHTML and path are not shown here.
    For x = 0 To count1 - 1
        Selection.HomeKey Unit:=wdLine

        If Selection.Style = "Heading 3" Then
            saveHTML file
        ElseIf Selection.Style = "Heading 2" Then
            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            z = Selection.Text
        End If

        If Selection.Style = "Heading 1" Then
            file = path + y
            saveHTML file

            Selection.EndKey Unit:=wdLine, Extend:=wdExtend
            y = Selection.Text

            ' Strip line feeds, paragraph marks and page breaks from the title.
            y = Replace(y, Chr(10), "")
            y = Replace(y, Chr(13), "")
            y = Replace(y, Chr(12), "")

            y1 = y
        End If

        Selection.MoveDown Unit:=wdLine, count:=1
    Next

    ' Save the last article.
    file = path + y
    saveHTML file
End Sub

Any help is appreciated. Thanks, guys!
Steve
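The three Replace calls in the macro strip line feeds (Chr(10)), paragraph marks (Chr(13)) and page breaks (Chr(12)) before the heading text is used as a file name. Below is a minimal VB.NET sketch of that clean-up as a reusable function; it also replaces anything that Path.GetInvalidFileNameChars reports as illegal, which is an addition not in the original macro. The FileNames module and MakeFileName function are hypothetical names.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

' Hypothetical helper: turns heading text read from Word into a file name.
Public Module FileNames

    Public Function MakeFileName(headingText As String) As String
        Dim invalid As Char() = Path.GetInvalidFileNameChars()
        Dim sb As New StringBuilder()

        For Each c As Char In headingText
            ' Drop the characters the macro removes with Replace:
            ' Chr(10) line feed, Chr(13) paragraph mark, Chr(12) page break.
            If c = ChrW(10) OrElse c = ChrW(13) OrElse c = ChrW(12) Then
                Continue For
            End If

            ' Swap anything the file system rejects (e.g. "/") for "_".
            If Array.IndexOf(invalid, c) >= 0 Then
                sb.Append("_"c)
            Else
                sb.Append(c)
            End If
        Next

        ' Trim surrounding spaces, which make awkward file names.
        Return sb.ToString().Trim()
    End Function

End Module
```

For example, file = path + y in the macro could become something like Path.Combine(outputFolder, FileNames.MakeFileName(y) & ".htm"), where outputFolder is a placeholder for wherever the HTML files should go.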
Nov 20 '05 #1
Hi Steve,

The easiest way to check the styles in the document is to iterate through
each paragraph in the document, rather than each line. I believe that a
paragraph style applies to the entire paragraph, so it shouldn't be possible
for different lines in the same paragraph to have different paragraph styles.

Also, although it's not too important, it's usually preferable to use the
Range object instead of the Selection object when using the Word object
model: the Range object gives you more options, and it is invisible (it
doesn't move the on-screen selection, so it doesn't cause screen flicker).

Here's some quick code that iterates through each paragraph and prints the
style name:

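Sub Test(ByVal w As Word.Application)

    Dim p As Word.Paragraph

    ' Print each paragraph's style name, then its text.
    ' p.Style is a Style object, so read its NameLocal property.
    For Each p In w.ActiveDocument.Paragraphs
        Debug.WriteLine(CType(p.Style, Word.Style).NameLocal & ": " & p.Range.Text)
    Next p

End Sub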

Does this help? If not, let me know more specifically where you're having
trouble.
Nov 20 '05 #2
You might try this article. Although it transforms Word documents to XML
based on styles, it would be trivial to change it to output HTML instead.
Alternatively, consider outputting XML and using an XSLT stylesheet to
transform the result to HTML. The resulting XML document is reusable, and
you'll find that approach a lot more flexible when you want to make changes
to the HTML.

http://www.devx.com/dotnet/Article/17358
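As a rough illustration of the XML-then-stylesheet route suggested in this reply (not the code from the DevX article, which isn't reproduced here), the sketch below builds an intermediate XML document for one article with LINQ to XML and turns it into HTML with XslCompiledTransform. The article and p element names, the title attribute, the stylesheet itself and the XmlToHtml module are all assumptions made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.Xsl

Public Module XmlToHtml

    ' Hypothetical stylesheet: the article title becomes an <h1>,
    ' each <p> element becomes an HTML paragraph.
    Private Const ArticleXslt As String = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" &
        "<xsl:output method='html' indent='no'/>" &
        "<xsl:template match='/article'>" &
        "<html><body><h1><xsl:value-of select='@title'/></h1>" &
        "<xsl:for-each select='p'><p><xsl:value-of select='.'/></p></xsl:for-each>" &
        "</body></html>" &
        "</xsl:template>" &
        "</xsl:stylesheet>"

    ' Builds the reusable intermediate XML for one article.
    Public Function BuildArticleXml(title As String, paragraphs As IEnumerable(Of String)) As XDocument
        Dim article As New XElement("article", New XAttribute("title", title))
        For Each text As String In paragraphs
            article.Add(New XElement("p", text))
        Next
        Return New XDocument(article)
    End Function

    ' Runs the stylesheet over the XML and returns the resulting HTML.
    Public Function TransformToHtml(doc As XDocument) As String
        Dim xslt As New XslCompiledTransform()
        Using stylesheet As XmlReader = XmlReader.Create(New StringReader(ArticleXslt))
            xslt.Load(stylesheet)
        End Using

        Using output As New StringWriter()
            Using source As XmlReader = doc.CreateReader()
                xslt.Transform(source, Nothing, output)
            End Using
            Return output.ToString()
        End Using
    End Function

End Module
```

Because the HTML layout lives entirely in the stylesheet, changing the output means editing the XSLT (which could just as well be loaded from a separate .xslt file) rather than the Word-parsing code.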
Nov 20 '05 #3
Hey Robert,

Thanks for the quick reply and assistance. This doesn't help me out
too much. I tried this code, but what it simply does is parse the
document paragraph by paragraph, which doesn't necessarily work out in
my situation. For example, say I have a document which looks like
this:

Heading1

para1
para2
para3

Heading3
Heading2

para1
para2

Heading1

para1
para2
para3

The parsing macro I wrote before simply parses this document line by
line, looking for the styles Heading 1 and Heading 3. When it comes
across a new heading, I do a Selection.HomeKey back to the beginning
of the document, cut that article out, paste it into a new document
and save that article as HTML. So for the first article I would use
the filename Heading1:

Heading1

para1
para2
para3

The next article would be (filename would be Heading3_Heading2):

Heading3
Heading2

para1
para2

and so on...

What I am thinking of doing in VB.NET is using the Selection.Find
command first on the "Heading 1" style, cutting that entire "article"
out, and pasting it into a new document. Then I would do another
Selection.Find, this time on "Heading 3", cut and paste that article
into a new document, and finally save it as HTML. Is there a more
efficient or elegant way of doing this? Thanks for your time, guys.

Thanks in advance
Steve
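The splitting rule described in this post can be expressed without Selection.Find. The sketch below assumes the paragraphs have already been read into a list of style-name/text pairs, for example via CType(p.Style, Word.Style).NameLocal and p.Range.Text in a For Each over the document's Paragraphs. Going by the example, it also assumes that Heading 1 and Heading 3 start a new article, and that heading paragraphs immediately after that start heading become part of the file name, joined with "_" (so Heading3 followed by Heading2 gives Heading3_Heading2). Para, Article and ArticleSplitter are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

' Hypothetical container for one Word paragraph: its style name and text.
Public Class Para
    Public ReadOnly StyleName As String
    Public ReadOnly Text As String

    Public Sub New(style As String, paragraphText As String)
        StyleName = style
        Text = paragraphText
    End Sub
End Class

' One article: its title parts and every paragraph that belongs to it.
Public Class Article
    Public ReadOnly NameParts As New List(Of String)
    Public ReadOnly Paragraphs As New List(Of String)

    ' e.g. "Heading3_Heading2" when a Heading 3 is followed by a Heading 2.
    Public ReadOnly Property Name As String
        Get
            Return String.Join("_", NameParts)
        End Get
    End Property
End Class

Public Module ArticleSplitter
    ' Assumption: these styles always begin a new article.
    Private ReadOnly StartStyles As New HashSet(Of String) From {"Heading 1", "Heading 3"}
    ' Assumption: headings right after the start heading extend the name.
    Private ReadOnly TitleStyles As New HashSet(Of String) From {"Heading 1", "Heading 2", "Heading 3"}
    ' Paragraph marks, line feeds, page breaks and spaces around heading text.
    Private ReadOnly TrimChars As Char() = {ChrW(13), ChrW(10), ChrW(12), " "c}

    Public Function SplitArticles(paras As IEnumerable(Of Para)) As List(Of Article)
        Dim result As New List(Of Article)
        Dim current As Article = Nothing
        Dim inTitle As Boolean = False

        For Each p As Para In paras
            If StartStyles.Contains(p.StyleName) Then
                ' A start heading opens a new article.
                current = New Article()
                result.Add(current)
                current.NameParts.Add(p.Text.Trim(TrimChars))
                inTitle = True
            ElseIf inTitle AndAlso TitleStyles.Contains(p.StyleName) Then
                ' Another heading directly under the start heading.
                current.NameParts.Add(p.Text.Trim(TrimChars))
            Else
                ' First non-heading paragraph: the title block is over.
                inTitle = False
            End If

            ' Text before the first start heading has no article and is skipped.
            If current IsNot Nothing Then current.Paragraphs.Add(p.Text)
        Next

        Return result
    End Function
End Module
```

Each Article keeps the raw paragraph text, headings included, so it could be written out as one unit and its Name run through the same Chr(10)/Chr(13)/Chr(12) clean-up the original macro uses.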
Nov 20 '05 #4
Well, there isn't a single best way to do this. In my experience, the Find
object is a little difficult to use. My approach would be a bit
different: iterate through each paragraph, test the paragraph's style
using the Style property, and use a Range object to copy a story. Here's
some pseudo code:
Dim StoryStart, StoryEnd As Integer
Dim CurrentParagraph As Word.Paragraph
Dim StoryRange As Word.Range

' Iterate through each paragraph in the document.
For Each CurrentParagraph In w.ActiveDocument.Paragraphs

    If CurrentParagraph.Style = (the start of a new story) Then
        ' Define a range object for the previous story and copy it to the clipboard.
        StoryEnd = NextParagraph.Start - 1
        StoryRange = w.ActiveDocument.Range(CObj(StoryStart), CObj(StoryEnd))
        StoryRange.Copy()
        ' (Paste the copied text into a new document here.)
        ' Reset the StoryStart counter to the start of the next story.
        StoryStart = CurrentParagraph.Start
    End If

Next CurrentParagraph
This is just air code, but hopefully it gives you the idea. The key line is
the "If" test: you need to insert some code to detect whether the
paragraph is the start of a new story.
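A runnable version of the air code's bookkeeping might look like the sketch below. It keeps the Word calls out so the logic can be checked on its own: it takes each paragraph's Range.Start and style name plus the document's Content.End, uses a hypothetical IsStoryStart function (assumed here to mean "Heading 1" or "Heading 3") as the key If test, and closes the final story after the loop, which the air code leaves out. It passes the next heading's Start as the End rather than subtracting 1, on the understanding that a Word range's End already points just past its last character. StorySpan, StoryRanges and FindStories are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

' Character offsets of one story, for use with Document.Range(Start, End).
Public Structure StorySpan
    Public ReadOnly StartPos As Integer
    Public ReadOnly EndPos As Integer

    Public Sub New(spanStart As Integer, spanEnd As Integer)
        StartPos = spanStart
        EndPos = spanEnd
    End Sub
End Structure

Public Module StoryRanges

    ' Hypothetical "start of a new story" test; adjust the style names as needed.
    Public Function IsStoryStart(styleName As String) As Boolean
        Return styleName = "Heading 1" OrElse styleName = "Heading 3"
    End Function

    ' paraStarts(i) and styleNames(i) describe paragraph i of the document
    ' (Paragraph.Range.Start and its style name); docEnd is Content.End.
    Public Function FindStories(paraStarts As IList(Of Integer),
                                styleNames As IList(Of String),
                                docEnd As Integer) As List(Of StorySpan)
        Dim stories As New List(Of StorySpan)
        Dim storyStart As Integer = -1 ' -1 means no story has started yet

        For i As Integer = 0 To paraStarts.Count - 1
            If IsStoryStart(styleNames(i)) Then
                ' A new heading ends the previous story (if any) right before it.
                If storyStart >= 0 Then
                    stories.Add(New StorySpan(storyStart, paraStarts(i)))
                End If
                storyStart = paraStarts(i)
            End If
        Next

        ' The loop only closes a story when the next one begins,
        ' so the last story has to be closed explicitly.
        If storyStart >= 0 Then
            stories.Add(New StorySpan(storyStart, docEnd))
        End If

        Return stories
    End Function

End Module
```

Each span could then become w.ActiveDocument.Range(CObj(span.StartPos), CObj(span.EndPos)) and be copied as in the air code.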

Hope this helps.
Nov 20 '05 #5
Correction... the line

StoryEnd = NextParagraph.Start - 1

Should be

StoryEnd = CurrentParagraph.Start -1
Nov 20 '05 #6
