Parsing word Doc using PERL in Windows

Hi All

I am parsing a Word document with Perl, using the Win32::OLE module.

I can get the paragraphs, styles and text from the document, but I have a problem with bulleted or numbered text: only the text is displayed, and I also want to get the bullet or number of each line.

I also can't find out which methods are available on Paragraphs (like Range, Style, etc.). How do I get the list of methods? Please help me.
Here is my code:
    use strict;
    use warnings;
    use Win32::OLE;
    use Win32::OLE::Enum;
    use File::Copy;

    my $fileName = "C:\\FileName.doc";
    my $document = Win32::OLE->GetObject($fileName)
        or die "Cannot open $fileName: ", Win32::OLE->LastError(), "\n";
    # Start Excel; its Quit method is called when the script ends
    my $xl_app = Win32::OLE->new('Excel.Application', 'Quit');

    my $paragraphs = $document->Paragraphs();
    my $enumerate  = Win32::OLE::Enum->new($paragraphs);

    # Variables are declared with "my" so the script compiles under "use strict"
    while (defined(my $paragraph = $enumerate->Next())) {
        my $style = $paragraph->{Style}->{NameLocal};
        my $text  = $paragraph->{Range}->{Text};
        $text =~ s/[\n\r]//g;   # strip paragraph marks
        $text =~ s/\x0b/\n/g;   # manual line break (vertical tab) -> newline
        $text =~ s/\x07//g;     # strip end-of-cell marks
        print "\nStyle = $style";
        print "\nText = $text";
    }
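Word keeps the bullet or number of a list paragraph apart from its text, which is why Range->{Text} comes back without it. The Word object model exposes the label through the paragraph range's ListFormat object, in its ListString property (empty for paragraphs that are not in a list). A minimal sketch extending the loop above, assuming Windows with Word installed; clean_word_text, with_list_label and dump_paragraphs are hypothetical helper names. For Symbol-font bullets the label is a private-use character (the recorded macro's ChrW(61623), i.e. U+F0B7), so it may print as "?" in a console.

```perl
use strict;
use warnings;

# Same clean-up as the original loop: drop paragraph marks, turn manual
# line breaks (\x0b) into newlines, drop end-of-cell marks (\x07).
sub clean_word_text {
    my ($text) = @_;
    $text =~ s/[\n\r]//g;
    $text =~ s/\x0b/\n/g;
    $text =~ s/\x07//g;
    return $text;
}

# Put the list label (e.g. "1.", "a)" or a bullet glyph) in front of the text.
sub with_list_label {
    my ($label, $text) = @_;
    return (defined $label && length $label) ? "$label $text" : $text;
}

# Windows + Word only: print style, list label and text for each paragraph.
sub dump_paragraphs {
    my ($file) = @_;
    require Win32::OLE;
    require Win32::OLE::Enum;
    my $document = Win32::OLE->GetObject($file)
        or die "Cannot open $file: ", Win32::OLE->LastError(), "\n";
    my $enumerate = Win32::OLE::Enum->new($document->Paragraphs());
    while (defined(my $paragraph = $enumerate->Next())) {
        my $range = $paragraph->{Range};
        my $style = $paragraph->{Style}->{NameLocal};
        # ListString is empty for paragraphs that are not bulleted or numbered
        my $label = $range->{ListFormat}->{ListString};
        print "\nStyle = $style";
        print "\nText = ", with_list_label($label, clean_word_text($range->{Text}));
    }
    print "\n";
}

# Only touch Word when running on Windows with a file name argument
dump_paragraphs($ARGV[0]) if $^O eq 'MSWin32' && @ARGV;
```

Run it as, for example, `perl dump.pl C:\FileName.doc`; the pure-Perl helpers can be tested without Word.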
Thanks & Regards
Pramod
Mar 12 '08 #1
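On the question of which members an object has: Win32::OLE objects behave like tied hashes, so iterating their keys should return the property names read from the object's type information, and Win32::OLE->QueryObjectType reports the type library and class name. Methods are not part of that list, so the Object Browser in Word's VBA editor (F2) is still the more complete reference. A small sketch under those assumptions; property_names and describe_ole_object are hypothetical helper names.

```perl
use strict;
use warnings;

# Sorted property names of an object. For a Win32::OLE object, keys %$obj
# is expected to enumerate properties via the tied-hash interface;
# a plain hash works the same way, which makes the helper easy to test.
sub property_names {
    my ($obj) = @_;
    return sort keys %$obj;
}

# Windows + Word only: print the class and property names of an OLE object,
# e.g. describe_ole_object($paragraph) inside the paragraph loop.
sub describe_ole_object {
    my ($obj) = @_;
    require Win32::OLE;
    # In list context QueryObjectType returns (type library name, class name)
    my ($typelib, $class) = Win32::OLE->QueryObjectType($obj);
    printf "Class: %s (%s)\n", $class // '?', $typelib // '?';
    print "  $_\n" for property_names($obj);
}
```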


numberwhun
You were given some reading material on this over at Dev Shed. Did that help at all?

Regards,

Jeff
Mar 12 '08 #2

Yes, I did go through the documentation, but it looks outdated; I use Office 2003.
After some googling I found out that the methods are the same as in VB. So I recorded a macro in MS Word, performing actions such as adding headings, text and bullets, and then checked the macro's source code (which is in VB). It contains the following:
    Sub BulletMacro()
    '
    ' BulletMacro Macro
    ' Macro recorded 2008-03-13 by ing03125
    '
        Selection.TypeParagraph
        Selection.Style = ActiveDocument.Styles("Heading 2,Paragraph Title,l2")
        Selection.TypeText Text:="TE_Testcase: test"
        Selection.TypeParagraph
        Selection.TypeParagraph
        Selection.Style = ActiveDocument.Styles("Heading 2,Paragraph Title,l2")
        Selection.TypeText Text:=" TE_Testcase: Test1"
        Selection.TypeParagraph
        Selection.TypeParagraph
        With ListGalleries(wdBulletGallery).ListTemplates(1).ListLevels(1)
            .NumberFormat = ChrW(61623)
            .TrailingCharacter = wdTrailingTab
            .NumberStyle = wdListNumberStyleBullet
            .NumberPosition = InchesToPoints(0.25)
            .Alignment = wdListLevelAlignLeft
            .TextPosition = InchesToPoints(0.5)
            .TabPosition = InchesToPoints(0.5)
            .ResetOnHigher = 0
            .StartAt = 1
            With .Font
                .Bold = wdUndefined
                .Italic = wdUndefined
                .StrikeThrough = wdUndefined
                .Subscript = wdUndefined
                .Superscript = wdUndefined
                .Shadow = wdUndefined
                .Outline = wdUndefined
                .Emboss = wdUndefined
                .Engrave = wdUndefined
                .AllCaps = wdUndefined
                .Hidden = wdUndefined
                .Underline = wdUndefined
                .Color = wdUndefined
                .Size = wdUndefined
                .Animation = wdUndefined
                .DoubleStrikeThrough = wdUndefined
                .Name = "Symbol"
            End With
            .LinkedStyle = ""
        End With
        ListGalleries(wdBulletGallery).ListTemplates(1).Name = ""
        Selection.Range.ListFormat.ApplyListTemplate ListTemplate:=ListGalleries( _
            wdBulletGallery).ListTemplates(1), ContinuePreviousList:=False, ApplyTo:= _
            wdListApplyToWholeList, DefaultListBehavior:=wdWord10ListBehavior
        Selection.TypeText Text:="first"
        Selection.TypeParagraph
        Selection.TypeText Text:="second"
        Selection.TypeParagraph
        Selection.TypeText Text:="third"
        Selection.TypeParagraph
        Selection.TypeBackspace
    End Sub
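Recorded VB maps onto Win32::OLE fairly directly: `.` becomes `->`, a property read such as `Selection.Range.ListFormat` becomes `$word->{Selection}->{Range}->{ListFormat}`, VB named arguments (`Name:=Value`) go into a hash reference passed as the last argument, and the `wd...` constants can be loaded from Word's type library with Win32::OLE::Const. A sketch of the macro's ApplyListTemplate call under those rules, assuming `$word` is a running Word.Application object (for example from `Win32::OLE->GetActiveObject('Word.Application')`); ole_path and apply_bullets are hypothetical helper names, and the call has not been checked against Office 2003 specifically.

```perl
use strict;
use warnings;

# Follow a VB-style dotted property path ("Range.ListFormat.ListString")
# using hash-style property reads, as Win32::OLE allows ($obj->{Prop}).
# It works on plain nested hashes too, so it can be tested without Word.
sub ole_path {
    my ($obj, $path) = @_;
    for my $prop (split /\./, $path) {
        return unless defined $obj;   # stop at a missing property
        $obj = $obj->{$prop};
    }
    return $obj;
}

# Windows + Word only: Perl version of the macro's ApplyListTemplate line.
sub apply_bullets {
    my ($word) = @_;    # a Word.Application object
    require Win32::OLE::Const;
    # Hash ref of wd* constants (wdBulletGallery, ...) from Word's type library
    my $wd = Win32::OLE::Const->Load('Microsoft Word')
        or die "Word type library not found\n";
    # VB's ListGalleries(x) is shorthand for ListGalleries.Item(x)
    my $template = $word->ListGalleries->Item($wd->{wdBulletGallery})
                        ->ListTemplates->Item(1);
    # Named arguments (Name:=Value) become a hash ref as the last argument
    $word->Selection->Range->ListFormat->ApplyListTemplate({
        ListTemplate         => $template,
        ContinuePreviousList => 0,
        ApplyTo              => $wd->{wdListApplyToWholeList},
    });
}
```

Note that the macro only writes list formatting; reading a paragraph's bullet or number back is a matter of reading a property such as `ole_path($paragraph, 'Range.ListFormat.ListString')`.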
But this info doesn't seem useful for me either, because when I try to print the numbering associated with headings/bullets, Perl prints only the style and not the number that goes with it. For example:

Style = Heading 1,Chapter Title,l1,TOC,1
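The printed line is the style's NameLocal: Word returns a style name followed by its comma-separated aliases, so "Heading 1,Chapter Title,l1,TOC,1" is the style Heading 1 with its aliases, not a heading number. Numbered headings are list paragraphs, so their number should come from the same Range->{ListFormat}->{ListString} property that holds bullet labels. A minimal sketch, assuming `$document` is a Word Document object opened with Win32::OLE as in the first post; primary_style_name and print_heading_numbers are hypothetical helper names. Headings are picked by OutlineLevel rather than by style name, because style names are localized.

```perl
use strict;
use warnings;

# NameLocal is the style name followed by comma-separated aliases
# ("Heading 1,Chapter Title,l1,TOC,1"); keep only the first field.
sub primary_style_name {
    my ($name_local) = @_;
    return (split /,/, $name_local, 2)[0];
}

# Windows + Word only: print style, heading number and text of each heading.
sub print_heading_numbers {
    my ($document) = @_;    # Word Document object obtained via Win32::OLE
    require Win32::OLE::Enum;
    my $enumerate = Win32::OLE::Enum->new($document->Paragraphs());
    while (defined(my $paragraph = $enumerate->Next())) {
        # OutlineLevel 10 is wdOutlineLevelBodyText, i.e. not a heading
        next if $paragraph->{OutlineLevel} == 10;
        my $style  = primary_style_name($paragraph->{Style}->{NameLocal});
        # For outline-numbered headings this should hold "1", "2.3", ...
        my $number = $paragraph->{Range}->{ListFormat}->{ListString} // '';
        (my $text = $paragraph->{Range}->{Text}) =~ s/[\r\x07]//g;
        printf "%-12s %-8s %s\n", $style, $number, $text;
    }
}
```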

Please let me know if anyone knows how to get the number.

Thanks
Pramod
Mar 13 '08 #3
