Bytes | Software Development & Data Engineering Community

Parsing word Doc using PERL in Windows

Hi all,

I am parsing a Word document with Perl, using the Win32::OLE module.

I can get the paragraphs, styles, and text from the document, but I run into a problem with bulleted or numbered text: only the text is returned, and I also want the bullet or number that belongs to each line.

I also cannot find which methods and properties are available on a paragraph (such as Range and Style). How do I get the list of methods? Please help.
Here is my code:

    use strict;
    use warnings;
    use Win32::OLE;
    use Win32::OLE::Enum;
    use File::Copy;

    my $fileName = "C:\\FileName.doc";
    my $document = Win32::OLE->GetObject($fileName)
        or die "Cannot open $fileName: " . Win32::OLE->LastError();

    # Start Excel (creates the application object, not a sheet)
    my $xl_app = Win32::OLE->new('Excel.Application', 'Quit');

    my $paragraphs = $document->Paragraphs();
    my $enumerate  = Win32::OLE::Enum->new($paragraphs);

    # Variables are declared with "my" so the script compiles under "use strict"
    while (defined(my $paragraph = $enumerate->Next()))
    {
        my $style = $paragraph->{Style}->{NameLocal};
        my $text  = $paragraph->{Range}->{Text};
        $text =~ s/[\n\r]//g;    # drop paragraph marks
        $text =~ s/\x0b/\n/g;    # manual line break (vertical tab) becomes a newline
        $text =~ s/\x07//g;      # drop table cell markers
        print "\nStyle = $style";
        print "\nText = $text";
    }

Thanks & Regards
Pramod
Mar 12 '08 #1
numberwhun
You were provided some reading material on this over at Dev Shed. Did that help at all?

Regards,

Jeff
Mar 12 '08 #2
Yes, I did go through the documentation, but it looks outdated. I use Office 2003.
After some googling I learned that the methods are the same as those in VB. So I recorded a macro in MS Word while adding headings, text, bullets, and so on, then checked the macro's source code (which is VB) and found the following:

    Sub BulletMacro()
    '
    ' BulletMacro Macro
    ' Macro recorded 2008-03-13 by ing03125
    '
        Selection.TypeParagraph
        Selection.Style = ActiveDocument.Styles("Heading 2,Paragraph Title,l2")
        Selection.TypeText Text:="TE_Testcase: test"
        Selection.TypeParagraph
        Selection.TypeParagraph
        Selection.Style = ActiveDocument.Styles("Heading 2,Paragraph Title,l2")
        Selection.TypeText Text:=" TE_Testcase: Test1"
        Selection.TypeParagraph
        Selection.TypeParagraph
        With ListGalleries(wdBulletGallery).ListTemplates(1).ListLevels(1)
            .NumberFormat = ChrW(61623)
            .TrailingCharacter = wdTrailingTab
            .NumberStyle = wdListNumberStyleBullet
            .NumberPosition = InchesToPoints(0.25)
            .Alignment = wdListLevelAlignLeft
            .TextPosition = InchesToPoints(0.5)
            .TabPosition = InchesToPoints(0.5)
            .ResetOnHigher = 0
            .StartAt = 1
            With .Font
                .Bold = wdUndefined
                .Italic = wdUndefined
                .StrikeThrough = wdUndefined
                .Subscript = wdUndefined
                .Superscript = wdUndefined
                .Shadow = wdUndefined
                .Outline = wdUndefined
                .Emboss = wdUndefined
                .Engrave = wdUndefined
                .AllCaps = wdUndefined
                .Hidden = wdUndefined
                .Underline = wdUndefined
                .Color = wdUndefined
                .Size = wdUndefined
                .Animation = wdUndefined
                .DoubleStrikeThrough = wdUndefined
                .Name = "Symbol"
            End With
            .LinkedStyle = ""
        End With
        ListGalleries(wdBulletGallery).ListTemplates(1).Name = ""
        Selection.Range.ListFormat.ApplyListTemplate ListTemplate:=ListGalleries( _
            wdBulletGallery).ListTemplates(1), ContinuePreviousList:=False, ApplyTo:= _
            wdListApplyToWholeList, DefaultListBehavior:=wdWord10ListBehavior
        Selection.TypeText Text:="first"
        Selection.TypeParagraph
        Selection.TypeText Text:="second"
        Selection.TypeParagraph
        Selection.TypeText Text:="third"
        Selection.TypeParagraph
        Selection.TypeBackspace
    End Sub

But this information does not seem useful to me either. When I try to print the numbering associated with headings or bullets, Perl prints only the style name and not the number that goes with it, like this:

Style = Heading 1,Chapter Title,l1,TOC,1

Please let me know if anyone knows how to do this.

Thanks
Pramod
Mar 13 '08 #3
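
One possible direction, offered as a minimal sketch and not a confirmed fix for this thread: the macro above reaches list settings through Range.ListFormat, and Word's ListFormat object also has a ListString property that holds the number or bullet shown in front of a paragraph. The helper name paragraph_info and the Label output line are made up for this example, and the code has not been run against Office 2003. The Word part is guarded so that the helper can also be tried with plain hashes on a machine without Word.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the style name, list label, and cleaned text
# of one paragraph. It only uses hash-style property access, so it accepts a
# Win32::OLE paragraph object or a plain nested hash.
sub paragraph_info {
    my ($paragraph) = @_;
    my $range = $paragraph->{Range};
    my $list  = $range->{ListFormat};

    my $text = $range->{Text};
    $text = '' unless defined $text;
    $text =~ s/[\r\n\x07]//g;    # drop paragraph marks and table cell markers
    $text =~ s/\x0b/\n/g;        # manual line break becomes a newline

    # ListString is assumed to be empty or undefined for unnumbered paragraphs
    my $label = $list->{ListString};

    return {
        style => $paragraph->{Style}->{NameLocal},
        label => defined $label ? $label : '',
        text  => $text,
    };
}

# Talk to Word only on Windows; Win32::OLE is loaded at run time for that reason
if ($^O eq 'MSWin32') {
    require Win32::OLE;
    require Win32::OLE::Enum;

    my $fileName = "C:\\FileName.doc";
    my $document = Win32::OLE->GetObject($fileName)
        or die "Cannot open $fileName: " . Win32::OLE->LastError();

    my $enumerate = Win32::OLE::Enum->new($document->Paragraphs());
    while (defined(my $paragraph = $enumerate->Next())) {
        my $info = paragraph_info($paragraph);
        print "\nStyle = $info->{style}";
        print "\nLabel = $info->{label}";
        print "\nText = $info->{text}";
    }
}
```

For bulleted paragraphs the label is the bullet character itself, which may not display readably in a console when it comes from a symbol font such as the one the macro sets. As for finding the available methods and properties, the Object Browser in Word's VBA editor (F2) lists the members of Paragraph, Range, and ListFormat, and the same names can be used from Perl through Win32::OLE.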
