Unable to extract data from PDF file

Hi,

I am unable to extract data from the attached PDF file. Please help with this.

Thanks,
    use strict;
    use warnings;
    use CAM::PDF;
    use CAM::PDF::PageText;

    my $file_name = 'Planningapplications11March17March2013.txt';
    my $pdf = CAM::PDF->new("Planningapplications11March17March2013.pdf");
    my $page = $pdf->numPages();
    print "page :: $page \n";

    open my $fh, ">:encoding(utf-8)", $file_name or die "could not open $file_name : $!\n";
    print $fh "";
    close $fh;

    for (my $i = 1; $i <= $page; $i++)
    {
        my $pageone_tree = $pdf->getPageContentTree($i);
        open my $fh, ">>:encoding(utf-8)", $file_name or die "could not open $file_name : $!\n";
        eval {
            print $fh CAM::PDF::PageText->render($pageone_tree);
        };
        close $fh;
    }
Sathish
Attached Files
File Type: pdf Planningapplications11March17March2013.pdf (75.5 KB, 376 views)
Apr 11 '13 #1
numberwhun
3,509 Expert Mod 2GB
Before anyone goes downloading your pdf and attempting to make calculated guesses, why not reply to this please, and let us know what your script is supposed to do, what output you are expecting and what you are seeing instead. If there are any error messages, we will definitely have to know about those as well so that they can be recreated.
May 25 '13 #2
Oralloy
985 Expert 512MB
sathishkumar se,

Not only do we need to know what you are trying to do, but if your code is failing and producing messages, we would like to know what they are...

Regards,
Oralloy
May 26 '13 #3
Hi numberwhun/Oralloy,

I need to convert the PDF content to a text file, but I am unable to do so. I got the following error using the script below: "Can't use an undefined value as an ARRAY reference at C:/Perl/site/lib/CAM/PDF/PageText.pm line 57."
    use strict;
    use warnings;
    use CAM::PDF;
    use CAM::PDF::PageText;

    my $file_name = 'Planningapplications11March17March2013.txt';
    my $pdf = CAM::PDF->new("Planningapplications11March17March2013.pdf");
    my $page = $pdf->numPages();
    print "page :: $page \n";

    open my $fh, ">:encoding(utf-8)", $file_name or die "could not open $file_name : $!\n";
    print $fh "";
    close $fh;

    for (my $i = 1; $i <= $page; $i++)
    {
        my $pageone_tree = $pdf->getPageContentTree($i);
        open my $fh, ">>:encoding(utf-8)", $file_name or die "could not open $file_name : $!\n";

        print $fh CAM::PDF::PageText->render($pageone_tree);

        close $fh;
    }
May 29 '13 #4
Oralloy
985 Expert 512MB
sathishkumar se,

So the CAM::PDF module is having internal difficulties.

I did notice that the module says it is fully compatible with v1.5 PDF files. Could it be that your file is built to a more recent specification?
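A quick way to see which specification a file claims is to read its header, which normally starts with a marker such as %PDF-1.4. The sketch below is only an illustration: pdf_version_from_header and pdf_header_version are hypothetical helper names, not part of CAM::PDF. Note that from PDF 1.4 onward the document catalog may carry a /Version entry that overrides the header, so treat the result as a hint.

```perl
use strict;
use warnings;

# Pull the version number out of the header bytes of a PDF,
# e.g. "%PDF-1.4". Returns undef if no header marker is found.
# (Hypothetical helper, not a CAM::PDF function.)
sub pdf_version_from_header {
    my ($head) = @_;
    return unless defined $head;
    return $head =~ /%PDF-(\d+\.\d+)/ ? $1 : undef;
}

# Read the first kilobyte of a file in raw mode and report
# the version named in its header.
sub pdf_header_version {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "could not open $path: $!\n";
    my $head = '';
    read $fh, $head, 1024;
    close $fh;
    return pdf_version_from_header($head);
}
```

For the file in this thread, the call would be pdf_header_version('Planningapplications11March17March2013.pdf').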

Also, I know that PDF files have a number of methods for compressing data, and when I was using Perl to manipulate them, the module I was using did not handle any compression at all. The compressed objects would not even show up in the object array; their entries would just be empty.
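To get a rough idea of which compression features a file uses, one can count a few well-known name tokens in its raw bytes. This is a minimal sketch with hypothetical helper names (count_pdf_markers, scan_pdf_file). It only shows what the file mentions; whether CAM::PDF copes with any of these is not established here. /ObjStm (object streams) and /XRef (cross-reference streams) were introduced in PDF 1.5, and names stored inside compressed streams will not be seen by this scan.

```perl
use strict;
use warnings;

# Count occurrences of a few PDF name tokens in raw file bytes.
# The token list is an illustrative choice, not an exhaustive one.
sub count_pdf_markers {
    my ($bytes) = @_;
    my %count;
    for my $name (qw(FlateDecode LZWDecode ObjStm XRef)) {
        # Assigning the match list to () and then to a scalar yields the match count.
        my $n = () = $bytes =~ m{/\Q$name\E\b}g;
        $count{$name} = $n;
    }
    return \%count;
}

# Slurp a file in raw mode and count the markers in it.
sub scan_pdf_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "could not open $path: $!\n";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh;
    return count_pdf_markers(defined $bytes ? $bytes : '');
}
```

A non-zero ObjStm or XRef count suggests the file relies on the PDF 1.5 stream features, which is worth mentioning when reporting the problem.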

Also, are you having difficulty with just one page, or every page? It would be interesting to see what sorts of objects are giving the module fits.
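One way to answer the one-page-or-every-page question is to render each page inside its own eval and record the failures. The sketch below is a rough illustration, not a confirmed fix: render_pages and extract_pdf_text are hypothetical helper names, and the guess that getPageContentTree returns undef for some pages is an assumption, not something established in this thread. The CAM::PDF calls are the same ones used in the posted script.

```perl
use strict;
use warnings;

# Render pages 1..$num_pages one at a time and record which ones fail.
# $tree_for and $render_tree are code refs, so the loop itself can be
# exercised without a PDF at hand. (Hypothetical helper.)
sub render_pages {
    my ($num_pages, $tree_for, $render_tree) = @_;
    my (@text, @failed);
    for my $n (1 .. $num_pages) {
        my $tree = $tree_for->($n);
        if (!defined $tree) {
            # Assumed failure mode: no content tree for this page.
            push @failed, "page $n: no content tree returned";
            next;
        }
        my $out = eval { $render_tree->($tree) };
        if (!defined $out) {
            my $err = $@ || 'render returned nothing';
            chomp $err;
            push @failed, "page $n: $err";
            next;
        }
        push @text, $out;
    }
    return (\@text, \@failed);
}

# Wire the loop to CAM::PDF and write whatever text could be rendered.
# Returns the number of pages that failed.
sub extract_pdf_text {
    my ($pdf_path, $txt_path) = @_;
    require CAM::PDF;
    require CAM::PDF::PageText;
    my $pdf = CAM::PDF->new($pdf_path)
        or die "could not parse $pdf_path\n";
    my ($text, $failed) = render_pages(
        $pdf->numPages(),
        sub { $pdf->getPageContentTree($_[0]) },
        sub { CAM::PDF::PageText->render($_[0]) },
    );
    open my $fh, '>:encoding(UTF-8)', $txt_path
        or die "could not open $txt_path: $!\n";
    print {$fh} @$text;
    close $fh or die "could not close $txt_path: $!\n";
    warn "$_\n" for @$failed;
    return scalar @$failed;
}
```

Calling extract_pdf_text('Planningapplications11March17March2013.pdf', 'Planningapplications11March17March2013.txt') would then print one warning per failing page, which is the information worth posting back or sending to the module's author.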

If worse comes to worst, you might try sending the document and code to the module's author and ask for help. I have had success by supplying complete information to the original author of one of the XML schema modules in the past. The (now obsolete) PDF module that I used ended up being completely unsupported, however.

Regards,
Oralloy
May 29 '13 #5
Thanks for your valuable comments, Oralloy.
May 30 '13 #6
