How do I use the File::Extract::PDF module to extract the text from a PDF? I need guidance on writing the program.
Thank you.
I do not have that module installed, and there is not much documentation for it.
But from a look at the source, it seems this would print each line in the entire file:

#!/usr/bin/perl

use strict;
use warnings;

use File::Extract::PDF;

my $target = File::Extract::PDF->new;

# Guess from reading the module source: extract() fills the handle FH
$target->extract(FH, "pdfdocument.pdf") or die;

# Each line read from FH already ends with a newline
while (<FH>) {
    print $_;
}

close(FH);

Unfortunately I can't test it.
I am only hoping to get the ball rolling, and hope to learn from this myself.
If (or when) this doesn't work, post any errors you may get.
goodday
In addition, if you wanted to write the lines to a text file, just use the open() function to open the text file and then add the file handle to the print statement, like so:

#!/usr/bin/perl

use strict;
use warnings;

use File::Extract::PDF;

# Open the output text file; stop if it cannot be created
open(NEWFILE, '>', './newfile.txt') or die "Cannot open newfile.txt: $!";

my $target = File::Extract::PDF->new;

# Guess from reading the module source: extract() fills the handle FH
$target->extract(FH, "pdfdocument.pdf") or die;

# Copy every extracted line to the text file
while (<FH>) {
    print NEWFILE $_;
}

close(FH);
close(NEWFILE);

Regards,
Jeff
Thank you for your reply.
When I run this code I get the following error. How do I fix it?
Bareword "FH" not allowed
Waiting for your reply.
Regards,
kamalatanvi
I would have to say that this module, being at version 0.06 (well below 1.0), probably has many issues, as it looks to be fairly new. It may be that the extract function is not yet fully debugged.
You have a couple of options here.
1. I would go through the module code and ensure that the way you are using it is completely correct.
2. If it is, you could always email the author and see what their input is.
3. You could always implement your own solution (a much longer route).
This is generally the problem with modules that are so very new. They tend to be "not ready for primetime" but are available on CPAN. If you check, there is NO documentation on CPAN for this module either.
Regards,
Jeff
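A note on the error message itself: under "use strict", Perl reports a bareword as not allowed when a bare name such as FH is passed as an argument to an ordinary sub or method call. Built-ins like open, print and close accept a bareword filehandle, but other calls do not. The usual way around this is a lexical filehandle (my $fh), or \*FH for an old-style handle. The sketch below shows only that general Perl pattern. count_lines is a made-up stand-in, and whether the extract method of File::Extract::PDF really expects a filehandle is not confirmed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for any sub or method that reads from a handle it is given.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    $count++ while <$fh>;
    return $count;
}

# An in-memory file keeps the example self-contained (no real file needed).
my $text = "first line\nsecond line\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

# A lexical handle is an ordinary scalar, so strict accepts it as an argument.
my $lines = count_lines($fh);
close($fh);

print "Read $lines lines\n";
```

With an old-style handle, the equivalent call would be count_lines(\*FH) rather than count_lines(FH).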
Hi, I'm quite new to the Perl world, but I think I can help, though using the CAM::PDF module instead. I found I could extract text from PDF pages with the following statements:

.........

use CAM::PDF;
use CAM::PDF::PageText;

.........

my $pdf = CAM::PDF->new($filename);

print ARCHIVO ( CAM::PDF::PageText->render($pdf->getPageContentTree($numpage)));

........

This prints the text of the PDF page as plain text to the filehandle ARCHIVO, which is associated with a *.txt file in my program. Because the method returns a string, you can also process it further. Hope this helps.
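For anyone who wants a complete starting point, here is a minimal sketch built around the same CAM::PDF calls, looping over every page with numPages. The file names are placeholders chosen for illustration, and it has not been run against any particular PDF, so treat it as a sketch rather than a tested program. It assumes CAM::PDF is installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use CAM::PDF;
use CAM::PDF::PageText;

# Placeholder file names, used only for illustration.
my $infile  = 'pdfdocument.pdf';
my $outfile = 'newfile.txt';

my $pdf = CAM::PDF->new($infile)
    or die "Cannot open $infile: $CAM::PDF::errstr\n";

open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";

# CAM::PDF numbers pages starting from 1.
for my $page (1 .. $pdf->numPages()) {
    my $tree = $pdf->getPageContentTree($page);
    next unless $tree;    # skip pages whose content could not be parsed

    my $text = CAM::PDF::PageText->render($tree);
    print {$out} $text, "\n" if defined $text;
}

close($out) or die "Cannot close $outfile: $!";
```

Text extraction of this kind depends on how the PDF was produced; scanned pages that contain only images will yield no text.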