PDF Parser? - Python

Hello All,

I'm looking for a PDF parser.
Any pointers?

10x.
Miki

Jul 18 '05 #1


>>>>> "Miki" == Miki Tebeka <te****@cs.bgu.ac.il> writes:

Miki> Hello All, I'm looking for a PDF parser. Any pointers?

A little more info would be helpful: do you need access to all the PDF
structures or just the text? AFAIK, there is no full PDF parser in
Python. The subject has come up several times before, so check the
Google Groups archives:

http://groups.google.com/groups?q=pd...=Google+Search

Things people have suggested before:

1) use pdftotext and parse the text
2) wrap xpdf's parser.

For example, if you have pdftotext installed, the following will give
you a Python file-like object for reading the extracted text:

import os

def pdf2txt(fname):
    return os.popen('pdftotext -raw -ascii7 %s -' % fname)
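
On current Python versions the same idea can be sketched with the subprocess module, which passes the file name as a separate argument so that names containing spaces or shell metacharacters are not misread. This is only a sketch under assumptions: it assumes pdftotext is on the PATH, the helper names pdftotext_command and pdf_to_text are made up for illustration, and the -ascii7 option is left out because the encoding options differ between pdftotext versions.

```python
import subprocess

def pdftotext_command(fname, raw=True):
    """Build the argument list for pdftotext (hypothetical helper).

    The trailing '-' asks pdftotext to write the text to stdout.
    """
    cmd = ["pdftotext"]
    if raw:
        cmd.append("-raw")  # keep text in content-stream order, as in the post above
    cmd += [fname, "-"]
    return cmd

def pdf_to_text(fname):
    """Run pdftotext and return the extracted text as a string.

    Assumes pdftotext is installed; raises CalledProcessError if it fails.
    """
    result = subprocess.run(pdftotext_command(fname), capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text('report.pdf') would then return the whole text in one string, which can be split into lines or searched as needed.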

If you just want to search and index PDF files, see
http://pdfsearch.sourceforge.net.

John Hunter

Jul 18 '05 #2

Adam Twardoch

"John Hunter" <jd******@ace.bsd.uchicago.edu>

> A little more info would be helpful: do you need access to all the pdf
> structures or just the text? AFAIK, there is no full pdf parser in
> python.

If you need to access the graphical elements, you may use pstoedit to
convert the PDF into SVG (Scalable Vector Graphics). Since SVG is XML, you
can then use any Python-based XML toolkit to parse the data.
http://www.pstoedit.net/pstoedit
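
As a rough illustration of the parsing step only, here is a minimal sketch using xml.etree.ElementTree from the standard library. It assumes the SVG has already been produced by some converter; the sample document, the helper names local_name and summarize_svg, and the choice of elements to collect are all made up for the example, and real converter output will likely be structured differently.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written SVG standing in for converter output (an assumption).
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <g>
    <path d="M 10 10 L 90 10 L 90 90 Z" fill="none" stroke="black"/>
    <path d="M 20 20 L 40 40"/>
  </g>
  <text x="10" y="50">Hello</text>
</svg>"""

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]

def summarize_svg(svg_text):
    """Return (path data strings, text strings) found anywhere in the SVG."""
    root = ET.fromstring(svg_text)
    paths, texts = [], []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "path":
            paths.append(elem.get("d"))
        elif name == "text":
            texts.append("".join(elem.itertext()).strip())
    return paths, texts

paths, texts = summarize_svg(SAMPLE_SVG)
print(len(paths), "paths;", "text:", texts)
```

Matching on the local tag name keeps the sketch working whether or not the file declares the SVG namespace.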

Adam

Jul 18 '05 #3
