Library for parsing pdf docs?

Hi,
I'm looking for a C/C++ library (suitable for Linux and Windows) to
parse PDF documents. I only need the plain text and the page breaks
as output.
Calling system("pdftotext ..."); does this for me, but I'm
looking for a more elegant way using a library.
xpdf does not come with a library or any API documentation.
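
For reference, here is a minimal sketch of the pdftotext workaround mentioned above. It assumes that pdftotext is on the PATH, that passing "-" as the output file sends the text to stdout, and that pages are separated by a form feed character ('\f'), which is the tool's default unless -nopgbrk is given. The helper names split_pages and run_pdftotext are made up for illustration, and the quoting of the file name is deliberately naive.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// popen/pclose are POSIX; the Windows CRT spells them _popen/_pclose.
#ifdef _WIN32
#define POPEN _popen
#define PCLOSE _pclose
#else
#define POPEN popen
#define PCLOSE pclose
#endif

// Split extracted text into pages at each form feed ('\f').
// Empty pages are kept so that indexes still match page numbers.
std::vector<std::string> split_pages(const std::string& text) {
    std::vector<std::string> pages;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type ff = text.find('\f', start);
        if (ff == std::string::npos) break;
        pages.push_back(text.substr(start, ff - start));
        start = ff + 1;
    }
    // Keep trailing text that is not terminated by a form feed.
    if (start < text.size()) pages.push_back(text.substr(start));
    return pages;
}

// Hypothetical helper: run "pdftotext <file> -" and capture its stdout.
// Assumption: the path contains no double quotes (no real shell escaping).
std::string run_pdftotext(const std::string& pdf_path) {
    std::string cmd = "pdftotext \"" + pdf_path + "\" -";
    FILE* pipe = POPEN(cmd.c_str(), "r");
    if (!pipe) throw std::runtime_error("could not start pdftotext");
    std::string out;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) out.append(buf, n);
    if (PCLOSE(pipe) != 0) throw std::runtime_error("pdftotext failed");
    return out;
}
```

With this, split_pages(run_pdftotext("doc.pdf")) would give one string per page, so a word's page number is its vector index plus one. It still spawns an external process, which is exactly what I would like to avoid with a real library.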

Thanks for any hints, Bernd

--
BM Computer-Services, Bergmannstr. 66, 10961 Berlin
Webdesign, Internet, Layout und Grafik
Tel.: 030/20649400, mobil 0175/7419517, Fax: 030/20649401
Web: http://www.bmservices.de, eMail: ko*****@bmservices.de
Feb 16 '06 #1
Bernd Muent wrote:
I'm looking for a c/C++ library (suitable for Linux and Windows) to
parse pdf documents. [..]

Thanks for any hints, Bernd


Hint: www.google.com

V
--
Please remove capital As from my address when replying by mail
Feb 16 '06 #2
Victor Bazarov wrote:
Hint: www.google.com


Ha ha.
I spent 1.5 hours doing that. I found only libraries for creating PDF
files, but none for parsing them to extract the words and the pages they
are on.

B.

Feb 17 '06 #3

Bernd Muent wrote:
Victor Bazarov wrote:
Hint: www.google.com


Ha ha.
I spent 1.5 hours doing that. I found only libraries for creating PDF
files, but none for parsing them to extract the words and the pages they
are on.

B.


Also remember to search code repositories such as SourceForge and
freshmeat.

http://freshmeat.net/search/?q=pdf&t...&Go.x=0&Go.y=0

Feb 17 '06 #4
