473,327 Members | 1,920 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,327 software developers and data experts.

PDF to text script

Vyz
I am looking for a PDF to text script. I am working with multibyte
language PDFs on Windows Xp. I need to batch convert them to text and
feed into an encoding converter program

Thanks for any help in this regard

Nov 10 '06 #1
2 2484
In article <11********************@e3g2000cwe.googlegroups.co m>,
Vyz <vy*******@gmail.comwrote:
>I am looking for a PDF to text script. I am working with multibyte
language PDFs on Windows Xp. I need to batch convert them to text and
feed into an encoding converter program

Thanks for any help in this regard
<URL: http://phaseit.net/claird/comp.text....s.html#pdf2txt >

Nov 10 '06 #2
Vyz wrote:
I am looking for a PDF to text script. I am working with multibyte
language PDFs on Windows Xp. I need to batch convert them to text and
feed into an encoding converter program

Thanks for any help in this regard
Multibyte languages are not easy. I do text extraction from PDF but 1)
I do it on Linux and 2) I only need English text. The utility I use is
pdftotext that comes as part of XPDF *nix package.

The other problem however, is not with the extraction but with the fact
that after you extract the text, it might not look very good. In other
words, the extraction program will never complain but will nevertheless
produce garbage. Then you have to process the result yourself. For
example, whitespace is not consistent, sometimes there will be extra
whitespace -- sometimes there won't be enough for example " S o m e
w ordsloo l i k e t his" and so on...

The real answer is that pdf text extraction is pretty hard. It is a
1000x better to get a hold of the original source...

Nick V.

Nov 11 '06 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Jennie Friesen | last post by:
Hello-- I would like to display a different line of text (32 different messages) on refresh, BUT with no repeats. The script I am currently using is: <script language=javascript...
4
by: Dan | last post by:
Can anyone offer suggestions on how to do this or if it is possible? I have a form that uses a drop down box and 2 text fields. What I am trying to do is have the value of each text box set by...
8
by: Jakej | last post by:
I've been using a javascript in an html file for a banner slider, and it works as desired. But I'd like to use it on more than one page and it would be great if I could transfer the code to a .js...
28
by: Randy Starkey | last post by:
Hi, Does anyone know where I can get a script that show a little plus sign after a line of text, that when you click the plus sign, more text is revealed on that same page, like a continuing...
3
by: Alex | last post by:
Hello. First, with AJAX I will get a remote web page into a string. Thus, a string will contain HTML tags and such. I will need to extract text from one <span> for which I know the ID the inner...
5
by: PythonistL | last post by:
I am a newbie with Javascript. I have this simple script for scrolling text <HTML> <HEAD> <TITLE>Scrolling Message Script</TITLE> <SCRIPT language="JavaScript"><!-- var msg = 'My scrolling...
9
by: Erwin Moller | last post by:
Hi, Can anybody comment on this? In comp.lang.php I advised somebody to skip using: <script language="javascript"> and use: <script type="text/javascript"> And mr. Dunlop gave this response:
2
by: Slain | last post by:
I have a big list of HTML files, which need to be updated with a common text. <script language=JavaScript src="./highlight.js"></script> I need to add the above line in each of the html...
1
by: littlealex | last post by:
IE6 not displaying text correctly - IE 7 & Firefox 3 are fine! Need some help with this as fairly new to CSS! In IE6 the text for the following page doesn't display properly - rather than being...
16
by: Wayne | last post by:
I've read that one method of repairing a misbehaving database is to save all database objects as text and then rebuild them from the text files. I've used the following code posted by Lyle...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.