Convert pdf to csv or xls

beary
170 100+
Does anyone know of a PHP script to convert a PDF document to CSV or XLS? The PDF is a table of data, and the data needs to stay in its columns. I can get the data into a text file, but there is a variable number of spaces between the fields in each record. If there's no script, is there another way to get the job done, or does anyone have some ideas?

Thanks
May 24 '08 #1
hsriat
1,654 Expert 1GB
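A PDF can't be converted directly to either of these formats. In fact, it can't be converted with guaranteed accuracy to any format, because a PDF typically stores positioned text rather than table structure.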
May 24 '08 #2
Atli
5,058 Expert 4TB
How about using explode and preg_split on that text output to split them into useful data?

Assuming you get every row in a new line, you could start by exploding them by the \n newline char and then splitting every line into fields using regex. Something like "/\s+/" should work.
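
A minimal sketch of that idea, assuming the extracted text has one record per line (the sample string below is made up for illustration). Note that a bare "/\s+/" also splits on the single spaces inside a field, so a multi-word title ends up spread over several array elements:

```php
<?php
// Hypothetical sample text; the real input would be the text extracted from the PDF.
$text = "A bad case of stripes     Shannon, David   1-2\n"
      . "A bad spell for the worst witch   Murphy, Jill   3-4";

$records = array();
foreach (explode("\n", $text) as $line) {
    $line = trim($line);
    if ($line === '') {
        continue; // skip blank lines
    }
    // Split on every run of whitespace, including single spaces.
    $records[] = preg_split('/\s+/', $line);
}

print_r($records[0]);
```

If the columns are always separated by at least two spaces, a pattern such as "/\s{2,}/" would keep each field in one piece.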
May 24 '08 #3
hsriat
1,654 Expert 1GB
Do you think that would work on pdf?
May 24 '08 #4
Atli
5,058 Expert 4TB
Probably not, no, but the OP said he could get the data into a text file.
Unless I'm misunderstanding something :P
May 24 '08 #5
beary
170 100+
No Atli, you're not misunderstanding. I can get the data into a text file. Well at least into a Word file. The problem is it's not in columns. It looks like this:

A bad case of stripes [some spaces] Shannon, David 1-2
A bad spell for the worst witch [some spaces] Murphy, Jill 3-4
A bad week for the three bears [some spaces] Bradman, Tony 1-2

Between the 3 "columns" is a number of spaces. The problem is that the number of spaces is variable: the first set depends on the length of the book's title, and the second set on the length of the author's name. So I can't just use convert text to table. (I tried splitting on spaces and treating consecutive spaces as one space, but then I get lots of columns, depending on the number of words in the title.)

So, if there was some way to convert this list (in a word doc) to a table with 3 columns (title, author, grade) I'd be in the clear.

I hope this doesn't cloud the issue. Do I have any hope of doing this?

[The reason I want all this done is because I need to use php to write a script to compare one set of books with another set of books. But I need the book info in an array first. I can get it into an array once the data is safely in excel...]
May 24 '08 #6
hsriat
1,654 Expert 1GB
You can try using regular expressions to separate the fields.

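Like this:
[php]<?php
// Sample rows: two or more spaces separate the title from the author;
// a single space separates the author from the grade.
$text_content = "A bad case of stripes              Shannon, David 1-2
A bad spell for the worst witch    Murphy, Jill 3-4
A bad week for the three bears     Bradman, Tony 1-2";

$rows = explode("\n", $text_content);
$books = array("name"=>array(), "author"=>array(), "grade"=>array());
foreach ($rows as $row)
{
    $row = trim($row);
    if ($row === '')
        continue; // skip blank lines
    // Mark every run of two or more whitespace characters as a column break.
    $row = preg_replace('/\s\s+/', '#|#', $row);
    // strtok() treats '#|#' as a set of delimiter characters ('#' and '|').
    array_push($books["name"], strtok($row, '#|#'));
    $row = strtok('#|#'); // the remainder: author followed by grade
    // Match "Surname, Firstname"; fall back to a single-word author.
    preg_match('/^[\w]+,[\s][\w]+/', $row, $author);
    if (count($author)==0)
        preg_match('/^[\w]+[\s]/', $row, $author);
    array_push($books["author"], trim($author[0]));
    // Whatever is left after removing the author is the grade.
    array_push($books["grade"], trim(str_replace($author[0], "", $row)));
}
unset($rows);

echo "<pre>";
print_r($books);
echo "</pre>";

?>[/php]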
May 25 '08 #7
hsriat
1,654 Expert 1GB
Output of the above is:
[code]
Array
(
    [name] => Array
        (
            [0] => A bad case of stripes
            [1] => A bad spell for the worst witch
            [2] => A bad week for the three bears
        )

    [author] => Array
        (
            [0] => Shannon, David
            [1] => Murphy, Jill
            [2] => Bradman, Tony
        )

    [grade] => Array
        (
            [0] => 1-2
            [1] => 3-4
            [2] => 1-2
        )

)
[/code]
May 25 '08 #8
beary
170 100+
THANK YOU hsriat!!!!! You have no idea how much this has helped. I had no idea PHP could do this so simply. I don't understand preg_match etc., but I'd better look into it as it seems pretty powerful. I made a small mod to the code so it now looks like this:
[php]
$rows = explode("\n", $text_content);
$books = array("name"=>array(), "author"=>array(), "grade"=>array());
foreach ($rows as $row)
{
    // The grade is taken from the last 4 characters of the row (e.g. " 1-2").
    array_push($books["grade"], trim(substr($row, -4)));
    // Remove the grade from the row.
    $row = str_replace(substr($row, -4), '', $row);
    // Mark every run of two or more whitespace characters as a column break.
    $row = preg_replace('/\s\s+/', '#|#', $row);
    array_push($books["name"], strtok($row, '#|#'));
    // Remove the title from the row.
    $row = str_replace(strtok($row, '#|#'), "", $row);
    // The next token is the author.
    $row = strtok('#|#');
    array_push($books["author"], $row);
}
unset($rows);
[/php]
It produces exactly what I needed: arrays with the matching book titles, authors and grades.
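
Since the original goal was a CSV file, arrays of this shape can also be written out with fputcsv. This is only a sketch: the data is a hypothetical two-row sample, the header names are made up, and php://memory stands in for whatever output file a real script would open.

```php
<?php
// Hypothetical data in the same shape as the $books arrays in this thread.
$books = array(
    "name"   => array("A bad case of stripes", "A bad spell for the worst witch"),
    "author" => array("Shannon, David", "Murphy, Jill"),
    "grade"  => array("1-2", "3-4"),
);

// php://memory keeps the sketch self-contained; a real script would
// open a file path of its own choosing instead.
$out = fopen('php://memory', 'w+');

// Header row (column names are an assumption), then one row per book.
fputcsv($out, array("title", "author", "grade"), ',', '"', '\\');
foreach ($books["name"] as $i => $title) {
    $fields = array($title, $books["author"][$i], $books["grade"][$i]);
    fputcsv($out, $fields, ',', '"', '\\');
}

// Read the generated CSV text back so it can be inspected.
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```

fputcsv takes care of quoting, so an author such as "Shannon, David" stays in one column even though it contains a comma.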

Again, thanks heaps for posting your solution. Very very much appreciated!
May 25 '08 #9
hsriat
1,654 Expert 1GB
Your version is better, as you don't need to match the author's name. I couldn't guess the exact form of the grades, so I gave you that solution.

One change you could make: don't call substr and strtok twice. Call each once, save the result in a variable, and reuse the variable. It would be a little bit faster.
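
A minimal sketch of that change, assuming the grade always occupies the last 4 characters of a row and the other columns are separated by two or more spaces (the sample data and the variable names $grade, $title and $author are illustrative only):

```php
<?php
// Hypothetical sample data in the layout described in this thread.
$text_content = "A bad case of stripes      Shannon, David     1-2\n"
              . "A bad week for the three bears   Bradman, Tony   1-2";

$books = array("name" => array(), "author" => array(), "grade" => array());
foreach (explode("\n", $text_content) as $row) {
    $row = rtrim($row);
    if ($row === '') {
        continue; // skip blank lines
    }
    // Take the grade once and keep it in a variable.
    $grade = substr($row, -4);
    // Cut the grade off the end of the row.
    $row = rtrim(substr($row, 0, -4));
    // Mark the remaining column break, then tokenize once per field.
    $row    = preg_replace('/\s\s+/', '#|#', $row);
    $title  = strtok($row, '#|#');
    $author = strtok('#|#');

    $books["name"][]   = $title;
    $books["author"][] = $author;
    $books["grade"][]  = trim($grade);
}

print_r($books);
```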

You could also use preg_split instead of preg_replace followed by strtok.
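
The preg_split route could look roughly like this, assuming all three columns are separated by runs of two or more whitespace characters (the sample data is made up, and rows that don't split into exactly three fields are simply skipped here):

```php
<?php
// Hypothetical sample data; every column gap is two or more spaces wide.
$text_content = "A bad case of stripes      Shannon, David     1-2\n"
              . "A bad spell for the worst witch   Murphy, Jill   3-4";

$books = array("name" => array(), "author" => array(), "grade" => array());
foreach (explode("\n", $text_content) as $row) {
    $row = trim($row);
    if ($row === '') {
        continue; // skip blank lines
    }
    // Split on runs of 2+ whitespace characters, into at most 3 pieces.
    $fields = preg_split('/\s{2,}/', $row, 3);
    if (count($fields) !== 3) {
        continue; // row does not fit the expected three-column layout
    }
    list($title, $author, $grade) = $fields;

    $books["name"][]   = $title;
    $books["author"][] = $author;
    $books["grade"][]  = $grade;
}

print_r($books);
```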

Glad to know it really helped you.



Regards,
Harpreet
May 25 '08 #10
hsriat
1,654 Expert 1GB
heaps ... what does that mean?...

I thought heap means pile or something like that...

must be a new slang... never heard that....
May 25 '08 #11
beary
170 100+
Well, in the land down under it means lots and lots and lots! So like I said: Thanks heaps!
May 25 '08 #12
