pull data from a pdf file to store in sql

Is it possible to pull data (text contents and file attributes, such as the filename) from a PDF file and store it in SQL, using C#?

I have a web app with 100+ PDF files that I need keyword search capability for. The search would produce results with links to the corresponding PDF files. Not sure if this is possible.

thanks!
Dec 20 '07 #1
Shashi Sadasivan
1,435 Expert 1GB
It is possible.
Do you also want to read the text contained inside the PDF files?

You would have to create a separate program (a console or Windows application, or ASP.NET if you prefer). Use the DirectoryInfo class to fetch all the files in the directory as FileInfo objects.
Once you have all the files you want, you can insert them into your database table.
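
A minimal sketch of that approach, assuming a table named PdfFiles with FileName, FullPath, SizeBytes and LastModified columns (the table, the column names and the PdfCatalog class are hypothetical, not from this thread). It is written against the provider-neutral IDbConnection interface, and the @name parameter style assumes SQL Server:

```csharp
using System;
using System.Data;
using System.IO;

public static class PdfCatalog
{
    // Returns the files matching *.pdf directly inside the given folder.
    public static FileInfo[] FindPdfFiles(string folder)
    {
        return new DirectoryInfo(folder).GetFiles("*.pdf");
    }

    // Inserts one row per file into the hypothetical PdfFiles table.
    // The connection must already be open.
    public static void SaveFileAttributes(IDbConnection connection, FileInfo[] files)
    {
        foreach (FileInfo file in files)
        {
            using (IDbCommand command = connection.CreateCommand())
            {
                command.CommandText =
                    "INSERT INTO PdfFiles (FileName, FullPath, SizeBytes, LastModified) " +
                    "VALUES (@name, @path, @size, @modified)";
                AddParameter(command, "@name", file.Name);
                AddParameter(command, "@path", file.FullName);
                AddParameter(command, "@size", file.Length);
                AddParameter(command, "@modified", file.LastWriteTimeUtc);
                command.ExecuteNonQuery();
            }
        }
    }

    // Adds a named parameter so file names are never concatenated into the SQL text.
    private static void AddParameter(IDbCommand command, string name, object value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```

With an open connection, usage would be along the lines of `PdfCatalog.SaveFileAttributes(connection, PdfCatalog.FindPdfFiles(folder))`. This stores file attributes only; the text inside each PDF needs a separate extraction step.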
Dec 20 '07 #2
Yes, I would like to read the actual text inside the PDF. I found some info on how to convert a PDF to a text file. Perhaps I can do that and import the text into SQL. I would just need a filename column that corresponds to the text exported from each PDF. What do you think?
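
A rough sketch of that idea, assuming each PDF has already been exported to a .txt file with the same base name (report.pdf to report.txt) in one folder. The PdfTextIndex class is hypothetical, and the in-memory dictionary only stands in for a table with a filename column and a text column:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PdfTextIndex
{
    // Maps each PDF file name to the text exported from it.
    // Assumes report.pdf was exported as report.txt inside textFolder.
    public static Dictionary<string, string> Load(string textFolder)
    {
        var index = new Dictionary<string, string>();
        foreach (string textPath in Directory.GetFiles(textFolder, "*.txt"))
        {
            string pdfName = Path.ChangeExtension(Path.GetFileName(textPath), ".pdf");
            index[pdfName] = File.ReadAllText(textPath);
        }
        return index;
    }

    // Returns the PDF file names whose exported text contains the keyword,
    // ignoring case.
    public static List<string> Search(Dictionary<string, string> index, string keyword)
    {
        var matches = new List<string>();
        foreach (KeyValuePair<string, string> entry in index)
        {
            if (entry.Value.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(entry.Key);
            }
        }
        return matches;
    }
}
```

The file names returned by Search are what the web app would turn into links. In a database, the same lookup would be a query over the text column.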
Dec 20 '07 #3
Shashi Sadasivan
1,435 Expert 1GB
Hi,
Since you have a lot of PDF files, and there would be a significant amount of text in them, I think that storing all the text in the database and searching within it will take a lot of time.

Have you looked at any of the desktop search APIs?

Google provides one, but I haven't looked into it and am not sure how you would integrate it. It could be an easier way out (you would have to keep all the PDF files within the same folder, or at least under the same root path).
Dec 20 '07 #4
Hi Shashi, thank you for the replies. I'll take a look at those APIs.
Merry Christmas!
Dec 21 '07 #5
You can easily get text from PDF files using the PDFBox library.
Use Google to find out how to use it in .NET 2.0, because natively it is a Java library.
You will also need IKVM.GNU.

Try this:
how to use pdfbox with c#
Dec 21 '07 #6
Diego, thank you very much! Very helpful.
Dec 21 '07 #7
