Pull data from a PDF file to store in SQL

kthequeen
8 New Member
Is it possible to pull data (the text contents and file attributes, such as the filename) from a PDF file and store it in SQL, using C#?

I have a web app with 100+ PDF files that I need keyword search capability for. The search would produce results with links to the corresponding PDF files. I'm not sure if this is possible.

Thanks!
Dec 20 '07 #1
Shashi Sadasivan
1,435 Recognized Expert Top Contributor
It is possible.
Do you also want to read the text contained inside the PDF files, though?

You would have to create a separate program (a console or Windows application, or ASP.NET if you prefer) that uses the DirectoryInfo class to fetch all the files in the directory as FileInfo objects.
Once you have all the files you want, you can insert them into your database table.
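
A minimal sketch of the file-listing step described above, assuming the PDFs live under a single root folder. The `PdfFileRecord` and `PdfCatalog` names are hypothetical and only serve this illustration; `DirectoryInfo.GetFiles` and the `FileInfo` properties are the standard System.IO members.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical holder for the file attributes; not part of any library.
public sealed class PdfFileRecord
{
    public string FileName;          // e.g. "report.pdf"
    public string FullPath;          // absolute path on disk
    public long SizeBytes;           // file size in bytes
    public DateTime LastModifiedUtc; // last write time (UTC)
}

public static class PdfCatalog
{
    // Collects the attributes of every *.pdf file under rootPath,
    // including files in subfolders.
    public static List<PdfFileRecord> ListPdfFiles(string rootPath)
    {
        var records = new List<PdfFileRecord>();
        var root = new DirectoryInfo(rootPath);

        foreach (FileInfo file in root.GetFiles("*.pdf", SearchOption.AllDirectories))
        {
            records.Add(new PdfFileRecord
            {
                FileName = file.Name,
                FullPath = file.FullName,
                SizeBytes = file.Length,
                LastModifiedUtc = file.LastWriteTimeUtc
            });
        }
        return records;
    }
}
```

Each returned record could then become one row in the database table, with the path used to build the link shown in the search results.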
Dec 20 '07 #2
kthequeen
8 New Member
Yes, I would like to read the actual text inside the PDF. I found some info on how to convert a PDF to a text file. Perhaps I can do that and import the result into SQL. I would just need a filename column that corresponds to the text exported from each PDF. What do you think?
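
For illustration, the import step could look roughly like the sketch below. It assumes a table with a filename column and a text column; the table name `PdfDocuments`, its columns, and the `PdfTextStore` class are hypothetical. It uses only the provider-neutral System.Data interfaces, so the caller supplies an already open connection (for SQL Server, a `SqlConnection`).

```csharp
using System.Data;

public static class PdfTextStore
{
    // Assumed table for this sketch (hypothetical names, SQL Server types):
    //   CREATE TABLE PdfDocuments (FileName NVARCHAR(260), Content NVARCHAR(MAX))
    //
    // Inserts one row holding the PDF's filename and its extracted text.
    // Both values are expected to be non-null.
    public static void SaveDocument(IDbConnection openConnection, string fileName, string content)
    {
        using (IDbCommand cmd = openConnection.CreateCommand())
        {
            // The "@name" parameter syntax is the one used by the SQL Server
            // provider; other providers may use a different placeholder style.
            cmd.CommandText =
                "INSERT INTO PdfDocuments (FileName, Content) VALUES (@fileName, @content)";
            AddParameter(cmd, "@fileName", fileName);
            AddParameter(cmd, "@content", content);
            cmd.ExecuteNonQuery();
        }
    }

    // Creates a parameter through the command itself, so no provider-specific
    // parameter class is needed here.
    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

Passing the values as parameters rather than concatenating them into the SQL string keeps quotes and other characters in the extracted text from breaking the statement.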
Dec 20 '07 #3
Shashi Sadasivan
1,435 Recognized Expert Top Contributor
Hi,
Since you have a lot of PDF files, with a significant amount of text in them, I think that storing all the text in the database and searching within it will take a lot of time.

Have you looked at any of the desktop search APIs?

Google provides one. I haven't looked into it and am not sure how you would integrate it, but it would be an easier way out. (You would have to keep all the PDF files within the same folder, or at least under the same root path.)
Dec 20 '07 #4
kthequeen
8 New Member
Hi Shashi, thank you for the replies. I'll take a look at those APIs.
Merry Christmas!
Dec 21 '07 #5
diegomaradona21
4 New Member
You can easily get text from PDF files using the PDFBox library.
It is natively a Java library, so use Google to find out how to use it in .NET 2.0.
You will also need IKVM.GNU.

Try this:
how to use pdfbox with c#
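
As a rough sketch of what the PDFBox approach looks like from C#: this assumes the old .NET build of PDFBox (the 0.7.x releases, where the classes live under the `org.pdfbox` namespaces) with the PDFBox and IKVM DLLs added as project references. The `PdfTextExtractor` wrapper class is hypothetical. Later PDFBox releases moved to `org.apache.pdfbox` namespaces, so the `using` lines may need adjusting for your version.

```csharp
// Assumes references to the PDFBox .NET build and its IKVM runtime DLLs
// (PDFBox 0.7.x, classes under org.pdfbox).
using org.pdfbox.pdmodel;
using org.pdfbox.util;

public static class PdfTextExtractor
{
    // Returns the plain text of the PDF at pdfPath.
    public static string ExtractText(string pdfPath)
    {
        PDDocument document = PDDocument.load(pdfPath);
        try
        {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
        finally
        {
            // Close the document so the underlying file is released.
            document.close();
        }
    }
}
```

The method names are lowercase (`load`, `getText`, `close`) because the classes are compiled straight from the Java library.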
Dec 21 '07 #6
kthequeen
8 New Member
Diego, thank you very much! Very helpful.
Dec 21 '07 #7
