Bytes | Software Development & Data Engineering Community

PDF, DOC, RTF File parser

I need to read resumes from PDF, DOC, RTF, and text files and fill in
the relevant fields in a database.
My application is based on DotNetNuke (ASP.NET).
Can anyone help me if something is available?
Jul 19 '05 #1
What exactly are you trying to do?

If you just want to convert these different file formats into plain-text
files that can be manipulated, that's possible (but complicated). The DOC
format is proprietary, so you'd have to programmatically open the document
in Word and either copy the document's text to the clipboard or
programmatically do a "Save As" to a plain-text file. You can do this using
automation (VBA), but you'll have to have Word running on the server. You
can convert an RTF file by opening the file in an RTF control and then
retrieving the plain text from that control. I'm not sure about PDF, but I
believe there are third-party components available for translating PDF
files.
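
For simple RTF files, the text can also be pulled out without a UI control. What follows is a minimal sketch, written in Python for brevity rather than .NET, and it is not the RTF-control approach described above. The function name `rtf_to_text` and the set of skipped destinations are assumptions for illustration. It ignores `\u` Unicode escapes and other features that a full RTF reader handles.

```python
import re

# Destinations whose contents are metadata, not document text (assumed list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped character such as \'e9
    r"|\\([^a-zA-Z])"           # control symbol such as \{ \} \\ \*
    r"|([{}])"                  # group open / close
    r"|[\r\n]+"                 # raw line breaks carry no text in RTF
    r"|(.)"                     # any other character is document text
)


def rtf_to_text(rtf: str) -> str:
    """Strip RTF markup from a simple document and return its plain text."""
    out = []
    stack = []        # saved "skipping" flags of the enclosing groups
    skipping = False  # True while inside a metadata destination
    for match in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif hexcode:
            if not skipping:
                # Assumes the common Windows-1252 code page.
                out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif symbol:
            if symbol == "*":
                skipping = True  # \* marks an ignorable destination
            elif not skipping and symbol in "{}\\":
                out.append(symbol)  # escaped literal brace or backslash
        elif char and not skipping:
            out.append(char)
    return "".join(out)


if __name__ == "__main__":
    sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Jane Doe\par Software Engineer\par}"
    print(rtf_to_text(sample))
```

The sample prints two lines, "Jane Doe" and "Software Engineer". Files produced by Word contain far more markup than this, so a maintained library or the RTF control is likely the safer choice in production.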

If you want to have the program automagically interpret the relevant
information and fill it into the correct database field without human
intervention, good luck -- computers just aren't very good at parsing
natural languages. Resumes will be particularly hard to parse because the
information may be structured in any number of ways and they tend to be
written in short sentence fragments. If you really want to try, do some
research on context-free (CF) parsers. Two good, recent textbooks on the
subject are Jurafsky & Martin, "Speech and Language Processing," and Allen,
"Natural Language Understanding." (Both are available from Amazon.com.)

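To make the difficulty concrete, here is a minimal sketch of a heading-based heuristic in Python. It is not a context-free parser, only the simplest thing one might try first. The heading vocabulary, the `extract_fields` name, and the output keys are all hypothetical.

```python
import re

# Hypothetical heading vocabulary; real resumes use many more variants.
SECTION_HEADINGS = {
    "education": "education",
    "experience": "experience",
    "work experience": "experience",
    "employment history": "experience",
    "skills": "skills",
}

# Deliberately loose e-mail pattern, good enough for a sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_fields(text: str) -> dict:
    """Group the lines of a plain-text resume under known section headings."""
    fields = {"email": None, "other": []}
    current = "other"  # lines before the first recognised heading land here
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if fields["email"] is None:
            match = EMAIL.search(line)
            if match:
                fields["email"] = match.group(0)
        key = SECTION_HEADINGS.get(line.rstrip(":").lower())
        if key:
            current = key
            fields.setdefault(current, [])
        else:
            fields[current].append(line)
    return fields


if __name__ == "__main__":
    sample = (
        "Jane Doe\n"
        "jane.doe@example.com\n"
        "\n"
        "Education:\n"
        "BSc Computer Science, 2001\n"
        "\n"
        "Work Experience\n"
        "Developer, Acme Ltd, 2001-2005\n"
    )
    print(extract_fields(sample))
```

This works only while the resume uses headings from the table. A resume that titles the same section "Professional Background" has all of those lines dumped into "other", which is exactly the variability problem described above.
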
A much, much better alternative would be to ask people to submit their
resume information in a structured format -- such as by filling in
fields on a Web form. Alternatively, hire clerical help to take regular
resumes and copy/paste the information into the database.
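
As a rough illustration of the structured route, the sketch below validates a submitted form and writes it straight to a table. It uses Python with an in-memory SQLite database purely for demonstration. The `ResumeForm` fields, the `resumes` table, and the `save` helper are hypothetical names, not part of DotNetNuke or any existing schema.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class ResumeForm:
    # Hypothetical field names; a real form would mirror the target table.
    full_name: str
    email: str
    latest_employer: str
    years_experience: int

    def validate(self) -> None:
        """Reject obviously bad input before it reaches the database."""
        if not self.full_name.strip():
            raise ValueError("full_name is required")
        if "@" not in self.email:
            raise ValueError("email looks invalid")
        if self.years_experience < 0:
            raise ValueError("years_experience must be non-negative")


def save(conn: sqlite3.Connection, form: ResumeForm) -> int:
    """Validate the form, insert it, and return the new row id."""
    form.validate()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resumes ("
        "id INTEGER PRIMARY KEY, full_name TEXT, email TEXT, "
        "latest_employer TEXT, years_experience INTEGER)"
    )
    cur = conn.execute(
        "INSERT INTO resumes "
        "(full_name, email, latest_employer, years_experience) "
        "VALUES (?, ?, ?, ?)",
        astuple(form),  # parameterised values, never string concatenation
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    row_id = save(conn, ResumeForm("Jane Doe", "jane@example.com", "Acme Ltd", 4))
    print(row_id)
```

Because each value arrives already labelled by its form field, no language parsing is needed at all. The only work left is validation.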

--Robert Jaccobson

Jul 19 '05 #2
Robert,
thanks for the detailed reply.
I am looking for the second option -- context-free (CF) parsers -- and if
that does not work out, we can go for the structured format.
Is there any third-party resume parser available that I can use in
an ASP.NET application?
Jul 19 '05 #3
I'm not aware of any such parsers, so you'll have to roll your own. Let me
reemphasize, though, that I think doing so would be a waste of effort --
parsers are not very good at handling English documents, especially
specialized documents like resumes.

Jul 19 '05 #4
