PDF, DOC, RTF File parser

I need to read resumes from PDF, DOC, RTF and text files and fill in
the relevant fields in a database.
My application is based on DotNetNuke (ASP.NET).
Can anyone help me if something is available?
Jul 19 '05 #1
What exactly are you trying to do?

If you just want to convert these different file formats into plain-text
files that can be manipulated, that's possible (but complicated). The DOC
format is proprietary, so you'd have to programmatically open the document in
Word and either copy the document's text to the clipboard or
programmatically do a "Save As" to a plain-text file. You can do this using
automation (VBA), but you'll have to have Word running on the server. You
can convert an RTF file by opening the file in an RTF control and then
retrieving the plain text from that control. I'm not sure about PDF, but I
believe there are third-party components available for translating PDF
files.
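
For simple RTF files, the text can also be recovered without a UI control by
stripping the markup directly. The following is only a rough sketch of that
idea, written in Python rather than .NET for brevity. The function name
rtf_to_text and the list of skipped groups are assumptions made for this
illustration, and the code ignores Unicode escapes, tables and embedded
objects, so real-world files may need a proper RTF library.

```python
import re

# One token per match: escaped literal (\\ \{ \}), hex escape (\'e9),
# control word with optional numeric argument, other control symbol,
# group brace, or a run of plain text.
_TOKEN = re.compile(
    r"\\([\\{}])"
    r"|\\'([0-9a-fA-F]{2})"
    r"|\\([a-zA-Z]+)(-?\d+)? ?"
    r"|\\(.)"
    r"|([{}])"
    r"|([^\\{}]+)",
    re.DOTALL,
)

# Groups whose content is not document text (assumed, incomplete list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def rtf_to_text(rtf: str) -> str:
    """Return the visible text of a simple RTF string (hypothetical helper)."""
    out = []
    stack = []        # skip flags of the enclosing groups
    skipping = False  # True while inside a non-text group
    for match in _TOKEN.finditer(rtf):
        literal, hexcode, word, _arg, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif symbol == "*":
            skipping = True  # \* marks an optional destination group
        elif skipping:
            continue
        elif literal:
            out.append(literal)
        elif hexcode:
            # Assumes a Windows-1252 code page for \'xx escapes.
            out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif text:
            # Raw line breaks in RTF source are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Jane Doe\par Software Engineer}"
print(rtf_to_text(sample))
```

The same shape (tokenize, track group nesting, keep only text tokens) carries
over to any language, including C# or VB.NET.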

If you want to have the program automagically interpret the relevant
information and fill it into the correct database field without human
intervention, good luck -- computers just aren't very good at parsing
natural languages. Resumes will be particularly hard to parse because the
information may be structured in any number of ways and they tend to be
written in short sentence fragments. If you really want to try, do some
research on context-free (CF) parsers. Two good, recent textbooks on the
subject are Jurafsky & Martin, "Speech & Language Processing," and Allen,
"Natural Language Understanding." (Both available from Amazon.com.)
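
To make the difficulty concrete, here is a small sketch in Python (again
illustrative, not .NET). It is not a context-free parser; it only uses
regular expressions to pick out the two fields that have a fairly regular
shape, email and phone. The patterns and the function name
extract_contact_fields are assumptions made for this example. Even at this
level the approach is brittle: a date range in the work history is happily
reported as a phone number.

```python
import re

# Deliberately loose patterns (assumed for illustration only).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contact_fields(text):
    """Return the first email-like and phone-like strings found in text."""
    email = EMAIL.search(text)
    phone = PHONE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


resume = "Jane Doe\njane.doe@example.com\n+44 20 7946 0958\n"
print(extract_contact_fields(resume))

# False positive: a date range looks like a phone number to the pattern.
print(extract_contact_fields("Experience: 2001-2005 Acme Ltd"))
```

Fields such as job titles, employers or skills have no such regular shape,
which is where the real parsing problem starts.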

A much, much better alternative would be to ask people to submit their
resume information through a structured format -- such as by filling in
fields on a Web form. Another option is to hire clerical help to take regular
resumes and copy/paste the information into the database.
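
With a structured form, each field arrives already labeled, so storing it is
a plain validated insert. A minimal sketch in Python with SQLite follows; the
table name, column names and the save_resume function are all hypothetical
and chosen only for this illustration, and an ASP.NET application would do
the equivalent against its own database.

```python
import sqlite3

# Hypothetical form fields; the first two are treated as required.
FIELDS = ("full_name", "email", "phone", "skills")
REQUIRED = ("full_name", "email")


def save_resume(conn, form):
    """Validate a submitted form (a dict of strings) and insert one row."""
    missing = [name for name in REQUIRED if not form.get(name, "").strip()]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resumes ("
        "id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, "
        "email TEXT NOT NULL, phone TEXT, skills TEXT)"
    )
    # Parameter placeholders keep user input out of the SQL text.
    cursor = conn.execute(
        "INSERT INTO resumes (full_name, email, phone, skills) "
        "VALUES (?, ?, ?, ?)",
        tuple(form.get(name, "").strip() for name in FIELDS),
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
row_id = save_resume(conn, {
    "full_name": "Jane Doe",
    "email": "jane.doe@example.com",
    "skills": "C#, SQL",
})
print(row_id, conn.execute("SELECT full_name, skills FROM resumes").fetchone())
```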

--Robert Jaccobson
Jul 19 '05 #2
Robert,
Thanks for the detailed reply.
I am looking for the second option, context-free (CF) parsers; if that
does not work out, we can go for a structured format.
Is there any third-party resume parser available that I can use in an
ASP.NET application?
Jul 19 '05 #3
I'm not aware of any such parsers, so you'll have to roll your own. Let me
reemphasize, though, that I think doing so would be a waste of effort --
parsers are not very good at handling English documents, especially
specialized documents like resumes.
Jul 19 '05 #4
