
PDF, DOC, RTF File parser

I need to read resumes from PDF, DOC, RTF and text files and fill in
the relevant fields in a database.
My application is based on DotNetNuke (ASP.NET).
Can anyone help me if something is available?
Jul 19 '05 #1
3 Replies


What exactly are you trying to do?

If you just want to convert these different file formats into plain-text
files that can be manipulated, that's possible (but complicated). The DOC
format is proprietary, so you'd have to programmatically open the document in
Word and either copy the document's text to the clipboard or
programmatically do a "Save As" to a plain-text file. You can do this using
automation (VBA), but you'll need to have Word running on the server. You
can convert an RTF file by loading it into an RTF control and then
retrieving the plain text from that control. I'm not sure about PDF, but I
believe there are third-party components available for extracting text from
PDF files.
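
As a rough illustration of the RTF step, here is a minimal sketch in Python rather than .NET. It does not use an RTF control as described above; instead it strips the markup directly, which is a different and much cruder technique. The function name `rtf_to_text` and the list of skipped destinations are assumptions made for this example, and it ignores `\u` Unicode escapes, tables, and embedded objects, so treat it as a starting point only.

```python
import re

# Destinations whose contents are metadata rather than body text
# (an illustrative, incomplete list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional numeric argument, optional space
    r"|\\'([0-9a-fA-F]{2})"    # hex-escaped byte such as \'e9
    r"|\\([^a-z'])"            # control symbol such as \\ \{ \} \*
    r"|([{}])"                 # group open or close
    r"|([^\\{}]+)"             # run of plain text
)


def rtf_to_text(rtf: str) -> str:
    """Return an approximation of the visible text in an RTF string."""
    out = []
    stack = []        # saved 'skipping' flag for each open group
    skipping = False  # True while inside a metadata destination
    for match in TOKEN.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif symbol is not None:
            if symbol == "*":
                skipping = True          # \* marks an ignorable destination
            elif not skipping and symbol in "\\{}":
                out.append(symbol)       # escaped literal character
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif hexbyte is not None:
            if not skipping:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif not skipping:
            # Raw line breaks in an RTF file are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Jane Doe\par Software Engineer\par}"
plain = rtf_to_text(sample)
```

For real resumes, a maintained converter (or the RTF control mentioned above) is likely to handle far more of the format than this sketch does.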

If you want to have the program automagically interpret the relevant
information and fill it into the correct database field without human
intervention, good luck -- computers just aren't very good at parsing
natural languages. Resumes will be particularly hard to parse because the
information may be structured in any number of ways and they tend to be
written in short sentence fragments. If you really want to try, do some
research on context-free (CF) parsers. Two good, recent textbooks on the
subject are Jurafsky & Martin, "Speech and Language Processing," and Allen,
"Natural Language Understanding." (Both are available from Amazon.com.)
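
To make the difficulty concrete, here is a small Python sketch that uses plain regular-expression matching (not a context-free parser) to pull out the two fields that tend to have a recognizable shape: an email address and a phone number. The function name `extract_contact_fields` and both patterns are assumptions made for this example, and the patterns are deliberately loose.

```python
import re

# Loose illustrative patterns; real-world data will need more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_contact_fields(text: str) -> dict:
    """Return the first email-like and phone-like strings found, or None."""
    email = EMAIL.search(text)
    phone = PHONE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }


resume_text = (
    "Jane Doe\n"
    "jane.doe@example.com\n"
    "+1 (555) 010-9999\n"
    "Experience: 2001-2004 Acme Corp\n"
)
fields = extract_contact_fields(resume_text)
```

Even this easy case is fragile: given only the text "2001-2004 Acme", the phone pattern matches the date range "2001-2004". Fields without a fixed shape, such as names, job titles, and employers, have no comparable pattern to lean on, which is the heart of the problem described above.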

A much, much better alternative would be to ask people to submit their
resume information in a structured format -- such as by filling in
fields on a Web form -- or to hire clerical help to take regular resumes and
copy/paste the information into the database.
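
The structured-form route can be sketched in a few lines. The example below is in Python with SQLite purely for illustration (the original application is ASP.NET); the `ResumeForm` fields, the `resumes` table, and the function names are all hypothetical. The point is that each form field maps directly to a database column, so no text interpretation is needed.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class ResumeForm:
    """Hypothetical set of fields collected from a Web form."""
    full_name: str
    email: str
    phone: str
    summary: str


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resumes ("
        "id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, "
        "email TEXT NOT NULL, phone TEXT, summary TEXT)"
    )


def save_resume(conn: sqlite3.Connection, form: ResumeForm) -> int:
    """Validate minimally, insert one row, and return its id."""
    if not form.full_name.strip() or "@" not in form.email:
        raise ValueError("full_name and a plausible email are required")
    cur = conn.execute(
        # Parameter placeholders keep user input out of the SQL text.
        "INSERT INTO resumes (full_name, email, phone, summary) VALUES (?, ?, ?, ?)",
        astuple(form),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
create_schema(conn)
new_id = save_resume(
    conn,
    ResumeForm("Jane Doe", "jane.doe@example.com", "+1 (555) 010-9999", "Software engineer"),
)
```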

--Robert Jaccobson

"Harry" <ha*****@yahoo.co.uk> wrote in message
news:f5**************************@posting.google.com...
I need to read resumes from PDF, DOC, RTF and text files and fill in
the relevant fields in a database.
My application is based on DotNetNuke (ASP.NET).
Can anyone help me if something is available?

Jul 19 '05 #2

Robert,
thanks for the detailed reply.
I am looking for the second option -- context-free (CF) parsers -- and if
that doesn't work out, we can go for the structured format.
Is there any third-party resume parser available that I can use in an
ASP.NET application?
Jul 19 '05 #3

I'm not aware of any such parsers, so you'll have to roll your own. Let me
reemphasize, though, that I think doing so would be a waste of effort --
automated parsers are not very good at handling English documents, especially
specialized documents like resumes.
"Harry" <ha*****@yahoo.co.uk> wrote in message
news:f5**************************@posting.google.com...
Robert,
thanks for the detailed reply.
I am looking for the second option -- context-free (CF) parsers -- and if
that doesn't work out, we can go for the structured format.
Is there any third-party resume parser available that I can use in an
ASP.NET application?

Jul 19 '05 #4
