Need parser for PDF and DOC files

Sergey Sedzyalo
#1: Jul 17 '05
Hello all,
I am implementing a search engine and need to index PDF and DOC files.
Does anybody know of a solution for parsing these file formats?




Raymond DeCampo
#2: Jul 17 '05

re: Need parser for PDF and DOC files


Sergey Sedzyalo wrote:
> Hello All
> I do implementation of search engine. And I need indexing PDF & DOC files.
> Somebody known solutions for parse this files formats?
Sergey,

What you require is a full-text search engine. There are many such
implementations available. Apache Lucene is the only free one I am aware
of, but I do not think it has native MS Word or PDF support; that is,
you would have to write a parser that extracts the text from each file,
and Lucene would then do the indexing. There are commercial alternatives
as well.
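
To illustrate that division of labor, here is a minimal sketch in Python. It is not Lucene's API: TinyIndex is a toy inverted index standing in for the search engine, and extract_plain_text, EXTRACTORS, and index_file are hypothetical names made up for this example. A real PDF or DOC extractor would be registered alongside the plain-text one.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Set


def extract_plain_text(raw: bytes) -> str:
    """Placeholder extractor: decodes the bytes as UTF-8 text.
    A real PDF or DOC extractor would take the place of this function."""
    return raw.decode("utf-8", errors="ignore")


class TinyIndex:
    """Toy inverted index standing in for a real engine such as Lucene."""

    def __init__(self) -> None:
        # Maps each lowercased token to the set of document ids containing it.
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for token in re.findall(r"\w+", text.lower()):
            self._postings[token].add(doc_id)

    def search(self, term: str) -> Set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self._postings.get(term.lower(), set()))


# Hypothetical registry mapping file extensions to extractor functions.
# Only plain text is handled here; ".pdf" and ".doc" entries are left out
# on purpose because they need a real parser.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    ".txt": extract_plain_text,
}


def index_file(index: TinyIndex, name: str, raw: bytes) -> bool:
    """Extract text with the matching extractor, then hand it to the index.
    Returns False when no extractor is registered for the extension."""
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    extractor = EXTRACTORS.get(ext)
    if extractor is None:
        return False
    index.add(name, extractor(raw))
    return True
```

The point of the sketch is that the index only ever sees plain text, so supporting a new file format means adding one extractor and leaving the indexing code alone.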

Ray