best way to index numerical data ?

Hi, I have a lot of data in a TEXT file, and most of it is numbers. Does
anyone have a good suggestion for indexing TEXT numbers (zip codes,
other codes, dollar amounts, quantities, etc.), since Lucene and other
indexers are really optimized for alpha character indexing? What
approaches are typically taken in computer science to index
text numbers: hash maps or something else?

Thanks,

Jack

Mar 31 '06 #1
What do you want to search for in the file?
How big is the file?
What format is the data in the file?

- Paddy.

Mar 31 '06 #2
I want to search for the whole number. If possible, fuzzy search would
be nice too, but it is not mandatory. Here is a sample line from a .txt file:
1975|Y|35136|72|1927|||3|005503|003|19870301|19950301|14416887|151|20000301|100039292|N|84|F|50||10|A|100|Y|037|Y|89005|3042|M|S|P|

Thanks!
Jack

Mar 31 '06 #3
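
For the whole-number search described above, one possible approach is the hash map the original question mentions: a plain inverted index that maps each field value to the places it occurs. The following is only a minimal sketch, not anything posted in this thread. The function names (build_index, exact_search, fuzzy_search) are made up for illustration, the first record is just the leading fields of the sample line, and the second record is invented. Values are kept as strings so that leading zeros (as in 005503) are not lost. The fuzzy lookup uses difflib from the standard library and scans every indexed value, so it is only reasonable for a modest number of distinct values.

```python
from collections import defaultdict
import difflib


def build_index(lines):
    """Map each non-empty field value to the set of (line_no, field_no) positions."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for field_no, value in enumerate(line.rstrip("\n").split("|")):
            value = value.strip()
            if value:  # skip empty fields such as the ones in "|||"
                index[value].add((line_no, field_no))
    return index


def exact_search(index, number):
    """Return the sorted positions whose field text equals `number` exactly."""
    return sorted(index.get(number, ()))


def fuzzy_search(index, number, cutoff=0.75):
    """Return up to five indexed values that are textually similar to `number`."""
    return difflib.get_close_matches(number, list(index), n=5, cutoff=cutoff)


# Hypothetical records in the same pipe-delimited shape as the sample line.
records = [
    "1975|Y|35136|72|1927|||3|005503|003|19870301",
    "1980|N|35137|72|1931|||4|005504|003|19870401",
]
index = build_index(records)

print(exact_search(index, "35136"))  # positions as (line number, field number)
print(fuzzy_search(index, "35138"))  # values that look similar to 35138
```

Building the index is one pass over the file, and an exact lookup afterwards is a single dictionary access. If the file is too large to hold in memory, the same value-to-position mapping could be kept in an on-disk store instead.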
>>>>> "Jack" == Jack <ja***********@yahoo.com> writes:
Hi I have a lot of data that is in a TEXT file which are numbers
does anyone have a good suggestion for indexing TEXT numbers
(zip codes, other codes, dollar amounts, quantities, etc). since
Lucene and other indexers are really optimized for Alpha
character indexing. What approaches are typically taken in
computer science for example to index text numbers..hash maps or
something else ??


Lucene is not optimized for alpha character indexing; it is designed for
natural-language indexing. The assumption is that the dictionary (the set
of distinct terms) is relatively small (say, <1M words for English) and
doesn't grow linearly with the amount of text being indexed. If your data
fits this model, Lucene can efficiently index it, no matter what the
characters are.

Regards,
Liu Jin
Apr 1 '06 #5
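
One rough way to check whether a file fits the model described above is to count how many distinct values each field holds compared with the number of records. The sketch below is only an illustration under assumptions: distinct_per_field is a hypothetical helper, and the data is synthetic (one field drawn from 50 codes, one field holding a unique ID per record). A field whose distinct count stays small behaves like a natural-language dictionary; a field whose count matches the record count grows linearly with the data.

```python
from collections import defaultdict


def distinct_per_field(lines):
    """Count the distinct non-empty values seen in each pipe-delimited field."""
    seen = defaultdict(set)
    for line in lines:
        for field_no, value in enumerate(line.rstrip("\n").split("|")):
            value = value.strip()
            if value:
                seen[field_no].add(value)
    return {field_no: len(values) for field_no, values in seen.items()}


# Synthetic data: field 0 cycles through 50 codes, field 1 is unique per record.
lines = [f"{i % 50:03d}|{100000 + i}" for i in range(3000)]
counts = distinct_per_field(lines)

for field_no, n in sorted(counts.items()):
    print(f"field {field_no}: {n} distinct values in {len(lines)} records")
```

On the synthetic data, field 0 ends up with 50 distinct values and field 1 with 3000, one per record, which is the linear growth the post warns about.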
