473,386 Members | 1,827 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Fast search for all positions in a string

Hi,
what is the fastest way to search for all occurences of a string in a
very long string (from several KB to MB). What I want to get is an
array of numbers with positions of each occurence.
thanks
Jun 27 '08 #1
5 2211
Also I would like to mention that I need case insensitive search.
Jun 27 '08 #2
czechboy wrote:
what is the fastest way to search for all occurences of a string in a
very long string (from several KB to MB). What I want to get is an
array of numbers with positions of each occurence.
var str = 'fdda54kokDdggfA54lf8754a544ro9a54ypx9Wa54z'
var mat = 'a54'
var arr = str.split(mat)
var res = new Array()
var r = 0
for (var i=0; i<arr.length-1; ++i) {
res[i] = 1 + r + arr[i].length
r += arr[i].length + mat.length
}
alert(res)

Takes ca. 1 second runtime on my Firefox 2.0.0.14. for a 1MB string.
MSIE is slower as expected. I don't see relevant possible performance
boosts unless finding a way to not loop the array (which I don't think
is possible here). Generally I would recommend to avoid such heavy
strings.

The search is case-sensitive.

Hope this helps,

--
Bart
Jun 27 '08 #3
On Jun 4, 11:58*am, czechboy <oldrich.s...@gmail.comwrote:
Also I would like to mention that I need case insensitive search.
<html><head><style>text { font-size: 16px; }</style>
<script>
window.onload= function () {
var txt= "", b= [], i, d= document, t,
y= function (p) { return d.createElement(p) },
getMS= function (p) { return (new Date()).getTime()-p };

txt+= " Lorem ipsum dolor sit amet, consectetuer adipiscing.";
txt+= " Nullam consequat lectus sit amet enim. Proin , ipsum";
txt+= " eu tincidunt iaculis, massa enim quam, at molestie ";
txt+= "enim tellus ac dolor. Aenean tempor tortor. ";
txt+= "Vestibulum sed sem. Curabitur sed erat a nibh tristique.";
for (i=0; i<6; i++) { txt+= txt+txt+txt+txt }

t= "txt.length= " + txt.length;
t+= " : \""+txt.substr(0,30)+"...\"<br><br>";
(d.body.appendChild(y('text'))).innerHTML= t;

//Replace this f() with others' versions to compare them.
function search (whatTxt, inTxt) {
var l, p, a= [], i= 0;
inTxt= inTxt.toLowerCase();
l= (whatTxt= whatTxt.toLowerCase()).length;
while ((p= inTxt.indexOf(whatTxt,i)) >= 0) {
a.push(p);
i= p+l;
}
return a;
}

b= ["lorem","IPSUM","veni","iMac","CoNsEqUaT",".", " ","r","iN"];
b.push(" LoReM iPsUm DoLoR ");
setTimeout(function () {
var t, a, mS, c= b.shift();
mS= getMS(0);
a= search(c,txt);
mS= getMS(mS);
t= "\""+c+"\" : "+a.length+" : "+mS+" mS<br>";
(d.body.appendChild(y('text'))).innerHTML= t;
if (b.length) { setTimeout(arguments.callee, 111) } else {
(d.body.appendChild(y('text'))).innerHTML= "Done.";
}
}, 1);
};
</script></head><body></body></html>

I just turn it all to lowercase before matching,
so it's case-insensitive. But that's not very smart : there
has to be a better way. (RegExps /i ?).

Replace search () with other versions to compare.

Gives me this results in Safari/Mac :
The test string is ~4Mb long.

txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

"lorem" : 15625 : 121 mS
"IPSUM" : 31250 : 144 mS
"veni" : 0 : 128 mS
"iMac" : 0 : 125 mS
"CoNsEqUaT" : 15625 : 130 mS
"." : 93750 : 173 mS
" " : 625000 : 648 mS
"r" : 187500 : 258 mS
"iN" : 46875 : 156 mS
" LoReM iPsUm DoLoR " : 15625 : 146 mS
Done.

And this in IE8b/WinXP :

txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

"lorem" : 15625 : 190 mS
"IPSUM" : 31250 : 310 mS
"veni" : 0 : 140 mS
"iMac" : 0 : 150 mS
"CoNsEqUaT" : 15625 : 230 mS
"." : 93750 : 491 mS
" " : 625000 : 2804 mS
"r" : 187500 : 530 mS
"iN" : 46875 : 331 mS
" LoReM iPsUm DoLoR " : 15625 : 250 mS
Done.

HTH,
--Jorge.
Jun 27 '08 #4
On Jun 4, 5:09*pm, Bart Van der Donck <b...@nijlen.comwrote:
>
* *var str = 'fdda54kokDdggfA54lf8754a544ro9a54ypx9Wa54z'
* *var mat = 'a54'
* *var arr = str.split(mat)
* *var res = new Array()
* *var r = 0
* *for (var i=0; i<arr.length-1; ++i) *{
* * *res[i] = 1 + r + arr[i].length
* * *r += arr[i].length + mat.length
* *}
* *alert(res)
Pasted in as :

//Replace this f() with others' versions to compare them.
function search (whatTxt, inTxt) {
var i, arr= inTxt.split(whatTxt),
res= [], r= 0;

for (i=0; i<arr.length-1; ++i) {
res[i]= 1 + r + arr[i].length;
r+= arr[i].length + whatTxt.length;
}

return res;
}

Gives :

Safari/Mac :
txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

"Lorem" : 15625 : 21 mS
"ipsum" : 31250 : 52 mS
"veni" : 0 : 10 mS
"iMac" : 0 : 19 mS
"consequat" : 15625 : 39 mS
"." : 93750 : 109 mS
" " : 625000 : 706 mS
"r" : 187500 : 278 mS
"in" : 46875 : 85 mS
" Lorem ipsum dolor " : 15625 : 49 mS
Done.

IE8b/WinXP :

txt.length= 4031250 : " Lorem ipsum dolor sit amet, c..."

"Lorem" : 15625 : 150 mS
"ipsum" : 31250 : 281 mS
"veni" : 0 : 90 mS
"iMac" : 0 : 100 mS
"consequat" : 15625 : 171 mS
"." : 93750 : 601 mS
" " : 625000 : 3695 mS
"r" : 187500 : 1112 mS
"in" : 46875 : 190 mS
" Lorem ipsum dolor " : 15625 : 180 mS
Done.

You win... :-)

Although that's case-sensitive.

--Jorge.
Jun 27 '08 #5
In comp.lang.javascript message <ad6736ef-bbc7-4e37-a70e-2e65c3f286f0@r6
6g2000hsg.googlegroups.com>, Wed, 4 Jun 2008 02:53:29, czechboy
<ol**********@gmail.composted:
>what is the fastest way to search for all occurences of a string in a
very long string (from several KB to MB). What I want to get is an
array of numbers with positions of each occurence.
This could perhaps be developed into a clean method :-

S = "0aaaa1bbbb2222ccccccc3dd"
A = []
J = 10 // safety-belt
R = new RegExp("\\d+", "gi")

while (J--) {
M = R.exec(S)
if (!M) break
A.push(RegExp.lastIndex-RegExp.lastMatch.length)
}
A // Result - [0,5,10,21]

It's a good idea to read the newsgroup c.l.j and its FAQ. See below.

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk IE7 FF2 Op9 Sf3
news:comp.lang.javascript FAQ <URL:http://www.jibbering.com/faq/index.html>.
<URL:http://www.merlyn.demon.co.uk/js-index.htmjscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/TP/BP/Delphi/jscr/&c, FAQ items, links.
Jun 27 '08 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Andre | last post by:
Hello, I am working on refactoring my companies search engine, and a critical component of the search is that the results be displayed very quickly. We have done enourmous work on making the...
15
by: SK | last post by:
Hey folks, I am searching for a string (say "ABC") backwards in a file. First I seek to the end. Then I try to make a check like - do { file.clear (); file.get(c); file.seekg(-2,...
4
by: Thomas Christmann | last post by:
Hi! First let me apologize for asking this question when there are so many answers to it on Google, but most of them are really contradicting, and making what I want to do very performant is...
19
by: Dave | last post by:
I'm building a research application that needs to be a super speed demon. I decided that one way to do this is to use goto loops instead of while() loops when I need them. that way, instead of...
6
by: DC | last post by:
Hi, I am programming a search catalogue with 200000 items (and growing). I am currently using the SQL Server 2000 fulltext engine for this task but it does not fit the requirements anymore. ...
6
by: G.Esmeijer | last post by:
Friends, I would like to read a text file (fixed length formaated) really fast and store the data into an Access database (2003). Using the streamreader and reading line by line, separating the...
4
by: Anon | last post by:
Hello All! I have a long string that I need to make sense out of. I have the documentation about what info is between which characters, I just need to somehow parse each 94 character string into...
6
by: aka_eu | last post by:
I have this problem. I'm trying to get all the valid combination C(N,K) that pass one condition. For example C(5,3) can use in any combination this numbers {1,2,3,4,5} But let's say that I...
4
by: Daryl Lee | last post by:
I am trying to locate all lines in a suite of files with quoted strings of particular lengths. A search pattern like r'".{15}"' finds 15-character strings very nicely. But I have some very long...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.