
Web site spider

ruso
Guest
 
Posts: n/a
#1: Nov 16 '05
I am writing a web site spider that downloads all the pages of a site
to local disk.
After that, I want to search the downloaded pages for about 5000
keywords. I am doing this search with regular expressions, but it is
slow. Is there a faster search algorithm you can suggest?

Thanks
Nicholas Paldino [.NET/C# MVP]
Guest
 
Posts: n/a
#2: Nov 16 '05

re: Web site spider


Ruso,

A regex is probably the fastest way. How large are the files, and are
you passing them as complete strings through the Regex classes? Is there
any way you can break them up into smaller pieces?

The Match method only takes a string, so it's probably the loading of all
the content into the string which is causing the slowdown (all of the
string's contents are loaded into memory).

If you can break the files into smaller pieces, then it would help, as
you wouldn't have to load such large strings into memory.
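
One way to read the "smaller pieces" suggestion is to stream each downloaded page line by line and run a single compiled alternation over each line, instead of running 5000 separate patterns over a whole file held in one string. The following is only a minimal sketch under stated assumptions: the names `KeywordScanner` and `FindKeywords` are hypothetical, keywords are treated as literal text and matched ignoring case, and no keyword spans a line break. Because regex matches do not overlap, a keyword that sits inside a longer matched keyword is not reported separately.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread: scans text for many literal keywords.
public sealed class KeywordScanner
{
    private readonly Regex _pattern;

    public KeywordScanner(IEnumerable<string> keywords)
    {
        // Escape each keyword so it is matched literally; put longer keywords
        // first so "webmaster" wins over "web" at the same position.
        List<string> escaped = keywords
            .Where(k => !string.IsNullOrEmpty(k))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(k => k.Length)
            .Select(Regex.Escape)
            .ToList();

        if (escaped.Count == 0)
        {
            // An empty alternation would match the empty string everywhere.
            throw new ArgumentException("At least one keyword is required.", "keywords");
        }

        _pattern = new Regex(
            string.Join("|", escaped),
            RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Reads one line at a time, so only the current line is held as a string.
    public HashSet<string> FindKeywords(TextReader reader)
    {
        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match m in _pattern.Matches(line))
            {
                found.Add(m.Value);
            }
        }
        return found;
    }
}
```

The scanner is built once and reused for every page, for example by passing a `StreamReader` opened on each downloaded file to `FindKeywords`. Whether this beats the original approach depends on the keywords and the file sizes, so it is worth timing both.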

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com


ruso
Guest
 
Posts: n/a
#3: Nov 16 '05

re: Web site spider


How can I load the strings into memory?
Mark Harris
Guest
 
Posts: n/a
#4: Nov 16 '05

re: Web site spider


Using a MemoryStream ;)
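
As a hedged illustration of the MemoryStream suggestion, the sketch below copies a downloaded page into a `MemoryStream` and reads it back as text through a `StreamReader`. The names `PageBuffer`, `Load`, and `ReadLines` are hypothetical, and the UTF-8 encoding is an assumption, not something stated in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not from the thread.
public static class PageBuffer
{
    // Copies the whole source stream (a file or an HTTP response body) into memory.
    public static MemoryStream Load(Stream source)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer);
        buffer.Position = 0; // rewind so callers read from the start
        return buffer;
    }

    // Decodes the buffered bytes as text, one line at a time.
    // Assumes UTF-8; a real spider would take the encoding from the response.
    public static IEnumerable<string> ReadLines(MemoryStream buffer)
    {
        buffer.Position = 0;
        using (var reader = new StreamReader(buffer, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}
```

Note that buffering a page this way still holds the whole page in memory as bytes; it only avoids building one large string, since lines are decoded one at a time.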

--
Mark Harris
Head Developer
GameHost CP

On 19 Aug 2004 23:01:42 -0700, ruso <rustem40@yahoo.com> wrote:
> How can I load the strings into memory?