
Can anyone tell me of any optimizations I could make to this to make it faster?

I know there are ways to make this a lot faster; any
newsreader does this in seconds. I don't know how they do
it, and I am very new to C#. If anyone knows a faster way,
please let me know. All I am doing is querying the db for
all the headers for a certain group and then going through
them to find all the parts of each post. I only want the ones
that are complete, meaning all segments for that one posted
file are there.

using System;
using System.Collections;
using System.Text;
using MySql.Data;
using System.Text.RegularExpressions;

namespace createfiles
{
    class Program
    {
        static MySql.Data.MySqlClient.MySqlConnection conn =
            new MySql.Data.MySqlClient.MySqlConnection();
        static MySql.Data.MySqlClient.MySqlCommand cmd =
            new MySql.Data.MySqlClient.MySqlCommand();
        static string myConnectionString =
            "server=127.0.0.1;uid=root;pwd=password;database=test;";
        static ArrayList master;
        static string group;
        static string table;
        static string[] groups = {
            "alt.binaries.games.xbox", "alt.binaries.games.xbox360",
            "alt.binaries.vcd" };
        // Strips the periods from a group name to form its table name.
        static Regex reg = new Regex("\\.");
        // Matches the "(part/total)" marker in a subject, e.g. "(3/27)".
        // "+" rather than "*" so that "(/)" cannot match and break int.Parse.
        static Regex seg = new Regex("\\([0-9]+/[0-9]+\\)", RegexOptions.IgnoreCase);

        struct Header
        {
            public string numb;
            public string subject;
            public string date;
            public string from;
            public string msg_id;
            public string bytes;
        }

        static void Main(string[] args)
        {
            // Only groups[1] is processed for now.
            for (int x = 1; x < 2; x++)
            {
                table = reg.Replace(groups[x], "");
                group = groups[x];
                getheaders();
                Console.WriteLine("Have this many headers {0}", master.Count);
                if (master.Count > 0)
                {
                    Header one = (Header)master[0];
                    Console.WriteLine("first one {0} {1}", one.numb, one.subject);
                }
                find();
                master.Clear();
            }
        }

        // Loads every header whose subject looks like it has a "(x/y)" marker.
        static void getheaders()
        {
            conn.ConnectionString = myConnectionString;
            conn.Open();
            cmd.Connection = conn;
            cmd.CommandText = "select * from " + table +
                " where subject like '%(%/%)%'";
            MySql.Data.MySqlClient.MySqlDataReader reader;
            reader = cmd.ExecuteReader();
            Header h = new Header();
            master = new ArrayList();
            while (reader.Read())
            {
                h.numb = reader.GetValue(0).ToString();
                h.subject = reader.GetValue(1).ToString();
                h.from = reader.GetValue(2).ToString();
                h.date = reader.GetValue(3).ToString();
                h.msg_id = reader.GetValue(4).ToString();
                h.bytes = reader.GetValue(5).ToString();
                master.Add(h); // Header is a struct, so a copy is stored
            }
            reader.Close();
            conn.Close();
        }

        // Takes the first remaining header, then scans all the others for
        // the rest of its parts; reports the post if every part was found.
        static void find()
        {
            while (master.Count > 0)
            {
                Header start = (Header)master[0];
                master.RemoveAt(0);
                Match m = seg.Match(start.subject);
                if (!m.Success)
                    continue; // the LIKE filter also lets non-numeric "(a/b)" through
                string segsplit = m.ToString();
                segsplit = segsplit.Replace("(", "");
                segsplit = segsplit.Replace(")", "");
                string[] segments = segsplit.Split('/');
                int max = int.Parse(segments[1]);
                max += 1; // parts are numbered from 1, so leave room for index max
                int counter = 1; // the starting header counts as one part
                Header[] found = new Header[max];
                string testsubject = seg.Replace(start.subject, "");
                int index = int.Parse(segments[0]);
                //int temp = master.Count;
                if (index < max)
                {
                    found[index] = start;
                    for (int x = 0; x < master.Count; x++)
                    {
                        Header test = (Header)master[x];
                        if (test.subject.Contains(testsubject))
                        {
                            //master.Remove(test);
                            master.RemoveAt(x);
                            x = x - 1; // stay on the same slot after the removal
                            Match t = seg.Match(test.subject);
                            if (!t.Success)
                                continue;
                            string tsplit = t.ToString();
                            string tsegsplit = tsplit.Replace("(", "");
                            tsegsplit = tsegsplit.Replace(")", "");
                            string[] tsegments = tsegsplit.Split('/');
                            index = int.Parse(tsegments[0]);
                            //Console.WriteLine(counter);
                            if (index < max)
                            {
                                found[index] = test;
                                counter++;
                            }
                        }
                    }
                    //Console.WriteLine("counter = {0}", counter);
                    int testmax = max - 1; // the total number of parts
                    if (counter == testmax)
                    {
                        master.TrimToSize();
                        Console.WriteLine("We Have a Match {0}", found[1].subject);
                    }
                }
            }
        }
    }
}
--
----------------------------------------------
Posted with NewsLeecher v3.0 Final
* Binary Usenet Leeching Made Easy
* http://www.newsleecher.com/?usenet
----------------------------------------------

May 31 '06 #1


Extremest,

There are a few things I can see you doing here.

First though, I have to ask about your database structure. You are
storing the different headers in different tables with the name of the group
as the table. I don't know that this is necessarily a good idea. The
reason is that all of the tables share the same structure, and they are all
related, the only thing differentiating messages being the group that they
are in.

Because of that, I think that you should have one single table with
messages in it, and add a column which has the name of the group that the
message is in. Of course, the message could be in multiple groups (because
of crossposting). In this case, you would have another table which would
have a group id in it, as well as the id of the message. Doing this, you
would then have a record in the main table which had the message details,
as well as records in another table saying which groups the message was in.

Doing it like this also fixes an error in your code. You were removing
the periods from the group names in your tables. This brings up the
following situation. Hypothetically, you could have two groups:

alt.my.stuff
alt.mystuff

In your algorithm, they are treated the same way, and are in the same
table. In MySQL, you should be able to use some sort of escape mechanism to
allow periods in your table names (MySQL quotes identifiers with backticks,
much as SQL Server uses square brackets).
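
To make the collision concrete, here is a small sketch. StripPeriods mirrors what the posted code does with reg.Replace(groups[x], ""), and QuoteIdentifier is a hypothetical helper (not part of the MySQL connector) that wraps a name in backticks. Whether a given MySQL version accepts a period inside a quoted table name is an assumption to verify before relying on it.

```csharp
static class TableNames
{
    // Mirrors the posted approach: drop the periods to form a table name.
    public static string StripPeriods(string group)
    {
        return group.Replace(".", "");
    }

    // Hypothetical helper: wrap an identifier in MySQL backticks,
    // doubling any backtick that appears inside the name.
    public static string QuoteIdentifier(string name)
    {
        return "`" + name.Replace("`", "``") + "`";
    }
}
```

TableNames.StripPeriods("alt.my.stuff") and TableNames.StripPeriods("alt.mystuff") both give "altmystuff", so the two groups would share one table.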

Moving on, I would not use regular expressions to perform basic
replacement functions as you are doing. I would use the Replace method on
the string class to do this. I think you will find this MUCH faster. The
same goes for the finding of a string (you match on the subject), as well as
the split functionality. All of this is offered on the string class, and
since you are not using wildcards or patterns, there is no reason to use the
regular expression classes.

When reading from the data reader, you don't have to call ToString. You
can cast the results to string directly.
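
As a sketch of that idea: the typed getters on System.Data.IDataRecord, which data readers implement, return the value without a ToString call. The column order and types used here (an integer article number in column 0, text in column 1) and the HeaderRow struct are assumptions for illustration, not the poster's actual schema.

```csharp
using System.Data;

// Hypothetical, cut-down version of the Header struct with a typed number.
struct HeaderRow
{
    public long numb;
    public string subject;
}

static class HeaderReader
{
    // Reads typed values instead of calling GetValue(i).ToString().
    // Assumes column 0 is an integer and column 1 is a non-NULL string.
    public static HeaderRow FromRecord(IDataRecord record)
    {
        HeaderRow h = new HeaderRow();
        h.numb = record.GetInt64(0);
        h.subject = record.GetString(1);
        return h;
    }
}
```

A direct cast such as (string)reader.GetValue(1) works only when the column's value really is a string, and GetString fails on NULL, so a nullable column needs an IsDBNull check first.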

Finally, I would recommend selecting all of the messages from all of
the groups at once, then processing them in order. You can sort the
results by group name, and then process them. This will save you from
having to make repeat trips to the database.

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 31 '06 #2

The tables that it grabs the headers from are temporary. I don't have
the rest of the program written yet; it will remove from the db the
headers that are complete for a single post. Also, I am only doing specific
groups, so the part about the periods is not an issue yet. I will redo that
later; mainly I just want to get this to work faster at the moment. There
are at least 1 million headers in each table right now. If I pull from
just one of them, it takes up around 500 MB of RAM and about the
same for VM. As for the regex, I am not sure what you mean. It is
finding a pattern in the subjects that are unique to each post and vary
in size. If there is a way to make that better, please tell me.
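
With around a million headers, the posted find() compares each header against every header still in the list, so the work grows roughly with the square of the row count. One possible way around that, sketched here under assumptions, is a single pass that groups parts in a Dictionary. The Segment struct and PostGrouper class are hypothetical names; the sketch assumes each subject has already been split into a key (the subject without its "(part/total)" marker), a part number and a total, however that parsing is done.

```csharp
using System.Collections.Generic;

// Hypothetical, already-parsed view of one header.
struct Segment
{
    public string Key;  // subject with the "(part/total)" marker removed
    public int Part;    // 1-based part number
    public int Total;   // number of parts the post claims to have

    public Segment(string key, int part, int total)
    {
        Key = key;
        Part = part;
        Total = total;
    }
}

static class PostGrouper
{
    // Returns the keys of posts for which every part 1..Total was seen.
    // One pass over the input; each header costs a dictionary lookup.
    public static List<string> CompleteKeys(IEnumerable<Segment> segments)
    {
        // key -> flags for the parts seen so far (index 0 is unused)
        Dictionary<string, bool[]> seen = new Dictionary<string, bool[]>();
        // key -> number of distinct parts seen so far
        Dictionary<string, int> counts = new Dictionary<string, int>();
        List<string> complete = new List<string>();

        foreach (Segment s in segments)
        {
            if (s.Total < 1 || s.Part < 1 || s.Part > s.Total)
                continue; // skip part 0 and malformed markers

            bool[] flags;
            if (!seen.TryGetValue(s.Key, out flags))
            {
                flags = new bool[s.Total + 1];
                seen[s.Key] = flags;
                counts[s.Key] = 0;
            }
            if (s.Part >= flags.Length || flags[s.Part])
                continue; // total disagrees with the first header, or a repost

            flags[s.Part] = true;
            counts[s.Key] = counts[s.Key] + 1;
            if (counts[s.Key] == flags.Length - 1)
                complete.Add(s.Key); // last missing part just arrived
        }
        return complete;
    }
}
```

This does not reduce the memory needed to hold the headers; it only removes the repeated scanning and the RemoveAt calls on a large list.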


May 31 '06 #3

With regard to the regex, why not just use the IndexOf method on the
string class? What are you gaining from using a regex? The regex
performance is undoubtedly going to be slower (as is the split
operation).

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 31 '06 #4

OK, I am not following all the way here; I am very new to C#. I can
see how IndexOf would eliminate the need for the match variable.
As far as the index goes, I am not sure how it would help there. I have to
search the original subject for (xx/xx), where the xx are numbers; I don't
know how many digits there are or exactly where the marker may be in the
subject. I then use the first number in that sequence as the index, because
that is the part's number in the post sequence, and the last one to know how
many parts there must be for the post to be complete.
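
For what it is worth, the (xx/xx) search described above can be done with IndexOf alone. The following is a minimal sketch, not a measured replacement for the regex: SegmentMarker and TryParse are hypothetical names, the sketch takes the first marker that parses (as Regex.Match takes the first match), and whether it is actually faster should be timed on real data.

```csharp
using System.Globalization;

static class SegmentMarker
{
    // Finds the first "(digits/digits)" marker in subject using only
    // string methods. On success, key is the subject with the marker removed.
    public static bool TryParse(string subject, out int part, out int total,
                                out string key)
    {
        part = 0;
        total = 0;
        key = null;
        int open = subject.IndexOf('(');
        while (open >= 0)
        {
            int close = subject.IndexOf(')', open + 1);
            if (close < 0)
                break; // no closing parenthesis left, so no marker
            // Look for the slash only between this pair of parentheses.
            int slash = subject.IndexOf('/', open + 1, close - open - 1);
            if (slash > open
                && int.TryParse(subject.Substring(open + 1, slash - open - 1),
                                NumberStyles.None, CultureInfo.InvariantCulture,
                                out part)
                && int.TryParse(subject.Substring(slash + 1, close - slash - 1),
                                NumberStyles.None, CultureInfo.InvariantCulture,
                                out total))
            {
                key = subject.Remove(open, close - open + 1);
                return true;
            }
            // Not a marker, e.g. "(disc 1)"; try the next opening parenthesis.
            open = subject.IndexOf('(', open + 1);
        }
        part = 0;
        total = 0;
        return false;
    }
}
```

NumberStyles.None accepts digits only, which keeps the behaviour close to the [0-9] character class in the regex. For the subject "game (disc 1) (03/27)" the call returns true with part 3 and total 27.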

May 31 '06 #5

P: n/a
I don't understand what you are saying. I don't see how IndexOf is
going to help me. If someone can show me an example that removes the
need for one of my regexes, then I will do it. Here is my code so far.

using System;
using System.Collections;
using System.Text;
using MySql.Data;
using System.Text.RegularExpressions;

namespace createfiles
{
    class Program
    {
        static MySql.Data.MySqlClient.MySqlConnection conn = new MySql.Data.MySqlClient.MySqlConnection();
        static MySql.Data.MySqlClient.MySqlCommand cmd = new MySql.Data.MySqlClient.MySqlCommand();
        static string myConnectionString = "server=127.0.0.1;uid=root;pwd=password;database=test;";
        static ArrayList master;
        static string group;
        static string table;
        static string[] groups = { "alt.binaries.games.xbox", "alt.binaries.games.xbox360", "alt.binaries.vcd" };
        static Regex reg = new Regex("\\.");
        static Regex seg = new Regex("\\([0-9]*/[0-9]*\\)", RegexOptions.IgnoreCase);

        struct Header
        {
            public string numb;
            public string subject;
            public string date;
            public string from;
            public string msg_id;
            public string bytes;
        }

        static void Main(string[] args)
        {
            for (int x = 1; x < 2; x++)
            {
                table = reg.Replace(groups[x], "");
                group = groups[x];
                getheaders();
                Console.WriteLine("Have this many headers {0}", master.Count);
                Header one = (Header)master[0];
                Console.WriteLine("first one {0} {1}", one.numb, one.subject);
                find();
                master.Clear();
            }
        }

        static void getheaders()
        {
            conn.ConnectionString = myConnectionString;
            conn.Open();
            cmd.Connection = conn;
            cmd.CommandText = "select * from " + table + " where subject like '%(%/%)%'";
            MySql.Data.MySqlClient.MySqlDataReader reader;
            reader = cmd.ExecuteReader();
            Header h = new Header();
            master = new ArrayList();
            while (reader.Read())
            {
                h.numb = reader.GetValue(0).ToString();
                h.subject = reader.GetValue(1).ToString();
                h.from = reader.GetValue(2).ToString();
                h.date = reader.GetValue(3).ToString();
                h.msg_id = reader.GetValue(4).ToString();
                h.bytes = reader.GetValue(5).ToString();
                master.Add(h);
            }
            reader.Close();
            conn.Close();
        }

        static void find()
        {
            while (master.Count > 0)
            {
                Header start = (Header)master[0];
                master.RemoveAt(0);
                Match m = seg.Match(start.subject);
                string segsplit = m.ToString();
                segsplit = segsplit.Replace("(", "").Replace(")", "");
                string[] segments = segsplit.Split('/');
                int max = int.Parse(segments[1]);
                max += 1;
                int counter = 1;
                Header[] found = new Header[max];
                string testsubject = seg.Replace(start.subject, "");
                int index = int.Parse(segments[0]);
                int temp = master.Count;
                if (index < max)
                {
                    found[index] = start;
                    for (int x = 0; x < master.Count; x++)
                    {
                        Header test = (Header)master[x];
                        if (test.subject.Contains(testsubject))
                        {
                            //master.Remove(test);
                            master.RemoveAt(x);
                            x = x - 1;
                            Match t = seg.Match(test.subject);
                            string tsplit = t.ToString();
                            string tsegsplit = tsplit.Replace("(", "").Replace(")", "");
                            string[] tsegments = tsegsplit.Split('/');
                            index = int.Parse(tsegments[0]);
                            //Console.WriteLine(counter);
                            if (index < max)
                            {
                                found[index] = test;
                                counter++;
                            }
                        }
                    }
                    //Console.WriteLine("counter = {0}", counter);
                    int testmax = max-1;
                    if (counter == testmax)
                    {
                        master.TrimToSize();
                        Console.WriteLine("We Have a Match {0}", found[1].subject);
                    }
                }
            }
        }
    }
}

May 31 '06 #6

P: n/a

"Extremest" wrote...
> I know there are ways to make this a lot faster.

Even the elimination of a single statement counts? ;-)

I won't get into the possible overuse of regex and splits, but will just
comment on the pattern of "removing elements from a collection while looping
through it". Because each removal moves the remaining elements up one
position, that type of loop is generally better done in reverse.

As I only skimmed hastily through the code, I don't know whether there would
be any other side effects, but by reversing the loop, you shouldn't need to
decrement x within the loop as well.

> static void find()
> {
>     while (master.Count > 0)
>     {
> [snip]
>         if (index < max)
>         {
>             found[index] = start;

This one -----------------v

>             for (int x = 0; x < master.Count; x++)
>             {

[snip]

            for (int x = master.Count-1; x >= 0 ; x--)
            {
                Header test = (Header)master[x];
                if (test.subject.Contains(testsubject))
                {
                    master.RemoveAt(x);

                    // x = x - 1; <- Not necessary...

                    Match t = seg.Match (test.subject);

...etc...
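
As a self-contained illustration of the reverse loop above, here is a minimal sketch. The names ReverseRemoveDemo and RemoveMatching and the sample subjects are made up for this example, and a generic List<string> stands in for the ArrayList of Header structs, so treat it as an assumption-laden demo rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;

static class ReverseRemoveDemo
{
    // Removes every item containing 'marker', walking from the end so that
    // RemoveAt never shifts an element the loop has not visited yet.
    // (Hypothetical helper, not part of the original program.)
    public static int RemoveMatching(List<string> items, string marker)
    {
        int removed = 0;
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (items[i].Contains(marker))
            {
                items.RemoveAt(i);   // no index adjustment needed afterwards
                removed++;
            }
        }
        return removed;
    }

    static void Main()
    {
        List<string> subjects = new List<string>();
        subjects.Add("game.rar (1/3)");
        subjects.Add("other.rar (1/2)");
        subjects.Add("game.rar (2/3)");
        subjects.Add("game.rar (3/3)");

        int removed = RemoveMatching(subjects, "game.rar");
        Console.WriteLine("{0} removed, {1} left", removed, subjects.Count);
    }
}
```

Note that the reverse loop visits the elements in the opposite order, which only matters if the code depends on the order in which matches are found.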
/// Bjorn A
May 31 '06 #7

P: n/a
OK, I used what you said about starting the loop from the end. I also
redid the main loop so that it starts by taking the last element from
the ArrayList instead of the first. This sped things up, since the
matches are now closer together. I do not know how to remove any of the
regexes; that is the only way I know of to find what I want. If you guys
need, I can post a couple of the subjects it would be parsing, to let
you know what it is actually going to be looking at.
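
For what it's worth, the IndexOf idea mentioned earlier in the thread might look roughly like the sketch below. It is only a sketch under assumptions: SegmentParser and TryParse are hypothetical names, the sample subject is invented, and it takes the LAST "(part/total)" marker in the subject, whereas the regex in the posted code takes the first match.

```csharp
using System;

static class SegmentParser
{
    // Tries to read a "(part/total)" marker from a subject line using only
    // LastIndexOf/IndexOf/Substring and int.TryParse, scanning from the end.
    // On success, baseSubject is the subject with that one marker removed.
    // (Hypothetical helper; assumes subject is not null.)
    public static bool TryParse(string subject, out int part, out int total, out string baseSubject)
    {
        part = 0;
        total = 0;
        baseSubject = subject;

        int close = subject.LastIndexOf(')');
        while (close > 0)
        {
            int open = subject.LastIndexOf('(', close - 1);
            if (open < 0)
                return false;   // no opening parenthesis left to try

            // Look for the slash strictly between the parentheses.
            int slash = subject.IndexOf('/', open + 1, close - open - 1);
            int p, t;
            if (slash > 0
                && int.TryParse(subject.Substring(open + 1, slash - open - 1), out p)
                && int.TryParse(subject.Substring(slash + 1, close - slash - 1), out t))
            {
                part = p;
                total = t;
                baseSubject = subject.Remove(open, close - open + 1);
                return true;
            }

            // Not a numeric marker; try the previous ')' instead.
            close = subject.LastIndexOf(')', close - 1);
        }
        return false;
    }

    static void Main()
    {
        int part, total;
        string baseSubject;
        if (TryParse("\"game.part01.rar\" yEnc (12/45)", out part, out total, out baseSubject))
            Console.WriteLine("part {0} of {1}, base subject [{2}]", part, total, baseSubject);
    }
}
```

Unlike int.Parse on the regex match, this does not throw on an empty "(/)" marker; it simply reports that no marker was found. Whether it is actually faster than the regex is something to measure, not assume.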

May 31 '06 #8

P: n/a
OK, I just went through the MySQL manual and it does not allow "."
(periods) in table or db names. I will be OK for now: I am only indexing
the tables that I want, so it won't be a problem for a while. So far the
prog is doing really well. I have made a lot of changes. Going to work on
the grouping thing next, if I can figure it out.

Jun 1 '06 #9

P: n/a
Ex*******@extremest.com wrote:
> I know there are ways to make this a lot faster. Any
> newsreader does this in seconds. I don't know how they do
> it and I am very new to c#. If anyone knows a faster way
> please let me know. All I am doing is quering the db for
> all the headers for a certain group and then going through
> them to find all the parts of each post. I only want ones
> that are complete. Meaning all segments for that one file
> posted are there.


Rule number one of optimizing code is going back to your algorithm and
seeing whether it is optimal. Often you can spot 'hotspots' in an
algorithm quite easily, for example a part of the algorithm which has
to be performed many times. It's then best to invent a NEW algorithm
which does things more efficiently. Often this requires starting from
scratch and doing things completely differently.

After you've optimized your algorithm, modify your code so it matches
the new algorithm.

Rule number two is measuring; with software performance, measuring
means profiling. Download a .NET profiler and measure your code. Only
then will you KNOW which parts are slow and which parts aren't. If you
don't measure / profile your code, you have to guess which parts are
slow, and chances are you'll then optimize things which aren't slow or
aren't significant in the whole process.
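
A real profiler gives far more detail, but as a crude first measurement the Stopwatch class in System.Diagnostics can time individual sections. The sketch below is only an illustration: TimingDemo and MeasureMilliseconds are hypothetical names, and the two pieces of work are stand-ins, not the code from this thread.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class TimingDemo
{
    // Runs 'work' once and returns the elapsed wall-clock time in milliseconds.
    // (Hypothetical helper; a profiler would also show where the time went.)
    public static long MeasureMilliseconds(Action work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Stand-in for a piece of code that runs once.
        long once = MeasureMilliseconds(() => Thread.Sleep(30));

        // Stand-in for a small piece of work repeated many times.
        long repeated = MeasureMilliseconds(() =>
        {
            double sum = 0;
            for (int i = 0; i < 10000; i++)
                sum += Math.Sqrt(i);
        });

        Console.WriteLine("run once: {0} ms, repeated: {1} ms", once, repeated);
    }
}
```

Wrapping getheaders() and find() from the posted program in this way would show which of the two dominates before anything is rewritten.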

Rule number three is that you have to avoid micro-optimizations. This
means that you always, in all cases, have to start from rule number 1,
and then do rule number 2. Micro-optimizations are what is done in this
thread, no offence to the people who helped you out as all they have is
your code. When you're doing micro-optimizations you look at the code
and guess which parts are slow, and then try to change them with what
you think are faster constructs.

Often this doesn't make a difference or makes things worse. The thing
is: if you have a slow piece of code which is run once and takes 0.3
seconds to complete, and you have a tight loop which takes 0.01 seconds
to complete but is run 10,000 times, which part of the code is
significant for the run of your program? The tight loop might look
fast, but at 10,000 x 0.01 = 100 seconds in total it's the bottleneck,
not the piece of code which takes 0.3 seconds.

Though as I said before, people in this newsgroup only have your code
snippet, so have to fall back on micro-optimizing, as there's no
explanation of the algorithm nor are there design decision motivations
available.

The thing which looks really slow to me is the LIKE predicate in your
query. LIKE is slow, especially with wildcards the way you use them: a
pattern that starts with '%' generally can't use an index, so the
database has to scan every row.

To give you a hint of how an algorithm change can help you
tremendously here (as an example of rule nr. 1): what will your app do
the most: reading or writing? My guess: reading. So it will spend say
90% of its time reading data over and over again and 10% of its time
saving data.

This means that you have to optimize for reading, not writing, and so
you have to avoid doing as many operations as possible when you read
data. Simplistically said, reading should be no more than firing a
select and dumping the results on the screen. So you should move the
processing of what's inside the DB to the WRITER logic. There, you know
what's going to be saved, or better: you can analyse it and retrieve
extra information from it when it's written. THIS information is then
also stored in the DB.

When you READ data, you then simply use a couple of JOINs and simple
WHERE predicates (so no LIKEs) to fetch the data you need, and you can
completely avoid processing the data you read.

This is an algorithm change, but it will beat any code-optimization
hands down.

Always pre-calculate as much as you can to avoid stalls in often-used
code. Processing data to get the same info over and over again? Do it
once and create a lookup table at runtime; that saves you the
processing in all subsequent reads. Simple, yet very effective.
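
To make the lookup-table idea concrete, here is a minimal sketch of finding complete posts in a single pass with a Dictionary instead of comparing every header against every other header. Everything in it is an assumption for illustration: ParsedHeader, CompletePosts and Find are hypothetical names, the base subject, part and total are taken to be already parsed (for example at write time), and parts are assumed to be numbered 1..Total, so a "(0/n)" part is ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record: what would be stored per header once the subject
// has been parsed a single time.
class ParsedHeader
{
    public string BaseSubject;
    public int Part;
    public int Total;

    public ParsedHeader(string baseSubject, int part, int total)
    {
        BaseSubject = baseSubject;
        Part = part;
        Total = total;
    }
}

static class CompletePosts
{
    // Per-post bookkeeping: which parts were seen and how many distinct ones.
    class Group
    {
        public bool[] Seen;
        public int Count;
    }

    // Returns the base subjects for which every part 1..Total was present.
    public static List<string> Find(IEnumerable<ParsedHeader> headers)
    {
        Dictionary<string, Group> groups = new Dictionary<string, Group>();
        List<string> complete = new List<string>();

        foreach (ParsedHeader h in headers)
        {
            if (h.Total < 1 || h.Part < 1 || h.Part > h.Total)
                continue;   // out-of-range part numbers are skipped

            Group g;
            if (!groups.TryGetValue(h.BaseSubject, out g))
            {
                g = new Group();
                g.Seen = new bool[h.Total];
                groups[h.BaseSubject] = g;
            }

            // Skip headers whose total disagrees with the group, and duplicates.
            if (g.Seen.Length != h.Total || g.Seen[h.Part - 1])
                continue;

            g.Seen[h.Part - 1] = true;
            g.Count++;
            if (g.Count == h.Total)
                complete.Add(h.BaseSubject);
        }
        return complete;
    }

    static void Main()
    {
        List<ParsedHeader> headers = new List<ParsedHeader>();
        headers.Add(new ParsedHeader("a.rar", 1, 2));
        headers.Add(new ParsedHeader("b.rar", 1, 3));
        headers.Add(new ParsedHeader("a.rar", 2, 2));
        headers.Add(new ParsedHeader("b.rar", 3, 3));

        foreach (string subject in Find(headers))
            Console.WriteLine("complete: {0}", subject);
    }
}
```

Each header is looked up once by its base subject, so the work grows roughly in proportion to the number of headers rather than with its square, which is the kind of algorithm change the rules above are about.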

Good luck :)

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Jun 1 '06 #10

P: n/a
OK, I think I get what you are talking about: redo my header prog that
gets the headers, and have it go ahead and find the max, the segment
number and the real subject. Pretty much redo the struct in my header
prog to match the new header class I have in the sort prog, then remove
a couple of things from the sort, and bam, the sort would be real quick.
The getheader prog probably won't slow down too much, since I only have
a 3 Mbit connection to get them. Also add the new columns to the db so
that they are there. I get it; will implement it immediately.

Jun 1 '06 #11
