I know there are ways to make this a lot faster. Any
newsreader does this in seconds. I don't know how they do
it, and I am very new to C#. If anyone knows a faster way,
please let me know. All I am doing is querying the db for
all the headers for a certain group and then going through
them to find all the parts of each post. I only want ones
that are complete, meaning all segments for that one file
posted are there.
using System;
using System.Collections;
using System.Text;
using MySql.Data;
using System.Text.RegularExpressions;

namespace createfiles
{
    class Program
    {
        static MySql.Data.MySqlClient.MySqlConnection conn =
            new MySql.Data.MySqlClient.MySqlConnection();
        static MySql.Data.MySqlClient.MySqlCommand cmd =
            new MySql.Data.MySqlClient.MySqlCommand();
        static string myConnectionString =
            "server=127.0.0.1;uid=root;pwd=password;database=test;";
        static ArrayList master;
        static string group;
        static string table;
        static string[] groups = { "alt.binaries.games.xbox",
            "alt.binaries.games.xbox360", "alt.binaries.vcd" };
        static Regex reg = new Regex("\\.");
        static Regex seg = new Regex("\\([0-9]*/[0-9]*\\)", RegexOptions.IgnoreCase);

        struct Header
        {
            public string numb;
            public string subject;
            public string date;
            public string from;
            public string msg_id;
            public string bytes;
        }

        static void Main(string[] args)
        {
            for (int x = 1; x < 2; x++)  // only groups[1] for now
            {
                table = reg.Replace(groups[x], "");  // table name = group name minus periods
                group = groups[x];
                getheaders();
                Console.WriteLine("Have this many headers {0}", master.Count);
                Header one = (Header)master[0];
                Console.WriteLine("first one {0} {1}", one.numb, one.subject);
                find();
                master.Clear();
            }
        }

        static void getheaders()
        {
            conn.ConnectionString = myConnectionString;
            conn.Open();
            cmd.Connection = conn;
            cmd.CommandText = "select * from " + table + " where subject like '%(%/%)%'";
            MySql.Data.MySqlClient.MySqlDataReader reader = cmd.ExecuteReader();
            Header h = new Header();
            master = new ArrayList();
            while (reader.Read())
            {
                h.numb = reader.GetValue(0).ToString();
                h.subject = reader.GetValue(1).ToString();
                h.from = reader.GetValue(2).ToString();
                h.date = reader.GetValue(3).ToString();
                h.msg_id = reader.GetValue(4).ToString();
                h.bytes = reader.GetValue(5).ToString();
                master.Add(h);  // the struct is copied when boxed into the ArrayList
            }
            reader.Close();
            conn.Close();
        }

        static void find()
        {
            while (master.Count > 0)
            {
                Header start = (Header)master[0];
                master.RemoveAt(0);
                // pull the "(part/total)" marker out of the subject
                Match m = seg.Match(start.subject);
                string segsplit = m.ToString();
                segsplit = segsplit.Replace("(", "");
                segsplit = segsplit.Replace(")", "");
                string[] segments = segsplit.Split('/');
                int max = int.Parse(segments[1]);
                max += 1;
                int counter = 1;
                Header[] found = new Header[max];
                string testsubject = seg.Replace(start.subject, "");
                int index = int.Parse(segments[0]);
                if (index < max)
                {
                    found[index] = start;
                    // rescan the remaining headers for other parts of the same post
                    for (int x = 0; x < master.Count; x++)
                    {
                        Header test = (Header)master[x];
                        if (test.subject.Contains(testsubject))
                        {
                            master.RemoveAt(x);
                            x = x - 1;
                            Match t = seg.Match(test.subject);
                            string tsplit = t.ToString();
                            string tsegsplit = tsplit.Replace("(", "");
                            tsegsplit = tsegsplit.Replace(")", "");
                            string[] tsegments = tsegsplit.Split('/');
                            index = int.Parse(tsegments[0]);
                            if (index < max)
                            {
                                found[index] = test;
                                counter++;
                            }
                        }
                    }
                    int testmax = max - 1;
                    if (counter == testmax)  // all parts present
                    {
                        master.TrimToSize();
                        Console.WriteLine("We Have a Match {0}", found[1].subject);
                    }
                }
            }
        }
    }
}
--
----------------------------------------------
Posted with NewsLeecher v3.0 Final
* Binary Usenet Leeching Made Easy
* http://www.newsleecher.com/?usenet
----------------------------------------------
Extremest,
There are a few things I can see that you could do here.
First, though, I have to ask about your database structure. You are
storing the headers for different groups in different tables, with the name
of the group as the table name. I don't know that this is necessarily a good
idea. The reason is that all of the tables share the same structure, and
they are all related; the only thing differentiating messages is the group
that they are in.
Because of that, I think that you should have one single table with all
messages in it, and add a column which has the name of the group that the
message is in. Of course, a message could be in multiple groups (because
of crossposting). In that case, you would have another table which would
have a group id in it, as well as the id of the message. Doing this, you
would then have a record in the main table which had the message details,
as well as another table saying which groups the message was in.
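A sketch of what that layout might look like (the table and column names here are invented for illustration, not taken from the original code):

```csharp
using System;

// Hypothetical DDL for the single-table layout: one headers table plus a
// junction table so a crossposted message can belong to several groups.
class Schema
{
    public const string Headers =
        "CREATE TABLE headers (" +
        " msg_id VARCHAR(255) PRIMARY KEY," +
        " subject TEXT," +
        " sender VARCHAR(255)," +
        " posted DATETIME," +
        " bytes INT)";

    // one row per (message, group) pair handles crossposting
    public const string HeaderGroups =
        "CREATE TABLE header_groups (" +
        " msg_id VARCHAR(255)," +
        " group_name VARCHAR(255)," +
        " PRIMARY KEY (msg_id, group_name))";
}
```

With this layout the group name is just data in a column, so periods in group names stop mattering at all.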
Doing it like this also fixes an error in your code. You were removing
the periods from the group names to form your table names. This brings up
the following situation. Hypothetically, you could have two groups:
alt.my.stuff
alt.mystuff
In your algorithm, they are treated the same way, and end up in the same
table. In MySQL, you should be able to use an escape mechanism to allow
periods in your table names (backticks, similar to square brackets in SQL
Server).
Moving on, I would not use regular expressions to perform basic
replacement functions as you are doing. I would use the Replace method on
the string class to do this. I think you will find this MUCH faster. The
same goes for the finding of a string (you match on the subject), as well as
the split functionality. All of this is offered on the string class, and
since you are not using wildcards or patterns, there is no reason to use the
regular expression classes.
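As a sketch of what the string-method version could look like (this is a guess at an equivalent, not the poster's code; it assumes the marker is the first parenthesized number pair in the subject):

```csharp
using System;

// Parse the "(part/total)" marker out of a subject line with IndexOf and
// Substring instead of a Regex. The method name is invented.
static class SegmentParser
{
    // Returns true and fills part/total if the subject contains "(n/m)".
    public static bool TryParse(string subject, out int part, out int total)
    {
        part = total = 0;
        for (int open = subject.IndexOf('('); open >= 0;
             open = subject.IndexOf('(', open + 1))
        {
            int slash = subject.IndexOf('/', open + 1);
            int close = slash >= 0 ? subject.IndexOf(')', slash + 1) : -1;
            if (slash < 0 || close < 0) continue;
            // both pieces must be plain numbers, otherwise try the next "("
            if (int.TryParse(subject.Substring(open + 1, slash - open - 1), out part) &&
                int.TryParse(subject.Substring(slash + 1, close - slash - 1), out total))
                return true;
        }
        return false;
    }
}
```

Each call does a handful of scans over one string, with no regex engine involved; that is where the speedup would come from.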
When reading from the data reader, you don't have to call ToString. You
can cast the results to string directly.
Finally, I would recommend selecting out all of the messages from all of
the groups out at once, then processing them in order. You can sort the
results by group name, and then process them. This will save you from
having to make repeat trips to the database.
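A sketch of what the one-trip query could look like, assuming the single-table layout with a group-name column (all table and column names here are invented):

```csharp
using System;

// Build one query that pulls every group's headers at once, sorted by
// group so the results can be processed group-by-group in a single pass.
class QueryBuilder
{
    public static string AllGroupsQuery()
    {
        return "SELECT g.group_name, h.msg_id, h.subject, h.sender, h.posted, h.bytes " +
               "FROM headers h " +
               "JOIN header_groups g ON g.msg_id = h.msg_id " +
               "WHERE h.subject LIKE '%(%/%)%' " +
               "ORDER BY g.group_name";
    }
}
```

One ExecuteReader over this result replaces the per-group open/query/close round trips in the original loop.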
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
"Extremest" <Ex*******@extremest.com> wrote in message
news:mc*********************@fe01.usenetserver.com...
The tables that it grabs the headers from are temporary. I don't have
the rest of the prog written yet; it will remove the headers from the db
that are complete for a single post. Also, I am only doing specific
groups, so that part about the periods is not an issue yet. Will redo that
later; mainly I just want to get this to work faster at the moment. There
are at least 1 million headers in each table right now; if I just pull
from one of them it will take up around 500 MB of RAM and about the
same for VM. As far as the regex, I am not sure what you mean. It is
finding a pattern in the subjects that is unique to each post and varies
in size. If there is a way to make that better please tell me.
Nicholas Paldino [.NET/C# MVP] wrote:
In regards to the regex, why not just use the IndexOf method on the
string class? What are you gaining from using a regex? The regex
performance is undoubtedly going to be slower (as is the split
operation).
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
<dn**********@charter.net> wrote in message
news:11*********************@c74g2000cwc.googlegroups.com...
Ok, I am not following all the way here. I am very new to C#. I can
see how IndexOf would eliminate the need for the Match variable.
As far as the index, I am not sure how it would help there. I have to
search the original subject for (xx/xx), where xx are numbers; I don't
know how many digits there are or where exactly it may be in the subject.
Then I use the first number in that sequence for the index number, because
that is the number in the post sequence, and the last one to know how many
there are to find for it to be complete.
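For what it's worth, the rescan of the whole list for every post is what makes this slow. Once the part number and total are pulled out of each subject, the headers can be bucketed by subject-minus-marker in a Dictionary, so each header is touched once. This is a sketch of a different approach, not code from the thread; all names are invented:

```csharp
using System;
using System.Collections.Generic;

// Bucket headers by their subject with the "(n/m)" marker removed, then
// report which subjects have every part present. One pass over the input
// instead of a rescan of the list per post.
class Grouper
{
    public static List<string> CompleteSubjects(
        IEnumerable<(string Subject, int Part, int Total)> headers)
    {
        // key: stripped subject; value: distinct part numbers seen, plus total
        var buckets = new Dictionary<string, (HashSet<int> Parts, int Total)>();
        foreach (var h in headers)
        {
            if (!buckets.TryGetValue(h.Subject, out var b))
                buckets[h.Subject] = b = (new HashSet<int>(), h.Total);
            b.Parts.Add(h.Part);  // HashSet ignores duplicate copies of a part
        }
        var complete = new List<string>();
        foreach (var kv in buckets)
            if (kv.Value.Parts.Count == kv.Value.Total)
                complete.Add(kv.Key);
        return complete;
    }
}
```

A Dictionary lookup is effectively constant time, so a million headers means roughly a million lookups rather than the list-rescans the current find() does.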
I am not understanding what you are saying... I don't see how IndexOf
is going to help me. If someone can show me an example that would
remove the need for one of my regexes, then I will do it. Here is my
code up till now.
using System;
using System.Collections;
using System.Text;
using MySql.Data;
using System.Text.RegularExpressions;

namespace createfiles
{
    class Program
    {
        static MySql.Data.MySqlClient.MySqlConnection conn =
            new MySql.Data.MySqlClient.MySqlConnection();
        static MySql.Data.MySqlClient.MySqlCommand cmd =
            new MySql.Data.MySqlClient.MySqlCommand();
        static string myConnectionString =
            "server=127.0.0.1;uid=root;pwd=password;database=test;";
        static ArrayList master;
        static string group;
        static string table;
        static string[] groups = { "alt.binaries.games.xbox",
            "alt.binaries.games.xbox360", "alt.binaries.vcd" };
        static Regex reg = new Regex("\\.");
        static Regex seg =
            new Regex("\\([0-9]*/[0-9]*\\)", RegexOptions.IgnoreCase);

        struct Header
        {
            public string numb;
            public string subject;
            public string date;
            public string from;
            public string msg_id;
            public string bytes;
        }

        static void Main(string[] args)
        {
            for (int x = 1; x < 2; x++)
            {
                table = reg.Replace(groups[x], "");
                group = groups[x];
                getheaders();
                Console.WriteLine("Have this many headers {0}", master.Count);
                Header one = (Header)master[0];
                Console.WriteLine("first one {0} {1}", one.numb, one.subject);
                find();
                master.Clear();
            }
        }

        static void getheaders()
        {
            conn.ConnectionString = myConnectionString;
            conn.Open();
            cmd.Connection = conn;
            cmd.CommandText =
                "select * from " + table + " where subject like '%(%/%)%'";
            MySql.Data.MySqlClient.MySqlDataReader reader;
            reader = cmd.ExecuteReader();
            Header h = new Header();
            master = new ArrayList();
            while (reader.Read())
            {
                h.numb = reader.GetValue(0).ToString();
                h.subject = reader.GetValue(1).ToString();
                h.from = reader.GetValue(2).ToString();
                h.date = reader.GetValue(3).ToString();
                h.msg_id = reader.GetValue(4).ToString();
                h.bytes = reader.GetValue(5).ToString();
                master.Add(h);
            }
            reader.Close();
            conn.Close();
        }

        static void find()
        {
            while (master.Count > 0)
            {
                Header start = (Header)master[0];
                master.RemoveAt(0);
                Match m = seg.Match(start.subject);
                string segsplit = m.ToString();
                segsplit = segsplit.Replace("(", "").Replace(")", "");
                string[] segments = segsplit.Split('/');
                int max = int.Parse(segments[1]);
                max += 1;
                int counter = 1;
                Header[] found = new Header[max];
                string testsubject = seg.Replace(start.subject, "");
                int index = int.Parse(segments[0]);
                int temp = master.Count;
                if (index < max)
                {
                    found[index] = start;
                    for (int x = 0; x < master.Count; x++)
                    {
                        Header test = (Header)master[x];
                        if (test.subject.Contains(testsubject))
                        {
                            //master.Remove(test);
                            master.RemoveAt(x);
                            x = x - 1;
                            Match t = seg.Match(test.subject);
                            string tsplit = t.ToString();
                            string tsegsplit = tsplit.Replace("(", "").Replace(")", "");
                            string[] tsegments = tsegsplit.Split('/');
                            index = int.Parse(tsegments[0]);
                            //Console.WriteLine(counter);
                            if (index < max)
                            {
                                found[index] = test;
                                counter++;
                            }
                        }
                    }
                    //Console.WriteLine("counter = {0}", counter);
                    int testmax = max - 1;
                    if (counter == testmax)
                    {
                        master.TrimToSize();
                        Console.WriteLine("We Have a Match {0}", found[1].subject);
                    }
                }
            }
        }
    }
}
"Extremest" wrote... I know there are ways to make this a lot faster.
Even the elimination of a single statement counts? ;-)
I won't get into the possible overuse of regexes and splits, but just make a
comment on the pattern of "removing elements from a collection while looping
through it". As each removal moves the remaining elements up one position,
that type of loop is generally better done in reverse.
As I only skimmed hastily through the code, I don't know whether there would
be any other side effects, but by reversing the loop, you shouldn't need to
decrement x within the loop as well.
static void find()
{
    while (master.Count > 0)
    {
        [snip]
        if (index < max)
        {
            found[index] = start;

This one -----------------v

            for (int x = 0; x < master.Count; x++)
            {
                [snip]

....is better written in reverse:

            for (int x = master.Count - 1; x >= 0; x--)
            {
                Header test = (Header)master[x];
                if (test.subject.Contains(testsubject))
                {
                    master.RemoveAt(x);
                    // x = x - 1; <- Not necessary...
                    Match t = seg.Match(test.subject);
                    ...etc...
/// Bjorn A
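Bjorn's reverse-loop point is easy to see in isolation. Here is a tiny standalone sketch of the pattern (my own `List<int>` example, not the Header code from the thread):

```csharp
using System;
using System.Collections.Generic;

class ReverseRemoveDemo
{
    // Remove every even number. Walking backwards means RemoveAt(x)
    // only shifts elements that were already visited, so there is no
    // need for the "x = x - 1" fix-up the forward loop required.
    public static void RemoveEvens(List<int> items)
    {
        for (int x = items.Count - 1; x >= 0; x--)
        {
            if (items[x] % 2 == 0)
                items.RemoveAt(x);
        }
    }

    static void Main()
    {
        var items = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveEvens(items);
        Console.WriteLine(string.Join(",", items)); // prints 1,3,5
    }
}
```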
ok, I used what you said about the end of the loop. Also I redid the
main loop so that it starts off by taking the last element from the
arraylist instead of the first. Doing this helped speed it up, since
matches end up closer together. I do not know how to remove any of
the regexes; that is the only way I know of to find what I want. If
you guys need, I can post a couple of subjects that it would be parsing,
to let you know what it is actually going to be looking at.
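For what it's worth, here is one guess at what the earlier IndexOf suggestion may have meant: pulling the part/total numbers out of the subject with LastIndexOf/IndexOf and Substring instead of a Regex. The method name and the exact parsing rules below are my own assumptions, a sketch rather than anyone's actual code:

```csharp
using System;

class IndexOfParseDemo
{
    // Read "(part/total)" out of a subject without a Regex: find the
    // last '(' with LastIndexOf, the enclosing ')' and '/' with IndexOf,
    // and split the slice in between. Returns false when the marker is
    // absent or malformed.
    public static bool ParseSegment(string subject, out int part, out int total)
    {
        part = total = 0;
        int open = subject.LastIndexOf('(');
        if (open < 0) return false;
        int close = subject.IndexOf(')', open);
        int slash = subject.IndexOf('/', open);
        if (close < 0 || slash < 0 || slash > close) return false;
        return int.TryParse(subject.Substring(open + 1, slash - open - 1), out part)
            && int.TryParse(subject.Substring(slash + 1, close - slash - 1), out total);
    }

    static void Main()
    {
        int p, t;
        bool ok = ParseSegment("Some File - \"file.rar\" (03/27)", out p, out t);
        Console.WriteLine("{0} {1} {2}", ok, p, t); // prints True 3 27
    }
}
```

Whether this actually beats the Regex is worth measuring; the win is mostly avoiding a Regex match plus Replace plus Split per header.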
ok, I just went through the MySQL manual and it does not allow "."
periods in table or db names. I will be OK for now; I'm only indexing
tables that I want, so it won't be a problem for a while. So far the prog
is doing really well. Have done a lot of changes. Going to work on
the grouping thing next if I can figure it out.
Ex*******@extremest.com wrote: I know there are ways to make this a lot faster. [snip]
Rule number one of optimizing code is going back to your algorithm and
seeing if it's optimal. Often you can spot 'hotspots' in an algorithm
quite easily, for example a part of the algorithm which has
to be performed a lot of times. It's then best to invent a NEW
algorithm which does things more efficiently. Often this requires
starting from scratch and doing things completely differently.
After you've optimized your algorithm, modify your code so it matches
the new algorithm.
Rule number two is measuring; with software performance this
means profiling. Download a .NET profiler and measure your code. Only
then will you KNOW which parts are slow and which parts aren't. If you
don't measure / profile your code, you will never be able to optimize a
slow piece of code, as chances are you'll have to guess which parts are
slow and will then optimize things which aren't slow or aren't significant
in the whole process.
Rule number three is to avoid micro-optimizations. This
means that you always, in all cases, have to start from rule number 1,
and then do rule number 2. Micro-optimizations are what is done in this
thread, no offence to the people who helped you out, as all they have is
your code. When you're doing micro-optimizations you look at the code,
guess which parts are slow, and then try to replace them with what
you think are faster constructs.
Often this doesn't make a difference or makes things worse. The thing
is: if you have a slow piece of code but it is run once and takes 0.3
seconds to complete and you have a tight loop which takes 0.01 seconds
to complete but is run 10,000 times, which part of the code is
significant for the run of your program? The tight loop might look
fast, but in the end it's the bottleneck, not that piece of code which
takes 0.3 seconds.
Though as I said before, people in this newsgroup only have your code
snippet, so they have to fall back on micro-optimizing, as there's no
explanation of the algorithm, nor are there design-decision motivations
available.
The thing which to me looks really slow is the LIKE predicate in your
query. LIKE is slow, especially when you use leading wildcards the way
you do.
To give you a hint of how an algorithm change can help you
tremendously here (as an example of rule nr. 1): what will your app do
the most: reading or writing? My guess: reading. So it will spend say
90% of its time reading data over and over again and 10% of its time
saving data.
This thus means that you have to optimize for reading, not writing,
and that you have to do as few operations as possible when you read
data. Reading should amount to firing a SELECT and dumping the results
on the screen, simplistically said. So you should move the processing
of what's inside the DB to the WRITER logic. There, you know what's
going to be saved, or better: you can analyse it and retrieve extra
information from it when it's written. THIS information is then also
stored in the DB.
When you READ data, you then simply use a couple of JOINs and simple
WHERE predicates (so no LIKE's) to fetch the data you need and you can
completely avoid the processing of data read.
This is an algorithm change, but it will beat any code-optimization
hands down.
Always pre-calculate as much as you can, to avoid stalls in often-used
code. Processing data to get the same info over and over again? Do it
once and create a lookup table at runtime; that saves you the processing
on all subsequent reads. Simple, yet very effective.
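Applied to this thread's problem, the "lookup table at runtime" idea might look something like the sketch below: key each header by its subject with the segment marker stripped, and bucket everything in a single pass, so you never rescan the whole list per post. All the names here are mine, a sketch rather than Frans's actual design:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class LookupTableSketch
{
    static Regex seg = new Regex(@"\([0-9]+/[0-9]+\)");

    // Bucket each subject under its base subject (the subject with the
    // "(n/m)" marker stripped). One pass over all headers, O(n), instead
    // of a rescan of the remaining list for every post, O(n^2).
    public static Dictionary<string, List<string>> GroupBySubject(IEnumerable<string> subjects)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (string subject in subjects)
        {
            string key = seg.Replace(subject, "").Trim();
            List<string> bucket;
            if (!groups.TryGetValue(key, out bucket))
                groups[key] = bucket = new List<string>();
            bucket.Add(subject);
        }
        return groups;
    }

    static void Main()
    {
        var groups = GroupBySubject(new[] {
            "foo.rar (1/2)", "bar.rar (1/1)", "foo.rar (2/2)" });
        Console.WriteLine("{0} {1}", groups.Count, groups["foo.rar"].Count); // prints 2 2
    }
}
```

A post is then "complete" when its bucket holds as many distinct segment numbers as the total in the marker; that check is per-bucket and cheap.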
Good luck :)
FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
ok, I think I am getting what you are talking about: redo my header
prog that gets the headers and have it go ahead and find the max, the
segment number, and the real subject. Pretty much redo the struct in my
header prog to match the new Header class I have in the sort prog, then
remove a couple of things from the sort, and bam, the sort would be real
quick. The getheader prog probably won't slow down too much since I only
have a 3mbit connection to get them. Also add the new columns to the db
so that they are there. I get it; will implement it immediately.