Anyone have an idea how I can optimize this a bit more?

The code below splits a MySQL dump file into files that MSSQL can understand (only the DDL should need to be manually modified).

It does work, however it is very, very slow. I tried it on an AMD 64 3400+ machine with 1.5 GB of physical memory installed, and after 12 hours I had SQL files worth 150 MB, while the source file is 7 GB.

Is .NET really so slow on I/O, or am I missing something here?

I compiled the executable with optimizations, of course, and use Framework 1.1. I tried to optimize the code by raising the lineCount threshold to 100,000 or even a million, but it looks like the higher this value gets, the slower the app becomes. I also tried to do the Replace on the sline variable, but this has no noticeable effect either.

Anyone have some tips that might help me? I have the feeling that the read is terribly slow. Another strange thing: with the larger line counts I saw the memory allocation switching between roughly 120 and 300 MB, while what I would like to see is that it took the whole ~1 GB of physical memory that is free on the machine. (It does take 99% CPU; also strange that the system is still a bit responsive, but I see this as a pro :-).)

Dim sread As New StreamReader("D:\rawmysqldump.sql") ' this is a file of 7 GB
Dim sline As String = ""
Dim sbodyDDL As String
Dim sBody As String = ""

Dim lineCount As Long
Dim dumpcount As Long
Label1.Text = "0"

Dim swrite As StreamWriter
Do Until sread.Peek = -1
    sline = Trim(sread.ReadLine)
    If sline.Length > 6 AndAlso sline.Substring(0, 6) = "INSERT" Then
        sBody = sBody.Concat(sBody, sline, vbCrLf)
        lineCount += 1
        If lineCount = 10000 Then ' generate a new sql file every 10000 records
            sBody = sBody.Replace("`", "")
            sBody = sBody.Replace("""", "'")
            dumpcount += 1
            swrite = New StreamWriter("D:\dump\" & dumpcount.ToString & ".sql")
            swrite.Write(sBody)
            swrite.Close()
            sBody = ""
            lineCount = 0
            Label1.Text = dumpcount.ToString
            Label1.Refresh()
        End If
    Else
        sbodyDDL = sbodyDDL.Concat(sbodyDDL, sline, vbCrLf)
        TextBox1.Text = sbodyDDL
        TextBox1.Refresh()
    End If
Loop

If sBody.Length > 0 Then
    dumpcount += 1
    swrite = New StreamWriter("D:\dump\kamm" & dumpcount.ToString & ".sql")
    sBody = sBody.Replace("`", "")
    sBody = sBody.Replace("""", "'")
    swrite.Write(sBody)
    swrite.Close()
    sBody = ""
    dumpcount = dumpcount + lineCount
    Label1.Text = dumpcount.ToString
    Label1.Refresh()
    Application.DoEvents()
End If
swrite = New StreamWriter("D:\kamdump\DDL.sql")
swrite.Write(sbodyDDL)
swrite.Close()
MsgBox(" ready ! ")
Nov 21 '05 #1
Hi,

Try using a StringBuilder instead of a String. Every time you append to a
String, .NET has to create a new string and destroy the old one.
http://msdn.microsoft.com/library/de...classtopic.asp

Ken
-------------------
"M. Posseth" <mi*****@nohausystems.nl> wrote in message
news:cu**********@reader10.wxs.nl...
<snip>

Nov 21 '05 #2
Ken,

I was investigating what you meant there, because I thought the code was
just reading and writing lines. Therefore I will extend your answer here.

Try using a StringBuilder instead of a String. Every time you append to a
String, .NET has to create a new string and destroy the old one.

Instead of the Concat

sBody = sBody.Concat(sBody, sline, vbCrLf)

declare the StringBuilder outside the loop

Dim sBody As New StringBuilder

and then do the appends, even separated like this (the difference there is
a matter of nanoseconds):

sBody.Append(sline)
sBody.Append(vbCrLf)
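
Put together, the whole loop could look something like this. This is a rough, untested sketch: it keeps your variable names, the 10000-line batch size and the output path from your post, and it needs Imports System.IO and Imports System.Text at the top of the file.

Dim sread As New StreamReader("D:\rawmysqldump.sql")
Dim sBody As New StringBuilder
Dim sbodyDDL As New StringBuilder
Dim sline As String
Dim lineCount As Long
Dim dumpcount As Long
Dim swrite As StreamWriter

' ReadLine returns Nothing at end of file, so the Peek call per line can go
sline = sread.ReadLine()
Do While Not sline Is Nothing
    sline = sline.Trim()
    If sline.StartsWith("INSERT") Then
        sBody.Append(sline)
        sBody.Append(vbCrLf)
        lineCount += 1
        If lineCount = 10000 Then ' generate a new sql file every 10000 records
            dumpcount += 1
            swrite = New StreamWriter("D:\dump\" & dumpcount.ToString() & ".sql")
            swrite.Write(sBody.ToString().Replace("`", "").Replace("""", "'"))
            swrite.Close()
            sBody.Length = 0 ' empty the builder but keep its buffer
            lineCount = 0
        End If
    Else
        sbodyDDL.Append(sline)
        sbodyDDL.Append(vbCrLf)
    End If
    sline = sread.ReadLine()
Loop
sread.Close()

The final partial batch and the DDL file are then written the same way as in your original code.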

Cor

Nov 21 '05 #3
"Cor Ligthert" <no************@planet.nl> wrote in message
news:OC*************@tk2msftngp13.phx.gbl...
<snip>


Did I miss something?
Am I being too simple...?

I thought there was an ODBC driver for MySQL.
Wouldn't opening it up with a DataAdapter via ODBC be better?
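
Something along these lines, say (just a rough, untested sketch; the driver name, connection details and table name are made-up assumptions):

' needs a reference to System.Data; the MySQL ODBC driver must be installed
Dim conn As New System.Data.Odbc.OdbcConnection( _
    "DRIVER={MySQL ODBC 3.51 Driver};SERVER=myhost;DATABASE=mydb;UID=me;PWD=secret")
Dim da As New System.Data.Odbc.OdbcDataAdapter("SELECT * FROM sometable", conn)
Dim ds As New System.Data.DataSet
da.Fill(ds, "sometable")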

--
Regards,
Andy O'Neill
Nov 21 '05 #4
In addition to the other responses, updating and refreshing a label
will also slow down the process. It might help to delegate the code
that reads the file to its own thread.
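
For example, something along these lines (a minimal, untested sketch; it assumes the splitting code has been moved into a Sub called SplitFile on the form and that dumpcount is a form-level field, both names made up for illustration):

' start the work on a background thread (needs Imports System.Threading)
Dim worker As New Thread(AddressOf SplitFile)
worker.IsBackground = True
worker.Start()

' inside SplitFile, do not touch Label1 directly; after each batch,
' marshal the update back to the UI thread instead:
Me.Invoke(New MethodInvoker(AddressOf UpdateLabel))

Private Sub UpdateLabel()
    Label1.Text = dumpcount.ToString()
End Sub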

Nov 21 '05 #5
Andy,

What if the database is in Africa and the files are in China?

Nothing was said about the database, or about the reason why it should be
done this way.

The question was "how i can optimize this a bit more", not "how can I do
this better".

It may be that you are right; however, in my opinion the question is not
pointing in that direction.

Just my thought,

Cor
Nov 21 '05 #6
Andy O'Neill wrote:
"Cor Ligthert" <no************@planet.nl> wrote in message
news:OC*************@tk2msftngp13.phx.gbl...
Ken,

I was investigagint where you was meaning that, because I thought it was
just reading and writing lines. Therefore I extend your answer here.

Try using a stringbuilder instead of string. Everytime you add to
a
string it has to create a new string and destroy the old one.


Instead of the Concat
sBody = sBody.Concat(sBody, sline, vbCrLf)

outside the loop
dim sBody as new stringbuilder

And then when you do it even this seperated (although that is about
nanoseconds)

sBody.Append(sline)
sBody.Append(vbCrLf)

Cor

Did I miss something?
Am I being too simple...?

I thought there was an ODBC driver for MySQL.
Wouldn't opening it up with a DataAdapter via ODBC be better?

Euhh, well...

What if you only have an SQL dump file of 7 GB and a high-performance
MSSQL server to get it into?


Nov 21 '05 #7
Ken Tucker [MVP] wrote:
Hi,

Try using a StringBuilder instead of a String. Every time you append to a
String, .NET has to create a new string and destroy the old one.
http://msdn.microsoft.com/library/de...classtopic.asp

Ken
<snip>

It is now superbly fast on my AMD machine; it took 15 minutes!!! This is
great!! The StringBuilder is a cool thingy.

While with the normal strings it took 12 hours to parse only 350 MB worth
of files (on the same machine).

Thank you for pointing me in the right direction.

Nov 21 '05 #8
"M.Posseth " <mi*****@nohausystems.nl> wrote in message
news:er**************@TK2MSFTNGP10.phx.gbl...
<snip>

Euhh, well...

What if you only have an SQL dump file of 7 GB and a high-performance
MSSQL server to get it into?


Fair enough.
That'd be a good reason.
--
Regards,
Andy O'Neill
Nov 21 '05 #9
