473,785 Members | 2,761 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Load large files

I am trying to load large flat files; > 250 MB. What is the best practice
for this? I have already read through everything I could find on Google, to
no avail.

I have a SQL DTS process that will load the file in about 7 minutes. My
code takes about an hour.

Here is a snippet:

Dim objSR As StreamReader = System.IO.File. OpenText(objOri gFile)

Dim curEncoding As System.Text.Enc oding = objSR.CurrentEn coding

Dim loopX As Integer = 0
Do While objSR.Peek > -1
loopX += 1

Dim strRec As String = objSR.ReadLine
arrA = Split(strRec, coldel)

Try
If trailerSigHas Then
If getRecordData(t railerSigPos - 1).ToString =
trailerSig Then
If loopX < 2 Then
Throw New Exception("Empt y extract file")
Else
Exit Do
End If
End If
End If

Dim tmpLngth As Integer = arrA.Length
Dim tmpColCount As Integer =
iNode.Item("col umns").ChildNod es.Count ' objDS.Tables(0) .Columns.Count
If tmpLngth > tmpColCount Then
Throw New Exception("Too many columns in extract
row.")
ElseIf tmpLngth < tmpColCount Then
Throw New Exception("Not enough columns in extract
row.")
End If

Dim newRow As DataRow = objDS.Tables(0) .NewRow

For x As Integer = 0 To objDS.Tables(0) .Columns.Count -
1
Dim tmpSqlCol As String =
objDS.Tables(0) .Columns(x).Cap tion
Dim tmpNode As XmlNode =
iNode.SelectSin gleNode("//columns/column[@sqlcol='" & tmpSqlCol & "']")
'"//columns/column[sqlcol='" & tmpSqlCol & "']")
Dim attrID As String =
tmpNode.Attribu tes("id").Inner Text
Dim xmlCol As Integer = Convert.ToInt32 (attrID)
Dim arrVal As Object = getRecordData(x mlCol - 1)
newRow(x) = GetObjectValue( x, arrVal) ' Does a
Convert to the target column type
Next

objDS.Tables(0) .Rows.Add(newRo w)
Catch ex As Exception
Dim newRow As DataRow = objDS.Tables(1) .NewRow
Dim finalX As Integer = 0
For x As Integer = 0 To objDS.Tables(1) .Columns.Count -
1
Try
Dim strColCaption As String =
objDS.Tables(1) .Columns(x).Cap tion
Dim tmpINode As XmlNode =
iNode.SelectSin gleNode("//columns/column[@sqlcol='" & strColCaption & "']")

Dim attr2ID As String =
tmpINode.Attrib utes("id").Inne rText
Dim sqlColInt As Integer =
Convert.ToInt32 (attr2ID)
If x = 0 Then
newrow(x) = Left(getRecordD ata(sqlColInt -
1).ToString, 8000)
Else
newrow(x) = Left(getRecordD ata(sqlColInt -
1).ToString, 250)
End If
Catch ' non-existent column
Finally
finalX = x
End Try
l001f:
Next
newrow(finalX - 4) = Left(ex.Message , 150)
newrow(finalX - 3) = DateTime.Now
newrow(finalX - 2) = tblName
newrow(finalX - 1) = strFileName
newrow(finalX) = loopX
objDS.Tables(1) .Rows.Add(newRo w)
End Try
Loop
objSR.Close()
objSR = Nothing
End If

objConn.Open()
Dim rowCount As Integer = objDS.Tables(0) .Rows.Count
Dim rowCount2 As Integer = objDS.Tables(1) .Rows.Count
Dim updCount As Integer

If objConn.State <> ConnectionState .Closed Then objConn.Close()
updCount = objDAmain.Updat e(objDS, tblName)

If objConn.State <> ConnectionState .Closed Then objConn.Close()
updCount = objDAerr.Update (objDS, tblName & "_err")

objConn.Close()

objDS = Nothing

Thank you very much,

Steve
Nov 21 '05 #1
5 1563
Hi

I think you may try to use the FileStream to see if that helps you.
File Merger and Splitter in C#
http://www.c-sharpcorner.com/1/FileM...ndSplitter.asp

Best regards,

Peter Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 21 '05 #2
I am loading the datatable with the schema, so I can validate the input file
has the correct datatype, row by row, column by column.
"Cor Ligthert" <no************ @planet.nl> wrote in message
news:uM******** ******@tk2msftn gp13.phx.gbl...
Planinumbay.

Maybe I see something wrong, what gives me the question why you are creating a datatable and not just do a batch proccess,

In pseudo
\\\
Create Insert string
Set parameters
do
ReadLine
set parameter.value s
command.execute nonquery
loop
////

http://msdn.microsoft.com/library/de...querytopic.asp
Cor

Nov 21 '05 #3
Platiumbay,
I am loading the datatable with the schema, so I can validate the input
file
has the correct datatype, row by row, column by column.


That does the setting of the value in the SQLcommandparam eters as well.

However when the schema or that should check the correct datatype you are in
my opinion to late.

Cor
Nov 21 '05 #4
Unfortunately, the invalid data could be anything. A field overflow,
string in a boolean, too many column, too few columns, invalid date, etc.
Our backend is running legacy code to generate these extracts, and is prone
to failure. Our front end needs to be able to handle anything. Hence why I
am writing this code. I just need to figure out the quickest way to read
the flat file, validate the data, and load it. Validation failure needs to
happen on a row by row basis to load as much good data as possible.
"Cor Ligthert" <no************ @planet.nl> wrote in message
news:#E******** ******@TK2MSFTN GP10.phx.gbl...
Platiumbay,
I am loading the datatable with the schema, so I can validate the input
file
has the correct datatype, row by row, column by column.

That does the setting of the value in the SQLcommandparam eters as well.

However when the schema or that should check the correct datatype you are

in my opinion to late.

Cor

Nov 21 '05 #5
Platinumbay,

In my opinion stays the way straigt forward.

I would create from your legaly file fields, when I read them line by line
with Substring
Validate those fields and take the right decisions on that.

When the values of a line are correct I would add them to the
parametervalues as I said, and use an command.execute nonquiry to fill it.

I think that what you do, as far as I can understand it, you can do as well
with the commandparamate rs. However I will never put a catch in my code with
no action what I saw in yours.

http://msdn.microsoft.com/library/de...classtopic.asp

Just my thought,

Cor
Nov 21 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
6850
by: Gustaf Liljegren | last post by:
I need to merge several XML files into one large. All of them has a DOCTYPE tag, but the SYSTEM identifier points to a DTD that doesn't exist. (I use the PUBLIC identifier with catalog files, so the SYSTEM identifiers has no purpose in my application.) Anyway, when I load each document, using XmlDocument.Load(file), I get a FileNotFoundException, because of the DTD pointer in the SYSTEM identifier. I'd rather skip all the entity...
7
3783
by: jane | last post by:
I'm going to use cursor load to load 200GB data in my production database. My database has 2 partitions. but I cannot find more info in the manual about this cursor load. I'm concern about loading 200GB , this large amount of data, is there any limitation?
12
6191
by: Sharon | last post by:
I’m wrote a small DLL that used the FreeImage.DLL (that can be found at http://www.codeproject.com/bitmap/graphicsuite.asp). I also wrote a small console application in C++ (unmanaged) that uses the DLL above. Now the application, together with the above DLL’s is successfully loading a TIF image file (62992 x 113386 Pixels, Huffman RLE compression, 3200 x 3200 DPI resolution, binary colored (1 Bit Per Pixel), file on disk size 43.08...
10
2652
by: GeekBoy | last post by:
Okay, I have two identical web servers running Windows 2003 web server. I have an ASP.NET application which runs great on one of them. Dedicated IP address, behind our firewall, etc. Everyone's happy. Now -- how do I take advantage of that second computer to "load-balance" the web site? Will it really give my users a noticable performance increase? How do you accomplish this? I've read many of those MS articles and it's...
4
2213
by: Ravi Ambros Wallau | last post by:
Hi: We developed a set of ASP.NET Web Applications that never runs in stand-alone mode, but always inside a portal (Rainbow Portal). All modules are copied on that portal. My question is: load time takes, sometimes, three or four of minutes in a medium-level machine (a PIII 1.5 Ghz), when the binary contents are changed, or if the time of last modification of the web.config file is changed. An application that runs in "stand-alone" mode...
5
1550
by: Ravi Ambros Wallau | last post by:
Dear friends: This is my third question on this news in three days... Sorry for this spam behavior :-) I've a lot of problems on "first page load" after some change on web.config or even in the contents of bin folder... Nathan Sokalski passed me a link to ILMerge library, I've made some tests grouping a set of DLL's that are loaded without reflection, but there were no improvements on load time. I've noted that ASP.NET copies all...
3
7203
by: db2udbgirl | last post by:
Env: DB2 UDB 8.2, AIX 5.3 While trying to load data (73 Million rows, Medium size table uses 4K tablespace) into a table using cursor it fails with "SQL0964C The transaction log for the database is full. SQLSTATE=57011" But I felt that Load utility wont log any acvitities on the database. Here it what I tried to do db2 "declare c1 cursor for select CAST(PARTITIONING_NBR AS CHAR(10)), VEH_IDENT_NBR, OPTN_CD, VIN_TYPE_CD,...
7
6576
by: P. Adhia | last post by:
Sorry for quoting an old post and probably I am reading out of context so my concern is unfounded. But I would appreciate if I can get someone or Serge to confirm. Also unlike the question asked in the post below, my question involves non-partitioned table loads. I want to know if, in general, loading from cursor is slower than loading from a file? I was thinking cursor would normally be faster, because of DB2's superior buffer/prefetch...
5
3267
by: mike_dba | last post by:
I am looking for comments on experience using a Load from Cursor across multiple db's databases. I have a multi-terrabyte database across many partitions that includes a large table (1 Tb+). The system also contains UTF-8 and LOB data. I am about to refresh the existing platform - going from V8.2 to V9 and leveraging new hardware. I an staying with SuSe Linux at 64-bit. Does anyone have experience using this? My initial plan was to...
0
9645
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9481
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
1
10095
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9953
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8978
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7502
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5383
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5513
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4054
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.