File splitting

Jon
I am not too familiar with working with files, so I'd like some advice. I
need to write a function for my program that takes large text files (> 150
MB) and splits them into several text files of 1000 lines each. What is the
most efficient way of doing this with the framework?

Thanks

Nov 20 '05 #1
Cor
Hi Jon,

A text file can only be read sequentially, from start to end, so first a
question: what is the structure of the file? Is it organised in lines of
plain text, or something else (some people even call a Word document a
text file)?

With some more information, I think some of us can help you.

For now, I can't suggest more than reading it and starting a new file after
every 1000 lines.

Cor
Nov 20 '05 #2
Jon
It is an ASCII text file with thousands of individual lines (about 50
characters per line). The only way I could think to do it was what you
mentioned: open the file, read in 1000 lines, write those to a file, read
in another 1000 lines, write those to a file, and so on. But that seems
terribly inefficient.

Thanks!

Jon
Nov 20 '05 #3
Cor
Hi Jon,

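I think that will be the most efficient way to do it, if that fits your
problem.

Reading line by line and writing each line straight out touches every byte
once and keeps only the current line in memory, so it needs the least I/O
and memory.

Any other method will cost you more.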

Cor
Nov 20 '05 #4
Jon
Thanks
Nov 20 '05 #5
Have a look at the 'StreamReader' and 'StreamWriter' classes.

Basic code for reading the lines of a file:

\\\
Imports System.IO
..
..
..
' Open the file and read it one line at a time.
Dim sr As New StreamReader("C:\WINDOWS\WIN.INI")
Dim strLine As String
strLine = sr.ReadLine()
' ReadLine returns Nothing at the end of the file.
Do Until strLine Is Nothing
    MsgBox(strLine)
    strLine = sr.ReadLine()
Loop
sr.Close()
///

You can add a line number counter to detect when a new file should be
started. Writing a file can be done with the 'StreamWriter' class
(method 'WriteLine').
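
As a rough sketch of how the counter and the two classes might fit together (not a definitive implementation): the function name 'SplitFile', the 'outputPrefix' parameter, and the four-digit ".txt" numbering scheme are all assumptions made for illustration. It also assumes the default encodings of 'StreamReader' and 'StreamWriter' are acceptable, which should hold for a plain ASCII file.

```vbnet
Imports System
Imports System.IO

Module FileSplitter

    ' Splits sourcePath into numbered part files of at most linesPerFile
    ' lines each and returns the number of part files written.
    ' SplitFile, outputPrefix and the "0000" numbering are hypothetical
    ' choices for this sketch, not something given in the thread.
    Function SplitFile(ByVal sourcePath As String, _
                       ByVal outputPrefix As String, _
                       ByVal linesPerFile As Integer) As Integer
        If linesPerFile < 1 Then
            Throw New ArgumentOutOfRangeException("linesPerFile")
        End If

        Dim fileCount As Integer = 0
        Dim lineCount As Integer = 0
        Dim sw As StreamWriter = Nothing
        Dim sr As New StreamReader(sourcePath)
        Try
            Dim strLine As String = sr.ReadLine()
            Do Until strLine Is Nothing
                ' Open the next part file only when there is a line to
                ' write, so no empty file is left behind at the end.
                If sw Is Nothing Then
                    fileCount += 1
                    sw = New StreamWriter(outputPrefix & fileCount.ToString("0000") & ".txt")
                End If

                sw.WriteLine(strLine)
                lineCount += 1

                ' The line counter: close the part file once it is full.
                If lineCount = linesPerFile Then
                    sw.Close()
                    sw = Nothing
                    lineCount = 0
                End If

                strLine = sr.ReadLine()
            Loop
        Finally
            ' Close whatever is still open, even if an error occurred.
            If Not sw Is Nothing Then sw.Close()
            sr.Close()
        End Try

        Return fileCount
    End Function

End Module
```

A call such as SplitFile("C:\data\big.txt", "C:\data\part", 1000) (both paths are made up) would then write part0001.txt, part0002.txt and so on. Only one line is held in memory at a time, so the size of the source file should not matter much.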

--
Herfried K. Wagner
MVP · VB Classic, VB.NET
<http://www.mvps.org/dotnet>

<http://www.plig.net/nnq/nquote.html>
Nov 20 '05 #6
