Reading a file, sans whitespace

Uri

I have a file that looks like this: (but longer, no wordwrap)

Name: Date: Time: Company: Employee Number:
Jim 2.03.04 12:00 JimEnt 4
Steve 3.04.32 03:00 SteveEnt 5

I want to load 'Jim', '12:00', and values like those into variables in
my program. The only delimiter in the file is whitespace. How do I do
this?

I can do it with a string.split(" ",[0]) type of line for a file that's
delimited only by single spaces, but how do I do it when the fields are
separated by arbitrary whitespace?

Thanks!

Jul 18 '05 #1

Michael Geary

Use a regular expression. For speed, precompile it at the beginning of your
program:

reWhitespace = re.compile( r'\s+' )

Then, split each line with:

fields = reWhitespace.split( line )
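
As a minimal sketch, here are the two lines above applied to one row from the question. It assumes Python 3 (the thread itself predates it), and `sample_line` is a made-up stand-in for a line read from the file.

```python
import re

# Compile the whitespace pattern once, as suggested above.
reWhitespace = re.compile(r'\s+')

# Hypothetical sample row with mixed runs of spaces and a tab.
sample_line = 'Jim   2.03.04\t12:00  JimEnt 4\n'

# strip() first: otherwise the trailing newline yields an empty last field.
fields = reWhitespace.split(sample_line.strip())
print(fields)
```

Without the `strip()` call, the trailing newline would leave an empty string as the last element of the list.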

-Mike

Jul 18 '05 #2

Rick L. Ratzel

How about this:

>>> import re
>>> for line in open( "inputFile", "r" ).readlines():
...     print re.split( "\s+", line.strip() )
...
['Name:', 'Date:', 'Time:', 'Company:', 'Employee', 'Number:']
['Jim', '2.03.04', '12:00', 'JimEnt', '4']
['Steve', '3.04.32', '03:00', 'SteveEnt', '5']

-Rick Ratzel

Jul 18 '05 #3

Tim Daneliuk

Say you have read a line in the above format into variable 's'.
Then,

l = s.split()

will return a list containing each of the fields of the line as
an entry with the whitespace stripped out. Then,

VarName = l[0]
VarDate = l[1]
VarTime = l[2]
VarCo = l[3]
VarEmp = l[4]

Is this what you had in mind?
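
Put together for the whole file, the idea might look like the sketch below. It assumes Python 3, uses `io.StringIO` as a stand-in for the real file, and the names `sample_file`, `records`, and `time_of_day` are illustrative only. The header row is skipped because "Employee Number:" splits into two words, so it has six fields instead of five.

```python
import io

# Stand-in for the real file; the contents mirror the sample in the question.
sample_file = io.StringIO(
    "Name: Date: Time: Company: Employee Number:\n"
    "Jim   2.03.04  12:00  JimEnt    4\n"
    "Steve 3.04.32  03:00  SteveEnt  5\n"
)

records = []
next(sample_file)  # skip the header row
for line in sample_file:
    parts = line.split()  # no argument: split on any run of whitespace
    if len(parts) != 5:
        continue  # ignore blank or malformed lines
    name, date, time_of_day, company, emp_number = parts
    records.append((name, date, time_of_day, company, int(emp_number)))

print(records)
```

Unpacking into named variables fails loudly if a row has the wrong number of fields, which is why the length check comes first.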

--
----------------------------------------------------------------------------
Tim Daneliuk tu****@tundraware.com
PGP Key: http://www.tundraware.com/PGP/

Jul 18 '05 #4

Michael Geary

Tim Daneliuk wrote:

Say you have read a line in the above format into variable 's'.
Then,

l = s.split()

will return a list containing each of the fields of the line as
an entry with the whitespace stripped out. Then,

VarName = l[0]
VarDate = l[1]
VarTime = l[2]
VarCo = l[3]
VarEmp = l[4]

D'oh! That's much better than the regular expression solution I posted.

The regular expression split is good to know about for more complicated
patterns, but for simple whitespace splitting there's no need for it.
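
For a rough idea of what "more complicated" could mean, here is a sketch (Python 3 assumed) with a made-up line that mixes commas, semicolons, and whitespace as delimiters. The line and the name `reDelimiters` are hypothetical, not from the original file.

```python
import re

# Hypothetical line mixing commas, semicolons and whitespace as delimiters.
line = 'Jim, 2.03.04;12:00 ,  JimEnt\t4'

# One or more of: comma, semicolon, or any whitespace character.
reDelimiters = re.compile(r'[,;\s]+')

regex_fields = reDelimiters.split(line)
plain_fields = line.split()

print(regex_fields)
print(plain_fields)
```

Here the plain `split()` leaves the punctuation attached to the fields, while the regular expression treats it as part of the delimiter.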

Thanks,

-Mike

Jul 18 '05 #5

Uri

Thanks, guys! Tim's idea seems like the easiest for a newbie to
implement, but I'll play around with Mike's pre-compiling thing, too.
I don't really understand what the compile part does. Could you
expound upon that?

Thanks for all your help, guys!

Jul 18 '05 #6

Konstantin Veretennicov

"Michael Geary" <Mi**@DeleteThis.Geary.com> wrote in message news:<10*************@corp.supernews.com>...

import re
reWhitespace = re.compile( '\s+' )
for line in file( 'inputFile' ).readlines():
    print reWhitespace.split( line.strip() )

But for a large file, the second version will be faster because the regular
expression is compiled only once instead of every time through the loop.

And you'll want to use "for line in file('inputFile')"
instead of "for line in file('inputFile').readlines()",
especially for large files ;)
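
A small sketch of the difference, assuming Python 3, where `file()` no longer exists and `open()` takes its place. The temporary file is only a stand-in for 'inputFile' so the example runs on its own.

```python
import os
import tempfile

# Write a small stand-in for 'inputFile' so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Jim 2.03.04 12:00 JimEnt 4\nSteve 3.04.32 03:00 SteveEnt 5\n')
    path = tmp.name

rows = []
try:
    # Iterating over the file object reads one line at a time, so the whole
    # file never has to sit in memory the way it does with readlines().
    with open(path) as infile:
        for line in infile:
            rows.append(line.split())
finally:
    os.remove(path)

print(rows)
```

The loop body is the same either way; only the amount of the file held in memory at once changes.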

- kv

Jul 18 '05 #7

Terry Reedy

"Michael Geary" <Mi**@DeleteThis.Geary.com> wrote in message
news:10*************@corp.supernews.com...

For example, these do exactly the same thing:

import re
for line in file( 'inputFile' ).readlines():
    print re.split( '\s+', line.strip() )

import re
reWhitespace = re.compile( '\s+' )
for line in file( 'inputFile' ).readlines():
    print reWhitespace.split( line.strip() )

But for a large file, the second version will be faster because the regular
expression is compiled only once instead of every time through the loop.

I am curious whether you have actually timed this or seen others' timings.
My impression (from other posts and from reading the code a year ago) is
that the current re implementation caches compiled re's
(recache[hash(restring)] = re.compile(restring)) just so that the first
example will *not* recompile every time thru the loop. If so, I think one
should name an re for pretty much the same reasons as for anything else:
conceptual chunking and reuse in multiple places.
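
One way to poke at the caching from the outside, sketched for Python 3. Whether two `re.compile()` calls hand back the very same object is an implementation detail of the interpreter, so the sketch prints that result rather than relying on it. The names `first`, `second`, and `same_object` are illustrative.

```python
import re

pattern = r'\s+'
line = 'Jim 2.03.04   12:00 JimEnt 4'

# Two separate requests for the same pattern string.
first = re.compile(pattern)
second = re.compile(pattern)

# Often True in CPython because of the module's internal cache, but object
# identity is an implementation detail, not a documented guarantee.
same_object = first is second
print('same compiled object:', same_object)

# Whichever way the pattern is obtained, the split result is identical.
via_module = re.split(pattern, line)
via_object = first.split(line)
print(via_module == via_object)
```

Either way, giving the compiled pattern a name still helps with readability and reuse, which is the point made above.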

Terry J. Reedy

Jul 18 '05 #8

Michael Geary

Michael Geary wrote:

For example, these do exactly the same thing:

import re
for line in file( 'inputFile' ).readlines():
    print re.split( '\s+', line.strip() )

import re
reWhitespace = re.compile( '\s+' )
for line in file( 'inputFile' ).readlines():
    print reWhitespace.split( line.strip() )

But for a large file, the second version will be faster because
the regular expression is compiled only once instead of every
time through the loop.

Terry Reedy wrote:

I am curious whether you have actually timed this or seen others'
timings. My impression (from other posts and from reading the
code a year ago) is that the current re implementation caches
compiled re's (recache[hash(restring)] = re.compile(restring))
just so that the first example will *not* recompile every time thru
the loop. If so, I think one should name an re for pretty much the
same reasons as for anything else: conceptual chunking and reuse
in multiple places.

Oh man, is my face red! No, I didn't know about the caching, and I hadn't
timed this. One should never make assumptions about performance issues! :-)

Also, as Konstantin pointed out, file( 'inputFile' ).readlines() should be
just file( 'inputFile' ), and I just noticed that I didn't use raw strings
for the regular expressions. '\s+' happens to work, but it would be better
to be in the habit of writing r'\s+' instead. This was not my day for
posting good code samples!
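
To illustrate why the raw-string habit matters, here is a short sketch (Python 3 assumed) using '\b' instead of '\s'. '\s' is not a string escape, so Python leaves the backslash alone and the pattern survives, though newer Python versions warn about it. '\b' is a real string escape, so without the r prefix the pattern silently changes meaning. The variable names are illustrative.

```python
import re

# In an ordinary string '\b' is one backspace character (ASCII 8); in a raw
# string it stays backslash + 'b', which the re module reads as a word boundary.
plain_len, raw_len = len('\b'), len(r'\b')
print(plain_len, raw_len)

text = 'cat concat'
raw_matches = re.findall(r'\bcat\b', text)    # word-boundary match
plain_matches = re.findall('\bcat\b', text)   # searches for literal backspaces
print(raw_matches, plain_matches)
```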

Now that you've shamed me into actually testing the performance, it turns
out that precompiling the regular expression does make a difference.
Consider these examples:

import re, time
input = []
for i in xrange( 1000000 ):
    input.append( '%d abc def ghi jkl mno pqr stu' % i )
start = time.time()
for line in input:
    result = re.split( r'\s+', line )
print time.time() - start

import re, time
input = []
for i in xrange( 1000000 ):
    input.append( '%d abc def ghi jkl mno pqr stu' % i )
start = time.time()
reWhitespace = re.compile( r'\s+' )
for line in input:
    result = reWhitespace.split( line )
print time.time() - start

On my PIII-1.2GHz system, the first version runs in 27 seconds, and the
second version runs in 18 seconds, quite an improvement. I would guess that
the hash lookup for the cached regular expression is what's taking the extra
time in the first version, but I don't want to assume that's what it is. :-)
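
For anyone who wants to repeat the measurement, here is a sketch of the same comparison using the standard `timeit` module, assuming Python 3. It uses far fewer lines than the one million above to keep the run short, and it adds plain `str.split()` as a third candidate. The function names are illustrative, and no particular timings are implied; the numbers will differ by machine and interpreter version.

```python
import re
import timeit

# Smaller input than the one million lines used above, to keep the run short.
lines = ['%d abc def ghi jkl mno pqr stu' % i for i in range(10000)]
reWhitespace = re.compile(r'\s+')

def module_level():
    # Looks the pattern up through the re module on every call.
    return [re.split(r'\s+', line) for line in lines]

def precompiled():
    # Uses the pattern object compiled once, outside the loop.
    return [reWhitespace.split(line) for line in lines]

def plain_split():
    # No regular expression at all.
    return [line.split() for line in lines]

for func in (module_level, precompiled, plain_split):
    # Best of three runs; absolute numbers depend on machine and interpreter.
    elapsed = min(timeit.repeat(func, number=5, repeat=3))
    print('%-13s %.4f s' % (func.__name__, elapsed))
```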

-Mike

Jul 18 '05 #9
