Issues parsing a whitespace-delimited text file in a Python script

Damon Getsman

Okay, so I'm writing a script in Python right now as a dirty fix for a
problem we're having at work. Unfortunately, this is the first really
non-trivial script that I've had to work with in Python, and the book
that I have on it really kind of sucks.

I'm having an issue parsing lines of 'last' output that I have stored
in a /tmp file. The first time it does a .readline() I get the full
line of output, which I'm then able to split() into individual fields
and work with without any problem. Unfortunately, the second
time that I do a .readline() on the file, I only receive the
first character of the first field. Looking through the /tmp file
shows that it isn't corrupted and is in the format it should be in.
Here's the relevant script:

----
#parse
Lastdump = open('/tmp/esd_tmp', 'r')

#find out what the last day entry is in the wtmp
cur_rec = Lastdump.readline()
work = cur_rec.split()

if debug == 1:
    print work
    print " is our split record line from /tmp/esd_tmp\n"

startday = work[3]

if debug == 1:
    print startday + " is the starting day\n"
    print days
    print " is our dictionary of days\n"
    print days[startday] + " is our ending day\n"

for cur_rec in Lastdump.readline():
    work = cur_rec.split()

    if debug == 1:
        print "Starting table building pass . . .\n"
        print work
        print " is the contents of our split record line now\n"
        print cur_rec + " is the contents of cur_rec\n"

    #only go back 2 days

    while work[0] != days[startday]:
        tmp = work[1]
        if table.has_key(work[0]):
            continue
        elif tmp[0] != ':':
            #don't keep it if it isn't a SunRay terminal identifier
            continue
        else:
            #now we keep it
            table[work[0]] = tmp
----

The first and second sets of debugging output show everything as it
should be. The third shows that the next working line (in cur_rec),
and thus 'work' as well, only holds the first character of the line.
Here's the output:

----
Debugging run
Building table . . .

['dgetsman', 'pts/3', ':0.0', 'Wed', 'May', '21', '10:21', 'still', 'logged', 'in']
is our split record line from /tmp/esd_tmp

Wed is the starting day

{'Wed': 'Mon', 'Sun': 'Fri', 'Fri': 'Wed', 'Thurs': 'Tues', 'Tues': 'Sun', 'Mon': 'Sat', 'Sat': 'Thurs'}
is our dictionary of days

Mon is our ending day

Starting table building pass . . .

['d']
is the contents of our split record line now

d is the contents of cur_rec

----
And thus everything fails when I try to work with the different fields
later in the script. Does anybody have an idea as to why
this would be happening?

Oh, and if relevant, here's the datafile's first few lines:

----
dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in
dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in
dgetsman pts/1 :0.0 Wed May 21 08:56 - 10:21 (01:24)
dgetsman pts/0 :0.0 Wed May 21 08:56 still logged in
----

I would really appreciate any pointers or suggestions you can give.

<a href="http://www.zoominfo.com/people/Getsman_Damon_-214241.aspx">
Damon Getsman
Linux/Solaris System Administrator
</a>

Jun 27 '08 #1

Damon Getsman

Okay, so I managed to kludge around the issue by not using
the .readline() in my 'for' statement. Instead, I'm slurping the
whole file into a new list that I put in for that purpose, and
everything seems to be working just fine. However, I don't know WHY
the other method failed while this one works. I'd really like to know
the why behind this issue so that I don't have to resort to crappy
coding practice and kludge around it the next time something like this
comes up.
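
For reference, the workaround described here probably looks something like the sketch below. It is written for Python 3, uses io.StringIO as a stand-in for the /tmp file, and the names `records` and `fields` are illustrative rather than taken from the actual script.

```python
import io

# Stand-in for open('/tmp/esd_tmp', 'r'); two lines copied from the data file.
lastdump = io.StringIO(
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in\n"
)

# readlines() (plural) returns a list of lines, so a for loop over it
# sees one whole line per pass instead of one character.
records = lastdump.readlines()
fields = [rec.split() for rec in records]

print(len(records))
print(fields[1])
```

This holds the whole file in memory, which is fine for a small dump of 'last' output but is a trade-off for large files.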

Any ideas much appreciated.

Damon G.

Jun 27 '08 #2

Paul McGuire

On May 21, 10:59 am, Damon Getsman <dgets...@amirehab.net> wrote:

I'm having an issue parsing lines of 'last' output that I have stored
in a /tmp file. The first time it does a .readline() I get the full
line of output, which I'm then able to split() and work with the
individual fields of without any problem. Unfortunately, the second
time that I do a .readline() on the file, I am only receiving the
first character of the first field. Looking through the /tmp file
shows that it's not corrupted from the format that it should be in at
all... Here's the relevant script:

----
    #parse
    Lastdump = open('/tmp/esd_tmp', 'r')

    #find out what the last day entry is in the wtmp
    cur_rec = Lastdump.readline()
    work = cur_rec.split()

    if debug == 1:
        print work
        print " is our split record line from /tmp/esd_tmp\n"

    startday = work[3]

    if debug == 1:
        print startday + " is the starting day\n"
        print days
        print " is our dictionary of days\n"
        print days[startday] + " is our ending day\n"

    for cur_rec in Lastdump.readline():
        work = cur_rec.split()

<snip>
for cur_rec in Lastdump.readline():

is the problem. readline() returns a string containing the next
line's worth of text, NOT an iterator over all the subsequent lines in
the file. So your code is really saying:

next_line_in_file = Lastdump.readline()
for cur_rec in next_line_in_file:

which of course, is iterating over a string character by character.
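
A minimal sketch of the difference, written for Python 3 and using io.StringIO as a stand-in for /tmp/esd_tmp. The sample lines come from the data file posted earlier in the thread; the names `chars` and `lines` are only illustrative.

```python
import io

# Stand-in for the /tmp file; two lines copied from the 'last' output.
sample = (
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in\n"
)

# Iterating over the result of readline() walks a single string,
# one character at a time.
chars = [item for item in io.StringIO(sample).readline()]

# Iterating over the file object itself yields one whole line per pass.
lines = [item for item in io.StringIO(sample)]

print(repr(chars[0]))     # a single character
print(lines[0].split())   # the fields of the first line
```

The first loop is what produced the `['d']` in the debugging output: each pass received one character, and splitting a one-character string gives a one-element list.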

The file object you get back from open() (Lastdump is not great casing
for a variable name, BTW - it looks like a class name with that leading
capital letter) is already an iterator over its lines. Try this instead:

lastdump = open('/tmp/esd_tmp', 'r')

cur_rec = lastdump.next()

...

for cur_rec in lastdump:
    ...

This should get you over the hump on reading the file.
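
The same pattern as a runnable sketch, assuming Python 3, where the built-in next(lastdump) replaces the Python 2 spelling lastdump.next(). io.StringIO stands in for the real file, and `first_rec` and `remaining` are illustrative names.

```python
import io

# Stand-in for open('/tmp/esd_tmp', 'r'); the real script would open the file.
lastdump = io.StringIO(
    "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"
    "dgetsman pts/2 :0.0 Wed May 21 09:04 still logged in\n"
    "dgetsman pts/1 :0.0 Wed May 21 08:56 - 10:21 (01:24)\n"
)

# Consume the first record on its own to find the starting day.
first_rec = next(lastdump)
startday = first_rec.split()[3]

# The loop resumes at the second line, because the file iterator
# remembers how far it has already read.
remaining = []
for cur_rec in lastdump:
    remaining.append(cur_rec.split())

print(startday)
print(len(remaining))
```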

Also, may I suggest this method for splitting up each record line, and
assigning individual fields to variables:

user, s1, s2, day, month, date, time, desc = cur_rec.split(None, 7)
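
As a quick illustration on one line from the posted data file (a sketch only; it assumes every record has at least eight whitespace-separated fields, otherwise the unpacking raises ValueError):

```python
cur_rec = "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"

# A separator of None splits on runs of whitespace; a maxsplit of 7 stops
# after seven splits, so the rest of the line lands in the last variable.
user, s1, s2, day, month, date, time, desc = cur_rec.split(None, 7)

print(user, s1, s2, day)
print(repr(desc))   # the trailing newline is still attached here

# Strip the remainder before using it.
desc = desc.strip()
```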

-- Paul

Jun 27 '08 #3

Damon Getsman

Well, individual variables for every field aren't exactly appropriate,
as I'm only going to be using 2 of the fields. I think I will assign
those two to individual variables with a slice of what you mentioned,
though, for readability. Thank you for the tips, they were all much
appreciated.
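
One way that slice might look, as a sketch. Which two fields are needed is an assumption here (the first two columns), and `user` and `tty` are illustrative names.

```python
cur_rec = "dgetsman pts/3 :0.0 Wed May 21 10:21 still logged in\n"

# Assumption: the two fields of interest are the first two columns.
# Splitting at most twice and slicing off the remainder keeps only those.
user, tty = cur_rec.split(None, 2)[:2]

print(user, tty)
```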

-Damon

Jun 27 '08 #4
