buffering choking sys.stdin.readlines() ?

Newbie question:

I'm trying to turn a large XML file (~7G compressed) into a YAML file,
and my program seems to be buffering the input.

IOtest.py is just

import sys
for line in sys.stdin.readlines():
    print line

When I run

$ gzcat bigXMLfile.gz | IOtest.py

it hangs, then dies.

The goal of the program is to build a YAML file with print statements,
rather than building a gigantic nested dictionary, but I am obviously
doing something wrong in passing input through without buffering. Any
advice gratefully fielded.

-clay
Jun 27 '08 #1
readlines() reads the entire file into memory before the loop starts. Try
xreadlines(), the lazy version, instead. I'm not 100% sure, but I *think*
iterating over the file object directly,

for line in sys.stdin:
    ...

does exactly that.
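
For readers on Python 3, where xreadlines() no longer exists, the same idea might look like the sketch below. It is only an illustration, not the original IOtest.py: the function name copy_lines and its src/dst parameters are assumptions, added so the loop can be tried on any file-like object rather than only on a real pipe.

```python
def copy_lines(src, dst):
    """Echo src to dst one line at a time; return the number of lines copied.

    Iterating over a file object pulls lines lazily, so memory use is bounded
    by the longest single line rather than by the size of the whole input.
    """
    count = 0
    for line in src:      # lazy: no list of all lines is ever built
        dst.write(line)   # each line already carries its own trailing newline
        count += 1
    return count
```

In a script fed by gzcat this would be called as copy_lines(sys.stdin, sys.stdout) after an import sys. Using write() also avoids the doubled newlines that print line produces, since each line read from the file still ends with "\n".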

Diez
Jun 27 '08 #2
Both xreadlines() and iterating over sys.stdin directly work -- many thanks.

-clay
Jun 27 '08 #3
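
The original question also mentions writing the YAML out piece by piece instead of building one huge dictionary. A rough Python 3 sketch of that approach follows, using the standard library's xml.etree.ElementTree.iterparse. The thread never shows the XML schema, so the function name xml_records_to_yaml, the record tag "record", and the flat "child tag: child text" layout are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def xml_records_to_yaml(src, dst, record_tag="record"):
    """Write one YAML list item per <record_tag> element read from src.

    record_tag and the flat field layout are hypothetical; adapt them to the
    real file. Returns the number of records written.
    """
    count = 0
    # With events=("end",), iterparse yields each element once its closing
    # tag has been read, so the document is processed incrementally.
    for _event, elem in ET.iterparse(src, events=("end",)):
        if elem.tag != record_tag:
            continue
        prefix = "- "  # first field of a record opens a new list item
        for child in elem:
            # json.dumps yields a double-quoted string, which YAML also accepts
            value = json.dumps(child.text or "")
            dst.write(f"{prefix}{child.tag}: {value}\n")
            prefix = "  "
        # Drop this record's children now that they are written. The emptied
        # element itself stays attached to its parent.
        elem.clear()
        count += 1
    return count
```

On a pipe, src would be sys.stdin.buffer (iterparse accepts a binary file object) and dst would be sys.stdout. The sketch assumes well-formed XML and records whose fields are simple text elements; nested fields would need a recursive writer.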
