Bytes IT Community

Having trouble with tail -f standard input

sab
Hello,

I have been working on a python script to parse a continuously growing
log file on a UNIX server. The input comes from standard in, piped
from the log file. The application works well for the most part, but
the problem is when attempting to continuously pipe information into
the application via the tail -f command. The command line looks
something like this:

tail -f <logfile> | grep <search string> | python parse.py

If I don't pipe the standard in to the python script, it displays any
new entries immediately on the screen. However, if I pipe the
information into the script, the sys.stdin.readline() doesn't get any
new data until a buffer fills, after which it parses a block of new
information all at once (output is fine). I need it to read the data
in real-time instead of waiting for the buffer to fill. I have tried
running the script with the -u parameter but that doesn't seem to be
doing anything. Also, if I run the program against a text file and
add a line to the text file (via cat >> <text file>) it picks it up
right away. I'm sure that it's just a simple parameter that needs to
be passed or something along those lines but have been unable to find
the answer. Any ideas would be appreciated.
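A minimal stand-in for the kind of reading loop described (hypothetical; the actual parse.py isn't shown in the thread):

```python
import sys

def parse_stream(stream, out):
    # Read one line at a time until EOF; the real parsing logic
    # would replace the simple echo below.
    for line in iter(stream.readline, ""):
        out.write("parsed: " + line)

if __name__ == "__main__":
    parse_stream(sys.stdin, sys.stdout)
```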

Thanks!
Aug 21 '08 #1
3 Replies


sab wrote:
> I have been working on a python script to parse a continuously growing
> log file on a UNIX server. [...]
>
> tail -f <logfile> | grep <search string> | python parse.py
>
> [...] if I pipe the information into the script, the
> sys.stdin.readline() doesn't get any new data until a buffer fills
> [...]
Get rid of tail; it's useless here anyway and is most probably causing
the problem.

If for whatever reason you can't get rid of it, try and see if there is
any other way of skipping most of the input file - maybe creating *one*
python script to seek to the end, grep & parse.

You can't do anything in python though - the buffering and potential
flushing is courtesy of the upper end of the pipe - not python.
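A sketch of that single-script idea, assuming the log is a plain text file (the path and pattern are placeholders): seek to the end, then poll for new lines and filter them in-process, which removes both tail and grep from the pipeline.

```python
import re
import time

def follow(path, pattern, from_start=False, max_idle=None):
    """Yield lines of `path` matching `pattern` as they appear.

    `from_start` reads the whole file instead of seeking to the end
    (the tail -f behaviour); `max_idle` stops after that many quiet
    seconds so the sketch can terminate; pass None to follow forever.
    """
    matcher = re.compile(pattern)
    idle = 0.0
    with open(path) as f:
        if not from_start:
            f.seek(0, 2)  # jump to the end of the file, like tail -f
        while max_idle is None or idle < max_idle:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet; wait briefly
                idle += 0.1
                continue
            idle = 0.0
            if matcher.search(line):
                yield line
```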

Diez

Aug 21 '08 #2

On Thu, Aug 21, 2008 at 02:58:24PM -0700, sab wrote:
> I have been working on a python script to parse a continuously growing
> log file on a UNIX server.
If you weren't aware, there are already a plethora of tools which do
this... You might save yourself the trouble by just using one of
those. Try searching for something like "parse log file" on google or
freshmeat.net or whatever...
> The input is the standard in, piped in from the log file. The
> application works well for the most part, but the problem is when
> attempting to continuously pipe information into the application via
> the tail -f command. The command line looks something like this:
>
> tail -f <logfile> | grep <search string> | python parse.py
The pipe puts STDIN/STDOUT into "fully buffered" mode, which results
in the behavior you're seeing. You can set the buffering mode of
those files in your program, but unfortunately tail and grep are not
your program... You might get this to work by setting stdin to
non-blocking I/O in your Python program, but I don't think it will be
that easy...

You can get around this in a couple of ways. One is to call tail and
grep from within your program, using something like os.popen()...
Then set the blocking mode on the resulting files. You'll have to
feed the output of one to the input of the other, then read the output
of grep and parse that. Yucky. That method isn't very efficient,
since Python can do everything that tail and grep are doing for you...
So I'd suggest you read the file directly in your python program, and
use Python's regex parsing functionality to do what you're doing with
grep.

As for how to actually do what tail does, I'd suggest looking at the
source code for tail to see how it does what it does.
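If tail really must be kept, a sketch of the call-tail-from-Python route (using subprocess, the modern stand-in for os.popen; the path and pattern are placeholders). Reading tail's stdout directly and doing the regex match in-process sidesteps grep's block buffering:

```python
import re
import subprocess

def tail_and_filter(path, pattern):
    # Run tail -f ourselves and filter its output with a regex,
    # replacing the grep stage of the shell pipeline.
    proc = subprocess.Popen(
        ["tail", "-f", path],
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line buffering on our end of the pipe
    )
    matcher = re.compile(pattern)
    try:
        for line in proc.stdout:
            if matcher.search(line):
                yield line
    finally:
        proc.terminate()  # don't leave tail running behind us
```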

But, if I were you, I'd just download something like swatch, and be
done with it. :)

--
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D

Aug 21 '08 #3

Derek Martin wrote:
> [...] I'd suggest you read the file directly in your python program,
> and use Python's regex parsing functionality to do what you're doing
> with grep. [...]
>
> But, if I were you, I'd just download something like swatch, and be
> done with it. :)
================================
I have to agree with Derek about putting Python in control here. Pipe
or otherwise redirect the incoming data straight to Python. If the
incoming stream is buffered, the program effectively runs until it is
forced to stop (killed, or the system shuts down or crashes).

Python's print >>file, str (see the Library Reference) acts like
incoming | tee -a file in the sense of producing double output: one
copy to a file and one to standard out. str can come from a .read() on
stdin; as long as it is a string, it doesn't care how it got there.

Depending on your choice (on Unix):

incoming | tee -a logfile | program.py
incoming | program.py (copies everything to a logfile) | programsub1.py

with all parsing done in the .py scripts.

The advantage is that Python controls the buffers and thus keeps the
programs open and running, whether or not data is in the pipe at the
moment. The logfile gets the full data set and is not disturbed
further, and there is no need to work out where the last record read
is located.
OR
Alternatively, last time I looked, syslog was not disallowed from
using named pipes (which are first in, first out (FIFO) by default).
That lets pgm.py read the named pipe, append everything it reads to a
log, parse each line as desired, and sleep for a while when the pipe
is empty before going again. Once more, sequence is maintained, with
no digging to find the last processed input.

Hope this helps.

Steve
no******@hughes.net

Aug 22 '08 #4

This discussion thread is closed; replies have been disabled.