How to count lines in a text file?

Ling Lee

Hi all.

I'm trying to write a program that:
1) Asks me which file I want to count the lines in, then counts the
lines and writes the answer out.

2) I made the first part like this:

in_file = raw_input("What is the name of the file you want to open: ")
in_file = open("test.txt","r")
text = in_file.read()

3) I think that I have to use a for loop (something like: for line in text:
count += 1).
Or maybe I have to create a def, something like (def loop(line,
count)), but I'm not sure how to do this properly.
And then perhaps use the readlines() function, but again I'm not quite sure how
to do this. So does one of you have a good idea?

Thanks for all help

Jul 18 '05 #1

Ling Lee

Oh I just did it.

Just used the line:

print "%d lines in your chosen file" % len(open("test.txt").readlines())

Thanks though :)

Jul 18 '05 #2

Phil Frost

Yes, you need a for loop, and a count variable. You can count in several
ways. File objects are iterable, and they iterate over the lines in the
file. readlines() returns a list of the lines, which will have the same
effect, but because it builds the entire list in memory first, it uses
more memory. Example:

########

filename = raw_input('file? ')
file = open(filename)

lines = 0
for line in file:
    # line is ignored here, but it contains each line of the file,
    # including the newline
    lines += 1

print '%r has %r lines' % (filename, lines)

########

Another alternative is to use the standard POSIX program "wc" with the
-l option, but this isn't Python.
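
For anyone reading this later: a minimal sketch of the same loop, assuming Python 3 syntax (the thread itself uses Python 2). The helper name count_lines and the throwaway demo file are made up for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object, one line at a time."""
    lines = 0
    # The with-block closes the file even if an error occurs while reading.
    with open(path) as handle:
        for _line in handle:
            lines += 1
    return lines


# Demo on a throwaway file (hypothetical content, three lines).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
demo_path = tmp.name

demo_count = count_lines(demo_path)
print("%r has %r lines" % (demo_path, demo_count))
os.remove(demo_path)
```

Only one line is held in memory at a time, which is the point made above about iterating instead of calling readlines().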

Jul 18 '05 #3

Alex Martelli

Ling Lee <ja*****@mail.trillegaarden.dk> wrote:

Oh I just did it.

Just used the line:

print "%d lines in your chosen file" % len(open("test.txt").readlines())

Thanks though :)

You're welcome;-). However, this approach reads all of the file into
memory at once. If you must be able to deal with humongous files, too
big to fit in memory at once, try something like:

numlines = 0
for line in open('text.txt'): numlines += 1

Alex

Jul 18 '05 #4

Roland Heiber

text = in_file.readlines()
print len(text)

HtH, Roland

Jul 18 '05 #5

Ling Lee

Thanks for your replies :)

I just ran the program with a different file name, and it only counts the
number of lines in the file named test.txt. I'll give it another try
with your input...

Thanks again... for the fast reply... Hope I get it right this time :)

Jul 18 '05 #6

Erik Heneryd

Phil Frost wrote:

another alternative is to use the standard posix program "wc" with the
-l option, but this isn't Python.

Not the same thing. wc -l counts newline bytes, not "real" lines.

Erik

Jul 18 '05 #7

Brian van den Broek

Ling Lee said unto the world upon 2004-09-20 09:36:

Thanks for your replies :)

I just ran the program with a different file name, and it only counts the
number of lines in the file named test.txt. I'll give it another try
with your input...

Thanks again... for the fast reply... Hope I get it right this time :)

<SNIP>

Hi Ling Lee,

you've got:

in_file = raw_input("What is the name of the file you want to open: ")
in_file = open("test.txt","r")

What this does is take the user input and assign it to the name "in_file",
and then promptly reassign the name "in_file" to the result of
open("test.txt","r").

So you never make use of the input, and keep opening test.txt
instead.

Try something like:

in_file_name = raw_input("What is the file you want to open: ")
in_file = open(in_file_name,"r")
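
Put together, the fix might look like the sketch below. It assumes Python 3 (input() instead of raw_input()); the function name ask_and_count and its ask parameter are invented here so the example can run without a keyboard.

```python
import os
import tempfile


def ask_and_count(ask=input):
    """Ask for a file name, then count the lines of that very file."""
    # Keep the name and the file object in two separate variables.
    in_file_name = ask("What is the file you want to open: ")
    with open(in_file_name, "r") as in_file:
        return in_file_name, len(in_file.readlines())


# Demo without a keyboard: a stand-in for input() returns a temp file's name.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("one\ntwo\n")

name, total = ask_and_count(ask=lambda prompt: tmp.name)
print("%d lines in %s" % (total, name))
os.remove(tmp.name)
```

Calling ask_and_count() with no argument would prompt at the terminal as in the original program.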

Also, and I say this as a fellow newbie, you might want to check out the
Tutor list: <http://mail.python.org/pipermail/tutor/>

HTH,

Brian vdB

Jul 18 '05 #8

Andrew Dalke

Ling Lee wrote:

2) I made the first part like this:

in_file = raw_input("What is the name of the file you want to open: ")
in_file = open("test.txt","r")
text = in_file.read()

You have two different objects related to the file.
One is the filename (the result of calling raw_input) and
the other is the file handle (the result of calling open).
You are using the same variable name for both of them. You
really should make them different.

First you get the file name and reference it by the variable
named 'in_file'. Next you use another filename ("test.txt")
for the open call. This returns a file handle, but not
a file handle to the file named in 'in_file'.

You then change things so that 'in_file' no longer refers
to the filename but now refers to the file handle.

A nicer solution is to use one variable name for the name
(like "in_filename") and another for the handle (you can
keep "in_file" if you want to). In the following I
reformatted it so the example fits in under 80 columns:

in_filename = raw_input("What is the name of the file "
                        "you want to open: ")
in_file = open(in_filename,"r")
text = in_file.read()

Now the in_file.read() reads all of the file into memory. There
are several ways to count the number of lines. The first is
to count the number of newline characters. Because the newline
character is special, it's most often written as what's called
an escape code. In this case, "\n". Others are backspace ("\b")
and beep ("\a"), and backslash ("\\") since otherwise there's
no way to get the single character "\".
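
To make the escape codes concrete, here is a small sketch (Python 3 syntax assumed; the dictionary and its labels are only for illustration). Each escape is written with two characters in source code but is a single character in the string.

```python
# Each escape code below is a single character, despite taking two
# characters to type in source code.
escapes = {
    "newline": "\n",
    "backspace": "\b",
    "bell": "\a",
    "backslash": "\\",
}

for name, char in escapes.items():
    # repr() shows the escaped spelling; ord() shows the character code.
    print(name, repr(char), len(char), ord(char))

lengths = [len(char) for char in escapes.values()]
```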

Here's how to count the number of newlines in the text:

num_lines = text.count("\n")

print "There are", num_lines, "lines in", in_filename

This will work for almost every file except for one where
the last line doesn't end with a newline. It's rare, but
it does happen. To fix that you need to see if the
text ends with a newline and if it doesn't then add one
more to the count:

num_lines = text.count("\n")
# an empty file has no lines, so only adjust non-empty text
if text and not text.endswith("\n"):
    num_lines = num_lines + 1

print "There are", num_lines, "lines in", in_filename
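
A quick way to check the edge cases is to run the counting rule over a few strings. This is only a sketch in Python 3 syntax; the function name count_lines_in_text is made up, and treating empty text as zero lines is an assumption about the desired behaviour.

```python
def count_lines_in_text(text):
    """Count lines in a string, including an unterminated last line."""
    num_lines = text.count("\n")
    # Text left over after the final newline is one more line;
    # an empty string has no such leftover.
    if text and not text.endswith("\n"):
        num_lines += 1
    return num_lines


samples = ["", "one\n", "one\ntwo", "one\ntwo\n", "\n\n"]
results = [count_lines_in_text(s) for s in samples]
print(results)
```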

3) I think that I have to use a for loop ( something like
for line in text: count +=1)

Something like that will work. When you say "for xxxx in string"
it loops through every character in the string, and not
every line. What you need is some way to get the lines.

One solution is to use the 'splitlines' method of strings.
This knows how to deal with the "final line doesn't end with
a newline" case and return a list of all the lines. You
can use it like this

count = 0
for line in text.splitlines():
    count = count + 1

or, since splitlines() returns a list of lines you can
also do

count = len(text.splitlines())
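
A short sketch of how splitlines() treats the awkward cases, assuming Python 3. Note that splitlines() also splits on "\r", "\r\n" and a few other line-boundary characters, so its result can differ from text.count("\n") on files with mixed line endings.

```python
samples = ["one\ntwo\n", "one\ntwo", "", "one\r\ntwo\rthree"]

for text in samples:
    lines = text.splitlines()
    # The trailing newline does not produce an extra empty entry.
    print(repr(text), "->", lines, len(lines))

counts = [len(text.splitlines()) for text in samples]
```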

It turns out that reading lines from a file is very common.
When you say "for xxx in file" it loops through every line
in the file. This is not a list so you can't say

len(open(in_filename, "r")) # DOES NOT WORK

instead you need to have the explicit loop, like this

count = 0
for line in open(in_filename, "r"):
    count = count + 1

An advantage to this approach is that it doesn't read
the whole file into memory. That's only a problem
if you have a large file. Try counting the number of
lines in a 1.5 GB file!

By the way, "r" is the default mode for opening a file.
Most people omit it from the parameter list and just use

open(in_filename)
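
The difference between len() and the loop can be seen in a few lines. This is a sketch assuming Python 3; the temporary file stands in for whatever name the user typed.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("a\nb\nc")  # last line has no trailing newline
in_filename = tmp.name

# A file object has no length, so len() raises TypeError.
with open(in_filename) as in_file:
    try:
        len(in_file)
        len_worked = True
    except TypeError:
        len_worked = False

# The explicit loop reads one line at a time instead.
count = 0
with open(in_filename) as in_file:
    for line in in_file:
        count = count + 1

print(len_worked, count)
os.remove(in_filename)
```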

Hope this helped!

By the way, you might want to look at the "Beginner's
Guide to Python" page at http://python.org/topics/learn/ .
It has pointers to resources that might help, including
the tutor mailing list meant for people like you who
are learning to program in Python.

Andrew
da***@dalkescientific.com

Jul 18 '05 #9

Ling Lee

Thanks for explaining it that well, really makes sense now :)

Cheers....

Jul 18 '05 #10

Christos TZOTZIOY Georgiou

On Mon, 20 Sep 2004 15:29:18 +0200, rumours say that al*****@yahoo.com
(Alex Martelli) might have written:

Ling Lee <ja*****@mail.trillegaarden.dk> wrote:
Oh I just did it.

Just used the line:

print "%d lines in your chosen file" % len(open("test.txt").readlines())

Thanks though :)

[Alex] You're welcome;-). However, this approach reads all of the file into
memory at once. If you must be able to deal with humongous files, too
big to fit in memory at once, try something like:

numlines = 0
for line in open('text.txt'): numlines += 1

And a short story of premature optimisation follows...

Saw the plain code above and instantly the programmer's instinct of
optimisation came into action... we all know that C loops are faster
than python loops, right? So I spent 2 minutes of my time to write the
following 'clever' function:

def count_lines(filename):
    fp = open(filename)
    count = 1 + max(enumerate(fp))[0]
    fp.close()
    return count

Proud of my programming skills, I timed it against another function
containing Alex's code. Guess what? My code was slower... (and I should
add a try/except ValueError clause to cater for empty files)

Of course, on second thought, the reason must be that enumerate
generates one tuple for every line in the file; in any case, I'll mark
this rule:

C loops are *always* faster than python loops, unless the loop does
something useful ;-) in the latter case, timeit.py is your friend.
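
For completeness, a sketch of the enumerate trick with the empty-file case handled, assuming Python 3. Returning 0 for an empty file is an assumption about what one wants; the demo files are throwaway.

```python
import os
import tempfile


def count_lines(filename):
    """Count lines via max(enumerate(...)); an empty file yields 0."""
    with open(filename) as fp:
        try:
            # enumerate yields (index, line) pairs; the largest pair
            # carries the index of the last line.
            return 1 + max(enumerate(fp))[0]
        except ValueError:
            # max() raises ValueError when the file has no lines.
            return 0


with tempfile.NamedTemporaryFile("w", delete=False) as empty:
    pass
with tempfile.NamedTemporaryFile("w", delete=False) as full:
    full.write("a\nb\nc\n")

counts = (count_lines(empty.name), count_lines(full.name))
print(counts)
os.remove(empty.name)
os.remove(full.name)
```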
--
TZOTZIOY, I speak England very best,
"Tssss!" --Brad Pitt as Achilles in unprecedented Ancient Greek

Jul 18 '05 #11

Alex Martelli

Christos TZOTZIOY Georgiou <tz**@sil-tec.gr> wrote:
...

memory at once. If you must be able to deal with humongous files, too
big to fit in memory at once, try something like:

numlines = 0
for line in open('text.txt'): numlines += 1

And a short story of premature optimisation follows...

Thanks for sharing!

def count_lines(filename):
    fp = open(filename)
    count = 1 + max(enumerate(fp))[0]
    fp.close()
    return count

Cute, actually!

containing Alex's code. Guess what? My code was slower... (and I should
add a try/except ValueError clause to cater for empty files)

Of course, on second thought, the reason must be that enumerate
generates one tuple for every line in the file; in any case, I'll mark

I thought built-ins could recycle their tuples, sometimes, but you may
in fact be right (we should check with Raymond Hettinger, though).

With 2.4, I measure 30 msec with your approach, and 24 with mine, to
count the 45425 lines of /usr/share/dict/words on my Linux box
(admittedly not a great example of 'humongous file'); and similarly
kjv.txt, a King James Bible (31103 lines, but 10 times the size of the
words file), 41 with yours, 36 with mine. They're pretty close. At
least they beat len(file(...).readlines()), which takes 33 on words, 62
on kjv.txt...

If one is really in a hurry counting lines, a dedicated C extension
might help. E.g.:

static PyObject *count(PyObject *self, PyObject *args)
{
    PyObject* seq;
    PyObject* item;
    int result;

    /* get one argument as an iterator */
    if(!PyArg_ParseTuple(args, "O", &seq))
        return 0;
    seq = PyObject_GetIter(seq);
    if(!seq)
        return 0;

    /* count items */
    result = 0;
    while((item=PyIter_Next(seq))) {
        result += 1;
        Py_DECREF(item);
    }

    /* clean up and return result */
    Py_DECREF(seq);
    return Py_BuildValue("i", result);
}

Using this count-items-in-iterable thingy, words takes 10 msec, kjv
takes 26.

Happier news is that one does NOT have to learn C to gain this.
Consider the Pyrex file:

def count(seq):
    cdef int i
    it = iter(seq)
    i = 0
    for x in it:
        i = i + 1
    return i

pyrexc'ing this and building the Python extension from the resulting C
file gives just about the same performance as the pure-C coding: 10 msec
on words, 26 on kjv, the same to within 1% as pure-C coding (there is a
systematic speedup of a bit less than 1% for the C-coded function).

And if one doesn't even want to use pyrex? Why, that's what psyco is
for...:

import psyco
def count(seq):
    it = iter(seq)
    i = 0
    for x in it:
        i = i + 1
    return i
# bind the function itself (seq is only a parameter name)
psyco.bind(count)

Again to the same level of precision, the SAME numbers, 10 and 26 msec
(actually, in this case the less-than-1% systematic bias is in favour of
psyco compared to pure-C coding...!-)

So: your instinct that C-coded loops are faster wasn't too far off...
and you can get the same performance (just about) with Pyrex or (on an
Intel or compatible processor, only -- sigh) with psyco.

Alex

Jul 18 '05 #12

Bengt Richter

On Mon, 20 Sep 2004 15:29:18 +0200, al*****@yahoo.com (Alex Martelli) wrote:

Ling Lee <ja*****@mail.trillegaarden.dk> wrote:
Oh I just did it.

Just used the line:

print "%d lines in your chosen file" % len(open("test.txt").readlines())

Thanks though :)

You're welcome;-). However, this approach reads all of the file into
memory at once. If you must be able to deal with humongous files, too
big to fit in memory at once, try something like:

numlines = 0
for line in open('text.txt'): numlines += 1

I don't have 2.4, but how would that compare with a generator expression like (untested)

sum(1 for line in open('text.txt'))

or, if you _are_ willing to read in the whole file,

open('text.txt').read().count('\n')
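
A middle ground between these two, not suggested in the thread itself, is to count newline bytes while reading the file in fixed-size chunks, so memory use stays bounded. The sketch below assumes Python 3; count_newlines and its chunk_size parameter are hypothetical names.

```python
import os
import tempfile


def count_newlines(path, chunk_size=64 * 1024):
    """Count newline bytes without holding the whole file in memory."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("a\nb\nc")  # three lines, only two newline characters
path = tmp.name

# A tiny chunk size exercises the chunk boundaries.
newline_bytes = count_newlines(path, chunk_size=2)
with open(path) as handle:
    iterated = sum(1 for line in handle)

print(newline_bytes, iterated)
os.remove(path)
```

The two numbers differ whenever the last line lacks a trailing newline, since the generator expression counts that last piece and the newline count does not.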

Regards,
Bengt Richter

Jul 18 '05 #13

Alex Martelli

Bengt Richter <bo**@oz.net> wrote:
...

memory at once. If you must be able to deal with humongous files, too
big to fit in memory at once, try something like:

numlines = 0
for line in open('text.txt'): numlines += 1

I don't have 2.4

2.4a3 is freely available for download and everybody's _encouraged_ to
download it and try it out -- come on, don't be the last one to!-)

but how would that compare with a generator expression like (untested)

sum(1 for line in open('text.txt'))

or, if you _are_ willing to read in the whole file,

open('text.txt').read().count('\n')

I'm not on the same machine as when I ran the other timing measurements
(including pyrex &c) but here's the results on this one machine...:

$ wc /usr/share/dict/words
234937 234937 2486825 /usr/share/dict/words
$ python2.4 ~/cb/timeit.py "numlines=0
for line in file('/usr/share/dict/words'): numlines+=1"
10 loops, best of 3: 3.08e+05 usec per loop
$ python2.4 ~/cb/timeit.py \
    "file('/usr/share/dict/words').read().count('\n')"
10 loops, best of 3: 2.72e+05 usec per loop
$ python2.4 ~/cb/timeit.py \
    "len(file('/usr/share/dict/words').readlines())"
10 loops, best of 3: 3.25e+05 usec per loop
$ python2.4 ~/cb/timeit.py \
    "sum(1 for line in file('/usr/share/dict/words'))"
10 loops, best of 3: 4.42e+05 usec per loop

Last but not least...:

$ python2.4 ~/cb/timeit.py -s'import cou' \
    "cou.cou(file('/usr/share/dict/words'))"
10 loops, best of 3: 2.05e+05 usec per loop

where cou.pyx is the pyrex program I've already shown on the other
subthread. Using the count.c I've also shown takes 2.03e+05 usec.
(Can't try psyco here, not an intel-like cpu).

Summary: "sum(1 for ...)" is no speed demon; the plain loop is best
among the pure-python approaches for files that can't fit in memory. If
the file DOES fit in memory, read().count('\n') is faster, but
len(...readlines()) is slower. Pyrex rocks, essentially removing the
need for C-coded extensions (less than a 1% advantage) -- and so does
psyco, but not if you're using a Mac (quick, somebody gift Armin Rigo
with a Mac before it's too late...!!!).

Alex
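
To repeat this kind of comparison on another machine, something like the sketch below may help. It assumes Python 3 and the standard timeit module; the four function names and the generated 10000-line file are made up for the demo, and the timings it prints will vary, so no ranking is claimed here.

```python
import os
import tempfile
import timeit

with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("word\n" * 10000)
path = tmp.name


def plain_loop():
    numlines = 0
    with open(path) as f:
        for line in f:
            numlines += 1
    return numlines


def read_count():
    with open(path) as f:
        return f.read().count("\n")


def readlines_len():
    with open(path) as f:
        return len(f.readlines())


def generator_sum():
    with open(path) as f:
        return sum(1 for line in f)


candidates = [plain_loop, read_count, readlines_len, generator_sum]
# All four should agree on this file, since every line ends with a newline.
results = {func.__name__: func() for func in candidates}

for func in candidates:
    # Best of 3 repeats, 10 calls per repeat.
    best = min(timeit.repeat(func, number=10, repeat=3))
    print("%-14s %.4f s per 10 calls" % (func.__name__, best))

os.remove(path)
```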

Jul 18 '05 #14

Andrew Dalke

Bengt Richter wrote:

or, if you _are_ willing to read in the whole file,

open('text.txt').read().count('\n')

Except the last line might not have a terminal newline.

Andrew
da***@dalkescientific.com

Jul 18 '05 #15

Andrew Dalke

Alex Martelli wrote:

If one is really in a hurry counting lines, a dedicated C extension
might help. E.g.:

static PyObject *count(PyObject *self, PyObject *args) ...
Using this count-items-in-iterable thingy

There have been a few times I've wanted a function like
this. I keep expecting that len(iterable) will work,
but of course it doesn't.

Would itertools.len(iterable) be useful? More likely
the name collision with len itself would be a problem,
so perhaps itertools.length(iterable).

BTW, I saw itertools.count and figured that might be
it. Nope. And don't try the following

>>> import itertools
>>> itertools.count(5)
count(5)
>>> print list(_)

:)
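
A pure-Python stand-in for the wished-for function is short. The sketch below assumes Python 3; length is a hypothetical helper, not something in itertools. It also shows why listing itertools.count(5) never finishes: the iterator is infinite, so only a slice of it can be listed.

```python
import itertools


def length(iterable):
    """Count the items of any iterable by consuming it."""
    return sum(1 for _ in iterable)


# itertools.count(5) never ends: 5, 6, 7, ...
# islice takes a finite prefix so it can be listed safely.
first_three = list(itertools.islice(itertools.count(5), 3))

gen_length = length(x for x in "abcd")
print(first_three, gen_length)
```

Note that length() uses up the iterator it is given, so the items are gone afterwards.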

Andrew
da***@dalkescientific.com

Jul 18 '05 #16

Bengt Richter

On Wed, 22 Sep 2004 19:48:21 GMT, Andrew Dalke <ad****@mindspring.com> wrote:

Bengt Richter wrote:
or, if you _are_ willing to read in the whole file,

open('text.txt').read().count('\n')

Except the last line might not have a terminal newline.

I _knew_ I should have mentioned that ;-)

Regards,
Bengt Richter

Jul 18 '05 #17

Alex Martelli

Andrew Dalke <ad****@mindspring.com> wrote:

Alex Martelli wrote:
If one is really in a hurry counting lines, a dedicated C extension
might help. E.g.:

static PyObject *count(PyObject *self, PyObject *args) ...
Using this count-items-in-iterable thingy

There have been a few times I've wanted a function like

Me too, that's why I wrote the C and Pyrex versions:-).

this. I keep expecting that len(iterable) will work,
but of course it doesn't.

Yep -- it would probably be too risky to have len(...) consume a whole
iterator, beginning users wouldn't expect that and might get burnt.

Would itertools.len(iterable) be useful? More likely
the name collision with len itself would be a problem,
so perhaps itertools.length(iterable).

Unfortunately, itertools's functions are there to produce iterators, not
to consume them. I doubt Raymond Hettinger, itertools' guru, would
approve of changing that (though one could surely ask him, and if he
surprised me, I guess the change might get in).

There's currently no good single place for 'accumulators', i.e.
consumers of iterators which produce scalars or thereabouts -- sum, max,
and min, are built-ins; other useful accumulators can be found in heapq
(because they're implemented via a heap...)... and there's nowhere to
put the obviously needed "trivial" accumulators, such as average,
median, variance, count...

A "stats" module was proposed, but also shot down (presumably people
have more ambitious ideas about 'statistics' than these simple
accumulators, alas -- I'm not sure exactly what the problem was).

Alex

Jul 18 '05 #18

Alex Martelli

Andrew Dalke <ad****@mindspring.com> wrote:

Bengt Richter wrote:
or, if you _are_ willing to read in the whole file,

open('text.txt').read().count('\n')

Except the last line might not have a terminal newline.

....and wc would then not count that non-line as a line, so why should
we...? Witness...:

$ echo -n 'bu'>em
$ wc em
0 1 2 em

zero lines, one word, two characters: seems right to me.

Alex

Jul 18 '05 #19

Andrew Dalke

Alex Martelli wrote:

....and wc would then not count that non-line as a line, so why should
we...? Witness...:

'Cause that's what Python does. Witness:

% echo -n 'bu' | python -c \
? 'import sys; print len(sys.stdin.readlines())'
1

;)

Andrew
da***@dalkescientific.com

Jul 18 '05 #20

Alex Martelli

Andrew Dalke <ad****@mindspring.com> wrote:

Alex Martelli wrote:
....and wc would then not count that non-line as a line, so why should
we...? Witness...:

'Cause that's what Python does. Witness:

If you tell it to count non-lines too (pieces that don't end with an
endline marker), it does, of course:

% echo -n 'bu' | python -c \
? 'import sys; print len(sys.stdin.readlines())'
1

But that's just because you told it to.

To reproduce wc's behavior, you have to exclude non-lines -- use
len([ l for l in sys.stdin if l.endswith('\n') ]) for example. Or, the
simpler .count('\n') approach.
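
The two definitions can be compared side by side. This is a sketch assuming Python 3, using io.StringIO in place of sys.stdin so it runs anywhere; the function names are invented for the comparison.

```python
import io


def count_wc_style(stream):
    """Count only newline-terminated lines, like wc -l."""
    return sum(1 for line in stream if line.endswith("\n"))


def count_python_style(stream):
    """Count every piece that iteration yields, terminated or not."""
    return sum(1 for line in stream)


samples = ["bu", "bu\n", "bu\nem", ""]
table = [
    (text,
     count_wc_style(io.StringIO(text)),
     count_python_style(io.StringIO(text)))
    for text in samples
]

for text, wc_count, py_count in table:
    print(repr(text), wc_count, py_count)
```

The counts only disagree when the text ends without a newline, which is exactly the case being argued about here.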

I suspect somebody who asks the subject question wants to reproduce wc's
counting behavior. Of course, it _is_ an imprecise spec they're giving.

Alex

Jul 18 '05 #21

Andrew Dalke

Alex Martelli wrote:

If you tell it to count non-lines too (pieces that don't end with an
endline marker), it does, of course:

My reply was meant to be a bit of a jest, pointing out that
I'm using Python's definition of a line. Otherwise if
lines must end with a newline then the method should be
named "readlines_and_any_trailing_text()"

Since you used

numlines=0
for line in file('/usr/share/dict/words'): numlines+=1

as a way to count lines, I assumed you would agree with
Python's definition as a reasonable way to count the
number of lines in a file and that your previous post
(on the behavior of wc) was meant more as a rhetorical
way to highlight the ambiguity than as an assertion of
general correctness.

I suspect somebody who asks the subject question wants to reproduce wc's
counting behavior.

Really? I was actually surprised at what wc does. I didn't
realize it only did a "\n" character count. The other programs
I know of number lines based on the start of line rather than the end
of line.

% echo -n "blah" > blah.txt
% less blah.txt
(then press "=")
blah.txt lines 1-1/1 byte 4/4 (END) (press RETURN)

% echo -n "" | perl -ne '$line++; END{$line+=0;print "$line\n"}'
0
% echo -n "blah" | perl -ne '$line++; END{$line+=0;print "$line\n"}'
1

% echo -n "" | awk 'END {print NR}'
0
% echo -n "blah" | awk 'END {print NR}'
1

% echo -n "blah" | grep -n "blah"
1:blah

Of course, it _is_ an imprecise spec they're giving.

Yup.

Andrew
da***@dalkescientific.com

Jul 18 '05 #22

Alex Martelli

Andrew Dalke <ad****@mindspring.com> wrote:

I suspect somebody who asks the subject question wants to reproduce wc's
counting behavior.

Really? I was actually surprised at what wc does. I didn't
realize it only did a "\n" character count. The other programs

Ah well -- maybe it's just me, 25+ years of either using Unix or pining
for it (when I had to use VMS, VM/SP, Windows, etc, etc) must have left
their mark.

Alex

Jul 18 '05 #23
