Reading newlines from a text file

I have a data file that I read with readline(), and would like to
control the formats of the lines when they are printed. I have tried
inserting escape sequences into the data file, but am having trouble
getting them to work as I think they should. For example, if my data
file has only one line which reads:
1\n234\n567

I would like to read it with a command of the form
x=datafile.readline()

and I would like
print x

to give me
1
234
567

The readline works like a charm, but the print gives me
1\n234\n567
Clearly the linefeeds are not interpreted as such. How can I get the
Python interpreter to correctly interpret escape sequences in strings
that are read from files?

Thomas Philips
Jul 18 '05 #1
I see you are using "correct" as a synonym for "the way I want" :-)
The Python interpreter does not interpret data read from a file, and you
would be in for serious trouble if it did. However, you can process the data
in any way you like, and here's how to replace C-style escape sequences with
the corresponding characters:
>>> import codecs
>>> file("tmp.txt", "w").write("\\ntoerichte\\nlogik\\nboeser\\nkobold\n"*2)
>>> for line in file("tmp.txt"):
...     print line
...
\ntoerichte\nlogik\nboeser\nkobold

\ntoerichte\nlogik\nboeser\nkobold
>>> for line in codecs.open("tmp.txt", "r", "string_escape"):
...     print line
...

toerichte
logik
boeser
kobold
toerichte
logik
boeser
kobold
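
The session above is Python 2: file(), the print statement and the "string_escape" codec are not available in Python 3. A minimal sketch of the same idea for Python 3 follows, assuming the data is text whose backslash sequences should be decoded. The helper name decode_escapes and the file name data.txt are made up for illustration and are not part of the thread.

```python
import os
import tempfile


def decode_escapes(text):
    """Turn literal backslash escapes (e.g. backslash + n) into real characters."""
    # backslashreplace turns any non-ASCII character into an escape, which
    # the unicode_escape step then converts back to the original character.
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")


# Recreate the poster's data file: one line holding literal backslash-n pairs.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("1\\n234\\n567\n")

with open(path) as datafile:
    x = datafile.readline()

# Drop the real newline that ends the line, then decode the escapes.
decoded = decode_escapes(x.rstrip("\n"))
print(decoded)
```

As in the original answer, this treats the file contents as data to be processed; nothing is interpreted unless the program asks for it.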


Peter
Jul 18 '05 #2
You've got a couple of misconceptions. If you use a standard
open(<file>, "rt"), Python will only recognize end of line sequences
for your system. Windows, unices and the Mac all have different
conventions, so if you're on Windows and your file has unix or mac
newline sequences, they won't be recognized. They'll come in on
one readline().

The second misconception seems to be that Python will strip
newlines when it reads in line mode. It doesn't. It does convert
the OS-dependent newline sequences to a standard \n, but that's
all. Each line read still has a newline at the end (except the last one,
if it was missing in the file).

Print, on the other hand, is going to add a newline regardless of
whether one exists in the data.
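
A small sketch of the two points above, written for Python 3 (where print is a function rather than a statement). The io.StringIO object is only a stand-in for an open text file, and the sample text is invented.

```python
import io

# Stand-in for a text file with two lines; the last one lacks a newline.
datafile = io.StringIO("first\nsecond")

lines = [datafile.readline(), datafile.readline()]
print(repr(lines[0]))  # the trailing newline is still attached
print(repr(lines[1]))  # last line: the file had no newline to keep

# print() adds its own newline, so strip the one that came from the file.
stripped = [line.rstrip("\n") for line in lines]
for line in stripped:
    print(line)
```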

You can solve the first problem by adding a "U" somewhere in
the open/file call. I'm not sure where, check the docs. To solve
the second problem, you need to do one of several things:

1) if you want to use print, strip the newline from the string
before writing it. The print statement will add it to the end.

2) if all you want is line endings for your system, open with
"wt" and put a \n at the end of each line. Python will take care
of the rest.

3) if you want line endings for a different system, open the file
as 'wb' and insert the proper sequence at the end yourself.
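
A sketch of option 3 together with universal newline reading, assuming Python 3. There, text mode translates all three line-ending conventions by default, so no "U" flag is needed. The file name mixed.txt and its contents are invented for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")

# Binary mode writes bytes exactly as given, so each ending is explicit.
with open(path, "wb") as f:
    f.write(b"windows\r\nold mac\runix\n")

# Text mode with the default newline=None translates all three endings to "\n".
with open(path, "r") as f:
    translated = f.readlines()

# newline="" still splits on every ending but leaves it untranslated.
with open(path, "r", newline="") as f:
    untranslated = f.readlines()

print(translated)
print(untranslated)
```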

HTH
John Roth
Jul 18 '05 #3
Peter,
Very neat. That is exactly what I want it to do. On looking through
the help for open() (or its replacement, file()), I did not see a
reference to a "string_escape" option. Where is this documented (in
language comprehensible to a newbie)?

Also, I notice that you have two backslashes before each n in
"\\ntoerichte\\nlogik\\nboeser\\nkobold\n". Why is there a need for 2
backslashes -would not one have done the trick?
Thomas Philips
Jul 18 '05 #4
> Also, I notice that you have two backslashes before each n in
> "\\ntoerichte\\nlogik\\nboeser\\nkobold\n". Why is there a need for 2
> backslashes -would not one have done the trick?


Two backslashes escape the backslash - otherwise the text in tmp.txt would
have contained _newlines_, not the \n-escape sequence you wanted. Test
this:

>>> print "foo\nbar"
foo
bar
>>> print "foo\\nbar"
foo\nbar

And the encoding argument is in codecs.open, not in builtin open - so you'll
find it documented in the codecs-module.
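
The same distinction can be checked by counting characters. This is a Python 3 sketch; the raw-string line is an addition that the thread does not mention.

```python
one = "foo\nbar"    # backslash-n in source code becomes one newline character
two = "foo\\nbar"   # escaped backslash: a backslash followed by the letter n
raw = r"foo\nbar"   # raw string: the same two characters without doubling

print(len(one), len(two), len(raw))
print(two == raw)
print(repr(one), repr(two))
```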

--
Regards,

Diez B. Roggisch

Jul 18 '05 #5
Thomas Philips wrote:
> Very neat. That is exactly what I want it to do. On looking through

After reading John Roth's post I'm no longer sure. If you want multiple
lines from a file you don't need escape sequences, just use

s = file("tmp.txt", "U").read()

to put it all in a single string. The codec hack should be considered only
if you want to interpret each line in a file as a multi-line string. To
get the best solution, you had better post the concrete problem you are
trying to solve.
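
For readers on Python 3, where file() and the "U" mode no longer exist, a rough equivalent of the read() call might look like this. The file contents are invented to match the output the original poster asked for.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "tmp.txt")
with open(path, "w") as f:
    f.write("1\n234\n567\n")  # three real lines, no escape sequences

with open(path) as f:
    s = f.read()              # entire file as one string, newlines included

print(s, end="")              # the file's own newlines do the formatting
fields = s.splitlines()       # or split into a list without the newlines
print(fields)
```
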
> the help for open() (or its replacement, file()), I did not see a
> reference to a "string_escape" option. Where is this documented (in
> language comprehensible to a newbie)?

The documentation is here (and on the neighbouring pages):

http://www.python.org/doc/current/lib/node127.html

Codecs are probably not the first thing you'll want to learn about Python,
though.

> Also, I notice that you have two backslashes before each n in
> "\\ntoerichte\\nlogik\\nboeser\\nkobold\n". Why is there a need for 2
> backslashes -would not one have done the trick?


I think Diez has already explained that. Another way to find out is to run
my example and then look into tmp.txt. There are actually two lines. Can
you find out why? Hint: the last \n is different.
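
One way to check the hint without opening an editor is sketched below in Python 3. It writes the same string to a temporary file and counts what comes back.

```python
import os
import tempfile

text = "\\ntoerichte\\nlogik\\nboeser\\nkobold\n" * 2

path = os.path.join(tempfile.mkdtemp(), "tmp.txt")
with open(path, "w") as f:
    f.write(text)

with open(path) as f:
    lines = f.readlines()

print(len(lines))             # number of real lines in the file
print(repr(lines[0]))         # repr shows which escapes are literal
print(lines[0].count("\\n"))  # literal backslash-n pairs in the first line
print(lines[0].count("\n"))   # real newline characters in the first line
```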

Peter
Jul 18 '05 #6
