Does Python mess with CRLFs?

Gilles Ganault

Hello

I'm stuck at understanding why Python can't extract some bit from an
HTML file using regexes, although I can find it just fine with
UltraEdit.

I wonder if Python rewrites CRLFs when reading a text file with
open/read?

Here's the code:
==========
f = open("content.html", "r")
content = f.read()
f.close()

#BAD
friends = re.compile('</td></tr></table>\r\n</div>\r\n',re.IGNORECASE
| re.MULTILINE | re.DOTALL)

#GOOD
friends = re.compile('</td></tr></table>',re.IGNORECASE | re.MULTILINE
| re.DOTALL)

m = friends.search(content)
if m:
print "Found"
else:
print "List not found"
==========

Thank you for any tip.

Nov 12 '08 #1

Subscribe Post Reply

1268

Gilles Ganault

On Wed, 12 Nov 2008 12:04:07 +0100, Gilles Ganault <no****@nospam.com>
wrote:

>I wonder if Python rewrites CRLFs when reading a text file with
open/read?

For those seeing the same thing, the answer is yes: On Windows, the
code above turns CRLF into LF. I tried "rb" instead of "r", with no
difference.

Nov 12 '08 #2

John Machin

On Nov 12, 10:04*pm, Gilles Ganault <nos...@nospam.comwrote:

Hello

I'm stuck at understanding why Python can't extract some bit from an
HTML file using regexes, although I can find it just fine with
UltraEdit.

I wonder if Python rewrites CRLFs when reading a text file with
open/read?

Don't wonder; do some very elementary debugging and find out for
yourself.

Here's the code:
==========
f = open("content.html", "r")
content = f.read()
f.close()

Consider inserting
print repr(content)
here.

Nov 12 '08 #3

Irmen de Jong

Gilles Ganault wrote:

On Wed, 12 Nov 2008 12:04:07 +0100, Gilles Ganault <no****@nospam.com>
wrote:
>I wonder if Python rewrites CRLFs when reading a text file with
open/read?

For those seeing the same thing, the answer is yes: On Windows, the
code above turns CRLF into LF. I tried "rb" instead of "r", with no
difference.

Sorry but that is not what's happening. Your problem is not in reading the
file, it's in the regular expression you're using.

Using open with the "rb" flag leaves the file content intact and does not munge newlines
in any way. A read() will return the exact bytes that are in the file.

--irmen

Nov 12 '08 #4

Irmen de Jong

Gilles Ganault wrote:

Hello

I'm stuck at understanding why Python can't extract some bit from an
HTML file using regexes, although I can find it just fine with
UltraEdit.

#BAD
friends = re.compile('</td></tr></table>\r\n</div>\r\n',re.IGNORECASE
| re.MULTILINE | re.DOTALL)

If you keep running into trouble and you're sure it's related to the newlines,
maybe it helps using the 'whitespace' symbol instead of \r\n in your expression:
re.compile('</td></tr></table>\\s*</div>\\s*', .... )

Other than that, hard to say what's not working as expected without knowing
the exact contents of the "content.html" file you're searching in....

--irmen

Nov 12 '08 #5

by: Antoon Pardon | last post by:

I have had a look at the signal module and the example and came to the conclusion that the example wont work if you try to do this in a thread. So is there a chance similar code will work in a...

Python

Help With Hiring Python Developers

by: fuego | last post by:

My company (http://primedia.com/divisions/businessinformation/) has two job openings that we're having a heckuva time filling. We've posted at Monster, Dice, jobs.perl.org and python.jobmart.com. ...

Python

PHP vs. Python

by: stephen.mayer | last post by:

Anyone know which is faster? I'm a PHP programmer but considering getting into Python ... did searches on Google but didn't turn much up on this. Thanks! Stephen

Python

Unable to extract Python source code using Windows

by: Elric02 | last post by:

I'm currently trying to get access to the Python source code, however whenever I try to extract the files using the latest version of WinZip (version 10) I get the following error "error reading...

Python

Is python for me?

by: lennart | last post by:

Hi, I'm planning to learn a language for 'client' software. Until now, i 'speak' only some web based languages, like php. As a kid i programmed in Basic (CP/M, good old days :'-) ) Now i want to...

Python

Why does Python never add itself to the Windows path?

by: Ben Sizer | last post by:

I've installed several different versions of Python across several different versions of MS Windows, and not a single time was the Python directory or the Scripts subdirectory added to the PATH...

Python

113

Python does not play well with others

by: John Nagle | last post by:

The major complaint I have about Python is that the packages which connect it to other software components all seem to have serious problems. As long as you don't need to talk to anything outside...

Python

What happened with python? messed strings?

by: | last post by:

Hi, I used extensively python and now I find this mess with strings, I can't even reproduce tutorial examples: File "<stdin>", line 0 ^ SyntaxError: 'ascii' codec can't decode byte 0xc4 in...

Python

162

Why does python not have a mechanism for data hiding?

by: Sh4wn | last post by:

Hi, first, python is one of my fav languages, and i'll definitely keep developing with it. But, there's 1 one thing what I -really- miss: data hiding. I know member vars are private when you...

Python

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Does Python mess with CRLFs?

Similar topics