Hello
I'm stuck at understanding why Python can't extract some bit from an
HTML file using regexes, although I can find it just fine with
UltraEdit.
I wonder if Python rewrites CRLFs when reading a text file with
open/read?
Here's the code:
==========
import re

f = open("content.html", "r")
content = f.read()
f.close()

# BAD: this pattern finds nothing
friends = re.compile('</td></tr></table>\r\n</div>\r\n',
                     re.IGNORECASE | re.MULTILINE | re.DOTALL)
# GOOD: this pattern matches
friends = re.compile('</td></tr></table>',
                     re.IGNORECASE | re.MULTILINE | re.DOTALL)

m = friends.search(content)
if m:
    print("Found")
else:
    print("List not found")
==========
Thank you for any tip.
On Wed, 12 Nov 2008 12:04:07 +0100, Gilles Ganault <no****@nospam.com>
wrote:
>I wonder if Python rewrites CRLFs when reading a text file with open/read?
For those seeing the same thing, the answer is yes: On Windows, the
code above turns CRLF into LF. I tried "rb" instead of "r", with no
difference.
On Nov 12, 10:04 pm, Gilles Ganault <nos...@nospam.com> wrote:
> Hello
> I'm stuck at understanding why Python can't extract some bit from an
> HTML file using regexes, although I can find it just fine with
> UltraEdit.
> I wonder if Python rewrites CRLFs when reading a text file with
> open/read?
Don't wonder; do some very elementary debugging and find out for
yourself.
> Here's the code:
> ==========
> f = open("content.html", "r")
> content = f.read()
> f.close()
Consider inserting
    print(repr(content))
here.
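As a minimal sketch of why repr() helps here: the two strings below are hypothetical stand-ins for what read() might return (the real content.html was never posted). Printed normally they look identical, but repr() makes the line endings visible.

```python
# Hypothetical sample data; the actual file contents are unknown.
crlf_text = "</td></tr></table>\r\n</div>\r\n"
lf_text = "</td></tr></table>\n</div>\n"

# repr() shows escape sequences instead of rendering them,
# so a CRLF line ending and an LF line ending can be told apart.
print(repr(crlf_text))
print(repr(lf_text))
```

If the output for the real file shows only \n, the \r\n in the pattern can never match that string.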
Gilles Ganault wrote:
> On Wed, 12 Nov 2008 12:04:07 +0100, Gilles Ganault <no****@nospam.com>
> wrote:
> >I wonder if Python rewrites CRLFs when reading a text file with open/read?
> For those seeing the same thing, the answer is yes: On Windows, the
> code above turns CRLF into LF. I tried "rb" instead of "r", with no
> difference.
Sorry but that is not what's happening. Your problem is not in reading the
file, it's in the regular expression you're using.
Using open with the "rb" flag leaves the file content intact and does not munge newlines
in any way. A read() will return the exact bytes that are in the file.
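The difference between the two modes can be checked with a small experiment. This is only a sketch under assumptions: it targets Python 3, where text mode translates CRLF to LF on every platform by default (in Python 2, the translation in "r" mode happens on Windows), and the sample bytes and temporary file are hypothetical stand-ins for content.html.

```python
import os
import tempfile

# Hypothetical sample bytes standing in for the real content.html.
raw = b"</td></tr></table>\r\n</div>\r\n"

# Write the bytes to a temporary file so nothing on disk is touched.
with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Binary mode: the bytes come back exactly as stored.
    with open(path, "rb") as f:
        binary_content = f.read()
    # Text mode: Python 3 translates "\r\n" to "\n" by default.
    with open(path, "r", encoding="ascii") as f:
        text_content = f.read()
finally:
    os.remove(path)

print(repr(binary_content))
print(repr(text_content))
```

If "rb" really makes no difference for a given file, the file itself may not contain CRLF where the pattern expects it.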
--irmen
Gilles Ganault wrote:
> Hello
> I'm stuck at understanding why Python can't extract some bit from an
> HTML file using regexes, although I can find it just fine with
> UltraEdit.
> #BAD
> friends = re.compile('</td></tr></table>\r\n</div>\r\n',re.IGNORECASE
> | re.MULTILINE | re.DOTALL)
If you keep running into trouble and you're sure it's related to the newlines,
maybe it helps using the 'whitespace' symbol instead of \r\n in your expression:
re.compile('</td></tr></table>\\s*</div>\\s*', .... )
Other than that, hard to say what's not working as expected without knowing
the exact contents of the "content.html" file you're searching in....
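A rough sketch of the \s* suggestion, using two hypothetical sample strings (the real file contents are unknown). It compares the whitespace-tolerant pattern with the original CRLF-only pattern against both line-ending styles; the names strict, tolerant, and samples are illustrative only.

```python
import re

# Original pattern: requires a literal CR LF after each tag.
strict = re.compile('</td></tr></table>\r\n</div>\r\n', re.IGNORECASE)
# Suggested pattern: accepts any run of whitespace, including none.
tolerant = re.compile(r'</td></tr></table>\s*</div>\s*', re.IGNORECASE)

# Hypothetical samples, one per line-ending style.
samples = {
    "CRLF": "<table><tr><td>x</td></tr></table>\r\n</div>\r\n",
    "LF": "<table><tr><td>x</td></tr></table>\n</div>\n",
}

results = {}
for label, text in samples.items():
    results[label] = (bool(strict.search(text)), bool(tolerant.search(text)))
    print(label, "strict:", results[label][0], "tolerant:", results[label][1])
```

The strict pattern only finds the CRLF sample, while the tolerant one finds both, so it works whether or not the line endings were translated on read.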
--irmen