Bytes IT Community

Python 2.4.1 hang

Hi,

This is on WinXP SP1.

I needed to get to the POST body, and while I was trying out various
regular expressions, one of them caused Python to hang. The Python
process was taking up 100% of the CPU. I didn't even get a "maximum
recursion depth exceeded" error. Is this a bug? Code below:

import re

s = \
"""POST /TradeManagement-RT3/ReportController.Servlet HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-excel, application/vnd.ms-powerpoint,
application/msword, */*
Referer: https://dummy.com/TradeManagement-RT...portSearch.jsp
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR
1.0.3705)
Host: dummy.com
Content-Length: 267
Connection: Keep-Alive
Cache-Control: no-cache
Cookie:
SMSESSION=RNwINGbjtRijIplKDE8EZ79NtSJREhKu4OogQqQD1PTukTIE3pclwfkkj2b5YSFscbW97A8QQxk1066rmtqwatlBQsnxr2h6fvAPiazYWex297WmDDjPd05RjXNhVqiPmhxSN9nOVP6Pq1igcFC5b3R4AFHWFcz+lW1QyUz+1yeaLfupKDkwaV7jP0qgPbccioWUpEmn762OyqCnehjuVBJ9hDBGO07Bx8Pv/tHd0l6xjFOt6YbtHG9IfMaKhrnPwmdtyo8c/4trmRNO84BoqwhtOzhrJTVuPjzYN2uxg04ZgAt+j75gSA9OPYYymirfwx5zBnhHvQz7ezGQqUPe45l3CvnRhkFVM/kOAYdY2Cdlv+15EMCVqJLT+2cMRCPPY+vlqlgsY30h5V9NWiG+AdXKQ8LEUnPEhnSYhyIo1a3FzB1yr+E/CZfXkNi1lMrG0HiwU+NJVK7rY0deee7gFeiq8T0660eq2WOVF7USMESTOAbSDsR6Ejo+rRscvHfX7uzvu1pRw0Phw7ffF0pr/nBhunq4v7/dW6WXOzvWAEocBXK9/Hl5Ua63X/UxXVs8g6psI0mqoRziFWw+O4t4jjn1fS2e1YvvtAGRPIcNeEEPSCgqEhSUKoGz1qysPoK87MgflIaHt/PsOeRCYSS/53B87RH9RrcaJEgrHyIBZNuzEjD1AG4Uud5oKi88902RW3IATHnH1E4UntvEdo/NbCcNbgN/dGWEvBnBzLn6KYxd4PxG0pQ3vr3qDDa7v0i9eXq9t6++tlM1tIS/XIHZc4bfGKPdZC30Dtw7HwUc7bl74/SHVEEcgzgXJPkCH2zSHaxyot3sqGHCwDa3AmuUkaPSC+iviVHlTe3Uk4KsOnnG94UIwB4yv+mlkXqnw0JwausWVuetCIm+cDIuvZXgRYghjZnNcNsji0k15ddr8j4=;
CCTMRT3=O0PVLBMWBVD2115CLLS4REI

EntrySourceDescription=All&SEARCHReportType=All&SEARCHReportStatus=All&available=IC&SEARCHReportCommodity=All&SortContract_Year=1&SortTrade_Price=&SortOrder_Type=&SortBuy_Sell_Ind=&SortAccount_Number=&SortExternal_TradeId=&GroupBy=contract&command=PrelimReportCommand"""

#pattern_str = "^POST.*\\r\\n\\r((\\n)|(\\n[^\r]*))"
#pattern_str = "^POST.*\\n((\\n)|(\\n[^\r]*&))"
pattern_str = "^POST(.*\\n*)+\\n\\n" # <--- Offending pattern

pattern = re.compile(pattern_str)

match = pattern.match(s)

if match:
    print match.groups()

Jul 19 '05 #1


Mahesh wrote:
> I needed to get to the POST body, and while I was trying out various
> regular expressions, one of them caused Python to hang. The Python
> process was taking up 100% of the CPU. I didn't even get a "maximum
> recursion depth exceeded" error. Is this a bug?
no, it's just a very stupid way to implement a trivial operation.
> import re
>
> s = \
> """POST /TradeManagement-RT3/ReportController.Servlet HTTP/1.1
> <snip>
>
> #pattern_str = "^POST.*\\r\\n\\r((\\n)|(\\n[^\r]*))"
> #pattern_str = "^POST.*\\n((\\n)|(\\n[^\r]*&))"
> pattern_str = "^POST(.*\\n*)+\\n\\n" # <--- Offending pattern


the .* is a variable-length match. so is the \n* that follows it. and then you're putting
both inside a repeated capturing group, and applying that to a moderately large string.
since . never matches a newline, the group can carve each line up into iterations in
exponentially many ways, and the poor engine has to check zillions of those combinations
before finding something that works.
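A minimal sketch of a pattern that finds the same header/body boundary without nested variable-length quantifiers. It assumes every header line is non-empty, so the first empty line ends the headers, and it accepts either CRLF or bare LF line endings. Each iteration of the header group must consume exactly one complete line, so there is only one way to divide the text between iterations and a failed match is detected in roughly linear time. `HEADER_BODY_RE` and `split_request` are hypothetical names, not part of any library.

```python
import re

# Hypothetical rewrite of "^POST(.*\n*)+\n\n" without nested quantifiers.
# Each iteration of the header group consumes exactly one complete,
# non-empty line, so the engine has no ambiguous splits to backtrack over.
HEADER_BODY_RE = re.compile(
    r"\A(POST [^\r\n]*\r?\n"   # request line
    r"(?:[^\r\n]+\r?\n)*)"     # header lines, each non-empty
    r"\r?\n"                   # the blank line that ends the headers
    r"(.*)\Z",                 # everything after it is the body
    re.DOTALL,
)


def split_request(text):
    """Return (head, body) for a POST request, or None if no blank line is found."""
    match = HEADER_BODY_RE.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The cure for this kind of catastrophic backtracking is usually the same: make each repetition match one unambiguous unit (here, one line) instead of repeating a group whose own contents are variable-length.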

if you want to split on "\r\n\r\n", use split:

header, body = message.split("\r\n\r\n", 1)  # split at the first blank line only
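A slightly more forgiving sketch of the split approach, assuming the text may use CRLF line endings (as sent on the wire) or bare LF (as in the triple-quoted string in the question). Limiting the split to one keeps any blank lines inside the body intact, and the separator pattern has no nested repetition, so it cannot backtrack badly. `split_head_body` is a hypothetical helper name.

```python
import re

# A blank line is two consecutive line breaks, each optionally preceded by CR.
_BLANK_LINE = re.compile(r"\r?\n\r?\n")


def split_head_body(message):
    """Split an HTTP request into (head, body) at the first blank line.

    Hypothetical helper: accepts CRLF or bare LF line endings.
    body is "" when the message contains no blank line.
    """
    parts = _BLANK_LINE.split(message, maxsplit=1)
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]
```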

for more robust code, consider using the rfc822 module:

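import rfc822
import StringIO

f = StringIO.StringIO(message)
request = f.readline()      # the request line, e.g. "POST ... HTTP/1.1"
header = rfc822.Message(f)  # reads header lines up to the blank line
body = f.read()             # everything after the blank line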
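The rfc822 module has been deprecated since Python 2.3 in favour of the email package, and it no longer exists in Python 3. A sketch of the same idea using email, assuming that once the request line is removed, the rest of the request follows the "headers, blank line, body" layout closely enough for the email parser to handle it. `parse_request` is a hypothetical name, and collapsing headers into a dict is a simplification.

```python
import email


def parse_request(message):
    """Split a raw HTTP request into (request_line, headers, body).

    Sketch: after the request line, the remaining text is parsed as an
    RFC 822 style message by the standard email package. Repeated header
    names collapse to the last value in the returned dict.
    """
    request_line, _, rest = message.partition("\n")
    msg = email.message_from_string(rest)
    headers = dict(msg.items())
    return request_line.rstrip("\r"), headers, msg.get_payload()
```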

</F>

Jul 19 '05 #2

Yes, it is stupid, but I am debugging some poorly written C++ code, so I
cannot change it. It was easier for me to use Python to try various
combinations (the C++ code uses a non-standard regex engine). I just
chanced upon the problem and was curious as to what Python was up to.

Thanks for clearing that up.

Jul 19 '05 #3
