Bytes | Software Development & Data Engineering Community

Parsing HTTP messages

My initial question is, what Python library do I use to parse HTTP
messages?

Trying to use the "email" module has led me to another question and I'm
not even sure where to ask this second question.

When I parse an HTTP request using the email module, I get a field name of
"GET http:", which isn't a field name or part of a header at all, but part of
the "start-line" (request line) of the HTTP request.

Checking the HTTP/1.1 spec (RFC 2616), I find that HTTP messages use the
generic message format of RFC 822 (obsoleted by RFC 2822) and that:

"Both types of message consist of a start-line, one or more header fields
(also known as 'headers'), an empty line (i.e., a line with nothing
preceding the CRLF) indicating the end of the header fields, and an
optional message-body."
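The layout quoted above (start-line, header fields, an empty line, optional body) can be taken apart in a few lines of Python 3. This is a minimal sketch, assuming the complete message is already buffered as bytes with CRLF line endings; `split_http_message` is a hypothetical helper, not a library function.

```python
def split_http_message(raw):
    """Split a raw HTTP message into (start_line, header_block, body).

    Hypothetical helper: assumes CRLF line endings and that the whole
    message is already in memory.
    """
    # The empty line (CRLF CRLF) marks the end of the header fields.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line terminating the header fields")
    # The first line of the head is the start-line; the rest are headers.
    start_line, _, header_block = head.partition(b"\r\n")
    return start_line, header_block, body


raw = (b"GET http://www.example.com/index.html HTTP/1.1\r\n"
       b"Host: www.example.com\r\n"
       b"Accept: text/html\r\n"
       b"\r\n")
start, headers, body = split_http_message(raw)
print(start)   # b'GET http://www.example.com/index.html HTTP/1.1'
print(body)    # b''
```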

But my understanding of RFC (2)822 is that there is no such thing as a
"start-line" in that format. So the "email" module is right to treat the
HTTP "start-line" as a header, and the start-line should be stripped out
before feeding it the remainder of the message, which _is_ in (2)822
format.

Am I (don't laugh) missing something here?
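Stripping the start-line and handing the rest to the `email` package, as proposed above, looks roughly like this in Python 3. A sketch assuming an in-memory request with CRLF line endings; `email.policy.HTTP` is the standard-library policy meant for HTTP traffic.

```python
import email
import email.policy

raw = ("GET http://www.example.com/index.html HTTP/1.1\r\n"
       "Host: www.example.com\r\n"
       "User-Agent: example/0.1\r\n"
       "\r\n")

# Remove the start-line first: RFC 2822 has no such thing, so the
# email parser would otherwise misread it.
start_line, _, remainder = raw.partition("\r\n")

# What is left is header fields, an empty line, and an optional body.
msg = email.message_from_string(remainder, policy=email.policy.HTTP)

print(start_line)         # GET http://www.example.com/index.html HTTP/1.1
print(msg["Host"])        # www.example.com
print(msg["User-Agent"])  # example/0.1
```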

Chris Gray
Jul 18 '05 #1
Chris Gray <cp****@library.uwaterloo.ca> writes:
> My initial question is, what Python library do I use to parse HTTP
> messages?


httplib
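`httplib` was renamed `http.client` in Python 3. Its response parser reads from a socket, so one workaround for parsing bytes already in memory is to pass it an object whose `makefile()` returns a buffer. A sketch under that assumption; `_FakeSocket` is a hypothetical stand-in, and note that `http.client` parses responses, not requests.

```python
import http.client
import io


class _FakeSocket:
    """Hypothetical stand-in: HTTPResponse only calls makefile() on it."""

    def __init__(self, data):
        self._data = data

    def makefile(self, *args, **kwargs):
        return io.BytesIO(self._data)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")

resp = http.client.HTTPResponse(_FakeSocket(raw))
resp.begin()                           # parses the status line and headers
print(resp.status, resp.reason)        # 200 OK
print(resp.getheader("Content-Type"))  # text/plain
print(resp.read())                     # b'hello'
```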

Jul 18 '05 #2
Chris Gray:
> what Python library do I use to parse HTTP messages?
http://www.python.org/doc/2.2.3/lib/module-httplib.html

Or a library depending on the format of the payload, if that's what you
mean. For example, if the message contains HTML:

http://www.python.org/doc/2.2.3/lib/...TMLParser.html
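In Python 3 the HTML parser module lives at `html.parser`. A minimal sketch of feeding it an HTML payload (the body left after the headers); `TitleParser` is a hypothetical example subclass that collects the page title.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical subclass: collect the text inside <title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical body of a response whose Content-Type is text/html.
body = "<html><head><title>Example page</title></head><body>Hi</body></html>"
parser = TitleParser()
parser.feed(body)
parser.close()
print(parser.title)  # Example page
```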
> When I parse an HTTP request using the email module


What?

--
René Pijlman
Jul 18 '05 #3
Chris Gray wrote:
> My initial question is, what Python library do I use to parse HTTP
> messages?
mimetools.Message is a good choice. httplib.HTTPMessage is a slightly
better choice (it's a subclass of mimetools.Message; see the httplib.py
source code for more info).
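`mimetools` no longer exists in Python 3; there `http.client.HTTPMessage` subclasses `email.message.Message` instead. A sketch, assuming the start-line has already been removed, that parses a header block into that class with the standard `email.parser.Parser`; `get_all` keeps repeated fields such as multiple `Accept` lines.

```python
import email.parser
import http.client

header_block = ("Host: www.example.com\r\n"
                "Accept: text/html\r\n"
                "Accept: text/plain\r\n"
                "\r\n")

# Build an http.client.HTTPMessage (a subclass of email.message.Message)
# from the header fields.
parser = email.parser.Parser(_class=http.client.HTTPMessage)
headers = parser.parsestr(header_block)

print(isinstance(headers, http.client.HTTPMessage))  # True
print(headers["Host"])            # www.example.com
print(headers.get_all("Accept"))  # ['text/html', 'text/plain']
```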
> But my understanding of RFC (2)822 is that there is no such thing as a
> "start-line" in that format, and so the "email" module is right in trying
> to treat the HTTP "start-line" as a header and that that start-line should
> be stripped out before feeding it the remainder of the message which _is_
> in (2)822 format.
>
> Am I (don't laugh) missing something here?


not really, as long as "stripped out" means "processed", not "ignored"
(the start line contains the HTTP method and the target URL)
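Processing the start-line rather than discarding it, as suggested above, mostly means splitting it into its three space-separated parts (RFC 2616 section 5.1: Method SP Request-URI SP HTTP-Version CRLF). A minimal sketch; `parse_request_line` is a hypothetical helper, and `urllib.parse.urlsplit` breaks the absolute URI into host and path.

```python
from urllib.parse import urlsplit


def parse_request_line(line):
    """Split an HTTP request line into (method, target, version).

    Hypothetical helper, following RFC 2616 section 5.1:
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    """
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line: %r" % line)
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP version: %r" % version)
    return method, target, version


method, target, version = parse_request_line(
    "GET http://www.example.com/index.html HTTP/1.1\r\n")
url = urlsplit(target)       # absolute URI, as a proxy would receive it
print(method, version)       # GET HTTP/1.1
print(url.netloc, url.path)  # www.example.com /index.html
```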

</F>


Jul 18 '05 #4


