Parsing Multipart formdata

Greetings,
I am writing my own web server and am having some problems parsing the
multipart/form-data stream that browsers send.

I have a form that looks something like this:

<form action="process.dll" method="post" enctype="multipart/form-data">
<input type="file" name="fileupload">
</form>

So when I choose a local file in the browser and click submit, the form
is posted to process.dll.

The browser sends a POST request to the server, with headers that look
something like this:

-------------Start REQUEST Headers--------------
Content-Length : 28624
Content-Type : multipart/form-data; boundary=---------------------------3765104465873
Connection : keep-alive
Cookie : SESSION=cPnKc7PmT8wdsy+:ccPnKlJF1Af1d
Host : localhost:9000
Referer : http://localhost:80/ajaxupload.html
User-Agent : Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
url : /backend/fileupload/test
Accept-Language : en-us,en;q=0.5
Accept-Charset : ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept : text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding : gzip,deflate
Keep-Alive : 300
method : POST
-----------------------------3765104465873
Content-Disposition: form-data; name="filename"; filename="review form.doc"
Content-Type: application/msword

Some binary content blah blah

-----------------------------3765104465873--

I can get my stream reader to read up to the application/msword line, in
other words the beginning of the binary stream. However, I have no way to
know how many bytes to read in, i.e. the length of the binary content of
the current part.

Please note I have no access to the ASP.NET libraries, as I am using my own
web server.

Any hints and/or comments are appreciated.


Regards,

Nov 15 '06 #1
The other part of "multipart" is MIME. If you Google various MIME
details, you can find lots more information.

Basically, it works like this:

1. Read the request headers and look for the "boundary" parameter of
Content-Type.

2. Read the string out of the boundary parameter.

3. Keep reading until you find the delimiter line, which is two hyphens
followed by the boundary string. In this case the boundary is
"---------------------------3765104465873"; it's normally going to
be a whole pile of hyphen characters followed by a number, just like
that.

4. Start reading the part header, save that for future reference.

5. Keep reading the part header until you find a blank line. The MIME
and HTTP specs say every line ends with \r\n (carriage return, then
newline), so the part header ends at \r\n\r\n. Not every sender is
careful about that, so it doesn't hurt to tolerate a bare \n as well.

6. Start reading data into a string buffer.

7. Stop reading into the string buffer when you see the boundary again.

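8. If the part header says Content-Transfer-Encoding: base64,
Un-Base64-encode the contents of your string buffer. This gives you an
array of bytes, which is your binary data; the framework has a Base64
codec (System.Convert.FromBase64String). If there is no such header,
which is the usual case for browser uploads, the part is raw bytes, so
collect it in a byte buffer in step 6 instead of a string and skip the
decoding.

As you see, a part carries no length of its own; the boundary is what
tells you where the data ends. Base64, when it is used, is an ASCII text
representation of the binary data. Rip the data out from between the
part header and the boundary, decode it if needed, and you have your
stream of bytes, which you can write to disk or whatever.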
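The steps above can be sketched roughly as follows. This is only an illustration in Python (chosen for brevity; the same byte-level approach should carry over to a .NET byte array), and it assumes the whole request body is already in memory as bytes and that the part data is raw rather than Base64. The names get_boundary and split_parts are made up for this example.

```python
import re


def get_boundary(content_type: str) -> bytes:
    """Pull the boundary parameter out of a Content-Type header value."""
    match = re.search(r'boundary="?([^";]+)"?', content_type, re.IGNORECASE)
    if match is None:
        raise ValueError("no boundary parameter in Content-Type")
    return match.group(1).encode("ascii")


def split_parts(body: bytes, boundary: bytes):
    """Return a list of (headers, content) tuples, one per part.

    Assumes CRLF line endings and a complete body held in memory.
    """
    # Each delimiter is CRLF + "--" + boundary. Prepending CRLF lets the
    # very first delimiter be handled like all the others.
    delimiter = b"\r\n--" + boundary
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:          # chunks[0] is the (usually empty) preamble
        if chunk.startswith(b"--"):   # "--boundary--" closes the message
            break
        head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.strip().split(b"\r\n"):
            if not line:
                continue
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, content))  # content stays as untouched bytes
    return parts
```

Because the data is never converted to text, a file containing arbitrary bytes comes out exactly as it went in. A server that cannot buffer the whole body would need a streaming variant of the same search.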

HTH. Please ask questions if any of those steps don't make sense to
you. I have done this many times, likely as have many others who read
this board. There are lots of little nuances that can make or break
your application.
Stephan

Nov 16 '06 #2
Hello Stephan,

I can get my code to do up to step 5 of your algorithm, which is to
read in the headers, e.g. filename and content type. After this, there
will be a \r and a \n, which make up 2 bytes.

Now, after reading this \r\n, which separates the part header from the
part data, my stream position is at the BEGINNING of the binary stream,
or Base64-encoded stream as you mentioned.

And here is where I want to clarify something:
> 6. Start reading data into a string buffer.
So I can just have a string object and then call buffer.readline? And
for each line I check whether it contains the boundary?

Will my data integrity be broken, e.g. file corruption, if I read the
stream in as a string, then Base64-decode it, then convert it to a byte
array?

If reading the stream in as a string doesn't break the data integrity, I
think it should be good, because I can just read the whole thing in with
stream.ReadToEnd(), then use a regular expression or string.split to
split the multipart body into its parts.

Kind regards,

Nov 16 '06 #3
Cu********@gmail.com wrote:
> I can get my code to do up to step 5 of your algorithm, which is to
> read in the headers, e.g. filename and content type. After this, there
> will be a \r and a \n, which make up 2 bytes.
In general, be a little wary about line breaks. The specs call for
\r\n, and browsers send that, so if you're designing a system that will
read only from browsers, you're fine. If you're reading from other
sources too, you may not always get \r\n; some of them send a bare \n.

That's a small concern, though.
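For what it's worth, tolerating both styles of line break when looking for the blank line that ends a header block might look like the sketch below. It is Python for illustration only, and split_header_block is a made-up name, not anything from a library.

```python
def split_header_block(data: bytes):
    """Split data at the first blank line.

    Returns (header_lines, rest), or None when no blank line has arrived
    yet. Accepts both CRLF CRLF and a bare LF LF as the separator.
    """
    candidates = []
    for separator in (b"\r\n\r\n", b"\n\n"):
        position = data.find(separator)
        if position != -1:
            candidates.append((position, len(separator)))
    if not candidates:
        return None
    # Whichever separator shows up first ends the header block.
    position, length = min(candidates)
    header_lines = data[:position].splitlines()
    return header_lines, data[position + length:]
```

Only the header block is treated as lines here. Everything after the blank line is handed back untouched, since line breaks inside the file data are just data.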
> So I can just have a string object and then call buffer.readline? And
> for each line I check whether it contains the boundary?
Yes, but you probably want to use a StringBuilder.
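> Will my data integrity be broken, e.g. file corruption, if I read the
> stream in as a string, then Base64-decode it, then convert it to a byte
> array?
That'll work fine as long as the part really is Base64. Just make sure
you're feeding only the Base64-encoded data to the decoder or it'll
choke. If the browser sent the file as raw bytes, reading them into a
string can mangle them unless you use a single-byte encoding such as
Latin-1, so a byte array is the safer bet there.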
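The integrity question is easy to check with a small experiment. The sketch below is Python, for illustration only, and survives_text_round_trip is a made-up helper; it pushes every possible byte value through a text encoding and back and reports whether the bytes survived.

```python
import base64


def survives_text_round_trip(data: bytes, encoding: str) -> bool:
    """Decode bytes to text and encode them again; True if nothing changed."""
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace") == data


every_byte = bytes(range(256))

# Bytes 0x80-0xFF are not valid UTF-8 on their own, so they get replaced
# during decoding and the original values are lost.
utf8_ok = survives_text_round_trip(every_byte, "utf-8")

# Latin-1 maps each byte to exactly one character, so nothing is lost.
latin1_ok = survives_text_round_trip(every_byte, "latin-1")

# Base64 text is plain ASCII and always decodes back to the original bytes.
base64_ok = base64.b64decode(base64.b64encode(every_byte)) == every_byte
```

Here utf8_ok ends up False while latin1_ok and base64_ok end up True, which is why the choice of encoding matters whenever raw file data passes through a string.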
> If reading the stream in as a string doesn't break the data integrity, I
> think it should be good, because I can just read the whole thing in with
> stream.ReadToEnd(), then use a regular expression or string.split to
> split the multipart body into its parts.
You can do that.

In fact, you can ReadToEnd() from the beginning if you want and regex
or split into pieces. Since System.Convert doesn't have stream
processing methods, you're going to end up putting everything into a
big string anyway.

You may be able to specify a single regex that'll take the whole
message and split out just the Base64 stuff, or find such a regex on
the Internet somewhere. If I wasn't behind on my project deadline, I'd
write you one myself as it seems like an interesting problem.
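As a rough idea of what such a regex could look like, here is a sketch in Python that works on bytes, so it makes no assumption about Base64. The helper name part_pattern is invented for this example, and it assumes CRLF line endings and that every part has at least one header line.

```python
import re


def part_pattern(boundary: bytes):
    """Build a bytes regex matching one part (headers plus content)."""
    delimiter = re.escape(b"--" + boundary)
    return re.compile(
        delimiter
        + rb"\r\n(?P<headers>.*?)\r\n\r\n"   # part header, up to the blank line
        + rb"(?P<content>.*?)"               # part data, as few bytes as possible
        + rb"\r\n(?=" + delimiter + rb")",   # stop just before the next delimiter
        re.DOTALL,                           # let "." match \r and \n too
    )
```

Calling part_pattern(boundary).finditer(body) then yields one match per part, with the data in match.group("content"). The lookahead leaves each delimiter in place so it can open the next match, and the closing "--boundary--" never matches because no line break follows the boundary there.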
Stephan

Nov 16 '06 #4
