Parsing Multipart formdata

Greetings,
I am writing my own web server and am having some problems parsing the
multipart/form-data stream that is sent from browsers.

I have a form that looks something like this:

<form action="process.dll">
<input type="file" name="fileupload">
</form>

So when I choose a local file in the browser and click submit, it
takes me to process.dll.

The browser sends a POST request to the server with headers that
look something like this:

-------------Start REQUEST Headers--------------
Content-Length : 28624
Content-Type : multipart/form-data; boundary=---------------------------3765104465873
Connection : keep-alive
Cookie : SESSION=cPnKc7P mT8wdsy+:ccPnKl JF1Af1d
Host : localhost:9000
Referer : http://localhost:80/ajaxupload.html
User-Agent : Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
url : /backend/fileupload/test
Accept-Language : en-us,en;q=0.5
Accept-Charset : ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept : text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding : gzip,deflate
Keep-Alive : 300
method : POST
-----------------------------3765104465873
Content-Disposition: form-data; name="filename"; filename="review form.doc"
Content-Type: application/msword

Some binary content blah blah

-----------------------------3765104465873--

I can get my stream reader to read up to application/msword, in other
words to the beginning of the binary stream. However, I have no way of
knowing how many bytes to read, that is, the length of the binary
content of the current part.

Please note that I have no access to the ASP.NET libraries, as I am
using my own web server.

Any hints and/or comments are appreciated.


Regards,

Nov 15 '06 #1
The other part of "multipart" is MIME. If you Google various MIME
details, you can find lots more information.

Basically, it works like this:

1. Read the header, look for the "boundary" tag.

2. Read the string out of the boundary tag.

3. Keep reading past the header until you find the boundary line: two
hyphens followed by the string from the boundary tag. In this case the
string is "---------------------------3765104465873"; it's normally
going to be a whole pile of hyphen characters followed by a number,
just like that.

4. Start reading the part header, save that for future reference.

5. Keep reading the part header until you find a blank line, which
marks the end of the header. The specs call for \r\n (carriage return,
then newline) as the line ending, though it doesn't hurt to tolerate a
bare \n as well.

6. Start reading data into a string buffer.

7. Stop reading into the string buffer when you see the boundary again.

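8. If the part header carries "Content-Transfer-Encoding: base64",
un-Base64-encode the contents of your buffer. This should give you an
array of bytes. The framework has a Base64 codec for this
(System.Convert.FromBase64String).

Check before you decode, though. Mail-style MIME parts are often
Base64, which is an ASCII text representation of the binary data, but
browsers normally send form-data file parts as raw bytes with no
transfer encoding, and your part header doesn't declare one. In that
case there is nothing to decode, and the data should be collected in a
byte buffer rather than a string. Either way, rip it out of the headers
to get your stream of bytes, then you can write them to disk or
whatever.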
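
The steps above can be sketched roughly as follows. This is only an illustration, written in Python rather than .NET for brevity; the function name `parse_multipart` is made up for the example. It assumes the whole request body is already in memory as bytes, that the sender uses \r\n line endings, and that the boundary string never occurs inside the file data. It decodes Base64 only when a part declares that transfer encoding and otherwise returns the raw bytes.

```python
import base64


def parse_multipart(content_type, body):
    """Split a multipart body (bytes) into (headers, content) pairs."""
    # Steps 1-2: pull the boundary string out of the Content-Type header.
    boundary = None
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "boundary":
            boundary = value.strip('"').encode("ascii")
    if boundary is None:
        raise ValueError("no boundary parameter in Content-Type")

    # Step 3: each delimiter line is two hyphens plus the boundary string.
    delimiter = b"--" + boundary
    parts = []
    # Skip whatever precedes the first delimiter, then walk the sections.
    for section in body.split(delimiter)[1:]:
        if section.startswith(b"--"):
            break  # closing delimiter: the boundary followed by "--"
        # Steps 4-5: the part header ends at the first blank line.
        raw_headers, sep, content = section.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("part has no header/body separator")
        headers = {}
        for line in raw_headers.strip().split(b"\r\n"):
            if not line:
                continue
            key, _, value = line.decode("latin-1").partition(":")
            headers[key.strip().lower()] = value.strip()
        # Steps 6-7: the CRLF right before the next delimiter is not data.
        if content.endswith(b"\r\n"):
            content = content[:-2]
        # Step 8: decode only if the part says it is Base64.
        if headers.get("content-transfer-encoding", "").lower() == "base64":
            content = base64.b64decode(content)
        parts.append((headers, content))
    return parts
```

Note that the length of a part is never announced anywhere; it falls out of where the next delimiter is found.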

HTH. Please ask questions if any of those steps don't make sense to
you. I have done this many times, likely as have many others who read
this board. There are lots of little nuances that can make or break
your application.
Stephan

Nov 16 '06 #2
Hello Stephan,

I can get my code to do up to step 5 of your algorithm, which is to
read in the part headers, e.g. filename and content type. After this,
there will be a \r and a \n, which make up 2 bytes.

Now, after reading the \r\n that separates the part header from the
part data, my stream position is at the BEGINNING of the binary stream,
or Base64-encoded stream as you mentioned.

And here is where I want to clarify something:
> 6. Start reading data into a string buffer.
So can I just have a string object and then call buffer.ReadLine()? And
for each line, I check whether it contains the boundary?

Will my data integrity be broken, e.g. file corruption, if I read the
stream in as a string, then Base64-decode it, then convert it to a byte
array?

If reading the stream in as a string doesn't break the data integrity,
I think it should be good, because I can just read the whole thing in
with stream.ReadToEnd() and then use a regular expression or
string.Split to split the multipart body into its parts.

Kind regards,

Nov 16 '06 #3
Cu********@gmail.com wrote:
> I can get my code to do up to step 5 of your algorithm, which is to
> read in the part headers, e.g. filename and content type. After this,
> there will be a \r and a \n, which make up 2 bytes.

In general, be a little wary about line breaks. The specs call for
\r\n, so if you're designing a system that will read only from one
well-behaved source, you're fine. If you're reading from more than one
source, you may not always get \r\n; a sloppy client might send a bare
\n instead.

That's a small concern, though.
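
If you do want to be tolerant here, one defensive way to find the end of a part header is sketched below, in Python for brevity. The helper name `split_header_body` is hypothetical, and it assumes the part (header plus data) is already in a bytes object. It accepts either \r\n\r\n or a bare \n\n as the blank line and uses whichever comes first.

```python
def split_header_body(part):
    """Return (header_bytes, body_bytes) for one part, or None if no blank line."""
    candidates = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = part.find(sep)
        if pos != -1:
            candidates.append((pos, sep))
    if not candidates:
        return None
    # The earliest blank line ends the header; later ones belong to the data.
    pos, sep = min(candidates)
    return part[:pos], part[pos + len(sep):]
```

Only the first blank line is treated as the separator, so blank lines inside the file data are left alone.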
> So can I just have a string object and then call buffer.ReadLine()? And
> for each line, I check whether it contains the boundary?

Yes, but you probably want to use a StringBuilder.
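
A rough sketch of that line-by-line loop, done on bytes rather than strings so it also works for raw binary parts. It is in Python for brevity, the name `read_part_body` is made up, and it assumes the stream is positioned just after the part header's blank line and that no data line happens to start with the delimiter.

```python
import io


def read_part_body(stream, boundary):
    """Read one part's data from a binary stream, up to the next delimiter.

    Returns (data, is_last), where is_last is True for the closing delimiter.
    """
    delimiter = b"--" + boundary
    chunks = []
    pending = b""  # line ending held back until we know it is data
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("stream ended before the closing boundary")
        if line.startswith(delimiter):
            # The held-back CRLF belonged to the delimiter, so drop it.
            is_last = line[len(delimiter):].startswith(b"--")
            return b"".join(chunks), is_last
        chunks.append(pending)
        if line.endswith(b"\r\n"):
            chunks.append(line[:-2])
            pending = b"\r\n"
        else:
            chunks.append(line)
            pending = b""


# Usage with an in-memory stream standing in for the socket:
sample = io.BytesIO(b"line1\r\n\x00\xff\nmore\r\n--XYZ--\r\n")
data, is_last = read_part_body(sample, b"XYZ")
```

Holding back the last line ending matters because the \r\n right before the delimiter is part of the delimiter, not part of the file.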
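> Will my data integrity be broken, e.g. file corruption, if I read the
> stream in as a string, then Base64-decode it, then convert it to a byte
> array?

That'll work fine as long as the part really is Base64 text. Just make
sure you're feeding only the Base64-encoded data to the decoder or
it'll choke. If the browser sent the file as raw bytes, though, reading
it in as a string can mangle it, so in that case keep it in a byte
array and look for the boundary in the bytes.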
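
The integrity question can be checked with a small experiment. The sketch below is Python, used here only because it is short; the variable names are made up. It pushes every possible byte value through a text decoding and back. The analogous risk in .NET is a text reader decoding the upload as characters before you get to see the bytes.

```python
import base64

raw = bytes(range(256))  # every possible byte value, as a binary file might contain

# Decoding arbitrary bytes as UTF-8 replaces invalid sequences, so data is lost.
as_text = raw.decode("utf-8", errors="replace")
utf8_round_trip = as_text.encode("utf-8")

# Latin-1 maps each byte to exactly one character, so it round-trips.
latin1_round_trip = raw.decode("latin-1").encode("latin-1")

# Base64 text is pure ASCII, so it survives an ASCII-compatible decoding.
b64_text = base64.b64encode(raw).decode("utf-8")
b64_round_trip = base64.b64decode(b64_text)

print(utf8_round_trip == raw, latin1_round_trip == raw, b64_round_trip == raw)
```

So going through a string is safe for Base64 text, and for raw bytes only with a one-byte-per-character encoding; staying in bytes avoids the question entirely.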
> If reading the stream in as a string doesn't break the data integrity,
> I think it should be good, because I can just read the whole thing in
> with stream.ReadToEnd() and then use a regular expression or
> string.Split to split the multipart body into its parts.

You can do that.

In fact, you can ReadToEnd() from the beginning if you want and regex
or split into pieces. Since System.Convert doesn't have stream
processing methods, you're going to end up putting everything into a
big string anyway.

You may be able to specify a single regex that'll take the whole
message and split out just the Base64 stuff, or find such a regex on
the Internet somewhere. If I wasn't behind on my project deadline, I'd
write you one myself as it seems like an interesting problem.
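
For what it's worth, a single pattern along those lines might look like the sketch below. It is Python, not .NET, and matches on bytes so that it works for raw binary parts as well as Base64 ones. The function name `extract_part_bodies` is made up, and it assumes \r\n line endings and a body small enough to hold in memory; lazy matching over a large upload could be slow.

```python
import re


def extract_part_bodies(body, boundary):
    """Return the raw content of every part in a multipart body (bytes)."""
    delim = re.escape(b"--" + boundary)
    pattern = re.compile(
        delim + rb"\r\n"           # opening delimiter line
        rb"(?:[^\r\n]+\r\n)*"      # part header lines
        rb"\r\n"                   # blank line ending the header
        rb"(.*?)"                  # part content, as short as possible
        rb"\r\n(?=" + delim + rb")",  # stop at the CRLF before the next delimiter
        re.DOTALL,
    )
    return pattern.findall(body)
```

The lookahead leaves the next delimiter unconsumed, so consecutive parts are each matched in turn, and the closing delimiter never starts a match because it is followed by "--" instead of a line break.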
Stephan

Nov 16 '06 #4
