
white spaces in uploaded html file

Hi all,
on one of my sites I want to let users upload an HTML file, from
which I want to extract everything inside the <body> tags.
The upload works fine:

<form id="uploadform" action="index.php" method="post"
enctype="multipart/form-data">
<input type="file" name="Datei" size="30"/>
<input type="submit"/>
</form>

Then I want to parse the uploaded file with:

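<?php
// Proceed only if a file was uploaded and PHP reported no upload error
if (isset($_FILES['Datei']) && $_FILES['Datei']['error'] === UPLOAD_ERR_OK) {
    // Read the raw bytes of the uploaded temporary file
    $buffer = file_get_contents($_FILES['Datei']['tmp_name']);
    echo "body: " . $buffer . "\n";
}
?>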

I get a weird result:
body: ’ž< h t m l < h e a d < t i t l e < / t i t l e .....
So there seem to be some white spaces between every character.

And then there is no way to find the <body>-tag.
Neither
echo "sub: ".strpos($buffer, "< b o d y")."\n";
nor
echo "sub: ".strpos($buffer, "<body")."\n";
works. Both show no result.

Can anybody explain this to me? How can I parse the file to extract
everything within the <body> tags (preferably without the white
spaces)?

Thanks a lot,
Langi

Jul 22 '06 #1
3 Replies


On Sun, 23 Jul 2006 01:16:49 +0200, Matthias Langbein
<ma***************@web.de> wrote:
>I get a weird result:
>body: ’ž< h t m l < h e a d < t i t l e < / t i t l e .....
>So there seem to be some white spaces between every character.

They're not spaces; that's UTF-16 encoded (with a leading BOM character).

What encoding is the original page in? What was the file you uploaded edited
with? You may also want to look at the accept-charset attribute of the <form>
element.
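
To illustrate why the strpos() calls find nothing, here is a minimal sketch. It assumes the mbstring extension is available and that the file is little-endian UTF-16 with a BOM; the sample string is made up for illustration and is not the poster's actual file.

```php
<?php
// Build a UTF-16LE sample with a leading BOM (FF FE), similar to what an
// editor might produce when saving as "Unicode". The markup is invented.
$sample = "\xFF\xFE" . mb_convert_encoding('<html><body>Hi</body></html>', 'UTF-16LE', 'UTF-8');

// In UTF-16LE every ASCII character is followed by a NUL byte, so the
// bytes of "<body" are never contiguous in the buffer.
var_dump(strpos($sample, '<body'));            // bool(false)

// The gaps are NUL bytes (0x00), not spaces (0x20), so this fails as well.
var_dump(strpos($sample, '< b o d y'));        // bool(false)

// Searching with explicit NUL bytes does match: 2 BOM bytes plus
// 12 bytes for "<html>" put the match at offset 14.
var_dump(strpos($sample, "<\0b\0o\0d\0y\0"));  // int(14)
```

When echoed to a browser, the NUL bytes tend to show up as gaps between the characters, and the two BOM bytes show up as the odd characters at the start.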

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jul 22 '06 #2

Thanks, that was the problem. I saved the file as UTF-8 and it
worked.

Are you sure that the accept-charset attribute works? I set it to
UTF-8 and to UTF-16 and uploaded a file in UTF-8 and in UTF-16, and the
upload worked in all four cases. So I don't really think that this
constraint is implemented.

Unfortunately I have to accept both UTF-8 and UTF-16 encoded files. Is
there a way to convert the stream to UTF-8?

Thanks for your help!!

Jul 23 '06 #3

sp**@outolempi.net || Gedoon-S @ IRCnet || rot13(xv***@bhgbyrzcv.arg)
"Matthias Langbein" <ma***************@web.de> wrote in message
news:nr********************************@4ax.com...
>Thanks, that was the problem. I saved the file as UTF-8 and it
>worked.
>
>Are you sure that the accept-charset attribute works? I set it to
>UTF-8 and to UTF-16 and uploaded a file in UTF-8 and in UTF-16, and the
>upload worked in all four cases. So I don't really think that this
>constraint is implemented.
>
>Unfortunately I have to accept both UTF-8 and UTF-16 encoded files. Is
>there a way to convert the stream to UTF-8?

http://php.net/manual/en/ref.mbstring.php

$file = mb_convert_encoding($file, 'UTF-8', 'auto');
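
One caveat: as far as I know, 'auto' expands to the configured detection order, which by default does not seem to include UTF-16, so it may be safer to check for the BOM yourself. Below is a rough sketch of that idea, followed by the <body> extraction asked about earlier. The helper names to_utf8() and extract_body() are made up for this example, and treating a file without a BOM as UTF-8 is an assumption, not something established in this thread.

```php
<?php
// Hypothetical helper (not from the thread): convert raw file contents to
// UTF-8 by looking at the byte order mark at the start of the data.
function to_utf8(string $raw): string
{
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return substr($raw, 3);   // UTF-8 with BOM: just strip the BOM
    }
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        // Little-endian UTF-16: drop the BOM, then convert
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        // Big-endian UTF-16: drop the BOM, then convert
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    return $raw;                  // Assumption: no BOM means already UTF-8
}

// Hypothetical helper: return whatever sits between <body ...> and </body>,
// or null if no such pair is found. A regex is enough for a sketch; a real
// HTML parser would cope better with malformed markup.
function extract_body(string $html): ?string
{
    if (preg_match('~<body[^>]*>(.*?)</body>~is', $html, $m)) {
        return $m[1];
    }
    return null;
}
```

In the upload handler this could be used roughly as `$body = extract_body(to_utf8($buffer));`, where $buffer is the result of file_get_contents() on the temporary file.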

--
"Ohjelmoija on organismi joka muuttaa kofeiinia koodiksi" - lpk
http://outolempi.net/ahdistus/ - Satunnaisesti päivittyvä nettisarjis
Jul 24 '06 #4
