white spaces in uploaded html file

Hi all,
on one of my sites I want to let users upload an HTML file and then
extract everything between its <body> tags. The upload itself works fine:

<form id="uploadform" action="index.php" method="post"
enctype="multipart/form-data">
<input type="file" name="Datei" size="30"/>
<input type="submit"/>
</form>

Then I try to parse the uploaded file with:

<?php
// Only read the file if the upload succeeded.
if (isset($_FILES['Datei']) && $_FILES['Datei']['error'] === UPLOAD_ERR_OK) {
    $buffer = file_get_contents($_FILES['Datei']['tmp_name']);
    echo "body: " . $buffer . "\n";
}
?>

I get a weird result:
body: ’ž< h t m l < h e a d < t i t l e < / t i t l e .....
So there seem to be spaces between all the characters.

And there is no way to find the <body> tag.
Neither
echo "sub: ".strpos($buffer, "< b o d y")."\n";
nor
echo "sub: ".strpos($buffer, "<body")."\n";
works; strpos() finds nothing in either case.

Can anybody explain this to me? How can I parse the file to extract
everything within the <body> tags (ideally without the spaces)?

Thanks a lot,
Langi

Jul 22 '06 #1
On Sun, 23 Jul 2006 01:16:49 +0200, Matthias Langbein
<ma***************@web.de> wrote:
>I get a weird result:
>body: ’ž< h t m l < h e a d < t i t l e < / t i t l e .....
>So there seem to be some white spaces between every character.

They're not spaces; that's UTF-16-encoded text, with a leading byte-order mark (BOM).
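To illustrate the point, here is a small sketch (not from the thread) that builds the bytes of a UTF-16LE file by hand: the BOM FF FE, then each ASCII character stored as two bytes, the second of which is zero. Those zero bytes are what showed up as "spaces". It also shows why neither strpos() call in the question matched: the needle has to be encoded the same way as the data. The sample markup is made up for illustration.

```php
<?php
// Hypothetical sample: BOM (FF FE) followed by this markup encoded as UTF-16LE.
$ascii = '<html><body>Hi</body></html>';
$raw = "\xFF\xFE";
for ($i = 0; $i < strlen($ascii); $i++) {
    // UTF-16LE stores an ASCII character as the byte itself followed by 0x00.
    $raw .= $ascii[$i] . "\x00";
}

// BOM first, then '<' 00 'h' 00 't' 00 ...
echo bin2hex(substr($raw, 0, 8)), "\n"; // fffe3c0068007400

// A plain needle, or one with real spaces (0x20), does not occur in the data...
var_dump(strpos($raw, '<body'));     // bool(false)
var_dump(strpos($raw, '< b o d y')); // bool(false)

// ...but the same needle encoded as UTF-16LE does.
$needle = "<\x00b\x00o\x00d\x00y\x00";
var_dump(strpos($raw, $needle));     // int(14)
```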

What encoding is the original page in? What editor was the uploaded file
saved with? You may also want to look at the accept-charset attribute of the
<form> element.

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jul 22 '06 #2
Thanks, that was the problem. I saved the file as UTF-8 and it
worked.

Are you sure that the accept-charset attribute works? I set it to
UTF-8 and to UTF-16 and uploaded both a UTF-8 and a UTF-16 file, and the
upload worked in all four cases. So I don't really think this
constraint is implemented.
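That observation is expected: accept-charset only tells the browser which character encoding to use for the form's text fields. The contents of an uploaded file are sent as raw bytes, so the attribute cannot reject or convert a UTF-16 file, and any check has to happen on the server. A minimal sketch, assuming the uploaded files carry a BOM (editors on Windows usually write one for UTF-16, but that is an assumption); the helper name detect_bom_encoding() is made up for this example.

```php
<?php
// Hypothetical helper: guess a file's encoding from its byte-order mark.
// Returns null when there is no BOM; the caller then has to decide (e.g. assume UTF-8).
// Note: a UTF-32LE BOM (FF FE 00 00) would also match the UTF-16LE test; UTF-32 is ignored here.
function detect_bom_encoding(string $bytes): ?string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    return null;
}

// Usage in the upload handler from the question (field name 'Datei').
if (isset($_FILES['Datei']) && $_FILES['Datei']['error'] === UPLOAD_ERR_OK) {
    $buffer = file_get_contents($_FILES['Datei']['tmp_name']);
    $encoding = detect_bom_encoding($buffer) ?? 'UTF-8 (no BOM, assumed)';
    echo "detected: $encoding\n";
}
```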

Unfortunately I have to accept both UTF-8 and UTF-16 encoded files. Is
there a way to convert the stream to UTF-8?

Thanks for your help!!

Jul 23 '06 #3
sp**@outolempi.net || Gedoon-S @ IRCnet || rot13(xv***@bhgbyrzcv.arg)
"Matthias Langbein" <ma***************@web.de> wrote in message
news:nr********************************@4ax.com...
> Unfortunately I have to accept UTF-8 and UTF-16 encoded files. Is
> there a way to convert the stream to UTF-8?

http://php.net/manual/en/ref.mbstring.php

$file = mb_convert_encoding($file, 'UTF-8', 'auto');
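One caveat with 'auto': it expands to a detection list that depends on the mbstring.language setting, and as far as I know none of those lists include UTF-16, so 'auto' will probably not recognise the uploaded UTF-16 files. Naming the source encoding explicitly is safer. The sketch below takes it from the BOM (assuming UTF-8 when there is none), converts to UTF-8, and then pulls out the <body> contents. The function name to_utf8_body() and the regular expression are illustrative choices, not something from the thread; a regex is fine for simple pages, while an HTML parser such as DOMDocument is more robust.

```php
<?php
// Hypothetical helper: decode an uploaded HTML file to UTF-8 and return
// what is between <body ...> and </body>, or null if no body is found.
function to_utf8_body(string $bytes): ?string
{
    // Pick the source encoding from the byte-order mark, then strip the BOM.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $from = 'UTF-8';
        $bytes = substr($bytes, 3);
    } elseif (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $from = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        $from = 'UTF-16BE';
        $bytes = substr($bytes, 2);
    } else {
        $from = 'UTF-8'; // assumption: no BOM means the file is already UTF-8
    }

    // Name the source encoding explicitly instead of relying on 'auto'.
    $text = $from === 'UTF-8' ? $bytes : mb_convert_encoding($bytes, 'UTF-8', $from);

    // i: case-insensitive tags; s: '.' also matches newlines;
    // greedy (.*) runs to the last </body> in the document.
    if (preg_match('~<body\b[^>]*>(.*)</body>~is', $text, $m)) {
        return $m[1];
    }
    return null;
}

// Usage: a UTF-16LE file with BOM, as produced in the question.
$sample = "\xFF\xFE" . mb_convert_encoding('<html><body class="x">Hallo</body></html>', 'UTF-16LE', 'UTF-8');
echo to_utf8_body($sample), "\n"; // Hallo
```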

--
"A programmer is an organism that turns caffeine into code" - lpk
http://outolempi.net/ahdistus/ - An occasionally updated web comic
Jul 24 '06 #4
