white spaces in uploaded html file

Hi all,
On one of my sites I want to let users upload an HTML file, from which
I want to extract everything within the <body> tags.
The upload works fine:

<form id="uploadform" action="index.php" method="post"
enctype="multipart/form-data">
<input type="file" name="Datei" size="30"/>
<input type="submit"/>
</form>

Then I want to parse the uploaded file with:

<?php
if (isset($_FILES['Datei']) and !$_FILES['Datei']['error']) {
$buffer = file_get_contents($_FILES['Datei']['tmp_name']);
echo "body: ".$buffer."\n";
}
?>

I get a weird result:
body: ÿþ< h t m l < h e a d < t i t l e < / t i t l e .....
So there seem to be some white spaces between every character.

And then there is no way to find the <body>-tag.
Neither
echo "sub: ".strpos($buffer, "< b o d y")."\n";
nor
echo "sub: ".strpos($buffer, "<body")."\n";
works. Both show no result.

Can anybody explain this to me? How can I parse the file to extract
everything within the <body> tags (preferably without the white
spaces)?

Thanks a lot,
Langi

Jul 22 '06 #1
On Sun, 23 Jul 2006 01:16:49 +0200, Matthias Langbein
<ma***************@web.de> wrote:
> I get a weird result:
> body: ÿþ< h t m l < h e a d < t i t l e < / t i t l e .....
> So there seem to be some white spaces between every character.

They're not spaces; that's UTF-16 encoded (with a leading BOM character).

What encoding is the original page in? What was the file you uploaded edited
with? You may also want to look at the accept-charset attribute of the <form>
element.
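
To see why the output looks spaced out and why neither strpos() call finds anything, here is a minimal sketch. It assumes the mbstring extension is available; the variable names are only for illustration. The "ÿþ" at the start is the UTF-16 little-endian BOM (bytes FF FE) shown as Latin-1, and the "gaps" are NUL bytes (0x00), not spaces (0x20), which is why searching for "< b o d y" fails as well.

```php
<?php
// Build the UTF-16LE form of a short HTML fragment, preceded by the BOM (FF FE).
$utf8  = '<body>';
$utf16 = "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// Every ASCII character is followed by a NUL byte in UTF-16LE.
echo bin2hex($utf16), "\n"; // fffe3c0062006f00640079003e00

// A plain byte search for "<body" cannot match: the bytes are interleaved with NULs.
var_dump(strpos($utf16, '<body')); // bool(false)

// Searching for the UTF-16LE form of the needle matches at byte offset 2 (after the BOM).
$needle = mb_convert_encoding('<body', 'UTF-16LE', 'UTF-8');
var_dump(strpos($utf16, $needle)); // int(2)
```

Note also that strpos() returns false when nothing is found, and echoing false prints an empty string, which matches the "no result" described above.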

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Jul 22 '06 #2
Thanks, that was the problem. I saved the file as UTF-8 and it
worked.

Are you sure that the accept-charset attribute works? I set it to
UTF-8 and then to UTF-16, uploaded a file in each of UTF-8 and UTF-16,
and the upload worked in all four cases. So I don't really think that
this constraint is implemented.

Unfortunately I have to accept both UTF-8 and UTF-16 encoded files. Is
there a way to convert the stream to UTF-8?

Thanks for your help!!

Jul 23 '06 #3
sp**@outolempi.net || Gedoon-S @ IRCnet || rot13(xv***@bhgbyrzcv.arg)
"Matthias Langbein" <ma***************@web.de> wrote in message
news:nr********************************@4ax.com...
> Thanks, that was the problem. I saved the file as UTF-8 and it
> worked.
>
> Are you sure that the accept-charset attribute works? I set it to
> UTF-8 and then to UTF-16, uploaded a file in each of UTF-8 and UTF-16,
> and the upload worked in all four cases. So I don't really think that
> this constraint is implemented.
>
> Unfortunately I have to accept both UTF-8 and UTF-16 encoded files. Is
> there a way to convert the stream to UTF-8?

http://php.net/manual/en/ref.mbstring.php

$file = mb_convert_encoding($file, 'UTF-8', 'auto');
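
One caveat: 'auto' relies on the configured detection order, which may not include UTF-16, so detection can fail for exactly the files in question. Since the uploaded file starts with a BOM, checking for it explicitly is a more predictable approach. The following is only a sketch under stated assumptions: PHP 7.1 or later with mbstring, files without a BOM are treated as already being UTF-8, and to_utf8() and extract_body() are hypothetical helper names, not library functions.

```php
<?php
// Hypothetical helper: convert a buffer to UTF-8 based on its leading BOM.
// Assumption: a buffer without a BOM is already UTF-8 (or plain ASCII).
function to_utf8(string $buffer): string
{
    if (strncmp($buffer, "\xEF\xBB\xBF", 3) === 0) {
        return substr($buffer, 3); // UTF-8 BOM: just strip it
    }
    if (strncmp($buffer, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($buffer, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($buffer, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($buffer, 2), 'UTF-8', 'UTF-16BE');
    }
    return $buffer;
}

// Hypothetical helper: return the text between <body ...> and </body>,
// or null if no such pair is found. A regex is fragile for unusual markup;
// it is used here only to keep the sketch short.
function extract_body(string $html): ?string
{
    if (preg_match('~<body\b[^>]*>(.*?)</body\s*>~is', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Demo with an in-memory UTF-16LE sample instead of a real upload.
$sample = "\xFF\xFE" . mb_convert_encoding(
    '<html><body class="x">Hi</body></html>',
    'UTF-16LE',
    'UTF-8'
);
echo extract_body(to_utf8($sample)), "\n"; // Hi
```

In the upload handler from the first post, the same two calls would be applied to the result of file_get_contents($_FILES['Datei']['tmp_name']).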

--
"Ohjelmoija on organismi joka muuttaa kofeiinia koodiksi" - lpk
http://outolempi.net/ahdistus/ - Satunnaisesti päivittyvä nettisarjis
Jul 24 '06 #4
