UTF-8 file reading and writing for PHP

I'm creating a page that:
- accepts user input in whatever language
- saves that input to a file
- reads the file and displays the original input

The following code successfully writes the user input to a file (when I
open the file, it's in the correct font), but I can't get PHP to read
the file and display the correct characters.

HTML --------------- Form
<FORM name=saveform method=post action="wiki.php">
File:
<TEXTAREA name=thetext rows=20 cols=30></TEXTAREA>
<INPUT type=submit>
<INPUT type=hidden name=action value="save">
</FORM>

PHP --------------- Sticks the data in a file
$message = $_REQUEST['thetext'];
echo $message; // This displays the correct stuff
$filename = "tmp/tmp.txt";
$fr = fopen($filename, "wb+");
// adding header
fwrite($fr, pack("CCC",0xef,0xbb,0xbf));
fputs($fr, $message);
fclose($fr);

PHP --------------- Read the data from the file
$thefile = file($filename);
array_shift($thefile); //To get rid of the BOM
$ret = "";
foreach ($thefile as $i => $line) {
$line = rtrim(utf8_decode($line));
$ret .= $line;
}
echo $ret; // This _doesn't_ display the correct stuff

Jun 3 '06 #1
HaggMan wrote:
> I'm creating a page that:
> - accepts user input in whatever language
> - saves that input to a file
> - reads the file and displays the original input
>
> The following code successfully writes the user input to a file (when I
> open the file, it's in the correct font), but I can't get PHP to read
> the file and display the correct characters.

[snip]

> PHP --------------- Sticks the data in a file
> $message = $_REQUEST['thetext'];
> echo $message; // This displays the correct stuff
> $filename = "tmp/tmp.txt";
> $fr = fopen($filename, "wb+");
> // adding header
> fwrite($fr, pack("CCC",0xef,0xbb,0xbf));

Is it safe to assume the data to be UTF-8?

If you just discard the byte order mark later, there's little reason to
add it (if there ever was).

> fputs($fr, $message);
> fclose($fr);
>
> PHP --------------- Read the data from the file
> $thefile = file($filename);
> array_shift($thefile); //To get rid of the BOM

The BOM is 3 bytes, but $thefile is an array of lines of text, each
terminated by a newline, so array_shift() drops the whole first line
rather than just the BOM.

> $ret = "";
> foreach ($thefile as $i => $line) {
> $line = rtrim(utf8_decode($line));

I am not sure what to make of this. If you expect the browser to send
data in UTF-8, then I assume you also serve your pages in UTF-8, so
why convert the text to ISO-8859-1?

> $ret .= $line;
> }
> echo $ret; // This _doesn't_ display the correct stuff

Start with simple file_put_contents and file_get_contents.
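
A minimal sketch of that suggestion, assuming PHP 7.1 or later. The helper names save_text() and load_text() are hypothetical, as is the temporary file; the original code only has tmp/tmp.txt. The bytes are written unchanged, with no BOM and no utf8_decode(), and a leading BOM is stripped on read only if one is really present:

```php
<?php
// Hypothetical helpers for illustration; the thread itself only names
// file_put_contents() and file_get_contents().

const UTF8_BOM = "\xEF\xBB\xBF";

// Write the submitted bytes unchanged: no BOM, no re-encoding.
function save_text(string $filename, string $text): void
{
    if (file_put_contents($filename, $text) === false) {
        throw new RuntimeException("Could not write $filename");
    }
}

// Read the bytes back and drop a leading BOM only if one is present.
function load_text(string $filename): string
{
    $text = file_get_contents($filename);
    if ($text === false) {
        throw new RuntimeException("Could not read $filename");
    }
    if (strncmp($text, UTF8_BOM, 3) === 0) {
        // Remove exactly three bytes, not the whole first line.
        $text = (string) substr($text, 3);
    }
    return $text;
}

// Assumed test data: an Arabic word, a newline, and an ASCII line.
$original = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}\nsecond line";

$filename = tempnam(sys_get_temp_dir(), 'utf8');
save_text($filename, $original);
$roundTrip = load_text($filename);
unlink($filename);

var_dump($roundTrip === $original);
```

If the round trip is byte-identical here but the page still shows question marks, the problem is more likely in how the page is served or post-processed than in the file handling.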
/Bent
Jun 3 '06 #2
Thanks for the reply...

My goal is to allow user input in UTF-8, in Arabic script, for example.
I then save what they input to a file. Then I'd like to retrieve and
print out the original stuff that they wrote.

I've tried various variations of utf8_encode() and utf8_decode() and
even without them, and every time, the resulting stuff is just ????? or
other weird characters.

Bent Stigsen wrote:
[snip]
Jun 3 '06 #3
HaggMan wrote:
> Thanks for the reply...
>
> My goal is to allow user input in UTF-8, in Arabic script, for example.
> I then save what they input to a file. Then I'd like to retrieve and
> print out the original stuff that they wrote.
>
> I've tried various variations of utf8_encode() and utf8_decode() and
> even without them, and every time, the resulting stuff is just ????? or
> other weird characters.

[snip]

If what you get sent is in UTF-8 and the page you send out is in
UTF-8, then you don't need to do anything.

Make sure what you get really is UTF-8 (e.g. with
mb_detect_encoding($thetext)).

Also make sure the browser is told it is UTF-8. (check headers,
metatags, xml-declaration)
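
Both checks can be sketched roughly as follows, assuming PHP 7 or later with the mbstring extension loaded. The is_valid_utf8() helper is a hypothetical name, and mb_check_encoding() is swapped in for mb_detect_encoding() because it validates against one named encoding instead of guessing among several:

```php
<?php
// Hypothetical helper; the thread itself only names mb_detect_encoding().
function is_valid_utf8(string $bytes): bool
{
    // True only if $bytes is a well-formed UTF-8 byte sequence.
    return mb_check_encoding($bytes, 'UTF-8');
}

// Tell the browser the response is UTF-8. Skipped on the command line,
// where there are no HTTP headers to send.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// 'thetext' is the form field name used in the original post.
$thetext = $_POST['thetext'] ?? '';
if (!is_string($thetext) || !is_valid_utf8($thetext)) {
    // Discarding bad input is an assumed policy, not something from the thread.
    $thetext = '';
}

// Naming the charset explicitly keeps multibyte characters intact on output.
echo htmlspecialchars($thetext, ENT_QUOTES, 'UTF-8');
```

The header() call covers the "headers" part of the checklist; a meta tag or XML declaration in the page itself would need to name the same charset.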
/Bent
Jun 4 '06 #4
HaggMan wrote:
> I'm creating a page that:
> - accepts user input in whatever language
> - saves that input to a file
> - reads the file and displays the original input
>
> The following code successfully writes the user input to a file (when I
> open the file, it's in the correct font), but I can't get PHP to read
> the file and display the correct characters.

<snip>

Save your (processing) PHP script in UTF-8.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jun 4 '06 #5
Thank you, both of you, for your help... I finally figured out what
does it:

The part I didn't mention (go figure, I thought it was harmless) is
that I was doing some str_replaces (with regex) on the UTF8 stuff, and
that is what messed up the output. So, when I accept the normal input,
save it to the file, and throw out the normal output, it comes out
exactly as I typed it (I'm testing with Arabic).

So I guess a followup question would be: How do I parse through UTF8
stuff with regexpressions? I'll do some research tomorrow.

Thank you again!

HaggMan

R. Rajesh Jeba Anbiah wrote:
[snip]
Jun 6 '06 #6
HaggMan wrote:
> <snip>
> So I guess a followup question would be: How do I parse through UTF8
> stuff with regexpressions? I'll do some research tomorrow.


You can use mb_ereg and the other mbstring functions
<http://in2.php.net/mbstring>.

You may also use hexadecimal representation with PCRE functions such
as preg_match() <11**********************@o13g2000cwo.googlegroups .com>
( http://groups.google.com/group/comp....4b602f9b5a78b? ),
but it will be more cumbersome.
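
A rough sketch of both routes, assuming PHP 7 or later with mbstring (including its regex support) available. The PCRE `u` modifier used below is not mentioned in this thread; it is swapped in here as a less cumbersome alternative to spelling out bytes in hexadecimal. The sample text is made up for illustration:

```php
<?php
mb_internal_encoding('UTF-8');   // applies to the mb_* string functions
mb_regex_encoding('UTF-8');      // applies to the mb_ereg* functions

// Assumed sample: a five-letter Arabic word followed by " world".
$text = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627} world";

// 1. PCRE with the `u` modifier: pattern and subject are treated as
//    UTF-8, so `.` matches a whole character instead of a single byte.
$firstChar = preg_match('/^./u', $text, $m) === 1 ? $m[0] : '';

// 2. A Unicode script property, used together with the `u` modifier.
$stripped = preg_replace('/\p{Arabic}+/u', '[ar]', $text);

// 3. The mbstring route named in the thread.
$viaMb = mb_ereg_replace('world', 'there', $text);

// For contrast: without `u`, `.` matches one byte and splits the
// first character in half, which is how output gets corrupted.
preg_match('/^./', $text, $bytewise);

var_dump(strlen($firstChar));     // bytes in the first character
var_dump(strlen($bytewise[0]));   // bytes matched without `u`
var_dump($stripped);
var_dump($viaMb);
```

The contrast case at the end is probably what happened to the earlier replacements: a byte-oriented pattern cut multibyte characters apart before they were written back out.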

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jun 6 '06 #7
