How to upload form data containing special characters correctly?

Hello,

I was wondering if there are a few good pages and/or examples on how to
process form data correctly for putting it in a MySQL DB.

Since I'm not used to using PHP a lot, I already found out that
addslashes() can be used to escape some characters, but I'm having some
more problems with for instance , and (since the text is scientific).
Now some people also throw in htmlspecialchars() to convert those to
HTML entities, but some nest htmlspecialchars() in addslashes() and
others do the opposite.

Is there a good and error-proof way of ensuring that what one puts in a
textarea gets stored and can be retrieved safe and sound?

Thanks in advance,

Wimmy

--
Being owned by someone used to be called slavery.
Now it's called commitment.
Sep 4 '06 #1


Try this:

- $input_string = 'some text with special characters';
- $input_string = base64_encode($input_string);
- write $input_string to the database,
- read it back from the database (here called $stored_string),
- $output_string = base64_decode($stored_string);

Hope it helps.

Sep 4 '06 #2
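A minimal sketch of the base64 round trip above, assuming PHP 7 or later. The function names are hypothetical, and the database write and read are replaced by a plain variable so the example runs on its own.

```php
<?php
// Sketch of the base64 round trip; the database is simulated
// with a variable holding the stored value.

function encode_for_storage(string $text): string
{
    // base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it
    // contains no quotes or backslashes that could break an SQL literal.
    return base64_encode($text);
}

function decode_from_storage(string $stored): string
{
    // In strict mode base64_decode() returns false for invalid input.
    $decoded = base64_decode($stored, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Stored value is not valid base64');
    }
    return $decoded;
}

$input  = "He said \"5 \u{00B5}m\" and it's a \\ backslash";
$stored = encode_for_storage($input);   // what would be written to the table
$output = decode_from_storage($stored); // what comes back after reading it

echo $stored, "\n";
```

Keep the trade-offs in mind: the encoded value is about a third larger than the original (4 output bytes for every 3 input bytes), it can't be searched with LIKE, and other tools reading the table (such as the OpenOffice/MyODBC setup mentioned later in the thread) will only see base64 text. It also preserves whatever bytes the browser sent, so the page charset still has to be right.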

Use Unicode for everything. Set UTF-8 encoding for your database, save the
pages in UTF-8, and tell the browsers in every imaginable way that you
are providing the content as UTF-8. It's not exactly an easy process, but I
recommend trying it.

--
"A programmer is an organism that turns caffeine into code" - lpk
http://outolempi.net/ahdistus/ - An occasionally updated web comic
sp**@outolempi.net || Gedoon-S @ IRCnet || rot13(xv***@bhgbyrzcv.arg)
Sep 4 '06 #3
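One part of the "UTF-8 everywhere" advice that is easy to check in PHP is whether the submitted text really arrived as UTF-8. A minimal sketch, assuming the page (and therefore the form) is served as UTF-8; the helper name and the `body` field name are hypothetical. It relies on preg_match() with the `u` modifier failing on malformed UTF-8; if the mbstring extension is installed, mb_check_encoding($text, 'UTF-8') does the same check.

```php
<?php
// Hypothetical helper: true if $text is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (not 0) when the
// subject is not valid UTF-8, so an empty pattern is enough.
function is_valid_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// 'body' is an assumed textarea name, not taken from the original form;
// the fallback string only makes the sketch runnable from the command line.
$body = $_POST['body'] ?? "Temperature: 25 \u{00B0}C";

if (is_valid_utf8($body)) {
    echo "OK, store it\n";
} else {
    // Usually means the page (and so the form) was not served as UTF-8.
    echo "Rejected: the text did not arrive as UTF-8\n";
}
```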

You'll need to select the correct character set for MySQL. It might be
UTF-8, as some have suggested, but you might find another charset more
applicable. See the MySQL doc and comp.databases.mysql newsgroup for
more info on MySQL topics.

Also, rather than use addslashes() you should use
mysql_real_escape_string() to escape your characters.

You shouldn't use htmlspecialchars() for storing data into the database;
that's a display issue, not a storage issue. You should only use it
when displaying data (if necessary).

And also ensure you're using the correct character set on your html page
to display the data.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Sep 4 '06 #4
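A sketch of the "escape at display time" rule: the text is stored exactly as typed and only passed through htmlspecialchars() when it is written into an HTML page. The helper name is hypothetical, and the 'UTF-8' argument assumes the page is served as UTF-8.

```php
<?php
// Store the raw text; escape only when writing it into an HTML page.
// render_comment() is a hypothetical helper for illustration.
function render_comment(string $storedText): string
{
    // ENT_QUOTES also escapes single quotes; the explicit 'UTF-8'
    // argument should match the charset the page is served with.
    $safe = htmlspecialchars($storedText, ENT_QUOTES, 'UTF-8');
    // nl2br() keeps the line breaks typed into the textarea visible.
    return '<p>' . nl2br($safe) . '</p>';
}

$fromDatabase = "x < y & \"z\"\nsecond line";
echo render_comment($fromDatabase), "\n";
```

Because nothing was converted on the way in, the same stored text can also be used unchanged by non-HTML consumers such as reports or ODBC clients.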

Jerry Stuckle wrote:
You'll need to select the correct character set for MySQL. It might be
utf-8, as some have suggested, but you might find another charaset more
applicable. See the MySQL doc and comp.databases.mysql newsgroup for
more info on mysql topics.
Well, I've been hearing for a while that UTF-8 is the best for all that
stuff, so the tables and DBs are all in utf8_general_ci. (Does anyone know
the difference between that and utf8_bin, and what's utf8_unicode_ci
doing in that list?)
Also, rather than use addslashes() you should use
mysql_real_escape_string() to escape your characters.
Some like the other better, there are still discussions going on... :-)
http://www.sitepoint.com/forums/showthread.php?t=337881
You shouldn't use htmlspecialchars() for storing data into the database;
that's a display issue, not a storage issue. You should only use it
when displaying data (if necessary).
The fact is that the data does not really need to be displayed in a
webpage; this is just for uploading. I'd rather use OpenOffice with
MyODBC to edit the data when needed and use a report to display it.
And also ensure you're using the correct character set on your html page
to display the data.
I guess this is the case.
The header contains <meta http-equiv="content-type"
content="application/xhtml+xml; charset=utf-8" />

Now I'm going to try this and I'll let you know the outcome.

Thanks a bunch,

Wimmy
Sep 4 '06 #5

I found these user comments in the PHP manual under htmlspecialchars();
I think they might help.

Also, if you need to save special characters, I suggest turning off magic quotes, which suppresses
the backslashes PHP normally adds, with set_magic_quotes_runtime(0);

After inspecting the non-native encoding problem, I noticed that, for example, if the encoding is
Cyrillic and I write Latin characters that are not part of the encoding (for example, the
ae ligature æ), the browser will send the real entity, such as &aelig; in this case.
Therefore, the only way I see to display multilingual text that is encoded with entities is by:
<?php
echo str_replace('&amp;', '&', htmlspecialchars($txt));
?>
The regex for numeric entities will skip the Latin-1 textual entities.



A sample function, in case anybody wants to turn HTML entities (and special characters) back into plain characters
(e.g. "&egrave;", "&lt;", etc.):
function html2specialchars($str){
$trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
return strtr($str, $trans_table);
}
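For comparison, a sketch that puts the html2specialchars() helper (with the constant name written as HTML_ENTITIES) next to PHP's built-in html_entity_decode(). The translation table only contains named entities, so a numeric entity such as &#1073; is left untouched, while html_entity_decode() converts both kinds. The decode_entities() name is hypothetical; UTF-8 output is assumed.

```php
<?php
// The html2specialchars() helper quoted above, with the constant name fixed.
function html2specialchars(string $str): string
{
    $trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
    return strtr($str, $trans_table);
}

// Built-in alternative: html_entity_decode() also understands numeric
// entities such as &#1073;, which the translation table does not contain.
function decode_entities(string $str): string
{
    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$sample = '&egrave; &lt; &#1073;';
echo html2specialchars($sample), "\n"; // numeric entity left as-is
echo decode_entities($sample), "\n";   // all three entities decoded
```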


Quite often, on HTML pages that are not encoded as UTF-8, when people write in a non-native encoding,
some browsers (Internet Explorer, for sure) will send characters from a different charset as HTML entities,
such as &#1073; for the small Russian 'b'.
htmlspecialchars() will break such an entity, since it changes every & to &amp;.
What I usually do is either turn &amp; back into & so the correct characters appear in the
output, or use a regex to turn the escaped numeric entities back into real entities:
<?php
// treat this as pseudo-code, it hasn't been tested...
$result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source);
?>

Why &#039;? The HTML and XML DTDs proposed &apos; for this.
See http://www.w3.org/TR/html/dtds.html#...ial_characters
So better use this:
$text = htmlspecialchars($text, ENT_QUOTES);
$text = preg_replace('/&#0*39;/', '&apos;', $text);
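Since PHP 5.4, htmlspecialchars() takes a document-type flag, and with ENT_XHTML, ENT_XML1 or ENT_HTML5 it writes &apos; for single quotes itself, so the preg_replace() step above is not needed. HTML 4.01 has no &apos; entity, which is why the default ENT_HTML401 mode uses &#039;. A sketch, assuming an XHTML page like the one Wim describes earlier in the thread; the helper name is hypothetical.

```php
<?php
// htmlspecialchars() picks the single-quote entity from the doctype flag.
function escape_for_xhtml(string $text): string
{
    // ENT_XHTML: single quotes become &apos; (ENT_HTML401 would give &#039;).
    return htmlspecialchars($text, ENT_QUOTES | ENT_XHTML, 'UTF-8');
}

echo escape_for_xhtml("It's <b>bold</b>"), "\n";
```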

Sep 4 '06 #6

Wim Cossement wrote:
Jerry Stuckle wrote:
You'll need to select the correct character set for MySQL. It might
be UTF-8, as some have suggested, but you might find another charset
more applicable. See the MySQL doc and comp.databases.mysql newsgroup
for more info on MySQL topics.


Well, I've been hearing for a while that UTF-8 is the best for all that
stuff, so the tables and DBs are all in utf8_general_ci. (Does anyone know
the difference between that and utf8_bin, and what's utf8_unicode_ci
doing in that list?)
Those are some people's opinions. And remember, they are opinions. Some
people know what they're talking about, and some don't. Take anything
you get on the internet (including this) with a grain of salt.

Personally, I use the characterset which matches my data. This may or
may not be utf-8.
Also, rather than use addslashes() you should use
mysql_real_escape_string() to escape your characters.


Some like the other better, there are still discussions going on... :-)
http://www.sitepoint.com/forums/showthread.php?t=337881
Not much discussion. addslashes() is a PHP function which escapes
certain characters. mysql_real_escape_string() is a MySQL function to
escape the characters necessary to place the data in a MySQL database
using the current charset.

mysql_real_escape_string needs no special processing when reading the
data out - the data is exactly as it was before mysql_real_escape_string
was called. That is not the case for addslashes().
You shouldn't use htmlspecialchars() for storing data into the
database; that's a display issue, not a storage issue. You should
only use it when displaying data (if necessary).


The fact is that the data does not really need to be displayed in a
webpage; this is just for uploading. I'd rather use OpenOffice with
MyODBC to edit the data when needed and use a report to display it.
That's fine. So don't use htmlspecialchars() at all then.
And also ensure you're using the correct character set on your html
page to display the data.


I guess this is the case.
The header contains <meta http-equiv="content-type"
content="application/xhtml+xml; charset=utf-8" />

Now I'm going to try this and I'll let you know the outcome.

Thanks a bunch,

Wimmy

--
Jerry Stuckle
Sep 4 '06 #7
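With current PHP the addslashes() versus mysql_real_escape_string() question mostly goes away: the mysql_* extension was deprecated in PHP 5.5 and removed in PHP 7.0 (mysqli_real_escape_string() still exists), and prepared statements send the text separately from the SQL, so nothing has to be escaped by hand. A minimal sketch with PDO, assuming PHP 7.1+ and the pdo_sqlite extension; in-memory SQLite stands in for MySQL only so the example runs without a server. Table, column and function names are hypothetical. For MySQL itself, note that its `utf8` character set stores at most 3 bytes per character, while `utf8mb4` covers all of Unicode.

```php
<?php
// Prepared statements: the text travels separately from the SQL, so
// quotes, backslashes and non-ASCII characters need no manual escaping.
// Against MySQL the connection would look like this (host, database
// name and credentials are placeholders):
//   new PDO('mysql:host=localhost;dbname=example;charset=utf8mb4', $user, $pass);

function store_text(PDO $db, string $text): int
{
    $stmt = $db->prepare('INSERT INTO entries (body) VALUES (:body)');
    $stmt->execute([':body' => $text]);
    return (int) $db->lastInsertId();
}

function load_text(PDO $db, int $id): ?string
{
    $stmt = $db->prepare('SELECT body FROM entries WHERE id = :id');
    $stmt->bindValue(':id', $id, PDO::PARAM_INT);
    $stmt->execute();
    $value = $stmt->fetchColumn(); // false when no row matches
    return $value === false ? null : $value;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)');

$original = "O'Brien wrote \u{201C}\u{00B5} \u{2248} 3\u{201D} in C:\\temp";
$id = store_text($db, $original);
echo load_text($db, $id), "\n";
```

The value read back is byte-for-byte the value that was stored, so no stripslashes() or other post-processing is needed when reading.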

Hi again,

I must say I've tried all the suggested options but I still can't do a
proper upload.

There is one textarea where users must put in text about their subject
(more or less 2 formatted pages in a PDF/DOC document), so most (not to
say all) of them cut 'n' paste it from Acrobat/Word/OpenOffice into their
browser.

Most of them contain double quotes that are not escaped by addslashes() or
htmlspecialchars(); I've copied a few myself: "bla" "bla" "bla"

If I add an entry by hand in phpMyAdmin for instance and one field
contains these characters they are stored and displayed OK.
When I store the resulting page and look at it in vi those quoted bla's
are displayed as ~@~\bla~@~]

How do I get rid of those, since Thunderbird wants to convert the
message to UTF-8?

Is there a way to limit or convert the encoding used in a textarea?
Or is this more HTML related?

Regards,

Wimmy
Sep 5 '06 #8

Well, what Thunderbird does is completely client side and has nothing to
do with PHP. What charset do you have defined for the page?

And if they are cutting and pasting from a Word document or a PDF,
chances are the document itself has the special characters. For
instance, Word can use different characters for left and right double
quotes, depending on the version and releases.

Nothing in PHP or MySQL would handle such characters; you'll have to
handle them yourself, e.g. with str_replace().

--
Jerry Stuckle
Sep 5 '06 #9
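Jerry's str_replace() suggestion could look like the following sketch, assuming the page is UTF-8 and you really want plain ASCII punctuation in the stored text (a UTF-8 table can also store these characters as they are). The helper name and the exact list of characters are illustrative assumptions, not taken from the thread.

```php
<?php
// Hypothetical helper: replace common "smart" punctuation produced by
// Word/Acrobat with plain ASCII equivalents.
function ascii_punctuation(string $text): string
{
    $map = [
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2022}" => '*',   // bullet
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    return str_replace(array_keys($map), array_values($map), $text);
}

echo ascii_punctuation("\u{201C}bla\u{201D} \u{2022} it\u{2019}s"), "\n";
```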

Jerry Stuckle wrote:
Well, what Thunderbird does is completely client side and has nothing to
do with PHP. What charset do you have defined for the page?
The header contains <meta http-equiv="content-type"
content="application/xhtml+xml; charset=utf-8" />, so it should be UTF-8
And if they are cutting and pasting from a Word document or a PDF,
chances are the document itself has the special characters. For
instance, Word can use different characters for left and right double
quotes, depending on the version and releases.
Well, when I save the text with those weird things in a textfile with
UTF-8 encoding they are still there when I open it, so it must be a
character that exists in this character set.
But how do I determine which one it is specifically?

I've put an example here in case someone knows how to do it:
http://ultr23.vub.ac.be/~wcosseme/someFile.txt
Nothing in PHP or MySQL would handle such characters; you'll have to
handle them yourself, i.e. with str_replace().
Then I might be able to replace it, who knows...

Many cheers to the one that can do it!

Wimmy
Sep 5 '06 #10
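One way to answer "which character is it" is to dump the Unicode code points of the text. The vi output already gives a strong hint: vi shows bytes 0x80 to 0x9F as ~@ to ~_, so ~@~\ and ~@~] correspond to the byte pairs 80 9C and 80 9D. In UTF-8 these are very likely the tails of E2 80 9C and E2 80 9D, i.e. U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK (the leading E2 byte would usually show as a separate character). A sketch with hypothetical helper names, assuming valid UTF-8 input; with the mbstring extension (PHP 7.2+), mb_ord() could replace the manual decoding step.

```php
<?php
// Decode one UTF-8 encoded character (1 to 4 bytes) into its code point.
function utf8_code_point(string $char): int
{
    $b = array_values(unpack('C*', $char));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// List every character of a valid UTF-8 string as "U+XXXX (bytes in hex)".
function describe_utf8(string $text): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $out = [];
    foreach ($chars as $ch) {
        $out[] = sprintf('U+%04X (%s)', utf8_code_point($ch), strtoupper(bin2hex($ch)));
    }
    return $out;
}

// The byte sequences suspected from the vi output, around the word "bla".
print_r(describe_utf8("\xE2\x80\x9Cbla\xE2\x80\x9D"));
```

To inspect a saved sample, pass the file contents instead, e.g. describe_utf8(file_get_contents('someFile.txt')) on a local copy of the file.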

Hi, Wimmy,

That's going to be difficult. They're valid characters in UTF-8, but
who knows what they mean in Word or a PDF. They could be bullets,
left/right double quotes or a number of other special characters.

I don't have a conversion table available - there probably is one
somewhere on the net (maybe someone else can give some hints). I did
try a couple of google searches and found some editors which accept word
documents, but that's all. I didn't spend a lot of time on it, though.

Otherwise, you might get them to email you the word doc they're using
and you can try to figure out what each character means and replace it.
It might take a few tries to get all the characters, but it shouldn't
be that hard.

Sorry I can't be of more help.

--
Jerry Stuckle
Sep 5 '06 #11

"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:9Z******************************@comcast.com...
Hi, Wimmy,
[...]
I don't have a conversion table available - there probably is one
Maybe this can help you
http://www.unicode.org/

--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Sep 5 '06 #12

Petr Vileta wrote:
"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:9Z******************************@comcast.com...
Hi, Wimmy,
[...]
I don't have a conversion table available - there probably is one


Maybe this can help you
http://www.unicode.org/
Maybe I'm missing something, but I don't see anywhere on that site where
they indicate the special characters used by MS Word or PDF's.

--
Jerry Stuckle
Sep 6 '06 #13

mysqli_real_escape_string() or mysql_real_escape_string() should escape
all the characters that would affect MySQL.
Sep 6 '06 #14

"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:Yd******************************@comcast.com...
Petr Vileta wrote:
"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:9Z******************************@comcast.com...
Hi, Wimmy,
[...]
I don't have a conversion table available - there probably is one


Maybe this can help you
http://www.unicode.org/

Maybe I'm missing something, but I don't see anywhere on that site where
they indicate the special characters used by MS Word or PDF's.
If I remember right, you wrote this in a previous message:

<cite>
And if they are cutting and pasting from a Word document or a PDF,
chances are the document itself has the special characters. For
instance, Word can use different characters for left and right double
quotes, depending on the version and releases.
</cite>

As far as I know, all browsers (except Linx) convert characters from the
current system codepage to the charset of the current web page (defined by
the <meta> tag). If you define your web page as UTF-8, everything users cut
and paste must be converted by the browser. UTF-8 can represent all the
characters of windows-1250, windows-1252, koi8-r, kanji and other "exotic"
codepages.

--
Petr Vileta
Sep 6 '06 #15

Yes, the browsers convert the characters. But what does the character
"" mean in a Word document or a pdf? Is it a left or right quote? A
bullet? Something else?

That's what he needs to know, not the utf-8 codes.

--
Jerry Stuckle
Sep 6 '06 #16

"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:ob******************************@comcast.com...
Yes, the browsers convert the characters. But what does the character ""
mean in a Word document or a pdf? Is it a left or right quote? A bullet?
Something else?

That's what he needs to know, not the utf-8 codes.
If you see "" in Word or Acrobat (Reader), then this is a character. If you
see a quote or bullet, then this is a quote or bullet. The value of the
character code is not important at this moment. IMHO the user must cut the
text in Word or Acrobat and paste it into the browser, and at that moment the
whole clipboard content is converted to UTF-8 or UTF-16, as you have defined
in the HTML (PHP) page.
I have many pages where users paste text into a <textarea> from Word, Acrobat,
Corel and other applications, and I have no problem with converting characters,
because my pages are defined as Unicode (UTF-16) and my database too. For my
pages it is irrelevant whether the user is Czech, English, German, Japanese,
Russian or Martian :-)
My recommendation is: if you have problems with charsets, use Unicode
(UTF-16) everywhere.
--
Petr Vileta
Sep 6 '06 #17

The problem is it is not an "" in Word or a pdf. It could be the
internal code for a left or right double quote, a bullet, or whatever.
The browser cannot convert these characters - it has no idea what an ""
really is. All it knows is the UTF-8 or whatever code is.

Of course Word knows what it means, and if you post it into another Word
document you will get the right information. But paste it into any
non-Word application and you get "".

Therein lies the problem.

--
Jerry Stuckle
Sep 6 '06 #18


Jerry Stuckle wrote:
Petr Vileta wrote:
"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:ob******************************@comcast.com...
Yes, the browsers convert the characters. But what does the character
"" mean in a Word document or a pdf? Is it a left or right quote? A
bullet? Something else?

That's what he needs to know, not the utf-8 codes.
If you see "" in Word or Acrobat (reader) then this is a character. If
you see quote or bullet then this is a quote or bullet. Value of
character code is not important at this moment. IMHO user must cut text
in Word or Acrobat and paste in browser and at this moment all clipboard
content is converted to UTF-8 or UTF-16 as you have defined in html
(php) page.
I have many pages where users paste text into <textarea> from Word,
Acrobat, Corel and more applications and I have no problem with
converting characters because my pages are defined as unicode (UTF-16)
and my database too. For my pages is irrelevant if user is Czech,
English, German, Japanese, Russian or Martian :-)
My recommendations is: if you have problem with charsets, use unicode
(UTF-16) at all.

The problem is it is not an "" in Word or a pdf. It could be the
internal code for a left or right double quote, a bullet, or whatever.
The browser cannot convert these characters - it has no idea what an ""
really is. All it knows is the UTF-8 or whatever code is.
Actually, in my experience *and* in this context, this is not quite
true. When you select and copy a piece of text from
Acrobat/Word/whatever, the text itself gets copied to the
clipboard, not its internal representation from the original
application (if that were the case, you could never paste it as
plain text in the first place). As Petr said:
If you see "" in Word or Acrobat (reader) then this is a character. If
you see quote or bullet then this is a quote or bullet. Value of
character code is not important at this moment.
You can try it yourself by setting the encoding of your plain text
editor to UTF-8, then copying some bullets and other special characters
from Word and pasting them in your editor. Provided you have an
appropriate font for the used UTF-8 glyphs, you should see all of it
properly (including bullets and such).

Which brings up the issue you have with vi displaying garbage. Perhaps
your console just doesn't use a font that has the needed glyphs; I had
problems myself with vi (elvis, actually) and UTF-8/cp1250/cp1252
texts.

Anyway, this still doesn't solve your main problem - uploading these
characters to the database. Firstly, you should not rely on the META
tag alone to do the work - you should send an appropriate header. Put
something like this in your script before any of your output:
header("Content-type: text/html; charset=utf-8");

You should do this because if you don't, then your http server sends
this header for you, and that header may contain different charset
information. If the charset information is present in the header,
browsers will disregard the charset set in the meta tag of the document
(as indicated in the HTML specification). Also, if you use UTF-16, you
might want to send a BOM character before everything else (this one I
haven't tried personally, but it's recommended in the HTML
specification).

Also, you should use the 'accept-charset' attribute on the form tag.
This additionally specifies what charset your script expects from the
form, and most browsers will do their best to indulge it.

In my experience, you shouldn't rely on only one of these - it's best
that you use all three ways to specify the encoding (header, meta tag
and accept-charset attribute).

I hope this helps,

Vladislav

Sep 6 '06 #19
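The three places Vladislav lists (HTTP header, meta tag and accept-charset on the form) can be combined roughly as follows. This is a sketch, assuming the script produces the whole page; the helper name, the action URL and the field name are hypothetical. The header() call has to come before any output, including plain HTML outside the PHP tags.

```php
<?php
// 1. HTTP header: must be sent before anything else is output.
header('Content-Type: text/html; charset=utf-8');

// 2. Form with accept-charset, telling the browser what to submit.
// upload_form() is a hypothetical helper; $action is escaped because
// it is written into an HTML attribute.
function upload_form(string $action): string
{
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    return '<form method="post" action="' . $action . '" accept-charset="UTF-8">'
        . '<textarea name="body"></textarea>'
        . '<input type="submit" value="Upload" />'
        . '</form>';
}

echo upload_form('upload.php'), "\n";
```

The meta tag in the page's head (third item) can stay as it is; it also helps when a copy of the page is opened from disk, where there is no HTTP header.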

Hi boys and girls,

I've finally succeeded in getting those uncommon UTF-8 chars into my DB.

In the end the solution always seems rather obvious, but I'd like to
say thanks for pushing me in the right direction.

The thing that did the trick was putting
AddType 'text/html; charset=UTF-8' html
in .htaccess

Sending a PHP header with "Content-type: text/html; charset: utf-8"
resulted in an error message that headers had already been sent.

Now I'm still having a problem with a simple mail function that just
does not want to go, but, that's for later.

See you,

Wimmy

Sep 7 '06 #20

I'm glad to hear you solved your problem. But just in case you need it
anytime later, you received the error when you tried to send a header
through PHP because there was some output before the header. You should
always send headers first, before anything else goes out.

regards,
Vladislav

Sep 7 '06 #21

Hi Vladislav,

It was the 1st line of the PHP code, before preparing and executing the
SQL statement.

Before that there was just some common HTML, containing the <head> and
beginning of the <body>.

From my limited knowledge of HTTP I thought that the headers were
always the first data being sent by the webserver, and that this is done
automatically.

And I just found out that adding <? header("Content-type: text/html;
charset=utf-8"); ?> would do the same as specifying the header in the
Apache config or .htaccess file, if the HTTP headers from the website
would be turned off, right?

Regards,

Wimmy
Sep 7 '06 #22

"Wim Cossement" <wc******@nospam.bcol.be> wrote in
news:ed**********@snic.vub.ac.be...
Hi boys and girls,

I've finally succeded in getting those uncommon UTF-8 chars in my DB.
[...]
Now I'm still having a problem with a simple mail function that just does
not want to go, but, that's for later.
Maybe this can help you
http://cz.php.net/manual/en/function...cyr-string.php
This page is about conversion from one Cyrillic charset to another, but the
discussion below it has some tips.
--
Petr Vileta
Sep 7 '06 #23


Wim Cossement wrote:
malatestapunk wrote:

I'm glad to hear you solved your problem. But just in case you need it
anytime later, you received the error when you tried to send a header
through PHP because there was some output before the header. You should
always send headers first, before anything else goes out.

regards,
Vladislav

Hi Vladislav,
Hello Wim,

It was the 1st line of the PHP code, before preparing and executing the
SQL statement.

Before that there was just some common HTML, containing the <head> and
beginning of the <body>.

So there you go :)

Seriously, plain HTML is considered as output as well. Headers should
be sent before anything else leaves the server.

From my limited knowledge of HTTP I thought that the headers were
always the first data being sent by the webserver, and that this is done
automatically.

That's true. However, you can set headers from your server side script
as well. This can come in handy: for instance, when you can't (or
won't, or aren't allowed to) alter your server configuration.

And I just found out that adding <? header("Content-type: text/html;
charset=utf-8"); ?> would do the same as specifying the header in the
Apache config or .htaccess file, if the HTTP headers from the website
would be turned off, right?

Exactly. Since you solved your problem through server configuration,
you don't have to use it at all - I'm not insisting ;). I was only
pointing at what was the problem you encountered with headers
generally.

The same rule applies to cookies and sessions - you must initialize and
set them before you output anything.

Regards,

Wimmy
best wishes,
Vladislav

Sep 7 '06 #24

Seriously, plain HTML is considered as output as well. Headers should
be sent before anything else leaves the server.
Well, they never told us that in school :-)
Exactly. Since you solved your problem through server configuration,
you don't have to use it at all - I'm not insisting ;). I was only
pointing at what was the problem you encountered with headers
generally.

The same rule applies to cookies and sessions - you must initialize and
set them before you output anything.
Hmzz, I hope I never have to create anything that involves that, because
I'm quite a lousy programmer ;-)

Wim
Sep 8 '06 #25

Petr Vileta wrote:
"Wim Cossement" <wc******@nospam.bcol.be> wrote in
news:ed**********@snic.vub.ac.be...
Hi boys and girls,

I've finally succeded in getting those uncommon UTF-8 chars in my DB.
[...]
Now I'm still having a problem with a simple mail function that just
does not want to go, but, that's for later.
Maybe this can help you
http://cz.php.net/manual/en/function...cyr-string.php
This page is about conversion from one Cyrillic charset to another, but the
discussion below it has some tips.
Okidoki, I'll take a looksee.

Wim
Sep 8 '06 #26
