How to upload form data containing special characters correctly?

Hello,

I was wondering if there are a few good pages and/or examples on how to
process form data correctly for putting it in a MySQL DB.

Since I'm not used to using PHP a lot, I have so far only found out that
addslashes() can be used to escape some characters, but I'm having some
more problems with, for instance, š, Ś and Ķ (since the text is scientific).
Now some people also throw in htmlspecialchars() to convert those to
HTML entities, but some nest htmlspecialchars() in addslashes() and
others do the opposite.

Is there a good and error proof way of ensuring that what one puts in a
textarea gets stored and can be retrieved safe and sound?

Thanks in advance,

Wimmy

--
Being owned by someone used to be called slavery.
Now it's called commitment.
Sep 4 '06
Wim Cossement wrote:
>Jerry Stuckle wrote:
>>Well, what Thunderbird does is completely client side and has nothing
>>to do with PHP. What charset do you have defined for the page?
>
>The header contains <meta http-equiv="content-type"
>content="application/xhtml+xml; charset=utf-8" />, so it should be UTF-8.
>
>>And if they are cutting and pasting from a Word document or a PDF,
>>chances are the document itself has the special characters. For
>>instance, Word can use different characters for left and right double
>>quotes, depending on the version and releases.
>
>Well, when I save the text with those weird things in a text file with
>UTF-8 encoding they are still there when I open it, so it must be a
>character that exists in this character set.
>But how do I determine which one it is specifically?
>
>I've put an example here in case someone knows how to do it:
>http://ultr23.vub.ac.be/~wcosseme/someFile.txt
>
>>Nothing in PHP or MySQL would handle such characters; you'll have to
>>handle them yourself, i.e. with str_replace().
>
>Then I might be able to replace it, who knows...
>
>Many cheers to the one that can do it!
>
>Wimmy

Hi, Wimmy,

That's going to be difficult. They're valid characters in utf-8, but
who knows what they mean in Word or a pdf. They could be bullets,
left/right double quotes or a number of other special characters.
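
One way to at least find out which characters they are is to dump the Unicode code point of everything in the pasted text and look up the unfamiliar ones in the Unicode charts. A minimal sketch, assuming PHP 7.2 or later with the mbstring extension; describe_chars() is a made-up helper name and the sample string is invented, not taken from the linked file:

```php
<?php
// Hypothetical helper: one line per character, with code point and raw bytes.
function describe_chars(string $utf8): array
{
    // The "u" modifier makes preg_split() work on UTF-8 characters, not bytes.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // the input was not valid UTF-8
    }
    $lines = [];
    foreach ($chars as $ch) {
        // mb_ord() returns the code point, bin2hex() the UTF-8 bytes as stored.
        $lines[] = sprintf('U+%04X  %s  %s', mb_ord($ch, 'UTF-8'), bin2hex($ch), $ch);
    }
    return $lines;
}

// Invented sample: a word wrapped in Word-style curly quotes.
foreach (describe_chars("\u{201C}ok\u{201D}") as $line) {
    echo $line, "\n";
}
```

Anything that comes out above U+007F is a candidate for the conversion table; U+201C and U+201D, for example, are the left and right double quotation marks.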

I don't have a conversion table available - there probably is one
somewhere on the net (maybe someone else can give some hints). I did
try a couple of google searches and found some editors which accept word
documents, but that's all. I didn't spend a lot of time on it, though.

Otherwise, you might get them to email you the word doc they're using
and you can try to figure out what each character means and replace it.
It might take a few tries to get all the characters, but it shouldn't
be that hard.
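
Once you know which characters turn up, the replacement itself is short. A sketch of that idea, using strtr() with a lookup table instead of repeated str_replace() calls; the table covers only a few common Word punctuation marks, the function name is made up, and the "\u{...}" escapes assume PHP 7.0 or later:

```php
<?php
// Illustrative table only: extend it as new characters show up in real input.
const WORD_PUNCTUATION = [
    "\u{2018}" => "'",   // left single quote
    "\u{2019}" => "'",   // right single quote / apostrophe
    "\u{201C}" => '"',   // left double quote
    "\u{201D}" => '"',   // right double quote
    "\u{2022}" => '*',   // bullet
    "\u{2026}" => '...', // ellipsis
];

// Hypothetical helper: swap Word-style punctuation for plain ASCII.
function plain_punctuation(string $utf8): string
{
    // strtr() with an array replaces every key in one pass over the string.
    return strtr($utf8, WORD_PUNCTUATION);
}

echo plain_punctuation("\u{201C}It\u{2019}s fine\u{2026}\u{201D}"), "\n";
// prints: "It's fine..."
```

Whether to replace at all is a judgment call: if the page and the database are both UTF-8, storing the characters unchanged is also an option.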

Sorry I can't be of more help.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Sep 5 '06 #11
"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:9Z******************************@comcast.com...
>Hi, Wimmy,
>[...]
>I don't have a conversion table available - there probably is one

Maybe this can help you
http://www.unicode.org/

--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Sep 5 '06 #12
Petr Vileta wrote:
>"Jerry Stuckle" <js*******@attglobal.net> wrote in
>news:9Z******************************@comcast.com...
>>Hi, Wimmy,
>>[...]
>>I don't have a conversion table available - there probably is one
>
>Maybe this can help you
>http://www.unicode.org/

Maybe I'm missing something, but I don't see anywhere on that site where
they indicate the special characters used by MS Word or PDF's.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Sep 6 '06 #13
mysqli_real_escape_string() or mysql_real_escape_string() should escape
all the characters that would affect MySQL.
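
For what it's worth, the two kinds of escaping answer different questions, so neither has to be nested in the other: the database layer protects the SQL statement on the way in, and htmlspecialchars() protects the HTML on the way out. A hedged sketch using mysqli prepared statements, which sidestep manual escaping altogether (the old mysql_* functions were removed in PHP 7). The table `notes`, the column `body` and both function names are invented for illustration, and utf8mb4 assumes MySQL 5.5.3 or later:

```php
<?php
// Sketch only: "notes" and "body" are hypothetical table and column names.
function save_note(mysqli $db, string $text): bool
{
    // Tell MySQL that this connection sends and receives UTF-8.
    $db->set_charset('utf8mb4');

    // The placeholder keeps the text out of the SQL string entirely.
    $stmt = $db->prepare('INSERT INTO notes (body) VALUES (?)');
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('s', $text);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}

// Escape for HTML only when printing, never before storing.
function render_note(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo render_note('<b>š, Ś and Ķ</b>'), "\n";
// prints: &lt;b&gt;š, Ś and Ķ&lt;/b&gt;
```

Stored this way, the text in the database is exactly what was typed in the textarea, and the accented characters never need to become HTML entities.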
Sep 6 '06 #14
"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:Yd******************************@comcast.com...
>Petr Vileta wrote:
>>"Jerry Stuckle" <js*******@attglobal.net> wrote in
>>news:9Z******************************@comcast.com...
>>>Hi, Wimmy,
>>>[...]
>>>I don't have a conversion table available - there probably is one
>>
>>Maybe this can help you
>>http://www.unicode.org/
>
>Maybe I'm missing something, but I don't see anywhere on that site where
>they indicate the special characters used by MS Word or PDF's.

If I remember right, you wrote this in a previous message:

<cite>
And if they are cutting and pasting from a Word document or a PDF,
chances are the document itself has the special characters. For
instance, Word can use different characters for left and right double
quotes, depending on the version and releases.
</cite>

As far as I know all browsers (except Lynx) convert characters from the
current system codepage to the encoding of the web page (defined by the
<meta> tag). If you define your web page as UTF-8, everything the user
cuts and pastes must be converted by the browser. UTF-8 covers all the
characters of windows-1250, windows-1252, koi8-r, kanji and other
"exotic" codepages.

--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Sep 6 '06 #15
Petr Vileta wrote:
>"Jerry Stuckle" <js*******@attglobal.net> wrote in
>news:Yd******************************@comcast.com...
>>Petr Vileta wrote:
>>>"Jerry Stuckle" <js*******@attglobal.net> wrote in
>>>news:9Z******************************@comcast.com...
>>>>Hi, Wimmy,
>>>>[...]
>>>>I don't have a conversion table available - there probably is one
>>>
>>>Maybe this can help you
>>>http://www.unicode.org/
>>
>>Maybe I'm missing something, but I don't see anywhere on that site
>>where they indicate the special characters used by MS Word or PDF's.
>
>If I remember right, you wrote this in a previous message:
>
><cite>
>And if they are cutting and pasting from a Word document or a PDF,
>chances are the document itself has the special characters. For
>instance, Word can use different characters for left and right double
>quotes, depending on the version and releases.
></cite>
>
>As far as I know all browsers (except Lynx) convert characters from the
>current system codepage to the encoding of the web page (defined by the
><meta> tag). If you define your web page as UTF-8, everything the user
>cuts and pastes must be converted by the browser. UTF-8 covers all the
>characters of windows-1250, windows-1252, koi8-r, kanji and other
>"exotic" codepages.

Yes, the browsers convert the characters. But what does the character
"‚" mean in a Word document or a pdf? Is it a left or right quote? A
bullet? Something else?

That's what he needs to know, not the utf-8 codes.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Sep 6 '06 #16
"Jerry Stuckle" <js*******@attglobal.net> wrote in
news:ob******************************@comcast.com...
>Yes, the browsers convert the characters. But what does the character "‚"
>mean in a Word document or a pdf? Is it a left or right quote? A bullet?
>Something else?
>
>That's what he needs to know, not the utf-8 codes.

If you see "‚" in Word or Acrobat (Reader) then this is a character. If you
see a quote or bullet then this is a quote or bullet. The value of the
character code is not important at this moment. IMHO the user must cut text
in Word or Acrobat and paste it in the browser, and at this moment all
clipboard content is converted to UTF-8 or UTF-16, as you have defined in
the html (php) page.
I have many pages where users paste text into a <textarea> from Word, Acrobat,
Corel and more applications and I have no problem with converting characters
because my pages are defined as unicode (UTF-16) and my database too. For my
pages it is irrelevant if the user is Czech, English, German, Japanese,
Russian or Martian :-)
My recommendation is: if you have problems with charsets, use unicode
(UTF-16) everywhere.
--

Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
Sep 6 '06 #17
Petr Vileta wrote:
>"Jerry Stuckle" <js*******@attglobal.net> wrote in
>news:ob******************************@comcast.com...
>>Yes, the browsers convert the characters. But what does the character
>>"‚" mean in a Word document or a pdf? Is it a left or right quote? A
>>bullet? Something else?
>>
>>That's what he needs to know, not the utf-8 codes.
>
>If you see "‚" in Word or Acrobat (Reader) then this is a character. If
>you see a quote or bullet then this is a quote or bullet. The value of
>the character code is not important at this moment. IMHO the user must
>cut text in Word or Acrobat and paste it in the browser, and at this
>moment all clipboard content is converted to UTF-8 or UTF-16, as you
>have defined in the html (php) page.
>I have many pages where users paste text into a <textarea> from Word,
>Acrobat, Corel and more applications and I have no problem with
>converting characters because my pages are defined as unicode (UTF-16)
>and my database too. For my pages it is irrelevant if the user is Czech,
>English, German, Japanese, Russian or Martian :-)
>My recommendation is: if you have problems with charsets, use unicode
>(UTF-16) everywhere.

The problem is it is not an "‚" in Word or a pdf. It could be the
internal code for a left or right double quote, a bullet, or whatever.
The browser cannot convert these characters - it has no idea what an "‚"
really is. All it knows is what the UTF-8 (or whatever) code is.

Of course Word knows what it means, and if you post it into another Word
document you will get the right information. But paste it into any
non-Word application and you get "‚".

Therein lies the problem.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Sep 6 '06 #18

Jerry Stuckle wrote:
>Petr Vileta wrote:
>>"Jerry Stuckle" <js*******@attglobal.net> wrote in
>>news:ob******************************@comcast.com...
>>>Yes, the browsers convert the characters. But what does the character
>>>"‚" mean in a Word document or a pdf? Is it a left or right quote? A
>>>bullet? Something else?
>>>
>>>That's what he needs to know, not the utf-8 codes.
>>
>>If you see "‚" in Word or Acrobat (Reader) then this is a character. If
>>you see a quote or bullet then this is a quote or bullet. The value of
>>the character code is not important at this moment. IMHO the user must
>>cut text in Word or Acrobat and paste it in the browser, and at this
>>moment all clipboard content is converted to UTF-8 or UTF-16, as you
>>have defined in the html (php) page.
>>I have many pages where users paste text into a <textarea> from Word,
>>Acrobat, Corel and more applications and I have no problem with
>>converting characters because my pages are defined as unicode (UTF-16)
>>and my database too. For my pages it is irrelevant if the user is Czech,
>>English, German, Japanese, Russian or Martian :-)
>>My recommendation is: if you have problems with charsets, use unicode
>>(UTF-16) everywhere.
>
>The problem is it is not an "‚" in Word or a pdf. It could be the
>internal code for a left or right double quote, a bullet, or whatever.
>The browser cannot convert these characters - it has no idea what an "‚"
>really is. All it knows is what the UTF-8 (or whatever) code is.

Actually, in my experience *and* in this context, this is not quite
true. When you select and copy a piece of text from
Acrobat/Word/Whatever, then the text itself gets copied to the
clipboard, not its internal representation from the original
application (if that were the case, you could never paste it as
plain text in the first place). As Petr said:

>If you see "‚" in Word or Acrobat (Reader) then this is a character. If
>you see a quote or bullet then this is a quote or bullet. The value of
>the character code is not important at this moment.

You can try it yourself by setting the encoding of your plain text
editor to UTF-8, then copying some bullets and other special characters
from Word and pasting them in your editor. Provided you have an
appropriate font for the used UTF-8 glyphs, you should see all of it
properly (including bullets and such).

Which brings up the issue you have with vi displaying garbage. Perhaps
your console just doesn't use a font that has the needed glyphs - I had
problems myself with vi (elvis, actually) and UTF-8/cp1250/cp1252
texts.

Anyway, this still doesn't solve your main problem - uploading these
characters to the database. Firstly, you should not rely on the META
tag alone to do the work - you should send an appropriate header. Put
something like this in your script before any of your output:
header("Content-Type: text/html; charset=utf-8");

You should do this because if you don't, then your http server sends
this header for you, and that header may contain different charset
information. If the charset information is present in the header,
browsers will disregard the charset set in the meta tag of the document
(as indicated in the html specifications). Also, if you use utf16, you
might want to send a BOM character before everything else (this one I
haven't tried personally, but it's recommended in the html
specification).

Also, you should use the 'accept-charset' attribute on the form tag.
This additionally specifies what charset your script expects from the
form, and most browsers will do their best to honor it.

In my experience, you shouldn't rely on only one of these - it's best
that you use all three ways to specify the encoding (header, meta tag
and accept-charset attribute).
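
Putting the three together might look roughly like this. It is only a sketch: save.php, the field name and form_page() are placeholders, and note that the header parameter is written charset=utf-8, with an equals sign:

```php
<?php
// 1. HTTP header: must be sent before any output, even a blank line.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper that returns the page markup as a string.
function form_page(): string
{
    return <<<'HTML'
<!DOCTYPE html>
<html>
<head>
<!-- 2. Meta tag: repeats the charset inside the document itself. -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Add text</title>
</head>
<body>
<!-- 3. accept-charset: asks the browser to submit the form as UTF-8. -->
<form method="post" action="save.php" accept-charset="utf-8">
<textarea name="body" rows="10" cols="60"></textarea>
<input type="submit" value="Save">
</form>
</body>
</html>
HTML;
}

echo form_page();
```

The script file itself also has to be saved as UTF-8, otherwise any literal text in it will not match the declared charset.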

I hope this helps,

Vladislav

Sep 6 '06 #19
Hi boys and girls,

I've finally succeeded in getting those uncommon UTF-8 chars in my DB.

The solution in the end always seems rather obvious, but I'd like to
say thanks for pushing me in the right direction.

The thing that did the trick was putting
AddType 'text/html; charset=UTF-8' html
in .htaccess

Sending a PHP header with "Content-type: text/html; charset: utf-8"
resulted in an error message that headers had already been sent.
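
That error usually means something was already output before header() ran: a blank line or BOM before the opening <?php tag, an echo, or an included file with trailing whitespace. headers_sent() can report where the output began. A small sketch (the function name is made up; the parameter is spelled charset=utf-8):

```php
<?php
// Hypothetical helper: send the charset header, or log where output started.
function send_utf8_header(): bool
{
    // headers_sent() fills $file and $line with the place output began.
    if (headers_sent($file, $line)) {
        error_log("Cannot set charset: output already started at $file:$line");
        return false;
    }
    header('Content-Type: text/html; charset=utf-8');
    return true;
}

send_utf8_header();
```

Calling it as the very first statement of the script, before any HTML, is the usual way to avoid the message.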

Now I'm still having a problem with a simple mail function that just
does not want to go, but, that's for later.

See you,

Wimmy

Sep 7 '06 #20
