Having trouble converting few characters using htmlentities function

hi

We are using the normal html controls (textarea) in the posting form.
The form page has the utf-8 character set.

Users are copying text from MS Word, OpenOffice documents, etc.

Our PHP code handles the conversion of RTF text characters and UTF-8
characters into HTML entities (e.g. & is converted to &amp; by
the built-in PHP function 'htmlentities').

However, many common characters/symbols are not being converted
properly. When I say common, even ones like '-' (hyphen) are not
being converted by htmlentities. They get converted to junk characters
like ?? or &((&^.

Is there any fix for this problem?

regards,

Mahesh
Aug 14 '08 #1
On Aug 14, 11:24 am, BG Mahesh <mah...@mahesh.com> wrote:
> The form page has the utf-8 character set.
> However many common characters/symbols are not being converted
> properly. When I say common, even the ones like '-' (hyphen) are not
> being converted by htmlentities. It gets converted to junk characters
> like ?? or &((&^.

A hyphen does not need to be converted by htmlentities. It can appear in
HTML just like any letter or number.

However, the source of your problem is probably that you are reading
UTF-8 data but not outputting it as UTF-8. The hyphen may be some
special hyphen that occupies more than one byte. If you print this as
ASCII, Latin-1 or anything other than UTF-8, something other than a
hyphen will show.
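
To make the byte-level point concrete, here is a minimal sketch. It assumes the "special hyphen" is an en dash (U+2013), that the wrong charset is Windows-1252, and that the mbstring extension is available; none of that is confirmed by the original post.

```php
<?php
// U+2013 EN DASH, written as its UTF-8 bytes so the file's own encoding does not matter.
$dash = "\xE2\x80\x93";

echo strlen($dash), "\n";   // 3 (bytes, not characters)
echo bin2hex($dash), "\n";  // e28093

// Pretend the bytes are Windows-1252 and re-encode them as UTF-8.
// This mimics what a reader sees when the page is decoded with the wrong charset.
$misread = mb_convert_encoding($dash, 'UTF-8', 'Windows-1252');

echo mb_strlen($misread, 'UTF-8'), "\n"; // 3 separate characters instead of one dash
echo $misread, "\n";                     // the "junk" that replaces the dash
```

The data itself is intact in both cases; only the charset used to interpret the bytes differs.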
Aug 14 '08 #2
BG Mahesh wrote:
> hi
>
> We are using the normal html controls (textarea) in the posting form.
> The form page has the utf-8 character set.
>
> Users are copying the text from MS Word or Openoffice doc etc.
>
> Our PHP code is handling the conversion of RTF text characters and utf
> characters into HTML entities (e.g. & is being converted to &amp; by
> the inbuilt php function 'htmlentities')
>
> However many common characters/symbols are not being converted
> properly. When I say common, even the ones like '-' (hyphen) are not
> being converted by htmlentities. It gets converted to junk characters
> like ?? or &((&^.
>
> Is there any fix for this problem?
>
> regards,
>
> Mahesh

MS Word has a habit of converting a pair of hyphens to a dash (see
AutoCorrect options, tab AutoFormat) and changing 3 full stops to an ellipsis
(see AutoCorrect options, tab AutoCorrect).
It is these that are causing your problems, for the reason that Sjoerd
explains.
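
If plain ASCII punctuation is actually wanted (it is not required once the charset is handled correctly), one possible approach is to map the usual Word substitutions back before storing the text. This is only a sketch: the function name `asciiPunctuation` and the choice of characters in the map are assumptions for illustration, and the input is assumed to be valid UTF-8.

```php
<?php
// Typographic characters that Word's AutoFormat/AutoCorrect commonly inserts,
// given as UTF-8 byte sequences and mapped to plain ASCII equivalents.
$wordChars = [
    "\xE2\x80\x93" => '-',    // U+2013 en dash
    "\xE2\x80\x94" => '--',   // U+2014 em dash
    "\xE2\x80\xA6" => '...',  // U+2026 horizontal ellipsis
    "\xE2\x80\x98" => "'",    // U+2018 left single quotation mark
    "\xE2\x80\x99" => "'",    // U+2019 right single quotation mark
    "\xE2\x80\x9C" => '"',    // U+201C left double quotation mark
    "\xE2\x80\x9D" => '"',    // U+201D right double quotation mark
];

// Hypothetical helper: replaces every mapped sequence in one pass.
function asciiPunctuation(string $utf8Text, array $map): string
{
    return strtr($utf8Text, $map);
}

$pasted = "Wait\xE2\x80\xA6 pages 10\xE2\x80\x9312";
echo asciiPunctuation($pasted, $wordChars), "\n"; // Wait... pages 10-12
```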
Aug 14 '08 #3
I V
On Thu, 14 Aug 2008 02:24:31 -0700, BG Mahesh wrote:
> We are using the normal html controls (textarea) in the posting form.
> The form page has the utf-8 character set.
>
> Users are copying the text from MS Word or Openoffice doc etc.
>
> Our PHP code is handling the conversion of RTF text characters and utf
> characters into HTML entities (e.g. & is being converted to &amp; by the
> inbuilt php function 'htmlentities')
>
> However many common characters/symbols are not being converted properly.
> When I say common, even the ones like '-' (hyphen) are not being
> converted by htmlentities. It gets converted to junk characters like ??
> or &((&^.

htmlentities assumes an ISO-8859-1 character set by default, so it will
misinterpret the UTF-8 characters supplied by your users. You could
specify the character set explicitly with

htmlentities($some_utf8_string, ENT_COMPAT, 'UTF-8')

or you could use htmlspecialchars, which only converts ampersands, angle
brackets and quote marks, and should pass your UTF-8 characters through
unchanged.
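
A small sketch comparing the options, assuming the pasted text contains an en dash (U+2013); the sample string and variable names are made up for illustration. Since PHP 5.4 the default charset of both functions is UTF-8, so the mismatch described here mainly affects older versions, but passing the charset explicitly is harmless either way.

```php
<?php
// "A & B", an en dash (U+2013) as UTF-8 bytes, then "C".
$input = "A & B \xE2\x80\x93 C";

// Charset stated explicitly: the dash becomes a named entity.
$withCharset = htmlentities($input, ENT_COMPAT, 'UTF-8');
echo $withCharset, "\n";   // A &amp; B &ndash; C

// htmlspecialchars only touches &, <, > and quotes; the dash stays as raw UTF-8.
$specialOnly = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
echo bin2hex($specialOnly) === bin2hex("A &amp; B \xE2\x80\x93 C") ? "dash untouched\n" : "dash changed\n";

// Wrong charset (the old default): each byte of the dash is treated as its own
// Latin-1 character, so 0xE2 turns into &acirc; and the dash is lost.
$wrongCharset = htmlentities($input, ENT_COMPAT, 'ISO-8859-1');
echo strpos($wrongCharset, '&acirc;') !== false ? "mangled\n" : "ok\n";
```

Whichever function is used, the page that displays the result still has to be served as UTF-8 for the htmlspecialchars variant to render correctly.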
Aug 14 '08 #4
On Aug 14, 9:30 pm, I V <ivle...@gmail.com> wrote:
> On Thu, 14 Aug 2008 02:24:31 -0700, BG Mahesh wrote:
>> We are using the normal html controls (textarea) in the posting form.
>> The form page has the utf-8 character set.
>> Users are copying the text from MS Word or Openoffice doc etc.
>> Our PHP code is handling the conversion of RTF text characters and utf
>> characters into HTML entities (e.g. & is being converted to &amp; by the
>> inbuilt php function 'htmlentities')
>> However many common characters/symbols are not being converted properly.
>> When I say common, even the ones like '-' (hyphen) are not being
>> converted by htmlentities. It gets converted to junk characters like ??
>> or &((&^.
>
> htmlentities assumes an ISO-8859-1 character set by default, so it will
> misinterpret the UTF-8 characters supplied by your users. You could
> specify the character set explicitly with
>
> htmlentities($some_utf8_string, ENT_COMPAT, 'UTF-8')
>
> or you could use htmlspecialchars, which only converts ampersands, angle
> brackets and quote marks, and should pass your UTF-8 characters through
> unchanged.

Thank you everybody. It works now.

Aug 18 '08 #5
