Getting html entities into the database

Hi guys, I'm having an issue and I'm not sure what is going on here. Any help is appreciated.

I want to accept user input of the various HTML entities in their raw form (i.e. ♥) and store this in the database to output elsewhere. Here's what I'm doing in my code:

    $caption = htmlentities(mysql_real_escape_string($_POST['t']),ENT_QUOTES,'UTF-8');
    mysql_query("UPDATE Table SET caption='$caption' WHERE UserID = '$userID'");

Let's say the post value is: Some text goes here that isn't being transferred correctly. I ♥ PHP

Here's what is getting written to the database:

Some text goes here that isn\'t being transferred correctly. I

(sorry I had to break up the ' entity there to get it to show up here -- it is getting into the DB properly, though)

Notice the ♥ is not getting written, along with anything after it... why is this? What am I missing to get this in there?
May 6 '10 #1
Atli
Hey.

There doesn't seem to be anything wrong with that. I tried it over here and it inserted the text as expected. After issuing the two lines you pasted I had this in my database:
Some text goes here that isn't being transferred correctly. I ♥ PHP

What do you get if you print the query before you execute it? Like:
    $sql = "UPDATE Table SET caption='$caption' WHERE UserID = '$userID'";
    echo "<pre>$sql</pre>";
    mysql_query($sql);

One other thing, while I'm at it xD
Unless you have a good reason not to, it's generally best to insert the data into the database in a neutral format, meaning that you should not encode special characters as HTML entities before inserting them into the database. You should only use functions like htmlentities when the data is on the way out, just before you print it into an HTML page.
    <?php
    // On the way in: escape for SQL only and store the text as entered.
    // ('t' is the POST field name used in the question.)
    $caption = mysql_real_escape_string($_POST['t']);
    mysql_query("UPDATE `table` SET `caption` = '{$caption}' WHERE stuff='other stuff'");

    // On the way out: encode for HTML just before printing.
    $result = mysql_query("SELECT `caption` FROM `table`");
    while($row = mysql_fetch_assoc($result)) {
        echo htmlentities($row['caption'], ENT_QUOTES, 'UTF-8');
    }
    ?>
This allows the data to be used for multiple purposes, without having to be decoded by applications that don't simply plan on printing it as HTML. Like:
    <?php
    // Count the total number of characters used for captions.
    // mb_strlen() counts characters; strlen() would count bytes in UTF-8 text.
    $result = mysql_query("SELECT `caption` FROM `table`");
    $total = 0;
    while($row = mysql_fetch_assoc($result)) {
        $total += mb_strlen($row['caption'], 'UTF-8');
    }
    ?>
If you encode the text before inserting the data, the $total there would be larger than the length of the text displayed.
May 7 '10 #2
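The length difference described in the reply above can be seen without a database. This is only a small sketch: the caption is a made-up value, and the heart is written as its UTF-8 bytes so the file encoding does not matter.

```php
<?php
// Hypothetical caption, not taken from the thread's database.
// "\xE2\x99\xA5" is the UTF-8 byte sequence for the heart character (U+2665).
$raw = "I \xE2\x99\xA5 PHP";

// What would be stored if the text were encoded before the INSERT.
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// The encoded form is longer than the text a visitor actually sees.
$rawLength = strlen($raw);
$encodedLength = strlen($encoded);

// Every consumer that is not printing HTML would have to decode it again.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
echo "raw bytes: $rawLength, encoded bytes: $encodedLength\n";
```

Storing the raw text and calling htmlentities() only at output time avoids that extra decode step.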
The problem is that a user can enter, for example, ≠ from their keyboard, but can also type it in its HTML entity form, &ne;. I want to be able to accept both inputs and store them exactly as they are entered. The html_entity_decode() function works fine for formatting them when they are pulled out of the db, but I can't seem to get them IN.

The script is taking the form post from an AJAX script (an edit in place thing).

When I enter the same thing right into a variable, it gets into the database just fine, ie:

$caption = "Some text here &hearts;";

gets put in exactly.

However, the same thing when entered from the form puts "Some text here" in the database. Weird. Even weirder is that the form keeps the &hearts; and draws the visual figure. If I re-submit the form, THEN the &hearts; is written into the database (with my current code).
May 7 '10 #3
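One plausible cause of the symptom in the post above, offered as an assumption because the AJAX code is not shown in the thread: if the script builds the POST body by plain string concatenation, the `&` in `&hearts;` is read as a field separator, so PHP only sees the text before it. The sketch below uses parse_str(), which splits a query string the same way PHP fills $_POST. The body strings are hypothetical; only the field name `t` comes from the first post.

```php
<?php
// Hypothetical form value containing a literal entity.
$value = 'Some text here &hearts; PHP';

// Body built by plain concatenation: the '&' inside the value splits the field.
$unencodedBody = 't=' . $value;
parse_str($unencodedBody, $broken);

// Body built with percent-encoding: the value survives intact.
$encodedBody = 't=' . rawurlencode($value);
parse_str($encodedBody, $intact);

var_dump($broken['t']); // only the text before the '&'
var_dump($intact['t']); // the full value, entity included
```

If this is what is happening, the fix belongs on the client: in JavaScript the value would be passed through encodeURIComponent() before being appended to the request body.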
I'm attempting to insert special chars in a database table that is charset utf8.

I am getting the titles from
$markup = file_get_contents('http://sfbay.craigslist.org/ads/');

Let's say for example this is the title I get from that page and I want to store it in the database...
_(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m

When I perform the insert, only the first two characters get inserted, "_(", and then nothing else is inserted into the title column. I don't understand why.

I've tried converting the string into utf8 and transliterating the results, but that only results in PHP dropping the characters that can't be transliterated.
$subject = iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $subject);

Any help would be appreciated
Jul 5 '10 #4
chazzy69
Not sure if this is applicable, but my understanding of the mysql_real_escape_string() function is that it removes illegal characters from a string to be inserted into a SQL database. This is done to stop SQL injections.

Anyway, I think that some of your characters are getting stripped by that function. I noticed you said it stopped at "_(" when the whole line is _(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m , this is because the ' character is illegal.
Jul 6 '10 #5
@chazzy69
So do you have any suggestions on how I would go about inserting the illegal characters in my database table? Or is there some way of encoding them so that they can be inserted?
Jul 6 '10 #6
chazzy69
Apparently this function:

    htmlspecialchars();

converts some special characters, so you may be able to use it to achieve what you want.

Other than converting them, I know of no other way to achieve what you are looking for.
Jul 6 '10 #7
@chazzy69
If file_get_contents($url); returns this line:
•●••°__Relaxing Asian Style Massage ......Yota - w4m -
and I use htmlentities(), I'm still unable to insert that line into the database. The resulting string is:
•●••°__Relaxing Asian Style Massage ......Yota - w4m
The first character is illegal and shows up as a � gremlin when I view the string in UTF-8 encoding. When I switch to ISO-8859-1 encoding, the HTML character displays fine. The database table is using charset utf8. I tried switching the database table to latin1, but the illegal character still will not insert into the database. The insert doesn't fail when it hits one of these special HTML characters, but it doesn't finish either, and no errors are reported.

I need some kind of transliteration library or something. Any other thoughts?
Jul 6 '10 #8
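The description above (a � in UTF-8 view, correct display as ISO-8859-1) suggests the scraped bytes are not UTF-8 at all. MySQL typically cuts a string at the first byte sequence that is invalid for the column's character set, which would match the truncated rows. A minimal sketch of checking and converting before the insert follows. The Windows-1252 source encoding, the sample string, and the helper names `is_valid_utf8` and `to_utf8` are all assumptions for illustration; the real encoding should be read from the page's Content-Type header or meta tag.

```php
<?php
// Hypothetical scraped title: byte 0x95 is the bullet character in Windows-1252.
$scraped = "\x95 Relaxing Massage";

// True when the string is well-formed UTF-8; the /u modifier makes
// preg_match() fail on invalid byte sequences.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

// Convert only when needed, so text that is already UTF-8 is left alone.
// The default source encoding is an assumption, not something the thread confirms.
function to_utf8($text, $sourceEncoding = 'Windows-1252')
{
    if (is_valid_utf8($text)) {
        return $text;
    }
    return iconv($sourceEncoding, 'UTF-8', $text);
}

$title = to_utf8($scraped);

var_dump(is_valid_utf8($scraped)); // validity before conversion
var_dump(is_valid_utf8($title));   // validity after conversion
```

The connection character set has to match as well; with the extension used in this thread that would be a call such as mysql_set_charset('utf8') after connecting.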
These are the strings returned from file_get_contents():

>>>GORGEOUS***&***SEXY<<< - w4m - (Outcalls) pic

Talented Asian Male - (excelsior / outer mission)

♛----EAST INDIAN BARBIE----♛ w4m - w4m - (fremont / union city / newark) pic

.•*¨¨*•-:¦:-•*NeW* GoRgEoUS *MiXED *FUN•-:¦:-•*¨¨*•. - pic

•°!!•° ItALiAn •°!!•° HoTtIe •°!!•° bIg BoOtY •°!!•° eXoTiC - w4m - (palo alto) pic

%%%% YOUNG ACTRACTIVE LADY EXCELLENT MASSAGE %%%%%%%% - (san jose west)

100hh!!!*miXeD maMi*150H!!! - w4m - (san rafael) pic

^^korean ^^ zulia ^^ - w4m - (sunnyvale) pic

~~~~Give yourself a well deserved break and enjoy a relaxing massage. - w4m - (san rafael) pic

•☆•—————♥ LOOKING FOR THE BEST•☆•—————♥ - w4m - (concord / pleasant hill / martinez) pic

Lovely Asian Masseuses Here For You! - w4m - (san rafael) pic

But then here is what is actually stored in the database (see dbshot.jpg). Notice how the fulltext column is blank on the entries that start with special HTML characters.
Attached Images
File Type: jpg dbshot.jpg (19.7 KB, 289 views)
Jul 6 '10 #9
This conversion function appears to work with all the special characters. It converts from windows-1256 to UTF-8:

    function convert_charset($item)
    {
        // Serialized arrays are converted value by value, then re-serialized.
        // The @ silences the notice unserialize() raises for plain strings.
        $unserialize = @unserialize($item);
        if (is_array($unserialize))
        {
            foreach ($unserialize as $key => $value)
            {
                $unserialize[$key] = @iconv('windows-1256', 'UTF-8', $value);
            }
            return serialize($unserialize);
        }
        // Plain strings are converted directly.
        return @iconv('windows-1256', 'UTF-8', $item);
    }
Jul 11 '10 #10
