
Getting html entities into the database

P: 2
Hi guys, I'm having an issue and I'm not sure what is going on here. Any help is appreciated.

I want to accept user input of the various HTML entities in their raw form (i.e. ♥) and store it in the database to output elsewhere. Here's what I'm doing in my code:

    $caption = htmlentities(mysql_real_escape_string($_POST['t']),ENT_QUOTES,'UTF-8');
    mysql_query("UPDATE Table SET caption='$caption' WHERE UserID = '$userID'");

Let's say the post value is: Some text goes here that isn't being transferred correctly. I ♥ PHP

Here's what is getting written to the database:

Some text goes here that isn\'t being transferred correctly. I

(sorry I had to break up the ' entity there to get it to show up here -- it is getting into the DB properly, though)

Notice the ♥ is not getting written, along with anything after it... why is this? What am I missing to get this in there?
May 6 '10 #1
9 Replies


Atli
Expert 5K+
P: 5,058
Hey.

There doesn't seem to be anything wrong with that. I tried it over here and it inserted the text as expected. After issuing the two lines you pasted I had this in my database:
Some text goes here that isn't being transferred correctly. I ♥ PHP

What do you get if you print the query before you execute it? Like:
    $sql = "UPDATE Table SET caption='$caption' WHERE UserID = '$userID'";
    echo "<pre>$sql</pre>";
    mysql_query($sql);

One other thing, while I'm at it xD
Unless you have a good reason not to, it's generally best to insert the data into the database in a neutral format, meaning that you should not encode special chars as HTML entities before inserting them into the database. You should only use functions like htmlentities when the data is on the way out, just before you print it into an HTML page.
    <?php
    // On the way in...
    $caption = mysql_real_escape_string($_POST['p']);
    mysql_query("UPDATE `table` SET `caption` = '{$caption}' WHERE stuff='other stuff'");

    // On the way out...
    $result = mysql_query("SELECT `caption` FROM `table`");
    while($row = mysql_fetch_assoc($result)) {
        echo htmlentities($row['caption'], ENT_QUOTES, 'UTF-8');
    }
    ?>
This allows the data to be used for multiple purposes, without having to be decoded by applications that don't simply plan on printing it as HTML. Like:
    <?php
    // Count the number of chars total used for captions...
    $result = mysql_query("SELECT `caption` FROM `table`");
    $total = 0;
    while($row = mysql_fetch_assoc($result)) {
        $total += strlen($row['caption']);
    }
    ?>
If you encode the text before inserting the data, the $total there would be larger than the text displayed.
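For readers on current PHP versions, where the mysql_* functions no longer exist, the same "raw in, encoded out" idea might look roughly like the sketch below. It assumes a PDO connection opened with a UTF-8 charset in its DSN. The helper names (save_caption, render_caption) and the table and column names are placeholders invented for this illustration, not anything from the thread.

```php
<?php
// Hypothetical helper: stores the caption exactly as submitted.
// `user_captions`, `caption` and `user_id` are placeholder names.
function save_caption(PDO $pdo, int $userId, string $caption): void
{
    // Bound parameters travel separately from the SQL text, so no manual
    // escaping and no HTML encoding is needed on the way in.
    $stmt = $pdo->prepare(
        'UPDATE user_captions SET caption = :caption WHERE user_id = :user_id'
    );
    $stmt->execute([':caption' => $caption, ':user_id' => $userId]);
}

// Hypothetical helper: encodes a stored caption just before it is
// printed into an HTML page.
function render_caption(string $caption): string
{
    return htmlentities($caption, ENT_QUOTES, 'UTF-8');
}

// The encoded form is longer than the raw form, which is why counting
// characters on pre-encoded data gives inflated totals.
$raw  = "isn't <b>bold</b>";
$html = render_caption($raw);
echo strlen($raw), ' bytes raw, ', strlen($html), " bytes after encoding\n";
```

Only render_caption() runs here; save_caption() needs a live connection, so treat it as an outline rather than tested code.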
May 7 '10 #2

P: 2
The problem is that a user can enter, for example, ≠ from their keyboard, but can also type it in its HTML form, &ne;. I want to be able to accept both inputs and store them exactly as they are entered. The html_entity_decode() function works fine for formatting them when they are pulled out of the db, but I can't seem to get them IN.

The script is taking the form post from an AJAX script (an edit in place thing).

When I enter the same thing right into a variable, it gets into the database just fine, ie:

$caption = "Some text here &hearts;";

gets put in exactly.

However, the same thing when entered from the form puts "Some text here" in the database. Weird. Even weirder is that the form keeps the &hearts; and draws the visual figure. If I re-submit the form, THEN the &hearts; is written into the database (with my current code).
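The thread never confirms the cause, but this symptom matches what happens when an AJAX script concatenates the value into the request body without percent-encoding it: the "&" in &hearts; is then read as a parameter separator, and PHP sees the caption end right there. The sketch below reproduces that with parse_str(), which parses a query string roughly the way PHP fills $_POST. The field name t comes from the original code; the sample text is made up.

```php
<?php
// What PHP sees when the value is placed in the request body as-is.
parse_str('t=I &hearts; PHP', $unencoded);

// What PHP sees when the value is percent-encoded first.
// rawurlencode() stands in here for JavaScript's encodeURIComponent().
parse_str('t=' . rawurlencode('I &hearts; PHP'), $encoded);

var_dump($unencoded['t']); // everything from the "&" onward is missing
var_dump($encoded['t']);   // the full caption, entity included
```

If this is what is happening, the fix belongs in the JavaScript that builds the request (wrapping the value in encodeURIComponent()), not in the PHP.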
May 7 '10 #3

P: 5
I'm attempting to insert special chars in a database table that is charset utf8.

I am getting the titles from
$markup = file_get_contents('http://sfbay.craigslist.org/ads/');

Let's say for example this is the title I get from that page and I want to store it in the database...
_(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m

When I perform the insert, only the first two characters get inserted, "_(", and then nothing else is inserted into the title column. I don't understand why.

I've tried converting the string into UTF-8 and transliterating the results, but that only results in PHP dropping the characters that can't be transliterated.
$subject = iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $subject);

Any help would be appreciated
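One possible explanation, not confirmed anywhere in the thread: if the fetched page is not actually UTF-8, the bytes after "_(" do not form a valid UTF-8 sequence, and MySQL in non-strict mode typically stores only the text before the first invalid byte, raising a warning rather than an error. The sketch below measures how many leading bytes of a string are well-formed UTF-8. The function name and the single-byte sample string are invented for the illustration.

```php
<?php
// Hypothetical helper: length in bytes of the longest well-formed
// UTF-8 prefix of $bytes.
function valid_utf8_prefix_length(string $bytes): int
{
    $wellFormed = '/\A(?:
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, planes 1 to 3
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4 to 15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, plane 16
    )*+/x';
    preg_match($wellFormed, $bytes, $m);
    return strlen($m[0]);
}

// The title from the post, as real UTF-8 (this file is assumed to be
// saved as UTF-8).
$utf8Title = "_(―`•★•΄―)_BLONDE";

// Assumed example: the same opening characters as they might arrive
// from a page in a single-byte encoding.
$singleByteTitle = "_(\x97`\x95";

echo valid_utf8_prefix_length($utf8Title) === strlen($utf8Title)
    ? "fully valid\n"
    : "contains invalid bytes\n";
echo substr($singleByteTitle, 0, valid_utf8_prefix_length($singleByteTitle)), "\n"; // prints "_("
```

If the prefix stops exactly where the database row stops, the encoding of the fetched page is the first thing to check.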
Jul 5 '10 #4

100+
P: 196
Not sure if this is applicable, but my understanding of the mysql_real_escape_string() function is that it removes illegal characters from a string that is to be inserted into an SQL database; this is done to stop SQL injections.

Anyway, I think that some of your characters are getting stripped by that function. I noticed you said it stopped at "_(" when the whole line is _(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m; this is because the ' character is illegal.
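A note on the claim above: according to the PHP manual, mysql_real_escape_string() does not remove anything. It puts a backslash in front of a handful of characters (NUL, newline, carriage return, backslash, both quote types and Ctrl-Z) and leaves every other byte alone. Since that function is gone from current PHP, the sketch below uses a rough pure-PHP imitation of those rules. The helper escape_like_mysql is invented for this illustration and is not a replacement for real escaping or prepared statements.

```php
<?php
// Rough imitation of the escaping rules documented for
// mysql_real_escape_string(). NOT the real function, which also takes
// the connection character set into account.
function escape_like_mysql(string $value): string
{
    return strtr($value, [
        "\\"   => "\\\\",
        "\0"   => "\\0",
        "\n"   => "\\n",
        "\r"   => "\\r",
        "'"    => "\\'",
        '"'    => '\\"',
        "\x1a" => "\\Z",
    ]);
}

// Sample title in the style of the thread, with one single quote added.
$title   = "_(―`•★•΄―)_BLONDE's";
$escaped = escape_like_mysql($title);

// The escaped string is one byte longer (the added backslash);
// the backtick, bullets and stars are untouched.
echo $escaped, "\n";
```

So escaping alone would not explain a row that stops after two characters.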
Jul 6 '10 #5

P: 5
@chazzy69
So do you have any suggestions on how I would go about inserting the illegal characters into my database table, or some way of encoding them so that they can be inserted?
Jul 6 '10 #6

100+
P: 196
Apparently this function:
    htmlspecialchars();
converts some special characters, so you may be able to use it to achieve what you want.

Other than converting them, I know of no other way to achieve what you are looking for.
Jul 6 '10 #7

P: 5
@chazzy69
If file_get_contents($url); returns this line:
•●••°__Relaxing Asian Style Massage ......Yota - w4m -
and I use htmlentities(), I'm still unable to insert that line into the database. The resulting string is
•●••°__Relaxing Asian Style Massage ......Yota - w4m
The first character is illegal and shows up as a � gremlin when I view the string in UTF-8 encoding. When I switch to ISO-8859-1 encoding, the HTML character displays fine. The database table is using charset utf8. I tried switching the database table to latin1, but the illegal character still will not insert into the database. The database insert doesn't fail when it hits one of these special HTML characters, but it doesn't finish the insert either, and no errors are reported.

I need some kind of transliteration library or something. Any other thoughts?
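A character that shows as � under UTF-8 but displays correctly under ISO-8859-1 suggests the fetched bytes are in a single-byte encoding, so converting them to UTF-8 (rather than transliterating away from it) may be all that is needed. The sketch below leaves valid UTF-8 alone and converts anything else from an assumed source encoding. Windows-1252 is a guess, since the thread does not say what the page actually declares, and the helper name to_utf8 is invented.

```php
<?php
// Hypothetical helper: returns $bytes as UTF-8, converting from
// $assumedSource only when the input is not already valid UTF-8.
function to_utf8(string $bytes, string $assumedSource = 'Windows-1252'): string
{
    // preg_match() with the /u modifier returns 1 only for valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }

    // @ hides the notice iconv() raises on bytes the source encoding
    // does not define; the false return value is handled below.
    $converted = @iconv($assumedSource, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Input is not valid $assumedSource either");
    }
    return $converted;
}

// "\x95" is the bullet character in Windows-1252.
$fetched = "\x95 Relaxing Massage";
echo to_utf8($fetched), "\n";
```

The connection has to agree with the data as well: with the old mysql extension that meant calling mysql_set_charset('utf8') after connecting, otherwise correctly converted text can still be misread on its way into the table.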
Jul 6 '10 #8

P: 5
These are the strings returned from file_get_contents():

>>>GORGEOUS***&***SEXY<<< - w4m - (Outcalls) pic

Talented Asian Male - (excelsior / outer mission)

♛----EAST INDIAN BARBIE----♛ w4m - w4m - (fremont / union city / newark) pic

.•*¨¨*•-:¦:-•*NeW* GoRgEoUS *MiXED *FUN•-:¦:-•*¨¨*•. - pic

•°!!•° ItALiAn •°!!•° HoTtIe •°!!•° bIg BoOtY •°!!•° eXoTiC - w4m - (palo alto) pic

%%%% YOUNG ACTRACTIVE LADY EXCELLENT MASSAGE %%%%%%%% - (san jose west)

100hh!!!*miXeD maMi*150H!!! - w4m - (san rafael) pic

^^korean ^^ zulia ^^ - w4m - (sunnyvale) pic

~~~~Give yourself a well deserved break and enjoy a relaxing massage. - w4m - (san rafael) pic

•☆•—————♥ LOOKING FOR THE BEST•☆•—————♥ - w4m - (concord / pleasant hill / martinez) pic

Lovely Asian Masseuses Here For You! - w4m - (san rafael) pic

But here is what is actually stored in the database (see dbshot.jpg). Notice how the fulltext column is blank on the entries that start with special HTML characters.
Attached Images
File Type: jpg dbshot.jpg (19.7 KB, 266 views)
Jul 6 '10 #9

P: 5
This conversion function appears to handle all of the special characters. It assumes the input is windows-1256 and converts it to UTF-8:
    // Convert a plain string, or every value of a serialized array,
    // from windows-1256 to UTF-8.
    function convert_charset($item)
    {
        // @ hides the notice unserialize() raises for non-serialized input.
        $data = @unserialize($item);
        if (is_array($data))
        {
            foreach ($data as $key => $value)
            {
                $data[$key] = @iconv('windows-1256', 'UTF-8', $value);
            }
            return serialize($data);
        }
        else
        {
            return @iconv('windows-1256', 'UTF-8', $item);
        }
    }
Jul 11 '10 #10
