Bytes | Software Development & Data Engineering Community

Getting html entities into the database

Hi guys, I'm having an issue and I'm not sure what is going on here. Any help is appreciated.

I want to accept user input of the various HTML entities in their raw form (i.e. ♥) and store this in the database to output elsewhere. Here's what I'm doing in my code:

    $caption = htmlentities(mysql_real_escape_string($_POST['t']),ENT_QUOTES,'UTF-8');
    mysql_query("UPDATE Table SET caption='$caption' WHERE UserID = '$userID'");

Let's say the post value is: Some text goes here that isn't being transferred correctly. I ♥ PHP

Here's what is getting written to the database:

Some text goes here that isn\'t being transferred correctly. I

(Sorry, I had to break up the &#039; entity there to get it to show up here; it is getting into the DB properly, though.)

Notice the ♥ is not getting written, along with anything after it... why is this? What am I missing to get this in there?
May 6 '10 #1
9 3897
5,058 Expert 4TB

There doesn't seem to be anything wrong with that. I tried it over here and it inserted the text as expected. After issuing the two lines you pasted I had this in my database:
Some text goes here that isn't being transferred correctly. I ♥ PHP

What do you get if you print the query before you execute it? Like:

    $sql = "UPDATE Table SET caption='$caption' WHERE UserID = '$userID'";
    // htmlspecialchars() so any entities in the query show up literally instead of being rendered
    echo "<pre>" . htmlspecialchars($sql) . "</pre>";
    mysql_query($sql);

One other thing, while I'm at it xD
Unless you have a good reason not to, it's generally best to insert the data into the database in a neutral format, meaning that you should not encode special chars as HTML entities before inserting them into the database. You should only use functions like htmlentities when the data is on the way out, just before you print it into an HTML page.

    <?php
    // On the way in...
    $caption = mysql_real_escape_string($_POST['t']);
    mysql_query("UPDATE `table` SET `caption` = '{$caption}' WHERE stuff='other stuff'");

    // On the way out...
    $result = mysql_query("SELECT `caption` FROM `table`");
    while($row = mysql_fetch_assoc($result)) {
        echo htmlentities($row['caption'], ENT_QUOTES, 'UTF-8');
    }
    ?>

This allows the data to be used for multiple purposes, without having to be decoded by applications that don't simply print it as HTML. Like:

    <?php
    // Count the total number of characters used for captions...
    $result = mysql_query("SELECT `caption` FROM `table`");
    $total = 0;
    while($row = mysql_fetch_assoc($result)) {
        // mb_strlen() counts characters; plain strlen() would count UTF-8 bytes
        $total += mb_strlen($row['caption'], 'UTF-8');
    }
    ?>

If you encode the text before inserting the data, the $total there would be larger than the length of the text actually displayed.
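A minimal sketch of that length difference, assuming the caption is UTF-8. Keep in mind that strlen() counts bytes, so even the raw ♥ counts as three; the variable names are illustrative only.

```php
<?php
// The caption as the user typed it; in UTF-8 the "♥" takes 3 bytes.
$raw = "I ♥ PHP";

// The same caption encoded on the way *in*, as in the original code.
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8'); // "I &hearts; PHP"

// Both display as "I ♥ PHP" in a browser, but the stored lengths differ.
echo strlen($raw), "\n";     // 9
echo strlen($encoded), "\n"; // 14

// Decoding on the way out gives back the neutral form.
var_dump(html_entity_decode($encoded, ENT_QUOTES, 'UTF-8') === $raw); // bool(true)
```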
May 7 '10 #2
The problem is that a user can enter, for example, ≠ from their keyboard, but can also type it as the HTML entity &ne;. I want to be able to accept both inputs and store them exactly as they are entered. The html_entity_decode() function works fine for formatting them when they are pulled out of the DB, but I can't seem to get them IN.
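Following the store-neutral advice in the previous reply, one option is to normalize on the way in: decode any entities the user typed, so that &ne; and a typed ≠ are stored as the same UTF-8 character, and encode again only on output. A minimal sketch; normalize_caption() is a hypothetical helper name, and it assumes the form is submitted as UTF-8.

```php
<?php
// Hypothetical helper: turn entity text such as "&ne;" or "&hearts;"
// into the real UTF-8 characters before the value is stored.
function normalize_caption(string $input): string
{
    return html_entity_decode($input, ENT_QUOTES, 'UTF-8');
}

$typed  = normalize_caption("a ≠ b");    // already a real character, left as is
$entity = normalize_caption("a &ne; b"); // entity form, decoded to "≠"

var_dump($typed === $entity); // bool(true)
```

One trade-off: a user can no longer store the literal text "&ne;", because it is always turned into ≠.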

The script receives the form post from an AJAX script (an edit-in-place widget).

When I assign the same thing directly to a variable, it gets into the database just fine, i.e.:

$caption = "Some text here &hearts;";

is stored exactly as written.

However, the same thing entered from the form puts only "Some text here" in the database. Weird. Even weirder is that the form keeps the &hearts; and draws the visual figure. If I re-submit the form, THEN the &hearts; is written into the database (with my current code).
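One common cause of exactly this symptom (everything from the entity onward disappears) is an AJAX script that builds the request body by string concatenation without URL-encoding the value: the & in &hearts; is then read as a parameter separator. The sketch below simulates that with PHP's parse_str(), which splits a query string on & the same way; the field name t comes from the question's code, and the unencoded request body is an assumption about the AJAX script. On the JavaScript side the usual fix is to pass the value through encodeURIComponent() before sending it.

```php
<?php
// The caption from the question; "t" is the POST field name used there.
$caption = "Some text here &hearts;";

// Assumed AJAX behaviour: body built as "t=" + value with no encoding.
// The "&" in "&hearts;" starts a new parameter, so "t" is cut short.
parse_str("t=" . $caption, $broken);
var_dump($broken['t']); // string(15) "Some text here "

// URL-encoding the value first (a rough PHP counterpart of JS
// encodeURIComponent()) keeps the "&" as part of the data.
parse_str("t=" . rawurlencode($caption), $fixed);
var_dump($fixed['t']);  // string(23) "Some text here &hearts;"
```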
May 7 '10 #3
I'm attempting to insert special chars in a database table that is charset utf8.

I am getting the titles from:
$markup = file_get_contents('http://sfbay.craigslist.org/ads/');

Let's say for example this is the title I get from that page and I want to store it in the database...
_(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m

When I perform the insert, only the first two characters, "_(", get inserted, and nothing else goes into the title column. I don't understand why.

I've tried converting the string from UTF-8 to ISO-8859-1 with transliteration, but that only results in PHP dropping the characters that can't be transliterated.
$subject = iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $subject);

Any help would be appreciated.
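A likely explanation for the truncation is that the scraped bytes aren't valid UTF-8 (for example, because the page is served in a single-byte charset such as Windows-1252), and MySQL, when not running in strict mode, typically cuts the value off at the first invalid sequence and reports only a warning. A minimal sketch of checking and converting before the insert; the Windows-1252 source charset and the helper name to_utf8() are assumptions, so substitute whatever charset the page actually declares.

```php
<?php
// Hypothetical helper: return $s as valid UTF-8, converting from an
// assumed source charset when it is not already UTF-8.
function to_utf8(string $s, string $fromCharset = 'WINDOWS-1252'): string
{
    // preg_match() with the /u modifier returns false on invalid UTF-8.
    if (preg_match('//u', $s) === 1) {
        return $s;
    }
    $converted = iconv($fromCharset, 'UTF-8', $s);
    // iconv() returns false on bytes the source charset doesn't define.
    return $converted === false ? '' : $converted;
}

$bullet = "\x95";               // a bullet in Windows-1252, invalid as UTF-8
echo bin2hex(to_utf8($bullet)); // e280a2, the UTF-8 bytes for "•"
```

It is also worth making sure the connection itself uses UTF-8 (with the mysql extension, mysql_set_charset('utf8')), so the bytes aren't reinterpreted on the way in.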
Jul 5 '10 #4
196 100+
Not sure if this is applicable, but my understanding of the mysql_real_escape_string() function is that it removes illegal characters from a string to be inserted into an SQL database; this is done to stop SQL injection.

Anyway, I think that some of your characters are getting stripped by that function. I noticed you said it stopped at "_(" when the whole line is _(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m; this is because the ' character is illegal.
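For what it's worth, escaping functions add a backslash in front of characters that are special inside an SQL string literal; they don't delete anything. mysql_real_escape_string() needs a live MySQL connection, so this sketch uses addslashes() as a stand-in (it escapes a similar but not identical set: quotes, backslash and NUL). Note also that the title contains a backtick (`) and a Greek tonos (΄), but no apostrophe.

```php
<?php
// Title from the question (UTF-8).
$title = "_(―`•★•΄―)_BLONDE_(―`•★•΄―)_w4m - w4m";

// addslashes() stands in for mysql_real_escape_string() here because the
// latter needs an open connection. No quote, backslash or NUL byte occurs
// in the title, so the escaped string is identical to the input.
var_dump(addslashes($title) === $title); // bool(true)

// A real apostrophe gains a backslash; nothing is removed.
echo addslashes("isn't"), "\n"; // isn\'t
```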
Jul 6 '10 #5
So do you have any suggestions on how I would go about inserting the illegal characters into my database table? Or some way of encoding them so that they can be inserted?
Jul 6 '10 #6
196 100+
Apparently this function:

    htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

converts some special characters, so you may be able to use it to achieve what you want.

Other than converting them, I know of no other way to achieve what you are looking for.
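Note that htmlspecialchars() only converts the handful of HTML-significant characters (&, <, >, the double quote and, with ENT_QUOTES, the single quote); symbols such as ★ or ♥ pass through unchanged, whereas htmlentities() converts every character that has a named entity in its table. A minimal sketch comparing the two on an illustrative string:

```php
<?php
$s = "★ & ♥";

// Only "&" is special to HTML here, so only it is converted.
echo htmlspecialchars($s, ENT_QUOTES, 'UTF-8'), "\n"; // ★ &amp; ♥

// "♥" has a named HTML 4 entity; "★" has none, so it stays as is.
echo htmlentities($s, ENT_QUOTES, 'UTF-8'), "\n";     // ★ &amp; &hearts;
```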
Jul 6 '10 #7
Say file_get_contents($url) returns this line:
•●••°__Relaxing Asian Style Massage ......Yota - w4m -
If I use htmlentities() on it, I'm still unable to insert that line into the database. The resulting string is
•●••°__Relaxing Asian Style Massage ......Yota - w4m
The first character is illegal and shows up as a � gremlin when I view the string in UTF-8 encoding. When I switch to ISO-8859-1 encoding, the HTML character displays fine. The database table is using charset utf8. I tried switching the database table to latin1, but the illegal character still will not insert into the database. The insert doesn't fail when it hits one of these special HTML characters, but the text is not fully inserted either, and no errors are reported.
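The � gremlin suggests the first character isn't valid UTF-8 in the string PHP received. When that happens, htmlentities() with the 'UTF-8' charset returns an empty string for the whole input, not just for the bad character (since PHP 5.4, ENT_SUBSTITUTE replaces bad sequences with U+FFFD instead), which could also explain text columns that end up blank. A minimal sketch, using a lone Windows-1252 bullet byte "\x95" as the assumed bad input:

```php
<?php
$title = "\x95 Relaxing Asian Style Massage"; // leading byte is not valid UTF-8

// Invalid input: the whole result is an empty string.
var_dump(htmlentities($title, ENT_QUOTES, 'UTF-8')); // string(0) ""

// ENT_SUBSTITUTE swaps the bad byte for U+FFFD and keeps the rest.
echo htmlentities($title, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
```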

I need some kind of transliteration library or something. Any other thoughts?
Jul 6 '10 #8
These are the strings returned from file_get_contents():

>>>GORGEOUS***&***SEXY<<< - w4m - (Outcalls) pic

Talented Asian Male - (excelsior / outer mission)

♛----EAST INDIAN BARBIE----♛ w4m - w4m - (fremont / union city / newark) pic

.•*¨¨*•-:¦:-•*NeW* GoRgEoUS *MiXED *FUN•-:¦:-•*¨¨*•. - pic

•°!!•° ItALiAn •°!!•° HoTtIe •°!!•° bIg BoOtY •°!!•° eXoTiC - w4m - (palo alto) pic

%%%% YOUNG ACTRACTIVE LADY EXCELLENT MASSAGE %%%%%%%% - (san jose west)

100hh!!!*miXeD maMi*150H!!! - w4m - (san rafael) pic

^^korean ^^ zulia ^^ - w4m - (sunnyvale) pic

~~~~Give yourself a well deserved break and enjoy a relaxing massage. - w4m - (san rafael) pic

•☆•—————♥ LOOKING FOR THE BEST•☆•—————♥ - w4m - (concord / pleasant hill / martinez) pic

Lovely Asian Masseuses Here For You! - w4m - (san rafael) pic

But here is what is actually stored in the database (see dbshot.jpg). Notice how the fulltext column is blank on the entries that start with special HTML characters.
Attached image: dbshot.jpg (19.7 KB)
Jul 6 '10 #9
This conversion function (windows-1256 to UTF-8) appears to work for all the special characters I've run into. It also handles serialized arrays:

    function convert_charset($item)
    {
        // @ hides the notice unserialize() raises for plain (non-serialized) strings
        $unserialize = @unserialize($item);
        if (is_array($unserialize))
        {
            // Serialized array: convert each element, then re-serialize
            foreach ($unserialize as $key => $value)
            {
                $unserialize[$key] = @iconv('windows-1256', 'UTF-8', $value);
            }
            return serialize($unserialize);
        }

        // Plain string: assumes the source text is windows-1256 encoded
        return @iconv('windows-1256', 'UTF-8', $item);
    }
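Worth noting: the function hard-codes windows-1256 (Arabic) as the source charset, so it is only a correct conversion when the input really is in that charset. A quick check of the underlying iconv() call, with illustrative inputs: a windows-1256 byte maps to the expected Arabic letter, while text that is already UTF-8 gets changed.

```php
<?php
// 0xC7 is ARABIC LETTER ALEF (U+0627) in windows-1256, so this converts correctly.
$alef = iconv('WINDOWS-1256', 'UTF-8', "\xC7");
var_dump($alef === "\u{0627}"); // bool(true)

// Text that is already UTF-8 is reinterpreted byte by byte as windows-1256,
// so the 3-byte bullet does not survive the conversion unchanged.
$bullet  = "•";
$mangled = iconv('WINDOWS-1256', 'UTF-8', $bullet);
var_dump($mangled === $bullet); // bool(false)
```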
Jul 11 '10 #10


