Connecting Tech Pros Worldwide Help | Site Map

I can't get rid of wierd characters in my RSS feed

Member
 
Join Date: Jan 2008
Posts: 113
#1: Dec 12 '08
Hi,

I've created a utf8 encoded RSS feed which presents news data drawn from a database. I've set all aspects of my database to utf8 and also saved the text which i have put into the database as utf8 by pasting it into notepad and saving as utf8. So everything should be encoded in utf8 when the RSS feed is presented to the browser, however I am still getting the wierd question mark characters for pound signs :(

Here is my RSS feed code (coldfusion):

Expand|Select|Wrap|Line Numbers
  1. <cfsilent>
  2. <!--- Get News --->
  3. <cfinvoke component="com.news" method="getAll" dsn="#Request.App.dsn#" returnvariable="news" />
  4. </cfsilent>
  5. <!--- If we have news items --->
  6. <cfif news.RecordCount GT 0>
  7. <!--- Serve RSS content-type --->
  8. <cfcontent type="application/rss+xml">
  9. <!--- Output feed --->
  10. <cfcontent reset="true"><?xml version="1.0" encoding="utf-8"?>
  11. <cfoutput>
  12. <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  13.     <channel>
  14.         <title>News RSS Feed</title>
  15.         <link>#Application.siteRoot#</link>
  16.         <description>Welcome to the News RSS Feed</description>
  17.         <lastBuildDate>Wed, 19 Nov 2008 09:05:00 GMT</lastBuildDate>
  18.         <language>en-uk</language>
  19.         <atom:link href="#Application.siteRoot#news/rss/index.cfm" rel="self" type="application/rss+xml" />
  20.  
  21.         <cfloop query="news">
  22.         <!--- Make data xml compliant --->
  23.         <cfscript>
  24.            news.headline = replace(news.headline, "<", "&lt;", "ALL");
  25.            news.body = replace(news.body, "<", "&lt;", "ALL");
  26.            news.date = dateformat(news.date, "ddd, dd mmm yyyy");
  27.            news.time = timeformat(news.time, "HH:mm:ss") & " GMT"; 
  28.         </cfscript>        
  29.         <item>
  30.             <title>#news.headline#</title>
  31.             <link>#Application.siteRoot#news/index.cfm?id=#news.id#</link>
  32.             <guid>#Application.siteRoot#news/index.cfm?id=#news.id#</guid>
  33.             <pubDate>#news.date# #news.time#</pubDate>
  34.             <description>#news.body#</description>
  35.         </item>
  36.         </cfloop>
  37.     </channel>
  38. </rss>
  39. </cfoutput>
  40. <cfelse>
  41. <!--- If we have no news items, relocate to news page --->
  42. <cflocation url="../news/index.cfm" addtoken="no">
  43. </cfif> 
  44.  
Has anyone any suggestions? I've done loads of research but can't find the right answers :(

Thanks in advance,

Chromis
Dormilich's Avatar
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 3,629
#2: Dec 12 '08

re: I can't get rid of wierd characters in my RSS feed


does it help if you use &#163;? (I know that may be only a workaraound, but from the code I can tell nothing without the actual feed)

is £ the only character outside the ascii charset? maybe your generator has some problems with utf-8 or doesn't know which charset to use....

regards

PS please don't post your questions in the insights section, ask a moderator to move it to the answers section.
Member
 
Join Date: Jan 2008
Posts: 113
#3: Dec 12 '08

re: I can't get rid of wierd characters in my RSS feed


Hi Dormilich thanks for your reply. My apologies, I am aware of the answers section but I accidentally put it in here, it's very easy to make the mistake sadly (preffered the old layout!).

Yes the only bad character is the pound sign, I've tryed replacing it manually in the database presuming that that would replace the chracter with the utf8 equivalent but it didn't work. If i use the &pound; it breaks the feed. I could use cdata but I need to display paragraph formatting, and using cdata displays the p element tags.
Dormilich's Avatar
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 3,629
#4: Dec 12 '08

re: I can't get rid of wierd characters in my RSS feed


&pound; breaks your feed because it's an undefined entity (you'd need a DTD to fix that). have you tried &#163;? this should not break the feed.

regards
Member
 
Join Date: Jan 2008
Posts: 113
#5: Dec 12 '08

re: I can't get rid of wierd characters in my RSS feed


Ok i've replaced all occurences of £ with £ it now works great thanks! Why would the pound sign not be recognised though, do you think that when i saved the file as utf8 it didn't convert the character properly?
Ideally i would like to create a function in coldfusion which will doctor text and make it utf8 compliant, do you know of the best way to do this?

I am most of the way there with the following function, apart from putting some code in to replace the pound signs what other ways could i improve it?

Expand|Select|Wrap|Line Numbers
  1. <cfcomponent>
  2.     <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
  3.         <cfargument name="paragraph" type="string" required="yes">
  4.  
  5.         <cfscript>
  6.         /**
  7.          * Returns a XHTML string suitable for insertion into a database in the UTF-8 encoding format.
  8.          * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
  9.          * 
  10.          * @param paragraph String you want XHTML / XML formatted. 
  11.          * @return Returns a string. 
  12.          * @author **** 
  13.          * @version 1.0, December 10th, 2008
  14.          */
  15.  
  16.         var returnValue = '';
  17.         var newParagraph = arguments.paragraph;
  18.         var sqlList = "-- ,'";
  19.         var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)##chr(163)#";
  20.  
  21.         /* Replace pound signs */
  22.         Replace(newParagraph,"£","&pound;");
  23.  
  24.         /* Make sql safe */
  25.         newParagraph = trim(replaceList( newParagraph , sqlList , replacementList ));    
  26.  
  27.         /* Make XML and UTF-8 Safe */
  28.         newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph,"utf-8"),"utf-8"));
  29.  
  30.         /* Break into paragraphs */
  31.         newParagraph = ListToArray(newParagraph,Chr(13) & Chr(10));
  32.         newParagraphCount = ArrayLen(newParagraph);
  33.  
  34.         for(i=1;i LTE newParagraphCount;i=i+1) {
  35.  
  36.             //WriteOutput(newParagraph[i]);
  37.  
  38.             /* Ignore blank lines */
  39.             if(newParagraph[i] NEQ "") {
  40.  
  41.                 /* Remove excess paragraph elements */
  42.                 REReplace(newParagraph[i], "<?p*>", "", "All");
  43.  
  44.                 /* Loop through array of paragraphs wrapping in p elements, skipping list elements */
  45.                 containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>",newParagraph[i]); //
  46.                 if(containsList EQ 0) { 
  47.                     returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
  48.                 }
  49.                 else {
  50.                     returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);                
  51.                 }
  52.             }
  53.         }
  54.         return trim(returnValue);
  55.         </cfscript>
  56.     </cffunction>
  57. </cfcomponent>
  58.  
Dormilich's Avatar
Moderator
 
Join Date: Aug 2008
Location: Leipzig, Germany
Posts: 3,629
#6: Dec 12 '08

re: I can't get rid of wierd characters in my RSS feed


Quote:

Originally Posted by chromis View Post

I am most of the way there with the following function, apart from putting some code in to replace the pound signs what other ways could i improve it?

this is a question more suited in the coldfusion forum. I have never used CF and I'm probably no help there....

regards
Member
 
Join Date: Jan 2008
Posts: 113
#7: Dec 15 '08

re: I can't get rid of wierd characters in my RSS feed


Ok thanks anyway, i'll ask in the cf forum.
Frinavale's Avatar
Site Moderator
 
Join Date: Oct 2006
Location: The Great White North
Posts: 5,066
#8: Dec 16 '08

re: I can't get rid of wierd characters in my RSS feed


I've moved your thread to the ColdFusion forum.
Hopefully you'll get more help here.

-Moderator Frinny
Reply

Tags
characters, feed, invalid, rss, validation