I can't get rid of weird characters in my RSS feed
Member | Join Date: Jan 2008 | Posts: 113
Hi,
I've created a UTF-8 encoded RSS feed which presents news data drawn from a database. I've set all aspects of my database to UTF-8 and also saved the text which I put into the database as UTF-8, by pasting it into Notepad and saving it as UTF-8. So everything should be encoded in UTF-8 when the RSS feed is presented to the browser, but I am still getting the weird question mark characters for pound signs :(
Here is my RSS feed code (ColdFusion):
<cfsilent>
    <!--- Get News --->
    <cfinvoke component="com.news" method="getAll" dsn="#Request.App.dsn#" returnvariable="news" />
</cfsilent>
<!--- If we have news items --->
<cfif news.RecordCount GT 0>
    <!--- Serve RSS content-type --->
    <cfcontent type="application/rss+xml">
    <!--- Output feed --->
    <cfcontent reset="true"><?xml version="1.0" encoding="utf-8"?>
    <cfoutput>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>News RSS Feed</title>
        <link>#Application.siteRoot#</link>
        <description>Welcome to the News RSS Feed</description>
        <lastBuildDate>Wed, 19 Nov 2008 09:05:00 GMT</lastBuildDate>
        <language>en-uk</language>
        <atom:link href="#Application.siteRoot#news/rss/index.cfm" rel="self" type="application/rss+xml" />
        <cfloop query="news">
            <!--- Make data xml compliant --->
            <cfscript>
                news.headline = replace(news.headline, "<", "&lt;", "ALL");
                news.body = replace(news.body, "<", "&lt;", "ALL");
                news.date = dateformat(news.date, "ddd, dd mmm yyyy");
                news.time = timeformat(news.time, "HH:mm:ss") & " GMT";
            </cfscript>
            <item>
                <title>#news.headline#</title>
                <link>#Application.siteRoot#news/index.cfm?id=#news.id#</link>
                <guid>#Application.siteRoot#news/index.cfm?id=#news.id#</guid>
                <pubDate>#news.date# #news.time#</pubDate>
                <description>#news.body#</description>
            </item>
        </cfloop>
    </channel>
    </rss>
    </cfoutput>
<cfelse>
    <!--- If we have no news items, relocate to news page --->
    <cflocation url="../news/index.cfm" addtoken="no">
</cfif>
Has anyone any suggestions? I've done loads of research but can't find the right answers :(
Thanks in advance,
Chromis
Moderator | Join Date: Aug 2008 | Location: Leipzig, Germany | Posts: 3,652
Does it help if you use &pound;? (I know that may be only a workaround, but I can tell nothing from the code alone without the actual feed.)
Is £ the only character outside the ASCII charset? Maybe your generator has some problems with UTF-8 or doesn't know which charset to use...
regards
PS please don't post your questions in the insights section, ask a moderator to move it to the answers section.
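On the "doesn't know which charset to use" point: the posted template never tells ColdFusion which charset applies to the template file or to the HTTP response. The following is only a sketch of how that could be declared, assuming the .cfm file itself is saved as UTF-8. Whether it cures the pound sign on its own also depends on how the data is stored in and read back from the database, which the thread does not show.

```cfml
<!--- Tell ColdFusion that this template's source is UTF-8 --->
<cfprocessingdirective pageencoding="utf-8">

<!--- Declare the charset in the Content-Type header, matching the XML declaration --->
<cfcontent type="application/rss+xml; charset=utf-8" reset="true"><?xml version="1.0" encoding="utf-8"?>
```

This would replace the two separate cfcontent tags in the original template, so the type and the reset happen in one place.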
Member | Join Date: Jan 2008 | Posts: 113
Hi Dormilich, thanks for your reply. My apologies, I am aware of the answers section but I accidentally posted in here. Sadly it's very easy to make that mistake (I preferred the old layout!).
Yes, the only bad character is the pound sign. I've tried replacing it manually in the database, presuming that would replace the character with the UTF-8 equivalent, but it didn't work. If I use &pound; it breaks the feed. I could use CDATA, but I need to display paragraph formatting, and using CDATA displays the p element tags.
Moderator | Join Date: Aug 2008 | Location: Leipzig, Germany | Posts: 3,652
&pound; breaks your feed because it's an undefined entity in XML (you'd need a DTD to fix that). Have you tried &#163;? This should not break the feed.
regards
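A small sketch of the numeric-reference approach in ColdFusion follows. The variable name `body` and the sample text are made up for illustration. Note that a literal `#` inside a CFML string has to be written as `##`.

```cfml
<!--- Hypothetical sample text containing a pound sign (code point 163) --->
<cfset body = "Tickets cost " & Chr(163) & "5">

<!--- "&##163;" produces the literal text &#163; because ## escapes the # --->
<cfset body = Replace(body, Chr(163), "&##163;", "ALL")>

<cfoutput><description>#body#</description></cfoutput>
```

Unlike &pound;, a numeric character reference needs no DTD, so any XML parser should accept it.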
Member | Join Date: Jan 2008 | Posts: 113
OK, I've replaced all occurrences of £ with &#163; and it now works great, thanks! Why would the pound sign not be recognised, though? Do you think that when I saved the file as UTF-8 it didn't convert the character properly?
Ideally I would like to create a function in ColdFusion which will doctor text and make it UTF-8 compliant. Do you know the best way to do this?
I am most of the way there with the following function. Apart from putting some code in to replace the pound signs, in what other ways could I improve it?
<cfcomponent>
    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
        <cfargument name="paragraph" type="string" required="yes">
        <cfscript>
            /**
             * Returns a XHTML string suitable for insertion into a database in the UTF-8 encoding format.
             * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
             *
             * @param paragraph String you want XHTML / XML formatted.
             * @return Returns a string.
             * @author ****
             * @version 1.0, December 10th, 2008
             */

            var returnValue = '';
            var newParagraph = arguments.paragraph;
            var sqlList = "-- ,'";
            var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)##chr(163)#";

            /* Replace pound signs */
            Replace(newParagraph,"£","&#163;");

            /* Make sql safe */
            newParagraph = trim(replaceList( newParagraph , sqlList , replacementList ));

            /* Make XML and UTF-8 Safe */
            newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph,"utf-8"),"utf-8"));

            /* Break into paragraphs */
            newParagraph = ListToArray(newParagraph,Chr(13) & Chr(10));
            newParagraphCount = ArrayLen(newParagraph);

            for(i=1;i LTE newParagraphCount;i=i+1) {

                //WriteOutput(newParagraph[i]);

                /* Ignore blank lines */
                if(newParagraph[i] NEQ "") {

                    /* Remove excess paragraph elements */
                    REReplace(newParagraph[i], "<?p*>", "", "All");

                    /* Loop through array of paragraphs wrapping in p elements, skipping list elements */
                    containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>",newParagraph[i]);
                    if(containsList EQ 0) {
                        returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
                    }
                    else {
                        returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);
                    }
                }
            }
            return trim(returnValue);
        </cfscript>
    </cffunction>
</cfcomponent>
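A few things in the function above may be worth a second look. The return values of Replace and REReplace are discarded, so neither call has any effect. The pattern "<?p*>" does not match paragraph tags. Running XMLFormat after inserting character references would escape their leading ampersand. The loop variables are not var-scoped. The following is a hedged sketch of one possible rework, not a definitive implementation. The function name `paragraphsToFeedHtml` is hypothetical, and it leaves escaping of `<` to the feed template, as the original RSS code already does.

```cfml
<cffunction name="paragraphsToFeedHtml" access="public" returntype="string" output="false">
    <cfargument name="text" type="string" required="yes">
    <cfscript>
        // Var-scope every local at the top of the function.
        var result = "";
        var i = 0;
        var line = "";
        // Split on CR/LF; ListToArray drops empty items, so blank lines disappear.
        var lines = ListToArray(arguments.text, Chr(13) & Chr(10));

        for (i = 1; i LTE ArrayLen(lines); i = i + 1) {
            // REReplace returns a new string, so the result must be assigned.
            line = Trim(REReplace(lines[i], "</?p(\s[^>]*)?>", "", "ALL"));

            if (Len(line) GT 0) {
                // Swap the pound sign for a numeric reference (## is a literal #).
                line = Replace(line, Chr(163), "&##163;", "ALL");

                // Wrap in <p> unless the line carries list markup.
                if (REFindNoCase("</?(ul|ol|li)[^>]*>", line) EQ 0) {
                    line = "<p>" & line & "</p>";
                }
                result = result & line & Chr(13) & Chr(10);
            }
        }
        return Trim(result);
    </cfscript>
</cffunction>
```

The sketch leaves out the SQL escaping step on purpose. Binding values with cfqueryparam in the query itself is generally a safer place to handle that than rewriting the text.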
Moderator | Join Date: Aug 2008 | Location: Leipzig, Germany | Posts: 3,652
Quote (originally posted by chromis): "I am most of the way there with the following function, apart from putting some code in to replace the pound signs what other ways could i improve it?"
This is a question better suited to the ColdFusion forum. I have never used CF, so I'm probably no help there...
regards
Member | Join Date: Jan 2008 | Posts: 113
OK, thanks anyway, I'll ask in the CF forum.
Site Moderator | Join Date: Oct 2006 | Location: The Great White North | Posts: 5,131
I've moved your thread to the ColdFusion forum.
Hopefully you'll get more help here.
-Moderator Frinny