I can't get rid of weird characters in my RSS feed

Hi,

I've created a UTF-8 encoded RSS feed which presents news data drawn from a database. I've set all aspects of my database to UTF-8 and also saved the text which I have put into the database as UTF-8 by pasting it into Notepad and saving as UTF-8. So everything should be encoded in UTF-8 when the RSS feed is presented to the browser; however, I am still getting the weird question mark characters for pound signs :(

Here is my RSS feed code (ColdFusion):

<cfsilent>
<!--- Get News --->
<cfinvoke component="com.news" method="getAll" dsn="#Request.App.dsn#" returnvariable="news" />
</cfsilent>
<!--- If we have news items --->
<cfif news.RecordCount GT 0>
<!--- Serve RSS content-type --->
<cfcontent type="application/rss+xml">
<!--- Output feed --->
<cfcontent reset="true"><?xml version="1.0" encoding="utf-8"?>
<cfoutput>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>News RSS Feed</title>
        <link>#Application.siteRoot#</link>
        <description>Welcome to the News RSS Feed</description>
        <lastBuildDate>Wed, 19 Nov 2008 09:05:00 GMT</lastBuildDate>
        <language>en-uk</language>
        <atom:link href="#Application.siteRoot#news/rss/index.cfm" rel="self" type="application/rss+xml" />

        <cfloop query="news">
        <!--- Make data xml compliant --->
        <cfscript>
            news.headline = replace(news.headline, "<", "&lt;", "ALL");
            news.body = replace(news.body, "<", "&lt;", "ALL");
            news.date = dateformat(news.date, "ddd, dd mmm yyyy");
            news.time = timeformat(news.time, "HH:mm:ss") & " GMT";
        </cfscript>
        <item>
            <title>#news.headline#</title>
            <link>#Application.siteRoot#news/index.cfm?id=#news.id#</link>
            <guid>#Application.siteRoot#news/index.cfm?id=#news.id#</guid>
            <pubDate>#news.date# #news.time#</pubDate>
            <description>#news.body#</description>
        </item>
        </cfloop>
    </channel>
</rss>
</cfoutput>
<cfelse>
<!--- If we have no news items, relocate to news page --->
<cflocation url="../news/index.cfm" addtoken="no">
</cfif>

Has anyone any suggestions? I've done loads of research but can't find the right answers :(

Thanks in advance,

Chromis
Dec 12 '08 #1
Dormilich
Does it help if you use &#163;? (I know that may be only a workaround, but from the code I can tell nothing without the actual feed.)

Is £ the only character outside the ASCII charset? Maybe your generator has some problems with UTF-8 or doesn't know which charset to use...
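
One way to rule out a charset mismatch is to state the encoding explicitly at each point where ColdFusion touches it. The following is a minimal sketch, assuming Adobe ColdFusion and a stand-alone test template; the feed title, link, and sample text are made up for illustration, and this is not a confirmed fix for the feed above:

```cfml
<!--- Tell the CFML compiler how this template file itself is encoded --->
<cfprocessingdirective pageencoding="utf-8">
<!--- Declare the charset in the Content-Type header, not only in the XML prolog --->
<cfcontent type="application/rss+xml; charset=utf-8" reset="true"><?xml version="1.0" encoding="utf-8"?>
<cfoutput>
<rss version="2.0">
    <channel>
        <title>Charset test (hypothetical)</title>
        <link>http://example.com/</link>
        <!--- Chr(163) is the pound sign, generated in code rather than read from the database --->
        <description>Price: #Chr(163)#10</description>
    </channel>
</rss>
</cfoutput>
```

If the pound sign survives in this test feed but not in the real one, the template and response headers are probably fine, and the database connection is the next thing worth checking.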

regards

PS: Please don't post your questions in the insights section; ask a moderator to move it to the answers section.
Dec 12 '08 #2
chromis
Hi Dormilich, thanks for your reply. My apologies, I am aware of the answers section but I accidentally put it in here; it's very easy to make the mistake sadly (preferred the old layout!).

Yes, the only bad character is the pound sign. I've tried replacing it manually in the database, presuming that would replace the character with the UTF-8 equivalent, but it didn't work. If I use &pound; it breaks the feed. I could use CDATA, but I need to display paragraph formatting, and using CDATA displays the p element tags.
Dec 12 '08 #3
Dormilich
&pound; breaks your feed because it's an undefined entity in XML (you'd need a DTD to fix that). Have you tried &#163;? This should not break the feed.
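
For reference, the substitution could be done in ColdFusion roughly as follows. This is a minimal sketch with a hypothetical sample string standing in for news.body. A numeric character reference such as &#163; names the Unicode code point directly, so an XML parser accepts it without a DTD and regardless of the feed's byte encoding:

```cfml
<cfscript>
    // Hypothetical sample text; in the feed this would be news.body
    body = "Tickets cost " & Chr(163) & "5";
    // Chr(163) is the pound sign. "ALL" replaces every occurrence, not just the first.
    // The doubled ## is how CFML escapes a literal # inside a string.
    body = Replace(body, Chr(163), "&##163;", "ALL");
    WriteOutput(body);
</cfscript>
```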

regards
Dec 12 '08 #4
chromis
OK, I've replaced all occurrences of £ with &#163; and it now works great, thanks! Why would the pound sign not be recognised though? Do you think that when I saved the file as UTF-8 it didn't convert the character properly?
Ideally I would like to create a function in ColdFusion which will doctor text and make it UTF-8 compliant. Do you know the best way to do this?

I am most of the way there with the following function. Apart from putting some code in to replace the pound signs, in what other ways could I improve it?

<cfcomponent>
    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
        <cfargument name="paragraph" type="string" required="yes">

        <cfscript>
        /**
         * Returns a XHTML string suitable for insertion into a database in the UTF-8 encoding format.
         * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
         *
         * @param paragraph String you want XHTML / XML formatted.
         * @return Returns a string.
         * @author ****
         * @version 1.0, December 10th, 2008
         */

        var returnValue = '';
        var newParagraph = arguments.paragraph;
        var sqlList = "-- ,'";
        var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)##chr(163)#";

        /* Replace pound signs */
        Replace(newParagraph,"£","&pound;");

        /* Make sql safe */
        newParagraph = trim(replaceList( newParagraph , sqlList , replacementList ));

        /* Make XML and UTF-8 Safe */
        newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph,"utf-8"),"utf-8"));

        /* Break into paragraphs */
        newParagraph = ListToArray(newParagraph,Chr(13) & Chr(10));
        newParagraphCount = ArrayLen(newParagraph);

        for(i=1;i LTE newParagraphCount;i=i+1) {

            //WriteOutput(newParagraph[i]);

            /* Ignore blank lines */
            if(newParagraph[i] NEQ "") {

                /* Remove excess paragraph elements */
                REReplace(newParagraph[i], "<?p*>", "", "All");

                /* Loop through array of paragraphs wrapping in p elements, skipping list elements */
                containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>",newParagraph[i]); //
                if(containsList EQ 0) {
                    returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
                }
                else {
                    returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);
                }
            }
        }
        return trim(returnValue);
        </cfscript>
    </cffunction>
</cfcomponent>
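
One thing to watch in the function above is that the Replace() and REReplace() calls discard their return values, so those two steps have no effect as written. As a point of comparison, here is a minimal sketch of a much narrower helper that only makes text safe to embed in an ASCII-clean XML feed. The function name toAsciiXmlText is hypothetical, it does not attempt the SQL or paragraph handling, and it assumes the text stays within the Basic Multilingual Plane:

```cfml
<cffunction name="toAsciiXmlText" access="public" returntype="string" output="false">
    <cfargument name="text" type="string" required="yes">
    <cfscript>
        var result = "";
        var ch = "";
        var code = 0;
        var i = 0;
        // Escape the ampersand first so the references added later are not double-escaped
        var escaped = Replace(arguments.text, "&", "&amp;", "ALL");
        escaped = Replace(escaped, "<", "&lt;", "ALL");
        escaped = Replace(escaped, ">", "&gt;", "ALL");

        // Turn every character outside ASCII into a numeric character reference.
        // Assumption: no characters above U+FFFF, which would need surrogate handling.
        for (i = 1; i LTE Len(escaped); i = i + 1) {
            ch = Mid(escaped, i, 1);
            code = Asc(ch);
            if (code GT 127) {
                result = result & "&##" & code & ";";
            } else {
                result = result & ch;
            }
        }
        return result;
    </cfscript>
</cffunction>
```

In the feed template it would be called as #toAsciiXmlText(news.headline)# inside the title element. Because the output is pure ASCII, it reads the same whichever charset the response is finally served in, which sidesteps the encoding question rather than answering it.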
Dec 12 '08 #5
Dormilich
@chromis
This question is better suited to the ColdFusion forum. I have never used CF, so I'm probably no help there...

regards
Dec 12 '08 #6
chromis
OK, thanks anyway, I'll ask in the CF forum.
Dec 15 '08 #7
Frinavale
I've moved your thread to the ColdFusion forum.
Hopefully you'll get more help here.

-Moderator Frinny
Dec 16 '08 #8
