
Correct way to format strings for entry into RSS Feed

Member
 
Join Date: Jan 2008
Posts: 113
#1: Dec 10 '08
Hi,

I've been trying to create a class that formats text copied and pasted from a Word document into an XML / XHTML-compliant string, complete with paragraphs, to be inserted into a database and, in turn, an RSS feed.
I'm 90% there, but I would like to know whether what I have done is correct and whether it is the best way to do it.

Here is my class:

<cfcomponent>
    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string" output="false">
        <cfargument name="paragraph" type="string" required="yes">

        <cfscript>
        /**
         * Returns an XHTML string suitable for insertion into a database in the UTF-8 encoding format.
         * Each line is wrapped in opening and closing paragraph tags; lines containing list elements are left unwrapped.
         *
         * @param paragraph String you want XHTML / XML formatted.
         * @return Returns a string.
         * @author ****
         * @version 1.0, December 10th, 2008
         */

        /* Every local is var-scoped at the top so nothing leaks out of the function */
        var returnValue = '';
        var newParagraph = arguments.paragraph;
        var newParagraphCount = 0;
        var containsList = 0;
        var i = 0;
        var crlf = Chr(13) & Chr(10);
        var sqlList = "--,'";
        /* Entities for two hyphens and an apostrophe; the pound sign is doubled so it is read literally */
        var replacementList = "&##45;&##45;,&##39;";

        /* UTF-8 round trip, kept from the first version */
        newParagraph = CharsetEncode(CharsetDecode(newParagraph, "utf-8"), "utf-8");

        /* Break into lines; CR and LF each act as a delimiter and empty elements are skipped */
        newParagraph = ListToArray(newParagraph, crlf);
        newParagraphCount = ArrayLen(newParagraph);

        for (i = 1; i LTE newParagraphCount; i = i + 1) {

            /* Remove existing paragraph tags; the result has to be assigned back */
            newParagraph[i] = trim(REReplaceNoCase(newParagraph[i], "</?p(\s[^>]*)?>", "", "all"));

            /* Ignore blank lines */
            if (newParagraph[i] NEQ "") {

                /* Wrap in p elements, skipping lines that contain list elements */
                containsList = REFindNoCase("</?(ul|li)(\s[^>]*)?>", newParagraph[i]);
                if (containsList EQ 0) {
                    /* Escape only after the tag checks; escaping first turns "<" into "&lt;" and nothing matches */
                    returnValue = returnValue & "<p>" & XMLFormat(newParagraph[i]) & "</p>" & crlf;
                }
                else {
                    /* List markup is passed through unescaped and is assumed to be valid XHTML already */
                    returnValue = returnValue & newParagraph[i] & crlf;
                }
            }
        }

        /* Make sql safe; done last so XMLFormat does not escape the ampersands of these entities */
        returnValue = replaceList(returnValue, sqlList, replacementList);

        return trim(returnValue);
        </cfscript>
    </cffunction>
</cfcomponent>

My reasoning for the charset decode/encode round trip was that any characters outside the UTF-8 encoding would be taken care of. Is this correct? The SQL list is something I lifted from someone else's function to ensure that the string is SQL safe.
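
On those two questions, here is a minimal sketch, offered as an illustration under stated assumptions and not as a definitive answer. ColdFusion strings are Unicode internally, so decoding a string to UTF-8 bytes and encoding it straight back should return the same string; the first part checks that. For SQL safety, the usual alternative to replacing characters is to bind the value with cfqueryparam when inserting. The query name `insertArticle`, datasource `myDSN`, table `articles`, column `body` and all variable names are hypothetical.

```cfml
<cfscript>
// Sample text with an e-acute, curly quotes and a pound sign, built with Chr()
// so the example does not depend on the encoding of the template file itself.
original = "Caf" & Chr(233) & " " & Chr(8220) & "quoted" & Chr(8221) & " " & Chr(163) & "5";

// The round trip used in the class: string -> UTF-8 bytes -> string.
roundTripped = CharsetEncode(CharsetDecode(original, "utf-8"), "utf-8");

// Compare() is case sensitive and returns 0 when the two strings are identical.
isUnchanged = (Compare(original, roundTripped) EQ 0);
</cfscript>

<!--- Hypothetical insert: the value is bound as a parameter, not concatenated into the SQL text. --->
<cfquery name="insertArticle" datasource="myDSN">
    INSERT INTO articles (body)
    VALUES (<cfqueryparam value="#roundTripped#" cfsqltype="cf_sql_longvarchar">)
</cfquery>
```

If `isUnchanged` comes out true for your own input, the round trip is not altering anything. With the value bound through cfqueryparam, apostrophes and double hyphens no longer need replacing before the insert.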

Another avenue I considered was to build a large list of problem characters (", £, # etc.) and replace them with their chr() equivalents using ReplaceList.

Any ideas or feedback are welcome.

Thanks,

Chromis

acoder
Site Moderator
 
Join Date: Nov 2006
Location: UK
Posts: 14,581
#2: Dec 11 '08

I don't have experience with RSS feeds specifically, but the validation does look right.

Rather than a large list of incorrect characters, how about a list of valid characters or a regular expression?
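
A minimal sketch of the whitelist idea, assuming that "valid" means printable ASCII plus tab, line feed and carriage return. That definition and the variable names are assumptions for illustration, not something settled in this thread.

```cfml
<cfscript>
// Hypothetical input: a stray control character (Chr(7)) inside otherwise plain text.
rawText = "Plain text" & Chr(7) & " with a stray control character";

// Character class of everything we accept: the range space..tilde (printable ASCII)
// plus tab (9), line feed (10) and carriage return (13), added as literal characters.
allowedClass = "[^ -~" & Chr(9) & Chr(10) & Chr(13) & "]";

// The leading ^ negates the class, so every character NOT in the accepted set is removed.
cleanText = REReplace(rawText, allowedClass, "", "all");
</cfscript>
```

Note that this deletes characters such as £ and curly quotes as well, so it suits validation better than conversion.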
Member
 
Join Date: Jan 2008
Posts: 113
#3: Dec 11 '08

The thing is, the content will come from a user (copied and pasted from Word, for instance), so any character could be entered. The ideal solution would be to convert the incorrect characters rather than delete them. Should I carry on down this route?
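
One way to convert and not delete, sketched under assumptions: walk the string and replace every character above the ASCII range with its numeric character reference. The function name `toNumericEntities` is hypothetical, and the sketch assumes the pasted text contains only characters from the Basic Multilingual Plane, which covers the usual Word characters such as curly quotes, dashes and £.

```cfml
<cfscript>
// Hypothetical helper, not part of the class in the first post.
function toNumericEntities(text) {
    var result = "";
    var ch = "";
    var code = 0;
    var i = 0;

    for (i = 1; i LTE Len(text); i = i + 1) {
        ch = Mid(text, i, 1);
        code = Asc(ch);
        if (code GT 127) {
            // Non-ASCII character: emit a numeric reference such as the one for the pound sign.
            // The pound sign is doubled so ColdFusion reads it as a literal character.
            result = result & "&##" & code & ";";
        }
        else {
            // Plain ASCII is copied through unchanged.
            result = result & ch;
        }
    }
    return result;
}

// Usage: Chr(163) is the pound sign, so this yields "Price: &#163;5".
converted = toNumericEntities("Price: " & Chr(163) & "5");
</cfscript>
```

If this is combined with XMLFormat, it would need to run afterwards, because XMLFormat escapes ampersands and would turn the references into `&amp;#163;`.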
acoder
Site Moderator
 
Join Date: Nov 2006
Location: UK
Posts: 14,581
#4: Dec 11 '08

Oh, I see. In that case, that sounds right. I was thinking more in terms of validation.

Tags
feed, rss, validation