Correct way to format strings for entry into RSS Feed

Hi,

I've been trying to create a class which will format text copied and pasted from a Word document into an XML / XHTML compliant string, complete with paragraphs, to be inserted into a database and in turn an RSS feed.
I'm 90% there, but I would like to know whether what I have done is correct, and whether it is the best way to do it.

Here is my class:

<cfcomponent>
    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
        <cfargument name="paragraph" type="string" required="yes">

        <cfscript>
        /**
         * Returns an XHTML string suitable for insertion into a database in the UTF-8 encoding format.
         * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
         *
         * @param paragraph String you want XHTML / XML formatted.
         * @return Returns a string.
         * @author ****
         * @version 1.0, December 10th, 2008
         */

        var returnValue = '';
        var newParagraph = arguments.paragraph;
        var sqlList = "-- ,'";
        var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)##chr(163)#";
        /* Declare the working variables with var so they stay local to this function */
        var newParagraphCount = 0;
        var i = 0;
        var containsList = 0;

        /* Make sql safe */
        newParagraph = trim(replaceList( newParagraph , sqlList , replacementList ));

        /* Make XML and UTF-8 Safe */
        newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph,"utf-8"),"utf-8"));

        /* Break into paragraphs */
        newParagraph = ListToArray(newParagraph,Chr(13) & Chr(10));
        newParagraphCount = ArrayLen(newParagraph);

        for(i=1;i LTE newParagraphCount;i=i+1) {

            //WriteOutput(newParagraph[i]);

            /* Ignore blank lines */
            if(newParagraph[i] NEQ "") {

                /* Note: XMLFormat above has already escaped the angle brackets, so tags present in
                   the pasted text arrive here in escaped form and the two tag patterns below
                   will not match them. */

                /* Remove excess paragraph elements; REReplace returns the new string, so assign it back */
                newParagraph[i] = REReplace(newParagraph[i], "</?p( [^>]*)?>", "", "All");

                /* Loop through array of paragraphs wrapping in p elements, skipping list elements */
                containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>",newParagraph[i]);
                if(containsList EQ 0) {
                    returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
                }
                else {
                    returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);
                }
            }
        }
        return trim(returnValue);
        </cfscript>
    </cffunction>
</cfcomponent>

My reasoning for using the charset encode / decode was that any characters outside the UTF-8 character encoding would be taken care of. Is this correct? The SQL list was something I lifted from someone else's function to ensure that the string is SQL safe.
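On the SQL-safety point, a common alternative to replacing characters by hand is to bind the value with cfqueryparam, which passes it to the database as a parameter rather than as part of the SQL text. A minimal sketch, assuming the formatted string is held in a variable named formattedParagraph; the datasource, table and column names are hypothetical:

```cfml
<!--- Hypothetical datasource, table and column names; substitute your own. --->
<cfquery datasource="myDSN">
    INSERT INTO feedItems (body)
    VALUES (
        <cfqueryparam value="#formattedParagraph#" cfsqltype="cf_sql_longvarchar">
    )
</cfquery>
```

With the value bound this way, the "--" and apostrophe replacements would no longer be needed for the insert itself.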

Another avenue I considered exploring was to create a large list of incorrect characters (", £, # etc.) and then replace them with the chr() equivalent using ReplaceList.
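A minimal sketch of that ReplaceList idea, assuming the problem characters are the typographic ones Word tends to insert. The function name and the choice of characters are illustrative only, not a complete list:

```cfml
<cfscript>
// Hypothetical helper: maps a few characters Word commonly inserts
// to numeric character references. Extend both lists in step.
function replaceWordCharacters(str) {
    // Curly single quotes, curly double quotes, en dash, em dash, ellipsis
    var badChars = chr(8216) & "," & chr(8217) & "," & chr(8220) & "," & chr(8221)
                 & "," & chr(8211) & "," & chr(8212) & "," & chr(8230);
    // A doubled pound sign produces a literal pound sign in a CFML string
    var entities = "&##8216;,&##8217;,&##8220;,&##8221;,&##8211;,&##8212;,&##8230;";
    return replaceList(str, badChars, entities);
}
</cfscript>
```

If XMLFormat is also applied, it would need to run before this step, since it escapes the ampersands in the references.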

Any ideas or feedback are welcome.

Thanks,

Chromis
Dec 10 '08 #1
acoder
I don't have experience in RSS feeds specifically, but the validation does look right.

Rather than a large list of incorrect characters, how about a list of valid characters, or a regular expression?
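One way to read the "valid characters" suggestion is a regular expression that keeps an allowed set and drops everything else. A minimal sketch, assuming the allowed set is tab, line feed, carriage return and printable ASCII; the function name is hypothetical:

```cfml
<cfscript>
// Hypothetical helper: keeps tab, line feed, carriage return and printable
// ASCII (space through tilde) and removes every other character.
function stripToPrintableAscii(str) {
    // Negated character class: anything NOT in the allowed set
    var disallowedPattern = "[^" & chr(9) & chr(10) & chr(13) & " -~]";
    return REReplace(str, disallowedPattern, "", "all");
}
</cfscript>
```

Note that this deletes characters rather than converting them, so it suits validation better than cleaning up pasted copy.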
Dec 11 '08 #2
chromis
The thing is, the content will be coming from a user (copied and pasted from Word, for instance), so any character could be input. The ideal solution would therefore be to convert the incorrect characters rather than delete them. Should I carry on down this route?
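A minimal sketch of converting rather than deleting, assuming "incorrect" means any character with a code above 127, which is rewritten as a numeric character reference. The function name is hypothetical:

```cfml
<cfscript>
// Hypothetical helper: rewrites every character above code 127 as a
// numeric character reference and leaves plain ASCII untouched.
// Characters outside the Basic Multilingual Plane are not handled by this sketch.
function toNumericEntities(str) {
    var out = "";
    var i = 0;
    var c = "";
    var code = 0;
    for (i = 1; i LTE len(str); i = i + 1) {
        c = mid(str, i, 1);
        code = asc(c);
        if (code GT 127) {
            // A doubled pound sign produces a literal pound sign
            out = out & "&##" & code & ";";
        }
        else {
            out = out & c;
        }
    }
    return out;
}
</cfscript>
```

Because it works from the character code, this avoids maintaining a list of every character a user might paste.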
Dec 11 '08 #3
acoder
Oh, I see. In that case, that sounds right. I was thinking more in terms of validation.
Dec 11 '08 #4
