I've been trying to create a class that formats text copied and pasted from a Word document into an XML/XHTML-compliant string, complete with paragraphs, so that it can be inserted into a database and, in turn, an RSS feed.
I'm 90% there, but I would like to know whether what I have done is correct, or whether there is a better way to do it.
Here is my class:
```cfml
<cfcomponent output="false">
    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string" output="false">
        <cfargument name="paragraph" type="string" required="yes">
        <cfscript>
        /**
         * Returns an XHTML string, in UTF-8, suitable for insertion into a database.
         * Each line of text is wrapped in opening and closing paragraph tags,
         * while lines containing list elements (ul/li) are left unwrapped.
         *
         * @param paragraph String you want XHTML / XML formatted.
         * @return Returns a string.
         * @author ****
         * @version 1.0, December 10th, 2008
         */
        // Var-scope every local so it does not leak into the component's variables scope
        var returnValue = "";
        var newParagraph = arguments.paragraph;
        var lines = ArrayNew(1);
        var lineCount = 0;
        var i = 0;
        var currentLine = "";
        var containsList = 0;

        /* Round-trip the string through UTF-8 bytes */
        newParagraph = Trim(CharsetEncode(CharsetDecode(newParagraph, "utf-8"), "utf-8"));

        /* Break into paragraphs (ListToArray skips empty elements) */
        lines = ListToArray(newParagraph, Chr(13) & Chr(10));
        lineCount = ArrayLen(lines);

        for (i = 1; i LTE lineCount; i = i + 1) {
            /* Remove existing p elements BEFORE escaping, while the tags are still literal */
            currentLine = Trim(REReplaceNoCase(lines[i], "</?p(\s[^>]*)?>", "", "all"));
            /* Ignore blank lines */
            if (currentLine NEQ "") {
                /* Match whole ul/li tags only, so tags such as link are not caught */
                containsList = REFindNoCase("</?(ul|li)(\s[^>]*)?>", currentLine);
                if (containsList EQ 0) {
                    /* Make XML safe: XMLFormat escapes the ampersand, angle brackets and quotes */
                    returnValue = returnValue & "<p>" & XMLFormat(currentLine) & "</p>" & Chr(13) & Chr(10);
                } else {
                    /* List markup is passed through unescaped */
                    returnValue = returnValue & currentLine & Chr(13) & Chr(10);
                }
            }
        }

        /* Make SQL safe: XMLFormat has already escaped apostrophes in paragraph text,
           so only "-- " is replaced here, after escaping, to avoid double-encoding */
        returnValue = Replace(returnValue, "-- ", "&##45;&##45; ", "all");

        return Trim(returnValue);
        </cfscript>
    </cffunction>
</cfcomponent>
```
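On the "make SQL safe" step: when the formatted string is written to the database through cfqueryparam, the value is passed as a bind parameter rather than as part of the SQL text, so apostrophes and "--" cannot change the statement itself. A minimal sketch, assuming the component is saved as TextFormatter.cfc and that the datasource myDSN, the table articles with its column body, and the form field form.articleText exist; all of these names are hypothetical, as is the choice of cf_sql_longvarchar for a long-text column.

```cfml
<!--- Hypothetical names: TextFormatter.cfc, myDSN, articles.body, form.articleText --->
<cfset formatter = CreateObject("component", "TextFormatter")>
<cfset articleBody = formatter.CustomParagraphFormatXMLSafe(form.articleText)>

<!--- cfqueryparam sends articleBody as a bind parameter, not as SQL text --->
<cfquery datasource="myDSN">
    INSERT INTO articles (body)
    VALUES (<cfqueryparam value="#articleBody#" cfsqltype="cf_sql_longvarchar">)
</cfquery>
```

With the value bound this way, the component should no longer need its own "make SQL safe" replacement, and the stored text keeps the punctuation the user actually typed.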
Another avenue I considered was to create a large list of problem characters (", £, # and so on) and then replace each one with its chr() equivalent using ReplaceList().
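Instead of maintaining that list by hand, one alternative is to loop over the string and turn every character above ASCII 127 (such as £ or Word's curly quotes) into a decimal numeric character reference, so £ becomes &#163;. A minimal sketch; the function name toNumericEntities is hypothetical, and it assumes the text contains only Basic Multilingual Plane characters, because Asc() reads one UTF-16 code unit at a time. It would need to run after XMLFormat(), otherwise the ampersand of each reference would be escaped a second time.

```cfml
<cfscript>
/**
 * Hypothetical helper: replaces every character above code 127
 * with its decimal numeric character reference.
 * Assumes BMP text: surrogate pairs are not combined into one reference.
 */
function toNumericEntities(str) {
    var result = "";
    var i = 0;
    var ch = "";
    var code = 0;
    for (i = 1; i LTE Len(arguments.str); i = i + 1) {
        ch = Mid(arguments.str, i, 1);
        code = Asc(ch);
        if (code GT 127) {
            /* A doubled hash inside a CFML string produces one literal hash */
            result = result & "&##" & code & ";";
        } else {
            result = result & ch;
        }
    }
    return result;
}
</cfscript>
```

Applied as toNumericEntities(XMLFormat(text)), the result contains only ASCII characters, which read the same under UTF-8 and ISO-8859-1.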
Any ideas or feedback are welcome.
Thanks,
Chromis