Hi,
I've been trying to create a class which will format text copied and pasted from a Word document into an XML / XHTML compliant string, complete with paragraphs, to then be inserted into a database and in turn an RSS feed.
I'm 90% there, but I would like to know whether what I have done is correct, or whether there is a better way to do it.
Here is my class:
<cfcomponent>

    <cffunction name="CustomParagraphFormatXMLSafe" access="public" returntype="string">
        <cfargument name="paragraph" type="string" required="yes">

        <cfscript>
            /**
             * Returns an XHTML string suitable for insertion into a database in the UTF-8 encoding format.
             * The string is then wrapped with opening and closing paragraph tags whilst ignoring list elements.
             *
             * @param paragraph String you want XHTML / XML formatted.
             * @return Returns a string.
             * @author ****
             * @version 1.0, December 10th, 2008
             */

            var returnValue = '';
            var newParagraph = arguments.paragraph;
            var sqlList = "-- ,'";
            // Expands to the string "&#45;&#45; , &#39;£"
            var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)##chr(163)#";
            // Var-scope the working variables so they stay local to this call
            var newParagraphCount = 0;
            var containsList = 0;
            var i = 0;

            /* Make sql safe */
            newParagraph = trim(replaceList( newParagraph , sqlList , replacementList ));

            /* Make XML and UTF-8 Safe */
            newParagraph = XMLFormat(CharsetEncode(CharsetDecode(newParagraph,"utf-8"),"utf-8"));

            /* Break into paragraphs */
            newParagraph = ListToArray(newParagraph,Chr(13) & Chr(10));
            newParagraphCount = ArrayLen(newParagraph);

            for(i=1;i LTE newParagraphCount;i=i+1) {

                //WriteOutput(newParagraph[i]);

                /* Ignore blank lines */
                if(newParagraph[i] NEQ "") {

                    /* Remove excess paragraph elements (the result has to be assigned back) */
                    newParagraph[i] = REReplace(newParagraph[i], "</?p(\s[^>]*)?>", "", "All");

                    /* Loop through array of paragraphs wrapping in p elements, skipping list elements */
                    containsList = REFind("<\/?ul[^>]*>$|<\/?li[^>]*>",newParagraph[i]);

                    if(containsList EQ 0) {
                        returnValue = returnValue & "<p>" & newParagraph[i] & "</p>" & Chr(13) & Chr(10);
                    }
                    else {
                        returnValue = returnValue & newParagraph[i] & Chr(13) & Chr(10);
                    }
                }
            }

            return trim(returnValue);
        </cfscript>

    </cffunction>

</cfcomponent>

My reasoning for the CharsetDecode / CharsetEncode round trip was that any characters outside the UTF-8 encoding would be taken care of. Is this correct? I lifted the SQL list from someone else's function to ensure that the string is SQL safe.
Another avenue I considered was to build a large list of problem characters (", £, # and so on) and replace each one with its chr() equivalent using ReplaceList.
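To show roughly what I mean by the ReplaceList idea, here is a minimal sketch. It is not tested against real Word output; the variable names (rawText, badChars, entities) and the particular characters chosen are just assumptions for illustration.

```cfml
<cfscript>
// Hypothetical sample input: a curly apostrophe (chr 8217) and a pound sign (chr 163).
rawText = "It#chr(8217)#s only #chr(163)#5";

// Characters to look for, and the numeric entity to swap in for each (same order).
// Inside a CFML string, ## produces a single literal pound sign.
badChars = "#chr(8216)#,#chr(8217)#,#chr(8220)#,#chr(8221)#,#chr(163)#";
entities = "&##8216;,&##8217;,&##8220;,&##8221;,&##163;";

// ReplaceList swaps each item in the first list for the item at the same position in the second.
cleaned = replaceList(rawText, badChars, entities);
writeOutput(cleaned);
</cfscript>
```

I assume this would have to run after XMLFormat rather than before it, since XMLFormat escapes ampersands and would otherwise turn the entities into plain text.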
Any ideas or feedback are welcome.
Thanks,
Chromis