
Validating XML/XHTML in email

  #1  
October 23rd, 2008, 04:35 PM
Kenneth Porter
 
I'm thinking it might be a good idea to use the "quality" of an XML/XHTML
email's structure as a metric for spamminess. More errors are likely to
imply spam. Does there exist a lightweight validator that can quickly
produce a metric of how many errors exist in a message? Ideally this would
be something I could invoke from a Perl process, perhaps over a pipe to a
validation server (similar to the way ClamAV and SpamAssassin can be
invoked).
  #2  
October 23rd, 2008, 11:05 PM
Peter Flynn

Kenneth Porter wrote:
> I'm thinking it might be a good idea to use the "quality" of an XML/XHTML
> email's structure as a metric for spamminess. More errors are likely to
> imply spam. Does there exist a lightweight validator that can quickly
> produce a metric of how many errors exist in a message? Ideally this would
> be something I could invoke from a Perl process, perhaps over a pipe to a
> validation server (similar to the way ClamAV and SpamAssassin can be
> invoked).

onsgmls -wxml -s -E 5000 xml.dcl yourfile.xml 2>&1 | grep ':E:' | wc -l

onsgmls is in the OpenSP package.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
  #3  
October 24th, 2008, 11:45 AM
Kenneth Porter

Peter Flynn <peter.nosp@m.silmaril.ie> wrote in
news:6mcaioFg2irhU1@mid.individual.net:
> onsgmls -wxml -s -E 5000 xml.dcl yourfile.xml 2>&1 | grep ':E:' | wc -l
>
> onsgmls is in the OpenSP package.

That sounds good. Now to see what's involved in incorporating that into a
SpamAssassin plugin....
  #4  
October 24th, 2008, 08:25 PM
Kenneth Porter

Peter Flynn <peter.nosp@m.silmaril.ie> wrote in
news:6mcaioFg2irhU1@mid.individual.net:
> onsgmls -wxml -s -E 5000 xml.dcl yourfile.xml 2>&1 | grep ':E:' | wc -l
>
> onsgmls is in the OpenSP package.

With that hint I found that "tidy -eq" gives a pretty good result. To
normalize the score, I figure it makes sense to divide the resulting line
count by the byte count of the input file.
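
A minimal sketch of that normalization, assuming Tidy is on the PATH and writes its diagnostics to stderr. The function names `density` and `tidy_score` are made up for illustration, and scaling to lines per 1000 bytes is only a choice to keep the numbers readable:

```shell
#!/bin/sh
# density LINES BYTES: print diagnostic lines per 1000 bytes of input.
# (Hypothetical helper; the 1000x scaling is an arbitrary choice.)
density() {
    awk -v n="$1" -v b="$2" 'BEGIN {
        if (b > 0) { printf "%.3f\n", n * 1000 / b } else { printf "%.3f\n", 0 }
    }'
}

# tidy_score FILE: count the lines that "tidy -eq" reports for FILE
# and normalize that count by the size of FILE in bytes.
tidy_score() {
    bytes=$(wc -c < "$1")
    lines=$(tidy -eq "$1" 2>&1 | wc -l)
    density "$lines" "$bytes"
}
```

Something like `tidy_score message.html` would then print a single number, with larger values suggesting messier markup. An empty input is reported as 0.000 rather than dividing by zero.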
  #5  
October 28th, 2008, 12:05 AM
Peter Flynn

Kenneth Porter wrote:
> Peter Flynn <peter.nosp@m.silmaril.ie> wrote in
> news:6mcaioFg2irhU1@mid.individual.net:
>
>> onsgmls -wxml -s -E 5000 xml.dcl yourfile.xml 2>&1 | grep ':E:' | wc -l
>>
>> onsgmls is in the OpenSP package.
>
> With that hint I found that "tidy -eq" gives a pretty good result. To
> normalize the score, I figure it makes sense to divide the resulting line
> count by the byte count of the input file.

Ah. If it's only HTML you're handling, Tidy will be much easier to work
with. OpenSP requires well-formed XML at least, which would mean running
Tidy on the HTML first anyway.

///Peter