Why is this valid HTML?

As you may know, spammers use this technique to get by filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B>n <!F>f<!G>a<!V>c<!O>t
<!S>th<!B>at p<!R>eopl<!J>e<!G> <!Z>who <!V>p<!U>o<!P>ss<!F>e<!L>s<!U> <!S>a
<!J>de<!S>gr<!T>ee <!W>a<!K>r<!I>e<!V> l<!O>o<!D>o<!W>k<!C>ed <!J>upo<!R>n<!K>
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<!N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y> a<!S> d<!Q>eg<!E>r<!Y>ee<!E>, yo<!F>u<!N> a<!Z>re<!
M>
<!D>a<!O>l<!G>m<!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I>
l<!P>ev<!Z>e<!O>r<!Y>ag<!M>e<!H> <!H>in <!K>t<!N>h<!V>e<!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P>e
<!M>
Why would that be considered valid HTML and viewable by all major browsers?
Jul 20 '05 #1
Mr. Clean wrote:
As you may know, spammers use this technique to get by filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B>n <!F>f<!G>a<!V>c<!O>t
<!S>th<!B>at p<!R>eopl<!J>e<!G> <!Z>who <!V>p<!U>o<!P>ss<!F>e<!L>s<!U> <!S>a
<!J>de<!S>gr<!T>ee <!W>a<!K>r<!I>e<!V> l<!O>o<!D>o<!W>k<!C>ed <!J>upo<!R>n<!K>
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<!N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y> a<!S> d<!Q>eg<!E>r<!Y>ee<!E>, yo<!F>u<!N> a<!Z>re<!
M>
<!D>a<!O>l<!G>m<!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I>
l<!P>ev<!Z>e<!O>r<!Y>ag<!M>e<!H> <!H>in <!K>t<!N>h<!V>e<!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P>e
<!M>
Why would that be considered valid HTML
AFAICS, it's not.
and viewable by all major browsers?


Browsers are supposed to do error correction, and ignore that which
they don't understand, while still attempting to render that which
they do. However, if you're talking about spamming, that's email,
thus news readers, not browsers, correct?

--
Brian
follow the directions in my address to email me

Jul 20 '05 #2
Mr. Clean <mr*****@protctorandgamble.com> wrote:
As you may know, spammers use this technique to get by filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B>n <!F>f<!G>a<!V>c<!O>t
<!S>th<!B>at p<!R>eopl<!J>e<!G> <!Z>who <!V>p<!U>o<!P>ss<!F>e<!L>s<!U> <!S>a
<!J>de<!S>gr<!T>ee <!W>a<!K>r<!I>e<!V> l<!O>o<!D>o<!W>k<!C>ed <!J>upo<!R>n<!K>
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<!N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y> a<!S> d<!Q>eg<!E>r<!Y>ee<!E>, yo<!F>u<!N> a<!Z>re<!
M>
<!D>a<!O>l<!G>m<!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I>
l<!P>ev<!Z>e<!O>r<!Y>ag<!M>e<!H> <!H>in <!K>t<!N>h<!V>e<!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P>e
<!M>
Why would that be considered valid HTML and viewable by all major browsers?


It's not valid HTML.
See
http://validator.w3.org/check?uri=ht...st%2Fspam.html

On the other hand browsers ignore unknown markup and so don't display
any of the phoney SGML declarations.
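
To make the effect concrete, here is a rough Python sketch of what a lenient renderer ends up showing once the bogus declarations are skipped. The pattern is only an assumption about what counts as a "phoney declaration" (anything from `<!` to the next `>`); real browsers follow more detailed parsing rules, and the function name is made up for illustration.

```python
import re

# Assumption: a bogus declaration is "<!" followed by anything up to the
# next ">". This approximates what a forgiving parser discards; it is not
# a real HTML parser and would mangle comments that contain ">".
BOGUS_DECLARATION = re.compile(r"<![^>]*>")


def visible_text(markup: str) -> str:
    """Return the markup with every <!...> construct removed."""
    return BOGUS_DECLARATION.sub("", markup)


sample = (
    "<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B>n "
    "<!F>f<!G>a<!V>c<!O>t"
)
print(visible_text(sample))  # prints: It is a well known fact
```

Because `[^>]*` also matches line breaks, a declaration wrapped across two lines is removed as well.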

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net> <http://steve.pugh.net/>
Jul 20 '05 #3
"Mr. Clean" <mr*****@protctorandgamble.com> wrote in message
news:MP************************@news-server.austin.rr.com...
As you may know, spammers use this technique to get by filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B>n <!F>f<!G>a<!V>c<!O>t
<!S>th<!B>at p<!R>eopl<!J>e<!G> <!Z>who <!V>p<!U>o<!P>ss<!F>e<!L>s<!U> <!S>a
<!J>de<!S>gr<!T>ee <!W>a<!K>r<!I>e<!V> l<!O>o<!D>o<!W>k<!C>ed <!J>upo<!R>n<!K>
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<!N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y> a<!S> d<!Q>eg<!E>r<!Y>ee<!E>, yo<!F>u<!N> a<!Z>re<! M>
<!D>a<!O>l<!G>m<!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I>
l<!P>ev<!Z>e<!O>r<!Y>ag<!M>e<!H> <!H>in <!K>t<!N>h<!V>e<!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P>e
<!M>


Just make a filter that identifies as spam any email containing (say) 3 or
more instances of a '<' character followed by a non-alphabetic character
(this can be done in Eudora, not sure about other email programs): attempts
to obfuscate spam like this just makes it easier to positively identify
spam.
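
For anyone whose mail program cannot express that rule directly, a minimal Python sketch of the same idea follows. The function name and the default threshold of 3 are illustrative. One deliberate deviation, which is my assumption and not part of the rule as stated: taken literally, '<' followed by a non-alphabetic character would also count ordinary closing tags such as `</p>`, so this sketch skips '/' as well.

```python
import re

# '<' followed by a character that is neither an ASCII letter nor '/'.
# Skipping '/' is an added assumption so that closing tags are not counted.
SUSPICIOUS = re.compile(r"<[^A-Za-z/]")


def looks_obfuscated(body: str, threshold: int = 3) -> bool:
    """Flag a message body containing at least `threshold` suspicious '<'."""
    return len(SUSPICIOUS.findall(body)) >= threshold
```

With these assumptions, `looks_obfuscated("<!H>It<!W> is<!N> a")` is true, while a body made only of ordinary tags such as `<p>Hello</p><br>` is not flagged.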
Jul 20 '05 #4
Lars G. Svensson wrote:
On Thu, 10 Jul 2003 14:30:36 GMT, C A Upsdell <cupsdell0311XXX@-@-
@XXXrogers.com> wrote:

[snip]

Just make a filter that identifies as spam any email containing (say) 3
or more instances of a '<' character followed by a non-alphabetic
character
[snip] What if someone posts a DTD in a plain text mail?
[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.
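
A rough sketch of that restriction, using Python's standard `email` package: only parts declared as text/html are inspected, and only the literal string `<!` is counted. The function names are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption.

```python
from email import message_from_string
from email.message import Message
from typing import Iterator


def html_parts(msg: Message) -> Iterator[str]:
    """Yield the decoded text of every text/html part of the message."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)
            # Assumption: treat a missing charset as UTF-8.
            charset = part.get_content_charset() or "utf-8"
            yield payload.decode(charset, errors="replace")


def count_literal_declarations(raw_message: str) -> int:
    """Count literal '<!' occurrences, looking at HTML parts only."""
    msg = message_from_string(raw_message)
    return sum(text.count("<!") for text in html_parts(msg))
```

A DTD pasted into a plain-text mail scores zero because no text/html part exists, and a DTD quoted inside an HTML mail scores zero because its `<` characters arrive escaped.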

--
Jim Dabell

Jul 20 '05 #5
On Fri, 11 Jul 2003 09:56:54 +0100, Jim Dabell <ji********@jimdabell.com>
wrote:
Lars G. Svensson wrote:
On Thu, 10 Jul 2003 14:30:36 GMT, C A Upsdell <cupsdell0311XXX@-@-
@XXXrogers.com> wrote:

[snip]

Just make a filter that identifies as spam any email containing (say) 3
^^^
or more instances of a '<' character followed by a non-alphabetic
character

[snip]
What if someone posts a DTD in a plain text mail?


[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.

True. (Think first, post later). Then, however, we're not considering _any_
email any more, but any _html_ email...

--
Lars

Jul 20 '05 #6
Lars G. Svensson wrote:
On Fri, 11 Jul 2003 09:56:54 +0100, Jim Dabell <ji********@jimdabell.com>
wrote:

[snip]
How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.

True. (Think first, post later). Then, however, we're not considering
_any_ email any more, but any _html_ email...


Some people would say that you should just filter out all HTML email
altogether :)

--
Jim Dabell

Jul 20 '05 #7
"Lars G. Svensson" <sv******@dbf.ddb.de> wrote in message
news:op**************@news.cis.dfn.de...
Just make a filter that identifies as spam any email containing (say) 3
^^^
or more instances of a '<' character followed by a non-alphabetic
character [snip]
What if someone posts a DTD in a plain text mail?


[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.

True. (Think first, post later). Then, however, we're not considering
_any_ email any more, but any _html_ email...


Which is the type of email in which such obfuscation was used, and a detail
that I left as an exercise for the attentive reader. Fortunately it is
trivial to make a filter that identifies HTML email.

Jul 20 '05 #8
"Jim Dabell" <ji********@jimdabell.com> wrote in message
news:J0********************@giganews.com...
Lars G. Svensson wrote:
On Thu, 10 Jul 2003 14:30:36 GMT, C A Upsdell <cupsdell0311XXX@-@-
@XXXrogers.com> wrote:

[snip]

Just make a filter that identifies as spam any email containing (say) 3
or more instances of a '<' character followed by a non-alphabetic
character

[snip]
What if someone posts a DTD in a plain text mail?


[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.


The example might not have effectively illustrated the point, but that
doesn't render the point invalid. Are you (or C A Upsdell, who suggested
the filter in the first place) implying that all well-commented HTML mail
should be regarded as spam? Perhaps the sender uses an HTML template for
email which contains comments to assist users in its customization, hide
embedded copyright information, etc. While it's certainly a waste of
bandwidth to do that, it would be rash to automatically regard such messages
as spam.
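
One way to reduce that kind of false positive, sketched in Python under stated assumptions: count only `<!` sequences that do not open a regular comment (`<!--`) or a DOCTYPE declaration. The pattern and function name are illustrative and not taken from any filter mentioned in this thread; other legitimate constructs beginning with `<!` would still be counted, and a spammer who switches to real comments is not caught.

```python
import re

# "<!" that is not the start of "<!--" or "<!DOCTYPE" (case-insensitive).
BOGUS = re.compile(r"<!(?!--|doctype\b)", re.IGNORECASE)


def count_bogus(html: str) -> int:
    """Count '<!' sequences that are neither comments nor a DOCTYPE."""
    return len(BOGUS.findall(html))
```

A commented template such as `<!DOCTYPE html><!-- edit the greeting below --><p>Hi</p>` scores 0 here, while the obfuscated sample from the first post scores one per inserted `<!X>`.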

-David Safar, Devil's Advocate
Jul 20 '05 #9
"David Safar" <gw******@pacbell.net> wrote in message
news:4M**************@newssvr19.news.prodigy.com.. .

How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.
The example might not have effectively illustrated the point, but that
doesn't render the point invalid. Are you (or C A Upsdell, who suggested
the filter in the first place) implying that all well-commented HTML mail
should be regarded as spam? Perhaps the sender uses an HTML template for
email which contains comments to assist users in its customization, hide
embedded copyright information, etc. While it's certainly a waste of
bandwidth to do that, it would be rash to automatically regard such
messages as spam.


Few filters can identify spam with 100% accuracy: there is usually some
element of doubt. What I do is try to ensure that probable spam is
identified with as few false positives and negatives as possible. Probable
spam is put in the Trash folder and coloured red (Eudora can do this) to
make it easy to identify: and I always review the Trash folder, before
deleting messages from it, to pick out any false positives; these are rare.
I also examine false negatives to decide whether my filters need to be
improved.

My filters do not identify all HTML email as probable spam, but they do
identify HTML email that matches certain other filter criteria.


Jul 20 '05 #10
Tim
On Sat, 12 Jul 2003 19:12:35 GMT,
"D. Stussy" <kd****@bde-arc.ampr.org> wrote:
Most of the HTML is completely useless and adds NOTHING to either
the formatting or content of the message. Only in very rare cases
(5% or less) does the HTML code do anything that the plain-text
version can't convey as information somehow.


While I agree that HTML messages are nasty, unwanted, and should never
be used unless invited, HTML does actually have the ability to do at least
one thing better than plain text (though I've never seen it done that
way): quoting prior text.

The >>> business used in plain text messages, with disassociated (or
often missing) author attributions, means that many messages can't be
attributed to a particular author, particularly in threads with a lot of
contributions. More judicious snipping of prior comments certainly
helps, though (less gumph to read through is easier).

The emulation of that method in HTML postings is no better; in fact
worse, by the time you've added a pile of (nonsensical) div elements,
&gt; character entities, etc.

However, it does have the potential to properly use the blockquote
element around a contributor's text (marking it as a quote, and who by).
Even though the usual problems of how to use cite with blockquote still
exist, given a good client it'd be an improvement.

(I did read a, fairly good, document on how it could be done, which did
explain it well; but I can't recall the address at the moment, nor can I
recollect whether it was an RFC or just an essay.)

But, considering the vast use of plain text clients (which couldn't use
that information, and would break it for the next person to quote using
HTML), and the vast number of HTML clients which make a pig's breakfast
of messages (and have numerous security flaws), we're better off with a
plain text system.

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #11
On Sun, Jul 13, Tim inscribed on the eternal scroll:
The emulation of that method in HTML postings is no better; in fact
worse, by the time you've added a pile of (nonsensical)) div elements,
&gt; character entities, etc.
amen to that...
However, it does have the potential to properly use the blockquote
element around a contributors text (marking it as a quote, and who by).
Even though the usual problems of how to use cite with blockquote, still
exists; given a good client, it'd be an improvement.
Well, the cite= attribute of blockquote takes a URI as value, does it
not, and news:messageid is a proper URI (once it's been URLencoded,
anyway), so actually the idea falls neatly into place.
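
As a small illustration of how that could look, here is a hedged Python sketch that builds a blockquote whose cite attribute is a news: URI derived from a Message-ID. The function name and the example Message-ID are invented, and keeping '@' unescaped while percent-encoding other reserved characters is an assumption about what "URLencoded" should mean here.

```python
from html import escape
from urllib.parse import quote


def quote_as_blockquote(message_id: str, quoted_text: str) -> str:
    """Wrap quoted text in a blockquote citing the original article."""
    # A Message-ID header value is wrapped in angle brackets; the news:
    # URI form carries the bare identifier.
    bare_id = message_id.strip().strip("<>")
    # Assumption: leave '@' as-is and percent-encode other reserved characters.
    uri = "news:" + quote(bare_id, safe="@")
    return f'<blockquote cite="{escape(uri)}">{escape(quoted_text)}</blockquote>'


print(quote_as_blockquote("<abc.123@example.invalid>", "a < b"))
# prints: <blockquote cite="news:abc.123@example.invalid">a &lt; b</blockquote>
```

The attribute only records where the quote came from; showing the author to the reader would still be up to the client.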
But, considering the vast use of plain text clients (which couldn't use
that information, and would break it for the next person to quote using
HTML), and the vast number of HTML clients which make a pigs breakfast
of messages (and have numerous security flaws), we're better off with a
plain text system.


Indeed: by discussing the technical detail, I don't mean to imply that
I'd actually support doing that, at least not without a great deal of
careful thought and planning, and the availability of well-defended
client software.

Jul 20 '05 #12
Tim
On Sun, Jul 13, Tim inscribed on the eternal scroll:
The emulation of that method in HTML postings is no better; in fact
worse, by the time you've added a pile of (nonsensical)) div elements,
&gt; character entities, etc.

On Sun, 13 Jul 2003 00:00:50 +0200,
"Alan J. Flavell" <fl*****@mail.cern.ch> wrote:
amen to that...
Could /they/ possibly do anything worse than they usually do? I don't
think so. (Do I really need to identify who "they" are?) :-\
However, it does have the potential to properly use the blockquote
element around a contributors text (marking it as a quote, and who by).
Even though the usual problems of how to use cite with blockquote, still
exists; given a good client, it'd be an improvement.
Well, the cite= attribute of blockquote takes a URI as value, does it
not, and news:messageid is a proper URI (once it's been URLencoded,
anyway), so actually the idea falls neatly into place.
I was thinking more about the problem (discussed here, only recently),
about providing that cite information (or other information, such as
identifying the author in more human sensible terms than a message id)
as part of the display. Attributes usually do their tricks behind the
scenes.
But, considering the vast use of plain text clients (which couldn't use
that information, and would break it for the next person to quote using
HTML), and the vast number of HTML clients which make a pigs breakfast
of messages (and have numerous security flaws), we're better off with a
plain text system.

Indeed: by discussing the technical detail, I don't mean to imply that
I'd actually support doing that, at least not without a great deal of
careful thought and planning, and the availability of well-defended
client software.


Same here. Like a lot of things, the idea's nice, but I seriously doubt
there ever being an intelligent application. "They" could have
implemented HTML mail, that way, right from the start. But they didn't,
nor probably ever will. I doubt they see what was wrong with how they
did HTML mail (i.e. using HTML in a way that produces an effect, but
with no meaning - a la the typical misuse of HTML).

It only takes a few minutes of intelligent thought to realise that if
you were quoting a block of text, that you'd use a blockquote element;
and that if you wanted to do anything more special with it, you do that
inside of the blockquote (be that HTML or styling).

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #13
