Why is this valid HTML?

As you may know, spammers use this technique to get past filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B >n <!F>f<!G>a<!V>c <!O>t
<!S>th<!B>at p<!R>eopl<!J>e< !G> <!Z>who <!V>p<!U>o<!P>s s<!F>e<!L>s<!U > <!S>a
<!J>de<!S>gr<!T >ee <!W>a<!K>r<!I>e <!V> l<!O>o<!D>o<!W> k<!C>ed <!J>upo<!R>n<!K >
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<! N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y > a<!S> d<!Q>eg<!E>r<!Y >ee<!E>, yo<!F>u<!N> a<!Z>re<!
M>
<!D>a<!O>l<!G>m <!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I >
l<!P>ev<!Z>e<!O >r<!Y>ag<!M>e<! H> <!H>in <!K>t<!N>h<!V>e <!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P >e
<!M>
Why would that be considered valid HTML and viewable by all major browsers?
Jul 20 '05 #1
Mr. Clean wrote:
As you may know, spammers use this technique to get past filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B >n <!F>f<!G>a<!V>c <!O>t
<!S>th<!B>at p<!R>eopl<!J>e< !G> <!Z>who <!V>p<!U>o<!P>s s<!F>e<!L>s<!U > <!S>a
<!J>de<!S>gr<!T >ee <!W>a<!K>r<!I>e <!V> l<!O>o<!D>o<!W> k<!C>ed <!J>upo<!R>n<!K >
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<! N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y > a<!S> d<!Q>eg<!E>r<!Y >ee<!E>, yo<!F>u<!N> a<!Z>re<!
M>
<!D>a<!O>l<!G>m <!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I >
l<!P>ev<!Z>e<!O >r<!Y>ag<!M>e<! H> <!H>in <!K>t<!N>h<!V>e <!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P >e
<!M>
> Why would that be considered valid HTML

AFAICS, it's not.

> and viewable by all major browsers?


Browsers are supposed to do error correction, and ignore that which
they don't understand, while still attempting to render that which
they do. However, if you're talking about spamming, that's email,
thus news readers, not browsers, correct?

--
Brian
follow the directions in my address to email me

Jul 20 '05 #2
Mr. Clean <mr*****@protct orandgamble.com > wrote:
As you may know, spammers use this technique to get past filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B >n <!F>f<!G>a<!V>c <!O>t
<!S>th<!B>at p<!R>eopl<!J>e< !G> <!Z>who <!V>p<!U>o<!P>s s<!F>e<!L>s<!U > <!S>a
<!J>de<!S>gr<! T>ee <!W>a<!K>r<!I>e <!V> l<!O>o<!D>o<!W> k<!C>ed <!J>upo<!R>n<!K >
a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<! N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y > a<!S> d<!Q>eg<!E>r<!Y >ee<!E>, yo<!F>u<!N> a<!Z>re<!
M>
<!D>a<!O>l<!G> m<!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I >
l<!P>ev<!Z>e<! O>r<!Y>ag<!M>e< !H> <!H>in <!K>t<!N>h<!V>e <!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<! P>e
<!M>
Why would that be considered valid HTML and viewable by all major browsers?


It's not valid HTML.
See
http://validator.w3.org/check?uri=ht...st%2Fspam.html

On the other hand browsers ignore unknown markup and so don't display
any of the phoney SGML declarations.
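
To see why the message still reads normally once the unknown markup is ignored, here is a rough Python sketch. It is only an approximation and not how a real browser parser works; the function name `visible_text` and the shortened sample string are made up for illustration.

```python
import re

# Anything shaped like "<!...>" is treated here as markup to be ignored.
# This is a crude stand-in for browser error handling, not a real parser.
UNKNOWN_DECL = re.compile(r"<![^>]*>")


def visible_text(markup: str) -> str:
    """Return the markup with every "<!...>" run removed."""
    return UNKNOWN_DECL.sub("", markup)


# Shortened, cleaned-up fragment in the style of the quoted spam.
sample = "<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B>n"

print(visible_text(sample))  # prints "It is a well known"
```

A keyword filter looking at the raw markup never sees the word "known", while the reader does.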

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net > <http://steve.pugh.net/>
Jul 20 '05 #3
"Mr. Clean" <mr*****@protct orandgamble.com > wrote in message
news:MP******** *************** *@news-server.austin.r r.com...
As you may know, spammers use this technique to get past filters.

<!H>It<!W> is<!N> <!K>a<!L> w<!Q>el<!Q>l <!X>k<!O>now<!B >n <!F>f<!G>a<!V>c <!O>t <!S>th<!B>at p<!R>eopl<!J>e< !G> <!Z>who <!V>p<!U>o<!P>s s<!F>e<!L>s<!U > <!S>a <!J>de<!S>gr<!T >ee <!W>a<!K>r<!I>e <!V> l<!O>o<!D>o<!W> k<!C>ed <!J>upo<!R>n<!K > a<!U>s<!G> <!X>th<!O>e <!E>elit<!U>e<! N><BR>
<!T>
If yo<!Q>u <!B>ha<!C>ve<!Y > a<!S> d<!Q>eg<!E>r<!Y >ee<!E>, yo<!F>u<!N> a<!Z>re<! M>
<!D>a<!O>l<!G>m <!S>o<!R>st as<!Z>sur<!C>ed to <!D>g<!R>ain<!I >
l<!P>ev<!Z>e<!O >r<!Y>ag<!M>e<! H> <!H>in <!K>t<!N>h<!V>e <!Z> w<!T>o<!Y>rk
<!X>p<!R>lac<!P >e
<!M>


Just make a filter that identifies as spam any email containing (say) 3 or
more instances of a '<' character followed by a non-alphabetic character
(this can be done in Eudora; not sure about other email programs). Attempts
to obfuscate spam like this just make it easier to positively identify
spam.
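
Outside Eudora, the same rule could be sketched in Python roughly as follows. This is not Eudora's filter syntax; the names `SUSPICIOUS` and `looks_obfuscated` are hypothetical. One assumption is added: taken literally, the rule would also count ordinary closing tags such as `</p>`, because `/` is non-alphabetic, so this sketch leaves `/` out.

```python
import re

# '<' followed by a character that is neither an ASCII letter nor '/'.
# Excluding '/' is an assumption of this sketch so that ordinary closing
# tags such as </p> are not counted.
SUSPICIOUS = re.compile(r"<[^A-Za-z/]")


def looks_obfuscated(body: str, threshold: int = 3) -> bool:
    """Return True if the body has at least `threshold` suspicious '<' runs."""
    return len(SUSPICIOUS.findall(body)) >= threshold


print(looks_obfuscated("<!H>It<!W> is<!N> a"))   # three "<!" runs
print(looks_obfuscated("<p>Hello</p><br>"))      # ordinary tags only
```

The threshold of 3 is the "(say) 3" from the rule above and would need tuning against real mail.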
Jul 20 '05 #4
Lars G. Svensson wrote:
On Thu, 10 Jul 2003 14:30:36 GMT, C A Upsdell <cupsdell0311XX X@-@-
@XXXrogers.com> wrote:

[snip]

Just make a filter that identifies as spam any email containing (say) 3
or more instances of a '<' character followed by a non-alphabetic
character
[snip] What if someone posts a DTD in a plain text mail?
[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt;! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.
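
A small Python sketch of that point, assuming the sending program escapes visible text the way `html.escape` from the standard library does. The DTD fragment, the spam fragment, and the helper name `count_literal_decl_open` are made up for illustration.

```python
import html

# A DTD line that someone wants to show as visible text in an HTML mail.
dtd_fragment = "<!ELEMENT br EMPTY>"

# Escaped for display, the raw markup no longer contains the string "<!".
as_html_text = html.escape(dtd_fragment)

# Fragment in the style of the obfuscated spam: literal "<!" in the markup.
spam_fragment = "<!H>It<!W> is<!N> a"


def count_literal_decl_open(markup: str) -> int:
    """Count occurrences of the literal two-character string "<!"."""
    return markup.count("<!")


print(as_html_text)                            # &lt;!ELEMENT br EMPTY&gt;
print(count_literal_decl_open(as_html_text))   # 0
print(count_literal_decl_open(spam_fragment))  # 3
```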

--
Jim Dabell

Jul 20 '05 #5
On Fri, 11 Jul 2003 09:56:54 +0100, Jim Dabell <ji********@jim dabell.com>
wrote:
Lars G. Svensson wrote:
On Thu, 10 Jul 2003 14:30:36 GMT, C A Upsdell <cupsdell0311XX X@-@-
@XXXrogers.com> wrote:

[snip]

Just make a filter that identifies as spam any email containing (say) 3
^^^
or more instances of a '<' character followed by a non-alphabetic
character

[snip]
What if someone posts a DTD in a plain text mail?


[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt;! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.

True. (Think first, post later). Then, however, we're not considering _any_
email any more, but any _html_ email...

--
Lars

Jul 20 '05 #6
Lars G. Svensson wrote:
On Fri, 11 Jul 2003 09:56:54 +0100, Jim Dabell <ji********@jim dabell.com>
wrote:

[snip]
How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt;! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.

True. (Think first, post later). Then, however, we're not considering
_any_ email any more, but any _html_ email...


Some people would say that you should just filter out all HTML email
altogether :)

--
Jim Dabell

Jul 20 '05 #7
"Lars G. Svensson" <sv******@dbf.d db.de> wrote in message
news:op******** ******@news.cis .dfn.de...
Just make a filter that identifies as spam any email containing (say) 3
^^^
or more instances of a '<' character followed by a non-alphabetic
character [snip]
What if someone posts a DTD in a plain text mail?


[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt;! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.

True. (Think first, post later). Then, however, we're not considering
_any_ email any more, but any _html_ email...


Which is the type of email in which such obfuscation was used, and a detail
that I left as an exercise for the attentive reader. Fortunately it is
trivial to make a filter that identifies HTML email.
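
For readers without a mail client that offers such a filter, here is a minimal sketch using Python's standard `email` package. It assumes the raw message is available as a string; the function name `html_parts` is hypothetical.

```python
from email import message_from_string
from email.message import Message
from typing import List


def html_parts(msg: Message) -> List[str]:
    """Return the decoded body of every text/html part in the message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # bytes for leaf parts
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return parts


raw_html = "Content-Type: text/html; charset=utf-8\n\n<html><!H>Hi</html>\n"
raw_plain = "Content-Type: text/plain\n\n<!ELEMENT br EMPTY>\n"

print(len(html_parts(message_from_string(raw_html))))   # 1
print(len(html_parts(message_from_string(raw_plain))))  # 0
```

Only the strings returned by `html_parts` would then be checked for the obfuscation pattern, so a DTD posted in plain text is never examined.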

Jul 20 '05 #8
"Jim Dabell" <ji********@jim dabell.com> wrote in message
news:J0******** ************@gi ganews.com...
Lars G. Svensson wrote:
On Thu, 10 Jul 2003 14:30:36 GMT, C A Upsdell <cupsdell0311XX X@-@-
@XXXrogers.com> wrote:

[snip]

Just make a filter that identifies as spam any email containing (say) 3
or more instances of a '<' character followed by a non-alphabetic
character

[snip]
What if someone posts a DTD in a plain text mail?


[snip example]
The above message certainly has three (or more!) instances of a '<'
followed by a non-alphabetic char but I wouldn't say it's spam. (I know,
you can't have it all...)


How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt;! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.


The example might not have effectively illustrated the point, but that
doesn't render the point invalid. Are you (or C A Upsdell, who suggested
the filter in the first place) implying that all well-commented HTML mail
should be regarded as spam? Perhaps the sender uses an HTML template for
email which contains comments to assist users in its customization, hide
embedded copyright information, etc. While it's certainly a waste of
bandwidth to do that, it would be rash to automatically regard such messages
as spam.
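
One possible way to accommodate such templates, sketched in Python: count only `<!` runs that start neither a well-formed comment (`<!--`) nor a doctype. The names `BOGUS_DECL` and `count_bogus_declarations` are hypothetical, and the obvious caveat is that a spammer could switch to real comments, which this sketch would then miss.

```python
import re

# "<!" that is not the start of "<!--" or "<!DOCTYPE" (case-insensitive).
BOGUS_DECL = re.compile(r"<!(?!--|doctype\b)", re.IGNORECASE)


def count_bogus_declarations(markup: str) -> int:
    """Count "<!" runs that are neither comments nor a doctype."""
    return len(BOGUS_DECL.findall(markup))


# Made-up template with ordinary comments, as in the scenario above.
template = (
    "<!DOCTYPE html>\n"
    "<!-- edit the greeting below -->\n"
    "<p>Hello</p>\n"
    "<!-- copyright notice -->\n"
)
spam = "<!H>It<!W> is<!N> a"

print(count_bogus_declarations(template))  # 0
print(count_bogus_declarations(spam))      # 3
```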

-David Safar, Devil's Advocate
Jul 20 '05 #9
"David Safar" <gw******@pacbe ll.net> wrote in message
news:4M******** ******@newssvr1 9.news.prodigy. com...

How is the email encoded? If it's encoded as HTML, then the DTD shouldn't
trigger the filters as it will be encoded as &lt;! rather than <!. If it's
encoded as plain text, then it's not using this spamming technique. All
you want to match is the literal string <! in HTML encoded mails.
The example might not have effectively illustrated the point, but that
doesn't render the point invalid. Are you (or C A Upsdell, who suggested
the filter in the first place) implying that all well-commented HTML mail
should be regarded as spam? Perhaps the sender uses an HTML template for
email which contains comments to assist users in its customization, hide
embedded copyright information, etc. While it's certainly a waste of
bandwidth to do that, it would be rash to automatically regard such
messages as spam.


Few filters can identify spam with 100% accuracy: there is usually some
element of doubt. What I do is try to ensure that probable spam is
identified with as few false positives and negatives as possible. Probable
spam is put in the Trash folder and coloured red (Eudora can do this) to
make it easy to identify: and I always review the Trash folder, before
deleting messages from it, to pick out any false positives; these are rare.
I also examine false negatives to decide whether my filters need to be
improved.

My filters do not identify all HTML email as probable spam, but they do
identify HTML email that matches certain other filter criteria.


Jul 20 '05 #10
