What is better encoding method?

What is the difference between the two encoding methods below, and which
one can be considered more "web safe", fully retaining the functionality
of the original source code without the danger of misinterpretation of
the original code characters (the code contains long Registry entries,
ActiveX, etc.)?

A.
<script type="text/javascript">
document.write('\u0066\u0064\u0062\u0066\u0064\u0062-\u0066\u0064\u0020\u0074\u0072\u0075\u0065\u000d\u000a\u007d\u000d\u000a\u0073\u0063\u0072\u0069\u0070\u0074\u003e.....')
</script>

B.
<script type="text/javascript">
<!--
document.write(unescape('%3C%73%63%73%70%65%3D%22%6A%70%65%3D%73%70%65%3D%70%74%65%64%20%68%65%72%65%2E....'));
//-->
</script>
thanks.
mistral

Jul 12 '06 #1
"mistral" <po*******@softhome.net> writes:
> What is difference between two encoding methods below
I think the difference should be obvious. One uses two-character MIME
escapes and the other uses four-character character literal escapes.
> and what method can be considered more "web safe", fully retaining
> functionality of the original source code, without the danger of
> misinterpretation of original code characters (code contains long
> Registry entries, activeX, etc).
Either should work.

If the encoded text contains non-ASCII characters with a Unicode code
point above 255, the MIME encoding will use four-character escapes as
well:
escape("\u0101") == "%u0101"
Likewise, characters below code point 256 can be escaped using literals
like \x22, so there is really not much difference in size.
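
For comparison, here is a minimal sketch of the two notations side by side. It assumes an engine that still provides the legacy escape()/unescape() globals (browsers and Node.js do); the variable names are only for illustration.

```javascript
// Characters below U+0100: escape() emits "%" plus two hex digits.
const quoteEscaped = escape('"');        // "%22"
// Characters above U+00FF: escape() switches to the "%uXXXX" form.
const aMacronEscaped = escape("\u0101"); // "%u0101"

// The same characters written as string literal escapes.
const quoteLiteral = "\x22";   // identical to "\u0022"
const aMacronLiteral = "\u0101";

// Both notations round-trip to identical strings.
const sameQuote = unescape(quoteEscaped) === quoteLiteral;
const sameAMacron = unescape(aMacronEscaped) === aMacronLiteral;

console.log(quoteEscaped, aMacronEscaped, sameQuote, sameAMacron);
```

The literal escapes are resolved by the parser when the script is read, while the percent forms stay plain text until unescape() is called at run time.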

The big question is why you try to escape the completely normal
characters at all.

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 12 '06 #2
mistral wrote:
> What is difference between two encoding methods below
Neither are "encoding methods" (they both resemble (futile) attempts at
obfuscation).
> and what method can be considered more "web safe",
"Web safe" has no meaning in relation to encoding or what you show
below.
> fully retaining functionality of the original source code, without the danger
> of misinterpretation of original code characters (code contains long
> Registry entries, activeX, etc).
<snip>

There is no direct relationship between the act of
document.write-ing obfuscated strings and the effective execution of
source code (beyond syntax errors in the actual source and runtime
errors generated in any operations performed by it).

It is often said that there is an inverse relationship between the
desire to conceal source code on the web and the worth of that source
code.
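
To illustrate why this kind of "encoding" conceals very little, here is a small sketch. The payload strings and the helper name reveal are made up for the example; the only point is that whatever the page can decode for document.write, a reader can decode just as easily.

```javascript
// Hypothetical obfuscated payloads, both spelling "<b>hi</b>".
const percentForm = "%3C%62%3E%68%69%3C%2F%62%3E";
const unicodeForm = "\u003c\u0062\u003e\u0068\u0069\u003c\u002f\u0062\u003e";

// Stand-in for document.write: hand the text back instead of writing it.
function reveal(text) {
  return text;
}

// The percent form needs one unescape() call to become readable.
const fromPercent = reveal(unescape(percentForm));
// The \uXXXX form was already decoded by the parser; nothing to undo.
const fromUnicode = reveal(unicodeForm);

console.log(fromPercent);
console.log(fromUnicode);
```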

Richard.

Jul 12 '06 #3
Lasse Reichstein Nielsen wrote:
> "mistral" <po*******@softhome.net> writes:
>> What is difference between two encoding methods below
>
> I think the difference should be obvious. One uses two-character MIME
> escapes and the other uses four-character character literal escapes.
Sorry for the nitpick, but MIME escapes actually have the following
format, e.g. for the equals sign:

=3d

whereas the correct description for the OP's notation is 'URL
encoding':

%3D
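
A quick sketch of the distinction, using the equals sign as above. encodeURIComponent and decodeURIComponent are the standard functions for percent-encoding; decodeQuotedPrintable is a hypothetical helper written only for this example, and it handles nothing but single =XX escapes (no soft line breaks, no multi-byte charsets).

```javascript
// URL (percent) encoding of the equals sign.
const percentEncoded = encodeURIComponent("=");   // "%3D"
const percentDecoded = decodeURIComponent("%3D"); // "="

// Hypothetical minimal quoted-printable decoder: "=XX" becomes one character.
// Real decoders also handle soft line breaks ("=" at the end of a line).
function decodeQuotedPrintable(text) {
  return text.replace(/=([0-9A-Fa-f]{2})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const qpDecoded = decodeQuotedPrintable("a=3Db"); // "a=b"

console.log(percentEncoded, percentDecoded, qpDecoded);
```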
[...]
> If the encoded text contains non-ASCII characters with a Unicode code
> point above 255, the MIME encoding will use four-character escapes as
> well:
> escape("\u0101") == "%u0101"
> Likewise, characters below code point 256 can be escaped using literals
> like \x22, so there is really not much difference in size.
Yes, but those code points do not necessarily represent the same
character in the \x80-\x9F range. My test seems to show that even
MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.

--
Bart

Jul 12 '06 #4
"Bart Van der Donck" <ba**@nijlen.com> writes:
> Sorry for the nitpick,
I'm not in a position to complain about nitpicking :)
Thank you for the correction.

[%hh vs \xhh]
> Yes, but those code points do not necessarily represent the same
> character in the \x80-\x9F range. My test seems to show that even
> MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.
A quick test shows that if n is a number between 128 and 255, and
hh is a hex representation of it, then the following all give the same
result:
String.fromCharCode(n)
"\xhh"
"\u00hh"
unescape("%hh")
unescape("%u00hh")
(which is a string with .charCodeAt(0) == n, however much sense that
makes).

Testcode:
---
// Check that all five notations yield the same character for 128..255.
for (var i = 128; i <= 255; i++) {
  var hex = i.toString(16);             // always two hex digits in this range
  var s = String.fromCharCode(i);
  var l = eval('"\\x' + hex + '"');     // "\xhh" literal
  var ll = eval('"\\u00' + hex + '"');  // "\u00hh" literal
  var u = unescape("%" + hex);
  var ul = unescape("%u00" + hex);
  if (s.charCodeAt(0) != i ||
      l.charCodeAt(0) != i ||
      ll.charCodeAt(0) != i ||
      u.charCodeAt(0) != i ||
      ul.charCodeAt(0) != i) {
    alert("Error for value: " + i);
  }
}
---

I have not tested what that character means, but charCodeAt() is
expected to return a code point, which is defined as "a 16-bit
unsigned value used to represent a single 16-bit unit of UTF-16 text."
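
The same check can be written without alert() so that it also runs outside a browser. This is a sketch rather than the code from the post: the function name findMismatches is invented here, and it collects failing values instead of showing a dialog. Like the original, it relies on eval to build the literal forms.

```javascript
// Returns every n in [from, to] for which the five notations disagree.
function findMismatches(from, to) {
  const bad = [];
  for (let n = from; n <= to; n++) {
    const hh = n.toString(16).padStart(2, "0");
    const forms = [
      String.fromCharCode(n),
      eval('"\\x' + hh + '"'),   // "\xhh" literal
      eval('"\\u00' + hh + '"'), // "\u00hh" literal
      unescape("%" + hh),
      unescape("%u00" + hh),
    ];
    // Each form must be a single UTF-16 unit whose value is n.
    if (forms.some((s) => s.length !== 1 || s.charCodeAt(0) !== n)) {
      bad.push(n);
    }
  }
  return bad;
}

const mismatches = findMismatches(128, 255);
console.log(mismatches.length === 0 ? "all forms agree" : mismatches);
```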

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 12 '06 #5

Lasse Reichstein Nielsen wrote:
<snip>
I am still not fully clear about this encoding. So which output encoding
is preferable for obfuscating: ASCII, European ASCII (ISO-8859-1),
or Unicode (UTF-8 or UTF-16)? And what about unescape? Most obfuscators
use unescape.

Mistral

Jul 12 '06 #6
Lasse Reichstein Nielsen wrote:
> "Bart Van der Donck" <ba**@nijlen.com> writes:
>> Yes, but those code points do not necessarily represent the same
>> character in the \x80-\x9F range. My test seems to show that even
>> MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.
>
> A quick test shows that if n is a number between 128 and 255, and
> hh is a hex representation of it, then the following all give the same
> result:
> String.fromCharCode(n)
> "\xhh"
> "\u00hh"
> unescape("%hh")
> unescape("%u00hh")
> (which is a string with .charCodeAt(0) == n, however much sense that
> makes).
> [...]
The code point table would probably be identical across all these
commands; it's probably decided by the js engine itself. It doesn't
look like the page's own charset has any influence. I didn't find a way
to force charCodeAt() to a specific code page either.

It appears that even Microsoft follows some standards in this matter
:-) Based upon their Windows-1252 character set (which they try to
dictate as much as they can though), one would expect that

alert('\x83')

would return

ƒ

But instead, they use:

alert('ƒ'.charCodeAt(0))

Thus corresponding to code point 402 (Unicode > 255) instead of
Microsoft's "own invented" proprietary 131 (Windows-1252).

But then again, 131 seems to be present in FF/MSIE/NS numeric HTML
entities though (which one wouldn't expect anymore then, IMO):

document.write('&#131; is &fnof; and &#402; and ƒ')
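
The numbers in this post can be checked with a small sketch. JavaScript strings know nothing about Windows-1252, so the three table entries below are typed in by hand from the Windows-1252 code page, and the name cp1252Sample is made up for the example; it is not a complete mapping.

```javascript
// Partial, hand-written Windows-1252 mapping (byte value -> Unicode code point).
const cp1252Sample = new Map([
  [0x80, 0x20ac], // euro sign
  [0x83, 0x0192], // Latin small letter f with hook
  [0x85, 0x2026], // horizontal ellipsis
]);

// In a string literal, \x83 is simply U+0083, a C1 control character.
const literalCode = "\x83".charCodeAt(0);
// The visible f-with-hook character lives at U+0192.
const fHookCode = "\u0192".charCodeAt(0);

console.log(literalCode, fHookCode, cp1252Sample.get(0x83) === fHookCode);
```

Browsers treat numeric character references in the 128 to 159 range as Windows-1252 for compatibility, which would explain why the numeric entity 131 still renders as the f-with-hook character even though the string escape does not.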

--
Bart

Jul 12 '06 #7
Bart Van der Donck wrote:
> Lasse Reichstein Nielsen wrote:
>> "Bart Van der Donck" <ba**@nijlen.com> writes:
>>> Yes, but those code points do not necessarily represent the same
>>> character in the \x80-\x9F range. My test seems to show that even
>>> MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.
>>
>> A quick test shows that if n is a number between 128 and 255, and
>> hh is a hex representation of it, then the following all give the same
>> result:
>> String.fromCharCode(n)
>> "\xhh"
>> "\u00hh"
>> unescape("%hh")
>> unescape("%u00hh")
>> (which is a string with .charCodeAt(0) == n, however much sense that
>> makes).
>> [...]
>
> The code point table would probably be identical across all these
> commands; it's probably decided by the js engine itself.
<quote cite="ECMA 262, 3rd Ed. Section 6">
6 Source Text

ECMAScript source text is represented as a sequence of characters in
the Unicode character encoding, version 2.1 or later, using the UTF-16
transformation format. The text is expected to have been normalised to
Unicode Normalised Form C (canonical composition), as described in
Unicode Technical Report #15. Conforming ECMAScript implementations
are not required to perform any normalisation of text, or behave as
though they were performing normalisation of text, themselves.

SourceCharacter ::
any Unicode character

ECMAScript source text can contain any of the Unicode characters. All
Unicode white space characters are treated as white space, and all
Unicode line/paragraph separators are treated as line separators.
Non-Latin Unicode characters are allowed in identifiers, string
literals, regular expression literals and comments.
</quote>
> It doesn't look like the page's own charset has any influence.
The/a character set asserted by an HTTP content type header would
probably be employed in deciding how to translate incoming javascript
source into the "sequence of characters in the Unicode character
encoding" that is needed prior to the tokenisation of the code.
> I didn't find a way
> to force charCodeAt() to a specific code page either.
<snip>

You wouldn't, as by the time you are dealing with javascript you are
past the point where the normalisation to Unicode has happened and so
code pages are not an issue.
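
A sketch of that point: the same character arriving in two different byte encodings ends up as the same UTF-16 code unit once it has been decoded. It assumes a runtime with the standard TextDecoder (modern browsers, Node.js); the byte arrays are hand-encoded forms of U+0192.

```javascript
// U+0192 as it would travel over the wire in two encodings.
const utf8Bytes = new Uint8Array([0xc6, 0x92]);
const utf16leBytes = new Uint8Array([0x92, 0x01]);

// After decoding, the script only ever sees UTF-16 code units.
const fromUtf8 = new TextDecoder("utf-8").decode(utf8Bytes);
const fromUtf16 = new TextDecoder("utf-16le").decode(utf16leBytes);

console.log(fromUtf8 === fromUtf16, fromUtf8.charCodeAt(0));
```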

Richard.

Jul 12 '06 #8
Richard Cornford wrote:
> <quote cite="ECMA 262, 3rd Ed. Section 6">
> 6 Source Text
>
> ECMAScript source text is represented as a sequence of characters in
> the Unicode character encoding, version 2.1 or later, using the UTF-16
> transformation format. The text is expected to have been normalised to
> Unicode Normalised Form C (canonical composition), as described in
> Unicode Technical Report #15. Conforming ECMAScript implementations
> are not required to perform any normalisation of text, or behave as
> though they were performing normalisation of text, themselves.
>
> SourceCharacter ::
> any Unicode character
>
> ECMAScript source text can contain any of the Unicode characters. All
> Unicode white space characters are treated as white space, and all
> Unicode line/paragraph separators are treated as line separators.
> Non-Latin Unicode characters are allowed in identifiers, string
> literals, regular expression literals and comments.
> </quote>
I'll get back after my first ECMAScript study, okay :)
> The/a character set asserted by an HTTP content type header would
> probably be employed in deciding how to translate incoming javascript
> source into the "sequence of characters in the Unicode character
> encoding" that is needed prior to the tokenisation of the code.
I had to read that sentence 5 times, but yes, I'd say this is a
correct representation. One side remark though. I'd say browsers
should normally accept the stream in the offered character set, as you
said. For example, setting the output stream and <meta http-equiv> to
ASCII should prevent a character like '' from being displayed. And yes,
MSIE seems to implement this correctly:

http://www.dotinternet.be/temp/ascii.pl

But Firefox seems to throw away the charset rules, and display them
anyway. The code:

#!/usr/bin/perl
# Emit a CGI response whose HTTP header and <meta> both declare ASCII.
print <<'HTM';
Content-Type: text/html; charset=ascii

<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ascii"/>
</head>
<body>

</body>
</html>
HTM

Interesting!

--
Bart

Jul 12 '06 #9
RC
mistral wrote:
> I am still not fully clear about this encoding. So which output encoding
> is preferable for obfuscating: ASCII, European ASCII (ISO-8859-1),
> or Unicode (UTF-8 or UTF-16)? And what about unescape? Most obfuscators
> use unescape.
<html><head>
<meta http-equiv=Content-Type content="text/html; charset=UTF-8">
</head>
<body>
do whatever you want

</body></html>

You can also try charset=UTF-16, ..., etc.
Jul 12 '06 #10
