What is better encoding method?

mistral

What is the difference between the two encoding methods below, and what
method can be considered more "web safe", fully retaining the functionality
of the original source code, without the danger of misinterpretation of
the original code characters (the code contains long Registry entries,
ActiveX, etc.)?

A.
<script type="text/javascript">document.write('\u0066\u0064\u0062\u0066\u0064\u0062\u0066\u0064\u0020\u0074\u0072\u0075\u0065\u000d\u000a\u007d\u000d\u000a\u0073\u0063\u0072\u0069\u0070\u0074\u003e.....')</script>

B.
<script language="text/javascript">

</script>
thanks.
mistral

Jul 12 '06 #1


Lasse Reichstein Nielsen

"mistral" <po*******@softhome.net> writes:

What is the difference between the two encoding methods below

I think the difference should be obvious. One uses two-character MIME
escapes and the other uses four-character character literal escapes.

and what method can be considered more "web safe", fully retaining
functionality of the original source code, without the danger of
misinterpretation of original code characters (code contains long
Registry entries, activeX, etc).

Either should work.

If the encoded text contains non-ASCII characters with a Unicode code
point above 255, the mime encoding will use four-character escapes as
well:
escape("\u0101") == "%u0101"
Likewise, characters below codepoint 256 can be escaped using literals
like \x22, so there is really not much difference in size.
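
To make the comparison above concrete, here is a minimal sketch that puts the two notations side by side. The variable names are illustrative only, and escape()/unescape() are legacy functions that are used here just because the thread discusses them.

```javascript
// Legacy escape() emits %hh up to code point 255 and %uhhhh above it.
const lowEscaped = escape('"');        // code point 0x22
const highEscaped = escape("\u0101");  // code point 0x101

// String-literal escapes: \xhh covers 0-255, \uhhhh covers 0-65535.
const viaHexLiteral = "\x22";
const viaUnicodeLiteral = "\u0022";

console.log(lowEscaped);                              // %22
console.log(highEscaped);                             // %u0101
console.log(viaHexLiteral === viaUnicodeLiteral);     // true
console.log(unescape(lowEscaped) === viaHexLiteral);  // true
```

Either notation decodes to the same string; they differ only in how the escape is spelled in the source.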

The big question is why you try to escape the completely normal
characters at all.

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

Jul 12 '06 #2

Richard Cornford

mistral wrote:

What is the difference between the two encoding methods below

Neither are "encoding methods" (they both resemble (futile) attempts at
obfuscation).

and what method can be considered more "web safe",

"Web safe" has no meaning in relation to encoding or what you show
below.

fully retaining functionality of the original source code, without the danger
of misinterpretation of original code characters (code contains long
Registry entries, activeX, etc).

<snip>

There is no direct relationship between the act of
document.write-ing obfuscated strings and the effective execution of
source code (beyond syntax errors in the actual source and runtime
errors generated in any operations performed by it).

It is often said that there is an inverse relationship between the
desire to conceal source code on the web and the worth of that source
code.

Richard.

Jul 12 '06 #3

Bart Van der Donck

Lasse Reichstein Nielsen wrote:

"mistral" <po*******@softhome.net> writes:
What is the difference between the two encoding methods below

I think the difference should be obvious. One uses two-character MIME
escapes and the other uses four-character character literal escapes.

Sorry for the nitpick, but MIME escapes are actually in the following
format; e.g. for the equality sign:

=3d

whereas the correct description for the OP's notation is 'URL
encoding':

%3D
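
As a small sketch of the distinction drawn above: the function toQuotedPrintable below is a hypothetical helper written only for this illustration (it handles a single ASCII character, not the full MIME quoted-printable rules), while encodeURIComponent is the standard built-in for URL encoding.

```javascript
// Hypothetical helper: quoted-printable style escape for one ASCII character.
function toQuotedPrintable(ch) {
  const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
  return "=" + hex;
}

const mimeForm = toQuotedPrintable("=");  // MIME quoted-printable notation
const urlForm = encodeURIComponent("=");  // URL (percent) encoding

console.log(mimeForm);  // =3D
console.log(urlForm);   // %3D
```

The hex digits are the same in both; only the introducing character differs.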

[...]
If the encoded text contains non-ASCII characters with a Unicode code
point above 255, the mime encoding will use four-character escapes as
well:
escape("\u0101") == "%u0101"
Likewise, characters below codepoint 256 can be escaped using literals
like \x22, so there is really not much difference in size.

Yes, but those code points do not necessarily represent the same
character in the \x80-\x9F range. My test suggests that even
MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.

--
Bart

Jul 12 '06 #4

Lasse Reichstein Nielsen

"Bart Van der Donck" <ba**@nijlen.com> writes:

Sorry for the nitpick,

I'm not in a position to complain about nitpicking :)
Thank you for the correction.

[%hh vs \xhh]

Yes, but those code points do not necessarily represent the same
character in the \x80-\x9F range. My test suggests that even
MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.

A quick test shows that if n is a number between 128 and 255, and
hh is a hex representation of it, then the following all give the same
result:
String.fromCharCode(n)
"\xhh"
"\u00hh"
unescape("%hh")
unescape("%u00hh")
(which is a string with .charCodeAt(0) == n, however much sense that
makes).

Test code:
---
// Check that every way of writing code point i yields the same character.
// The bounds cover 128..255 inclusive, matching the range described above.
for (var i = 128; i <= 255; i++) {
  var hex = i.toString(16);             // always two digits in this range
  var s = String.fromCharCode(i);
  var l = eval('"\\x' + hex + '"');     // "\xhh" literal
  var ll = eval('"\\u00' + hex + '"');  // "\u00hh" literal
  var u = unescape("%" + hex);
  var ul = unescape("%u00" + hex);
  if (s.charCodeAt(0) != i ||
      l.charCodeAt(0) != i ||
      ll.charCodeAt(0) != i ||
      u.charCodeAt(0) != i ||
      ul.charCodeAt(0) != i) {
    alert("Error for value: " + i);
  }
}
---

I have not tested what that character means, but charCodeAt() is
expected to return a code point, which is defined as "a 16-bit
unsigned value used to represent a single 16-bit unit of UTF-16 text."
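
The "16-bit unit" wording matters once a character falls outside the Basic Multilingual Plane. A minimal sketch, assuming an engine that supports codePointAt() (added to the language in ES2015, well after this thread); the variable names are illustrative:

```javascript
const bmpChar = "\u0192";           // one 16-bit unit
const astralChar = "\uD83D\uDE00";  // U+1F600, stored as a surrogate pair

console.log(bmpChar.charCodeAt(0));                   // 402
console.log(astralChar.length);                       // 2
console.log(astralChar.charCodeAt(0).toString(16));   // d83d
console.log(astralChar.codePointAt(0).toString(16));  // 1f600
```

For everything in the 128-255 range discussed here, the unit and the code point are the same number.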

/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'

Jul 12 '06 #5

mistral

Lasse Reichstein Nielsen wrote:

<snip>

----------------
This encoding is still not fully clear to me. So, which output encoding is
preferable for obfuscating: ASCII, European ASCII (ISO-8859-1),
or Unicode (UTF-8 or UTF-16)? And what about unescape? Most obfuscators
use unescape.

Mistral

Jul 12 '06 #6

Bart Van der Donck

Lasse Reichstein Nielsen wrote:

"Bart Van der Donck" <ba**@nijlen.com> writes:
Yes, but those code points do not necessarily represent the same
character in the \x80-\x9F range. My test suggests that even
MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.

A quick test shows that if n is a number between 128 and 255, and
hh is a hex representation of it, then the following all give the same
result:
String.fromCharCode(n)
"\xhh"
"\u00hh"
unescape("%hh")
unescape("%u00hh")
(which is a string with .charCodeAt(0) == n, however much sense that
makes).
[...]

The code point table would probably be identical across all these
commands, it's probably decided by the js engine itself. It doesn't
look like the page's own charset has any influence. I didn't find a way
to force charCodeAt() to a specific code page either.

It appears that even Microsoft follows some standards in this matter
:-) Based upon their Windows-1252 character set (which they try to
dictate as much as they can though), one would expect that

alert('\x83')

would return

ƒ

But instead, they use:

alert('ƒ'.charCodeAt(0))

Thus corresponding to code point 402 (Unicode > 255) instead of Microsoft's "own
invented" proprietary 131 (Windows-1252).

But then again, 131 seems to be present in FF/MSIE/NS numeric html
entities though (which one wouldn't expect anymore then, IMO):

document.write('ƒ is &fnof; and ƒ and ƒ')
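
The 131 versus 402 discrepancy can be sketched as follows. The table cp1252Sample and the helper decodeCp1252Byte are assumptions made for this illustration: the table holds only four of the Windows-1252 entries in the 0x80-0x9F range, and the fallback is only valid for bytes outside that range.

```javascript
// Partial Windows-1252 table (illustration only): byte -> Unicode code point.
const cp1252Sample = { 0x80: 0x20ac, 0x83: 0x0192, 0x85: 0x2026, 0x99: 0x2122 };

// Hypothetical helper: map one Windows-1252 byte to a one-character string.
// Bytes missing from the sample table fall back to the same-numbered code
// point, which is only correct outside the 0x80-0x9F range.
function decodeCp1252Byte(byteValue) {
  const mapped = cp1252Sample[byteValue];
  return String.fromCharCode(mapped !== undefined ? mapped : byteValue);
}

const asCp1252 = decodeCp1252Byte(131);       // byte 0x83 read as Windows-1252
const asCodeUnit = String.fromCharCode(131);  // U+0083, a C1 control character

console.log(asCp1252.charCodeAt(0));    // 402
console.log(asCodeUnit.charCodeAt(0));  // 131
console.log(asCodeUnit === "\x83");     // true
```

Inside a script, 131 is always the Unicode code unit U+0083; the florin sign only appears when a byte 0x83 is decoded as Windows-1252 before the script engine sees it.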

--
Bart

Jul 12 '06 #7

Richard Cornford

Bart Van der Donck wrote:

Lasse Reichstein Nielsen wrote:
>"Bart Van der Donck" <ba**@nijlen.com> writes:
>>Yes, but those code points do not necessarily represent the same
character in the \x80-\x9F range. My test suggests that even
MSIE prefers ISO-8859-1 instead of the expected Windows-1252 there.

A quick test shows that if n is a number between 128 and 255, and
hh is a hex representation of it, then the following all give the same
result:
String.fromCharCode(n)
"\xhh"
"\u00hh"
unescape("%hh")
unescape("%u00hh")
(which is a string with .charCodeAt(0) == n, however much sense that
makes).
[...]

The code point table would probably be identical across all these
commands, it's probably decided by the js engine itself.

<quote cite="ECMA 262, 3rd Ed. Section 6">
6 Source Text

ECMAScript source text is represented as a sequence of characters in
the Unicode character encoding, version 2.1 or later, using the UTF-16
transformation format. The text is expected to have been normalised to
Unicode Normalised Form C (canonical composition), as described in
Unicode Technical Report #15. Conforming ECMAScript implementations
are not required to perform any normalisation of text, or behave as
though they were performing normalisation of text, themselves.

SourceCharacter ::
any Unicode character

ECMAScript source text can contain any of the Unicode characters. All
Unicode white space characters are treated as white space, and all
Unicode line/paragraph separators are treated as line separators.
Non-Latin Unicode characters are allowed in identifiers, string
literals, regular expression literals and comments.
</quote>

It doesn't look like the page's own charset has any influence.

The/a character set asserted by an HTTP content type header would
probably be employed in deciding how to translate incoming javascript
source into the "sequence of characters in the Unicode character
encoding" that is needed prior to the tokenisation of the code.

I didn't find a way
to force charCodeAt() to a specific code page either.

<snip>

You wouldn't, as by the time you are dealing with javascript you are
past the point where the normalisation to Unicode has happened and so
code pages are not an issue.
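
A minimal sketch of that ordering, assuming an environment with the standard TextDecoder API (modern browsers and Node.js); the byte values and variable names are chosen only for illustration:

```javascript
// Two raw bytes as they might arrive over the network.
const rawBytes = new Uint8Array([0xc3, 0xa9]);

// Decoded as UTF-8, the pair is one character.
const asUtf8 = new TextDecoder("utf-8").decode(rawBytes);

// Decoded as ISO-8859-1, each byte maps to the same-numbered code point.
const asLatin1 = Array.from(rawBytes, (b) => String.fromCharCode(b)).join("");

console.log(asUtf8.length, asUtf8.charCodeAt(0));      // 1 233
console.log(asLatin1.length, asLatin1.charCodeAt(0));  // 2 195
```

The charset decides which string the bytes become; once the string exists, charCodeAt() only ever reports UTF-16 code units.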

Richard.

Jul 12 '06 #8

Bart Van der Donck

Richard Cornford wrote:

<quote cite="ECMA 262, 3rd Ed. Section 6">
6 Source Text

ECMAScript source text is represented as a sequence of characters in
the Unicode character encoding, version 2.1 or later, using the UTF-16
transformation format. The text is expected to have been normalised to
Unicode Normalised Form C (canonical composition), as described in
Unicode Technical Report #15. Conforming ECMAScript implementations
are not required to perform any normalisation of text, or behave as
though they were performing normalisation of text, themselves.

SourceCharacter ::
any Unicode character

ECMAScript source text can contain any of the Unicode characters. All
Unicode white space characters are treated as white space, and all
Unicode line/paragraph separators are treated as line separators.
Non-Latin Unicode characters are allowed in identifiers, string
literals, regular expression literals and comments.
</quote>

I'll get back after my first ECMAScript study, okay :)

The/a character set asserted by an HTTP content type header would
probably be employed in deciding how to translate incoming javascript
source into the "sequence of characters in the Unicode character
encoding" that is needed prior to the tokenisation of the code.

I had to read that sentence five times, but yes, I'd say this is a
correct representation. One side remark, though: I'd say browsers
should normally accept the stream in the offered character set, as you
said. For example, setting the output stream and <meta http-equiv> to
ASCII should prevent a character like 'é' from being displayed. And yes,
MSIE seems to implement this correctly:

http://www.dotinternet.be/temp/ascii.pl

But Firefox seems to throw away the charset rules, and display them
anyway. The code:

#!/usr/bin/perl
print <<'HTM';
Content-Type: text/html; charset=ascii

<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ascii"/>
</head>
<body>
é
</body>
</html>
HTM

Interesting!

--
Bart

Jul 12 '06 #9

mistral wrote:

This encoding is still not fully clear to me. So, which output encoding is
preferable for obfuscating: ASCII, European ASCII (ISO-8859-1),
or Unicode (UTF-8 or UTF-16)? And what about unescape? Most obfuscators
use unescape.

<html><head>
<meta http-equiv=Content-Type content="text/html; charset=UTF-8">
</head>
<body>
do whatever you want

</body></html>

You can also try charset=UTF-16, etc.
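
Another way to sidestep the charset question, offered only as a sketch: keep the script source ASCII-only by writing every non-ASCII character as a \uXXXX escape, so the file decodes the same way under any ASCII-compatible charset. The helper toAsciiLiteral is hypothetical and not taken from any obfuscator mentioned in the thread.

```javascript
// Hypothetical helper: turn a string into an ASCII-only double-quoted literal.
function toAsciiLiteral(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    // Keep printable ASCII except the quote and the backslash.
    if (unit >= 0x20 && unit < 0x7f && unit !== 0x22 && unit !== 0x5c) {
      out += text[i];
    } else {
      out += "\\u" + unit.toString(16).padStart(4, "0");
    }
  }
  return '"' + out + '"';
}

const literal = toAsciiLiteral("é ƒ");
const roundTrip = JSON.parse(literal);  // \uXXXX is also valid in JSON strings

console.log(literal);              // "\u00e9 \u0192"
console.log(roundTrip === "é ƒ");  // true
```

This keeps the code readable apart from the escaped characters, which is usually all that is needed for safe transport.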

Jul 12 '06 #10
