VK wrote:
Does anyone have a reputable reference about internal string storage in
JavaScript? (for some particular implementation I mean).
I suppose that by "internal string storage" you actually mean RAM. Well,
the source code of open-source browsers such as Firefox is available.
Not an easy subject, I'd say, and it's probably quite tough to
understand those low-level mechanisms if you're not familiar with them.
Say, take a 1,048,576-character string built by geometric
progression:
function generateLargeString() {
    var s = 'a';
    for (var i = 1; i < 21; ++i) {
        s = s.concat(s);
    }
    return s;
}
- the internal size should be 2 mebibytes and not 1 (?) if strings are
indeed stored as Unicode 16-bit.
You can't put a number like 1 or 2 MB on the required "internal size"
(=RAM!). It heavily depends on the JS engine, the OS, intermediary
levels... because JavaScript is such a high-level language. The
required RAM for a variable that holds 1,048,576 characters will always
be more than 1 MB (and I would suspect quite a lot more than that).
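For what it's worth, the 2 MiB figure only works as a lower bound on the
raw character data, and only if you assume two bytes per string element.
A minimal sketch of that arithmetic follows; rawCharacterBytes is a name
I made up for illustration, and the result says nothing about what a
given engine really allocates.

```javascript
// Builds a 2^20-character string by doubling, as in the quoted function.
function generateLargeString() {
    var s = 'a';
    for (var i = 1; i < 21; ++i) {
        s = s.concat(s);
    }
    return s;
}

// Hypothetical helper: bytes needed for the character data alone,
// ASSUMING 2 bytes per string element. Real engines add overhead
// and may store strings differently.
function rawCharacterBytes(str) {
    return str.length * 2;
}

var big = generateLargeString();
var bytes = rawCharacterBytes(big);
var mebibytes = bytes / (1024 * 1024);
```

Anything the engine adds on top (object headers, length fields,
temporary copies made during concatenation) comes in addition to that.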
On the other hand, it would be tempting for an engine developer not to
spend extra bytes on ASCII chars...
Those bytes are "spent" in JavaScript as far as the language is
concerned, because every character of a string is treated as a 16-bit
Unicode (UTF-16) code unit, ASCII or not.
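A quick way to see this at the language level (it shows nothing about
the engine's memory layout): an ASCII character and an arbitrary
non-ASCII one such as U+1200 each take up exactly one 16-bit string
element. The variable names below are just for illustration.

```javascript
var ascii = 'a';
var ethiopic = String.fromCharCode(0x1200); // ETHIOPIC SYLLABLE HA

// Both strings consist of a single 16-bit element.
var asciiLength = ascii.length;
var ethiopicLength = ethiopic.length;

// charCodeAt returns the numeric value of that element.
var asciiUnit = ascii.charCodeAt(0);       // 0x61
var ethiopicUnit = ethiopic.charCodeAt(0); // 0x1200
```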
So does anyone know of any documented engine optimizations on the
matter? Would some engine be expected to store the string from above in
half the space of, say,
function generateLargeString() {
    // 1200 ETHIOPIC SYLLABLE HA
    var s = String.fromCharCode(0x1200);
    for (var i = 1; i < 21; ++i) {
        s = s.concat(s);
    }
    return s;
}
My tests seem to confirm that ASCII or not doesn't matter, at least for
concatenation speed, which is actually logical, because JavaScript uses
Unicode internally for every character. My benchmarks on MSIE 6.0.29
(WinXP):
document.write(Date() + ' - ');
var s = "a";
for (var i = 1; i < 100000; ++i) {
    s += "a";
}
document.write(Date());
Thu Dec 14 12:44:15 2006 - Thu Dec 14 12:44:18 2006
Thu Dec 14 12:44:34 2006 - Thu Dec 14 12:44:37 2006
Thu Dec 14 12:45:24 2006 - Thu Dec 14 12:45:27 2006
Thu Dec 14 12:45:39 2006 - Thu Dec 14 12:45:42 2006
Thu Dec 14 12:45:56 2006 - Thu Dec 14 12:45:59 2006
Thu Dec 14 12:46:06 2006 - Thu Dec 14 12:46:09 2006
Thu Dec 14 12:46:17 2006 - Thu Dec 14 12:46:20 2006
document.write(Date() + ' - ');
var u = "\u0945";
for (var i = 1; i < 100000; ++i) {
    u += "\u0945";
}
document.write(Date());
Thu Dec 14 12:47:20 2006 - Thu Dec 14 12:47:23 2006
Thu Dec 14 12:48:01 2006 - Thu Dec 14 12:48:04 2006
Thu Dec 14 12:48:13 2006 - Thu Dec 14 12:48:16 2006
Thu Dec 14 12:48:24 2006 - Thu Dec 14 12:48:27 2006
Thu Dec 14 12:48:33 2006 - Thu Dec 14 12:48:36 2006
Thu Dec 14 12:48:50 2006 - Thu Dec 14 12:48:53 2006
Thu Dec 14 12:49:06 2006 - Thu Dec 14 12:49:09 2006
On Firefox 1.0.4 (mind the 1000000 instead of 100000, which makes
Firefox much faster than MSIE in this regard!):
document.write(Date() + ' - ');
var s = "a";
for (var i = 1; i < 1000000; ++i) {
    s += "a";
}
document.write(Date());
12:53:21 - 12:53:25
12:53:43 - 12:53:46
12:54:00 - 12:54:04
document.write(Date() + ' - ');
var u = "\u0945";
for (var i = 1; i < 1000000; ++i) {
    u += "\u0945";
}
document.write(Date());
12:50:29 - 12:50:32
12:51:03 - 12:51:06
12:51:48 - 12:51:51
I abbreviated Firefox's date strings for readability.
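Since Date() strings only resolve to the second, a finer comparison
could take the difference of two getTime() values instead. This is only
a sketch: timeAppend is a hypothetical helper name, and the numbers it
gives will vary with engine, machine and load.

```javascript
// Hypothetical helper: builds a string of `count` copies of ch by
// repeated appending, like the loops above, and reports the elapsed
// milliseconds together with the final length.
function timeAppend(ch, count) {
    var start = new Date().getTime();
    var s = ch;
    for (var i = 1; i < count; ++i) {
        s += ch;
    }
    var elapsed = new Date().getTime() - start;
    return { elapsed: elapsed, length: s.length };
}

var asciiRun = timeAppend('a', 100000);
var unicodeRun = timeAppend('\u0945', 100000);
```

Comparing asciiRun.elapsed with unicodeRun.elapsed over several runs
would show speed differences, if any. Mind that timing measures speed
and not memory, so it can only hint at how the strings are stored.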
--
Bart