String comparison problem

Hi,

How would one go about comparing 2 strings, one of which may contain
special entities (e.g. "cassé" and "cass&#233;")?
I tried to find a way to take the second string and do a replace
whenever such entities are encountered, but this implies creating some
sort of lookup table containing not all but a good number of entity
codes. Unless I am mistaken, JavaScript does not have any function to
replace an entity-infested string with a decoded string, the way PHP's
html_entity_decode does. Another way, probably better (but I don't know),
would be to encode the first string.

Any ideas?

Thanks
Jun 1 '07 #1
VK
On Jun 2, 2:38 am, Henri <yeah_ri...@donteventry.com> wrote:
> How would one go about comparing 2 strings one of which may contain
> special entities (eg "cassé" and "cass&#233;")?
Unless there is some Google Groups server "optimization" here, I see
in the first case a word containing the character e with an acute accent
(é) and in the second case a word containing the numeric HTML entity
"&#233;". If so, these are two completely different issues.
JavaScript strings are Unicode (sequences of UTF-16 code units), so any
string literal is seen internally as a Unicode sequence, no matter what
the actual page encoding is. If you need to sort and transform strings
according to the current locale, use the locale-specific string methods:
string1.localeCompare(string2)
and
toLocaleLowerCase()
toLocaleUpperCase()
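
As a rough illustration of the methods listed above, here is a minimal sketch. The word list, the "fr" locale argument and the helper name equalsIgnoreCase are assumptions made for the example, not part of the original reply, and the locale-aware ordering assumes a runtime with the usual internationalization data.

```javascript
// Plain code-unit ordering: uppercase and accented letters land in surprising places.
const words = ["Zèbre", "école", "abricot"];
const byCodeUnit = words.slice().sort();

// Locale-aware ordering, here asking for French collation rules.
const byLocale = words.slice().sort((a, b) => a.localeCompare(b, "fr"));

// Hypothetical helper: case-insensitive equality via locale-aware lowercasing.
function equalsIgnoreCase(a, b) {
  return a.toLocaleLowerCase() === b.toLocaleLowerCase();
}

console.log(byCodeUnit);
console.log(byLocale);
console.log(equalsIgnoreCase("CASSÉ", "cassé"));
```

The default sort compares UTF-16 code units, which is why "Zèbre" comes before "abricot" there; localeCompare is what gives dictionary-style order.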

In the second case (with the HTML entity) it all depends on where you are
retrieving this string from. If you are getting it from the content of a
loaded page, then by the time you retrieve it the entities have
already been parsed, so for JavaScript it is the same Unicode string as in
the first case and no extra transformation is needed.
If it is a string literal "cass&#233;" then for JavaScript
it is just the character sequence "c-a-s-s-&-#-2-3-3-;" and it has
nothing to do with "cassé". In this case either use a RegExp to replace
entities from a custom table, or insert the string into a (hidden) HTML
element and read back the parsed value.
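
The two options just mentioned could look roughly like this. It is only a sketch: the table NAMED_ENTITIES is deliberately tiny, and the names decodeWithTable and decodeViaDom are made up for the example. The DOM variant needs a browser (it relies on document), so it is only defined here, not called.

```javascript
// Hypothetical, deliberately small lookup table; a real one would list many more names.
const NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"',
  eacute: "é", egrave: "è", agrave: "à", ccedil: "ç"
};

// Option 1: RegExp plus custom table. Handles decimal numeric entities
// and the named entities present in the table; anything else is left as-is.
function decodeWithTable(str) {
  return str.replace(/&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g, (match, body) => {
    if (body[0] === "#") {
      const codePoint = parseInt(body.slice(1), 10);
      // Leave out-of-range values untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Option 2 (browser only): let the HTML parser do the decoding.
// A detached textarea is used so that markup in str is not executed or rendered.
function decodeViaDom(str) {
  const el = document.createElement("textarea");
  el.innerHTML = str;
  return el.value;
}

console.log(decodeWithTable("cass&#233;") === "cassé");
console.log(decodeWithTable("cass&eacute;") === "cassé");
```

The table approach works anywhere but only knows the entities you list; the DOM approach knows every entity the browser does, at the cost of needing a document.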

Jun 2 '07 #2
VK wrote:
> If it is a string literal "cass&#233;" then obviously for Javascript
> it is just a character sequence "c-a-s-s-&-#-2-3-3-;" and it has
> nothing to do with "cassé". In this case either use RegExp to replace
> entities by custom table; or insert the string into (hidden) HTML
> element and read back the parsed value.

That's the case and I've started experimenting with the replace
function. Calling, for instance, str.replace(/&#233;/,"é") does produce
a "normalized" string. I have to generalize this in order to
take most accented characters into account.
Thank you for your response.
Jun 2 '07 #3
VK wrote:
> If it is a string literal "cass&#233;" then obviously for Javascript
> it is just a character sequence "c-a-s-s-&-#-2-3-3-;" and it has
> nothing to do with "cassé". In this case either use RegExp to replace
> entities by custom table; or insert the string into (hidden) HTML
> element and read back the parsed value.

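To replace an entity-encoded string by its decoded equivalent:

// Called decodeEntities rather than normalize: String.prototype.normalize
// is now a built-in (Unicode normalization) and should not be overwritten.
String.prototype.decodeEntities = function () {
  // The g flag replaces every decimal entity, not just the first one.
  return this.replace(/&#([0-9]{1,7});/g, function (match, digits) {
    // digits is the captured decimal code, e.g. "233"
    var codePoint = parseInt(digits, 10);
    // fromCodePoint also handles characters above U+FFFF;
    // values that are not valid code points are left untouched.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
};

if s = "cass&#233;" then s.decodeEntities() returns "cassé"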
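
To tie this back to the original question, here is one possible sketch of the comparison itself. The function names decodeNumericEntities and entityAwareEquals are made up for the example, and handling hexadecimal entities such as "&#xE9;" is an addition not covered in the thread. It uses standalone functions so that nothing is added to String.prototype.

```javascript
// Decode decimal (&#233;) and hexadecimal (&#xE9;) numeric entities.
function decodeNumericEntities(str) {
  return str.replace(
    /&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));/g,
    (match, hex, dec) => {
      const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
      // Leave values that are not valid code points untouched.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

// Compare two strings after decoding any numeric entities in either one.
function entityAwareEquals(a, b) {
  return decodeNumericEntities(a) === decodeNumericEntities(b);
}

console.log(entityAwareEquals("cassé", "cass&#233;"));
console.log(entityAwareEquals("cassé", "cass&#xE9;"));
console.log(entityAwareEquals("casse", "cass&#233;"));
```

Decoding both sides means it does not matter which of the two strings carries the entities.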

Henri
Jun 2 '07 #4
