hashtable and casing

I'm working on a project that makes heavy use of domain names such as
"www.yahoo.com."
Domain names preserve case but are considered equal if the names are the
same and only the case differs.
I know I can store these names as keys in a Hashtable with a
case-insensitive comparer and CaseInsensitiveHashCodeProvider. That works
fine. The benefit is that I can store the domain name without worrying about
case and return the user-supplied case without storing any extra state.
However, this comes at a cost, because all string comparison operations
(EndsWith, etc.) must now be case insensitive. If everything were lower
case, for example, string compares would be very fast, and if the strings
were interned, really fast.

I could store the domain name in lower case plus a BitArray that records
which chars were upper case. However, that seems like a pain and still
requires at least 32 bytes for a 255-char domain name, or 3 bytes for a
20-char name. I could also store both the original-case string and a
lower-case version that is used for all compare, EndsWith, hash, etc.
operations. However, this doubles the storage needed, although string
interning can offset that for duplicates. That is my most attractive option
in terms of performance, I think, but I was wondering what others think.
Cheers!

--
William Stacey, MVP
Nov 16 '05 #1
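
A minimal sketch of the bit-array idea from the post above, written in
Python rather than .NET purely for illustration. The function names
encode_case and decode_case are hypothetical, and the sketch assumes names
contain only ASCII letters, digits, hyphens and dots. It stores one bit per
character, which is where the 32-byte (255 chars) and 3-byte (20 chars)
figures come from.

```python
from typing import Tuple


def encode_case(name: str) -> Tuple[str, bytes]:
    """Return (lower-cased name, bitmask of the upper-case positions)."""
    mask = bytearray((len(name) + 7) // 8)      # ceil(len / 8) bytes
    lowered = []
    for i, ch in enumerate(name):
        if "A" <= ch <= "Z":                    # ASCII letters only
            mask[i // 8] |= 1 << (i % 8)        # remember: position i was upper
            ch = chr(ord(ch) + 32)              # fold to lower case
        lowered.append(ch)
    return "".join(lowered), bytes(mask)


def decode_case(lowered: str, mask: bytes) -> str:
    """Rebuild the spelling as entered from the lower-cased name and mask."""
    return "".join(
        chr(ord(ch) - 32) if mask[i // 8] & (1 << (i % 8)) else ch
        for i, ch in enumerate(lowered)
    )
```

All comparisons would then run on the lower-cased string, and decode_case
would only be needed when the original spelling has to be shown again.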
Go back a couple of days and look up that IndexOf I helped a user with.
You should probably write your own custom string operations for this one
that give you maximum speed, with the trade-off that you won't be culture
aware. In this case, your case-insensitive work does not need to be culture
aware; it simply has to follow the RFCs for domain names, which are fairly
strict.
--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #2
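
One way to picture the culture-unaware string operation suggested above is
an ends-with test that folds only the ASCII letters A-Z. This is a sketch in
Python, not the IndexOf code referred to in the reply; the function names
are hypothetical, and treating "follow the RFCs" as "fold ASCII only" is an
assumption.

```python
def ascii_lower(code: int) -> int:
    """Fold an ASCII upper-case code point to lower case; leave others alone."""
    return code + 32 if 65 <= code <= 90 else code


def ends_with_ignore_ascii_case(name: str, suffix: str) -> bool:
    """Case-insensitive suffix test with no culture or Unicode case rules."""
    if len(suffix) > len(name):
        return False
    offset = len(name) - len(suffix)
    for i in range(len(suffix)):
        if ascii_lower(ord(name[offset + i])) != ascii_lower(ord(suffix[i])):
            return False
    return True
```

Because only 26 letters are ever folded, no temporary lower-cased copy of
either string is needed, and characters outside ASCII compare exactly.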
Why do you have to return the domain to the user in the case it was
entered? It would actually make more sense to correct it to lower case, I
think, because all domains are in lower case and anything else they enter is
probably mistyped.

You could even correct the domain by doing a reverse DNS lookup and storing
the result.
Nov 16 '05 #3
> think, because all domains are in lower case and anything else they enter
> is probably mistyped).

That would be nice, but it is not allowed by RFCs 1034 and 1035. You must
preserve the case of domain names and labels.
Nov 16 '05 #4
But... why?

Internet Explorer doesn't, for a start. Enter a URL in messed-up case, and
it'll correct the domain et al. when it displays the page.
Nov 16 '05 #5
That is IE (the application) that downcases it. The resolver and the DNS
servers preserve the case that was entered when the RR was created. Utilities
like dig and nslookup can be used to see that case is preserved. IE should
not be the benchmark in this case.

--
William Stacey, MVP
Nov 16 '05 #6
Well, I'm not saying that's the wrong thing to do... I'm just interested in
why it's so important. Surely it's more important to reflect the intent of
the company/person hosting the site than that of the person who entered the
site name?

That's a bit like someone mispronouncing your name, and you continuing that
mispronunciation, rather than either correcting them, or ignoring them and
continuing with the correct pronunciation.
Nov 16 '05 #7
If you do an AXFR, for example, you will see all the RRs in the zone in the
case you entered them.
If you do "dig abc.test.com", the server will return "abc.test.com" even if
the case on the server is "ABC.test.com."
If you do "dig abC.test.com", the server will return "abC.test.com", i.e.
the same case as your question. The match is case insensitive. I'm not sure
how to comment other than to say that is how it works currently. I think the
important point is that the server maintains case but does case-insensitive
matching, so it does not matter what case the QName is sent in. Cheers,

--
William Stacey, MVP
Nov 16 '05 #8
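
The behaviour described above (stored case kept, matching case insensitive,
answer echoing the question's case) can be modelled with a small lookup
table. This is a toy sketch in Python, not how any real DNS server is
implemented; the Zone class, its method names and the 192.0.2.1
documentation address are all made up for illustration, and names are
assumed to be ASCII.

```python
class Zone:
    """Toy zone: keeps names as entered, matches them case-insensitively."""

    def __init__(self):
        self._records = {}          # lower-cased name -> (name as entered, address)

    def add(self, name, address):
        self._records[name.lower()] = (name, address)

    def query(self, qname):
        """Return (qname, address), echoing the question's own spelling."""
        entry = self._records.get(qname.lower())
        if entry is None:
            return None
        _stored_name, address = entry
        return qname, address

    def transfer(self):
        """List every record in the case it was entered, like a zone transfer."""
        return [(name, address) for name, address in self._records.values()]


zone = Zone()
zone.add("ABC.test.com.", "192.0.2.1")
answer = zone.query("abC.test.com.")    # matched despite the different case
```

Here answer carries "abC.test.com." (the question's case), while
zone.transfer() still reports "ABC.test.com." as stored.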
Hi William,

First of all, I would like to confirm my understanding of your issue. From
your description, I understand that you need to know the best way to do
case-insensitive comparison and storage. If I have misunderstood anything,
please feel free to let me know.

As far as I know, it is hard to optimize both time complexity and space
complexity. When we gain performance, we give up storage space; when we use
less memory, we usually give up some performance.

So I think whether to favor time or space depends on the project. When the
server is very fast and not many users are accessing the service
simultaneously, we can save the URL in its original case and compare with
CaseInsensitiveHashCodeProvider. If the users are doing comparisons
frequently, we can try saving the text in two versions and comparing against
the lower-cased version.

HTH. If anything is unclear, please feel free to reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 16 '05 #9
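
The "two versions" option could look roughly like the sketch below: each
name keeps its original spelling plus an interned lower-case copy that is
used for every comparison. It is written in Python for illustration only;
the DomainName class is hypothetical, sys.intern stands in for .NET string
interning, and names are assumed to be ASCII.

```python
import sys


class DomainName:
    """Original spelling for display, interned lower-case copy for comparing."""

    __slots__ = ("original", "folded")

    def __init__(self, text):
        self.original = text
        # Equal lower-case strings share one interned object, so duplicates
        # cost one extra reference rather than one extra copy.
        self.folded = sys.intern(text.lower())

    def __eq__(self, other):
        if not isinstance(other, DomainName):
            return NotImplemented
        return self.folded == other.folded

    def __hash__(self):
        return hash(self.folded)

    def ends_with(self, suffix):
        """Plain case-sensitive suffix test on the already folded strings."""
        return self.folded.endswith(suffix.folded)
```

The cost is the second string per distinct name; the gain is that equality,
hashing and suffix tests never have to fold case at lookup time.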
Agreed. I decided on the case-insensitive hash code provider and storing the
domain names as entered, in whatever case they are. This does mean I can't
intern them and do quick Object.ReferenceEquals testing or simple
String.Equals(a) testing. However, each request would otherwise require
downcasing (slow) plus the time to intern the string or fetch its reference
from the intern pool, and those two steps take more time. Hope you get what
I mean. Cheers!

--
William Stacey, MVP
Nov 16 '05 #10
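
The option chosen above, keeping only the name as entered and making the
hash and equality case insensitive, can be sketched as follows. This is a
Python illustration, not the .NET Hashtable code; CaselessKey and
DomainTable are hypothetical names. Note that in Python each lower() call
allocates a temporary string, so the sketch shows the shape of the design
rather than its performance.

```python
class CaselessKey:
    """Wraps a name as entered; hashes and compares it case-insensitively."""

    __slots__ = ("text",)

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return hash(self.text.lower())      # folded copy is temporary, not stored

    def __eq__(self, other):
        if not isinstance(other, CaselessKey):
            return NotImplemented
        return self.text.lower() == other.text.lower()


class DomainTable:
    """Table keyed by domain name; only the spelling as entered is kept."""

    def __init__(self):
        self._entries = {}                  # CaselessKey -> value

    def add(self, name, value):
        self._entries[CaselessKey(name)] = value

    def get(self, name):
        return self._entries.get(CaselessKey(name))

    def names(self):
        return [key.text for key in self._entries]
```

A lookup with any casing finds the entry, and names() returns the spelling
that was originally stored.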
Hi William,

I'm glad to know that you have resolved the problem. Thanks for sharing
your experience with everyone here. If you have any questions, please feel
free to post them in the community.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 16 '05 #11