hashtable and casing

I'm doing a project that makes heavy use of domain names such as
"www.yahoo.com.". Domain names preserve case but are considered equal if the
names are the same and only the case differs.

I know I can store these names as keys in a Hashtable with a case-insensitive
comparer and CaseInsensitiveHashCodeProvider. That works fine. The benefit
is that I can store the domain name, not worry about case, and return the
user-supplied case without storing any extra state. However, this comes at a
cost, because all string compare operations (EndsWith, etc.) must now be case
insensitive. If the case were all lower, for example, string compares would be
very fast, and if interned, really fast.

I could store the domain name as all lower case and then store a BitArray
that tells me which chars were upper case. However, that seems like a pain
and still requires at least 32 bytes for a 255 char domain name, or 3 bytes
for a 20 char name.

I could also store both the original case as a string and the lower case
version that is used for all compare, EndsWith, hash, etc. operations.
However, this doubles the storage needed. String interning can offset that
for duplicates. That is my most attractive option in terms of performance, I
think, but I was wondering what others think. Cheers!
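For readers weighing the bit-mask option, here is a minimal sketch of it in C#. The class name CaseMaskedName and its members are hypothetical, not from any library, and it assumes the names are plain ASCII so that only 'A'..'Z' can be upper case. MaskBytes reproduces the arithmetic in the post (one bit per char, rounded up to whole bytes); the real in-memory footprint of a BitArray will be larger than that raw figure.

```csharp
using System;
using System.Collections;

// Hypothetical helper: keeps a lowercased name plus one bit per character
// recording which characters were upper case in the user-supplied form.
public sealed class CaseMaskedName
{
    public string Lower { get; }        // used for compares, EndsWith, hashing
    private readonly BitArray upper;    // bit i set => original char i was upper case

    public CaseMaskedName(string original)
    {
        char[] chars = original.ToCharArray();
        upper = new BitArray(chars.Length); // all bits start out false
        for (int i = 0; i < chars.Length; i++)
        {
            // Assumption: ASCII names, so only 'A'..'Z' need folding.
            if (chars[i] >= 'A' && chars[i] <= 'Z')
            {
                upper[i] = true;
                chars[i] = (char)(chars[i] + 32); // 'A'..'Z' -> 'a'..'z'
            }
        }
        Lower = new string(chars);
    }

    // Raw size of the mask: one bit per char, rounded up to whole bytes.
    public static int MaskBytes(int nameLength) => (nameLength + 7) / 8;

    // Rebuilds the user-supplied casing from the lowercased name and the mask.
    public string Restore()
    {
        char[] chars = Lower.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (upper[i]) chars[i] = (char)(chars[i] - 32); // 'a'..'z' -> 'A'..'Z'
        }
        return new string(chars);
    }
}
```

MaskBytes(255) gives 32 and MaskBytes(20) gives 3, matching the figures above. The cost is the extra pass in Restore every time the original casing has to be handed back.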

--
William Stacey, MVP
Nov 16 '05 #1
Go back a couple of days and look up that IndexOf I helped a user with.
You should probably write your own custom string operations for this one.
They give you maximum speed, with the trade-off that you won't be culture
aware. In this case, your case-insensitive work does not need to be culture
aware; it simply has to follow the RFCs for domain names, which are fairly strict.
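As a rough illustration of such a custom operation (this is not the IndexOf code referred to above), here is a sketch of a culture-unaware, ASCII-only EndsWith. The names DnsAscii, Fold, and EndsWithIgnoreCase are made up for the example, and it assumes only 'A'..'Z' ever need folding.

```csharp
using System;

// Hypothetical helper: case-insensitive suffix test that folds ASCII only.
public static class DnsAscii
{
    // Folds 'A'..'Z' to 'a'..'z'; every other char is returned unchanged.
    private static char Fold(char c) => (c >= 'A' && c <= 'Z') ? (char)(c + 32) : c;

    public static bool EndsWithIgnoreCase(string name, string suffix)
    {
        if (suffix.Length > name.Length) return false;
        int offset = name.Length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++)
        {
            // Compare char by char; no culture tables, no allocations.
            if (Fold(name[offset + i]) != Fold(suffix[i])) return false;
        }
        return true;
    }
}
```

On .NET 2.0 and later, name.EndsWith(suffix, StringComparison.OrdinalIgnoreCase) is a built-in culture-insensitive alternative worth measuring against before hand-rolling anything. Note that either form is a plain suffix test and does not check label boundaries.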
--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
Nov 16 '05 #2
Why do you have to return the domain to the user in the case it was
entered? (It would actually make more sense to correct it to lower case, I
think, because all domains are in lower case and anything else they enter is
probably mistyped.)

You could even correct the domain by doing a reverse DNS lookup and storing
the result.
Nov 16 '05 #3
> think, because all domains are in lower case and anything else they enter is
> probably mistyped).

That would be nice, but it is not allowed by RFCs 1034 and 1035. You must
preserve the case of domain names and labels.

Nov 16 '05 #4
but but... why?

Internet Explorer doesn't, for a start. Enter a URL in messed-up case, and
it'll correct the domain et al when it displays the page.
Nov 16 '05 #5
That is IE (the application) that downcases it. The resolver and the DNS
servers preserve the case that was entered when the RR was created. Utilities
like dig and nslookup can be used to see that case is preserved. IE should
not be the benchmark in this case.

--
William Stacey, MVP
Nov 16 '05 #6
Well, I'm not saying that's the wrong thing to do... just interested in why
it's so important. Surely it's more important to reflect the intent of the
company/person hosting the site than that of the person who entered the site
name?

That's a bit like someone mispronouncing your name, and you continuing that
mispronunciation, rather than either correcting them, or ignoring them and
continuing with the correct pronunciation.
Nov 16 '05 #7
If you do an AXFR, for example, you will see all the RRs in the zone in the
case you entered them.
If you do "dig abc.test.com", the server will return "abc.test.com" even if
the case on the server is "ABC.test.com."
If you do "dig abC.test.com", the server will return "abC.test.com", that is,
the same case as your question. The match is case insensitive. Not sure I know
how to comment other than that this is how it works currently. I think the
important point is that the server maintains case but does case-insensitive
matching, so it does not matter what case the QName is sent in. Cheers,
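The behaviour described above can be mimicked with a small sketch: a table whose keys keep the case they were entered with, matched case-insensitively, with the answer echoing the case of the question. ZoneDemo, BuildZone, and Answer are hypothetical names, the answer format is invented for the example, names are assumed to be fully qualified with a trailing dot, and 192.0.2.10 is just a documentation-range address.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a case-preserving, case-insensitive zone table.
public static class ZoneDemo
{
    // Keys keep the case they were entered with; lookups ignore case.
    public static Dictionary<string, string> BuildZone()
    {
        var zone = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        zone.Add("ABC.test.com.", "192.0.2.10"); // stored in the case entered
        return zone;
    }

    // Builds an answer line that uses the case of the question name.
    public static string Answer(Dictionary<string, string> zone, string qname)
    {
        return zone.TryGetValue(qname, out string address)
            ? qname + " A " + address
            : qname + " NXDOMAIN";
    }
}
```

Enumerating zone.Keys plays the role of the AXFR here: it yields "ABC.test.com." exactly as entered, while Answer(zone, "abC.test.com.") matches that record and replies with the question's casing.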

--
William Stacey, MVP
Nov 16 '05 #8
Hi William,

First of all, I would like to confirm my understanding of your issue. From
your description, I understand that you need to know the best way to do
case-insensitive comparison and storage. If there is any misunderstanding,
please feel free to let me know.

As far as I know, it's hard to optimize both time and space at once.
When we gain performance, we give up room for storage. When we use less
memory, we lose performance.

So I think whether to favor time or space depends on the project. When the
server is very fast and not many users are accessing the service
simultaneously, we can save the URL with the original case and compare with
CaseInsensitiveHashCodeProvider. If the users are doing the compare
frequently, we can try to save the text in two versions and compare with
the lowercased version.
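A minimal sketch of the "two versions" approach, assuming ASCII domain names: keep the name as entered for display, plus a lowercased, interned key for every comparison. The type DomainName and its members are hypothetical names for illustration.

```csharp
using System;

// Hypothetical type: stores the name as entered plus a lowercased, interned key.
public sealed class DomainName : IEquatable<DomainName>
{
    public string Original { get; }   // returned to the user unchanged
    public string Key { get; }        // lowercased and interned; used for all compares

    public DomainName(string original)
    {
        Original = original;
        Key = string.Intern(original.ToLowerInvariant());
    }

    // Interned keys with equal content are the same object,
    // so a reference check is enough.
    public bool Equals(DomainName other) => other != null && ReferenceEquals(Key, other.Key);

    public override bool Equals(object obj) => Equals(obj as DomainName);

    public override int GetHashCode() => Key.GetHashCode();

    // Ordinal compare on the already-lowercased key; the caller is expected
    // to pass a lowercased suffix, so no folding happens at call time.
    public bool IsUnder(string lowerCaseSuffix) =>
        Key.EndsWith(lowerCaseSuffix, StringComparison.Ordinal);
}
```

The trade-off is the one described above: roughly double the string storage per distinct casing, in exchange for cheap equality and suffix tests. Interned strings generally stay alive for the life of the process, so this suits a bounded set of names better than arbitrary user input.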

HTH. If anything is unclear, please feel free to reply to the post.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 16 '05 #9
Agreed. I decided on the case-insensitive hash code provider and storing the
domain names as entered, in whatever case they are. This does mean I can't
intern them and do quick object.ReferenceEquals testing or simple
string.Equals(a) testing. However, once you factor in that each request would
require downcasing (slow) plus the time to intern the string or fetch its
intern pool reference, doing those two things takes more time than the
case-insensitive lookup. Hope you get what I mean. Cheers!

--
William Stacey, MVP
Nov 16 '05 #10
