473,322 Members | 1,431 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,322 software developers and data experts.

Int versus Char primay key performance

Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim

-------------------

CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_PortNum]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndDT] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_UploadStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Processed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

Jul 23 '05 #1
5 10395

"twkelsey" <tw******@hotmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim


<snip>

From a performance point of view, the main difference is that an int takes
up 4 bytes, but char(7) would take 7. That means more data must be stored,
retrieved and moved around as MSSQL executes queries and modifies data.
Having said that, char(7) is not an especially wide column, and as long as
you have appropriate indexes, and queries run within acceptable times for
your users and application needs, then there's nothing to worry about.

A more important question is probably why you need to do this - if
TransmissionID is an internal key which isn't exposed to users, then why
would you want or need to change it? The fact that you're using IDENTITY
suggests that the values have no inherent meaning as such, so I'm not sure
what the benefit of char(7) is, unless perhaps you're moving from an
artificial to a natural key?

Simon
Jul 23 '05 #2
Simon,

Thanks for your reply. In regard to your question about why we would
like to do this:

We have wireless devices the field that transmit realtime data over
TCP/IP. A Windows service grabs this data off of the wire and inserts
it into the database. Currently, an initial "empty" row is inserted in
the FR_Session table to get the TransmissionID back from the identity
column. When the wireless session has completed, a subsequent update
is performed to fill in the session summary information. We are
optimizning the current process and my design requirements dictate that
I must not only avoid the subsequent updates, but also to completely
decouple the service from accessing the database.

To that end, the windows service is being re-designed to create BCP
files which will be bulk copied at fixed intervals into the database.
This implies that the TransmissionID will have to be generated by the
service itself and inserted into the BCP files for two tables (instead
of being generated by identity). The new TransmissionID that is
generated by the service will be alphanumeric, so if we could push that
ID straight in with the BCP files for both the parent and child tables
it would be preferred.

Thanks,
Tim

Jul 23 '05 #3
To answer the other question: yes, collation makes a small difference.
To minimize impact of collation, use any of the binary collations.

By the way: I would consider it a serious mistake to redesign your
database to match a wire format. You have a database to store data, not
to simulate some external format. You use SELECT statements to retrieve
the data. If needed, you can converting int to char(7) in such a query
or in a view. Maybe that is a simple solution for your problem.

Also note, that Identity only works on integer and decimal data types,
not on char.

HTH,
Gert-Jan
twkelsey wrote:

Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim

-------------------

CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_PortNum]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndDT] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_UploadStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Processed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

Jul 23 '05 #4
Gert-Jan Strik (so***@toomuchspamalready.nl) writes:
To answer the other question: yes, collation makes a small difference.
To minimize impact of collation, use any of the binary collations.


I seem to recall numbers like 20% overhead for a case-sensitive
collation and 35% for an accent- and case-ínsensitive collation. But
this was very long ago, probably during 6.5 days, and may not be relevant.

I would recommend that you use the default collation of the database,
and if you have business requirements for, say, case-insensitive go
with that. The performance gain you could make by making these
particular columns using a binary collation may not exceed the cost
for the increased complexity.

An alternative would be to se binary values altogether, and avoid the
char problem.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #5
Tim,

Yes, there will some performance overhead to use char(7) as index key rather
than integer since string comparison normally is more expensive than integer
comparison. Also since integer takes 4 bytes while char(70 takes 7 bytes,
you should expect to see a small space increase with this change. The
performance overhead caused by copying the extra 3 bytes around in the
server normally is too small to notice.

Collation definitely makes a different when it comes to string comparison.
Try to use binary collation which normally compares fastest for string.

--
Gang He
Software Design Engineer
Microsoft SQL Server Storage Engine

This posting is provided "AS IS" with no warranties, and confers no rights.
"twkelsey" <tw******@hotmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim

-------------------

CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_PortNum]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndDT] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_UploadStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Processed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

Jul 23 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

12
by: Fred | last post by:
Has anyone a link or any information comparing c and c++ as far as execution speed is concerned? Signal Processing algorithms would be welcome... Thanks Fred
2
by: Andreas Meyer | last post by:
Hi, Does anybody have experience with the performance of CHAR in comparison with VARCHAR? I´ve converted the table from varchar (dynamic format) to char (fixed row length) and the total size of...
5
by: dmhendricks | last post by:
Greetings, I have a question. I work on some SQL2k/ASP.NET apps at work. My predacessor, who created the databases/tables seemed to have liked to use 'char' for all text fields. Is there a...
22
by: jdokos | last post by:
Hello, I have a question about VARCHAR fields. Our application groups here are starting to use VARCHARs much more frequently. Even VARCHAR (2) to (9) length fields. They say this is because...
11
by: Walter Dnes (delete the 'z' to get my real address | last post by:
I've noticed a few threads (full of sound and fury, signifying nothing) here recently about allocation of large memory blocks. I'm about to start on a personal pet project where I'll be using...
1
by: jm | last post by:
Those of you who use Visual Studio, do you find a big performance difference between the single file model and the web form / class behind form model? I am trying to justify purchasing Visual...
2
by: Jon Lapham | last post by:
I have a table that stores TEXT information. I need query this table to find *exact* matches to the TEXT... no regular expressions, no LIKE queries, etc. The TEXT could be from 1 to 10000+...
4
by: Nick Chan | last post by:
all these while i've only used varchar for any string i heard from my ex-boss that char helps speed up searches. is that true? so there are these: 1) char with index 2) char without index...
8
by: oogie | last post by:
Hi, The arguments to main are usually passed as char* argv or char ** argv, where for this example argv is the array of arguments. What I don't understand is how accessing the actual argument...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.