469,909 Members | 1,562 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,909 developers. It's quick & easy.

Int versus Char primay key performance

Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim

-------------------

CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_PortNum]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndDT] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_UploadStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Processed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

Jul 23 '05 #1
5 10092

"twkelsey" <tw******@hotmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim


<snip>

From a performance point of view, the main difference is that an int takes
up 4 bytes, but char(7) would take 7. That means more data must be stored,
retrieved and moved around as MSSQL executes queries and modifies data.
Having said that, char(7) is not an especially wide column, and as long as
you have appropriate indexes, and queries run within acceptable times for
your users and application needs, then there's nothing to worry about.

A more important question is probably why you need to do this - if
TransmissionID is an internal key which isn't exposed to users, then why
would you want or need to change it? The fact that you're using IDENTITY
suggests that the values have no inherent meaning as such, so I'm not sure
what the benefit of char(7) is, unless perhaps you're moving from an
artificial to a natural key?

Simon
Jul 23 '05 #2
Simon,

Thanks for your reply. In regard to your question about why we would
like to do this:

We have wireless devices the field that transmit realtime data over
TCP/IP. A Windows service grabs this data off of the wire and inserts
it into the database. Currently, an initial "empty" row is inserted in
the FR_Session table to get the TransmissionID back from the identity
column. When the wireless session has completed, a subsequent update
is performed to fill in the session summary information. We are
optimizning the current process and my design requirements dictate that
I must not only avoid the subsequent updates, but also to completely
decouple the service from accessing the database.

To that end, the windows service is being re-designed to create BCP
files which will be bulk copied at fixed intervals into the database.
This implies that the TransmissionID will have to be generated by the
service itself and inserted into the BCP files for two tables (instead
of being generated by identity). The new TransmissionID that is
generated by the service will be alphanumeric, so if we could push that
ID straight in with the BCP files for both the parent and child tables
it would be preferred.

Thanks,
Tim

Jul 23 '05 #3
To answer the other question: yes, collation makes a small difference.
To minimize impact of collation, use any of the binary collations.

By the way: I would consider it a serious mistake to redesign your
database to match a wire format. You have a database to store data, not
to simulate some external format. You use SELECT statements to retrieve
the data. If needed, you can converting int to char(7) in such a query
or in a view. Maybe that is a simple solution for your problem.

Also note, that Identity only works on integer and decimal data types,
not on char.

HTH,
Gert-Jan
twkelsey wrote:

Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim

-------------------

CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_PortNum]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndDT] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_UploadStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Processed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

Jul 23 '05 #4
Gert-Jan Strik (so***@toomuchspamalready.nl) writes:
To answer the other question: yes, collation makes a small difference.
To minimize impact of collation, use any of the binary collations.


I seem to recall numbers like 20% overhead for a case-sensitive
collation and 35% for an accent- and case-ínsensitive collation. But
this was very long ago, probably during 6.5 days, and may not be relevant.

I would recommend that you use the default collation of the database,
and if you have business requirements for, say, case-insensitive go
with that. The performance gain you could make by making these
particular columns using a binary collation may not exceed the cost
for the increased complexity.

An alternative would be to se binary values altogether, and avoid the
char problem.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #5
Tim,

Yes, there will some performance overhead to use char(7) as index key rather
than integer since string comparison normally is more expensive than integer
comparison. Also since integer takes 4 bytes while char(70 takes 7 bytes,
you should expect to see a small space increase with this change. The
performance overhead caused by copying the extra 3 bytes around in the
server normally is too small to notice.

Collation definitely makes a different when it comes to string comparison.
Try to use binary collation which normally compares fastest for string.

--
Gang He
Software Design Engineer
Microsoft SQL Server Storage Engine

This posting is provided "AS IS" with no warranties, and confers no rights.
"twkelsey" <tw******@hotmail.com> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
Hi,

My company has a scenario where we would like to change the data type
of an existing primary key from an integer to a char, but we are
concerned about the performance implications of doing so. The script
for the two tables that we need to modify is listed below. Table
FR_Sessions contains a column named TransmissionID which is currently
an integer. This table contains about 1 million rows of data. Table
FR_VTracking table also contains the TransmissionID as part of it's
primary key and it contains about 35 millions rows of data. These two
tables are frequently joined on TransmissionID (FR_Sessions is the
parent). The TransmissionID column is used primarily for joins and is
not typically displayed.

We need like to change the TransmissionID data type from int to
char(7), and I had a few questions:

1) Would this introduce significant performance degradation? I have
read that char keys/indexes are slower than int/numeric.
2) Are there collation options (or any other optimizations) that we
could use to minimize the performance hit of the char(7)...if so which
ones?

I am a software architect by trade, not a database guru, so please go
easy on my if I overlooked something obvious :)

Any suggestions or information would be greatly appreciated.

Thanks,
Tim

-------------------

CREATE TABLE [FR_Sessions] (
[TransmissionID] [int] IDENTITY (1, 1) NOT NULL ,
[PTUID] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PortNum] [numeric](6, 0) NOT NULL CONSTRAINT [DF_FR_Sessions_PortNum]
DEFAULT (0),
[CloseStatus] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[RecvBytes] [int] NULL ,
[SendBytes] [int] NULL ,
[EndDT] [datetime] NULL CONSTRAINT [DF_FR_Sessions_EndDT] DEFAULT
(getutcdate()),
[LocalEndDT] [datetime] NULL ,
[TotalTime] [int] NULL ,
[OffenderID] [numeric](9, 0) NULL ,
[UploadStatus] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL CONSTRAINT [DF_FR_Sessions_UploadStatus] DEFAULT ('N'),
[SchedBatchID] [numeric](18, 0) NULL ,
[SWVersion] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DLST] [bit] NULL ,
[TZO] [smallint] NULL ,
[Processed] [bit] NOT NULL CONSTRAINT [DF_FR_Sessions_Processed]
DEFAULT (0),
[CallerID] [varchar] (13) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PeerIP] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[XtraInfo] [varchar] (1024) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[IdType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_FR_Sessions] PRIMARY KEY CLUSTERED
(
[TransmissionID]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [FR_VTracking] (
[TransmissionID] [int] NOT NULL ,
[FrameNum] [int] NOT NULL ,
[LatDegrees] [float] NOT NULL ,
[LonDegrees] [float] NOT NULL ,
[Altitude] [float] NOT NULL ,
[Velocity] [float] NOT NULL ,
[NumPositions] [smallint] NOT NULL ,
[NavMode] [smallint] NOT NULL ,
[Units] [smallint] NOT NULL ,
[GPSTrackingID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[dtStamp] [datetime] NULL ,
CONSTRAINT [PK_FR_VTracking] PRIMARY KEY CLUSTERED
(
[TransmissionID],
[FrameNum]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

Jul 23 '05 #6

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

12 posts views Thread by Fred | last post: by
2 posts views Thread by Andreas Meyer | last post: by
5 posts views Thread by dmhendricks | last post: by
22 posts views Thread by jdokos | last post: by
11 posts views Thread by Walter Dnes (delete the 'z' to get my real address | last post: by
4 posts views Thread by Nick Chan | last post: by
8 posts views Thread by oogie | last post: by
1 post views Thread by Waqarahmed | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.