what is the speed hit of "SELECT *" with MySql, as opposed to narrower database calls?

lkrubner

Are there any benchmarks on how much an extra, unneeded VARCHAR, CHAR,
INT, BIGINT, TEXT or MEDIUMTEXT column slows down a database call with MySQL?
PostgreSQL information would also be useful.

I'm trying to explain to some friends the utility of making database
calls return only needed data.

As an example of what I'm talking about, suppose we had a database
table sort of like this:

CREATE TABLE weblogs (
  id INT,
  headline VARCHAR(255),
  mainContent TEXT,
  dateCreated INT,
  author VARCHAR(255),
  navigationText VARCHAR(255),
  isPrivate CHAR(1)
);

Suppose I'm writing a PHP script that fetches the headlines and offers a
link to each actual page. What is the speed difference between

SELECT * FROM weblogs

as opposed to

SELECT id, headline FROM weblogs

And is there info out there that gives a general sense of how much each
extra, unneeded database field might slow down a script?

Jul 17 '05 #1

Philip Nelson

I've not got any figures on performance, although depending on the number of
columns involved and their size it could be quite a bit. I can tell you of
one instance I came across with another DBMS using ODBC, which just loves
to read the system catalog for information about everything right down to
column level: returning 5000 rows of user data from a 70-column table took
over 2 million page I/Os against the system catalog. Not exactly great for
performance.

In my opinion there are other, more important, reasons for not using
"SELECT *". These principally relate to what will happen to the application
if you change the database structure. For example, in the table you gave,
let's assume we decide to add a column publisher varchar(255) immediately
after the author column. Let's also assume that you access the columns in a
resultset by position, rather than by name. Now, if you use "SELECT *", the
position where you used to display navigationText will display publisher
instead.
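The positional-access problem can be reproduced in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL and PHP, which is an assumption on my part. SQLite's ALTER TABLE can only append columns, so the sketch builds the "after" table directly, in the shape that MySQL's ALTER TABLE weblogs ADD COLUMN publisher VARCHAR(255) AFTER author would leave it. The function names and sample values are hypothetical.

```python
import sqlite3

# Original table from the question (SQLite accepts these MySQL type names).
OLD_SCHEMA = """CREATE TABLE weblogs (
    id INT, headline VARCHAR(255), mainContent TEXT, dateCreated INT,
    author VARCHAR(255), navigationText VARCHAR(255), isPrivate CHAR(1))"""

# Same table with publisher placed immediately after author, as MySQL's
# "ADD COLUMN publisher VARCHAR(255) AFTER author" would leave it.
NEW_SCHEMA = """CREATE TABLE weblogs (
    id INT, headline VARCHAR(255), mainContent TEXT, dateCreated INT,
    author VARCHAR(255), publisher VARCHAR(255),
    navigationText VARCHAR(255), isPrivate CHAR(1))"""


def make_db(schema):
    """Create an in-memory weblogs table holding one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    # A named column list keeps this INSERT valid for both schemas.
    conn.execute(
        "INSERT INTO weblogs (id, headline, mainContent, dateCreated,"
        " author, navigationText, isPrivate)"
        " VALUES (1, 'Hello', 'Body text', 0, 'Phil', 'Home', 'N')")
    return conn


def nav_text_by_position(conn):
    """Fragile: assumes navigationText is column 5 (0-based) of SELECT *."""
    row = conn.execute("SELECT * FROM weblogs").fetchone()
    return row[5]


old_db = make_db(OLD_SCHEMA)
new_db = make_db(NEW_SCHEMA)
new_db.execute("UPDATE weblogs SET publisher = 'Acme Press'")

print(nav_text_by_position(old_db))  # Home
print(nav_text_by_position(new_db))  # Acme Press: the publisher, not the nav text
```

Nothing raises an error here. The page simply shows the wrong field, and that silence is what makes this kind of bug easy to miss.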

Similar problems occur with INSERT statements where the column list isn't
specified, only the values list. Everything is OK until you add a column.
If the last column is "NOT NULL", the statement breaks. Worse, though, you
can end up putting data into the wrong places.
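This sketch compares an INSERT that relies on column order with one that names its columns. It again uses Python's sqlite3 as a stand-in for MySQL, which is an assumption, and the sample row is hypothetical. In this stand-in the positional INSERT fails as soon as any column is added, whether or not the new column is nullable. MySQL likewise rejects a VALUES list whose length doesn't match the table's column count.

```python
import sqlite3

ROW = (1, "Hello", "Body text", 0, "Phil", "Home", "N")
POSITIONAL = "INSERT INTO weblogs VALUES (?, ?, ?, ?, ?, ?, ?)"
NAMED = ("INSERT INTO weblogs (id, headline, mainContent, dateCreated,"
         " author, navigationText, isPrivate) VALUES (?, ?, ?, ?, ?, ?, ?)")


def try_insert(conn, sql, params):
    """Run one INSERT and report success or the database's error message."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.Error as exc:
        return f"failed: {exc}"


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE weblogs (
    id INT, headline VARCHAR(255), mainContent TEXT, dateCreated INT,
    author VARCHAR(255), navigationText VARCHAR(255), isPrivate CHAR(1))""")

before = try_insert(conn, POSITIONAL, ROW)  # column counts match: works

# Add a nullable column; SQLite appends it at the end of the table.
conn.execute("ALTER TABLE weblogs ADD COLUMN publisher VARCHAR(255)")

after_positional = try_insert(conn, POSITIONAL, ROW)  # 8 columns, 7 values
after_named = try_insert(conn, NAMED, ROW)            # publisher left NULL

print(before, after_positional, after_named, sep="\n")
```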

HTH

Phil

Jul 17 '05 #2

jerry gitomer

I think you will find it isn't going to make much difference
unless you are retrieving several million rows.

The amount of data read is dictated by the physical read
characteristics of the disk subsystem. The database takes what
the disk subsystem hands off to the OS, locates the desired
table row, and then parses the row to pull out either each
column or the specified columns.

The bulk of the time is spent waiting for disk I/O to complete, and the
difference in the time a high-performance database like MySQL spends
parsing the data and extracting and presenting either every column or a
subset of the columns isn't significant.

BTW, I agree with Philip Nelson, who responded earlier and felt that
SELECT * should be avoided because of possible problems arising from
future maintenance changes to the database.

HTH

Jerry

Jul 17 '05 #3

Tony Marston

"Philip Nelson" <te*****@scotdb.com> wrote in message
news:we***********************@news.easynews.com.. .

In my opinion there are other, more important, reasons for not using
"SELECT *". These principally relate to what will happen to the application
if you change the database structure. For example, in the table you gave,
let's assume we decide to add a column publisher varchar(255) immediately
after the author column. Let's also assume that you access the columns in a
resultset by position, rather than by name. Now, if you use "SELECT *", the
position where you used to display navigationText will display publisher
instead.

That's the primary reason for returning associative lists, so that you can
reference items by name instead of by position. That way it does not
matter in what order the columns are retrieved.

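In PHP, the associative approach corresponds to fetching rows with a function such as mysql_fetch_assoc(). The sketch below uses Python's sqlite3.Row as a stand-in, which is an assumption on my part. The two tables are trimmed, hypothetical versions of the weblogs table: one has publisher inserted before navigationText and one does not. Looking the value up by name returns the same result from both.

```python
import sqlite3


def make_db(column_defs):
    """Trimmed weblogs table (hypothetical subset of the question's columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE weblogs ({', '.join(column_defs)})")
    conn.execute("INSERT INTO weblogs (id, headline, navigationText)"
                 " VALUES (1, 'Hello', 'Home')")
    return conn


def nav_text_by_name(conn):
    """Look the value up by column name, so column order no longer matters."""
    conn.row_factory = sqlite3.Row  # rows now support row["columnName"]
    row = conn.execute("SELECT * FROM weblogs").fetchone()
    return row["navigationText"]


before = make_db(["id INT", "headline VARCHAR(255)", "author VARCHAR(255)",
                  "navigationText VARCHAR(255)"])
after = make_db(["id INT", "headline VARCHAR(255)", "author VARCHAR(255)",
                 "publisher VARCHAR(255)", "navigationText VARCHAR(255)"])

print(nav_text_by_name(before), nav_text_by_name(after))  # Home Home
```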
Similar problems occur with insert statements where the column list isn't
specified, but just the values list. Everything is OK until you add a
column. If the last column is "NOT NULL" the statement breaks. Worse
though you can end up putting data into the wrong places.

If you do what I do and construct all INSERT, UPDATE and DELETE statements
programmatically then this problem will never appear.
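Tony doesn't show his code, so the following is not his method. It is a minimal, hypothetical sketch of the general idea of generating INSERT statements so that they always carry an explicit column list. The helper name build_insert and the "?" placeholder style (Python's sqlite3, also accepted by PHP's PDO) are assumptions. Values are passed as bound parameters, and table and column names are checked against a conservative identifier pattern.

```python
import re
import sqlite3

# Conservative identifier check so table/column names can't smuggle in SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_insert(table, record):
    """Return (sql, params) for an INSERT that always names its columns.

    record maps column names to values; values travel as bound parameters.
    """
    for name in (table, *record):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, list(record.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblogs (id INT, headline VARCHAR(255),"
                 " publisher VARCHAR(255))")
    sql, params = build_insert("weblogs", {"id": 1, "headline": "Hello"})
    print(sql)  # INSERT INTO weblogs (id, headline) VALUES (?, ?)
    conn.execute(sql, params)  # publisher is simply left NULL
```

Because the column list comes from the record's own keys, adding a column to the table later doesn't invalidate statements built this way. The new column only has to accept NULL or have a default.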

--
Tony Marston

http://www.tonymarston.net

Jul 17 '05 #4

Andy Hassall

On 9 Apr 2005 12:40:13 -0700, lk******@geocities.com wrote:

Suppose I'm doing a PHP command that get's the headlines and offers a
link to the actual page. What is the speed difference between

SELECT * FROM weblogs

as opposed to

SELECT id, headline FROM weblogs

And is there info out there that gives a general sense of how much each
extra, unneeded database field might slow down a script?

Well, It Depends.

How wide is the row? If you've got a LONGTEXT in there with 1 MB of data,
the impact of fetching it when you don't need it is going to be far greater.
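Here is a rough way to see how much row width matters. This sketch adds up the size of what each select list hands back to the client, using Python's sqlite3 as a stand-in for MySQL (an assumption). It uses hypothetical sample data: five posts, each with a 1,000,000-character body. Counting each integer as 8 bytes is also an assumption, since real wire formats differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE weblogs (
    id INT, headline VARCHAR(255), mainContent TEXT, dateCreated INT,
    author VARCHAR(255), navigationText VARCHAR(255), isPrivate CHAR(1))""")
# Hypothetical sample data: five posts, each with a 1,000,000-character body.
conn.executemany(
    "INSERT INTO weblogs (id, headline, mainContent, dateCreated, author,"
    " navigationText, isPrivate) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(i, f"Post {i}", "x" * 1_000_000, 0, "Andy", "Home", "N")
     for i in range(5)])


def payload_bytes(sql):
    """Rough size of what a query hands back to the client.

    Text counts as its UTF-8 length; integers are counted as 8 bytes
    (an assumption; real wire formats differ).
    """
    total = 0
    for row in conn.execute(sql):
        for value in row:
            if isinstance(value, str):
                total += len(value.encode("utf-8"))
            elif isinstance(value, int):
                total += 8
    return total


print(payload_bytes("SELECT id, headline FROM weblogs"))  # 70
print(payload_bytes("SELECT * FROM weblogs"))             # 5000155
```

These figures describe data volume, not timings. They do show why one wide text column, rather than the mere count of columns, dominates the cost of an unneeded SELECT *.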

What's your query? I don't know if MySQL has this feature, but in Oracle,
let's say you query just primary key columns; this can be answered more quickly
by scanning the index rather than the table (since the index is smaller), which
eliminates lots of lookups from the index to the table itself. But if you
request columns that aren't in the index, then it's got to hit the table to get
the data. So the same query conditions with a different select column list can
be executed very differently.
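SQLite can show the effect described above for Oracle. The sketch assumes an index on headline, which is hypothetical and not part of the original table. It also declares id as INTEGER PRIMARY KEY so that id doubles as SQLite's rowid, which every index entry already carries. For what it's worth, MySQL supports this too: its EXPLAIN output shows "Using index" in the Extra column when a query is answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: id is INTEGER PRIMARY KEY, i.e. SQLite's rowid, which is
# stored in every index entry alongside the indexed columns.
conn.execute("""CREATE TABLE weblogs (
    id INTEGER PRIMARY KEY, headline VARCHAR(255), mainContent TEXT,
    dateCreated INT, author VARCHAR(255), navigationText VARCHAR(255),
    isPrivate CHAR(1))""")
conn.execute("CREATE INDEX idx_headline ON weblogs (headline)")  # hypothetical


def plan(sql):
    """Return SQLite's query-plan text for a query taking one parameter."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("Hello",))
    return " ".join(str(row[-1]) for row in rows)  # last column = detail text


narrow = plan("SELECT id, headline FROM weblogs WHERE headline = ?")
wide = plan("SELECT * FROM weblogs WHERE headline = ?")
print(narrow)  # mentions a COVERING INDEX: the table itself is never read
print(wide)    # uses idx_headline, then fetches each matching row from the table
```

The exact wording of the plan text varies between SQLite versions. The distinction between the two plans is the point: the same WHERE clause is executed differently depending only on the select list.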

--
Andy Hassall / <an**@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool

Jul 17 '05 #5

lkrubner

Thanks for all the terrific feedback. I agree it's best to reference
fields by their names.

But are there no articles anyone can point me to? No benchmarks from the
open source community? Disappointing.

Does it strike this group as likely that a MEDIUMTEXT field causes
a bigger hit than a CHAR(1)?
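Nobody in the thread posted numbers, and none are invented here. What follows is a rough harness for getting local figures. It times a narrow query, the same query plus a CHAR(1) column, the same query plus the large text column, and SELECT *. It uses an in-memory SQLite database as a stand-in (an assumption), so it measures only engine and driver cost, not disk or network. A MySQL server would need its own measurement. The row count and body size are arbitrary.

```python
import sqlite3
import time


def build_db(rows=1000, body_size=10_000):
    """In-memory weblogs table with a large mainContent per row (sizes arbitrary)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE weblogs (
        id INT, headline VARCHAR(255), mainContent TEXT, dateCreated INT,
        author VARCHAR(255), navigationText VARCHAR(255), isPrivate CHAR(1))""")
    conn.executemany(
        "INSERT INTO weblogs (id, headline, mainContent, dateCreated, author,"
        " navigationText, isPrivate) VALUES (?, ?, ?, ?, ?, ?, ?)",
        ((i, f"Post {i}", "x" * body_size, 0, "author", "nav", "N")
         for i in range(rows)))
    return conn


QUERIES = {
    "id, headline": "SELECT id, headline FROM weblogs",
    "+ isPrivate CHAR(1)": "SELECT id, headline, isPrivate FROM weblogs",
    "+ mainContent TEXT": "SELECT id, headline, mainContent FROM weblogs",
    "SELECT *": "SELECT * FROM weblogs",
}


def best_time(conn, sql, repeats=5):
    """Best wall-clock time (seconds) to run sql and fetch every row."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark(conn):
    """Time every query in QUERIES against the same database."""
    return {label: best_time(conn, sql) for label, sql in QUERIES.items()}


if __name__ == "__main__":
    for label, seconds in run_benchmark(build_db()).items():
        print(f"{label:22} {seconds * 1000:8.2f} ms")
```

Taking the best of several runs reduces noise from caching and scheduling. The numbers are only comparable with each other on the same machine.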

Jul 17 '05 #6

Alvaro G Vicario

*** lk******@geocities.com wrote/escribió (9 Apr 2005 12:40:13 -0700):

SELECT * FROM weblogs

as opposed to

SELECT id, headline FROM weblogs

The query itself is faster with *. Performance issues arise when you are
reading fields you don't need, which can happen:

* When your script doesn't use all table fields.

* If you add new fields to a table and do not edit *all* the scripts that
use these tables.

The performance gain with * is minimal, but the potential loss is huge:
imagine you add a BLOB field to store files (say, a picture of each country)
and retrieve the complete image from the DB every time you want to build a
<select> field for choosing a country.
--
-- Álvaro G. Vicario - Burgos, Spain
-- http://bits.demogracia.com - My site about web programming
-- Don't e-mail me your questions, post them to the group
--

Jul 17 '05 #7

Andy Hassall

On Fri, 15 Apr 2005 13:44:00 +0200, Alvaro G Vicario
<al******************@telecomputeronline.com> wrote:

SELECT * FROM weblogs

as opposed to

SELECT id, headline FROM weblogs

The query itself is faster with *.

Any evidence of that? That doesn't make a lot of sense to me.

Jul 17 '05 #8

CJ Llewellyn

On Tue, 12 Apr 2005 13:21:51 -0700, lkrubner wrote:

Thanks for all the terrific feedback. I agree its best to reference
fields by their names.

But no articles that anyone can me to? No benchmarks from the open
source community? Disappointing.

Does it strike this group as likely that MediumText is likely to cause
a bigger hit than CHAR 1 ????

Doesn't it strike you as bleeding obvious that a CHAR(1) cannot possibly
be used for the same purposes as a BLOB data type? And that comparing the
performance characteristics of the two types is therefore an exercise left
up to morons and idiots?

While you're at it, why not compare the performance between an Oil Tanker
and a Canoe?

Jul 17 '05 #9

Alvaro G Vicario

*** Andy Hassall wrote/escribió (Fri, 15 Apr 2005 20:01:06 +0100):

The query itself is faster with *.

Any evidence of that? That doesn't make a lot of sense to me.

It's pretty obvious. Database server doesn't need to look for requested
fields (I don't know how that works internally, but I'd say it needs to
compare the strings of requested fields with every table field name), it
just needs to fetch all columns.

Of course, when I say "faster" I just mean 1/1000 seconds.

Jul 17 '05 #10

Malcolm Dew-Jones

Alvaro G Vicario (al******************@telecomputeronline.com) wrote:
: *** Andy Hassall wrote/escribió (Fri, 15 Apr 2005 20:01:06 +0100):
: >>The query itself is faster with *.
: >
: > Any evidence of that? That doesn't make a lot of sense to me.

: It's pretty obvious. Database server doesn't need to look for requested
: fields (I don't know how that works internally, but I'd say it needs to
: compare the strings of requested fields with every table field name), it
: just needs to fetch all columns.

I wouldn't make that assumption, though it may be true.

It is just as possible that the * is converted into a set of columns names
and then retrieved identically to the query that simply listed the column
names.
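This alternative is easy to observe from the outside. The sketch uses SQLite via Python as a stand-in and a trimmed, hypothetical table; it does not show how MySQL handles * internally. It shows that the columns produced for SELECT * are exactly those recorded in the table's catalog entry, identical to spelling the columns out. It says nothing about how long that lookup takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblogs (id INT, headline VARCHAR(255),"
             " author VARCHAR(255))")  # trimmed, hypothetical table


def result_columns(sql):
    """Column names a query returns (item 0 of each cursor.description entry)."""
    return [d[0] for d in conn.execute(sql).description]


# Column names as recorded in SQLite's catalog for this table.
catalog = [row[1] for row in conn.execute("PRAGMA table_info(weblogs)")]
star = result_columns("SELECT * FROM weblogs")
explicit = result_columns("SELECT id, headline, author FROM weblogs")

print(catalog == star == explicit)  # True
```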

Personally I think the key factor is programmer productivity and code
correctness.

I find select * very useful because I don't have to decide until later
which columns I am going to use (in reports etc).

If fields are accessed by name then the code will not change if the
database changes.

More commonly, if the database changes then things like reports are
_supposed_ to change - but guess what - the report can incorporate those
changes by simply adding the fields, by name, into the output - none of
the underlying queries have to change. Which means less work and fewer
places for error.

I doubt that select * will introduce the type of fundamental inefficiency
that needs to be avoided at all cost, except in a few obvious cases.

$0.04 (inflation)

--

This space not for rent.

Jul 17 '05 #11

Andy Hassall

On Mon, 18 Apr 2005 10:01:12 +0200, Alvaro G Vicario
<al******************@telecomputeronline.com> wrote:

*** Andy Hassall wrote/escribió (Fri, 15 Apr 2005 20:01:06 +0100):
The query itself is faster with *.
Any evidence of that? That doesn't make a lot of sense to me.

It's pretty obvious. Database server doesn't need to look for requested
fields (I don't know how that works internally, but I'd say it needs to
compare the strings of requested fields with every table field name), it
just needs to fetch all columns.

How does it know what "all columns" are? It's got to look them up. So you're
effectively back where you started.

Of course, when I say "faster" I just mean 1/1000 seconds.

All this affects is parsing time, which (should) be an insignificant
proportion of total query time. If it makes milliseconds' worth of
difference to parsing time I would be surprised.

Jul 17 '05 #12

CJ Llewellyn

On Mon, 18 Apr 2005 10:02:58 -0800, Malcolm Dew-Jones wrote:

More commonly, if the database changes then things like reports are
_supposed_ to change - but guess what - the report can incorporate those
changes by simply adding the fields, by name, into the output - none of
the underlying queries have to change. Which means less work and fewer
places for error.

I doubt that select * will introduce the type of fundamental inefficiency
that needs to be avoided at all cost, except in a few obvious cases.

You've obviously not worked with some of the data sets and queries I've
worked with.

When a SQL report takes 20 minutes to run you know that it's down to bad
design and inefficient programming.

When I started working in IT, storing more than three months' worth of data
online was virtually unheard of. Now we expect everything to be kept
in perpetuity.

You might not regret your poor design decisions today, but seven years
down the line you can be almost certain that they'll come back with a
vengeance.

Jul 17 '05 #13

Malcolm Dew-Jones

CJ Llewellyn (cj**********@gmail.com) wrote:
: On Mon, 18 Apr 2005 10:02:58 -0800, Malcolm Dew-Jones wrote:

: >
: > More commonly, if the database changes then things like reports are
: > _supposed_ to change - but guess what - the report can incorporate those
: > changes by simply adding the fields, by name, into the output - none of
: > the underlying queries have to change. Which means less work and fewer
: > places for error.
: >
: > I doubt that select * will introduce the type of fundamental inefficiency
: > that needs to be avoided at all cost, except in a few obvious cases.

: You've obviously not worked with some of the data sets and queries I've
: worked with.

: When a SQL report takes 20 minutes to run you know that it's down to bad
: design and inefficient programming.

And if the select * is the cause of the slowdown then updating the query
to use exactly the columns required is about as trivial a bit of
maintenance as any I can imagine.

: When I started working in IT, storing more than three months' worth of data
: online was virtually unheard of. Now we expect everything to be kept
: in perpetuity.

And how does select * have anything to do with that?

: You might not regret your poor design decisions today, but seven years
: down the line you can be almost certain that they'll come back with a
: vengeance.

It's hard to imagine they could be a problem for seven years with no one
checking what was wrong, and just as hard to imagine that they could work
fine for seven years and suddenly become a problem.

I've seen a number of reports containing undetected logic flaws that are
used for some years before the error is detected. Compare that to an
inefficient query - it's pretty easy to detect, possibly easy to fix, and
is nothing more than an annoyance. I think that simple code that avoids
logic errors is much more important.

Jul 17 '05 #14

Alvaro G Vicario

*** Andy Hassall wrote/escribió (Mon, 18 Apr 2005 19:04:40 +0100):

All this affects is parsing time, which (should) be an insignificant
proportion of total query time. If it makes milliseconds worth of difference to
parsing time I would be surprised.

Yeah, I thought that was clear in my message.

Jul 17 '05 #15

lkrubner

I'm not sure I get you on this. If your point is that one shouldn't
get fields one doesn't need, and therefore there is no need to compare
different fields, then we agree, but the agreement is sort of beside
the point. I was trying to explain to people who don't know much about
databases how each unneeded field slows down a query, and I was
looking for some hard numbers to make my point. If a MEDIUMTEXT field
slows down a query more than a CHAR(1) field does, then that too would
help make my point.

While you're at it, why not compare the performance
between an Oil Tanker and a Canoe?

That is just what I'm trying to do, in a sense. An oil tanker can hold
more and cross a big ocean; a canoe, on the other hand, can make its
way down small creeks that an oil tanker could never traverse. Each of
these has different strengths and weaknesses, and there is a different
cost associated with each. With regard to the MEDIUMTEXT and CHAR
fields, I was hoping to quantify the time cost associated with each.

Jul 17 '05 #16

lkrubner

> And if the select * is the cause of the slowdown then updating the query
> to use exactly the columns required is about as trivial a bit of
> maintenance as any I can imagine.

It wouldn't be trivial for a large corporate web design team where some
programmers were responsible for the database tier and others for the
template tier, though it wouldn't be any more complex than using
specific field lists from the start.

Jul 17 '05 #17