field must appear in the GROUP BY clause or be used in an aggregatefunction?

Bill Moran

Hey all.

I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function

That field is a CHAR, so I'm not sure what kind of aggregate to use,
or (more important to my understanding) why one is necessary.

As I said, I'm not sure I understand why this occurs. I'm assuming that I
don't understand "group by" as well as I thought I did ;)

This isn't my query, I'm translating a system prototyped in MSSQL to
Postgres. This query _does_ work in MSSQL. Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of
interpretation?

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #1

Subscribe Post Reply

25201

John Sidney-Woollett

Bill Moran said:

I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in
an aggregate function

Since you're not agregating data, can't you use a select distinct instead?

SELECT distinct GCP.id, GCP.Name
FROM Gov_Capital_Project GCP, {?something missing here?}
WHERE TLM.TLI_ID = $2
ORDER BY gcp.name;

(BTW, I wasn't clear if the where clause trying to join to another table?)

Doesn't answer your original question, but hope it helps anyway.

John Sidney-Woollett
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 22 '05 #2

Bill Moran

John Sidney-Woollett wrote:

Bill Moran said:
I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in
an aggregate function
Since you're not agregating data, can't you use a select distinct instead?

Not sure. I'll have to get back to the programmer who wrote the orignal
SELECT and find out what kind of data he is actually trying to acquire.
SELECT distinct GCP.id, GCP.Name
FROM Gov_Capital_Project GCP, {?something missing here?}
WHERE TLM.TLI_ID = $2
ORDER BY gcp.name;

(BTW, I wasn't clear if the where clause trying to join to another table?)
Yes, my bad. The actual query causing the problem is a bit longer with about
6 joins to it. I did test:

select id, name from gov_capital_project group by id order by name;

and it causes the same error, so I thought I'd make the question simpler by
removing the parts that obviously weren't contributing to the problem.
Doesn't answer your original question, but hope it helps anyway.

It may, thanks for the input!

Like I said, the most important part (to me) is to understand why
Postgres refuses to run this. The fact that I don't know why points
to an obvious lack of understanding on my account, and I'd like to
remedy that :D

To that effect, if anyone can point me to a doc that will help me
gain a better understanding of why this error occurs, I'd be happy
to read it!

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 22 '05 #3

Mike Mascari

Bill Moran wrote:

Hey all.

I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used
in an aggregate function
The reason the grouping requires either an attribute to be
aggregated or apart of the group by list is that if it were not, an
arbitrary value would have to be selected:

[test@lexus] select * from projects;
dept | project
-----------+--------------
Finance | Y2K
Corporate | Y2K
Corporate | Annual Audit
(3 rows)
[test@lexus] select dept, project from projects group by dept;
ERROR: column "projects.project" must appear in the GROUP BY clause
or be used in an aggregate function

If this were to be permitted, which project should be selected,
'Y2K' or 'Annual Audit'?

[test@lexus] select dept, project from projects group by dept, project;
dept | project
-----------+--------------
Corporate | Y2K
Corporate | Annual Audit
Finance | Y2K
(3 rows)
Of course, this has little meaning without an aggregate. All you're
doing is leveraging GROUP BY's sort. You might as well use DISTINCT.
More useful would be:

[test@lexus] select dept, count(project) from projects group by dept;
dept | count
-----------+-------
Finance | 1
Corporate | 2
(2 rows)

or perhaps:

[test@lexus] select count(dept), project from projects group by project;
count | project
-------+--------------
2 | Y2K
1 | Annual Audit
This isn't my query, I'm translating a system prototyped in MSSQL to
Postgres. This query _does_ work in MSSQL. Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of
interpretation?

If MSSQL picks an arbitrary value for the non-group by attribute, it
is violating spec.

Mike Mascari

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 22 '05 #4

John Sidney-Woollett

Bill Moran said:

Like I said, the most important part (to me) is to understand why
Postgres refuses to run this. The fact that I don't know why points
to an obvious lack of understanding on my account, and I'd like to
remedy that :D
I have always assumed that you had to place all (non aggregated) columns
in your select in the "group by" clause as well. I suspect that the other
database isn't so picky (or is incorrect?).

Presumably changing the query to:
select id, name from gov_capital_project group by id, name order by name;
works fine?
To that effect, if anyone can point me to a doc that will help me
gain a better understanding of why this error occurs, I'd be happy
to read it!

Have a look at:
http://www.postgresql.org/docs/7.4/s...ml#SQL-GROUPBY

[excerpted text]

GROUP BY Clause

The optional GROUP BY clause has the general form

GROUP BY expression [, ...]

GROUP BY will condense into a single row all selected rows that share the
same values for the grouped expressions. expression can be an input column
name, or the name or ordinal number of an output column (SELECT list
item), or an arbitrary expression formed from input-column values. In case
of ambiguity, a GROUP BY name will be interpreted as an input-column name
rather than an output column name.

Aggregate functions, if any are used, are computed across all rows making
up each group, producing a separate value for each group (whereas without
GROUP BY, an aggregate produces a single value computed across all the
selected rows). When GROUP BY is present, it is not valid for the SELECT
list expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to return for
an ungrouped column.

John Sidney-Woollett

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 22 '05 #5

scott.marlowe

On Fri, 27 Feb 2004, Bill Moran wrote:

Hey all.

I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function

OK, let's look at a test table:

id | data
---------
0 | 'abc'
0 | 'def'
1 | 'ghi'

Now, let's use this query:

select id, data from test_table group by id;

what results should I get back?

I have two possible results for the data column, abc and def. But I only
get one row with a 0 in it, so which one of those do I pick?

If I use an aggregate I can be sure to get the first or last one:

select id, max(data) from test_table group by id;

Also, you may want to look at postgresql's extension, "distinct on":

http://www.postgresql.org/docs/7.4/s...ERIES-DISTINCT

It can give you the kind of results you want.

select distinct on (id) id, data from test_table;

But is know to be indeterminate, so you may get different results each
time.
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #6

Richard Huxton

On Friday 27 February 2004 16:39, Bill Moran wrote:

John Sidney-Woollett wrote:
Bill Moran said:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name; ERROR: column "gcp.name" must appear in the GROUP BY clause or be used
in an aggregate function
Like I said, the most important part (to me) is to understand why
Postgres refuses to run this. The fact that I don't know why points
to an obvious lack of understanding on my account, and I'd like to
remedy that :D

Like the error message says, if you're using GROUP BY everything in the SELECT
list must be an aggregate SUM(...) or used in the GROUP BY.

So, this is OK:
SELECT dept, week, SUM(amt_sold)
FROM weekly_sales
GROUP BY dept,week;
This isn't:
SELECT dept, week, SUM(amt_sold)
FROM weekly_sales
GROUP BY dept;

Ask yourself which "week" should be returned in the second case.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #7

Michael Chaney

On Fri, Feb 27, 2004 at 11:11:28AM -0500, Bill Moran wrote:

Hey all.

I've hit an SQL problem that I'm a bit mystified by. I have two different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in
an aggregate function

That field is a CHAR, so I'm not sure what kind of aggregate to use,
or (more important to my understanding) why one is necessary.

As I said, I'm not sure I understand why this occurs. I'm assuming that I
don't understand "group by" as well as I thought I did ;)

This isn't my query, I'm translating a system prototyped in MSSQL to
Postgres. This query _does_ work in MSSQL. Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of
interpretation?

Well, if "non-standard" == "bug", then it's a bug in mssql. Your query
doesn't make any sense. What value for "Name" should be chosen if
there's more than one?

Michael
--
Michael Darrin Chaney
md******@michaelchaney.com
http://www.michaelchaney.com/

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 22 '05 #8

Tom Lane

Mike Mascari <ma*****@mascari.com> writes:

Bill Moran wrote:
SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used
in an aggregate function

This isn't my query, I'm translating a system prototyped in MSSQL to
Postgres. This query _does_ work in MSSQL. Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of
interpretation?
If MSSQL picks an arbitrary value for the non-group by attribute, it
is violating spec.

They might be operating per spec. If "id" is a primary or unique key
for the table, then SQL99 (but not SQL92) says that it's sufficient to
group by the id column; the database is supposed to realize that the
other columns can't have more than one value per group, and allow direct
references to them. Or at least that's my interpretation of the pages
and pages in SQL99 about functional dependency. It seems like a pretty
useless frammish ... if you know that id is unique, why are you
bothering with GROUP BY at all?

Anyway, Postgres currently implements the SQL92 definition, which is
that you can't refer to an ungrouped column except within an aggregate
function call. So you need to call out all the columns to be referenced
in GROUP BY.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 22 '05 #9

Bill Moran

Mike Mascari wrote:

Bill Moran wrote:
Hey all.

I've hit an SQL problem that I'm a bit mystified by. I have two
different
questions regarding this problem: why? and how do I work around it?

The following query:

SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

Produces the following error:

ERROR: column "gcp.name" must appear in the GROUP BY clause or be
used in an aggregate function

The reason the grouping requires either an attribute to be aggregated or
apart of the group by list is that if it were not, an arbitrary value
would have to be selected:

Thanks to everyone who responded. All the replies have been very helpful.

Talking with the originator of the SQL statement, I came up with this:

select id, max(name) from gov_capital_project group by id order by name;
ERROR: column "gov_capital_project.name" must appear in the GROUP BY clause or be used in an aggregate function

I turned that over in my head a little and tried this:
select id, max(name) from gov_capital_project group by id order by MAX(name);

Which finally works! As far as I understand it, that query will supply the
same results as they were getting from MSSQL on the previous query.

A little more playing around shows that this also works:
select id, max(name) as name from gov_capital_project group by id order by name;

Which will probably be a little faster since MAX() is evaluated less.

Now I'm starting to see (maybe) why the query worked under MSSQL. the
MSSQL version had:

SELECT id as [ID], max(name) as [Name] from gov_capital_project group by id order by name;

I'm guessing that MSSQL is fuzzy enought to figure that "group by name" actually
means "group by [Name]"?

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 22 '05 #10

Bill Moran

Tom Lane wrote:

Mike Mascari <ma*****@mascari.com> writes:
Bill Moran wrote:
SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used
in an aggregate function

This isn't my query, I'm translating a system prototyped in MSSQL to
Postgres. This query _does_ work in MSSQL. Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of
interpretation?
If MSSQL picks an arbitrary value for the non-group by attribute, it
is violating spec.

They might be operating per spec. If "id" is a primary or unique key
for the table, then SQL99 (but not SQL92) says that it's sufficient to
group by the id column; the database is supposed to realize that the
other columns can't have more than one value per group, and allow direct
references to them. Or at least that's my interpretation of the pages
and pages in SQL99 about functional dependency. It seems like a pretty
useless frammish ... if you know that id is unique, why are you
bothering with GROUP BY at all?

It's possible that you're right about MSSQL, the column in question _is_
unique. I also had another theory (see other post).

As for why I'm using a GROUP BY: it's not my decision, I'm converting
SQL that someone else wrote, and (honestly) I don't understand the
data well enough to say whether it's required in this query or not.

Also, the _actual_ query that I'm converting here is more complex than
this (it's a join of 5 tables) but in my experimenting/testing, I found
that the query that I had minimized down to had the exact same behaviour.
So I posted the simplified query instead of the actual query, to make it
easier on those who would reply.

If you think it would help with Postgres' development, I'll give you
access to my development machine and the actual query involved. I'm
sure the client won't mind, since their banking their future on the
reliability of Postgres anyway ;)
Anyway, Postgres currently implements the SQL92 definition, which is
that you can't refer to an ungrouped column except within an aggregate
function call. So you need to call out all the columns to be referenced
in GROUP BY.

To me, that seems the most likely explanation (i.e. id is a primary key,
and MSSQL is SQL99 compliant)

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #11

terry

Hey guys

I have dealt with this before.

And there is a simple solution: If the value really is unique, just wrap it
in a max(). Since it's unique, it has *ZERO* effect on your output, but it
then complies to PostgreSQL's GROUP BY implementation, and hence will run...

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
te***@greatgulfhomes.com
Fax: (416) 441-9085

-----Original Message-----
From: pg*****************@postgresql.org
[mailto:pg*****************@postgresql.org]On Behalf Of Tom Lane
Sent: Friday, February 27, 2004 1:09 PM
To: Mike Mascari
Cc: Bill Moran; pg***********@postgresql.org
Subject: Re: [GENERAL] field must appear in the GROUP BY clause or be
used
Mike Mascari <ma*****@mascari.com> writes:
Bill Moran wrote:
SELECT GCP.id,
GCP.Name
FROM Gov_Capital_Project GCP,
WHERE TLM.TLI_ID = $2
group by GCP.id
ORDER BY gcp.name;

ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function

This isn't my query, I'm translating a system prototyped in MSSQL to Postgres. This query _does_ work in MSSQL. Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of interpretation?

If MSSQL picks an arbitrary value for the non-group by

attribute, it
is violating spec.

They might be operating per spec. If "id" is a primary or unique key
for the table, then SQL99 (but not SQL92) says that it's sufficient to
group by the id column; the database is supposed to realize that the
other columns can't have more than one value per group, and
allow direct
references to them. Or at least that's my interpretation of the pages
and pages in SQL99 about functional dependency. It seems
like a pretty
useless frammish ... if you know that id is unique, why are you
bothering with GROUP BY at all?

Anyway, Postgres currently implements the SQL92 definition, which is
that you can't refer to an ungrouped column except within an aggregate
function call. So you need to call out all the columns to be
referenced
in GROUP BY.

regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #12

Tom Lane

[ drifting a bit off the thread topic, but just for completeness... ]

Bill Moran <wm****@potentialtech.com> writes:

I turned that over in my head a little and tried this:
select id, max(name) from gov_capital_project group by id order by MAX(name);
...
A little more playing around shows that this also works:
select id, max(name) as name from gov_capital_project group by id order by name; Which will probably be a little faster since MAX() is evaluated less.

Actually I believe you'll get the exact same plan either way. GROUP and
ORDER BY expressions are merged with any matching SELECT-list entries
during parsing.

In fact, as of (I think) 7.4, the executor detects and eliminates
duplicate aggregate-function calls even when the parser didn't.
So for instance this:
SELECT max(x), max(x) + 1 FROM ...
will only run the MAX() aggregate once.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #13

field must appear in the GROUP BY clause or be used in an aggregatefunction?

Similar topics