469,347 Members | 18,544 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,347 developers. It's quick & easy.

Combine multiple rows with one common field into one row & dedupe

Hi all - I've been browsing this forum for a while now, looking for help, and finally decided to join and ask my question, since I'm not terribly good at writing SQL queries in Access. Here is my dilemma:

Table U contains 1 field (Field F, primary key = no dupes)
Table N contains 50 fields (including Field F, same as in Table U)
* Field F in Table N contains multiple records, with dupe values

Would like to create a new Table (or just append to Table U), all values in Table N that match on Field F, but only with one record for each value of Field F. Here are sample layouts of the two Tables:

Table U
Field F
A
B
C
etc

Table N
Field F Field F1 Field 52 ..... Field 10x
A A1 A2 A5
A A2 A3 A3
A A3 A7 A8
B B1 B2 B3
C C1 C9 C7
C C1 C2 C4

Basically, I want to end up with the following:

New Table
Field F Field F1 Field 52 ..... Field 10x
A A1 A2 A5 A3 A7 A8
B B1 B2 B3
C C1 C9 C7 C2 C4

Where, there is ultimately 1 record per value in Field F, and each of the corresponding values from multiple records of Field F are added to the end of the one record, if that makes logical sense.

I have tried many different things, from INTERSECT, to CONCAT, to INNER JOIN, to OUTER JOIN, to LAST, but my SQL isn't good enough even to post here, as nothing really works. :)

Thank you so much in advance!
karin
Oct 28 '08 #1
3 5304
Stewart Ross
2,545 Expert Mod 2GB
Hi Karin. I don't think It's your SQL skills that are the problem here; it's the way you are trying to represent the data, which would appear to be more freeform (spreadsheet-like) than entity-based (database-like) in character.

Database records within a table are generally vertical in structure - with each row representing a different item, and each column a property of that item.

What you are considering is something very different, where in effect you are pivoting what appears to be the same kind of entity in each case as a kind of sideways record of variable length attached to a primary key. In general that is not what databases do, and it is why you will not be able to use any variant of SQL to generate your pivoted records.

It is always possible to use bespoke programming to process bespoke data, using memo fields or the like to store custom lists, but then you will need custom processing to do anything with the results of such storage as well (even to read the lists once generated).

I would suggest taking a different approach, which is to consider whether or not a relational database is the right tool for the job. If it is, then you really need to reconsider what the table structure of your multi-record table is, so that it can be handled properly using standard 1-to-many relationships which, as mentioned, in a relational database are vertical and not horizontal in nature. The horizontal part of a table structure in a database models the fields of each table, not the records themselves. The fields in turn model the discrete properties of the entity being stored.

It is because there is currently a complete mismatch of what you have apparently got in your database with what relational databases are good at doing that you face such difficulties trying to get the end results you seek. This often happens if users are used to working with spreadsheet products rather than databases - I wonder if that is the case here?

It would also help us to help you if you could post example data which is closer to actual real-world data, rather than the use of dummy data values such as A1, A2 etc. We can't really see the patterns in your data from such heavily-disguised values, and it becomes more difficult to give you sound advice on how to progress this task when the data is not easy to understand.

-Stewart
Oct 28 '08 #2
Hi Stewart...Thanks for the input...I think what I'm trying to get at is in fact a typical db structure, but maybe i'm mistaken. Here is some sample data:

F N na N1 r2 n3 r4
anastasia stacy
andrea andrew drea rea andrew drea rea
andrea andrew dea
andreas andy
andrew andy ansey drewandrea andy ansey
ansey drewandrea
andrew andy drew andrea andy
andrew drew
angela angel angelica angelina angeline angie jane
angel angelica angelina angeline angie jane angelica
angelina angeline angie jane angelina angeline angie
jane angeline angie jane angie jane
angela angelica angelina angeline angelica angelina angeline
angelina angeline
angelina lina
angeline angie
anita annette
anita nan
anita nana
anita nina
anita nita
ann hannah nan nanny nancy nana hannah
ann nan nanny nancy nana nan nanny
ann nancy nana nanny nancy nana nancy
ann nana nana
ann anna nettie anna nettie
ann anna
ann anne
ann annie

What I need is either:
* one record for each unique name in Field F, which contains all possible values (which might be in multiple rows)
OR
* one record for each combination of unique name (in Field F) and each possible corresponding value from any/all of the other fields?

Does that help?
Thanks!
kbr

Hi Karin. I don't think It's your SQL skills that are the problem here; it's the way you are trying to represent the data, which would appear to be more freeform (spreadsheet-like) than entity-based (database-like) in character.

Database records within a table are generally vertical in structure - with each row representing a different item, and each column a property of that item.

What you are considering is something very different, where in effect you are pivoting what appears to be the same kind of entity in each case as a kind of sideways record of variable length attached to a primary key. In general that is not what databases do, and it is why you will not be able to use any variant of SQL to generate your pivoted records.

It is always possible to use bespoke programming to process bespoke data, using memo fields or the like to store custom lists, but then you will need custom processing to do anything with the results of such storage as well (even to read the lists once generated).

I would suggest taking a different approach, which is to consider whether or not a relational database is the right tool for the job. If it is, then you really need to reconsider what the table structure of your multi-record table is, so that it can be handled properly using standard 1-to-many relationships which, as mentioned, in a relational database are vertical and not horizontal in nature. The horizontal part of a table structure in database models the fields of each table, not the records themselves. It is because there is currently a complete mismatch of what you have apparently got in your database with what relational databases are good at doing that you face such difficulties trying to get the end results you seek. This often happens if users are used to working with spreadsheet products rather than databases - I wonder if that is the case here?

It would also help us to help you if you could post example data which is closer to actual real-world data, rather than the use of dummy data values such as A1, A2 etc. We can't really see the patterns in your data from such heavily-disguised values, and it becomes more difficult to give you sound advice on how to progress this task when the data is not easy to understand.

-Stewart
Oct 28 '08 #3
Stewart Ross
2,545 Expert Mod 2GB
Hi. Your data is not normalised in any way (see our HowTo article on Database Normalisation and Table Structures), so is decidedly non-standard. It is simply not possible to use the wide range of SQL techniques available for data summary and reporting on non-normalised data structures. It is not an if, a but or a maybe - non-normalised data is of no use in a relational database.

It would appear that you actually have just two separate fields in your table: a name, and a synonym for that name. The synonym is being repeated a number of times (which breaks the first rule of normalisation, to separate repeating groups into individual tables).

As an example of how your data SHOULD look in a database table I show an extract below using just the name-synonym structure:

ann hannah
ann nan
ann nanny
ann nancy
ann nana
ann anna
ann anne
ann annie
andrea andrew
andrea drea
andrea rea

The key field is repeated for each record, and followed by one synonym in each record - not multiple synonyms (unless these are in fact different, separately-identifiable elements of the underlying record with a particular meaning that must be recorded).

There can be no duplicate rows in a relational database table, so the repeated name-synonym combinations (as shown in your sample data) would be dealt with (simply by not being stored!).

Given this kind of vertical structure you CAN use SQL techniques to summarise the data and prepare lists in whatever way is appropriate; at present I cannot see any way to do so that can be applied systematically.

Ask yourself what the transformation rule you need to apply would be if you were to do the task you propose manually (forgetting SQL for the moment). If you would struggle to define the rules, covering all the cases you currently have, then you would find it even more of a struggle (if it can be done at all) to define a set of rules that can be implemented using a highly-structured processing language such as SQL.

I strongly advise you to look again at the structure of your data. Non-normalised data and relational databases do not mix.

-Stewart
Oct 29 '08 #4

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

2 posts views Thread by Knighterrant | last post: by
reply views Thread by zhoujie | last post: by
reply views Thread by suresh191 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.