Large table search question

Guys,

I have a general question about designing databases for large data sets.

I was speaking with a colleague about an application we're preparing to
build. One of the application's tables will potentially contain 2 million
or more names, containing (at least) the fields first_name, last_name,
middle_name and prefix.

A common lookup the application will require is the full name, so prefix +
first_name + middle_name + last_name.

My friend's suggestion was to create a "lookup field" in the table itself,
which would contain a concatenation of these fields created during insert.
So, for each record, we'd have each individual field and then a
full_name field that would contain the combination of the individual fields.
His argument is that this will make lookups in this manner extremely fast
and efficient.

I agree with his assertion, but get the feeling that this is sort of an
ugly design. Would a compound index on these fields really be less
efficient?

Thanks for your help!

John

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #1
"John Wells" <jb@sourceillustrated.com> writes:
> A common lookup the application will require is the full name, so prefix +
> first_name + middle_name + last_name.
>
> My friend's suggestion was to create a "lookup field" in the table itself,
> which would contain a concatenation of these fields created during insert.
> So, for each record, we'd have each individual field and then a
> full_name field that would contain the combination of the individual fields.
> His argument is that this will make lookups in this manner extremely fast
> and efficient.


Not unless you then add an index on that field, which would imply doubly
redundant storage of the data (primary fields, lookup field, lookup
field's index).

You don't actually need the lookup field in Postgres: you can create the
computed index directly. For instance

create index fooi on foo ((first_name || middle_name || last_name));

select * from foo
where (first_name || middle_name || last_name) = 'JohnQPublic';

This is still kinda grim on storage space, but at least it's 2x not 3x.

regards, tom lane
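
One thing to watch with a concatenated expression like the one above: in PostgreSQL, `||` yields NULL if any operand is NULL, so a row without a middle name would never match an equality test on the concatenation. A minimal sketch of one way around that, assuming the table `foo` from the example, text-like name columns, and that a missing name part should behave like an empty string (the index name `foo_fullname_idx` is made up for illustration):

```sql
-- Assumption: NULL name parts are treated as empty strings.
-- foo_fullname_idx is a hypothetical name, not from the thread.
CREATE INDEX foo_fullname_idx ON foo
    ((coalesce(first_name, '') || coalesce(middle_name, '') || coalesce(last_name, '')));

-- The WHERE clause should use the same expression as the index
-- so the planner can consider it; EXPLAIN ANALYZE shows whether it did.
EXPLAIN ANALYZE
SELECT * FROM foo
WHERE (coalesce(first_name, '') || coalesce(middle_name, '') || coalesce(last_name, '')) = 'JohnPublic';
```

Whether the index is actually chosen depends on table size and statistics, so the plan is worth checking against real data.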


Nov 23 '05 #2
John Wells wrote:
> Guys,
>
> I have a general question about designing databases for large data sets.
>
> I was speaking with a colleague about an application we're preparing to
> build. One of the application's tables will potentially contain 2 million
> or more names, containing (at least) the fields first_name, last_name,
> middle_name and prefix.
>
> A common lookup the application will require is the full name, so prefix +
> first_name + middle_name + last_name.
>
> My friend's suggestion was to create a "lookup field" in the table itself,
> which would contain a concatenation of these fields created during insert.
> So, for each record, we'd have each individual field and then a
> full_name field that would contain the combination of the individual fields.
> His argument is that this will make lookups in this manner extremely fast
> and efficient.

Might, might not. There are no figures to back up his argument. It'll certainly
make updates slower and less efficient. In fact, since each row will
store the data twice, you'll get fewer rows per disk page, which means
(potentially) more disk reads when you need to get several rows.

> I agree with his assertion, but get the feeling that this is sort of an
> ugly design. Would a compound index on these fields really be less
> efficient?


Doubtful. I'd certainly not try his solution until I'd tried the simple
way first.

If you really want to try your friend's approach on PG, you can build a
functional index. As of 7.4, these can be arbitrary expressions rather than
just function calls, so you can do something like:

CREATE INDEX my_idx_1 ON table1 ( prefix || ' ' || first_name ...);

If you're using 7.3.x you'll need to wrap that expression in a function
and index the function instead.
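
A rough sketch of what that wrapper might look like, assuming all four columns are of type text and that the parts are joined with single spaces. The names `full_name` and `my_idx_fn` are hypothetical, and the body is written as a quoted string because dollar quoting is not available on releases that old:

```sql
-- Hypothetical helper: joins the four name parts with single spaces.
-- It has to be declared IMMUTABLE to be usable in an index.
CREATE FUNCTION full_name(text, text, text, text) RETURNS text AS '
    SELECT $1 || '' '' || $2 || '' '' || $3 || '' '' || $4
' LANGUAGE sql IMMUTABLE;

-- Index the function applied to the plain columns.
CREATE INDEX my_idx_fn ON table1 (full_name(prefix, first_name, middle_name, last_name));

-- Queries need to call the same function on the same columns.
SELECT * FROM table1
WHERE full_name(prefix, first_name, middle_name, last_name) = 'Mr John Q Public';
```

As with any `||` expression, a NULL in any of the four columns makes the whole result NULL, so such rows would not match.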

In your case though, I'd just build a compound index and leave it at that.

--
Richard Huxton
Archonet Ltd
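
For comparison, the compound-index route recommended above could be sketched as follows. The column order and the index name `table1_name_idx` are assumptions for illustration; the thread does not specify either:

```sql
-- Assumption: last_name is the most selective and most frequently
-- searched column, so it leads the index.
CREATE INDEX table1_name_idx
    ON table1 (last_name, first_name, middle_name, prefix);

-- A full-name lookup becomes an equality test on each column.
EXPLAIN ANALYZE
SELECT * FROM table1
WHERE last_name = 'Public'
  AND first_name = 'John'
  AND middle_name = 'Q'
  AND prefix = 'Mr';
```

With a B-tree index in this order, lookups on `last_name` alone, or on `last_name` plus `first_name`, can use the same index, which a concatenated full-name index cannot offer.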


Nov 23 '05 #3
