473,385 Members | 1,973 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

How should I set up this database?

Imagine 25.000.000 records in a database with a table like this:

TABLE SUBJECTS

table_id (INT)
subject (VARCHAR(200))
date_added (TIMESTAMP)

The SUBJECT field is a long string. Something like these:

Example 1:
"Johnny posts 'PHP Examples' version 2'

Example 2:
"Christmas Melodies from Japan (2001)"

QUESTION:

The user should be able to search the database on the SUBJECT field and
should be able to use partial keywords like:

Keyword examples:

- lodies (finds Melodies)
- Examp (finds Examples)
- John (finds Johnny)

Right now, using WHERE LIKE takes 40 seconds since it checks every single
record. I can't use indexes either. "Of course" you might say.

But what I want to know is... what is the workaround for this? How can I
create a database which allows people to do wilcard searches while keeping
my queries lightning fast?

Other websites and applications have this... so there must be a brilliant
"trick" or workaround for this.

Any ideas?
Jul 20 '05 #1
4 1643
Ondernemer wrote:
Other websites and applications have this... so there must be a brilliant
"trick" or workaround for this.


Basic idea is to create a table where is a list of all words used in
your text. Then create another table which links these words to the
table where the actual texts are located. After searching correct words
from the word table, rest of the queries can use indexes so it should be
fast. Even if you have to check all rows in the word table, you will
still propably have less than 25 000 000 rows there.

I just wrote some time ago a pretty good example about this subject:

http://groups.google.com/groups?selm...0read3.inet.fi

I should warn you that I have not studied this method much, so there
might be better ways to implement the same thing. But the idea behind
the method should be clear enough.
Jul 20 '05 #2

"Aggro" wrote:
http://groups.google.com/groups?selm...0read3.inet.fi


Brilliant. I actually was thinking the same thing, but was afraid I was
thinking in an amateur way and there was a more efficient command/way of
doing this. :-)

Thanks m8.
Jul 20 '05 #3

"Aggro" <sp**********@yahoo.com> wrote in message
news:2E***************@read3.inet.fi...
Ondernemer wrote:
Other websites and applications have this... so there must be a brilliant "trick" or workaround for this.


Basic idea is to create a table where is a list of all words used in
your text. Then create another table which links these words to the
table where the actual texts are located. After searching correct words
from the word table, rest of the queries can use indexes so it should be
fast. Even if you have to check all rows in the word table, you will
still propably have less than 25 000 000 rows there.

I just wrote some time ago a pretty good example about this subject:

http://groups.google.com/groups?selm...0read3.inet.fi

I should warn you that I have not studied this method much, so there
might be better ways to implement the same thing. But the idea behind
the method should be clear enough.


"And" will not work using a table of keywords. "Or" will work but you will
get everything. The best way to do this is to create a column with all
keywords (this must be of type MyIsam) then do a full text search.

Rich
Jul 20 '05 #4
"Ondernemer" <no**@email.com> wrote in message
news:10*************@corp.supernews.com...
Imagine 25.000.000 records in a database with a table like this:

TABLE SUBJECTS

table_id (INT)
subject (VARCHAR(200))
date_added (TIMESTAMP)

The SUBJECT field is a long string. Something like these:

Example 1:
"Johnny posts 'PHP Examples' version 2'

Example 2:
"Christmas Melodies from Japan (2001)"

QUESTION:

The user should be able to search the database on the SUBJECT field and
should be able to use partial keywords like:

Keyword examples:

- lodies (finds Melodies)
- Examp (finds Examples)
- John (finds Johnny)

Right now, using WHERE LIKE takes 40 seconds since it checks every single
record. I can't use indexes either. "Of course" you might say.

But what I want to know is... what is the workaround for this? How can I
create a database which allows people to do wilcard searches while keeping
my queries lightning fast?

Other websites and applications have this... so there must be a brilliant
"trick" or workaround for this.

Any ideas?


One quick idea would be to place a LIMIT on the returned rows (like
search engines). Look up the use of LIMIT in the mysql docs...
http://dev.mysql.com/doc/mysql/en/SELECT.html

From the page:

The LIMIT clause can be used to constrain the number of rows returned by the
SELECT statement. LIMIT takes one or two numeric arguments, which must be
integer constants. With two arguments, the first argument specifies the
offset of the first row to return, and the second specifies the maximum
number of rows to return. The offset of the initial row is 0 (not 1):

mysql> SELECT * FROM table LIMIT 5,10; # Retrieve rows 6-15
For compatibility with PostgreSQL, MySQL also supports the LIMIT row_count
OFFSET offset syntax. To retrieve all rows from a certain offset up to the
end of the result set, you can use some large number for the second
parameter. This statement retrieves all rows from the 96th row to the last:

mysql> SELECT * FROM table LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the
beginning of the result set:

mysql> SELECT * FROM table LIMIT 5; # Retrieve first 5 rows
--- used from the MySQL web site---
Jul 20 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
by: cjl | last post by:
Hey all: I know that it is silly in the age of Google to 'lose' something on the internet, but I recently checked out a project that had implemented a database with a subset of SQL in pure...
0
by: Manzoorul Hassan | last post by:
I just Installed MySQL v4.1.14 and had created a Database (from a script). But I would like to redo the Database portion of it but am not able to drop the Database. If I do a "show database;" I...
6
by: Marvin Libson | last post by:
Hi All: I am running DB2 UDB V7.2 with FP11. Platform is Windows 2000. I have created a java UDF and trigger. When I update my database I get the following error: SQL1224N A database...
0
by: Alex | last post by:
Hi all, I've been running a db2 V8.1 databasle to store my radius server accounting info for a *long* time and have never had any problems with it. Last week we had a power outage in our...
1
by: pintur | last post by:
The message is: SQL1036C Errore di I/O durante l' accesso al database. SQLSTATE=58030 what is the proble? what for restore tables? thanks
5
by: Steve | last post by:
Hi; I thought I would rephrase a question I asked in another post to more quickly get to the heart of the matter. My apologies for anyone who is offended by what appears to be a repetition. ...
0
debasisdas
by: debasisdas | last post by:
DATABASE LINK =============== A database link is a path through which a remote user in another database can connect to any other database. Once created database link exists as an object in the...
0
by: Jack | last post by:
Training Classes for Oracle10g, 9i, 8i Certification training in Oracle10g and 9i: DBA, Developer, Discoverer. training conducted at your location worldwide. Courseware licensing also available....
0
by: Winder | last post by:
Training Classes for Oracle10g, 9i, 8i Certification training in Oracle10g and 9i: DBA, Developer, Discoverer. training conducted at your location worldwide. Courseware licensing also available....
0
by: Laurynn | last post by:
# (ebook - pdf) - programming - mysql - php database applicati # (Ebook - Pdf)Learnkey How To Design A Database - Sql And Crystal Report # (ebook-pdf) E F Codd - Extending the Database Relational...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.