
Best way to do so.

Sorry to post here, but I don't have access to any good mysql newsgroup for
this question, and since 99% of php programmers deal with mysql, I'm sure
I'll get an answer here.

Look at the following table:
CREATE TABLE searchlog (
IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
F1 MEDIUMINT UNSIGNED default NULL,
F2 VARCHAR(40) default NULL,
F3a SMALLINT UNSIGNED default NULL,
F3b SMALLINT UNSIGNED default NULL,
F4a FLOAT(11,2) UNSIGNED default NULL,
F4b FLOAT(11,2) UNSIGNED default NULL,
F5a MEDIUMINT UNSIGNED default NULL,
F5b MEDIUMINT UNSIGNED default NULL,
F6 char(8) default NULL,
F7 char(8) default NULL,
F8 char(8) default NULL,
F9a TINYINT UNSIGNED default NULL,
F9b TINYINT UNSIGNED default NULL,
F10a FLOAT(6,2) UNSIGNED default NULL,
F10b FLOAT(6,2) UNSIGNED default NULL,
F11a FLOAT(6,2) UNSIGNED default NULL,
F11b FLOAT(6,2) UNSIGNED default NULL,
F12 datetime NOT NULL,
PRIMARY KEY (IDSearchLog),
UNIQUE KEY IDSearchLog (IDSearchLog),
INDEX AI_IDSearchLog (IDSearchLog)
);

This table will be used to manage a search log. Any time a search is done
on some fields, I must log it.
In the long run, I think there may be about 500-600 different search
types. Many searches will be quite similar.
Now, for performance reasons I see 3 possibilities:
1) I may check whether the same search has been done before, and log only
the datetime and the IDSearchLog. If the same search doesn't exist yet, I
create it.
2) I may create a new entry on every search: I won't have to check whether
the search already exists, but I will have a lot of similar records. This
will raise the amount of data, but isn't it quicker? I avoid a check on
every search.
3) I may create many tables. If you have a close look at the field names
(deliberately changed for better understanding), any number with an a or b
next to it means that you may define a min and a max. So fields F3a and F3b
hold a min and a max value. The limit combinations may be quite common
between searches, so in my searchlog table I could keep only pointers to
the different combinations. This seems very heavy in query time, but it
lets me save some space. Maybe I'm missing a quick way to create a new
combination every time, returning the existing ID if one is available
instead of creating it.

After thinking about it, the first choice would have my preference. But I'm
afraid that the lookup for an existing record with the same fields would
slow things down a lot, and that it wouldn't be worth the space gained over
choice 2.
Maybe there is a simple way to add some fields in a query and, if the field
combination already exists, get back the ID.
Some search example:
Adidas, shoes, from 150$, to 250$, blue
Nike, socks, to 50$
Adidas, socks, to 60$
Nike, socks, to 60$
Adidas shoes, to 250$
Puma shoes, red
Puma shoes, yellow
.....
I have to log these fields in a way that is as fast as possible, but also
saves space.

Please help.

Bob
Jul 17 '05 #1
10 Replies


Bob Bedford wrote:
Sorry to post here, but I don't have access to any good mysql newsgroup
for this question, and since 99% of php programmers deal with mysql, I'm
sure I'll get an answer here.

Look at the following table:
CREATE TABLE searchlog (
IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
<snip>
F12 datetime NOT NULL,
PRIMARY KEY (IDSearchLog),
UNIQUE KEY IDSearchLog (IDSearchLog),
INDEX AI_IDSearchLog (IDSearchLog)
);


You don't need the UNIQUE KEY clause because the manual says:

14.2.6 CREATE TABLE Syntax

A PRIMARY KEY is a unique KEY where all key columns must be defined as
NOT NULL.
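
So against the table as posted, both extra keys can simply be dropped; the
PRIMARY KEY already provides the uniqueness and the index:

ALTER TABLE searchlog DROP INDEX IDSearchLog;    -- the redundant UNIQUE KEY
ALTER TABLE searchlog DROP INDEX AI_IDSearchLog; -- the redundant plain INDEX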

<snip>

As for everything else, I have no idea what you are asking.

NM

--
convert uppercase WORDS to single keystrokes to reply
Jul 17 '05 #2


"Bob Bedford" <be******@YouKnowWhatToDoHerehotmail.com> wrote
> This table will be used to manage a search log. Any time a search is done
> on some fields, I must log it.
> In the long run, I think there may be about 500-600 different search
> types. Many searches will be quite similar.


For a log, just record the raw event. Don't try and get fancy. Doing so
may get you in trouble later when you want to mine the data in some way you
haven't even considered.

Try and be relational: use the part id or style/color/size id's from your
item master tables.

I'd also add timestamp, source, and session_id fields to each row.

Don't design to minimize physical storage. Design for maximum flexibility.
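
Something like this, to sketch it (column names purely illustrative -- one
column per form field, left NULL when the visitor didn't use it):

CREATE TABLE search_events (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  logged_at   DATETIME NOT NULL,               -- when the search ran
  source      VARCHAR(64) DEFAULT NULL,        -- originating URL or IP
  session_id  VARCHAR(64) DEFAULT NULL,        -- ties searches to a visitor
  make_id     MEDIUMINT UNSIGNED DEFAULT NULL, -- id from your item master
  price_from  DECIMAL(11,2) DEFAULT NULL,
  price_to    DECIMAL(11,2) DEFAULT NULL,
  color       CHAR(8) DEFAULT NULL,
  PRIMARY KEY (id)
);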

How many rows will your app be inserting a day, anyway? You could purge the
old records to a history table every month if it got really nuts.

I designed a large inventory application which accumulated about a million
transactions a month, and we kept a year's worth on-line without any
particular problems (this was Oracle on an HP box). It's hard to imagine
you'll get anything like that volume of transactions.

Just my 2 cents off the top of my head, not fully understanding your
application...

-- Dan
Jul 17 '05 #3


First, thanks for replying.

I'll explain differently what I'm trying to do.
I have to store all possible data to inform the companies that signed up
with our website what people's choices are.
Actually I took the shoes example to simplify. The real items on the site
are used cars.
If 80% of the customers are looking for a Chevrolet, then the car dealers
may buy Chevrolets instead of BMWs for their customers. If they know that
90% are looking for a car between $5000 and $8000, they will try to provide
that kind of car. So, every time somebody makes a search, I want to log
what he was looking for, then build statistics for the car dealers. Since
not all fields are compulsory, one customer may choose the price, another
the color, another one just the make. I want to optimize it as much as
possible while keeping it flexible for doing any kind of statistics.
> For a log, just record the raw event. Don't try and get fancy. Doing so
> may get you in trouble later when you want to mine the data in some way
> you haven't even considered.

When you say I have to log the raw event, I don't know what you mean. My
form has many fields, and I must save only the fields that were used, not
the ones left empty. I'll use the part ID, but how do I store a price when
it is filled in by the customer himself? I don't provide any preset values
for the price.

> Try and be relational: use the part id or style/color/size id's from your
> item master tables.

Obviously I'll do it this way. I'll link to the other tables.

> I'd also add timestamp, source, and session_id fields to each row.

What do you mean by "source" for each row?
Why should I store the session_id? It would let me know whether the same
person looked at different articles, but since there are so many searches,
it would be hard to get an interesting result out of it. Maybe I could say:
people looking for a BMW are also looking for an Audi.
> Don't design to minimize physical storage. Design for maximum
> flexibility.

My problem is saving "time". I mean, if my search takes 5 seconds because
of a bad DB design or something else, it's only good for the trash bin. I
want maximum speed and flexibility.
> How many rows will your app be inserting a day, anyway? You could purge
> the old records to a history table every month if it got really nuts.

I won't purge records. Maybe records 3 years old won't be interesting
anymore, but they may be kept.
> I designed a large inventory application which accumulated about a million
> transactions a month, and we kept a year's worth on-line without any
> particular problems (this was Oracle on an HP box). It's hard to imagine
> you'll get anything like that volume of transactions.

A competitor gets 7 million/month. I won't reach that for a long time, but
from what I can see, I get about 50,000 searches/month, and I don't store
them yet.

I have 2 choices:
1) Store one table with every distinct search, and another where I store
the searchID, the timestamp and the session_ID. But I have to check whether
the record already exists on every new search.
2) Store everything in the same table, accepting that a lot of identical
records will exist in it.

In example 1, I may get this kind of result.
table a:
(ID,Make,pricefrom, priceto,kmfrom,kmto,....)
1,BMW, ,8000, , ,....
2,Bugatti, 50000,80000, ,100000,...
3,BMW, , , ,150000,...

table b:
(aID,timestamp,session_id)
1,10:10:42 2004-10-10,dfasdfsdfsdf
2,10:10:48 2004-10-10,fdsadsfdsdsv
3,10:10:55 2004-10-10,fsdvgrtwtrwt
2,10:11:02 2004-10-10,ffdsfvvvddvd
2,10:11:20 2004-10-10,ffshhzjuzturz

At the beginning, table a would fill up very quickly; later, less and less
new data would go into it.

Flexibility wouldn't be affected by either choice, only speed and disk
space. So which is the best choice?
Is there a way to store a record in a table and, if it already exists,
retrieve its ID? That would be the fastest way, I think.
Jul 17 '05 #4

Bob Bedford wrote:
> I'll explain differently what I'm trying to do.
> I have to store all possible data to inform the companies that signed up
> with our website what people's choices are.

<snip>

Just another thought... Don't log anything with the DB. Just use the
access log for data mining. You could probably use a cron job once a week
that does the data mining and gives you enough stats, or trims/groups the
data available in the access log.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #5


"Bob Bedford" <be******@YouKnowWhatToDoHerehotmail.com> wrote

> First, thanks for replying.

Too much time on my hands...
> I'll explain differently what I'm trying to do.
<snip>
> I want to optimize it as much as possible while keeping it flexible for
> doing any kind of statistics.

It's a design mistake to try and optimize storage. It limits your choices
down the line. When I was younger, I did this several times, and it was
always a mistake in the long run.

Log the entire choice row. Then you can cook it down and digest it if you
need to for further analysis. 50,000 rows a month isn't much of a problem
on a fast box with fast disks. Keeping *all* fields in the log will give
you maximum flexibility and ease in conducting ad-hoc querying etc.
> > For a log, just record the raw event.
> When you say I have to log the raw event, I don't know what you mean. My
> form has many fields, and I must save only the fields that were used, not
> the ones left empty.

Is someone looking over your shoulder and saying "you cannot save a field
in a row if it may be blank x% of the time"? If you want to design a
hierarchical schema with "search" at the top and many other types of
zero-to-1 entities beneath it (eg, mfg, color, miles, cost, upholstery,
etc), have at it. This will save storage but slow down querying (cause
you'll need lots of joining), and it makes every query a multiple-table
join (a pain, if you do a lot of them). De-normalization (eg, one row per
search event) is used all the time where data mining needs to be quick and
convenient.
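
For illustration, that hierarchical layout might look like this (all names
invented for the example):

CREATE TABLE searches (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
  logged_at DATETIME NOT NULL,
  PRIMARY KEY (id)
);

-- one optional child table per zero-to-1 criterion, eg the price limits:
CREATE TABLE search_price (
  search_id  INT UNSIGNED NOT NULL,  -- points at searches.id
  price_from DECIMAL(11,2) DEFAULT NULL,
  price_to   DECIMAL(11,2) DEFAULT NULL,
  PRIMARY KEY (search_id)
);

-- and every report pays for a join:
SELECT s.id, p.price_from, p.price_to
FROM searches s
LEFT JOIN search_price p ON p.search_id = s.id;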
> I'll use the part ID, but how do I store a price when it is filled in by
> the customer himself? I don't provide any preset values for the price.

Leave them null, or default to zero, your pick. Nulls are sometimes a bit
of a pain to deal with syntactically, as function arguments, etc.
> > I'd also add timestamp, source, and session_id fields to each row.
> What do you mean by "source" for each row?

Originating URL or IP address, just in case someone is messing with you...
> Why should I store the session_id? It would let me know whether the same
> person looked at different articles, but since there are so many searches,
> it would be hard to get an interesting result out of it.

If you're in the design mentality, you don't prejudge what may or may not
be interesting.
> Maybe I could say: people looking for a BMW are also looking for an Audi.

Shopping behavior is very interesting to marketing types. You may get clues
for suggestive selling from analyzing search habits.
> > Don't design to minimize physical storage. Design for maximum
> > flexibility.
> My problem is saving "time". I mean, if my search takes 5 seconds because
> of a bad DB design or something else, it's only good for the trash bin. I
> want maximum speed and flexibility.


Waiting a few seconds for a research query is perfectly ok. In my
experience (designing fairly large systems for 25 yrs) sub-second response
times are important for high volume things like customer activity, order
entry, and the like. Research (which is what your search history table is
for) involves people thinking about what they want, looking at the results,
and thinking some more. Educate your customer that processing historical
data takes a few seconds.
> A competitor gets 7 million/month. I won't reach that for a long time,
> but from what I can see, I get about 50,000 searches/month, and I don't
> store them yet.

A few hundred thousand rows in a table or database is not much of a
challenge for a modern database. If you're worried, create a table with a
couple hundred k rows and run some queries against it.
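
One quick way to build such a table is to seed a copy and double it with
self-inserts (a sketch; CREATE TABLE ... LIKE assumes MySQL 4.1 or later,
and running the second INSERT about 18 times gives roughly 260,000 rows):

CREATE TABLE searchlog_test LIKE searchlog;
INSERT INTO searchlog_test (F1, F2, F12) VALUES (1, 'seed', NOW());
-- each repetition doubles the row count:
INSERT INTO searchlog_test (F1, F2, F12)
  SELECT F1, F2, NOW() FROM searchlog_test;
-- then time a typical query against it:
SELECT COUNT(*) FROM searchlog_test WHERE F2 = 'seed';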
> 1) Store one table with every distinct search, and another where I store
> the searchID, the timestamp and the session_ID. But I have to check
> whether the record already exists on every new search.
> 2) Store everything in the same table, accepting that a lot of identical
> records will exist in it.

IMO, you're over-thinking this one. Just store a row per search and be done
with it.

> Flexibility wouldn't be affected by either choice, only speed and disk
> space. So which is the best choice?

Simplicity. One row per search event.
> Is there a way to store a record in a table and, if it already exists,
> retrieve its ID? That would be the fastest way, I think.

Every insertion will be slower that way. If you want user entry speed
maximized, this isn't the way to go.
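
For reference, the usual MySQL idiom for that (4.1 and later) looks like
this -- just a sketch, with an invented search_types table, assuming a
UNIQUE index spanning the search columns (and beware that NULLs in a UNIQUE
index don't count as duplicates):

INSERT INTO search_types (make, price_from, price_to)
VALUES ('BMW', NULL, 8000.00)
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
-- whether the row was new or already there, this now returns its id:
SELECT LAST_INSERT_ID();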

Just the thoughts of an old designer....

-- Dan
Jul 17 '05 #6

> Just another thought... Don't log anything with the DB. Just use the
> access log for data mining. You could probably use a cron job once a week
> that does the data mining and gives you enough stats, or trims/groups the
> data available in the access log.


Since I use session variables across pages, and also POST data, the access
log approach won't work, as that data isn't shown in the access log.

I'm thinking of logging the data in a table, then using a cron job to
"pack" it in a way that takes less space and better fits how the data may
be stored. Speed is kept, and space is saved too, letting me keep data for
a long time without worrying about wasted space.

I don't know if it's the best way... but it's quite a good compromise for
my needs.
Bob
Jul 17 '05 #7

> Just the thoughts of an old designer....

....with probably far more experience than me. I'll take your advice, and I
can always change it later. This way will also be faster to implement.

Thanks for help.

Bob
Jul 17 '05 #8

Bob Bedford wrote:
> > Just another thought... Don't log anything with the DB. Just use the
> > access log for data mining. You could probably use a cron job once a
> > week that does the data mining and gives you enough stats, or
> > trims/groups the data available in the access log.
>
> Since I use session variables across pages, and also POST data, the
> access log approach won't work, as that data isn't shown in the access
> log.

<snip>

Since it is a search page, as an end user I would expect it to be a GET.
Can't you just switch it to GET? Also, you *can* actually get the session
ID or any PHP variables into the access log.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #9


"R. Rajesh Jeba Anbiah" <ng**********@rediffmail.com> a écrit dans le
message de news: 11**********************@z14g2000cwz.googlegroups. com...
<snip>
> Since it is a search page, as an end user I would expect it to be a GET.
> Can't you just switch it to GET? Also, you *can* actually get the session
> ID or any PHP variables into the access log.



I'm actually switching to GET. What a mistake to use POST for search! I
didn't know. It doesn't matter, I'm rewriting part of my site and improving
it anyway.

The session ID is available via the SID variable. I don't know its size, so
how do I store it in a table? What type of field should I use for it?

Bob
Jul 17 '05 #10

Bob Bedford wrote:
<snip>
> > Since it is a search page, as an end user I would expect it to be a GET.
> > Can't you just switch it to GET? Also, you *can* actually get the
> > session ID or any PHP variables into the access log.
>
> I'm actually switching to GET. What a mistake to use POST for search! I
> didn't know. It doesn't matter, I'm rewriting part of my site and
> improving it anyway.

Just imagine if Google used POST: as an end user, you would lose the
ability/flexibility to bookmark pages, access the history, hit back &
forward, etc. The technical benefits are usually covered even in the basic
books.
> The session ID is available via the SID variable. I don't know its size,
> so how do I store it in a table? What type of field should I use for it?


I'm not sure whether you misunderstood my suggestion or not. My suggestion
is to avoid logging search queries via the DB, and to use the access log as
the source for data mining--possibly extracting/compacting the data from
the access log into a DB through a cron job or so. In that case, if you
need to log any PHP variables or session variables (in the access log), it
can easily be done. See
<http://www.webmasterworld.com/forum88/2489.htm>
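
As for the field type: PHP's session IDs are 32 characters with the default
settings (md5 based), but the length is configurable, so a slightly wider
column is a safe hedge, eg:

ALTER TABLE searchlog ADD COLUMN session_id VARCHAR(64) DEFAULT NULL;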

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #11
