
Best way to do so.

Sorry if I post here, but I don't have access to any good MySQL newsgroup for
this question, and since 99% of PHP programmers deal with MySQL, I'm sure
I'll get an answer here.

Look at the following table:
CREATE TABLE searchlog (
IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
F1 MEDIUMINT UNSIGNED default NULL,
F2 VARCHAR(40) default NULL,
F3a SMALLINT UNSIGNED default NULL,
F3b SMALLINT UNSIGNED default NULL,
F4a FLOAT(11,2) UNSIGNED default NULL,
F4b FLOAT(11,2) UNSIGNED default NULL,
F5a MEDIUMINT UNSIGNED default NULL,
F5b MEDIUMINT UNSIGNED default NULL,
F6 char(8) default NULL,
F7 char(8) default NULL,
F8 char(8) default NULL,
F9a TINYINT UNSIGNED default NULL,
F9b TINYINT UNSIGNED default NULL,
F10a FLOAT(6,2) UNSIGNED default NULL,
F10b FLOAT(6,2) UNSIGNED default NULL,
F11a FLOAT(6,2) UNSIGNED default NULL,
F11b FLOAT(6,2) UNSIGNED default NULL,
F12 datetime NOT NULL,
PRIMARY KEY (IDSearchLog),
UNIQUE KEY IDSearchLog (IDSearchLog),
INDEX AI_IDSearchLog (IDSearchLog)
);

This table will be used to manage the search log. Any time a search is run on
some fields, I must log it.
Over the long term, I think there may be about 500-600 different search
types. Many searches will be quite similar.
Now, for performance reasons, I have 3 possibilities:
1) I can check whether the same search has been done before and, if so, log only
the datetime and the IDSearchLog. If the same search doesn't exist, I create it.
2) I can create a new entry on every search: I won't have to check whether the
search exists, but I will have a lot of similar records. This increases the
amount of data, but isn't it quicker? I avoid a check on every search.
3) I can create several tables. If you have a close look at the field names
(deliberately renamed for better understanding), any number with an a or b next
to it means that you can define a min and a max. So fields F3a and F3b are a
min and a max value. These limit combinations may be quite common between
searches, so in my searchlog table I could store only pointers to the
different combinations. This seems very heavy in query time, but it lets me
save some space. Maybe I'm missing a quick way to create a new combination
every time, and return the existing ID if available instead of creating it.

After thinking about it, the first choice would be my preference. But I'm
afraid that the lookup for an existing record with the same fields would slow
things down a lot, and the space saved wouldn't be worth it compared to choice 2.
Maybe there is a simple way to add some fields in a query and, if the field
combination already exists, get back its ID; see the sketch below.
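
There is such an idiom in MySQL 4.1+: put a UNIQUE index over the search
columns and combine INSERT ... ON DUPLICATE KEY UPDATE with the
LAST_INSERT_ID(expr) trick. A minimal sketch (column list shortened; note
that NULLable columns defeat the UNIQUE comparison, since MySQL treats two
NULLs as distinct, so unused fields would need sentinel values like 0 or ''):

-- One-time: a UNIQUE key across the fields that define a search.
-- (Exact matching on FLOAT columns is fragile; DECIMAL would be safer.)
ALTER TABLE searchlog ADD UNIQUE KEY combo (F1, F2, F3a, F3b, F4a, F4b);

-- Insert the combination, or touch the existing row and capture its ID.
INSERT INTO searchlog (F1, F2, F3a, F3b, F4a, F4b, F12)
VALUES (42, 'shoes', 7, 9, 150.00, 250.00, NOW())
ON DUPLICATE KEY UPDATE IDSearchLog = LAST_INSERT_ID(IDSearchLog);

-- Either way, this now returns the row's ID:
SELECT LAST_INSERT_ID();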
Some search examples:
Adidas, shoes, from 150$, to 250$, blue
Nike, socks, to 50$
Adidas, socks, to 60$
Nike, socks, to 60$
Adidas shoes, to 250$
Puma shoes, red
Puma shoes, yellow
.....
I have to log these fields in a way that is as fast as possible but also
saves space.

Please help.

Bob
Jul 17 '05 #1
Bob Bedford wrote:
Sorry if I post here, but I don't have access to any good MySQL newsgroup
for this question, and since 99% of PHP programmers deal with MySQL, I'm
sure I'll get an answer here.

Look at the following table:
CREATE TABLE searchlog (
IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
F1 MEDIUMINT UNSIGNED default NULL,
F2 VARCHAR(40) default NULL,
F3a SMALLINT UNSIGNED default NULL,
F3b SMALLINT UNSIGNED default NULL,
F4a FLOAT(11,2) UNSIGNED default NULL,
F4b FLOAT(11,2) UNSIGNED default NULL,
F5a MEDIUMINT UNSIGNED default NULL,
F5b MEDIUMINT UNSIGNED default NULL,
F6 char(8) default NULL,
F7 char(8) default NULL,
F8 char(8) default NULL,
F9a TINYINT UNSIGNED default NULL,
F9b TINYINT UNSIGNED default NULL,
F10a FLOAT(6,2) UNSIGNED default NULL,
F10b FLOAT(6,2) UNSIGNED default NULL,
F11a FLOAT(6,2) UNSIGNED default NULL,
F11b FLOAT(6,2) UNSIGNED default NULL,
F12 datetime NOT NULL,
PRIMARY KEY (IDSearchLog),
UNIQUE KEY IDSearchLog (IDSearchLog),
INDEX AI_IDSearchLog (IDSearchLog)
);


You don't need the UNIQUE KEY clause because the manual says:

14.2.6 CREATE TABLE Syntax

A PRIMARY KEY is a unique KEY where all key columns must be defined as
NOT NULL.
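
The separate INDEX on the same column is redundant for the same reason: all
three definitions index IDSearchLog. A trimmed version (other columns
unchanged from your post) would be:

CREATE TABLE searchlog (
IDSearchLog MEDIUMINT UNSIGNED NOT NULL auto_increment,
-- ... F1 through F12 as above ...
PRIMARY KEY (IDSearchLog)
);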

<snip>

As for everything else, I have no idea what you are asking.

NM

--
convert uppercase WORDS to single keystrokes to reply
Jul 17 '05 #2

"Bob Bedford" <be******@YouKnowWhatToDoHerehotmail.com> wrote
This table will be used to manage the search log. Any time a search is run on
some fields, I must log it.
Over the long term, I think there may be about 500-600 different search
types. Many searches will be quite similar.


For a log, just record the raw event. Don't try and get fancy. Doing so
may get you in trouble later when you want to mine the data in some way you
haven't even considered.

Try and be relational: use the part id or style/color/size id's from your
item master tables.

I'd also add timestamp, source, and session_id fields to each row.

Don't design to minimize physical storage. Design for maximum flexibility.

How many rows will your app be inserting a day, anyway? You could purge the
old records to a history table every month if it got really nuts.
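
A minimal sketch of that raw-event shape, with the three extra fields
suggested above (the new column names are illustrative, not from the
original schema):

CREATE TABLE searchlog (
IDSearchLog INT UNSIGNED NOT NULL auto_increment,
-- one column per form field, NULL when the visitor left it empty
F1 MEDIUMINT UNSIGNED default NULL,
-- ... F2 through F11b as in the original post ...
logged_at DATETIME NOT NULL, -- when the search ran
source VARCHAR(255) default NULL, -- originating URL or IP address
session_id CHAR(32) default NULL, -- the visitor's session
PRIMARY KEY (IDSearchLog)
);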

I designed a large inventory application which accumulated about a million
transactions a month, and we kept a year's worth online without any
particular problems (this was Oracle on an HP box). It's hard to imagine
you'll get anything like that volume of transactions.

Just my 2 cents off the top of my head, not fully understanding your
application...

-- Dan
Jul 17 '05 #3

First, thanks for replying.

I'll explain differently what I'm trying to do.
I have to store all possible data to inform the companies that signed up with
our website what people's choices are.
I used shoes as an example to keep things simple. The real items on the
site are used cars.
If 80% of the customers are looking for a Chevrolet, then the
car dealer may stock Chevrolets instead of BMWs for their customers. If they
know that 90% are looking for a car between $5000 and $8000, they will try
to supply that kind of car. So, every time somebody runs a search, I want to log
what he was looking for, then build statistics for the car dealers. Since
not all fields are compulsory, one customer may choose only the price, another
the color, another just the make. I want to optimize this as much as
possible while keeping it flexible enough for any kind of statistics.

For a log, just record the raw event. Don't try and get fancy. Doing so
may get you in trouble later when you want to mine the data in some way
you haven't even considered.

When you say I have to log the raw event, I don't know what you mean. My form
has many fields, and I must save only the used ones, not the ones left
empty. I'll use the part ID, but how do I store a price when it is entered
by the customer? I don't provide any preset values for price.

Try and be relational: use the part id or style/color/size id's from your
item master tables.

Obviously I'll do it this way: I'll link to the other tables.

I'd also add timestamp, source, and session_id fields to each row.

What do you mean by "source" for each row?
Why should I store session_id? It would let me know whether the same person
looked at different articles, but since there are so many searches, it would
be hard to get an interesting result out of it. Maybe I could say: people
looking for a BMW are also looking for an Audi.
Don't design to minimize physical storage. Design for maximum
flexibility.

My problem is saving time. I mean, if my search takes 5 seconds because of
bad DB design or something else, it belongs in the trash bin. I want maximum
speed and flexibility.

How many rows will your app be inserting a day, anyway? You could purge
the old records to a history table every month if it got really nuts.

I won't purge records. Records 3 years old may no longer be interesting,
but they can be kept.

I designed a large inventory application which accumulated about a million
transactions a month, and we kept a year's worth online without any
particular problems (this was Oracle on an HP box). It's hard to imagine
you'll get anything like that volume of transactions.

Our competitor gets 7 million/month. I won't reach that for a long time,
but from what I see, I get about 50,000 searches/month, and I don't store them
yet.

I have 2 choices:
1) store a table with every distinct search, and another table where I store the
searchID, the timestamp and the session_ID.
But I have to check whether the record already exists on every new search.
2) store everything in the same table, but many identical records will exist
in the table.

In example 1, I may have this kind of result:
table a:
(ID,Make,pricefrom, priceto,kmfrom,kmto,....)
1,BMW, ,8000, , ,....
2,Bugatti, 50000,80000, ,100000,...
3,BMW, , , ,150000,...

table b:
(aID,timestamp,session_id)
1,10:10:42 2004-10-10,dfasdfsdfsdf
2,10:10:48 2004-10-10,fdsadsfdsdsv
3,10:10:55 2004-10-10,fsdvgrtwtrwt
2,10:11:02 2004-10-10,ffdsfvvvddvd
2,10:11:20 2004-10-10,ffshhzjuzturz

At the beginning, table a would fill up very quickly, but later there will be
less and less new data in it.

Flexibility wouldn't be affected by either choice, only speed and disk space.
So what's the best choice?
Is there a way to store a record in a table and, if it already exists,
retrieve its ID? That would be the fastest way, I think.
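
A sketch of that two-table flow with the upsert idiom mentioned earlier
(assuming MySQL 4.1+, a UNIQUE index over table a's search columns, sentinel
values such as 0 instead of NULL so the UNIQUE comparison works, and that
aID is the auto_increment key; the table names are made up):

-- Step 1: create the combination, or fetch the existing row's ID.
INSERT INTO search_combo (Make, pricefrom, priceto, kmfrom, kmto)
VALUES ('BMW', 0, 8000, 0, 0)
ON DUPLICATE KEY UPDATE aID = LAST_INSERT_ID(aID);

-- Step 2: log the event against whichever ID step 1 produced.
INSERT INTO search_event (aID, ts, session_id)
VALUES (LAST_INSERT_ID(), NOW(), 'dfasdfsdfsdf');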
Jul 17 '05 #4
Bob Bedford wrote:
I'll explain differently what I'm trying to do.
I have to store all possible data to inform the companies that signed up with
our website what people's choices are.
I used shoes as an example to keep things simple. The real items on the site
are used cars.
If 80% of the customers are looking for a Chevrolet, then the car dealer may
stock Chevrolets instead of BMWs for their customers. If they know that 90%
are looking for a car between $5000 and $8000, they will try to supply that
kind of car. So, every time somebody runs a search, I want to log what he was
looking for, then build statistics for the car dealers. Since not all fields
are compulsory, one customer may choose only the price, another the color,
another just the make. I want to optimize this as much as possible while
keeping it flexible enough for any kind of statistics.

<snip>

Just another thought... Don't log anything with the DB. Just use the access
log for data mining. You could run a cron job once a week that does the
data mining and gives you enough stats, or trims/groups the data available
in the access log.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #5

"Bob Bedford" <be******@YouKnowWhatToDoHerehotmail.com> wrote

First, thanks for replying.
Too much time on my hands...
I'll explain differently what I'm trying to do.
I have to store all possible data to inform the companies that signed up
with our website what people's choices are.
I used shoes as an example to keep things simple. The real items on the
site are used cars.
If 80% of the customers are looking for a Chevrolet, then the car dealer
may stock Chevrolets instead of BMWs for their customers. If they know that
90% are looking for a car between $5000 and $8000, they will try to supply
that kind of car. So, every time somebody runs a search, I want to log what
he was looking for, then build statistics for the car dealers. Since not
all fields are compulsory, one customer may choose only the price, another
the color, another just the make. I want to optimize this as much as
possible while keeping it flexible enough for any kind of statistics.

It's a design mistake to try and optimize storage. It limits your choices
down the line. When I was younger, I did this several times, and it was
always a mistake in the long run.

Log the entire choice row. Then you can cook it down and digest it if you
need to for further analysis. 50,000 rows a month isn't much of a problem
on a fast box with fast disks. Keeping *all* fields in the log will give
you maximum flexibility and ease in conducting ad-hoc querying etc.
For a log, just record the raw event.

When you say I have to log the raw event, I don't know what you mean. My
form has many fields, and I must save only the used ones, not the ones
left empty.

Is someone looking over your shoulder and saying "you cannot save a field in
a row if it may be blank x% of the time"? If you want to design a
hierarchical schema with "search" at the top and many other types of
zero-to-one entities beneath it (e.g., mfg, color, miles, cost, upholstery,
etc.), have at it. This will save storage but slow down querying (because
you'll need lots of joins), and it makes every query a multiple-table
join (a pain, if you do a lot of them). De-normalization (e.g., one row per
search event) is used all the time where data mining needs to be quick and
convenient.

I'll use the part ID, but how do I store a price when it is entered by the
customer? I don't provide any preset values for price.

Leave them null, or default to zero, your pick. Nulls are sometimes a bit
of a pain to deal with syntactically, as function arguments, etc.
I'd also add timestamp, source, and session_id fields to each row.

What do you mean by "source" for each row?


Originating URL or IP address, just in case someone is messing with you...
Why should I store session_id? It would let me know whether the same person
looked at different articles, but since there are so many searches, it would
be hard to get an interesting result out of it.

If you're in the design mentality, you don't prejudge what may or may not be
interesting.
Maybe I could say: people looking for a BMW are also looking for an Audi.

Shopping behavior is very interesting to marketing types. You may get clues
for suggestive selling from analyzing search habits.
Don't design to minimize physical storage. Design for maximum
flexibility.

My problem is saving time. I mean, if my search takes 5 seconds because of
bad DB design or something else, it belongs in the trash bin. I want maximum
speed and flexibility.


Waiting a few seconds for a research query is perfectly ok. In my
experience (designing fairly large systems for 25 yrs) sub-second response
times are important for high volume things like customer activity, order
entry, and the like. Research (which is what your search history table is
for) involves people thinking about what they want, looking at the results,
and thinking some more. Educate your customer that processing historical
data takes a few seconds.
Our competitor gets 7 million/month. I won't reach that for a long time,
but from what I see, I get about 50,000 searches/month, and I don't store
them yet.

A few hundred thousand rows in a table or database is not much of a
challenge for a modern database. If you're worried, create a table with a
couple hundred k rows and run some queries against it.
1) store a table with every distinct search, and another table where I store
the searchID, the timestamp and the session_ID.
But I have to check whether the record already exists on every new search.
2) store everything in the same table, but many identical records will
exist in the table.

IMO, you're over-thinking this one. Just store a row per search and be done
with it.

Flexibility wouldn't be affected by either choice, only speed and disk
space. So what's the best choice?

Simplicity. One row per search event.
Is there a way to store a record in a table and, if it already exists,
retrieve its ID? That would be the fastest way, I think.


Every insertion will be slower that way. If you want user entry speed
maximized, this isn't the way to go.

Just the thoughts of an old designer....

-- Dan
Jul 17 '05 #6
> Just another thought... Don't log anything with the DB. Just use the
access log for data mining. You could run a cron job once a week that does
the data mining and gives you enough stats, or trims/groups the data
available in the access log.


Since I use session variables across pages, and also POST data, the access
log won't work, as that data doesn't show up in the access log.

I'm thinking of logging the data in a table, then using a cron job to pack it
into a form that takes less space and better fits the way the data should be
stored. Inserts stay fast, and space is saved too, letting me keep data for
a long time without worrying about wasted space.
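
A sketch of what that packing pass could look like, folding raw rows into
the combination/event layout from earlier (all table names here are made up;
it assumes a UNIQUE key on search_combo's columns and sentinel values
instead of NULLs, and it should run during a quiet period so no raw rows
arrive mid-pass):

-- 1. Add any new field combinations to the compact table.
INSERT IGNORE INTO search_combo (Make, pricefrom, priceto)
SELECT DISTINCT Make, pricefrom, priceto FROM searchlog_raw;

-- 2. Rewrite each raw row as a pointer to its combination.
INSERT INTO search_event (aID, ts, session_id)
SELECT c.aID, r.ts, r.session_id
FROM searchlog_raw r
JOIN search_combo c
ON c.Make = r.Make AND c.pricefrom = r.pricefrom AND c.priceto = r.priceto;

-- 3. Drop the raw rows that have just been folded in.
DELETE FROM searchlog_raw;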

I don't know if it's the best way, but it's a compromise that fits my needs.
Bob
Jul 17 '05 #7
> Just the thoughts of an old designer....

...with probably far more experience than me. I'll take your advice; I can
always change it later. This way will also be faster to implement.

Thanks for help.

Bob
Jul 17 '05 #8
Bob Bedford wrote:
Just another thought... Don't log anything with the DB. Just use the
access log for data mining. You could run a cron job once a week that does
the data mining and gives you enough stats, or trims/groups the data
available in the access log.

Since I use session variables across pages, and also POST data, the access
log won't work, as that data doesn't show up in the access log.

<snip>

Since it is a search page, as an end user I would expect it to be a GET.
Can't you just change it to GET? Also, you *can* actually get the session
or any PHP variables into the access log.

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #9

"R. Rajesh Jeba Anbiah" <ng**********@rediffmail.com> a écrit dans le
message de news: 11**********************@z14g2000cwz.googlegroups. com...
Bob Bedford wrote:
> Just another thought... Don't log anything with DB. Just use access
> log for data mining. Probably, you have to use a cron job once in a
> week that does the data mining and provides you enough stats or
> trims/groups the data that available in access log.


Since I use session variables across pages, and also POST data, the access
log won't work, as that data doesn't show up in the access log.

<snip>

Since it is a search page, as an end user I would expect it to be a GET.
Can't you just change it to GET? Also, you *can* actually get the session
or any PHP variables into the access log.


I'm converting to GET now. What a mistake to use POST for search! I didn't
know. No matter; I'm rewriting part of my site and improving it.

The session ID is available through the SID constant. I don't know its size,
so how do I store it in a table? What type of field should I use for it?
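
For reference: PHP's default session IDs are 32 hexadecimal characters (an
MD5 hash), so a CHAR(32) column fits; note that the SID constant has the
name=value form, while session_id() returns just the value. A minimal
column sketch (the column name is illustrative):

-- Widen this if the session hash function is ever changed.
ALTER TABLE searchlog ADD session_id CHAR(32) default NULL;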

Bob
Jul 17 '05 #10
Bob Bedford wrote:
<snip>
Since it is a search page, as an end user I would expect it to be a GET.
Can't you just change it to GET? Also, you *can* actually get the session
or any PHP variables into the access log.

I'm converting to GET now. What a mistake to use POST for search! I didn't
know. No matter; I'm rewriting part of my site and improving it.

Just imagine Google using POST: as an end user, you would lose the
ability/flexibility to bookmark pages, access the history, hit back and
forward, etc. The technical benefits are usually covered even in basic
books.

The session ID is available through the SID constant. I don't know its
size, so how do I store it in a table? What type of field should I use
for it?

I'm not sure whether you misunderstood my suggestion. My suggestion is to
avoid logging search queries via the DB and instead use the access log as
the source for data mining--and possibly to extract/compact the access-log
data into a DB through a cron job or so. In that case, if you need to log
any PHP or session variables (in the access log), that is easily achieved.
See <http://www.webmasterworld.com/forum88/2489.htm>
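
One way to do that (a sketch assuming Apache with mod_log_config and PHP
running as an Apache module; the note name and field list are made up):
hand the values to the logger with apache_note() and reference them in a
custom LogFormat.

<?php
// httpd.conf side, for reference:
//   LogFormat "%h %t \"%r\" %{searchdata}n" searchlog
//   CustomLog logs/search_log searchlog
session_start();
$make    = isset($_GET['make'])    ? $_GET['make']    : '';
$priceto = isset($_GET['priceto']) ? $_GET['priceto'] : '';
// Attach the values to this request; Apache writes them via %{searchdata}n.
apache_note('searchdata', session_id() . '|' . $make . '|' . $priceto);
?>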

--
<?php echo 'Just another PHP saint'; ?>
Email: rrjanbiah-at-Y!com Blog: http://rajeshanbiah.blogspot.com/

Jul 17 '05 #11
