Bytes IT Community

Need something similar to JOIN, but not exactly.

Expert 5K+
P: 8,435
Ok, this is a question I've asked here before, but never found the answer I was looking for. However, at that time I was investigating ways of approaching something, so I'll try simplifying the question to the technical details of one specific approach. Anyway...

Let's say I have a table Table1 with field DateTime1 (date/time).
And I have a table Table2 with fields Started and Finished (both date/time).
(Actually, the field formats should be irrelevant to the discussion).

I want to build some sort of query, if possible, which will group by the values in DateTime1 and, for each value, return a count of the records in Table2 with (Started <= DateTime1 and Finished >= DateTime1).
In other words, kind of like a simple join, but where two fields "bracket" or "surround" the original field, rather than one field matching it.

Can do?
Nov 15 '06 #1
44 Replies


NeoPa
Expert Mod 15k+
P: 31,186
Using Between (DateTime1 Between Started And Finished) in your WHERE clause, you can determine if there is a match.
SELECT DateTime1, Count(DateTime1) AS MatchCount
FROM Table1, Table2
WHERE DateTime1 Between Started And Finished
GROUP BY DateTime1
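For anyone wanting to try this outside Access, the query can be sanity-checked in Python with SQLite standing in; the table and field names are from the thread, the sample rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (DateTime1 TEXT);
    CREATE TABLE Table2 (Started TEXT, Finished TEXT);
    INSERT INTO Table1 VALUES ('2006-11-01 10:00'), ('2006-11-01 11:00');
    INSERT INTO Table2 VALUES
        ('2006-11-01 09:00', '2006-11-01 10:30'),
        ('2006-11-01 09:00', '2006-11-01 12:00');
""")
# The cross join pairs every Table1 row with every Table2 row; the
# BETWEEN filter keeps only the pairs where the interval brackets the time.
rows = con.execute("""
    SELECT DateTime1, Count(DateTime1) AS MatchCount
    FROM Table1, Table2
    WHERE DateTime1 BETWEEN Started AND Finished
    GROUP BY DateTime1
""").fetchall()
```

One caveat worth noting: with this implicit cross join, any DateTime1 value that matches no interval at all simply drops out of the result rather than showing a zero count.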
Nov 16 '06 #2

Expert 5K+
P: 8,435
Using Between (DateTime1 Between Started And Finished) in your WHERE clause, you can determine if there is a match.
SELECT DateTime1, Count(DateTime1) AS MatchCount
FROM Table1, Table2
WHERE DateTime1 Between Started And Finished
GROUP BY DateTime1
Just heading off home now, but I'll mull this over.

Thanks.
Nov 16 '06 #3

MMcCarthy
Expert Mod 10K+
P: 14,534
Ok, this is a question I've asked here before, but never found the answer I was looking for. However, at that time I was investigating ways of approaching something, so I'll try simplifying the question to the technical details of one specific approach. Anyway...

Let's say I have a table Table1 with field DateTime1 (date/time).
And I have a table Table2 with fields Started and Finished (both date/time).
(Actually, the field formats should be irrelevant to the discussion).

I want to build some sort of query, if possible, which will group by the values in DateTime1, and for each value, return a count of the number of records in Table2 with (Started <= DateTime1 and Finished >= DateTime1).
In other words, kind of like a simple join, but where two fields "bracket" or "surround" the original field, rather than one field matching it.

Can do?
You could also try ...

SELECT DateTime1, Sum(IIf(DLookup("[Started]","Table2","[Started]<=" & [DateTime1]) AND DLookup("[Finished]","Table2","[Finished]>=" & [DateTime1]),1,0))
FROM Table1
GROUP BY DateTime1;

I don't know which will have the better performance, but I'm guessing that because Adrian's has a full outer join, mine will be better. Maybe I'll win this one. Although with my luck it won't work.

Don't tell me you're winning the battle on this one Killer.

Mary
Nov 16 '06 #4

NeoPa
Expert Mod 15k+
P: 31,186
SELECT DateTime1, Sum(IIf(DLookup("[Started]","Table2","[Started]<=" & [DateTime1]) AND DLookup("[Finished]","Table2","[Finished]>=" & [DateTime1]),1,0))
FROM Table1
GROUP BY DateTime1;
Actually, because OUTER JOINs don't need either recordset to be sorted to match the join, I would expect it to run quite quickly.
However, if that doesn't prove true then this version of Mary's code might be easier to use (excuse this Mary - I'm not trying to be critical but this area is a strength of mine that most people struggle with to a certain extent - this processes through Table2 only once per Table1 record).
SELECT DateTime1,
   DCount('[Started]','Table2',
   Format([DateTime1],'\#m/d/yyyy\#') &
   ' Between [Started] And [Finished]')
FROM Table1
GROUP BY DateTime1
Nov 16 '06 #5

MMcCarthy
Expert Mod 10K+
P: 14,534
Actually, because OUTER JOINs don't need either recordset to be sorted to match the join, I would expect it to run quite quickly.
However, if that doesn't prove true then this version of Mary's code might be easier to use (excuse this Mary - I'm not trying to be critical but this area is a strength of mine that most people struggle with to a certain extent - this processes through Table2 only once per Table1 record).
Adrian

What have I told you about criticising me. How very DOUBLE DARE you.

But really ...



You know query performance is not one of my areas of strength. I love these discussions because I learn more about it all the time.

Feel free to correct my code anytime.

Mary
Nov 16 '06 #6

PEB
Expert 100+
P: 1,418
PEB
He he he
Where I see DLookup, DCount, DSum or anything else that begins with a D,

it drives me mad... And don't talk to me about performance for a query that has this kind of function in it...

So Killer, if you want to treat a lot of data with this SQL, replace the D-function with your own! It will run better, without giving wrong results and stale data!

See if you can use subqueries! But in this case it seems not possible!
Nov 18 '06 #7

NeoPa
Expert Mod 15k+
P: 31,186
Hee Hee.

Yes, Domain Aggregate functions (DThis & DThat) should probably come with a Government Health Warning ;).

However :
They can be useful to provide individual items of data (generally processing through a dataset just the once).
They can be easier to understand when SQL logic gets complicated (if you've never seen really complicated SQL logic - trust me, it can be brain-scrambling).
Generally, it's a very good rule of thumb that it's not a good idea to use them relative to each record in a recordset.
This is partly because the setting up and processing of the dataset is done over and over again for each record.

Obviously there are exceptions to this - many of which we've seen in these forums - where it's not practical or possible to do it any other way.

When there's a choice - don't use them.
Nov 18 '06 #8

Expert 5K+
P: 8,435
Thanks for all the input, people. Despite what I said on Friday, I haven't had a chance to mull it all over yet.

But when I find the time, I'll definitely be having a go, using the techniques expounded here.

Depending on the performance I can get out of all this (assuming for the moment that I can get it to work), I'll either use the stats directly from the query, or dump them into another table for future reference. Given that the data is static once loaded, the latter may be the way to go, it's just a bit of a nuisance that I'll have to add yet another step or two to the daily load procedure.

Anyway, I'll let you know how it goes, when it goes.
Nov 19 '06 #9

Expert 5K+
P: 8,435
Using Between (DateTime1 Between Started And Finished) in your WHERE clause, you can determine if there is a match.
SELECT DateTime1, Count(DateTime1) AS MatchCount
FROM Table1, Table2
WHERE DateTime1 Between Started And Finished
GROUP BY DateTime1
I've just tried this query. My CPU is currently redlining, cooling fan going berserk. This has been going on for 5 - 10 minutes now, and this is with only a small sample (maybe two months) of the date/time range.

I guess I'd better have a look at Mary's version. :)

(This was a quick first attempt, and obviously I may have interpreted your suggestion incorrectly - will check it out later.)
Nov 23 '06 #10

MMcCarthy
Expert Mod 10K+
P: 14,534
I've just tried this query. My CPU is currently redlining, cooling fan going berserk. This has been going on for 5 - 10 minutes now, and this is with only a small sample (maybe two months) of the date/time range.

I guess I'd better have a look at Mary's version. :)

(This was a quick first attempt, and obviously I may have interpreted your suggestion incorrectly - will check it out later.)
Let me know if I win!!

Mary
Nov 23 '06 #11

Expert 5K+
P: 8,435
Let me know if I win!!
Ok. But don't forget, the last code posted was NeoPa's enhanced version of your code. So it looks as though it'll be more of a team effort.

Sorry about that. :) I know how you like to hog the glory.

Still running, by the way. Let's see, that's about 1hr 10min, take off 25 minutes to compensate for theScripts adding it:(, so it has been going for about 45 minutes. Given that this is on a relatively small sample of my data, it might be time to interrupt it and try your version.
Nov 23 '06 #12

Expert 5K+
P: 8,435
Sorry, just to clarify...

(Showing my translation of field names etc, not originally posted code)

NeoPa posted some code (which I'm testing now)
SELECT StartTime, Count(StartTime) AS MatchCount
FROM DateTime2, Log
WHERE StartTime Between Started And Ended
GROUP BY StartTime
Then Mary posted an alternative which I will be trying out next
SELECT StartTime, Sum(IIf(DLookup("[Started]","Log","[Started]<=" & [StartTime])
 AND DLookup("[Ended]","Log","[Ended]>=" & [StartTime]),1,0))
FROM DateTime2
GROUP BY StartTime;
Then NeoPa suggested a modified version of that
SELECT StartTime,
   DCount('[Started]','Log',
   Format([StartTime],'\#m/d/yyyy\#') &
   ' Between [Started] And [Ended]')
FROM DateTime2
GROUP BY StartTime
Nov 23 '06 #13

MMcCarthy
Expert Mod 10K+
P: 14,534
Ok. But don't forget, the last code posted was NeoPa's enhanced version of your code. So it looks as though it'll be more of a team effort.

Sorry about that. :) I know how you like to hog the glory.

Still running, by the way. Let's see, that's about 1hr 10min, take off 25 minutes to compensate for theScripts adding it:(, so it has been going for about 45 minutes. Given that this is on a relatively small sample of my data, it might be time to interrupt it and try your version.
You know if my version runs faster I'll never let him live it down.

Hopefully.... :)

Mary
Nov 23 '06 #14

MMcCarthy
Expert Mod 10K+
P: 14,534
Damn, I knew it was too good to be true.

But, there is still a small glimmer of hope .... ;)

Mary
Nov 23 '06 #15

Expert 5K+
P: 8,435
Damn, I knew it was too good to be true.
But, there is still a small glimmer of hope .... ;)
Ok, I think I managed to interrupt it (or perhaps it just finished). Produced 72 lines of output. Which may or may not be right - I'll check later. The results look reasonable, though.

Now to try Mary's version...

By the way, I've had to modify this one a few times to get it to start (such as inserting # symbols around the date). Here's the latest...
SELECT DateTime2.StartTime,
Sum(IIf(DLookUp("[Started]","Log","[Started]<=#" & [StartTime] & "#")
And DLookUp("[Ended]","Log","[Ended]>=#" & [StartTime] & "#"),1,0)) AS Expr1
FROM DateTime2
GROUP BY DateTime2.StartTime;
Oh. It finished quite quickly, but produced a few "1" entries, then the rest zero. Hm...

Oh well, on to version 3...
Nov 23 '06 #16

Expert 5K+
P: 8,435
By the way, I probably should mention that I was originally using a VB program to produce these stats. The problem is that it's very slow. I mean, producing the stats for the last 3 years or so, the program ran for quite a few days.

Then when I found that I had needed to make a slight change and re-run it, I decided enough was enough:(. That's why I'm searching for a quicker (and hopefully simpler) way to generate these numbers.

The nice thing is, it means I have (or can easily generate) a complete set of results for comparison.
Nov 23 '06 #17

MMcCarthy
Expert Mod 10K+
P: 14,534
Now you're modifying my code, what is this, a conspiracy?

HEEEELP, the humans are revolting.

I have no idea what I was doing. I can't follow my own code. Now that is bad.

Mary
Nov 23 '06 #18

Expert 5K+
P: 8,435
Now you're modifying my code, what is this, a conspiracy?

HEEEELP, the humans are revolting.
Yeah, they are pretty disgusting... :)

I have no idea what I was doing. I can't follow my own code. Now that is bad.
No comment ;)

I'm considering giving up on the Access approach and going back to VB. But I'm thinking of modifying the technique used there. Originally I was doing something like
For Each Hour over the desired date/time range
  Generate a query to count the matching records
  Write the results to another table
  Commit
Loop
Note, this was actually complicated by the fact that I was generating a number of different statistics based on relationships between request/start/finish times. In other words, number waiting to start, number active, etc etc etc. I've been simplifying somewhat so far in this discussion.

I'm thinking I come at it from the other direction. Something along these lines
For every existing record
  Work out which hour it belongs to
  Update (or create) that result record
  Commit (periodically, not every iteration)
Loop
I might even build up the results in a big array so that the majority of the time I'm only reading the database, not writing. Then dump them at the end.
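The record-at-a-time bucketing above can be sketched in Python, with a dict standing in for the in-memory array of results (all names invented):

```python
from collections import defaultdict
from datetime import datetime

def hour_of(t):
    """Bucket a timestamp to the hour it belongs to."""
    return t.replace(minute=0, second=0, microsecond=0)

def per_hour(events):
    """One pass over the records, accumulating counts in memory;
    only written out (to a table, file, etc.) at the end."""
    stats = defaultdict(int)
    for t in events:
        stats[hour_of(t)] += 1
    return stats

events = [datetime(2006, 11, 1, 10, 5), datetime(2006, 11, 1, 10, 50),
          datetime(2006, 11, 1, 11, 20)]
counts = per_hour(events)
```

The appeal of this direction is exactly what the post describes: the data is read sequentially once, and the database is only written at the end.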
Nov 23 '06 #19

MMcCarthy
Expert Mod 10K+
P: 14,534
Ok, this should work but it is only a reworking of Adrian's last effort. So I can't take credit.

Damnit.

Mary

SELECT StartTime,
DCount("[Started]","Log","[Started]<=#" & [StartTime] & "# And [Ended]>=#" & [StartTime] & "#")
FROM DateTime2
GROUP BY StartTime;
Nov 23 '06 #20

MMcCarthy
Expert Mod 10K+
P: 14,534
Assumption:
You are trying to work out, by its date, who or what machine was logged on at the time that event took place - or something following this logic.

So all you really need to know is:

Find all log records prior to this StartTime: order the log by Started ascending and return only records where Started is prior to this date, then check only those records for Ended being after this date.

Or, my last attempt:

SELECT StartTime, Sum(IIf([Ended] >= StartTime, 1, 0))
FROM DateTime2, Log
WHERE [Started] <= StartTime
GROUP BY StartTime;
Nov 24 '06 #21

Expert 5K+
P: 8,435
Assumption:
You are trying to work out, by its date, who or what machine was logged on at the time that event took place - or something following this logic.
'Fraid not.

These are a bunch of log records indicating (as you're probably aware) the date/time that various tasks were requested, started and finished. I am generating stats on the number of tasks which were in various states once each hour, minute or whatever. For example:
  • At dd/mm/yyyy hh:mm, how many tasks had been requested but not yet started?
  • During some particular hour (or minute, day, month, etc), how many tasks were active at any time? Regardless of whether they started during that hour, ended, or ran right through, they need to be counted.
  • And so on.
Each of the individual questions is relatively simple to answer, and I can easily throw together a select query to return the answer for a given entry (time, or range). The tough part is generating a list showing the breakdown per hour, minute, day or whatever.
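The individual questions above can be written down as simple predicates; this is only my reading of the post (invented task tuples of requested/started/ended times, not tested against the real data):

```python
from datetime import datetime

def waiting_at(t, tasks):
    """Tasks requested but not yet started at instant t."""
    return sum(1 for req, sta, end in tasks if req <= t < sta)

def active_during(p_start, p_end, tasks):
    """Tasks active at any time during [p_start, p_end] - whether they
    started in it, ended in it, or ran right through it."""
    return sum(1 for req, sta, end in tasks if sta <= p_end and end >= p_start)

tasks = [
    (datetime(2006, 11, 1, 9), datetime(2006, 11, 1, 10, 30), datetime(2006, 11, 1, 13)),
    (datetime(2006, 11, 1, 9), datetime(2006, 11, 1, 9, 15),  datetime(2006, 11, 1, 9, 45)),
]
```

Each predicate is trivial for one instant or range; the hard part, as the post says, is evaluating them for every hour or minute in the whole span.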

Some ideas are starting to form as I discuss it. I have to leave it for now, but I'll take a copy home over the weekend and play with it. It's not really urgent - at this point the stats are really just for my own entertainment and edification, though I suspect they will have some use.
Nov 24 '06 #22

MMcCarthy
Expert Mod 10K+
P: 14,534
What's the basic log field format?


'Fraid not.

These are a bunch of log records indicating (as you're probably aware) the date/time that various tasks were requested, started and finished. I am generating stats on the number of tasks which were in various states once each hour, minute or whatever. For example:
  • At dd/mm/yyyy hh:mm, how many tasks had been requested but not yet started?
  • During some particular hour (or minute, day, month, etc), how many tasks were active at any time? Regardless of whether they started during that hour, ended, or ran right through, they need to be counted.
  • And so on.
Each of the individual questions is relatively simple to answer, and I can easily throw together a select query to return the answer for a given entry (time, or range). The tough part is generating a list showing the breakdown per hour, minute, day or whatever.

Some ideas are starting to form as I discuss it. I have to leave it for now, but I'll take a copy home over the weekend and play with it. It's not really urgent - at this point the stats are really just for my own entertainment and edification, though I suspect they will have some use.
Nov 24 '06 #23

Expert 5K+
P: 8,435
What's the basic log field format?
Well, I don't think I'm allowed to say too much about the layout. But for the purposes of these stats, I'm simply counting records. So the only relevant fields are the three I mentioned. They are called Submitted, Started and Ended. All Date/Time format.

In a sense, we've come full-circle again. Because it was this statistical thing which originally prompted me to look into the possibility of UNIONing them all together into one big "table", to make it easier to generate the stats. :)

I may yet end up doing that, and reading through it in one fell swoop rather than doing searches/counts. We'll see. On the other hand, I may even extract just the date/time fields from all the different databases and stuff them into one big table. Without the rest of the info, they may fit. All of that is fairly irrelevant at this point, anyway.

Like I said, I'll take some time over the weekend to play around with it. I hope.
Nov 24 '06 #24

NeoPa
Expert Mod 15k+
P: 31,186
Just to comment.
The performance of processing through two unconnected tables, while it may be better than including a DLookUp() within the output processing, will nevertheless be expected to be poor compared with linking two JOINed tables. Probably by some way.
If you're looking at doing this with some of your mega-tables, then you can certainly expect some delays.
I hope you find an efficient way around your problem, but I would expect it to be a non-trivial matter in terms of performance.
BTW My ideas are just worked out on theory and my understanding of how it should work. Don't take any of it as gospel. It's always possible that I've completely left out a factor which could make all the difference. Unlikely to be to such an extent that it's slower than Mary's option though - j/k ;).
Nov 24 '06 #25

Expert 5K+
P: 8,435
The performance of processing through two unconnected tables, ...
I didn't really follow that very well. But it's Friday afternoon and I'm tired. Maybe I'll re-read it at home and it'll sink in. :)

I hope you find an efficient way around your problem, but I would expect it to be a non-trivial matter in terms of performance.
I'm still thinking a sequential pass through the data is the way to go, spitting out stats as I go. Or better still, building up stats in memory and spitting them out at the end. Could mean a lot of memory, though. Let's see, 1440 minutes a day, times 365.25 days a year, that's 525,960 array elements per year. Hm... might have to set some reasonable limits and roll old stuff out of memory.

We'll see...

...Unlikely to be to such an extent that it's slower than Mary's option though - j/k ;).
Ouch!

Lucky you're a moderator.
Nov 24 '06 #26

NeoPa
Expert Mod 15k+
P: 31,186
The performance of processing through two unconnected tables, ...
I didn't really follow that very well. But it's Friday afternoon and I'm tired. Maybe I'll re-read it at home and it'll sink in.
I was referring to the code I posted earlier (post #2) which included two tables without any JOIN at all.
Essentially, this produces output records for every combination of all the records of both tables.

Assume you have two tables, each with an ID field: TableA with ID values A, B, C and D, and Table2 with ID values 1, 2 and 3.
The result set of
SELECT TableA.ID, Table2.ID
FROM TableA, Table2
would be :
A    1
A    2
A    3
B    1
B    2
B    3
C    1
C    2
C    3
D    1
D    2
D    3
This is obviously a large dataset to come out of two tables, the count of records being the product of the counts of each table.
You can restrict the output using a WHERE clause.
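The same blow-up can be demonstrated in Python - `itertools.product` is exactly this unrestricted combination of two tables:

```python
from itertools import product

table_a = ['A', 'B', 'C', 'D']
table_2 = [1, 2, 3]

# Every combination of a TableA row with a Table2 row - a cross join
# with no WHERE clause to restrict it.
rows = list(product(table_a, table_2))
```

The record count is the product of the two table sizes (here 4 × 3 = 12), which is why this approach scales so badly on large tables.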
Nov 24 '06 #27

MMcCarthy
Expert Mod 10K+
P: 14,534
It's always possible that I've completely left out a factor which could make all the difference. Unlikely to be to such an extent that it's slower than Mary's option though - j/k ;).
I'm currently planning my revenge which will be when you least expect it. Keep looking over your shoulder and shaking in your boots.

I NEVER forget!

Mary
Nov 25 '06 #28

Expert 5K+
P: 8,435
I was referring to the code I posted earlier ... the count of records being the product of the counts of each table.
Um...

It might interest you to know that I've just counted my data records. There are a bit over 55 million. The other (postulated) table was the minutes for the time range. So if we assume one year, it was around half a million. The product comes out to...

29,018,947,290,120 records.

Hm... 29 trillion records. Per year.

Or would that be 29 billion in the UK? Either way, I think it's fair to say that's a pretty large result set.

For now though, I might look into the different approach I've been thinking over.
Nov 25 '06 #29

NeoPa
Expert Mod 15k+
P: 31,186
Maaarryyy,

I'm not scared of you! :Help someone:
That quiver in my voice is simply due to the cold - really it is.

Killer,

I hear what you're saying.
That IS quite a few records.
A point to bear in mind though, that's going to be a mega operation no matter how it's coded.
I would expect a Dblah() call for every one of the 55 million records to be worse, however.
Do you fancy testing the hypothesis? (Shouldn't take long.)

That's a good point about the English Billion.
I believe it predated the American Billion but nevertheless, is very rarely quoted nowadays. Everyone seems to have adopted the American version.
Nov 25 '06 #30

MMcCarthy
Expert Mod 10K+
P: 14,534
Maaarryyy,

I'm not scared of you! :Help someone:
That quiver in my voice is simply due to the cold - really it is.
Be Afraid, Be Very Very Afraid. (HA HA HA HA - Igor tone)

That's a good point about the English Billion.
I believe it predated the American Billion but nevertheless, is very rarely quoted nowadays. Everyone seems to have adopted the American version.
What's an American Billion?

I'm so old I'm only familiar with the English version. ;)

Mary
Nov 25 '06 #31

Expert 5K+
P: 8,435
Be Afraid, Be Very Very Afraid. (HA HA HA HA - Igor tone)
What's an American Billion?
I'm so old I'm only familiar with the English version. ;)
In American...

1,000,000 = One Million (duh!)
1,000,000,000 = One billion
1,000,000,000,000 = One trillion
Nov 25 '06 #32

NeoPa
Expert Mod 15k+
P: 31,186
You and me both, Mary.
We should form an oldies club of some kind (blatantly trying to squirm onto her good side).

For anyone else who may really be confused :
English billion = 1,000,000,000,000 or million ^ 2.
American billion = 1,000,000,000 or a thousand million.
Nov 25 '06 #33

MMcCarthy
Expert Mod 10K+
P: 14,534
In American...

1,000,000 = One Million (duh!)
1,000,000,000 = One billion
1,000,000,000,000 = One trillion
That's just stupid. Sorry Yanks! :)

A million million is a billion. I remember the rhyme. So all those American Billionaires aren't really billionaires. They just made up a number so they could pretend they were.

Mary
Nov 25 '06 #34

Expert 5K+
P: 8,435
A point to bear in mind though, that's going to be a mega operation no matter how it's coded.
I would expect a Dblah() call for every one of the 55 million records would be worse, however.
Do you fancy testing the hypothesis (shouldn't take long).
:D Bwahahaha...

I'm still leaning toward the sequential run through the 55 million, building up the stats in memory as I go. Have started coding in VB6, so I'm a bit reluctant to change direction now. Especially since I can only get back to it from time to time between doing other things. Like endless conversations on thescripts, for a start. ;)

It's Saturday afternoon here, and too hot to think.

(Also, keep in mind I have to be able to process each day's new records, without taking a week for each.)
Nov 25 '06 #35

MMcCarthy
Expert Mod 10K+
P: 14,534
You and me both, Mary.
We should form an oldies club of some kind (blatantly trying to squirm onto her good side).
You think calling me an oldie is going to do that ?

Mary :)
Nov 25 '06 #36

Expert 5K+
P: 8,435
That's just stupid. Sorry Yanks! :)
There seem to be surprisingly few here. Or perhaps I just get that impression because I largely deal with you and Adrian.
A million million is a billion. I remember the rhyme. So all those American Billionaires aren't really billionaires. They just made up a number so they could pretend they were.
Well, I must admit that "billionaire" sounds better than "thousand-millionaire".
Nov 25 '06 #37

MMcCarthy
Expert Mod 10K+
P: 14,534

It's Saturday afternoon here, and too hot to think.
What have I told you about complaining about the weather.

It's been raining here ALL day. How do you think we get all that grass in the Fair Green Isle.

Mary
Nov 25 '06 #38

NeoPa
Expert Mod 15k+
P: 31,186
It's Saturday afternoon here, and too hot to think.
BITCH (and I don't care WHAT gender you are)!
It's the middle of the night here and I'm cold to the bone.
I had a bike accident in some atrocious weather last week.
...And you're suffering from too much sunshine!
Ouch.
Nov 25 '06 #39

MMcCarthy
Expert Mod 10K+
P: 14,534
I know this sounds terrible but I'm not in the humour of giving coding advice tonight. I had to run two one-to-one training sessions today on Access, but with two different projects.

I think I'm adviced out. (If either of you correct my grammer here you're DEAD)

Mary
Nov 25 '06 #40

NeoPa
Expert Mod 15k+
P: 31,186
I think I'm adviced out. (If either of you correct my grammer here you're DEAD)
Not Grammar Mary, spelling!

Ow, Ouch, Oooooowwww!

(Picture NeoPa sprinting down the road (past Pat!) with Mary running close behind (belying her age) with a rolling pin in her outstretched right hand).

(OK - I know I'm dead :( )
Nov 25 '06 #41

MMcCarthy
Expert Mod 10K+
P: 14,534
Not Grammar Mary, spelling!

Ow, Ouch, Oooooowwww!

(Picture NeoPa sprinting down the road (past Pat!) with Mary running close behind (belying her age) with a rolling pin in her outstretched right hand).

(OK - I know I'm dead :( )
Not only dead but WRONG

I meant Advice not Advise which would indicate I'd been receiving rather than imparting said Advice.

Mary (Sticking her tongue out while running down the road with the rolling pin)

How did you know I was right handed?
Nov 25 '06 #42

NeoPa
Expert Mod 15k+
P: 31,186
Sorry Mary, but I was referring to the spelling of 'Grammer' :S
Nov 25 '06 #43

MMcCarthy
Expert Mod 10K+
P: 14,534
Sorry Mary, but I was referring to the spelling of 'Grammer' :S
I HATE you! :)

I didn't even see that.

Mary

STOP Grinning, I can see you ....
Nov 25 '06 #44

Expert 5K+
P: 8,435
Well, I managed to get my stats generated from the almost-four-years of data - you know, the 55 million records. Did it in VB6 again.

The old way
The process I had used before with some success was to somewhat arbitrarily set a start and end value for the date/time range I was interested in. Then I ran a loop through that range, incrementing by one hour each time. For each time, I generated a bunch of queries to count the records which matched various rules. For example, to find the tasks which were outstanding (not started) at that point in time, I would execute
SELECT Count(<somefield>) AS TheCount
  FROM Log
  WHERE [Submitted] <= #sometime#
  AND [Started] >= #sometime#
and similarly for other combinations. Given that it was reading each record in turn, and performing a number of individual queries for each one, the performance was not too bad. But to run through all the millions of records took quite a few days.

All-new and improved :D
The new technique is to examine the data and get the highest and lowest date/time values, then set up an array and populate it with every hour between those two extremes. The fun part is that to produce stats per hour, I have... oh I forget, but tens of thousands of array elements. To do stats per minute, the array size is around two million entries. :) I'm glad I didn't attempt this back when arrays were limited to 64KB or whatever.
So, we have our array of hours, made up of a user-defined type like so...
Private Type StatsType
  StartTime As Date    ' Start of the period (hour, or whatever)
  EndTime As Date      ' End of the period (hour, or whatever)

  Submitted As Long    ' Tasks which were submitted during the period.
  Started As Long      ' Tasks which started during the period.
  Ended As Long        ' Tasks which ended during the period.
  Waiting As Long      ' Outstanding tasks (snapshot, start of period)
  InProgress As Long   ' Tasks in progress (snapshot, start of period)
  Active As Long       ' Total tasks active any time over the period
End Type
Populating the array initially is done like so...
StatsCount = DateDiff("h", MinDate, MaxDate)
ReDim Stats(Zero To StatsCount)
Stats(Zero).StartTime = MinDate
Stats(Zero).EndTime = DateAdd("s", 3599, MinDate)
For I = One To StatsCount
  With Stats(I)
    .StartTime = DateAdd("h", One, Stats(I - One).StartTime)
    .EndTime = DateAdd("s", 3599, .StartTime)
  End With
Next
Then I scan through the data once, using a UNION query I created in Access, containing just the three relevant date/time fields. For each record, I determine the first and last entries in the array which could be affected. Then...
Stats(Entry_Sub).Submitted = Stats(Entry_Sub).Submitted + One
Stats(Entry_Sta).Started = Stats(Entry_Sta).Started + One
Stats(Entry_End).Ended = Stats(Entry_End).Ended + One
For I = Entry_Sub To Entry_End
  With Stats(I)
    If Value_Sub <= .StartTime And Value_Sta > .StartTime Then
      .Waiting = .Waiting + One
    End If
    If Value_Sta <= .StartTime And Value_End >= .StartTime Then
      .InProgress = .InProgress + One
    End If
    If Value_Sta <= .EndTime And Value_End >= .StartTime Then
      .Active = .Active + One
    End If
  End With
Next
I hope this is making some kind of sense.

Anyway, the upshot is that I am producing what look like reasonable stats (still to be verified), and the entire 55 million records took about 34 minutes to process. And that's on my home PC, which is a bit slower than the one at work.

For now I'm just writing it out to a tab-delimited text file, and will eventually import the file to Access. What I still have to decide is how I'm going to generate the stats on an ongoing basis for each day's new data. Oh well, plenty of time for that. I probably won't have a chance to look into it until I'm on holidays. Only a couple of weeks left... :)
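For what it's worth, the VB6 approach above translates fairly directly into Python. This is only a sketch: the field names Submitted/Started/Ended are from the post, everything else (function names, sample records) is invented:

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def floor_hour(t):
    return t.replace(minute=0, second=0, microsecond=0)

def build_stats(records):
    """records: (Submitted, Started, Ended) tuples; one pass, stats in memory."""
    lo = floor_hour(min(r[0] for r in records))   # earliest Submitted
    hi = floor_hour(max(r[2] for r in records))   # latest Ended
    n = int((hi - lo) / HOUR)
    stats = [{'StartTime': lo + i * HOUR, 'Submitted': 0, 'Started': 0,
              'Ended': 0, 'Waiting': 0, 'InProgress': 0, 'Active': 0}
             for i in range(n + 1)]
    for sub, sta, end in records:
        i_sub = int((floor_hour(sub) - lo) / HOUR)
        i_sta = int((floor_hour(sta) - lo) / HOUR)
        i_end = int((floor_hour(end) - lo) / HOUR)
        stats[i_sub]['Submitted'] += 1
        stats[i_sta]['Started'] += 1
        stats[i_end]['Ended'] += 1
        # Only the buckets between submission and completion can be affected.
        for i in range(i_sub, i_end + 1):
            t0 = stats[i]['StartTime']             # start of the hour
            t1 = t0 + HOUR - timedelta(seconds=1)  # end of the hour
            if sub <= t0 < sta:
                stats[i]['Waiting'] += 1           # snapshot: submitted, not started
            if sta <= t0 <= end:
                stats[i]['InProgress'] += 1        # snapshot at start of hour
            if sta <= t1 and end >= t0:
                stats[i]['Active'] += 1            # overlapped the hour at all
    return stats

records = [
    (datetime(2006, 11, 1, 10, 15), datetime(2006, 11, 1, 10, 45), datetime(2006, 11, 1, 12, 30)),
    (datetime(2006, 11, 1, 10, 30), datetime(2006, 11, 1, 11, 20), datetime(2006, 11, 1, 11, 40)),
]
stats = build_stats(records)
```

The three If-tests mirror the Waiting/InProgress/Active conditions in the VB6 loop; like the original, each record touches only the buckets from its submission hour to its completion hour.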
Nov 26 '06 #45
