By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,903 Members | 1,977 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,903 IT Pros & Developers. It's quick & easy.

Complex(?) Access Query/Excel Question

P: 23
Hey everybody! This is my first post here, so hopefully I will be able to make it as clear as possible. I have a very large database (over 1,000,000 entries) and my current goal is to find out the number of 'sessions' that a person does during his or her lifetime in the database.

I have a table that has records of each person (each have their own unique ID), and the date of that person's visit. I have defined a session as being complete if the person does not come back for a visit in a 3 week period. After the 3 week period, if the person comes back for another visit, a new session is started.

There are thousands of unique persons in the database, and a person will often visit many many times.

My Goal: to determine how many 'sessions' each patient does.

I was thinking of exporting the Query that shows the patients and dates to Excel, and play with the data there, but as I said there are many many entries, and I am using Access/Excel 2003 so I am limited to about 65,000 rows in Excel.

I am a total novice in both Access and VBA, although quite experienced in Excel. Any help would be GREATLY appreciated!!

Thank you so much for your time, and I will try to make things more clear if there are any questions. Have a great day!

-Chris
Jul 31 '07 #1
Share this Question
Share on Google+
37 Replies


P: 55
Hey everybody! This is my first post here, so hopefully I will be able to make it as clear as possible. I have a very large database (over 1,000,000 entries) and my current goal is to find out the number of 'sessions' that a person does during his or her lifetime in the database.

I have a table that has records of each person (each have their own unique ID), and the date of that person's visit. I have defined a session as being complete if the person does not come back for a visit in a 3 week period. After the 3 week period, if the person comes back for another visit, a new session is started.

There are thousands of unique persons in the database, and a person will often visit many many times.

My Goal: to determine how many 'sessions' each patient does.

I was thinking of exporting the Query that shows the patients and dates to Excel, and play with the data there, but as I said there are many many entries, and I am using Access/Excel 2003 so I am limited to about 65,000 rows in Excel.

I am a total novice in both Access and VBA, although quite experienced in Excel. Any help would be GREATLY appreciated!!

Thank you so much for your time, and I will try to make things more clear if there are any questions. Have a great day!

-Chris

I am fairly new to this board, and answering questions. However, you should try setting criteria. So if Date() > 3 weeks (ie. 21 days) then show all the times that person has had a "session." From there you could have the count function tally up the number of times this person/s has attended one of your "sessions."
Jul 31 '07 #2

P: 23
Thanks very much for your help. I am still a bit confused though, where would I use that? In the Design portion of the Query? In VBA?

Thanks again!
Jul 31 '07 #3

P: 55
Thanks very much for your help. I am still a bit confused though, where would I use that? In the Design portion of the Query? In VBA?

Thanks again!

In the design of the query itself. There is a tab that states "criteria."
Jul 31 '07 #4

Rabbit
Expert Mod 10K+
P: 12,383
I didn't test this. It may run into problems with nulls but that can be fixed if that comes up.
Expand|Select|Wrap|Line Numbers
  1. Query1:
  2. SELECT PatientID, EventDate
  3. FROM SomeTable
  4. ORDER BY PatientID, EventDate;
  5.  
  6. Query2:
  7. SELECT PatientID, EventDate
  8. FROM Query1 AS x
  9. WHERE DateDiff("d", EventDate, (SELECT TOP 1 EventDate FROM Query1 WHERE PatientID = x.PatientID AND EventDate < x.EventDate ORDER BY EventDate DESC;)) > 21;
  10.  
Jul 31 '07 #5

P: 23
In the design of the query itself. There is a tab that states "criteria."
But how do I set it so that it calculates the difference between the two dates?
I know the criteria field you are talking about...
Jul 31 '07 #6

Rabbit
Expert Mod 10K+
P: 12,383
Seems I was right on what would cause a problem.

Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, x.EventDate
  2. FROM Query1 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 EventDate FROM Table1 WHERE PatientID = x.PatientID AND EventDate < x.EventDate ORDER BY EventDate DESC;),0),[EventDate]))>21));
  4.  
Jul 31 '07 #7

P: 23
Thanks so much for your help Rabbit.

In the code where you referenced Table 1, do you mean Query 1?

-Chris
Jul 31 '07 #8

Rabbit
Expert Mod 10K+
P: 12,383
Yeah, I meant Query1
Jul 31 '07 #9

Rabbit
Expert Mod 10K+
P: 12,383
This query doesn't do a count, it just leaves the first record that marks the beginning of a session.
Jul 31 '07 #10

Rabbit
Expert Mod 10K+
P: 12,383
For a count though, you could try:
Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, Count(x.EventDate) AS CountOfSessions
  2. FROM Query1 AS x
  3. GROUP BY PatientID
  4. HAVING (((DateDiff("d",Nz((SELECT TOP 1 EventDate FROM Query1 WHERE PatientID = x.PatientID AND EventDate < x.EventDate ORDER BY EventDate DESC;),0),[EventDate]))>21));
  5.  
Jul 31 '07 #11

P: 23
This query doesn't do a count, it just leaves the first record that marks the beginning of a session.
Awesome, that's great!

Next question (sorry for so many!)

I am running the query as you said (hopefully, anyways).

How long would you expect it to take on about 1,00,000 entries, on a Core Duo machine at 2ghz? Only one core is being used, but it is being fully utilized. The query has been running for over 10 minutes now...is something going wrong?
Jul 31 '07 #12

Rabbit
Expert Mod 10K+
P: 12,383
Awesome, that's great!

Next question (sorry for so many!)

I am running the query as you said (hopefully, anyways).

How long would you expect it to take on about 1,00,000 entries, on a Core Duo machine at 2ghz? Only one core is being used, but it is being fully utilized. The query has been running for over 10 minutes now...is something going wrong?
I would expect it to take a while. It's not just doing 1 million runs, it's doing multiple runs for each record so I expect it won't finish for a bit. I once did a non-cartesian join, joining 20,000 to 1.2 million records that that took 10 hours. I don't expect yours to take quite that long but it may be a while. If you want to see if it works, you could try grabbing a subset of the records in another table and then testing it on that.

It's going to take even longer because it has to sort the records and a million records is a lot to sort.
Jul 31 '07 #13

Rabbit
Expert Mod 10K+
P: 12,383
For a count though, you could try:
Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, Count(x.EventDate) AS CountOfSessions
  2. FROM Query1 AS x
  3. GROUP BY PatientID
  4. HAVING (((DateDiff("d",Nz((SELECT TOP 1 EventDate FROM Query1 WHERE PatientID = x.PatientID AND EventDate < x.EventDate ORDER BY EventDate DESC;),0),[EventDate]))>21));
  5.  
Scratch that, I just tested it. Although the previous may work, it's untested so to be safe it should be:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, Count(x.EventDate) AS CountOfSessions
  2. FROM Query1 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 EventDate FROM Query1 WHERE PatientID = x.PatientID AND EventDate < x.EventDate ORDER BY EventDate DESC;),0),[EventDate]))>21))
  4. GROUP BY PatientID;
  5.  
Jul 31 '07 #14

P: 23
I would expect it to take a while. It's not just doing 1 million runs, it's doing multiple runs for each record so I expect it won't finish for a bit. I once did a non-cartesian join, joining 20,000 to 1.2 million records that that took 10 hours. I don't expect yours to take quite that long but it may be a while. If you want to see if it works, you could try grabbing a subset of the records in another table and then testing it on that.

It's going to take even longer because it has to sort the records and a million records is a lot to sort.
(Hopefully) last question in this thread:

Is there a way to safely cancel the query that is currently running, so that I can try it on a subset? Or alternatively, is the a way to monitor the progress of the running query and get an estimated completion time?
Jul 31 '07 #15

Rabbit
Expert Mod 10K+
P: 12,383
There's no way to monitor the progress that I know of unless you do some VBA stuff.

I think Ctrl+Break will cancel the SQL though.
Jul 31 '07 #16

P: 55
(Hopefully) last question in this thread:

Is there a way to safely cancel the query that is currently running, so that I can try it on a subset? Or alternatively, is the a way to monitor the progress of the running query and get an estimated completion time?

Control break stops it and is this somethign you are looking for???



declare @output varchar(100)
declare @i int
declare @st datetime

set @i = 0
set @st = getdate()
while @i < 2000
begin
select @output=OrderId from Northwind.dbo.Orders
set @i = @i + 1
end
print 'First batch completed in: ' +
rtrim(cast(datediff(ss,@st,getdate()) as char(10))) +
' seconds!'
set @i = 0
set @st = getdate()
while @i < 4000
begin
select @output=OrderId from Northwind.dbo.[Order Details]
set @i = @i + 1
end
print 'Second batch completed in: ' +
rtrim(cast(datediff(ss,@st,getdate()) as char(10))) +
' seconds!'
Jul 31 '07 #17

P: 23
Scratch that, I just tested it. Although the previous may work, it's untested so to be safe it should be:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, Count(x.EventDate) AS CountOfSessions
  2. FROM Query1 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 EventDate FROM Query1 WHERE PatientID = x.PatientID AND EventDate < x.EventDate ORDER BY EventDate DESC;),0),[EventDate]))>21))
  4. GROUP BY PatientID;
  5.  
For the total noob:

Does x represent the name of the table? Or do I just leave x as 'x' ?
Jul 31 '07 #18

Rabbit
Expert Mod 10K+
P: 12,383
For the total noob:

Does x represent the name of the table? Or do I just leave x as 'x' ?
x is an alias for the table/query.

I need to refer to the same query in the subquery so I had to give the query a different name. So leave the x as is.
Jul 31 '07 #19

Rabbit
Expert Mod 10K+
P: 12,383
Control break stops it and is this somethign you are looking for???



declare @output varchar(100)
declare @i int
declare @st datetime

set @i = 0
set @st = getdate()
while @i < 2000
begin
select @output=OrderId from Northwind.dbo.Orders
set @i = @i + 1
end
print 'First batch completed in: ' +
rtrim(cast(datediff(ss,@st,getdate()) as char(10))) +
' seconds!'
set @i = 0
set @st = getdate()
while @i < 4000
begin
select @output=OrderId from Northwind.dbo.[Order Details]
set @i = @i + 1
end
print 'Second batch completed in: ' +
rtrim(cast(datediff(ss,@st,getdate()) as char(10))) +
' seconds!'
This isn't Access is it? This is Oracle? or SQL Server?
Jul 31 '07 #20

Rabbit
Expert Mod 10K+
P: 12,383
The only thing you need to change is PatientID or EventDate to match your field names and:
Query1 if you named the first query something else.
SomeTable to the original table.
Jul 31 '07 #21

P: 23
The only thing you need to change is PatientID or EventDate to match your field names and:
Query1 if you named the first query something else.
SomeTable to the original table.
Using this code:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.[Patient ID], Count(x.Date) AS CountOfSessions
  2. FROM Query1 AS x
  3. Where (((DateDiff("d",Nz((SELECT TOP 1 Date FROM Query1 WHERE [Patient ID] = x.[Patient ID] AND Date < x.Date ORDER BY Date DESC;),0),[Date]))>21)) ;
Access give me this error:

"You tried to execute a query that does not include the specified expression 'Patient ID" as part of an aggregate function.



Notes: the original code had included the word Query1 in front of Patient ID and Date, as that is the name of the Query (SELECT Query1.[Patient ID], etc.)

And: CountOfSessions, is that an arbitrary name? The name of a table?

Thanks again for all your help

*Edited. Query name is indeed Query1
Jul 31 '07 #22

P: 55
This isn't Access is it? This is Oracle? or SQL Server?

hmmm...I don't know. It might be SQL server, now that I look at it.
Jul 31 '07 #23

P: 23
Additional Information: Modifying the above code to read:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.Query1.[Patient ID] AS Expr1, Count(x.Query1.Date) AS CountOfSessions
  2. FROM Query1 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 Query1.Date FROM Query1 WHERE Query1.[Patient ID] = x.Query1.[Patient ID] AND Query1.Date < x.Query1.Date ORDER BY Query1.Date DESC;),0),[Query1].[Date]))>21));
gives me 3 windows when I try to run the query, asking me to enter a Parameter Value for x.Query1.PatientID and x.Query1.Date and Query1.Date.

Entering nothing for those 3 values (just clicking OK) gives me a CountOfSessions value of 0
Jul 31 '07 #24

Rabbit
Expert Mod 10K+
P: 12,383
Using this code:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.[Patient ID], Count(x.Date) AS CountOfSessions
  2. FROM Query1 AS x
  3. Where (((DateDiff("d",Nz((SELECT TOP 1 Date FROM Query1 WHERE [Patient ID] = x.[Patient ID] AND Date < x.Date ORDER BY Date DESC;),0),[Date]))>21)) ;
Access give me this error:

"You tried to execute a query that does not include the specified expression 'Patient ID" as part of an aggregate function.



Notes: the original code had included the word Query1 in front of Patient ID and Date, as that is the name of the Query (SELECT Query1.[Patient ID], etc.)

And: CountOfSessions, is that an arbitrary name? The name of a table?

Thanks again for all your help

*Edited. Query name is indeed Query1
Use:
Expand|Select|Wrap|Line Numbers
  1. SELECT x.[Patient ID], Count(x.Date) AS CountOfSessions
  2. FROM Query1 AS x
  3. Where (((DateDiff("d",Nz((SELECT TOP 1 Date FROM Query1 WHERE [Patient ID] = x.[Patient ID] AND Date < x.Date ORDER BY Date DESC;),0),[Date]))>21))
  4. GROUP BY x.[Patient ID];
  5.  
CountOfSessions is another Alias as the name of a field can't be Count(x.Date)
Jul 31 '07 #25

Rabbit
Expert Mod 10K+
P: 12,383
Oh, and just to be clear on something.
1/1/07
1/14/07
1/31/07
Are considered the same session because there's less than 3 weeks in between visits even though the first visit is more than 3 weeks from the last visit. Is this what you want?
Jul 31 '07 #26

Rabbit
Expert Mod 10K+
P: 12,383
Additional Information: Modifying the above code to read:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.Query1.[Patient ID] AS Expr1, Count(x.Query1.Date) AS CountOfSessions
  2. FROM Query1 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 Query1.Date FROM Query1 WHERE Query1.[Patient ID] = x.Query1.[Patient ID] AND Query1.Date < x.Query1.Date ORDER BY Query1.Date DESC;),0),[Query1].[Date]))>21));
gives me 3 windows when I try to run the query, asking me to enter a Parameter Value for x.Query1.PatientID and x.Query1.Date and Query1.Date.

Entering nothing for those 3 values (just clicking OK) gives me a CountOfSessions value of 0
That's because it has no idea what x.Query1.--- is.
Jul 31 '07 #27

P: 23
Oh, and just to be clear on something.
1/1/07
1/14/07
1/31/07
Are considered the same session because there's less than 3 weeks in between visits even though the first visit is more than 3 weeks from the last visit. Is this what you want?
That's exactly what I want. I am back to the database tomorrow (at work) so I will keep plugging away, Thanks for all your help!
Aug 1 '07 #28

P: 23
Use:
Expand|Select|Wrap|Line Numbers
  1. SELECT x.[Patient ID], Count(x.Date) AS CountOfSessions
  2. FROM Query1 AS x
  3. Where (((DateDiff("d",Nz((SELECT TOP 1 Date FROM Query1 WHERE [Patient ID] = x.[Patient ID] AND Date < x.Date ORDER BY Date DESC;),0),[Date]))>21))
  4. GROUP BY x.[Patient ID];
  5.  
CountOfSessions is another Alias as the name of a field can't be Count(x.Date)

Using that code causes Access to crash
"Microsoft office Acces has encountered a problem and needs to close. We are sorry for the inconvenience"

With the option to Repair my open database and restart Access.

Any ideas why?
Aug 1 '07 #29

Rabbit
Expert Mod 10K+
P: 12,383
Using that code causes Access to crash
"Microsoft office Acces has encountered a problem and needs to close. We are sorry for the inconvenience"

With the option to Repair my open database and restart Access.

Any ideas why?
No idea why, that's the same code I use except for different field names in my test.
Your database might be corrupted. Try a compact and repair, if that doesn't work, try importing the table into a new database and redo the query, and if that still doesn't work make a new table and new query in a new database.
Aug 1 '07 #30

P: 23
No idea why, that's the same code I use except for different field names in my test.
Your database might be corrupted. Try a compact and repair, if that doesn't work, try importing the table into a new database and redo the query, and if that still doesn't work make a new table and new query in a new database.
I just upgraded to Office 2007, same problem with the crash.
Every time it crashes Access does compact and repair.

Does it matter that Query1 is based off of another Query?
Aug 1 '07 #31

Rabbit
Expert Mod 10K+
P: 12,383
I just upgraded to Office 2007, same problem with the crash.
Every time it crashes Access does compact and repair.

Does it matter that Query1 is based off of another Query?
No, it shouldn't. Try starting over in a new database with just the bare necessities and a sample set of data and see if it works there.

All you really need is one table with PatientID and EventDate, a few records to test, and the query could run right off the table. If that works, start making more and more like the original until you hit the problem. You could try importing the stuff into the new database first but if that doesn't work try recreating a simple scenario.

All I did to test was I created a Table with two fields and I ran the query on the table. That worked fine for me. I was using 2003. I haven't used 2007 yet.
Aug 1 '07 #32

P: 23
Me again.

I made a new Query, and changed the names of the fields of the other query which was being referenced to PatientID and VisitDate

I was thinking that maybe using Date as a field name may have been causing some complications.

So my code is now:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.Query2.PatientID AS Expr1, Count(x.Query2.VisitDate) AS CountOfSessions
  2. FROM Query2 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 Query2.VisitDate FROM Query2 WHERE Query2.PatientID = x.Query2.PatientID AND Query2.VisitDate < x.Query2.VisitDate ORDER BY Query2.VisitDate DESC;),0),[Query2].[VisitDate]))>21));
Running this now doesn't crash Access, but when I run it 3 boxes pop up before the Query runs, asking me for parameter values for:
x.Query2.PatientID, x.Query2.VisitDate, and Query2.VisitDate


Any ideas?
Aug 2 '07 #33

Rabbit
Expert Mod 10K+
P: 12,383
Just x. not x.Query2, refer to post #27. And that very last VisitDate should be x. not Query2.
Aug 2 '07 #34

P: 23
Just x. not x.Query2, refer to post #27. And that very last VisitDate should be x. not Query2.
So should this be the code:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID AS Expr1, Count(x.VisitDate) AS CountOfSessions
  2. FROM Query2 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 Query2.VisitDate FROM Query2 WHERE Query2.PatientID = x.PatientID AND Query2.VisitDate < x.VisitDate ORDER BY Query2.VisitDate DESC;),0),[x].[VisitDate]))>21));
This gives the error: "You tried to execute a query that does not include the specified expression 'Expr1' as part of an aggregate function'

This seems to be the Query from doom! (For me at least)
Aug 2 '07 #35

Rabbit
Expert Mod 10K+
P: 12,383
It's now an aggregate query because of the Count in which case you need the Group By clause. Refer to post #25.

You also don't need that As Expr1
Aug 2 '07 #36

P: 23
Alright, I think I am on the right track.

I am running this query right now:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, Count(x.VisitDate) AS CountOfSessions
  2. FROM Query3 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 VisitDate FROM Query2 WHERE PatientID = x.PatientID AND VisitDate < x.VisitDate ORDER BY VisitDate DESC;),0),[VisitDate]))>21))
  4. GROUP BY x.PatientID;
Where Patientid and VisitDate are in located in Query 3.

The Query began to run with no errors, but I am worried that it may not be working as I began the query at about 9:00am EST and it is just about noon here, and the query is still chugging away.

I realize that there are a lot of entries and that it will take time (about 1.21 million entries), but how long is too long to wait?

Specs again: 2ghz Core Duo laptop with 1gig Ram. It's a shame that Access is only using one core, although it leaves me the ability to do other things on the computer.
Aug 3 '07 #37

Rabbit
Expert Mod 10K+
P: 12,383
Alright, I think I am on the right track.

I am running this query right now:

Expand|Select|Wrap|Line Numbers
  1. SELECT x.PatientID, Count(x.VisitDate) AS CountOfSessions
  2. FROM Query3 AS x
  3. WHERE (((DateDiff("d",Nz((SELECT TOP 1 VisitDate FROM Query2 WHERE PatientID = x.PatientID AND VisitDate < x.VisitDate ORDER BY VisitDate DESC;),0),[VisitDate]))>21))
  4. GROUP BY x.PatientID;
Where Patientid and VisitDate are in located in Query 3.

The Query began to run with no errors, but I am worried that it may not be working as I began the query at about 9:00am EST and it is just about noon here, and the query is still chugging away.

I realize that there are a lot of entries and that it will take time (about 1.21 million entries), but how long is too long to wait?

Specs again: 2ghz Core Duo laptop with 1gig Ram. It's a shame that Access is only using one core, although it leaves me the ability to do other things on the computer.
I have no idea, like I said, the last time I did something of this scale it took me 10 hours and my computer is better than the one you're using.

Have you tried it on a subset first to see if it works?
Aug 3 '07 #38

Post your reply

Sign in to post your reply or Sign up for a free account.