By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,589 Members | 1,194 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,589 IT Pros & Developers. It's quick & easy.

Back to the basics...

P: n/a
Folks,

I'm just now beginning to use Access. To date, I've managed hundreds
of thousands of records in SAS, but would like to expand my software
horizon. I've read some and sat throught some classes, but one of
things that I can't seem to get my head wrapped around is this: if
you're given a large table that has missing and incomplete records and
you want to clean that data up, (replace missing dates using the
previous record and randomly assign records a value based on a number
of criteria for example), do you actually modify the original table or
simpy create queries that create "temporary" tables that contain the
new and improved records?

I know that in SAS, you may have several data steps and several
temporary data sets, but when you get the data like you want it, you
create a permanent SAS data set that is stored on your computer or a
server. In the end, your raw data remains in tact and you also have a
cleaned copy of that data for subsequent analyses. All of the
intermediate data sets are gone when you exit SAS. Perhaps queries in
Access can be viewed as intermediate data steps. And I suppose that if
you wanted, you could create a permanent copy of the cleaned table once
you got the data just the way you wanted it. Does this make any sense
at all?

Mike

Apr 18 '06 #1
Share this Question
Share on Google+
8 Replies


P: n/a
only if every field in your table is really effed up. no need to create
a completely seperate table.

i'd say the approach i prefer is to just use update queries to
supplement data where missing using an Is Null criteria in the records
i'd like to change. you may have to supplement that Is Null to handle
empty strings. allen browne has a good discussion on nulls and empty
sets on his website.

if your data is in really bad shape you may want to create an Append
query to move data out of the current table and into a new one, but to
avoid asynchronous data you'll have to be very careful to specify
absolutely all correctable possibilities when the table is first
updated.

like i said, i'd use Update queries and take it field by field or a
couple fields at a time and keep things in a more or less piecemeal
fashion to avoid missing things.

if you are pretty well rebuilding a table from the ground-up, you'll
probably want to take a close look at your data normalization at the
same time to ensure you don't have to go through this process more than
once.

Apr 18 '06 #2

P: n/a
Often, you will have a raw table of data.

You certainly might make a copy of this table, and then work on it....

You might run several update queries to set a date value for all columns
that don't have a date.

You then might delete all recodes that have some status value of a
particular value. You can continue these updates and cleaning as much as you
want.

However, each time you make a update, or clean out some data, you are fact
modifying the table you are working on.

If you just are doing some selects and query of data, then the results are
not saved anywhere, but you can save the query. This is an advantage in MOST
cases, since if you have a query that returns the data you want, and then
main table is modified, then that query will always return the correct data.
(if you add, or delete records in the table..then the query will reflect
his). So, a query is not really a snapshot of the data, and in MOST cases
you don't want that.

However, if you finally do get your query the way you want, you can simply
change the query in the sql builder to a append query..and run it....you
will then have new table with just this data. So, it is very easy to "save"
the data you slice and dice..it just a question of you deciding to do this
(access does not have some type of session or such...but it is easy and up
to you if you want to save the data to a new table for history reasons.

It is also not clear how your previously cleaned up data, and removed data,
and set values of data without suing some type of query, or programming
system that let you do this....(or, perhaps you used sql in these other
systems...and you can/should do the same in ms-access).

In ms-access, we typically use the query builder to modify and clean up
data....

--
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
pl*****************@msn.com
http://www.members.shaw.ca/AlbertKallal
Apr 18 '06 #3

P: n/a
Hey Jamey - Thanks a lot for the insight. That's exactly the kind of
information I was looking for. I don't see the need to rebuild the
table from ground zero, so it sounds as though update queries are the
way to go. Do you have a link to Allen Browne's discussion of nulls
and empty sets?

Have a great day. I'm sure you'll be seeing my name a lot as I dive
into this stuff. Can't wait!

Mike

Apr 18 '06 #4

P: n/a
Albert,

Thanks for the help. It really cleared things up. The only queries
I've ever done have been select queries. I couldn't visualize how you
could do much with such a query. It sounds as though Update queries
are what I really need to be thinking about. As for your question
about how I did things before, here is a snipet of SAS code that is
used to assign SEASON OF HARVEST to each deer:

DATA TEMP.L;

LENGTH SEASON $ 40;

SET TEMP.K;

IF CS IN (9999,999)

THEN SEASON='NASA';

ELSE IF CS IN (9998,998)

THEN SEASON='RAVENNA';

ELSE IF CS=930

THEN SEASON='SALT FORK';

* ELSE IF CS=992

THEN SEASON='SHAWNEE';

ELSE IF CS=991

THEN SEASON='WILDCAT';

ELSE IF (CS=7307 AND DATE IN
('24OCT05'D,'25OCT05'D,'26OCT05'D,'27OCT05'D,

'28OCT05'D,'29OCT05'D) AND
WEAPON=4)
THEN SEASON='SHAWNEE';

ELSE IF DATE IN ('19NOV05'D,'20NOV05'D) AND WEAPON IN (1,4,5)

THEN SEASON='YOUTH';

ELSE IF DATE IN
('28NOV05'd,'29NOV05'd,'30NOV05'd,'01DEC05'd,'02DE C05'd,

'03DEC05'd,'04DEC05'd)

THEN SEASON='GUN';

ELSE IF DATE IN ('27DEC05'D,'28DEC05'D,'29DEC05'D,'30DEC05'D)

THEN SEASON='SWML';

RUN;

****
The code below ADDRESSES DATE ISSUES - CONVERTS DATE VARIABLES TO SAS
DATE VALUE AND ASSIGNS A
DATE TO RECORDS WITH MISSING
VALUES

****
DATA TEMP.A;

LENGTH ORIGINALDATE $ 30;

SET SASUSER.QURY5798;

IF ('01OCT05'd LE DATE LE '31JAN06'd) THEN DO;

ORIGINALDATE='GOOD';

END;

ELSE DO;

ORIGINALDATE='BAD';

END;

RUN;

DATA TEMP.B;

RETAIN NEWDATE_RETAINED;

SET;

IF ORIGINALDATE='GOOD' THEN DO;

NEWDATE_RETAINED=DATE;

END;

DATA TEMP.C;

SET;

DROP DATE;

DATA TEMP.D;

SET;

RENAME NEWDATE_RETAINED=DATE;

RUN;
There are about 1/2 dozen or so of these sorts of operations that are
used to clean the data up. I have another section that makes sure that
all the county names are valid and so on.

Would these sorts of things be difficult to do with update queries?

Mike

Apr 18 '06 #5

P: n/a
Takeadoe wrote:
Albert,

Thanks for the help. It really cleared things up. The only queries
I've ever done have been select queries. I couldn't visualize how you
could do much with such a query. It sounds as though Update queries
are what I really need to be thinking about. As for your question
about how I did things before, here is a snipet of SAS code that is
used to assign SEASON OF HARVEST to each deer:

DATA TEMP.L;

LENGTH SEASON $ 40;

SET TEMP.K;

IF CS IN (9999,999)

THEN SEASON='NASA';

ELSE IF CS IN (9998,998)

THEN SEASON='RAVENNA';

ELSE IF CS=930

THEN SEASON='SALT FORK';

* ELSE IF CS=992

THEN SEASON='SHAWNEE';

ELSE IF CS=991

THEN SEASON='WILDCAT';

ELSE IF (CS=7307 AND DATE IN
('24OCT05'D,'25OCT05'D,'26OCT05'D,'27OCT05'D,

'28OCT05'D,'29OCT05'D) AND
WEAPON=4)
THEN SEASON='SHAWNEE';

ELSE IF DATE IN ('19NOV05'D,'20NOV05'D) AND WEAPON IN (1,4,5)

THEN SEASON='YOUTH';

ELSE IF DATE IN
('28NOV05'd,'29NOV05'd,'30NOV05'd,'01DEC05'd,'02DE C05'd,

'03DEC05'd,'04DEC05'd)

THEN SEASON='GUN';

ELSE IF DATE IN ('27DEC05'D,'28DEC05'D,'29DEC05'D,'30DEC05'D)

THEN SEASON='SWML';

RUN;

****
The code below ADDRESSES DATE ISSUES - CONVERTS DATE VARIABLES TO SAS
DATE VALUE AND ASSIGNS A
DATE TO RECORDS WITH MISSING
VALUES

****
DATA TEMP.A;

LENGTH ORIGINALDATE $ 30;

SET SASUSER.QURY5798;

IF ('01OCT05'd LE DATE LE '31JAN06'd) THEN DO;

ORIGINALDATE='GOOD';

END;

ELSE DO;

ORIGINALDATE='BAD';

END;

RUN;

DATA TEMP.B;

RETAIN NEWDATE_RETAINED;

SET;

IF ORIGINALDATE='GOOD' THEN DO;

NEWDATE_RETAINED=DATE;

END;

DATA TEMP.C;

SET;

DROP DATE;

DATA TEMP.D;

SET;

RENAME NEWDATE_RETAINED=DATE;

RUN;
There are about 1/2 dozen or so of these sorts of operations that are
used to clean the data up. I have another section that makes sure that
all the county names are valid and so on.

Would these sorts of things be difficult to do with update queries?

Mike

You can do all sorts of stuff with queries; append, update, delete,
total/sum, etc. Once you have your records in a table, prior to running
any "action" queries, make a copy of the table.

I notice the dates are in a format that is not a "date" for access, its
a string. You might need to fix that...or create a column that makes a
valid "date". See DateSerial in help. The reason is I saw a criteria
for dates in a range.

You mentioned you may be posting many questions as you dig into Access.
One of the best sources of information is GoogleGroups since many of
us have been there/done that. See
http://groups.google.com/advanced_search?hl=en. In the group box I
usually enter *Access* and then the keywords I'm searching for to only
return answers from those using MSAccess.

Another place is http://www.mvps.org/access for code examples.

In queries, here's a basic tip: Surround strings with " or '. If
looking for the word Test, in the criteria row enter "Test". If Date,
surround by #. Ex: Between #1/1/2006# And #4/15/2006#. And if a
number, nothing but the number.

I think you'll find the querybuilder intuitive...but you don't want to
hose the table when trying out something new so that's why I mention
always having a master you can return to if the need to start over
occurs. BTW...if you create an action query like an Update, you can
press the View icon and it basically acts like a Select query. If you
press the ! exclamation button, it will run the query.


Apr 18 '06 #6

P: n/a
Salad - Hey thanks again for the leads. The MVPS web site looks like
it will be pretty handy! I'm pretty stoked about all of this stuff! I
can't wait to do my first query. Since I've already cleaned up this
year data using SAS, yet still have the original "uncleaned" data in an
Access table, I'm going to try and pick away at this 1 step at a time
and see if I can't get the Access table to look like the final SAS data
set. You've probably already seen my question about the replacing
missing and invalid date values. I appreciate the leads and the help.
I will do my best to learn as much by example and limit my questions.
The last thing I want is for people to think they're doing my work.

Take care,

Mike

Apr 18 '06 #7

P: n/a
Thank you.

Mike

Apr 24 '06 #9

This discussion thread is closed

Replies have been disabled for this discussion.