423,846 Members | 1,959 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 423,846 IT Pros & Developers. It's quick & easy.

What sort of loop?

P: 21
I have a file (70,000 records) in which family members are identified by shared Household Numbers. One person (Head) in each household has an occupation. Let's say Farmer is assigned 1 and Fisherman is assigned 2.

Making an assumption that all members of each household are economically dependent on the Head I want to assign a common value to all. So, if the household number is 375 and the Head's value is 2 then I want all of the other members of the household to be given the value 2. Move on to household number 376 and value 1 etc.

I'm a novice at this. I'm guessing that the code would be some sort of for next loop which would take the assigned value of the Head in the set of records and do while the household number is the same, i.e. Apply the same value to all of the others in the household. Then loop around to take on the next household number?

Any guidance would be much appreciated.
2 Weeks Ago #1

✓ answered by twinnyfo

Ahhh--this does make things a bit more clear. It may be wise to try to identify (long term vision here) individual "people" in a separate table that may have several other "also known as" records in another table (tblAKA, for example).

Concerning the SQL posted by NeoPa, if you can get the SELECT to work, you should also be able to get the UPDATE to work. What NeoPa provided is the framework from which to start, to make sure you can gather the data you want. Then you either convert that query into an UPDATE query, or use the results of the SELECT query to create a separate UPDATE query (kind of redundant, but there might be uses for both methods).

Hope that hepps!

Your comments hepped me understand your quandary.

Share this Question
Share on Google+
20 Replies


NeoPa
Expert Mod 15k+
P: 31,121
Your question makes sense, but it would help to know what information you currently have. I would assume you have the raw data of individuals in a table. For this to make sense you'd also need a field that specifies which Household Number they have.

Do you already have a separate table of occupations which links Farmer with 1 and Fisherman with 2? If not then you will need one. I assume each person record would also have a code to indicate its occupation.

In real terms - how do you determine which person should be considered the head of a household?

It would seem that the solution is more likely to be the SQL of an UPDATE query than anything complicated in VBA, but we need more information from you before we can advise reliably. Otherwise any advice we give is based on assumptions and that should be avoided where possible.
2 Weeks Ago #2

P: 21
Yes the data is in a raw table and the occupation number foreign field will be related to a table containing Farmer, Fisherman, etc. The main table has a field called HouseholdNumber. There is also a field in the main table that numbers the status of the occupants, so the head of household is number 1.
I would like a means of looking at each set of records grouped under household number and applying the occupation number to all of them that currenly is only applied to the head.
The result would be that the number of people rather than households that are economically dependent on fishing or farming could be calculated.
1 Week Ago #3

NeoPa
Expert Mod 15k+
P: 31,121
Thank you. That's the information we need in order to answer your question properly.

Having also given a slightly different repetition of your requirements though, it seems there may some confusion as to what you actually need. Let me explain.

If you simply want to report on this information then a SELECT query is all that's necessary. There's no need to update the underlying table data if that's what you want. Your original question was about updating the data. That can be done but isn't necessary based on your requirement as explained differently in your later post. For now I'll assume that this is what you require but if you need the data updated for any reason then that can also be achieved.

So, to describe the query in layman's terms :
You need the main table (I'll call it [tblPerson] for now.) in the query twice. Once to represent the person, and again to represent the head of the household for that person. From that you include all the data from [tblPerson] that you need but the occupation information only from the head of household. You've included no table names so you'll have to work out what I'm referring to and switch them for whatever names you've used.
Expand|Select|Wrap|Line Numbers
  1. SELECT   [tP].[HouseholdID]
  2.        , [tP].[PersonID]
  3.        , [tP].[...]
  4.        , [tHH].[OccupationID]
  5.        , [lO].[...]
  6. FROM     ([tblPerson] AS [tP]
  7.           INNER JOIN
  8.           [tblPerson] AS [tHH]
  9.   ON      [tP].[HouseholdID]=[tHH].[HouseholdID]
  10.  AND      [tP].[Status]=1)
  11.          INNER JOIN
  12.          [lupOccupation] AS [lO]
  13.   ON     [tHH].[OccupationID]=[lO].[OccupationID]
  14. ORDER BY [tP].[HouseholdID]
  15.        , [tP].[Status]
Where I've used [...] that's for adding in any relevant fields you want/need. The basic structure should give you the data you need to work from though.
1 Week Ago #4

P: 21
Thank you for this. I think that I get it - but can you please modify the code to include UPDATE. I shall probably use the outcome along with other relevant fields auch as sex, age etc.
1 Week Ago #5

NeoPa
Expert Mod 15k+
P: 31,121
We have a problem. This is not about you simply expressing what you need and expecting someone else simply to produce it for you. We should be working together.

Convince me.
1 Week Ago #6

P: 21
Sorry. It isn't my wish to take advantage of your kindness. Given the limited size of my database I don't see any great benefit in choosing a temporary against a permanent solution - which is why I would prefer to use Update to populate a field.
The main table is not normalised - but for good reason. There are, in some instances, ten records that probably relate to the same person. The problem is that each record is slightly different in important aspects such as name spellings, date of birth etc. In fact the differences are the justification for having the table.
I want at some point to analyse the contents on various cmbinations such as age v sex, location v occupation, economic dependency v age etc.
1 Week Ago #7

twinnyfo
Expert Mod 2.5K+
P: 2,703
Sorry to jump in here, but I am becoming more confused as this thread goes on....

Squiddley1957:
Given the limited size of my database...
70,000 records is a significant sized database, almost no matter how you slice it.

Squiddley1957:
The main table is not normalised - but for good reason. There are, in some instances, ten records that probably relate to the same person. The problem is that each record is slightly different in important aspects such as name spellings, date of birth etc.
There may be many valid reasons why one would not normalize a table--these should be few and far between. However, multiple records in a table such as this should never refer to the same person (@NeoPa, you may freely correct me if I am out of my mind). What makes your case more confusing is that you say that the same person may have different spellings of their name and different dates of birth? This sounds like a terrible mess--my sincerest condolences if you inherited this data! But, a stern warning to fix the data first--especially before you start trying to analyze the data and determine corelations and co-dependencies based upon that data. I simply think that you will not have anything that resembles reliable analysis if the data is this jumbled.

This is not meant as a back-hand across the face, but as an advisory to be very careful with this data set.

Fix the data first, then start looking at how the data works together.

Hope this hepps!
1 Week Ago #8

P: 21
Hello Twinnyfo

Thanks for your post. The table comprises mainly of historical data drawn in from different sources - this last point is important. An example: James Trip, b1855/James T Trip b1856/James Tripp b1856/James Tripcony b1855/James Thomas Tripconey b1860. They almost certainly refer to the same person but there is no efficient way to prove it. Even if there were it is important to retain the spellings from the different sources. It's not a terrible mess - it's a reflection of the way that the data was first gleaned.
Of course I have pulled out as much dependent data as possible but the fact remains that the table must reflect the historical reality and not the convenience of the database.
1 Week Ago #9

P: 21
Further..
What this means is that someone who has viewed the source data can find the record on this system. That record includes a unique id added by me. That unique id then permits the viewer to see all of the other records that may refer to the subject - and all of the spelling variants. They can make their mind up if the records refer to the same person.
1 Week Ago #10

twinnyfo
Expert Mod 2.5K+
P: 2,703
Ahhh--this does make things a bit more clear. It may be wise to try to identify (long term vision here) individual "people" in a separate table that may have several other "also known as" records in another table (tblAKA, for example).

Concerning the SQL posted by NeoPa, if you can get the SELECT to work, you should also be able to get the UPDATE to work. What NeoPa provided is the framework from which to start, to make sure you can gather the data you want. Then you either convert that query into an UPDATE query, or use the results of the SELECT query to create a separate UPDATE query (kind of redundant, but there might be uses for both methods).

Hope that hepps!

Your comments hepped me understand your quandary.
1 Week Ago #11

twinnyfo
Expert Mod 2.5K+
P: 2,703
We crossed posts--looks like my first paragraph is mostly covered!

:-)
1 Week Ago #12

P: 21
I had thought of an aka table but on deep reflection I couldn't see that it would save anything. I would still need the individual names in the main table along with the varying dates of birth etc. I also considered using a soundex approach but, again, that does nothing for the dates of birth. Thanks for your help.
1 Week Ago #13

NeoPa
Expert Mod 15k+
P: 31,121
I can see that you usage of a database for this data is an outlying case. Very difficult to process this sort of data in any meaningful way if, as it seems, it's mainly required as storage for historical data. So, I'm duly convinced by your recent posts that you're pulling along with us on this one in a task that's off the beaten path :-)

Amusingly, when Twinny asked if his earlier statement was correct, I wondered if I should respond by saying "Well, actually, there are outliers that might not correspond to those rules exactly.", but you jumped in anyway with your explanation which is a perfect example of that.

As he's already said, the SQL for an UPDATE query can be derived from the SELECT example posted. Have a go at that yourself and, if you struggle, post back here with the details of your SQL and we can help you further.
1 Week Ago #14

NeoPa
Expert Mod 15k+
P: 31,121
Squiddley1957:
I had thought of an aka table but on deep reflection I couldn't see that it would save anything.
It wouldn't save anything as such, but it may add value in that it could be used for such things as determining the number of items in the set. This assumes it would be straightforward to create of course. I suspect that may turn out to be an issue. Remember though, that we tend to throw ideas up that are conceptual rather than necessarily practical in your particular circumstances. Mainly because we cannot possibly expect to understand them as well as you can. You consider the ideas in conjunction with what you know of those circumstances and accept/reject as seems appropriate to you.
1 Week Ago #15

P: 21
Thanks for all your help
1 Week Ago #16

Rabbit
Expert Mod 10K+
P: 12,279
I also would prefer to just use a select query and not update the original data. The query itself should run relatively quickly and it's never a bad idea to keep the original data.

As far as the fuzzy name matching goes, we have various articles on this forum that go over different methods of approaching that. I've found soundex to be wildly inaccurate.

Double Metaphone
Levenshtein
N-grams

As far as the birthdates go, you can calculate the difference between the two. Combine that with a name matching score and you can come up with a confidence score of how closely 2 entities match.
1 Week Ago #17

P: 21
Thanks Rabbit - I've had a quick look at the first article - vely interlesting. With regard to my database, as I explained, someone who fastened on a name from the original source would search on that. That is why it is important that all versions of the name are retained. I understand that, in principle, it's better to use a temporary field but I don't think that much is gained in time terms - not on a database of this size. Perhaps I will try both approaches and see what happens.
1 Week Ago #18

Rabbit
Expert Mod 10K+
P: 12,279
someone who fastened on a name from the original source would search on that. That is why it is important that all versions of the name are retained.
No one is saying you shouldn't keep the different versions of the name found in the raw data. In fact, we're saying the same thing you're saying, that you should keep the original data.

I understand that, in principle, it's better to use a temporary field but I don't think that much is gained in time terms - not on a database of this size.
We're not saying using a "temporary" field (calculated field is what we are referring to) will gain you time. The opposite, it's a small time cost. We're saying it maintains the fidelity of the original data. Which is what you're saying you want to keep.
1 Week Ago #19

P: 21
Hello Rabbit
I'm conscious that this might sound ungrateful - Far from it. I get it that one wouldn't want to affect the integrity of data by removing or altering some part - but I don't see the harm in adding to it; after all data rarely comes ready set with a unique id.
1 Week Ago #20

Rabbit
Expert Mod 10K+
P: 12,279
Adding to it is fine, no one is saying you can't add to the data. It's your choice whether you add to it using a query or add to it by inserting related entities into an "also known as" table like TwinnyFo suggests. All we are saying is you should avoid updating the original data if possible. Augmenting the original data is perfectly fine and expected, we are not saying you shouldn't do that. It's just that your earlier posts say you want to update the original data from one value to another value.
1 Week Ago #21

Post your reply

Sign in to post your reply or Sign up for a free account.