473,396 Members | 1,840 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

What sort of loop?

I have a file (70,000 records) in which family members are identified by shared Household Numbers. One person (Head) in each household has an occupation. Let's say Farmer is assigned 1 and Fisherman is assigned 2.

Making an assumption that all members of each household are economically dependent on the Head I want to assign a common value to all. So, if the household number is 375 and the Head's value is 2 then I want all of the other members of the household to be given the value 2. Move on to household number 376 and value 1 etc.

I'm a novice at this. I'm guessing that the code would be some sort of for next loop which would take the assigned value of the Head in the set of records and do while the household number is the same, i.e. Apply the same value to all of the others in the household. Then loop around to take on the next household number?

Any guidance would be much appreciated.
Nov 29 '18 #1

✓ answered by twinnyfo

Ahhh--this does make things a bit more clear. It may be wise to try to identify (long term vision here) individual "people" in a separate table that may have several other "also known as" records in another table (tblAKA, for example).

Concerning the SQL posted by NeoPa, if you can get the SELECT to work, you should also be able to get the UPDATE to work. What NeoPa provided is the framework from which to start, to make sure you can gather the data you want. Then you either convert that query into an UPDATE query, or use the results of the SELECT query to create a separate UPDATE query (kind of redundant, but there might be uses for both methods).

Hope that hepps!

Your comments hepped me understand your quandary.

20 1224
NeoPa
32,556 Expert Mod 16PB
Your question makes sense, but it would help to know what information you currently have. I would assume you have the raw data of individuals in a table. For this to make sense you'd also need a field that specifies which Household Number they have.

Do you already have a separate table of occupations which links Farmer with 1 and Fisherman with 2? If not then you will need one. I assume each person record would also have a code to indicate its occupation.

In real terms - how do you determine which person should be considered the head of a household?

It would seem that the solution is more likely to be the SQL of an UPDATE query than anything complicated in VBA, but we need more information from you before we can advise reliably. Otherwise any advice we give is based on assumptions and that should be avoided where possible.
Nov 30 '18 #2
Yes the data is in a raw table and the occupation number foreign field will be related to a table containing Farmer, Fisherman, etc. The main table has a field called HouseholdNumber. There is also a field in the main table that numbers the status of the occupants, so the head of household is number 1.
I would like a means of looking at each set of records grouped under household number and applying the occupation number to all of them that currenly is only applied to the head.
The result would be that the number of people rather than households that are economically dependent on fishing or farming could be calculated.
Nov 30 '18 #3
NeoPa
32,556 Expert Mod 16PB
Thank you. That's the information we need in order to answer your question properly.

Having also given a slightly different repetition of your requirements though, it seems there may some confusion as to what you actually need. Let me explain.

If you simply want to report on this information then a SELECT query is all that's necessary. There's no need to update the underlying table data if that's what you want. Your original question was about updating the data. That can be done but isn't necessary based on your requirement as explained differently in your later post. For now I'll assume that this is what you require but if you need the data updated for any reason then that can also be achieved.

So, to describe the query in layman's terms :
You need the main table (I'll call it [tblPerson] for now.) in the query twice. Once to represent the person, and again to represent the head of the household for that person. From that you include all the data from [tblPerson] that you need but the occupation information only from the head of household. You've included no table names so you'll have to work out what I'm referring to and switch them for whatever names you've used.
Expand|Select|Wrap|Line Numbers
  1. SELECT   [tP].[HouseholdID]
  2.        , [tP].[PersonID]
  3.        , [tP].[...]
  4.        , [tHH].[OccupationID]
  5.        , [lO].[...]
  6. FROM     ([tblPerson] AS [tP]
  7.           INNER JOIN
  8.           [tblPerson] AS [tHH]
  9.   ON      [tP].[HouseholdID]=[tHH].[HouseholdID]
  10.  AND      [tP].[Status]=1)
  11.          INNER JOIN
  12.          [lupOccupation] AS [lO]
  13.   ON     [tHH].[OccupationID]=[lO].[OccupationID]
  14. ORDER BY [tP].[HouseholdID]
  15.        , [tP].[Status]
Where I've used [...] that's for adding in any relevant fields you want/need. The basic structure should give you the data you need to work from though.
Dec 2 '18 #4
Thank you for this. I think that I get it - but can you please modify the code to include UPDATE. I shall probably use the outcome along with other relevant fields auch as sex, age etc.
Dec 2 '18 #5
NeoPa
32,556 Expert Mod 16PB
We have a problem. This is not about you simply expressing what you need and expecting someone else simply to produce it for you. We should be working together.

Convince me.
Dec 4 '18 #6
Sorry. It isn't my wish to take advantage of your kindness. Given the limited size of my database I don't see any great benefit in choosing a temporary against a permanent solution - which is why I would prefer to use Update to populate a field.
The main table is not normalised - but for good reason. There are, in some instances, ten records that probably relate to the same person. The problem is that each record is slightly different in important aspects such as name spellings, date of birth etc. In fact the differences are the justification for having the table.
I want at some point to analyse the contents on various cmbinations such as age v sex, location v occupation, economic dependency v age etc.
Dec 4 '18 #7
twinnyfo
3,653 Expert Mod 2GB
Sorry to jump in here, but I am becoming more confused as this thread goes on....

Squiddley1957:
Given the limited size of my database...
70,000 records is a significant sized database, almost no matter how you slice it.

Squiddley1957:
The main table is not normalised - but for good reason. There are, in some instances, ten records that probably relate to the same person. The problem is that each record is slightly different in important aspects such as name spellings, date of birth etc.
There may be many valid reasons why one would not normalize a table--these should be few and far between. However, multiple records in a table such as this should never refer to the same person (@NeoPa, you may freely correct me if I am out of my mind). What makes your case more confusing is that you say that the same person may have different spellings of their name and different dates of birth? This sounds like a terrible mess--my sincerest condolences if you inherited this data! But, a stern warning to fix the data first--especially before you start trying to analyze the data and determine corelations and co-dependencies based upon that data. I simply think that you will not have anything that resembles reliable analysis if the data is this jumbled.

This is not meant as a back-hand across the face, but as an advisory to be very careful with this data set.

Fix the data first, then start looking at how the data works together.

Hope this hepps!
Dec 4 '18 #8
Hello Twinnyfo

Thanks for your post. The table comprises mainly of historical data drawn in from different sources - this last point is important. An example: James Trip, b1855/James T Trip b1856/James Tripp b1856/James Tripcony b1855/James Thomas Tripconey b1860. They almost certainly refer to the same person but there is no efficient way to prove it. Even if there were it is important to retain the spellings from the different sources. It's not a terrible mess - it's a reflection of the way that the data was first gleaned.
Of course I have pulled out as much dependent data as possible but the fact remains that the table must reflect the historical reality and not the convenience of the database.
Dec 4 '18 #9
Further..
What this means is that someone who has viewed the source data can find the record on this system. That record includes a unique id added by me. That unique id then permits the viewer to see all of the other records that may refer to the subject - and all of the spelling variants. They can make their mind up if the records refer to the same person.
Dec 4 '18 #10
twinnyfo
3,653 Expert Mod 2GB
Ahhh--this does make things a bit more clear. It may be wise to try to identify (long term vision here) individual "people" in a separate table that may have several other "also known as" records in another table (tblAKA, for example).

Concerning the SQL posted by NeoPa, if you can get the SELECT to work, you should also be able to get the UPDATE to work. What NeoPa provided is the framework from which to start, to make sure you can gather the data you want. Then you either convert that query into an UPDATE query, or use the results of the SELECT query to create a separate UPDATE query (kind of redundant, but there might be uses for both methods).

Hope that hepps!

Your comments hepped me understand your quandary.
Dec 4 '18 #11
twinnyfo
3,653 Expert Mod 2GB
We crossed posts--looks like my first paragraph is mostly covered!

:-)
Dec 4 '18 #12
I had thought of an aka table but on deep reflection I couldn't see that it would save anything. I would still need the individual names in the main table along with the varying dates of birth etc. I also considered using a soundex approach but, again, that does nothing for the dates of birth. Thanks for your help.
Dec 4 '18 #13
NeoPa
32,556 Expert Mod 16PB
I can see that you usage of a database for this data is an outlying case. Very difficult to process this sort of data in any meaningful way if, as it seems, it's mainly required as storage for historical data. So, I'm duly convinced by your recent posts that you're pulling along with us on this one in a task that's off the beaten path :-)

Amusingly, when Twinny asked if his earlier statement was correct, I wondered if I should respond by saying "Well, actually, there are outliers that might not correspond to those rules exactly.", but you jumped in anyway with your explanation which is a perfect example of that.

As he's already said, the SQL for an UPDATE query can be derived from the SELECT example posted. Have a go at that yourself and, if you struggle, post back here with the details of your SQL and we can help you further.
Dec 4 '18 #14
NeoPa
32,556 Expert Mod 16PB
Squiddley1957:
I had thought of an aka table but on deep reflection I couldn't see that it would save anything.
It wouldn't save anything as such, but it may add value in that it could be used for such things as determining the number of items in the set. This assumes it would be straightforward to create of course. I suspect that may turn out to be an issue. Remember though, that we tend to throw ideas up that are conceptual rather than necessarily practical in your particular circumstances. Mainly because we cannot possibly expect to understand them as well as you can. You consider the ideas in conjunction with what you know of those circumstances and accept/reject as seems appropriate to you.
Dec 4 '18 #15
Thanks for all your help
Dec 4 '18 #16
Rabbit
12,516 Expert Mod 8TB
I also would prefer to just use a select query and not update the original data. The query itself should run relatively quickly and it's never a bad idea to keep the original data.

As far as the fuzzy name matching goes, we have various articles on this forum that go over different methods of approaching that. I've found soundex to be wildly inaccurate.

Double Metaphone
Levenshtein
N-grams

As far as the birthdates go, you can calculate the difference between the two. Combine that with a name matching score and you can come up with a confidence score of how closely 2 entities match.
Dec 4 '18 #17
Thanks Rabbit - I've had a quick look at the first article - vely interlesting. With regard to my database, as I explained, someone who fastened on a name from the original source would search on that. That is why it is important that all versions of the name are retained. I understand that, in principle, it's better to use a temporary field but I don't think that much is gained in time terms - not on a database of this size. Perhaps I will try both approaches and see what happens.
Dec 4 '18 #18
Rabbit
12,516 Expert Mod 8TB
someone who fastened on a name from the original source would search on that. That is why it is important that all versions of the name are retained.
No one is saying you shouldn't keep the different versions of the name found in the raw data. In fact, we're saying the same thing you're saying, that you should keep the original data.

I understand that, in principle, it's better to use a temporary field but I don't think that much is gained in time terms - not on a database of this size.
We're not saying using a "temporary" field (calculated field is what we are referring to) will gain you time. The opposite, it's a small time cost. We're saying it maintains the fidelity of the original data. Which is what you're saying you want to keep.
Dec 4 '18 #19
Hello Rabbit
I'm conscious that this might sound ungrateful - Far from it. I get it that one wouldn't want to affect the integrity of data by removing or altering some part - but I don't see the harm in adding to it; after all data rarely comes ready set with a unique id.
Dec 4 '18 #20
Rabbit
12,516 Expert Mod 8TB
Adding to it is fine, no one is saying you can't add to the data. It's your choice whether you add to it using a query or add to it by inserting related entities into an "also known as" table like TwinnyFo suggests. All we are saying is you should avoid updating the original data if possible. Augmenting the original data is perfectly fine and expected, we are not saying you shouldn't do that. It's just that your earlier posts say you want to update the original data from one value to another value.
Dec 4 '18 #21

Sign in to post your reply or Sign up for a free account.

Similar topics

9
by: JasBascom | last post by:
say i had 97456 and 23456 is there already a sort function to check which one begins with the smaller number rearrange it. in this case the bottom number should clearly be rearranged to the...
99
by: Shi Mu | last post by:
Got confused by the following code: >>> a >>> b >>> c {1: , ], 2: ]} >>> c.append(b.sort()) >>> c {1: , ], 2: , None]}
1
by: Eric Lindsay | last post by:
Most of my old navigation links take the form <p> <a href ...> ... </a> <br> <a href ...> ... </a> <br> etc. I would like to change the navigation to lists. However I can not decide whether...
2
by: Bob | last post by:
Hi, I have a Windows app project that generates some data. This data now has to be sent to a web server as a request and hopefully it will get a response. The request / response format has been...
4
by: Byte | last post by:
The following code will not work for me: x = 1 while x == 1: print 'hello' x = input('What is x now?: ') while x == 2: print 'hello again'
5
by: Karyn Williams | last post by:
I am new to Pyton. I am trying to modify and understand a script someone else wrote. I am trying to make sense of the following code snippet. I know line 7 would be best coded with regex. I first...
3
by: biddy2 | last post by:
Hi all I am trying to wirte a program to sort an arry of size 10 into ascending order. I also need to ensure that after the second pass it only makes eight comparisons , seven compairisions on the...
6
by: Protoman | last post by:
Hi! I'm currently a HS sophomore who has passed the California High School Proficency Exam, and will be starting Cerritos College soon --I'll get my AA in Mathematics/Economics; I'll then...
4
by: Kapteyn's Star | last post by:
Hi all, Can anyone identify this sort code? why is it very slow compard to qsort()? and it can be improved? Thanks. int *next_ascending(int *arr, size_t narr, int val) { size_t ctr; int...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.