By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,269 Members | 1,561 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,269 IT Pros & Developers. It's quick & easy.

regrouping data in large dataset

P: n/a
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?

The structure of the data set is the following (heading fields in first
row and columns separated by commas):

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing,1004
France, 2001, Pepsi, marketing,300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, selling Office, 2400
France, 2001, McDonalds, expense Office, 334000
Japan, 2002, Amex, financing, 2400
France, 2003, Amex, selling Office, 23
Korea, 2001, Amex, admin. Office, 333
Korea, 2001, Amex, management, 22
USA, 2002, McDonalds, brands, 3340

Basically, all the data must be regrouped according to Pepsi`s
departmental categories only which are:

sales
marketing
finance
human Resources
management

The following are the equivalent names of the departments for Pepsi`s
departments in the other companies (columns separated by "/"). Some of
the Pepsi departments are also the sum of two smaller departments in the
other companies:

Pepsi / McDonalds / Amex
Sales / Sales Department / Selling Office
Marketing / Brands+Market Office / Marketing(US)+Marketing(Europe)
Finance / Expense Office / Financing
Human Resources / HR / People Development + HR office
Management / Administration /Admin. Office + Management

So the above data set needs to be converted to:

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing, 1004
France, 2001, Pepsi, marketing, 300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, sales,2400
France, 2001,McDonalds, finance,334000
Japan, 2002,Amex, finance, 2400
France, 2003, Amex, sales, 23
Korea, 2001, Amex, management, 333
Korea, 2001, Amex,management,22
USA, 2002, McDonalds, marketing,3340

Thank you for your suggestions.

Best regards,

Dan Graziano


*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 13 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Are you importing this data from a text file? If so you could probably
use VBA & a SELECT statement to "normalize" the department names:

Select Case strCompany
Case "Amex"
Select Case strDepartment
Case "Selling Office"
strDepartment = "sales"
Case "Financing"
strDepartment = "finance"
.... etc. ...

Or (my favorite), you could create a "translation" table that holds the
info for each company/department & it's translation to Pepsi's
department names:

CREATE TABLE DeptTranslations (
CompanyID Long NOT NULL , -- link to Companies table
DeptName TEXT(30) NOT NULL , -- Company's name for dept.
PepsiDeptName TEXT(30) NOT NULL , -- Pepsi's name for dept.
CONSTRAINT PK_DeptTranslations PRIMARY KEY (CompanyID, DeptName)
)

The primary key allows only one DeptName per Company ID.

Load the DeptTranslations table w/ appropriate data. E.g.:

Company IDs: McDonalds = 2; Amex = 3

CompanyID DeptName PepsiDeptName
- --------- -------------------- ----------------
2 Sales Department Sales
2 Brands Marketing
2 Market Office Marketing
3 People Development Human Resources
3 HR Office Human Resources
2 Administration Management
.... etc. ...

Import the raw data (that which needs to be translated) to a work table.
Run a query like this to translate the dept name to a Pepsi dept. name:

SELECT C.CompanyName, DT.PepsiDeptName
FROM Work_table AS WT LEFT JOIN (DeptTranslations AS DT INNER JOIN
Companies As C ON DT.CompanyID = C.Companies) ON WT.Company =
C.CompanyName

Include this query in your other queries that need to translate the
dept. names link by the C.CompanyName.

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBQRKLSoechKqOuFEgEQIjawCdEtOppjOaDnd5AgT2PbKk/GdrM4oAoICZ
+BWH5xy8OCy47/IDtT2ukx0x
=r5Rh
-----END PGP SIGNATURE-----
dan graziano wrote:
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?

The structure of the data set is the following (heading fields in first
row and columns separated by commas):

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing,1004
France, 2001, Pepsi, marketing,300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, selling Office, 2400
France, 2001, McDonalds, expense Office, 334000
Japan, 2002, Amex, financing, 2400
France, 2003, Amex, selling Office, 23
Korea, 2001, Amex, admin. Office, 333
Korea, 2001, Amex, management, 22
USA, 2002, McDonalds, brands, 3340

Basically, all the data must be regrouped according to Pepsi`s
departmental categories only which are:

sales
marketing
finance
human Resources
management

The following are the equivalent names of the departments for Pepsi`s
departments in the other companies (columns separated by "/"). Some of
the Pepsi departments are also the sum of two smaller departments in the
other companies:

Pepsi / McDonalds / Amex
Sales / Sales Department / Selling Office
Marketing / Brands+Market Office / Marketing(US)+Marketing(Europe)
Finance / Expense Office / Financing
Human Resources / HR / People Development + HR office
Management / Administration /Admin. Office + Management

So the above data set needs to be converted to:

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing, 1004
France, 2001, Pepsi, marketing, 300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, sales,2400
France, 2001,McDonalds, finance,334000
Japan, 2002,Amex, finance, 2400
France, 2003, Amex, sales, 23
Korea, 2001, Amex, management, 333
Korea, 2001, Amex,management,22
USA, 2002, McDonalds, marketing,3340


Nov 13 '05 #2

P: n/a

MGFoster,

Thanks for the suggestions on regrouping the data. Probably I should
have just put the data in columns rather than separating the columns by
commas in the sample dataset in the message, actually the data in the
large dataset are in columns in an excel spreadsheet (I had just
separated the columns by commas in the developersdex message posting as
it was easier than spacing the columns).

As the data are in an excel spreadsheet, I don't think I'd be able to do
any sql commands and I must admit I'm not familiar with how to do the
sql. Is there any way you'd suggest to do the regrouping if the data
are in an excel sheet?

Thank you, and looking forward to your suggestions.

Best regards,

Dan Graziano

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 13 '05 #3

P: n/a
You can attach an Excel s/s data area to an Access file (linked table);
then you can use SQL (aka queries) to use the s/s data.

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)

dan graziano wrote:
MGFoster,

Thanks for the suggestions on regrouping the data. Probably I should
have just put the data in columns rather than separating the columns by
commas in the sample dataset in the message, actually the data in the
large dataset are in columns in an excel spreadsheet (I had just
separated the columns by commas in the developersdex message posting as
it was easier than spacing the columns).

As the data are in an excel spreadsheet, I don't think I'd be able to do
any sql commands and I must admit I'm not familiar with how to do the
sql. Is there any way you'd suggest to do the regrouping if the data
are in an excel sheet?


Nov 13 '05 #4

P: n/a
dan graziano <d3********@hotmail.com> wrote in message news:<41**********************@news.newsgroups.ws> ...
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?


The following table will help you create your query:

tblDeptEquivalences
Dept EquivalentDept
sales sales
sales Sales Department
sales Selling Office
marketing marketing
marketing Brands
marketing Market Office
marketing Marketing(US)
marketing Marketing(Europe)
finance finance
finance Expense Office
finance Financing
human Resources human resources
human Resoureces HR
human Resources People Development
human Resources HR office
management management
management admin. Office

SELECT country, year, company, (SELECT Dept FROM tblDeptEquivalences
AS B WHERE B.EquivalentDept = A.Dept) AS department, [expenses
(dollars)] FROM tblVeryLargeDataset AS A;

Use as many aliases as you need in tblDeptEquivalences.

James A. Fortune
Nov 13 '05 #5

P: n/a
dan graziano <d3********@hotmail.com> wrote in message news:<41**********************@news.newsgroups.ws> ...
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?


The following table will help you create your query:

tblDeptEquivalences
Dept EquivalentDept
sales sales
sales Sales Department
sales Selling Office
marketing marketing
marketing Brands
marketing Market Office
marketing Marketing(US)
marketing Marketing(Europe)
finance finance
finance Expense Office
finance Financing
human Resources human resources
human Resoureces HR
human Resources People Development
human Resources HR office
management management
management admin. Office

SELECT country, year, company, (SELECT Dept FROM tblDeptEquivalences
AS B WHERE B.EquivalentDept = A.Dept) AS department, [expenses
(dollars)] FROM tblVeryLargeDataset AS A;

Use as many aliases as you need in tblDeptEquivalences.

James A. Fortune
Nov 13 '05 #6

P: n/a
MGFoster <me@privacy.com> wrote in message news:<gX*****************@newsread1.news.pas.earth link.net>...
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Using a LEFT JOIN like MGFoster did is better than what I posted since
it will show you if you missed any values in the translation table.
With mine, you need to run something like a SELECT DISTINCT query to
make sure you cover all the possible values to be translated.

James A. Fortune
Nov 13 '05 #7

P: n/a
Thank you to all for the suggestions on how to do the regrouping of
data. They seem like excellent suggestions.

The only thing is, I do not know how to use Access or Sql. I'm strongly
proficient in Excel, and I would love to know how to Access and Sql in
tandem with Excel. But unfortunately, for the moment, I wouldn't know
how to incorporate your suggestions.

I do have Access and the Sql software and would very much like to try
out the suggestions. If it might be possible to have step by step
instructions as to how to go about it as soon as I open Access, that
would be great?

I am very keen to learn Access and Sql and to be able to start applying
them to my work immediately. But the time schedule for my project
doesn't really afford me much time for learning and training. Any
suggestions on how to learn Access and sql quickly and on the fly while
working on this project would be much appreciated.

Or the key practical things to know in Access and Sql which would make a
big difference in working on large data sets in tandem with Excel would
be very helpful.

Thanks again, and looking forward to your suggestions.

Best regards,

Dan Graziano

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 13 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.