473,394 Members | 1,759 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

regrouping data in large dataset

I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?

The structure of the data set is the following (heading fields in first
row and columns separated by commas):

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing,1004
France, 2001, Pepsi, marketing,300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, selling Office, 2400
France, 2001, McDonalds, expense Office, 334000
Japan, 2002, Amex, financing, 2400
France, 2003, Amex, selling Office, 23
Korea, 2001, Amex, admin. Office, 333
Korea, 2001, Amex, management, 22
USA, 2002, McDonalds, brands, 3340

Basically, all the data must be regrouped according to Pepsi`s
departmental categories only which are:

sales
marketing
finance
human Resources
management

The following are the equivalent names of the departments for Pepsi`s
departments in the other companies (columns separated by "/"). Some of
the Pepsi departments are also the sum of two smaller departments in the
other companies:

Pepsi / McDonalds / Amex
Sales / Sales Department / Selling Office
Marketing / Brands+Market Office / Marketing(US)+Marketing(Europe)
Finance / Expense Office / Financing
Human Resources / HR / People Development + HR office
Management / Administration /Admin. Office + Management

So the above data set needs to be converted to:

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing, 1004
France, 2001, Pepsi, marketing, 300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, sales,2400
France, 2001,McDonalds, finance,334000
Japan, 2002,Amex, finance, 2400
France, 2003, Amex, sales, 23
Korea, 2001, Amex, management, 333
Korea, 2001, Amex,management,22
USA, 2002, McDonalds, marketing,3340

Thank you for your suggestions.

Best regards,

Dan Graziano


*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 13 '05 #1
7 2132
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Are you importing this data from a text file? If so you could probably
use VBA & a SELECT statement to "normalize" the department names:

Select Case strCompany
Case "Amex"
Select Case strDepartment
Case "Selling Office"
strDepartment = "sales"
Case "Financing"
strDepartment = "finance"
.... etc. ...

Or (my favorite), you could create a "translation" table that holds the
info for each company/department & it's translation to Pepsi's
department names:

CREATE TABLE DeptTranslations (
CompanyID Long NOT NULL , -- link to Companies table
DeptName TEXT(30) NOT NULL , -- Company's name for dept.
PepsiDeptName TEXT(30) NOT NULL , -- Pepsi's name for dept.
CONSTRAINT PK_DeptTranslations PRIMARY KEY (CompanyID, DeptName)
)

The primary key allows only one DeptName per Company ID.

Load the DeptTranslations table w/ appropriate data. E.g.:

Company IDs: McDonalds = 2; Amex = 3

CompanyID DeptName PepsiDeptName
- --------- -------------------- ----------------
2 Sales Department Sales
2 Brands Marketing
2 Market Office Marketing
3 People Development Human Resources
3 HR Office Human Resources
2 Administration Management
.... etc. ...

Import the raw data (that which needs to be translated) to a work table.
Run a query like this to translate the dept name to a Pepsi dept. name:

SELECT C.CompanyName, DT.PepsiDeptName
FROM Work_table AS WT LEFT JOIN (DeptTranslations AS DT INNER JOIN
Companies As C ON DT.CompanyID = C.Companies) ON WT.Company =
C.CompanyName

Include this query in your other queries that need to translate the
dept. names link by the C.CompanyName.

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBQRKLSoechKqOuFEgEQIjawCdEtOppjOaDnd5AgT2PbKk/GdrM4oAoICZ
+BWH5xy8OCy47/IDtT2ukx0x
=r5Rh
-----END PGP SIGNATURE-----
dan graziano wrote:
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?

The structure of the data set is the following (heading fields in first
row and columns separated by commas):

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing,1004
France, 2001, Pepsi, marketing,300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, selling Office, 2400
France, 2001, McDonalds, expense Office, 334000
Japan, 2002, Amex, financing, 2400
France, 2003, Amex, selling Office, 23
Korea, 2001, Amex, admin. Office, 333
Korea, 2001, Amex, management, 22
USA, 2002, McDonalds, brands, 3340

Basically, all the data must be regrouped according to Pepsi`s
departmental categories only which are:

sales
marketing
finance
human Resources
management

The following are the equivalent names of the departments for Pepsi`s
departments in the other companies (columns separated by "/"). Some of
the Pepsi departments are also the sum of two smaller departments in the
other companies:

Pepsi / McDonalds / Amex
Sales / Sales Department / Selling Office
Marketing / Brands+Market Office / Marketing(US)+Marketing(Europe)
Finance / Expense Office / Financing
Human Resources / HR / People Development + HR office
Management / Administration /Admin. Office + Management

So the above data set needs to be converted to:

country, year, company, department, expenses (dollars)
USA, 2002, Pepsi, marketing, 1004
France, 2001, Pepsi, marketing, 300
Spain, 1999, Pepsi, human resources, 25003
Belgium, 2001, Amex, sales,2400
France, 2001,McDonalds, finance,334000
Japan, 2002,Amex, finance, 2400
France, 2003, Amex, sales, 23
Korea, 2001, Amex, management, 333
Korea, 2001, Amex,management,22
USA, 2002, McDonalds, marketing,3340


Nov 13 '05 #2

MGFoster,

Thanks for the suggestions on regrouping the data. Probably I should
have just put the data in columns rather than separating the columns by
commas in the sample dataset in the message, actually the data in the
large dataset are in columns in an excel spreadsheet (I had just
separated the columns by commas in the developersdex message posting as
it was easier than spacing the columns).

As the data are in an excel spreadsheet, I don't think I'd be able to do
any sql commands and I must admit I'm not familiar with how to do the
sql. Is there any way you'd suggest to do the regrouping if the data
are in an excel sheet?

Thank you, and looking forward to your suggestions.

Best regards,

Dan Graziano

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 13 '05 #3
You can attach an Excel s/s data area to an Access file (linked table);
then you can use SQL (aka queries) to use the s/s data.

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)

dan graziano wrote:
MGFoster,

Thanks for the suggestions on regrouping the data. Probably I should
have just put the data in columns rather than separating the columns by
commas in the sample dataset in the message, actually the data in the
large dataset are in columns in an excel spreadsheet (I had just
separated the columns by commas in the developersdex message posting as
it was easier than spacing the columns).

As the data are in an excel spreadsheet, I don't think I'd be able to do
any sql commands and I must admit I'm not familiar with how to do the
sql. Is there any way you'd suggest to do the regrouping if the data
are in an excel sheet?


Nov 13 '05 #4
dan graziano <d3********@hotmail.com> wrote in message news:<41**********************@news.newsgroups.ws> ...
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?


The following table will help you create your query:

tblDeptEquivalences
Dept EquivalentDept
sales sales
sales Sales Department
sales Selling Office
marketing marketing
marketing Brands
marketing Market Office
marketing Marketing(US)
marketing Marketing(Europe)
finance finance
finance Expense Office
finance Financing
human Resources human resources
human Resoureces HR
human Resources People Development
human Resources HR office
management management
management admin. Office

SELECT country, year, company, (SELECT Dept FROM tblDeptEquivalences
AS B WHERE B.EquivalentDept = A.Dept) AS department, [expenses
(dollars)] FROM tblVeryLargeDataset AS A;

Use as many aliases as you need in tblDeptEquivalences.

James A. Fortune
Nov 13 '05 #5
dan graziano <d3********@hotmail.com> wrote in message news:<41**********************@news.newsgroups.ws> ...
I have a very large data set where I have to do considerable regrouping
of values and would like to know how you would suggest doing the
following regrouping?


The following table will help you create your query:

tblDeptEquivalences
Dept EquivalentDept
sales sales
sales Sales Department
sales Selling Office
marketing marketing
marketing Brands
marketing Market Office
marketing Marketing(US)
marketing Marketing(Europe)
finance finance
finance Expense Office
finance Financing
human Resources human resources
human Resoureces HR
human Resources People Development
human Resources HR office
management management
management admin. Office

SELECT country, year, company, (SELECT Dept FROM tblDeptEquivalences
AS B WHERE B.EquivalentDept = A.Dept) AS department, [expenses
(dollars)] FROM tblVeryLargeDataset AS A;

Use as many aliases as you need in tblDeptEquivalences.

James A. Fortune
Nov 13 '05 #6
MGFoster <me@privacy.com> wrote in message news:<gX*****************@newsread1.news.pas.earth link.net>...
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Using a LEFT JOIN like MGFoster did is better than what I posted since
it will show you if you missed any values in the translation table.
With mine, you need to run something like a SELECT DISTINCT query to
make sure you cover all the possible values to be translated.

James A. Fortune
Nov 13 '05 #7
Thank you to all for the suggestions on how to do the regrouping of
data. They seem like excellent suggestions.

The only thing is, I do not know how to use Access or Sql. I'm strongly
proficient in Excel, and I would love to know how to Access and Sql in
tandem with Excel. But unfortunately, for the moment, I wouldn't know
how to incorporate your suggestions.

I do have Access and the Sql software and would very much like to try
out the suggestions. If it might be possible to have step by step
instructions as to how to go about it as soon as I open Access, that
would be great?

I am very keen to learn Access and Sql and to be able to start applying
them to my work immediately. But the time schedule for my project
doesn't really afford me much time for learning and training. Any
suggestions on how to learn Access and sql quickly and on the fly while
working on this project would be much appreciated.

Or the key practical things to know in Access and Sql which would make a
big difference in working on large data sets in tandem with Excel would
be very helpful.

Thanks again, and looking forward to your suggestions.

Best regards,

Dan Graziano

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Nov 13 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

16
by: D Witherspoon | last post by:
I am developing a Windows Forms application in VB.NET that will use .NET remoting to access the data tier classes. A very simple way I have come up with is by creating typed (.xsd) datasets. For...
1
by: Patrick | last post by:
Hi all, New to .NET, I'm working on an Winforms client application using VS 2005 beta2. My needs considering data storage are the followings: (1) Small files (0 < length < 10 mb), containing...
3
by: Brent | last post by:
Hi, I'm wondering if it is good to use datasets for large amounts of data with many users. I'm talking tables with 130,000 records and 15 columns. And we want current data, so no cached data....
2
by: headware | last post by:
I'm relatively new to ASP.NET and ADO.NET, but I have a basic design question regarding the use of web services and APS.NET applications. Right now we have an application that uses web services to...
4
by: Jonah Olsson | last post by:
Dear All, I'm currently developing a solution where large amounts of personalised emails are being created (and no, this is not spam...) on the ASP.NET platform and being delivered by a Debian...
3
by: DwC | last post by:
Hi, We have a ms access database that we will be using to develop a website which would have fairly low usage levels. We have some experience with windows apps but not so much with asp.net...
9
by: David Harris | last post by:
Ok, so I'm semi-new to .NET, having done everything manually with SQL code back in VB6. So before I program this up completely manually again, I thought I'd ask for better ways to think through...
2
by: palgre | last post by:
Hi ALL, I am working in an windows based application using SQL Server 2000 as database. There are few tables (refer parent tables) in the application which are uploaded by a seprate application....
1
by: Dave | last post by:
I'm having problems getting the GridView to reliably display a large amount of data (50,000+ rows). I am working my way through the excellent book “Real World ASP.NET Best Practices” by Farhan...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.