473,385 Members | 1,798 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Python csv calculate percentage by group

I have a table (csv file) with three columns:

Wood [m2] Polygon Area [m2]
15 A 50
10 A 50
12 B 30
10 C 30
05 D 50
10 D 50

My aim is to calculate the percentage of wood for each Polygon. I want to print this result into a new csv table:

Polygon Percentage of Wood (%)
A 0.5 (=25/50)
B
C
D

I usually use Python through ArcGIS (arcpy module) but the modules are very slow for certain things. This is why I want to try to solve the question without this module. But I cannot figure out how to do this. Any help is greatly appreciated.
Jan 30 '15 #1

✓ answered by bvdet

I don't think you want 15+(20/50)(operator precedence). I think you want (15+20)/50.

Here's where a dictionary comes in handy:
Expand|Select|Wrap|Line Numbers
  1. data = """Wood [m2],Polygon,Area [m2]
  2. 15,A,50
  3. 10,A,50
  4. 12,B,30
  5. 10,C,30
  6. 05,D,50
  7. 10,D,50"""
  8.  
  9. dataLines = data.split("\n")
  10. dd = {}
  11. for line in dataLines[1:]:
  12.     items = line.split(",")
  13.     dd.setdefault(items[1], []).append((float(items[0]), float(items[2])))
  14.  
  15. keys = sorted(dd.keys())
  16. for key in keys:
  17.     print ("Polygon %s: \nPercentage: %0.0f%%" %
  18.            (key, sum((item[0] for item in dd[key]))/dd[key][0][1]*100))
  19.     print "========================"

9 3073
bvdet
2,851 Expert Mod 2GB
You would start by opening the file, reading the file, breaking up the file contents to individual parts and saving in a container object such as a list or dictionary, iterate on the container and perform your calculations, print the output or save to disk. Would not you have to do those steps in ArcGIS?
Jan 30 '15 #2
no, there are arcpy tools which you can call and as I understand they simplify the steps. But the problem is that some of them take very long to run. This website shows me how to read a csv file (https://docs.python.org/2/library/csv.html) and I managed to do that but how can I group the variables? Is there a function?
Jan 30 '15 #3
bvdet
2,851 Expert Mod 2GB
Here's an example of manipulating the data after the file is read:
Expand|Select|Wrap|Line Numbers
  1. data = """Wood [m2],Polygon,Area [m2]
  2. 15,A,50
  3. 10,A,50
  4. 12,B,30
  5. 10,C,30
  6. 05,D,50
  7. 10,D,50"""
  8.  
  9. dataLines = data.split("\n")
  10. for line in dataLines[1:]:
  11.     items = line.split(",")
  12.     print ("Polygon %s: \nPercentage: %0.0f%%" %
  13.            (items[1], float(items[0])/float(items[2])*100))
  14.     print "========================"
And the output:
Expand|Select|Wrap|Line Numbers
  1. >>> Polygon A: 
  2. Percentage: 30%
  3. ========================
  4. Polygon A: 
  5. Percentage: 20%
  6. ========================
  7. Polygon B: 
  8. Percentage: 40%
  9. ========================
  10. Polygon C: 
  11. Percentage: 33%
  12. ========================
  13. Polygon D: 
  14. Percentage: 10%
  15. ========================
  16. Polygon D: 
  17. Percentage: 20%
  18. ========================
  19. >>> 
Jan 30 '15 #4
ok but with this solution I get several output for Polygon A and D. I am interested in summarizing the wooden Areas for each Polygon which has the same name. For Polygon A for example this would be 15+20/50. Is the quickest way to sum up the outputs or to do this step beforehand? Thanks a lot!!
Jan 30 '15 #5
bvdet
2,851 Expert Mod 2GB
I don't think you want 15+(20/50)(operator precedence). I think you want (15+20)/50.

Here's where a dictionary comes in handy:
Expand|Select|Wrap|Line Numbers
  1. data = """Wood [m2],Polygon,Area [m2]
  2. 15,A,50
  3. 10,A,50
  4. 12,B,30
  5. 10,C,30
  6. 05,D,50
  7. 10,D,50"""
  8.  
  9. dataLines = data.split("\n")
  10. dd = {}
  11. for line in dataLines[1:]:
  12.     items = line.split(",")
  13.     dd.setdefault(items[1], []).append((float(items[0]), float(items[2])))
  14.  
  15. keys = sorted(dd.keys())
  16. for key in keys:
  17.     print ("Polygon %s: \nPercentage: %0.0f%%" %
  18.            (key, sum((item[0] for item in dd[key]))/dd[key][0][1]*100))
  19.     print "========================"
Jan 30 '15 #6
I just copied your code and it works perfectly! Thank you so much!! I will try to understand what you did and maybe I can get back to you in case I do not understand something. Thanks!:)
Jan 30 '15 #7
Another question (sry...): If I import my csv file I get the fallowing structure:

['15', 'A', '50']
['10', 'A', '50']
['12', 'B', '30']
['10', 'C', '30']
['5', 'D', '50']
['10', 'D', '50']

How do you import your csv file without listing each row separately? I donīt seem to be able to figure out what I am doing wrong...
Jan 30 '15 #8
Aha, maybe I figured out how to do it:

data = open("Test.csv", "r")
print data.read()

but now I get this error:
Traceback (most recent call last):
File "/home/katharina/Desktop/Test.py", line 14, in <module>
dataLines = data.split("\n")
AttributeError: 'file' object has no attribute 'split'

and if I uncomment the dataLines line the fallowing error appears: Traceback (most recent call last):
File "/home/katharina/Desktop/Test.py", line 16, in <module>
for line in data[1:]:
TypeError: 'file' object has no attribute '__getitem__'

Any clue what I am doing wrong?
Jan 30 '15 #9
bvdet
2,851 Expert Mod 2GB
There are several ways of doing this. You don't have to create a file object.
Expand|Select|Wrap|Line Numbers
  1. data = open("Test.csv", "r").read()
OR
Expand|Select|Wrap|Line Numbers
  1. dataLines = [item.strip() for item in open("Test.csv", "r").readlines()
Feb 2 '15 #10

Sign in to post your reply or Sign up for a free account.

Similar topics

1
by: toluj | last post by:
Hi, pls could anyone help me with the script to calculate the percentage btwn two fields in a table.
3
by: tulikapuri | last post by:
Dear Friends, I am using the method to cal. percentage in report but no sucess it gives #Num! instead of a number. I am following the steps as given in help to calculate percentage value on a...
7
by: ngweixiong | last post by:
Hi, I have a Ms Access query which i used to calculate how many times the leadtime is a) less than 7 days b) 7-14 days c) more than 14 days With the query results, i will like to convert...
3
by: smileyangeluv | last post by:
Hi, Would like to get percentage for generated column. Any idea on how to do that?? Following SQL statement SELECT s.Selection_Desc, count(u.user_id) from tbl_system_selection s, tbl_user...
4
ollyb303
by: ollyb303 | last post by:
Hello, Trying to help a friend/colleague out with a database and we've both drawn a blank. Not even sure if this is possible. The database has a table (Table1) with a several columns: ID,...
0
by: Mike Driscoll | last post by:
Hi, I am organizing a Python User's Group for Iowa and am hoping there are some Iowans that frequent this list. if you are one and are interested in getting together with other Python-people,...
5
by: NareshN | last post by:
Hi, I have weekly wise scheduled hours of each employee and no of days scheduled for each employee,now i need to calculate no of employees scheduled less than 24 hours,no of emp's scheduled b/w...
0
by: Prashant Gadeka | last post by:
Hi, I am having following table structure and sample values: ---------------------------------------------------------- APP_ID DATE HOUR STATUS_CODE HITS ...
1
by: lnh6513 | last post by:
In short, I'm trying to create a simple dashboard using MOSS2007, but don't have a lot of the additional plugins or webparts installed\available that would make this easy. I have a massive list...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.