
Python csv calculate percentage by group

P: 6
I have a table (csv file) with three columns:

Wood [m2]   Polygon   Area [m2]
15          A         50
10          A         50
12          B         30
10          C         30
05          D         50
10          D         50

My aim is to calculate the percentage of wood for each Polygon. I want to print this result into a new csv table:

Polygon   Percentage of Wood (%)
A         0.5 (=25/50)
B
C
D

I usually use Python through ArcGIS (arcpy module) but the modules are very slow for certain things. This is why I want to try to solve the question without this module. But I cannot figure out how to do this. Any help is greatly appreciated.
Jan 30 '15 #1

✓ answered by bvdet

9 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
Start by opening and reading the file, then split its contents into individual values and store them in a container such as a list or dictionary. Iterate over the container, perform your calculations, and print the output or save it to disk. Wouldn't you have to do those same steps in ArcGIS?
Jan 30 '15 #2
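
The steps listed in post #2 (read, split into parts, store in a container, iterate) can be sketched roughly as follows. This is only an illustration in Python 3 using the standard csv module; the `read_rows` helper and the in-memory `SAMPLE` string are assumptions made for the example, standing in for the real file.

```python
import csv
import io

# Stand-in for the real file. With a file on disk you would pass
# open("Test.csv", newline="") to read_rows() instead.
SAMPLE = """Wood [m2],Polygon,Area [m2]
15,A,50
10,A,50
12,B,30
10,C,30
05,D,50
10,D,50
"""


def read_rows(handle):
    """Return a list of (wood, polygon, area) tuples (hypothetical helper)."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(float(wood), polygon, float(area)) for wood, polygon, area in reader]


rows = read_rows(io.StringIO(SAMPLE))
for wood, polygon, area in rows:
    print(polygon, wood, area)
```

Keeping the rows in a list of tuples leaves the grouping and the percentage calculation as separate later steps.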

P: 6
No, there are arcpy tools that you can call and, as I understand it, they simplify these steps. The problem is that some of them take very long to run. This page shows how to read a csv file (https://docs.python.org/2/library/csv.html) and I managed to do that, but how can I group the values by polygon? Is there a function for that?
Jan 30 '15 #3
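
On the question in post #3 of whether there is a function for grouping: one option in the standard library is `itertools.groupby`. The sketch below is Python 3 and assumes the rows have already been parsed into (wood, polygon, area) tuples; the `rows` list is typed in by hand from the table in the question. Note that `groupby` only merges adjacent items, so the rows are sorted by polygon first.

```python
from itertools import groupby
from operator import itemgetter

# (wood, polygon, area) tuples, copied from the table in the question.
rows = [
    (15.0, "A", 50.0),
    (10.0, "A", 50.0),
    (12.0, "B", 30.0),
    (10.0, "C", 30.0),
    (5.0, "D", 50.0),
    (10.0, "D", 50.0),
]

by_polygon = itemgetter(1)

# groupby() only groups consecutive items, so sort by the same key first.
rows_sorted = sorted(rows, key=by_polygon)

# Sum the wood area within each group of rows sharing a polygon name.
totals = {
    polygon: sum(row[0] for row in group)
    for polygon, group in groupby(rows_sorted, key=by_polygon)
}
print(totals)
```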

bvdet
Expert Mod 2.5K+
P: 2,851
Here's an example of manipulating the data after the file is read:
data = """Wood [m2],Polygon,Area [m2]
15,A,50
10,A,50
12,B,30
10,C,30
05,D,50
10,D,50"""

dataLines = data.split("\n")
# Skip the header line, then compute one percentage per row.
for line in dataLines[1:]:
    items = line.split(",")
    print ("Polygon %s: \nPercentage: %0.0f%%" %
           (items[1], float(items[0])/float(items[2])*100))
    print("========================")
And the output:
Polygon A: 
Percentage: 30%
========================
Polygon A: 
Percentage: 20%
========================
Polygon B: 
Percentage: 40%
========================
Polygon C: 
Percentage: 33%
========================
Polygon D: 
Percentage: 10%
========================
Polygon D: 
Percentage: 20%
========================
Jan 30 '15 #4

P: 6
OK, but with this solution I get several outputs for Polygons A and D. I am interested in summing the wood areas for each Polygon that has the same name. For Polygon A, for example, this would be 15+20/50. Is the quickest way to sum up the outputs afterwards, or to do this step beforehand? Thanks a lot!!
Jan 30 '15 #5

bvdet
Expert Mod 2.5K+
P: 2,851
I don't think you want 15+(20/50), which is what operator precedence gives you. I think you want (15+20)/50.
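
As a quick check of the precedence point, the two readings can be compared directly. This is a small Python 3 sketch (true division is assumed; with integer operands Python 2 would floor the division). The last line uses the wood values that actually appear in the table for Polygon A, 15 and 10, which gives the 0.5 from the original question.

```python
# Division binds tighter than addition, so this is 15 + (20 / 50).
unparenthesised = 15 + 20 / 50

# Parentheses force the sum to be evaluated first.
parenthesised = (15 + 20) / 50

# Same pattern with the table's values for Polygon A (15 and 10).
from_table = (15 + 10) / 50

print(unparenthesised, parenthesised, from_table)
```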

Here's where a dictionary comes in handy:
data = """Wood [m2],Polygon,Area [m2]
15,A,50
10,A,50
12,B,30
10,C,30
05,D,50
10,D,50"""

dataLines = data.split("\n")
# Map each polygon name to a list of (wood, area) tuples.
dd = {}
for line in dataLines[1:]:
    items = line.split(",")
    dd.setdefault(items[1], []).append((float(items[0]), float(items[2])))

keys = sorted(dd.keys())
for key in keys:
    # Sum the wood values and divide by the polygon's area (taken from its first row).
    print ("Polygon %s: \nPercentage: %0.0f%%" %
           (key, sum((item[0] for item in dd[key]))/dd[key][0][1]*100))
    print("========================")
Jan 30 '15 #6
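
The original question also asked for the result to go into a new csv table. A possible Python 3 sketch of that last step is shown below, using the standard csv module. The helper names `percentage_by_polygon` and `write_percentages` are made up for this example, and the code assumes (as the code in post #6 does) that every row of a polygon carries the same area value. In-memory buffers stand in for the input and output files.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """Wood [m2],Polygon,Area [m2]
15,A,50
10,A,50
12,B,30
10,C,30
05,D,50
10,D,50
"""


def percentage_by_polygon(handle):
    """Return {polygon: percentage of wood} (hypothetical helper)."""
    wood = defaultdict(float)
    area = {}
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for wood_value, polygon, area_value in reader:
        wood[polygon] += float(wood_value)
        # Assumption: the area is identical on every row of a polygon.
        area[polygon] = float(area_value)
    return {p: wood[p] / area[p] * 100 for p in sorted(wood)}


def write_percentages(result, handle):
    """Write one row per polygon, rounded to whole percent."""
    writer = csv.writer(handle)
    writer.writerow(["Polygon", "Percentage of Wood (%)"])
    for polygon, pct in result.items():
        writer.writerow([polygon, "%.0f" % pct])


result = percentage_by_polygon(io.StringIO(SAMPLE))
out = io.StringIO()  # with a real file: open("Result.csv", "w", newline="")
write_percentages(result, out)
print(out.getvalue())
```

The output file name `Result.csv` in the comment is only a placeholder.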

P: 6
I just copied your code and it works perfectly! Thank you so much!! I will try to understand what you did and maybe I can get back to you in case I do not understand something. Thanks!:)
Jan 30 '15 #7

P: 6
Another question (sorry...): if I import my csv file I get the following structure:

['15', 'A', '50']
['10', 'A', '50']
['12', 'B', '30']
['10', 'C', '30']
['5', 'D', '50']
['10', 'D', '50']

How do you import your csv file without getting each row as a separate list? I don't seem to be able to figure out what I am doing wrong...
Jan 30 '15 #8
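
For what it is worth, the row-by-row lists shown in post #8 are what `csv.reader` normally produces, and they have the same shape as the `items` lists that `line.split(",")` creates in post #6. The Python 3 sketch below compares the two on an in-memory copy of the data (the `SAMPLE` string is an assumption standing in for the file) and then feeds the reader's rows into the same dictionary loop.

```python
import csv
import io

SAMPLE = """Wood [m2],Polygon,Area [m2]
15,A,50
10,A,50
12,B,30
10,C,30
05,D,50
10,D,50
"""

# One list per row from csv.reader, header dropped.
rows_from_csv = list(csv.reader(io.StringIO(SAMPLE)))[1:]

# The same rows built by hand, as in the split-based code.
rows_from_split = [line.split(",") for line in SAMPLE.splitlines()[1:]]

print(rows_from_csv == rows_from_split)

# The dictionary loop can consume the reader's rows directly.
dd = {}
for items in rows_from_csv:
    dd.setdefault(items[1], []).append((float(items[0]), float(items[2])))
print(dd["A"])
```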

P: 6
Aha, maybe I figured out how to do it:

data = open("Test.csv", "r")
print data.read()

but now I get this error:
Traceback (most recent call last):
  File "/home/katharina/Desktop/Test.py", line 14, in <module>
    dataLines = data.split("\n")
AttributeError: 'file' object has no attribute 'split'

and if I comment out the dataLines line, the following error appears:
Traceback (most recent call last):
  File "/home/katharina/Desktop/Test.py", line 16, in <module>
    for line in data[1:]:
TypeError: 'file' object has no attribute '__getitem__'

Any clue what I am doing wrong?
Jan 30 '15 #9

bvdet
Expert Mod 2.5K+
P: 2,851
There are several ways of doing this. You don't have to keep the file object in a variable; you can read its contents straight away:
data = open("Test.csv", "r").read()
OR
dataLines = [item.strip() for item in open("Test.csv", "r").readlines()]
Feb 2 '15 #10
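
The errors in post #9 come from calling string and list operations on the file object itself rather than on its contents. A variant of the fix in post #10 that also closes the file is sketched below in Python 3. To keep the example self-contained it first writes a small `Test.csv` into a temporary directory; that setup step is an assumption for the demo and not part of the thread.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Test.csv")

    # Demo setup only: create a small file to read back.
    with open(path, "w") as handle:
        handle.write("Wood [m2],Polygon,Area [m2]\n15,A,50\n10,A,50\n")

    # The with-statement closes the file once the block ends.
    with open(path, "r") as handle:
        data = handle.read()  # a str, so string methods are available

    # splitlines() avoids the trailing empty string that split("\n")
    # leaves when the file ends with a newline.
    dataLines = data.splitlines()

print(dataLines)
```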
