Hi Elniunia
It's possible that I don't understand your situation precisely, but perhaps it's similar to mine. I often have data files which have a header row, and then many lines of data.
For example:
Temperature, Voltage, Current, etc
5.002, 1.32, 0.00032, etc
6.003, 1.42, 0.00042, etc
etc
I then find it very convenient to make a dictionary of numpy arrays.
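To make that concrete, here is roughly what such a dictionary looks like for the sample rows above. This is only a hand-typed sketch: the "etc" column is dropped, and the `power` calculation is an invented illustration, not something from my own data.

```python
import numpy as np

# Hand-typed version of the sample file above (the "etc" column is left out).
data = {
    "Temperature": np.array([5.002, 6.003]),
    "Voltage": np.array([1.32, 1.42]),
    "Current": np.array([0.00032, 0.00042]),
}

# Each heading maps to one whole column, so column arithmetic is a one-liner.
# (Voltage * Current is just an illustrative calculation.)
power = data["Voltage"] * data["Current"]
```

Each key is a heading from the first row, and each value is that column as a numpy array.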
I have this function which I use to create this dictionary of arrays:
    import numpy as np


    def isNumber(s):
        """Returns True if the string s can be converted to a float.
        (This helper is not shown in my original code; this is an
        assumed minimal version.)"""
        try:
            float(s)
            return True
        except ValueError:
            return False


    def MyOpen(myFile, textRow=0, dataStarts=1, hasHeadings=True, separater=None,
               appendWhenNotDigit=True, returnArray=True):
        """Opens txt file (myFile), which has a standard format of
        text headings (with no space) separated by white space, followed
        by numbers separated in the same way.
        Output is a dictionary based on the first row, with lists
        (or numpy arrays if returnArray=True).
        textRow is the row containing the headings.
        dataStarts is the first row containing the data, and must be bigger
        than textRow.
        If there are no text headings then set hasHeadings to
        False, and they'll be labelled in the dictionary by 'Col0' etc
        If appendWhenNotDigit=True (default), then all rows will be appended.
        Setting it to False will mean that rows containing non-numeric values
        will not be appended"""
        with open(myFile, 'r') as f:
            g = f.readlines()
        ###change to lists###
        h = []
        for n, i in enumerate(g):
            # skip anything before the data that is not the heading row
            if n < dataStarts and n != textRow:
                continue
            if separater is None:
                temp1 = i.split()
            else:
                temp1 = i.split(separater)
            if not temp1:
                continue  # skip blank lines
            temp2 = []
            myAppend = True
            for j in temp1:
                if isNumber(j.strip()):
                    temp2.append(float(j.strip()))
                else:
                    temp2.append(j.strip())
                    if n != textRow and not appendWhenNotDigit:
                        myAppend = False
                        break
            if myAppend:
                h.append(temp2)
        ###create dictionary
        d = {}
        if hasHeadings:
            for hi in h[0]:
                d[hi] = []
        else:
            for i in range(len(h[0])):
                d["Col" + str(i)] = []
        # if there are headings, h[0] holds them and the data starts at h[1]
        for i in range(hasHeadings, len(h)):
            for j in range(len(h[0])):
                if hasHeadings:
                    d[h[0][j]].append(h[i][j])
                else:
                    d["Col" + str(j)].append(h[i][j])
        if returnArray:
            e = {}
            for k in d.keys():
                e[k] = np.array(d[k])
            return e
        return d
There are several advantages to doing it this way.
Firstly, if you need to calculate another set of results based on the data you've stored, it can be done like this:
    def calc(a, d, A):
        """a is the array based dictionary from the raw data & it will return
        a dictionary where additional variables have been calculated"""
        T = a["T/K"]
        q = a["Theta"]
        Z = a["Z"]
        a["10/T"] = 10 / T
        a["T-0.5"] = T ** (-0.5)
        return a
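As a self-contained sketch of the same idea, here it is run on a tiny dictionary. The temperature values are made up, and only the "T/K" key is borrowed from calc above.

```python
import numpy as np

# Made-up temperatures in kelvin; "T/K" mirrors the key used in calc above.
a = {"T/K": np.array([100.0, 200.0, 400.0])}

T = a["T/K"]
a["10/T"] = 10 / T          # element-wise division over the whole column
a["T-0.5"] = T ** (-0.5)    # element-wise power over the whole column
```

Because every column is a numpy array, each new variable is calculated for all rows at once and simply becomes another key in the same dictionary.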
But the other thing you can do is first sort your data by AccessionNumber with this function:
    from bisect import bisect_left


    def sort(a, sortName="T/K"):
        """a is an array dictionary. Sorts all arrays by one of them"""
        #use list.insert(bisect_left(list,element),element) to create
        #a mask and apply it to all the elements
        mask = []  # row indices, kept in sorted order of a[sortName]
        vals = []  # the values seen so far, kept sorted
        for n, t in enumerate(a[sortName]):
            ins = bisect_left(vals, t)
            mask.insert(ins, n)
            vals.insert(ins, t)
        a2 = dict()
        for k in a.keys():
            a2[k] = a[k][mask]
        return a2
You just need to pass the dictionary you created to it and the name of the field you want to sort by.
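If you'd rather not build the index list by hand, numpy's argsort should give you an equivalent ordering. Here is a minimal sketch; `sort_by` is a hypothetical name and the data is made up.

```python
import numpy as np


def sort_by(a, sortName):
    """Hypothetical helper: reorder every array in a by the values in a[sortName]."""
    # Indices that put the key column in ascending order.
    order = np.argsort(a[sortName], kind="stable")
    # Apply the same ordering to every column so rows stay together.
    return {k: v[order] for k, v in a.items()}


# Made-up data for illustration.
a = {"T/K": np.array([300.0, 100.0, 200.0]), "Z": np.array([3.0, 1.0, 2.0])}
s = sort_by(a, "T/K")
```

One difference from my function: with kind="stable", rows that share the same key keep their original relative order.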
Then I guess you want to remove duplicates. I haven't got a function for it, but something like this will do the job:
    import numpy as np


    def removeDuplicates(a, sortName):
        """a is an array dictionary. Sorts all arrays by a[sortName], then
        keeps only the first row for each distinct value of a[sortName]"""
        a = sort(a, sortName)
        # True wherever a value differs from the one before it
        mask = a[sortName][:-1] != a[sortName][1:]
        # the first row has nothing before it, so it is always kept
        mask = np.concatenate(([True], mask))
        a2 = dict()
        for k in a.keys():
            a2[k] = a[k][mask]
        return a2
I'm afraid I haven't had a chance to test this code.
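For a quick check of the idea, here is a small self-contained sketch that sorts and drops repeats in one go. `drop_repeats` is a hypothetical name, the values are invented, and I'm assuming the key column is called "AccessionNumber".

```python
import numpy as np


def drop_repeats(a, sortName):
    """Hypothetical helper: sort by a[sortName], then keep the first row
    of each run of equal values in that column."""
    order = np.argsort(a[sortName], kind="stable")
    key = a[sortName][order]
    # Keep a row if it is the first one, or if it differs from the row before it.
    keep = np.ones(len(key), dtype=bool)
    keep[1:] = key[1:] != key[:-1]
    return {k: v[order][keep] for k, v in a.items()}


# Invented data: "B2" appears twice.
a = {
    "AccessionNumber": np.array(["B2", "A1", "B2", "C3"]),
    "Value": np.array([20.0, 10.0, 21.0, 30.0]),
}
u = drop_repeats(a, "AccessionNumber")
```

Since the sort is stable, the row kept for each duplicated key is the one that appeared first in the original data.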