Is there a better way of reading the file?

psbasha
440 256MB
Hi,

I am posting my sample code for reading the file, along with the input data.

Is there a better way of reading the file?

Thanks in advance
PSB
Sample1.txt
Rect    1       1       2       7       6
Rect    2       2       3       8       7
Rect    3       3       4       9       8
Tria    4       4       5       9
Pnt     1       0.      0.      0.
Pnt     2       5.      0.      0.
Pnt     3       10.     0.      0.
Pnt     4       15.     0.      0.
Pnt     5       20.     0.      0.
Pnt     6       0.      5.      0.
Pnt     7       5.      5.      0.
Pnt     8       10.     5.      0.
Pnt     9       15.     5.      0.

# Sample.py
def read_file_data(strFile):

    f = open(strFile,'r')

    pntIDDict = {}
    pntCoordDict = {}
    pntList = []
    coordList = []
    wireIDDict = {}

    while True:
        strTemp = f.readline()

        if len(strTemp)>=1:
            strTemp = strTemp[:(len(strTemp)-1)]
            if strTemp[:3]=='Pnt':
                pntID = int(strTemp[8:16])
                coordList.append((float(strTemp[16:24])))
                coordList.append((float(strTemp[24:32])))
                coordList.append((float(strTemp[32:40])))

                pntIDDict[pntID]=coordList
                coordList = []

            elif (strTemp[:4]=='Rect' or strTemp[:4]=='Tria'):
                wireID = int(strTemp[8:16])
                pntList.append((int(strTemp[16:24])))
                pntList.append((int(strTemp[24:32])))
                pntList.append((int(strTemp[32:40])))

                if (strTemp[:4]=='Rect'):
                    pntList.append((int(strTemp[40:48])))

                wireIDDict[wireID]=pntList
                pntList = []
        else:
            break

    f.close()

    return pntIDDict,wireIDDict

if __name__ == '__main__':

    pntIDDict = {}
    wireIDDict = {}
    pntIDDict,wireIDDict = read_file_data("c:\\Sample1.txt")
    print(pntIDDict,wireIDDict)

Mar 18 '07 #1
bartonc
6,596 Expert 4TB
It's easiest to iterate over a file with a for loop ("for line in f").
To handle a varying number of spaces, or tabs, splitting each line into a list is safer than slicing at fixed positions.
This technique is more likely to raise an IndexError (for example on a short line), but also more likely to read the values out of the file:
def read_file_data(strFile):

    f = open(strFile,'r')

    pntIDDict = {}
    pntCoordDict = {}
    pntList = []
    coordList = []
    wireIDDict = {}

##    while True:
##        strTemp = f.readline()
    for strTemp in f:
        tmpList = strTemp.split()   # splits on any run of spaces or tabs
        if not tmpList:             # skip blank lines
            continue

##        if len(strTemp)>=1:
####            strTemp = strTemp[:(len(strTemp)-1)] really a need for this?
##            strTemp = strTemp[:-1]  # if there is, strings know their length
        if tmpList[0] == 'Pnt':
            pntID = int(tmpList[1])
            coordList.append((float(tmpList[2])))
            coordList.append((float(tmpList[3])))
            coordList.append((float(tmpList[4])))

            pntIDDict[pntID] = coordList
            coordList = []

        elif (tmpList[0] == 'Rect' or tmpList[0] == 'Tria'):
            wireID = int(tmpList[1])
            pntList.append((int(tmpList[2])))
            pntList.append((int(tmpList[3])))
            pntList.append((int(tmpList[4])))

            if tmpList[0] == 'Rect':
                pntList.append((int(tmpList[5])))

            wireIDDict[wireID] = pntList
            pntList = []
##        else:
##            break

    f.close()

    return pntIDDict, wireIDDict

if __name__ == '__main__':

    pntIDDict = {}
    wireIDDict = {}
    pntIDDict, wireIDDict = read_file_data("text1.txt")
    print(pntIDDict, wireIDDict)

You should also guard against ValueError from float() and int() by putting the conversions in try blocks.
Mar 18 '07 #2
psbasha
440 256MB
Thanks for the suggestion.

Where do I have to place the "try" and "catch" blocks?

-PSB
Mar 18 '07 #3
psbasha
440 256MB
BV, any comments from your side on reading the above input file data? Is it possible to reduce the lines of code and read the data in a more precise way?

Thanks
PSB
Mar 18 '07 #4
bartonc
6,596 Expert 4TB
If you don't care which field is wrong (just want to handle errors gracefully) then wrap all 3 (or 4) conversions in a single block:
>>> float('abc')
  File "<console>", line 1, in ?
''' exceptions.ValueError : invalid literal for float(): abc '''

>>> try:
...     float('abc')
... except ValueError: # Try not to use 'naked' excepts EVER
...     print("not a float")
...
not a float
>>>
The same goes for the ints.
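As a rough sketch of that advice, all the conversions for one line can share a single try block. The helper name `parse_point_fields` and the choice to return `None` for a bad line are assumptions made for illustration, not code from this thread, and the sketch is written for Python 3:

```python
def parse_point_fields(fields):
    """Convert ['Pnt', id, x, y, z] to (id, [x, y, z]); return None on bad data.

    Hypothetical helper: the name and the None-on-error policy are assumptions.
    """
    try:
        # One try block covers every conversion on the line.
        point_id = int(fields[1])
        coords = [float(value) for value in fields[2:5]]
    except (ValueError, IndexError):
        # ValueError: a field is not numeric; IndexError: the ID field is missing.
        return None
    if len(coords) != 3:
        # Slicing never raises, so a short line has to be checked explicitly.
        return None
    return point_id, coords


print(parse_point_fields("Pnt     2       5.      0.      0.".split()))
print(parse_point_fields("Pnt     2       5.      abc     0.".split()))
```

Returning `None` keeps the caller in charge of what to do with a bad line (skip it, log it, or stop), which is one option among several.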
Mar 18 '07 #5
bvdet
2,851 Expert Mod 2GB
Shameless! You know I'm a sucker for file parsing problems :)

Use the convert_data function I showed you. Initialize your dictionaries. Read all the lines from the file into a list; the file object is never bound to a name. Iterate over the list. Build a word list from each line with a list comprehension using strip() and split(' '), skipping the blank strings. Check for keywords in the word list to decide which dictionary to add to, and convert the remaining fields with another list comprehension. If the data conversion fails, you have a string.
def read_file_data(f):
    ptDict = {}
    wireDict = {}
    fList = open(f).readlines()
    for line in fList:
        lineList = [x.lower() for x in line.strip().split(' ') if x != '']
        if 'rect' in lineList or 'tria' in lineList:
            wireDict[convert_data(lineList[1])] = [convert_data(x) for x in lineList[2:]]
        elif 'pnt' in lineList:
            ptDict[convert_data(lineList[1])] = [convert_data(x) for x in lineList[2:]]
    return ptDict,wireDict
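The `convert_data` helper comes from an earlier exchange and is not shown in this thread. Going only by the description above (a failed conversion leaves you with a string), a stand-in might look like the following. The body is an assumption, not bvdet's original:

```python
def convert_data(text):
    """Return text as an int if possible, else a float, else the original string.

    Stand-in for the helper referenced in this thread; the real one is not shown.
    """
    for converter in (int, float):
        try:
            return converter(text)
        except ValueError:
            continue  # try the next, more permissive conversion
    return text


print(convert_data('7'), convert_data('5.'), convert_data('rect'))
```

Trying `int` before `float` keeps IDs as integers while coordinates such as `5.` still become floats.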
Mar 19 '07 #6
psbasha
440 256MB
BV,

If the Point and Wire IDs are 8-digit numbers, I cannot get the details with the above piece of code, since there are no spaces between the fields.

How can I resolve this issue?

Sample.txt
Rect    1000000010000000200000007000000060000000
Rect    2000000020000000300000008000000070000000
Rect    3000000030000000400000009000000080000000
Tria     40000000400000005000000090000000
Pnt     100000000.      0.      0.
Pnt     200000005.      0.      0.
Pnt     3000000010.     0.      0.
Pnt     4000000015.     0.      0.
Pnt     5000000020.     0.      0.
Pnt     600000000.      5.      0.
Pnt     700000005.      5.      0.
Pnt     8000000010.     5.      0.
Pnt     9000000015.     5.      0.

Thanks in advance
PSB
Mar 19 '07 #7
bvdet
2,851 Expert Mod 2GB
>>> import re
>>> lineList = [x.lower() for x in re.split('[ 0]', line.strip()) if x != '']
>>> lineList
['rect', '1', '1', '2', '7', '6']
>>>
Mar 19 '07 #8
psbasha
440 256MB
Sorry BV, the digits will not all be zeros. Each field is an 8-digit number, with a maximum value of 99999999.
Sample.txt
Rect    1000007110000101200000227000000060000055
Rect    2000009220000105300000048000400071111167
Rect    3000008830000208400000029000500080003000
Tria     40000094400003045000007190000600
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.

Mar 19 '07 #9
bvdet
2,851 Expert Mod 2GB
You have lost me now. What numbers do you want to extract from 'Rect' and 'Tria'? Your data files need to be in a consistent format with predictable delimiters to parse in this manner.
Mar 19 '07 #10
psbasha
440 256MB
The output should look like this:

WireDict
{10000071:[10000101,20000022,70000000,60000055],
20000092:[20000105,30000004,80004000,71111167],
30000088:[30000208,40000002,90005000,80003000],
40000094:[40000304,50000071,90000600]}

pntDict
{10010012:[3.0,0.0,0.0],20020003:[5.0,0.0,0.0],30400000:[10.0,0.0,0.0],
40000000:[15.0,0.0,0.0],50050000:[20.0,0.0,0.0],60000800:[0.0,5.0,0.0],70000000:[5.0,5.0,0.0],
80009000:[10.0,5.0,0.0],90009000:[15.0,5.0,0.0]}
Mar 19 '07 #11
psbasha
440 256MB
We have to break the line into 8-character fields, whether a field holds a number or a string. How can I split it without using slicing?
Mar 19 '07 #12
bvdet
2,851 Expert Mod 2GB
Why not use slices? If your data will be in 8 character fields, it seems to me that would be a good method.
def each8(item):
    # Yield consecutive 8-character fields; a trailing partial field is ignored.
    cnt = 0
    for x in range(len(item) // 8):
        yield item[cnt:cnt+8]
        cnt += 8

def read_file_data(f):
    ptDict = {}
    wireDict = {}
    fList = open(f).readlines()
    for line in fList:
        # Split the keyword from the rest of the line.
        lineList = [x.lower().strip() for x in line.strip().split(' ', 1) if x != '']
        if len(lineList) < 2:
            continue    # blank line, or a keyword with no data
        # keyword, 8-character ID, remaining fields
        data = [lineList[0], lineList[1][:8], lineList[1][8:]]
        if 'rect' in lineList or 'tria' in lineList:
            wireDict[convert_data(data[1])] = [convert_data(x) for x in each8(data[2])]
        elif 'pnt' in lineList:
            ptDict[convert_data(data[1])] = [convert_data(x) for x in data[2].split() if x != '']
    return ptDict, wireDict

Or, with regular expressions:

import re
..................................
    for line in fList:
        lineList = [x.lower().strip() for x in line.strip().split(' ', 1) if x != '']

        if 'rect' in lineList or 'tria' in lineList:
            # The first 8 digits are the wire ID; the remaining groups are point IDs.
            wireDict[convert_data(lineList[1][:8])] = \
                [convert_data(x) for x in re.findall(r"\d{8}", lineList[1][8:])]

        elif 'pnt' in lineList:
            ptDict[convert_data(lineList[1][:8])] = \
                [convert_data(y.strip()) for y in [x for x in re.split(r"\d{8}", \
                    lineList[1]) if x != ''][0].split(' ') if y != '']
Take your choice. I'm no expert at regex!
Mar 19 '07 #13
bvdet
2,851 Expert Mod 2GB
I like this version of each8() better:
def each8(s):
    while len(s) > 0:
        yield s[:8]
        s = s[8:]
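The same slicing idea can be written for any field width. This is only a sketch for Python 3; `split_fixed` and its `width` parameter are names made up for the example:

```python
def split_fixed(text, width=8):
    """Split text into consecutive width-character fields, stripped of padding.

    Every field is kept in position, including a trailing partial field.
    """
    return [text[i:i + width].strip() for i in range(0, len(text), width)]


record = "Rect    1000007110000101200000227000000060000055"
print(split_fixed(record))
# ['Rect', '10000071', '10000101', '20000022', '70000000', '60000055']
```

This relies on the columns really being aligned. The `Tria` line in the sample, as posted, has one more space after the keyword than the `Rect` lines, so it would need to be normalized first.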
Mar 19 '07 #14
psbasha
440 256MB
Sample1.txt
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.

Sample2.txt
Pnt    *         3280311       0          1.36567432E+03 -3.71226532E+02
*         2.01031464E+02       0
Pnt     *         3280502       0          1.25433850E+03 -1.42613068E+02
*         1.80202667E+02       0
Pnt     *         3280503       0          1.27057288E+03 -1.75843582E+02
*         1.84236084E+02       0
Pnt    *         3280504       0          1.28286145E+03 -2.01004501E+02
*         1.87218460E+02       0

Sample3.txt
Pnt*     10260209                       1156.26599      313.992828
*       155.018463
Pnt*     10270106                       1097.15002      250.676315
*       140.789337
Pnt*     10270107                       1115.47864      271.83374
*       144.698837

Mar 21 '07 #15
psbasha
440 256MB
I am getting inconsistent input data from different software packages, but I have to write generic Python code that can read any of the input formats shown in the examples above.
Mar 21 '07 #16
psbasha
440 256MB
Could anybody help me resolve this issue of handling data in a generic format?

Thanks in advance
PSB
Mar 21 '07 #17
bvdet
2,851 Expert Mod 2GB
I think we have taken care of Sample1, have we not? Can you explain Sample2 and Sample3 format? Is the point data really on two separate lines? What is the significance of the asterisk? Why are there zeros mixed in with the numbers in scientific notation? Help us help you.
Mar 22 '07 #18
psbasha
440 256MB
a) "I think we have taken care of Sample1, have we not?"

Yes.

b) "Can you explain Sample2 and Sample3 format?"

This format is somewhat different from Sample1. The X, Y, Z coordinates are not written on a single line; they are split across two lines. Each string or number occupies a 16-character field. The maximum length of a line is 79 characters.

c) "Is the point data really on two separate lines?"

Yes.

d) "What is the significance of the asterisk?"

The "*" on the second line may be used to mark a continuation of the fields.

e) "Why are there zeros mixed in with the numbers in scientific notation?"

Pnt * 3280504 0 1.28286145E+03 -2.01004501E+02
* 1.87218460E+02 0

Currently I don't need these zeros. Each one is also an ID that may refer to some other number later.

Thanks in advance
PSB
Mar 22 '07 #19
bvdet
2,851 Expert Mod 2GB
Here's one way of adding the data in this format to your point dictionary:
>>> import re
>>> patt = re.compile(r'''\d+\.\d+E\+\d+|
... -\d+\.\d+E\+\d+|
... -\d+\.\d+E-\d+|
... \d+\.\d+E-\d+|
... \d+\.\d+|
... -\d+\.\d+|
... \d+''', re.X
... )
>>> patt
<_sre.SRE_Pattern object at 0x00DE68D0>
>>> s = 'Pnt    *         3280311       0          +1.36567432E+03 -3.71226532E+02'
>>> re.findall(patt,s)
['3280311', '0', '1.36567432E+03', '-3.71226532E+02']
>>> dd = {}
>>> lst = re.findall(patt,s)
>>> dd[int(lst[0])] = [float(i) for i in lst[1:] if i != '0']
>>> dd
{3280311: [1365.6743200000001, -371.22653200000002]}
>>> s1 = '*       155.018463'
>>> lst1 = re.findall(patt,s1)
>>> dd[int(lst[0])] = dd[int(lst[0])]+[float(i) for i in lst1 if i != '0']
>>> dd
{3280311: [1365.6743200000001, -371.22653200000002, 155.018463]}
>>>
You can add an elif for the word 'pnt' in combination with '*'. Whoever designed the output for this data ought to be ..............
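One way to deal with the two-line records before applying a pattern like the one above is to glue each continuation line onto the record it belongs to. A sketch for Python 3, assuming (as PSB describes) that a line beginning with `*` continues the previous line; `join_continuations` is a made-up name:

```python
def join_continuations(lines):
    """Yield logical records, appending '*' continuation lines to the previous line."""
    record = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('*') and record is not None:
            # Continuation: drop the leading '*' and append the remaining fields.
            record += ' ' + line[1:].strip()
        else:
            if record is not None:
                yield record
            record = line
    if record is not None:
        yield record


sample = [
    "Pnt*     10260209                       1156.26599      313.992828",
    "*       155.018463",
    "Pnt*     10270106                       1097.15002      250.676315",
    "*       140.789337",
]
for rec in join_continuations(sample):
    print(rec.split())
```

After joining, each point is a single string again, so the single-line parsing code can stay as it is.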
Mar 22 '07 #20
psbasha
440 256MB
Hi BV,

Is there a simpler approach available? It looks like we have to format the values before reading them.

Thanks
PSB
Mar 23 '07 #21
bvdet
2,851 Expert Mod 2GB
The code I showed you works. I guess you could do splits, strips, slices, etc., but I don't think it would be simpler. After incorporating that code into the other code I showed you, you should get output like this:
>>> Point dictionary:
30400000 = [10.0, 0.0, 0.0]
40000000 = [15.0, 0.0, 0.0]
2 = [2, 5.0, 0.0, 0.0]
3 = [3, 10.0, 0.0, 0.0]
4 = [4, 15.0, 0.0, 0.0]
5 = [5, 20.0, 0.0, 0.0]
6 = [6, 0.0, 5.0, 0.0]
1 = [1, 0.0, 0.0, 0.0]
8 = [8, 10.0, 5.0, 0.0]
9 = [9, 15.0, 5.0, 0.0]
10270106 = [1097.15002, 250.67631499999999, 140.78933699999999]
10270107 = [1115.47864, 271.83373999999998, 144.698837]
10010012 = [3.0, 0.0, 0.0]
60000800 = [0.0, 5.0, 0.0]
20020003 = [5.0, 0.0, 0.0]
10260209 = [1156.2659900000001, 313.99282799999997, 155.018463]
80009000 = [10.0, 5.0, 0.0]
7 = [7, 5.0, 5.0, 0.0]
3280311 = [1365.6743200000001, -371.22653200000002, 201.031464]
50050000 = [20.0, 0.0, 0.0]
90009000 = [15.0, 5.0, 0.0]
70000000 = [5.0, 5.0, 0.0]
3280502 = [1254.3385000000001, -142.613068, 180.20266699999999]
3280503 = [1270.5728799999999, -175.843582, 184.23608400000001]
3280504 = [1282.8614500000001, -201.004501, 187.21845999999999]

Wire dictionary:
10000000 = [10000000, 20000000, 70000000]
20000000 = [20000000, 30000000, 80000000]
30000000 = [30000000, 40000000, 90000000]
10000071 = [10000101, 20000022, 70000000, 60000055]
40000000 = [40000000, 50000000]
30000088 = [30000208, 40000002, 90005000, 80003000]
20000092 = [20000105, 30000004, 80004000, 71111167]
40000094 = [40000304, 50000071, 90000600]
from data like this:
Rect    1000007110000101200000227000000060000055
Rect    2000009220000105300000048000400071111167
Rect    3000008830000208400000029000500080003000
Tria     40000094400003045000007190000600
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.
Pnt      100100123.      0.      0.
Pnt      200200035.      0.      0.
Pnt      3040000010.     0.      0.
Pnt      4000000015.     0.      0.
Pnt      5005000020.     0.      0.
Pnt      600008000.      5.      0.
Pnt      700000005.      5.      0.
Pnt      8000900010.     5.      0.
Pnt      9000900015.     5.      0.
Rect    100000001000000020000000700000006
Rect    200000002000000030000000800000007
Rect    300000003000000040000000900000008
Tria    4000000040000000500000009
Pnt     1       0.      0.      0.
Pnt     2       5.      0.      0.
Pnt     3       10.     0.      0.
Pnt     4       15.     0.      0.
Pnt     5       20.     0.      0.
Pnt     6       0.      5.      0.
Pnt     7       5.      5.      0.
Pnt     8       10.     5.      0.
Pnt     9       15.     5.      0.

Pnt    *         3280311       0          1.36567432E+03 -3.71226532E+02
*         2.01031464E+02       0
Pnt     *         3280502       0          1.25433850E+03 -1.42613068E+02
*         1.80202667E+02       0
Pnt     *         3280503       0          1.27057288E+03 -1.75843582E+02
*         1.84236084E+02       0
Pnt    *         3280504       0          1.28286145E+03 -2.01004501E+02
*         1.87218460E+02       0

Pnt*     10260209                       1156.26599      313.992828
*       155.018463
Pnt*     10270106                       1097.15002      250.676315
*       140.789337
Pnt*     10270107                       1115.47864      271.83374
*       144.698837

The data files were not formatted in the best manner for reading.
Mar 23 '07 #22
bvdet
2,851 Expert Mod 2GB
Maybe this will be easier to follow:
import re

def read_file_data(f):
    ptDict = {}
    wireDict = {}
    fList = open(f).readlines()

    in_pnt = False
    patt = re.compile(r'''\d+\.\d+E\+\d+|           # engineering notation ++
                          -\d+\.\d+E\+\d+|          # engineering notation -+
                          -\d+\.\d+E-\d+|           # engineering notation --
                          \d+\.\d+E-\d+|            # engineering notation +-
                          \d+\.\d+|                 # positive float format
                          -\d+\.\d+|                # negative float format
                          \d+                       # positive integer
                          ''', re.X
                      )

    for line in fList:
        lineList = [x.lower().strip() for x in line.strip().split(' ', 1) if x != '']
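The alternatives in that pattern can also be folded into a single expression with optional parts. This is a sketch for Python 3 rather than a drop-in replacement: `NUMBER` and `numbers_in` are illustrative names, and unlike the pattern above it also accepts a leading `+`, a bare trailing dot such as `5.`, a bare leading dot such as `.001`, and a lowercase `e`.

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction), optional exponent.
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')


def numbers_in(line):
    """Return every numeric token found in line, as strings."""
    return NUMBER.findall(line)


s = 'Pnt    *         3280311       0          1.36567432E+03 -3.71226532E+02'
print(numbers_in(s))
# ['3280311', '0', '1.36567432E+03', '-3.71226532E+02']
```

Because the groups are non-capturing, `findall` returns the whole matched token each time, and each token can be passed straight to `int()` or `float()`.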
Mar 24 '07 #23
psbasha
440 256MB
Sample.txt
$$$$$
START
COLOR RED
LINETYPE SOLID
END
$$$$$$$
PLine    1        6      1.5     9.375   .001    .001
$ Line Details
Line*    1               1                1              2
*        .002952         .992547         .121827
$
Rect     2        1       2       3       7       6
Rect     3        1       3       4       8       7
PRect*   4               11              15              16
*        10              11              0.3
Rect*    4               1               5               6
*        10              11              0.
Othr*    1               1               5               6
*        10              11              0.              0.
*        10              11              0.              1.0
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Tria     5        1       7       2       11
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Point    1               0.0     0.0     0.0
Point    2               1.0     0.0     0.0
Point    3               2.0     0.0     0.0
Point    4               3.0     0.0     0.0
Point    5               0.0     1.0     0.0
Point    6               1.0     1.0     0.0
Point    7               2.0     1.0     0.0
Point    8               4.0     1.0     0.0
Point*   9                              0.0             2.0
*          0.0
Point  *3280504         0               1.28286145E+03  1.28286145E+03
*       -2.01004501E+02
$
END


Sample.py

def read_file_data(strFile):
    f = open(strFile,'r')
    pointID = 0
    curvetID = 0
    pointIDDict = {}
    pointList = []
    coordList = []
    attrdict=[]

    curveIDDict = {}
    curveOneDimIDDict = {}
    curveTwoDimIDPointIDDict = {}
    largeFieldFlag = False

    curveCardLargeFieldFlag = False
    bTriaFlag = False
    bRectFlag = False
    bOnlyPointCoord = True
    b1DCurveFlag = False
    propDict={}

    strTemp = f.readlines()
    for line in strTemp:
        if(line.startswith('Point') or line.startswith('Point*') or line.startswith('Point  *') or line.startswith('*') and bOnlyPointCoord):

            if (line.startswith('Point') and (line[:8].strip().isalpha())):
                pointID = int(line[8:16])
                coordList.append((float(line[24:32])))
                coordList.append((float(line[32:40])))
                coordList.append((float(line[40:48])))
                largeFieldFlag = False
            elif (line.startswith('Point*') or line.startswith('Point  *')):
                pointID = int(line[8:24])
                coordList.append((float(line[40:56])))
                coordList.append((float(line[56:72])))
                largeFieldFlag = True
                bOnlyPointCoord = True
            elif (line.startswith('*') and largeFieldFlag):
                coordList.append((float(line[8:24])))
                largeFieldFlag = False
            if ( pointID and largeFieldFlag == False):
                pointIDDict[pointID]=coordList
                pointID =0
                coordList = []

            bOnlyPointCoord = True
        elif (line.startswith('Rect') or line.startswith('Tria') or  \
              line.startswith('Line') and line[:8].strip().isalpha() or \
              line.startswith('Rect*') or line.startswith('Tria*') or\
              line.startswith('Line*')or line.startswith('*')):

            if (line.startswith('Rect  ') or \
                line.startswith('Line') and line[:8].strip().isalpha() ):

                curvetID = int(line[8:16])
                pointList.append((int(line[24:32])))
                pointList.append((int(line[32:40])))
                b1DCurveFlag = True

                if (line[:4]=='Tria'or line[:4]=='Rect'):
                    pointList.append((int(line[40:48])))
                    b1DCurveFlag = False

                    if (line[:4]=='Rect' ):
                        pointList.append((int(line[48:56])))

                curveCardLargeFieldFlag = False

            elif   (line.startswith('Rect*') or line.startswith('Tria*') or \
                    line.startswith('Line*')):
                curvetID = int(line[8:24])
                pointList.append((int(line[40:56])))
                pointList.append((int(line[56:72])))
                curveCardLargeFieldFlag = True
                bOnlyPointCoord = False
                b1DCurveFlag = True
                if line.startswith('Rect*') :
                    bRectFlag = True
                    bTriaFlag = False
                elif line.startswith('Tria*'):
                    bTriaFlag = True
                    bRectFlag = False

            elif line.startswith('*') and curveCardLargeFieldFlag:
                if (bTriaFlag or bRectFlag):
                    pointList.append((int(line[8:24])))
                    b1DCurveFlag = False
                    if bRectFlag:
                        pointList.append((int(line[24:40])))

                bTriaFlag = False
                bRectFlag = False

                curveCardLargeFieldFlag = False

            if ( curvetID and curveCardLargeFieldFlag == False):
                # Map ElementID and Node ID's of that element
                curveIDDict[curvetID]=pointList
                if b1DCurveFlag:
                    curveOneDimIDDict[curvetID]= pointList
                    b1DCurveFlag = False
                else:
                    curveTwoDimIDPointIDDict[curvetID]= pointList
                    b1DCurveFlag = False

                curveCardLargeFieldFlag = False
                bOnlyPointCoord = False
                curvetID = 0
                pointList = []

    f.close()

    #Node
    #For all Nodes
    print(pointIDDict)

    print(curveIDDict)

    print(curveOneDimIDDict)

    print(curveTwoDimIDPointIDDict)


if __name__ == '__main__':
    read_file_data("C:\\ReadFile\\SampleData.txt")

Above are the sample text file and the sample code that reads it. I would like to avoid using flags and defining so many variables. Is it possible to use regular expressions to reduce the amount of code?

Thanks
PSB
Dec 25 '07 #24
psbasha
440 256MB
In some scenarios I have to read the following data from the file:

PLine 1 6 1.5 9.375 .001 .001
PRect* 4 11 15 16
* 10 11 0.3
Othr* 1 1 5 6
* 10 11 0. 0.
* 10 11 0. 1.0

In some scenarios the Point data will be defined as below:

Point *3280505 0 1.28286145+03 1.28286145-03
* -2.01004501+02

1.28286145+03 is the same as 1.28286145E+03
1.28286145-03 is the same as 1.28286145E-03

How can I handle the above scenarios while reading the file?

Thanks
PSB
Dec 25 '07 #25
psbasha
440 256MB
PLine 1 6 1.5 9.375 .001 .001
PRect* 4 11 15 16
* 10 11 0.3
Othr* 1 1 5 6
* 10 11 0. 0.
* 10 11 0. 1.0

I have not written code for the above card lines to store the properties of the curves.

In some cases the Point coordinates are represented as shown below:

Point *3280505 0 1.28286145+03 1.28286145-03
* -2.01004501+02

1.28286145+03 is the same as 1.28286145E+03
1.28286145-03 is the same as 1.28286145E-03

Can anybody suggest how to store and print the data?

Thanks
PSB
Dec 25 '07 #26
psbasha
440 256MB
Any suggestions on the above queries?
Dec 25 '07 #27
psbasha
440 256MB
Hi BV,

Any suggestions on the above code?

Thanks
PSB
Dec 27 '07 #28
bvdet
2,851 Expert Mod 2GB
Try this:
import re

def convert_data(s):
    # try int first, then float; fall back to the original string
    for func in (int, float):
        try:
            n = func(s)
            return n
        except ValueError:
            pass
    return s

pattnum = re.compile(r'''
                      \d+\.\d+E\+\d+|           # engineering notation ++
                      -\d+\.\d+E\+\d+|          # engineering notation -+
                      -\d+\.\d+E-\d+|           # engineering notation --
                      \d+\.\d+E-\d+|            # engineering notation +-
                      \d+\.\d+|                 # positive float format
                      -\d+\.\d+|                # negative float format
                      \d+\.|                    # positive float, trailing point (0.)
                      -\d+\.|                   # negative float, trailing point
                      \.\d+|                    # positive float, leading point (.001)
                      -\.\d+|                   # negative float, leading point
                      \d+                       # positive integer
                      ''', re.X
                  )

def parseData(fn, *kargs):
    fileList = [item.strip() for item in open(fn).readlines()\
                if not item.startswith('$')]
    pattkey = re.compile('|'.join([r'\b(%s)' % item for item in kargs]))
    '''
    print(pattkey)
    print(pattkey.pattern)
    '''
    # create dictionary with keys from kargs
    masterDict = dict(zip(kargs, [[] for _ in kargs]))
    inData = False
    for line in fileList:
        if inData and line.startswith('*'):
            data.extend(re.findall(pattnum, line))
        elif inData and not line.startswith('*'):
            masterDict[m.group(0)].append([convert_data(item)\
                                           for item in data])
            inData = False
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line.split()[0]:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
        else:
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line.split()[0]:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
    return masterDict

fn = 'H:\\TEMP\\temsys\\sample_points8.txt'
keywords =  ['Point', 'Othr', 'Rect', 'PRect', 'PLine', 'Line', 'Tria']
dd = parseData(fn, *keywords)
for key in dd:
    print(key)
    for item in dd[key]:
        print('    %s' % item)

Output:
>>> Point
    [1, 0.0, 0.0, 0.0]
    [2, 1.0, 0.0, 0.0]
    [3, 2.0, 0.0, 0.0]
    [4, 3.0, 0.0, 0.0]
    [5, 0.0, 1.0, 0.0]
    [6, 1.0, 1.0, 0.0]
    [7, 2.0, 1.0, 0.0]
    [8, 4.0, 1.0, 0.0]
    [9, 0.0, 2.0, 0.0]
    [3280504, 0, 1282.8614500000001, 1282.8614500000001]
PLine
    [1, 6, 1.5, 9.375, 0.001, 0.001]
Tria
    [5, 1, 7, 2, 11]
PRect
    [4, 11, 15, 16, 10, 11, 0.29999999999999999]
Line
    [1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Rect
    [2, 1, 2, 3, 7, 6]
    [3, 1, 3, 4, 8, 7]
    [4, 1, 5, 6, 10, 11, 0.0]
Othr
    [1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0]

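The inData flag in parseData exists only to attach each '*' continuation line to the card before it. One alternative, sketched here in Python 3, is to merge the physical lines into logical cards first, so that the parsing step sees one complete record at a time. The sketch assumes, as in the sample file, that comment lines start with '$' and that a line starting with '*' continues the previous card; iter_cards is a name made up for this illustration.

```python
def iter_cards(lines):
    """Yield one logical card (header plus its continuation lines) at a time.

    Assumes comment lines start with '$' and continuation lines with '*'.
    iter_cards is a hypothetical helper, not part of the code above.
    """
    card = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('$'):
            continue                      # skip blank lines and comments
        if line.startswith('*') and card is not None:
            card += ' ' + line[1:]        # glue continuation onto the open card
        else:
            if card is not None:
                yield card                # the previous card is complete
            card = line
    if card is not None:
        yield card                        # flush the last card at end of input


sample = [
    "$ Line Details",
    "Line*    1               1                1              2",
    "*        .002952         .992547         .121827",
    "Rect     2        1       2       3       7       6",
    "Othr*    1               1               5               6",
    "*        10              11              0.              0.",
    "*        10              11              0.              1.0",
]
cards = list(iter_cards(sample))
for card in cards:
    print(card.split())
```

Because the last card is yielded after the loop, input that ends on a continuation line does not lose its final record. A header such as 'Point  *3280504' would still need the '*' removed from its ID field before conversion.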
Dec 27 '07 #29
bvdet
2,851 Expert Mod 2GB
I made a few modifications so it would work properly. It probably needs some more work, but I will leave it up to you. Let us know how it turns out.
import re

def convert_data(s):
    # try int first, then float; fall back to the original string
    for func in (int, float):
        try:
            n = func(s)
            return n
        except ValueError:
            pass
    return s

pattnum = re.compile(r'''
                      -\d+\.\d+E\+\d+|          # engineering notation -+
                      \d+\.\d+E\+\d+|           # engineering notation ++
                      -\d+\.\d+E-\d+|           # engineering notation --
                      \d+\.\d+E-\d+|            # engineering notation +-
                      -\d+\.\d+|                # negative float format
                      \d+\.\d+|                 # positive float format
                      -\d+\.|                   # negative float, trailing point
                      \d+\.|                    # positive float, trailing point (0.)
                      -\.\d+|                   # negative float, leading point
                      \.\d+|                    # positive float, leading point (.001)
                      \d+                       # positive integer
                      ''', re.X
                  )

def parseData(fn, *kargs):
    fileList = [item.strip() for item in open(fn).readlines()\
                if not item.startswith('$')]
    pattkey = re.compile('|'.join([r'\b(%s)' % item for item in kargs]))
    '''
    print(pattkey)
    print(pattkey.pattern)
    '''
    # create dictionary with keys from kargs
    masterDict = dict(zip(kargs, [[] for _ in kargs]))
    inData = False
    for line in fileList:
        if inData and line.startswith('*'):
            data.extend(re.findall(pattnum, line))
        elif inData and not line.startswith('*'):
            masterDict[m.group(0)].append([convert_data(item)\
                                           for item in data])
            inData = False
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
        else:
            m = pattkey.match(line)
            if m:
                # m.group(0) is the current keyword
                if '*' in line:
                    inData = True
                    data = re.findall(pattnum, line)
                else:
                    data = re.findall(pattnum, line)
                    masterDict[m.group(0)].append([convert_data(item)\
                                                   for item in data])
    return masterDict

fn = 'sample.txt'
keywords =  ['Point', 'Othr', 'Rect', 'PRect', 'PLine', 'Line', 'Tria']
dd = parseData(fn, *keywords)
for key in dd:
    print(key)
    for item in dd[key]:
        print('    %s' % item)
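The eleven alternatives in pattnum differ only in the sign, in which side of the decimal point has digits, and in whether an exponent follows. As a sketch (Python 3; NUM and numbers are names introduced here, not part of the code above), roughly the same tokens can be described by one pattern with optional parts. It is a close equivalent rather than an exact one: unlike pattnum it also accepts a leading '+' and negative integers.

```python
import re

# Optional sign, then digits with an optional fraction (or a bare fraction
# such as .001), then an optional E exponent. NUM is a hypothetical name.
NUM = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:E[-+]?\d+)?')

def numbers(line):
    """Return the numeric tokens on a line, as int where possible, else float."""
    out = []
    for tok in NUM.findall(line):
        try:
            out.append(int(tok))
        except ValueError:
            out.append(float(tok))
    return out

print(numbers("PLine    1        6      1.5     9.375   .001   -.001"))
print(numbers("*       -2.01004501E+02"))
```

Like pattnum, this pattern still splits an exponent written without 'E' (for example 1.28286145-03) into two separate numbers.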
Dec 28 '07 #30
psbasha
440 256MB
Thanks BV. You are really great; you are very good at regular expressions and file parsing.

If the file contains Point data as shown below:
Sample
Point  *3280505         0               1.28286145-03  1.28286145E+03
*       -2.01004501+02

The output should be
[3280505, 0, 0.00128286145, 1282.8614500000001, -201.00450099999998]

But we are getting the output as
[3280505, 0, 1.28286145, 3, 1282.8614500000001, -2.0100450099999998, 2]

How can I fix the above exponent data?

-PSB
Dec 28 '07 #31
bvdet
2,851 Expert Mod 2GB
You are welcome. :)

Your data is invalid, because there is no 'E' indicating exponential notation. You will need to correct the data before processing it so it can be converted to a floating point number. This pattern matches the invalid data:
pattinvalid = re.compile(r'''
                          \d+\.\d+\+\d+|           # invalid eng notation +
                          \d+\.\d+-\d+             # invalid eng notation -
                          ''', re.X
                         )
This code corrects the data:
        if pattinvalid.search(line):
            for item in pattinvalid.findall(line):
                line = line.replace(item, item.replace('-', 'E-').replace('+', 'E+'))
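The search, findall and replace steps can also be written as a single substitution. The sketch below (Python 3) assumes that an exponent written without 'E' always follows a mantissa containing a decimal point and that fields are separated by whitespace, as in the sample data; BARE_EXP and fix_exponents are names made up for this example.

```python
import re

# A mantissa with a decimal point, immediately followed by a signed integer.
# Values that already contain 'E' do not match, because the 'E' sits between
# the mantissa and the sign.
BARE_EXP = re.compile(r'(\d+\.\d*|\.\d+)([+-]\d+)')

def fix_exponents(line):
    """Insert the missing 'E' (hypothetical helper for illustration)."""
    return BARE_EXP.sub(r'\1E\2', line)

line = "Point  *3280505         0               1.28286145-03  1.28286145E+03"
fixed = fix_exponents(line)
print(fixed)
print(fix_exponents("*       -2.01004501+02"))
```

If two fields ever touch with no space between them (for example 1.0-2.0), this substitution would misread the second value as an exponent, so fixed-column slicing would be safer for such files.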
Dec 28 '07 #32
psbasha
440 256MB
Hi BV,

Could you please help me understand the piece of code below in a simpler way?

fileList = [item.strip() for item in open(fn).readlines()\
            if not item.startswith('$')]

After the 'for' loop and the 'if' condition we are not using ':' to begin a block. How is this different from the ordinary 'for' and 'if' with ':'? Are both the same, and do we follow this approach only to reduce the lines of code and improve readability? What is this style of writing called in Python?

Can you provide links for learning these concepts?

I was trying to add the invalid data format code to the main code you provided, but I have not succeeded. If I understand the above concept, I hope I can implement the invalid-data logic easily.
Thanks
PSB
Dec 29 '07 #33
This is the best way I could find for you. Go through the link; I hope it will be helpful.

BestFileReadingMethod
Dec 29 '07 #34
bvdet
2,851 Expert Mod 2GB
The code assigned to fileList creates a list as the variable name implies and is called a list comprehension. This list comprehension is equivalent to:
f = open(fn)
fileList = []
for line in f:
    if not line.startswith('$'):
        fileList.append(line.strip())
f.close()
To read more about list comprehensions - LINK
For more links, do a web search on 'list comprehension python'.
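
As a small self-contained illustration (Python 3; an in-memory io.StringIO stands in for the data file so the snippet runs without one, and the sample text is made up), the comprehension and the explicit loop produce the same list:

```python
import io

text = "$ comment\nRect     2        1\n$\nTria     5        1\n"

# Comprehension: the expression, then the for clause, then the optional filter.
with io.StringIO(text) as f:
    by_comprehension = [line.strip() for line in f if not line.startswith('$')]

# The same thing written as an explicit loop.
by_loop = []
with io.StringIO(text) as f:
    for line in f:
        if not line.startswith('$'):
            by_loop.append(line.strip())

print(by_comprehension)
print(by_comprehension == by_loop)
```

With a real file, `with open(fn) as f:` closes the file as soon as the block ends, whereas the one-line `open(fn).readlines()` form never closes it explicitly.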

The full source code for parsing your sample data file:
Expand|Select|Wrap|Line Numbers
  1. import re
  2.  
  3. def convert_data(s):
  4.     for func in (int, float):
  5.         try:
  6.             n = func(s)
  7.             return n
  8.         except:
  9.             pass
  10.     return s
  11.  
  12. pattnum = re.compile(r'''
  13.                       -\d+\.\d+E\+\d+|          # engineering notation -+
  14.                       \d+\.\d+E\+\d+|           # engineering notation ++
  15.                       -\d+\.\d+E-\d+|           # engineering notation --
  16.                       \d+\.\d+E-\d+|            # engineering notation +-
  17.                       -\d+\.\d+|                # negative float format
  18.                       \d+\.\d+|                 # positive float format
  19.                       -\d+\.|                   # negative float format
  20.                       \d+\.|                    # positive float format
  21.                       -\.\d+|                   # negative float format
  22.                       \.\d+|                    # positive float format
  23.                       \d+                       # positive integer
  24.                       ''', re.X
  25.                      )
  26.  
  27. pattinvalid = re.compile(r'''
  28.                           \d+\.\d+\+\d+|           # invalid eng notation +
  29.                           \d+\.\d+-\d+             # invalid eng notation -
  30.                           ''', re.X
  31.                          )                          
  32.  
def parseData(fn, *kargs):
    # read the file, skipping lines that start with '$'
    fileList = [item.strip() for item in open(fn).readlines()
                if not item.startswith('$')]

    pattkey = re.compile('|'.join([r'\b(%s)' % item for item in kargs]))

    # create dictionary with keys from kargs
    masterDict = dict(zip(kargs, [[] for _ in kargs]))
    inData = False
    for line in fileList:

        # check for invalid data: exponents written without 'E',
        # e.g. 1.28286145+03 becomes 1.28286145E+03
        if pattinvalid.search(line):
            for item in pattinvalid.findall(line):
                line = line.replace(item, item.replace('-', 'E-').replace('+', 'E+'))

        if inData and line.startswith('*'):
            # continuation line: add its numbers to the open record
            data.extend(re.findall(pattnum, line))
            continue

        if inData:
            # the continued record has ended, so store it
            masterDict[m.group(0)].append([convert_data(item) for item in data])
            inData = False

        m = pattkey.match(line)
        if m:
            # m.group(0) is the current keyword
            data = re.findall(pattnum, line)
            if '*' in line:
                # the record continues on the following '*' lines
                inData = True
            else:
                masterDict[m.group(0)].append([convert_data(item) for item in data])

    if inData:
        # store a continued record that runs to the end of the file
        masterDict[m.group(0)].append([convert_data(item) for item in data])
    return masterDict

if __name__ == '__main__':
    fn = 'sample_points.txt'
    keywords = ['Point', 'Othr', 'Rect', 'PRect', 'PLine', 'Line', 'Tria']
    dd = parseData(fn, *keywords)
    for key in dd:
        print(key)
        for item in dd[key]:
            print('    %s' % item)

''' Output
>>> Point
    [1, 0.0, 0.0, 0.0]
    [2, 1.0, 0.0, 0.0]
    [3, 2.0, 0.0, 0.0]
    [4, -3.0, 0.0, 0.0]
    [5, 0.0, 1.0, 0.0]
    [6, 1.0, 1.0, 0.0]
    [7, 2.0, 1.0, 0.0]
    [8, 4.0, 1.0, 0.0]
    [9, 0.0, -2.0, 0.0]
    [3280504, 0, 1282.8614500000001, 1282.8614500000001, -201.004501]
    [3280606, 0, 0.0069264000650000003, -1282.8614500000001, -10100.4501, -0.014385767359999999]
PLine
    [1, 6, 1.5, 9.375, 0.001, -0.001]
Tria
    [5, 1, 7, 2, 11]
PRect
    [4, 11, 15, 16, 10, 11, 0.29999999999999999]
Line
    [1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Rect
    [2, 1, 2, 3, 7, 6]
    [3, 1, 3, 4, 8, 7]
    [4, 1, 5, 6, 10, 11, 0.0]
Othr
    [1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0]
>>>
'''

''' Data File Contents
$$$$$
START
COLOR RED
LINETYPE SOLID
END
$$$$$$$
PLine    1        6      1.5     9.375   .001   -.001
$ Line Details
Line*    1               1                1              2
*        .002952         .992547         .121827
$
Rect     2        1       2       3       7       6
Rect     3        1       3       4       8       7
PRect*   4               11              15              16
*        10              11              0.3
Rect*    4               1               5               6
*        10              11              0.
Othr*    1               1               5               6
*        10              11              0.              0.
*        10              11              0.              1.0
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Tria     5        1       7       2       11
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Point    1               0.0     0.0     0.0
Point    2               1.0     0.0     0.0
Point    3               2.0     0.0     0.0
Point    4              -3.0     0.0     0.0
Point    5               0.0     1.0     0.0
Point    6               1.0     1.0     0.0
Point    7               2.0     1.0     0.0
Point    8               4.0     1.0     0.0
Point*   9                               0.0            -2.0
*          0.0
Point  *3280504         0               1.28286145E+03  1.28286145+03
*       -2.01004501E+02
#
Point  *3280606         0               6.926400065-03  -1.28286145+03
*       -1.01004501+04  -1.438576736-02
$
END
'''
You should test this on real data for valid results. I cannot guarantee that this is a final solution for you.
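For anyone testing the "invalid data" step on its own: the sample file contains exponents written without an 'E' (for example 1.28286145+03), and parseData repairs them before the numbers are extracted. Here is a minimal sketch of that one step. The pattern below is only an assumed stand-in, since pattinvalid is defined earlier in the script and is not repeated here, and the names PATT_BARE_EXP and repair_exponents are made up for illustration. It also assumes the fields are separated by whitespace.

```python
import re

# Assumed stand-in for pattinvalid: a float immediately followed by a bare
# signed exponent, with no 'E' in between (e.g. 1.28286145+03).
PATT_BARE_EXP = re.compile(r'(\d+\.\d+)([+-]\d+)')

def repair_exponents(line):
    """Insert the missing 'E' so float() can read the value."""
    return PATT_BARE_EXP.sub(r'\1E\2', line)

# A value that already has its 'E' is left alone; the bare one is repaired.
print(repair_exponents('1.28286145E+03  1.28286145+03'))
# 1.28286145E+03  1.28286145E+03
```

The posted code does the same job with findall and str.replace; re.sub is used here only because it keeps the sketch short.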
Dec 29 '07 #35
psbasha
440 256MB
SampleFile
$$$$$
START
COLOR RED
LINETYPE SOLID
END
$$$$$$$
PLine   1        6      1.5     9.375   .001    .001
$ Line Details
Line*   1               1                1              2
*       .002952         .992547         .121827
$

Rect    2        1       2       3       7       6
Rect    3        1       3       4       8       7
PRect*  4               11              15              16
*       10              11              0.3
Rect*   4               1               5               6
*       10              11              0.
Othr*   1               1               5               6
*       10              11              0.              0.
*       10              11              0.              1.0
Oth1*   1               1               5               6
*       10              11              0.              0.
*       10              11              0.              1.0
*       10              11              0.              1.0
*       10              11              0.              1.0
Rect*   5               1               5               6
*       10              11              0.
Rect    1000000010000000200000007000000060000000
Rect    2000000020000000300000008000000070000000
Rect    3000000030000000400000009000000080000000
Tria    40000000400000005000000090000000
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$  $
Tria     5        1       7       2       11
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Point   1               0.0     0.0     0.0
Point   2               1.0     0.0     0.0
Point   3               2.0     0.0     0.0
Point   4               3.0     0.0     0.0
Point   5               0.0     1.0     0.0
Point   6               1.0     1.0     0.0
Point   7               2.0     1.0     0.0
Point   8               4.0     1.0     0.0
Point*  9                              0.0             2.0
*       0.0
Point  *3280504         0               1.28286145E+03  1.28286145E+03
*       -2.01004501E+02
Point  *3280505         0               1.28286145-03  1.28286145+03
*       -2.01004501+02
Point   100000000.      0.      0.
Point   200000005.      0.      0.
Point   3000000010.     0.      0.
Point   4000000015.     0.      0.
Point   5000000020.     0.      0.
Point   600000000.      5.      0.
Point   700000005.      5.      0.
Point   8000000010.     5.      0.
Point   9000000015.     5.      0.
$
END
Dec 29 '07 #36
psbasha
440 256MB
In the above format (8-digit and 16-digit fields), if a column is completely filled with 8 digits, the output is incorrect.

The output is shown below:

Output
>>> Point
    [1, 0.0, 0.0, 0.0]
    [2, 1.0, 0.0, 0.0]
    [3, 2.0, 0.0, 0.0]
    [4, 3.0, 0.0, 0.0]
    [5, 0.0, 1.0, 0.0]
    [6, 1.0, 1.0, 0.0]
    [7, 2.0, 1.0, 0.0]
    [8, 4.0, 1.0, 0.0]
    [9, 0.0, 2.0, 0.0]
    [3280504, 0, 1282.8614500000001, 1282.8614500000001, -201.004501]
    [3280505, 0, 0.0012828614500000001, 1282.8614500000001, -201.004501]
    [100000000.0, 0.0, 0.0]
    [200000005.0, 0.0, 0.0]
    [3000000010.0, 0.0, 0.0]
    [4000000015.0, 0.0, 0.0]
    [5000000020.0, 0.0, 0.0]
    [600000000.0, 5.0, 0.0]
    [700000005.0, 5.0, 0.0]
    [8000000010.0, 5.0, 0.0]
    [9000000015.0, 5.0, 0.0]
PLine
    [1, 6, 1.5, 9.375, 0.001, 0.001]
Tria
    [40000000400000005000000090000000L]
    [5, 1, 7, 2, 11]
PRect
    [4, 11, 15, 16, 10, 11, 0.29999999999999999]
Line
    [1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Rect
    [2, 1, 2, 3, 7, 6]
    [3, 1, 3, 4, 8, 7]
    [4, 1, 5, 6, 10, 11, 0.0]
    [5, 1, 5, 6, 10, 11, 0.0]
    [1000000010000000200000007000000060000000L]
    [2000000020000000300000008000000070000000L]
    [3000000030000000400000009000000080000000L]
Othr
    [1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0]
The incorrect output entries are:
Incorrect

    [100000000.0, 0.0, 0.0]
    [200000005.0, 0.0, 0.0]
    [3000000010.0, 0.0, 0.0]
    [4000000015.0, 0.0, 0.0]
    [5000000020.0, 0.0, 0.0]
    [600000000.0, 5.0, 0.0]
    [700000005.0, 5.0, 0.0]
    [8000000010.0, 5.0, 0.0]
    [9000000015.0, 5.0, 0.0]
Tria
    [40000000400000005000000090000000L]

Rect
    [1000000010000000200000007000000060000000L]
    [2000000020000000300000008000000070000000L]
    [3000000030000000400000009000000080000000L]
The incorrect values above should be separated by commas. How can this be fixed when a field completely fills its 8- or 16-digit width?

Thanks
PSB
Dec 29 '07 #37
psbasha
440 256MB
The input data file, corrected into the proper format, is shown below.

Thanks
PSB
Dec 30 '07 #38
psbasha
440 256MB
Correctly formatted input file
$$$$$
START
COLOR RED
LINETYPE SOLID
END
$$$$$$$
PLine   1        6      1.5     9.375   .001    .001
$ Line Details
Line*   1               1                1              2
*       .002952         .992547         .121827
$

Rect    2        1      2       3       7       6
Rect    3        1      3       4       8       7
PRect*  4               11              15              16
*       10              11              0.3
Rect*   4               1               5               6
*       10              11              0.
Othr*   1               1               5               6
*       10              11              0.              0.
*       10              11              0.              1.0
Oth1*   1               1               5               6
*       10              11              0.              0.
*       10              11              0.              1.0
*       10              11              0.              1.0
*       10              11              0.              1.0
Rect*   5               1               5               6
*       10              11              0.
Rect    10000000    10000000200000007000000060000000
Rect    20000000    20000000300000008000000070000000
Rect    30000000    30000000400000009000000080000000
Tria    40000000    400000005000000090000000
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$  $
Tria    5        1      7       2       11
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Point   1               0.0     0.0     0.0
Point   2               1.0     0.0     0.0
Point   3               2.0     0.0     0.0
Point   4               3.0     0.0     0.0
Point   5               0.0     1.0     0.0
Point   6               1.0     1.0     0.0
Point   7               2.0     1.0     0.0
Point   8               4.0     1.0     0.0
Point*  9                               0.0             2.0
*       0.0
Point  *3280504         0               1.28286145E+03  1.28286145E+03
*       -2.01004501E+02
Point  *3280505         0               1.28286145-03  1.28286145+03
*       -2.01004501+02
Point   10000000    0.      0.      0.
Point   20000000    5.      0.      0.
Point   30000000    10.     0.      0.
Point   40000000    15.     0.      0.
Point   50000000    20.     0.      0.
Point   60000000    0.      5.      0.
Point   70000000    5.      5.      0.
Point   80000000    10.     5.      0.
Point   90000000    15.     5.      0.
$
END
The input data above can be used for testing.
Thanks
PSB
Dec 30 '07 #39
bvdet
2,851 Expert Mod 2GB
You will need to make a small change to regex pattern pattnum:
pattnum = re.compile(r'''
                      -\d+\.\d+E\+\d+|          # engineering notation -+
                      \d+\.\d+E\+\d+|           # engineering notation ++
                      -\d+\.\d+E-\d+|           # engineering notation --
                      \d+\.\d+E-\d+|            # engineering notation +-
                      -\d+\.\d+|                # negative float format
                      \d+\.\d+|                 # positive float format
                      -\d+\.|                   # negative float format
                      \d+\.|                    # positive float format
                      -\.\d+|                   # negative float format
                      \.\d+|                    # positive float format
                      \d{1,8}                   # positive integer
                      ''', re.X
                     )
This will prevent the matching of more than 8 digits at a time. Further adjustments may be required.
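To see what the 8-digit cap changes, here is a small sketch that runs one of the packed Rect lines from the corrected sample through two cut-down patterns. These are simplified stand-ins for pattnum, not the full pattern above, and the names capped and uncapped are only for illustration.

```python
import re

# Cut-down stand-ins for pattnum: floats first, then integers.
uncapped = re.compile(r'-?\d+\.\d*|-?\.\d+|\d+')      # integer of any length
capped = re.compile(r'-?\d+\.\d*|-?\.\d+|\d{1,8}')    # integer of at most 8 digits

line = 'Rect    10000000    10000000200000007000000060000000'

print(uncapped.findall(line))
# ['10000000', '10000000200000007000000060000000']
print(capped.findall(line))
# ['10000000', '10000000', '20000000', '70000000', '60000000']
```

The cap only counts digits. A float butted against its neighbour, such as 100000000. in the earlier unformatted sample, is still read as one float, which is why the input has to be in the corrected format.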
Dec 30 '07 #40
psbasha
440 256MB
Thanks BV for your suggestion.

I tried the pattern you suggested, but I am still getting incorrect data:

Tria
[40000000, 400000005000000090000000L]

Rect
[10000000, 10000000200000007000000060000000L]
[20000000, 20000000300000008000000070000000L]
[30000000, 30000000400000009000000080000000L]


-PSB
Dec 30 '07 #41
bvdet
2,851 Expert Mod 2GB
Look carefully at the suggested pattern. That pattern produces the following output from your corrected sample data:
>>> Point
    [1, 0.0, 0.0, 0.0]
    [2, 1.0, 0.0, 0.0]
    [3, 2.0, 0.0, 0.0]
    [4, 3.0, 0.0, 0.0]
    [5, 0.0, 1.0, 0.0]
    [6, 1.0, 1.0, 0.0]
    [7, 2.0, 1.0, 0.0]
    [8, 4.0, 1.0, 0.0]
    [9, 0.0, 2.0, 0.0]
    [3280504, 0, 1282.8614500000001, 1282.8614500000001, -201.004501]
    [3280505, 0, 0.0012828614500000001, 1282.8614500000001, -201.004501]
    [10000000, 0.0, 0.0, 0.0]
    [20000000, 5.0, 0.0, 0.0]
    [30000000, 10.0, 0.0, 0.0]
    [40000000, 15.0, 0.0, 0.0]
    [50000000, 20.0, 0.0, 0.0]
    [60000000, 0.0, 5.0, 0.0]
    [70000000, 5.0, 5.0, 0.0]
    [80000000, 10.0, 5.0, 0.0]
    [90000000, 15.0, 5.0, 0.0]
PLine
    [1, 6, 1.5, 9.375, 0.001, 0.001]
Tria
    [40000000, 40000000, 50000000, 90000000]
    [5, 1, 7, 2, 11]
PRect
    [4, 11, 15, 16, 10, 11, 0.29999999999999999]
Line
    [1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Oth1
    [1, 1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0, 10, 11, 0.0, 1.0, 10, 11, 0.0, 1.0]
Rect
    [2, 1, 2, 3, 7, 6]
    [3, 1, 3, 4, 8, 7]
    [4, 1, 5, 6, 10, 11, 0.0]
    [5, 1, 5, 6, 10, 11, 0.0]
    [10000000, 10000000, 20000000, 70000000, 60000000]
    [20000000, 20000000, 30000000, 80000000, 70000000]
    [30000000, 30000000, 40000000, 90000000, 80000000]
Othr
    [1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0]
>>>
Dec 30 '07 #42
psbasha
440 256MB
You are right, BV. Sorry, I had not copied the entire pattern you suggested. I copied the last line of the pattern into my code and missed one line of the pattern.

Thanks for your suggestion and help BV.

-PSB
Dec 30 '07 #43
psbasha
440 256MB
BV,

Please suggest books and links on regular expressions, starting with the basics and moving on to advanced concepts.

Thanks
PSB
Dec 30 '07 #44
bvdet
2,851 Expert Mod 2GB
This link has some good introductory and intermediate information on regular expressions - LINK

I have been using Kodos for experimenting and testing regular expressions and mostly learned by practicing with and incorporating into my scripts when needed. I do not consider myself an expert on re. Trial and error may be the hard way, but that's the way I learned what I know about Python.
Dec 30 '07 #45
psbasha
440 256MB
Re
pattnum = re.compile(r'''
                      -\d+\.\d+E\+\d+|          # engineering notation -+
                      \d+\.\d+E\+\d+|           # engineering notation ++
                      -\d+\.\d+E-\d+|           # engineering notation --
                      \d+\.\d+E-\d+|            # engineering notation +-
                      -\d+\.\d+|                # negative float format
                      \d+\.\d+|                 # positive float format
                      -\d+\.|                   # negative float format
                      \d+\.|                    # positive float format
                      -\.\d+|                   # negative float format
                      \.\d+|                    # positive float format
                      \d{1,8}                   # positive integer
                      ''', re.X)


key_patt = re.compile(r'/([A-Za-z_-]+)/')
data_patt = re.compile(r'\d+\.\d+|\d+|\w+')
Hi BV,

Can you please explain the above patterns with simple examples? What does each pattern line stand for, and how do we decide to use these types of pattern?

Thanks
PSB
Jan 5 '08 #46
bvdet
2,851 Expert Mod 2GB
Each line in pattnum matches a slightly different format of number, as noted in the comments. The last line (''', re.X) contains the VERBOSE flag, which tells the compiler to ignore unescaped whitespace and comments. The next-to-last line (\d{1,8}) greedily matches between 1 and 8 digits at a time. That is what we fixed earlier to work with your formatted data.

key_patt matches words like this:
/ABC_abc-def/
The brackets '[......]' tell the compiler to match the set of characters enclosed. Since the slash characters are outside the brackets, they must enclose the word in a given string to match. That's how we matched your keywords.

data_patt matches a floating point number, an integer, or a run of alphanumeric (word) characters. The character '|' tells the compiler to match the pattern to the left OR the pattern to the right in a given string.
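A few lines to try at the interactive prompt may make this concrete. This is only a sketch: the test strings are made up for illustration, and the two-alternative patterns near the end are cut down from the real ones to show why the order of the alternatives matters.

```python
import re

key_patt = re.compile(r'/([A-Za-z_-]+)/')
data_patt = re.compile(r'\d+\.\d+|\d+|\w+')

# The slashes must surround the word; group(1) is the part inside them.
print(key_patt.search('see /ABC_abc-def/ here').group(1))   # ABC_abc-def

# Each token is matched by the first alternative that fits at that position.
print(data_patt.findall('Rect 2 1.5 abc'))   # ['Rect', '2', '1.5', 'abc']

# Alternatives are tried left to right, so the float form has to come first.
print(re.findall(r'\d+|\d+\.\d+', '1.5'))    # ['1', '5']
print(re.findall(r'\d+\.\d+|\d+', '1.5'))    # ['1.5']

# re.X (VERBOSE) changes only the layout of the pattern, not what it matches.
verbose = re.compile(r'''
    \d+\.\d+ |    # float
    \d+           # integer
    ''', re.X)
print(verbose.findall('7 2.5'))              # ['7', '2.5']
```

That left-to-right rule is why pattnum lists the engineering-notation forms first and the plain integer last.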
Jan 5 '08 #47
psbasha
440 256MB
SampleData
Line1*  1               1                1              2
*       .002952         .992547         .121827
$

Rect2   2        1      2       3       7       6
Rect    3        1      3       4       8       7
PRect2* 4               11              15              16
*       10              11              0.3
Rect2*   4               1               5               6
*       10              11              0.
Othr*   1               1               5               6
*       10              11              0.              0.
*       10              11              0.              1.0
Oth1*   1               1               5               6
*       10              11              0.              0.
*       10              11              0.              1.0
*       10              11              0.              1.0
*       10              11              0.              1.0
Rect*   5               1               5               6
*       10              11              0.
Rect    10000000    10000000200000007000000060000000
Rect    20000000    20000000300000008000000070000000
Rect    30000000    30000000400000009000000080000000
Tria3   40000000    400000005000000090000000
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$  $
Tria    6        1      7       2       11
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Point   1               0.0     0.0     0.0
Point   2               1.0     0.0     0.0
Point   3               2.0     0.0     0.0
Point   4               3.0     0.0     0.0
Point   5               0.0     1.0     0.0
Point   6               1.0     1.0     0.0
Point   7               2.0     1.0     0.0
Point   8               4.0     1.0     0.0
Point*  9                               0.0             2.0
*       0.0
Point  *3280504         0               1.28286145E+03  1.28286145E+03
*       -2.01004501E+02
Point  *3280505         0               1.28286145-03  1.28286145+03
*       -2.01004501+02
Point   10000000    0.      0.      0.
Point   20000000    5.      0.      0.
Point   30000000    10.     0.      0.
Point   40000000    15.     0.      0.
Point   50000000    20.     0.      0.
Point   60000000    0.      5.      0.
Point   70000000    5.      5.      0.
Point   80000000    10.     5.      0.
Point   90000000    15.     5.      0.
$
END
If the keywords are defined as below:
keywords = ['Point', 'Othr', 'Rect2', 'Rect', 'PRect', 'PLine', 'Line1', 'Tria3', 'Oth1']
the output we get is incorrect.
Output
$Incorrect output
Line1
    [1, 1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Tria3
    [3,40000000, 40000000, 50000000, 90000000]

Oth1
    [1, 1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0, 10, 11, 0.0, 1.0, 10, 11, 0.0, 1.0]

Rect2
    [2, 2, 1, 2, 3, 7, 6]
    [2, 4, 1, 5, 6, 10, 11, 0.0]
$Correct output is 
Line1
    [1, 1, 1, 2, 0.0029520000000000002, 0.99254699999999996, 0.121827]
Tria3
    [40000000, 40000000, 50000000, 90000000]

Oth1
    [ 1, 1, 5, 6, 10, 11, 0.0, 0.0, 10, 11, 0.0, 1.0, 10, 11, 0.0, 1.0, 10, 11, 0.0, 1.0]

Rect2
    [ 2, 1, 2, 3, 7, 6]
    [ 4, 1, 5, 6, 10, 11, 0.0]
The source code is also picking up the digit '2' from the keyword 'Rect2' as data, and similarly for Line1 and Tria3.
Jan 6 '08 #48
bvdet
2,851 Expert Mod 2GB
Try making adjustments to pattnum and pattkey:
# last line in pattnum
# matches integers of length between 1 and 8 digits,
# if not preceded by an alpha character
# matching as many repetitions possible
................(?<![a-zA-Z])\d{1,8}  # positive integer

# matches keywords listed in kargs
# may or may not have a trailing asterisk
# there must be a word boundary both ends
....pattkey = re.compile('|'.join([r'\b(%s)\*?\b' % item for item in kargs]))
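Here is a short sketch of how the two adjustments behave on lines from the latest sample data. The pattnum below is a cut-down stand-in (floats, then the guarded integer) rather than the full pattern, the keyword list is only an example, and keyword_and_numbers is a hypothetical helper written for this illustration.

```python
import re

# Example keyword list; 'Rect' is deliberately listed before 'Rect2'.
keywords = ['Point', 'Othr', 'Rect', 'Rect2', 'PRect2', 'PLine',
            'Line1', 'Tria3', 'Oth1']

# Cut-down stand-in for pattnum with the lookbehind on the integer line.
pattnum = re.compile(r'''
    -?\d+\.\d*|             # float with optional sign
    -?\.\d+|                # float with no leading digit
    (?<![a-zA-Z])\d{1,8}    # integer, not directly preceded by a letter
    ''', re.X)

# The trailing \b means 'Rect' cannot match the start of 'Rect2'.
pattkey = re.compile('|'.join([r'\b(%s)\*?\b' % item for item in keywords]))

def keyword_and_numbers(line):
    """Return (keyword, list of number strings), or None if no keyword matches."""
    m = pattkey.match(line)
    if m is None:
        return None
    return m.group(0), pattnum.findall(line)

print(keyword_and_numbers('Rect2   2        1      2       3       7       6'))
# ('Rect2', ['2', '1', '2', '3', '7', '6'])
print(keyword_and_numbers('Tria3   40000000    400000005000000090000000'))
# ('Tria3', ['40000000', '40000000', '50000000', '90000000'])
```

One thing to watch: if an asterisk is followed directly by a digit or letter (Rect2*4 with no space), m.group(0) includes the asterisk and will not be a key in masterDict. In the sample data the asterisk is always followed by spaces, so this does not arise there.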
Jan 6 '08 #49
psbasha
440 256MB
I tried adding the pattern as suggested, as the last line:

Pat
pattnum = re.compile(r'''
                      -\d+\.\d+E\+\d+|          # engineering notation -+
                      \d+\.\d+E\+\d+|           # engineering notation ++
                      -\d+\.\d+E-\d+|           # engineering notation --
                      \d+\.\d+E-\d+|            # engineering notation +-
                      -\d+\.\d+|                # negative float format
                      \d+\.\d+|                 # positive float format
                      -\d+\.|                   # negative float format
                      \d+\.|                    # positive float format
                      -\.\d+|                   # negative float format
                      \.\d+|                    # positive float format
                      \d{1,8}|
                      \(?<![a-zA-Z])\d{1,8}  # positive integer
                      ''', re.X
                     )
and the line

pattkey = re.compile('|'.join([r'\b(%s)\*?\b' % item for item in kargs]))

I think there is a syntax error in the pattern

\(?<![a-zA-Z])\d{1,8} # positive integer.

I am getting the following error:
Error
  File "C:\\Sample.py", line 12, in ?
    pattnum = re.compile(r'''
  File "C:\Python24\lib\sre.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python24\lib\sre.py", line 227, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis
Jan 6 '08 #50
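A note on the error above, as a sketch rather than a confirmed fix from the thread: in the pattern as pasted, the backslash in front of the opening parenthesis makes it a literal character, so the closing parenthesis has nothing to match. The old \d{1,8}| line was also left in front of the new one, and it would win for every integer even once the pattern compiles. The snippets below use shortened patterns and a made-up test string, and are written for a current Python rather than the 2.4 shown in the traceback.

```python
import re

# '\(' is a literal parenthesis, so the ')' that follows is unbalanced.
try:
    re.compile(r'\(?<![a-zA-Z])\d{1,8}')
    compiled = True
except re.error:
    compiled = False
print(compiled)   # False

# Without the backslash, (?<!...) is a negative lookbehind and compiles.
guarded = re.compile(r'(?<![a-zA-Z])\d{1,8}')
print(guarded.findall('Line1*  1  2'))    # ['1', '2']

# Leaving the unguarded alternative in front defeats the lookbehind.
shadowed = re.compile(r'\d{1,8}|(?<![a-zA-Z])\d{1,8}')
print(shadowed.findall('Line1*  1  2'))   # ['1', '1', '2']
```

So the guarded line should replace the old last line of pattnum instead of being added after it.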