File Parsing

100+
P: 440
Hi,

Below is the file format, which has keywords in the file. I would like to store the data in different variables (Parameters, Points, Lines, Circle).

```
Sample.txt

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_Value/ 1.0

/Point/
10.0 10.0 10.0 $ Comment: Point Data
20.0 20.0 20.0

$$$$$$Line$$$$$$$$

/Line/ $Line Data

10.0 15.0 0.0
20.0 10.0 0.0

$$$$$$Circle$$$$$$$$
/Circle/

10.0 $Radius

0.0 0.0 0.0  $Center
```
Can anybody help me with the best way (optimized in terms of lines of code) to write the code?

Thanks
PSB
Aug 10 '07 #1
21 Replies


100+
P: 440
The above is sample data only. I have to read data for different unique geometry elements in that file format, each with its own unique keyword.

-PSB
Aug 10 '07 #2

bvdet
Expert Mod 2.5K+
P: 2,851
It's probably not the best way, but it seems to work. All dictionary values are lists:
```python
import re

key_patt = re.compile(r'/([A-Za-z_]+)/')
# Escape the dot so it only matches a literal decimal point.
data_patt = re.compile(r'\d+\.\d+')
fn = 'data.txt'

key = None
dd = {}
# Drop blank lines and full-line comments (lines starting with '$').
with open(fn) as f:
    lineList = [line.strip() for line in f
                if line != '\n' and not line.startswith('$')]
for line in lineList:
    # Strip a trailing '$' comment, if any.
    if '$' in line:
        line = line[:line.index('$')]
    m = key_patt.search(line)
    if m:
        # A keyword line starts a new entry; a value may follow on the same line.
        key = m.group(1)
        m1 = data_patt.search(line)
        if m1:
            dd[key] = [float(m1.group(0))]
        else:
            dd[key] = []
    else:
        # A data line is appended to the current keyword as a list of floats.
        if data_patt.search(line):
            dd[key].append([float(n) for n in data_patt.findall(line)])

for key in dd:
    print('%s = %s' % (key, dd[key]))
```
Did you ever resolve the point translation issue (this thread)? You never responded after I posted what I thought was a solution for you. A little feedback would be appreciated. Here's the output:
```
>>>
Line = [[10.0, 15.0, 0.0], [20.0, 10.0, 0.0]]
Parameter_Value = [1.0]
Circle = [[10.0], [0.0, 0.0, 0.0]]
Point = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
>>>
```
Aug 11 '07 #3

100+
P: 440
Thanks BV for the solution.
For the point translation problem, I took a portion of your code snippet and solved it with your approach. If you have a better approach than the previous one, please post it so that I can use it.

-PSB
Aug 11 '07 #4

100+
P: 440
Hi,

I have the file format below. How can I read it in a concise way?
The file looks like this:
```
Sample Data
4 Types
_up,1
_low,2
_left,5
_right,6

2Flags
_low,no
_up,yes

1 Data
x, 10

4 Values
1,0,0
1,1,0
1,1,1
1,1,0

2 Planes Type-1
1,0,0
0,0,0
0,1,0
0,0,0
0,1,0
0,0,1
```
For the planes, there are 2 planes defined, and the data for the 2 planes must be kept separate.

Thanks
PSB
Aug 24 '07 #5
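
The counted-section format in the post above could be read along the following lines. This is only a minimal sketch under stated assumptions: a header is taken to be a line of the form `<count> <name>` (with or without a space), data rows are comma-separated, and a section with more rows than its count (such as the two planes) is assumed to split evenly into `count` groups. The names `HEADER`, `convert` and `parse_counted` are illustrative and do not come from the thread.

```python
import re

# Hypothetical header pattern: a count, optional space, then a name, e.g. "4 Types" or "2Flags".
HEADER = re.compile(r'(\d+)\s*([A-Za-z].*)$')

def convert(token):
    """Return token as int or float when possible, otherwise as a stripped string."""
    token = token.strip()
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token

def parse_counted(text):
    """Map each section name to its rows; split rows into `count` groups when they divide evenly."""
    sections = {}
    counts = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADER.match(line)
        if m:
            name = m.group(2).strip()
            counts[name] = int(m.group(1))
            sections[name] = []
        elif name is not None:
            # Lines before the first header (such as a title) are ignored.
            sections[name].append([convert(t) for t in line.split(',')])
    for name, rows in list(sections.items()):
        n = counts[name]
        if n and len(rows) > n and len(rows) % n == 0:
            size = len(rows) // n
            sections[name] = [rows[i:i + size] for i in range(0, len(rows), size)]
    return sections
```

With the sample file, `parse_counted(open(fn).read())['Planes Type-1']` would hold two separate lists of three rows each, which is one way to keep the two planes apart.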

100+
P: 440
Hi BV,

I have tried the above code on some more field formats, shown below, but the code does not support them. Can you please suggest how to match both numeric and alphanumeric values for the scenarios below?

```
Sample.txt
Sample.txt

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_Value/ 1.0

/Point/
10.0 10.0 10.0 $ Comment: Point Data
20.0 20.0 20.0

$$$$$$Line$$$$$$$$

/Line/ $Line Data

10.0 15.0 0.0
20.0 10.0 0.0

$$$$$$Circle$$$$$$$$
/Circle/

10.0 $Radius

0.0 0.0 0.0  $Center


/DashedLineType/            21 $Dashed Line

/XMin_XMax_YMin_YMax/        1 27 1 37 $ Min and Max value

/LineFlag/        yes $ Flag to update


/XY-Plane/ 'Planes'
1,0,0
0,1,0
0,0,0

/XY-Plane/ 'Planes'
2,0,0
0,2,0
0,0,0


/Format/
$Values
    3     3     1    50    25    28   'Yes'  1
```
Thanks
PSB
Sep 14 '07 #6

bvdet
Expert Mod 2.5K+
P: 2,851
When I write data to a file, I always set up a structured format that is easy to parse. You should try it. This code seems to work:
```python
import re

# thanks ilikepython!
def indexList(s, item, start=0):
    """Return every index at which item occurs in s, searching from start."""
    return [i + start for (i, obj) in enumerate(s[start:]) if obj == item]

def convertType(s):
    """Convert s to int or float if possible, else try eval, else return s.

    Caution: eval() executes arbitrary expressions, so only use this on
    trusted input files.
    """
    for func in (int, float, eval):
        try:
            return func(s)
        except Exception:
            pass
    return s

key_patt = re.compile(r'/([A-Za-z_-]+)/')
# Tries a decimal number first, then an integer, then a bare word.
data_patt = re.compile(r'\d+\.\d+|\d+|\w+')
fn = 'parameter.txt'

key = None
dd = {}
with open(fn) as f:
    lineList = [line.strip() for line in f
                if line != '\n' and not line.startswith('$')]
for line in lineList:
    # Strip a trailing '$' comment, if any.
    if '$' in line:
        line = line[:line.index('$')]
    m = key_patt.search(line)
    if m:
        key = m.group(1)
        # Everything after the closing '/' of the keyword.
        line1 = line[indexList(line, '/')[1]+1:]
        values = [convertType(item) for item in data_patt.findall(line1)]
        if values:
            # A repeated keyword extends the existing list.
            dd[key] = dd.get(key, []) + values
        else:
            dd[key] = []
    else:
        if data_patt.search(line):
            dd[key].append([convertType(n) for n in data_patt.findall(line)])

for key in dd:
    print('%s = %s' % (key, dd[key]))
```
```
>>>
DashedLineType = [21]
Parameter_Value = [1.0]
Point = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
XY-Plane = ['Planes', [1, 0, 0], [0, 1, 0], [0, 0, 0], 'Planes', [2, 0, 0], [0, 2, 0], [0, 0, 0]]
Format = [[3, 3, 1, 50, 25, 28, 'Yes', 1]]
XMin_XMax_YMin_YMax = [1, 27, 1, 37]
LineFlag = ['yes']
Line = [[10.0, 15.0, 0.0], [20.0, 10.0, 0.0]]
Circle = [[10.0], [0.0, 0.0, 0.0]]
>>>
```
Sep 14 '07 #7
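
The `convertType` helper in the post above falls back to `eval`, which will execute any expression that happens to appear in the data file. A safer variant might look like the sketch below, assuming that the standard library's `ast.literal_eval` is an acceptable last resort for quoted strings and other literals. The name `convert_type` is illustrative and not from the thread.

```python
import ast

def convert_type(token):
    """Return token as int, float, or a Python literal; fall back to the raw string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    try:
        # literal_eval only accepts literals (numbers, strings, tuples, ...),
        # so names and function calls are rejected instead of being executed.
        return ast.literal_eval(token)
    except (ValueError, SyntaxError):
        return token
```

A quoted token such as `'Yes'` comes back as the string `Yes`, while an unquoted word such as `yes` is returned unchanged.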

100+
P: 440
```
SampleTest

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_range/ 1 1

/Flag1/ 1
/Flag2/ 1
/DummyFlag1/ 1

/STOP/ Line and Circle

$$$$

/LineThick/ 0.1 $$$Line Thickness

$$$$

/Top1/ 10 $$Value1
/Top2/ 11 $$Value2

 $$$
/Bot1/  20 $$Comment
/Bot2/ 30 $$Comment
/Bot4/ 40 $$Comment

$$
/TOl1/ -0.05
/TOl2/ 0.01

$$$$$$Line IDs$$$$$$$$

/NOT/  10 11 12 1
/NOT/  10 11 12 2
/Ok/   11 12 1  3

/MAT/ $$
1 $Begin
100.    40.    30.    2.0    0 ****22 ksdas
2
200.    40.    60.    2.0    0 ****22 ksdas
3
600.    40.    30.    5.0    0 ****22 ksdas
4
500.    40.    70.    2.0    0 ****22 ksdas
0 $End
2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
3000.  .1
5000.  .2
6000.  .3
7000.  .6
   0.  .0 $End
0 $End

2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
13000.  .1
45000.  .2
56000.  .3
87000.  .6
    0.  .0 $End
0 $End
2 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
3 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
0 $End
/4*ALL/ $ ***
 11       1       1       1     69716.   1000
 11       1       1       5     76296.   1000
 31       1       1       6     74926.   1000
 31       1       1       7     74653.   1000
```
I am using the above sample code to read and store the data, but I am getting the error shown below. How can I customize the above code to read this sample file?

PythonWin 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32.
File "C:\Sample-Mat.py", line 42, in ?
dd[key].append([convertType(n) for n in data_patt.findall(line)])

Thanks
PSB
Jan 5 '08 #8

bvdet
Expert Mod 2.5K+
P: 2,851
You will have to explain how you need the data tabulated. I have no idea what most of the data is.
Jan 5 '08 #9

100+
P: 440
Description
Hi BV,

The keywords are enclosed in '/ /'. The corresponding data appears beside the keyword or, in some cases, on the lines below it.

The data should be stored as shown below, using dicts and lists built with regular expressions.

```
Parameter_range = [1,1]

Flag1 = 1
.....

STOP = 'Line and Circle'

Top1 = 10
Top2 = 11
...

Bot1 = 20
Bot2 = 30
....

Tol1 = -0.05
Tol2 = 0.01

NOT = [[ 10,11,12,1],[10,11,12,2]]
OK = [[ 11,12,1,3]]
MAT = { 1:[100.,40.,30.,20.,0],2:[200.,40.,60.,2.0,0],3:[600,40.,30.,5.0,0],4:[500.,40.,70.,2.0,0]}

# The integer 2 is the start of the sub-block and '0. .0' is its end
MATc = {2:[[1000., 0.1],[2000. ,.2],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.1],[5000.,0.2],[6000.,.4],[7000.,.6]]}
# '0. .0' is the end of the sub-block
# 0 is the end of the block

# Similarly for the other block
# The integer 2 is the start of the sub-block and '0. .0' is its end
MATT = {2:[[1000., 0.1],[2000. ,.5],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.9],[5000.,0.2],[6000.,.4],[7000.,.6]]}
# '0. .0' is the end of the sub-block
# 0 is the end of the block

# Similarly for the other block
# The integer 2 is the start of the sub-block and '0. .0' is its end

Factor = {2:[[0.00,2.0],[.210,2.0],[0.235,3.0]],3:[[0.00,2.0],[.2110,2.0],[0.2135,3.0]]}
# '0. .0' is the end of the sub-block
# 0 is the end of the block



ALL = [[ 11,1,1,1,69716.,1000],[ 11,1,1,5,76296.,1000],[ 31,1,1,6,74926.,1000],[ 31, 1,1,7,74653.,1000]]
```
Jan 5 '08 #10
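
The sub-block layout described above (an integer line opens a table, rows follow, a `0.  .0` row closes the table, and a lone `0` closes the whole block) could be read roughly as follows. This is a sketch under those assumptions only; `parse_tables` is a hypothetical helper, and it assumes that `$` and `*` both start a comment and that a genuine data row is never exactly `0. .0`.

```python
def parse_tables(lines):
    """Collect {table_id: [[x, y], ...]} from one block of integer-headed tables."""
    tables = {}
    current = None  # id of the table being filled, or None between tables
    for raw in lines:
        # Drop '$' and '*' comments, then surrounding whitespace.
        line = raw.split('$', 1)[0].split('*', 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if current is None:
            if fields == ['0']:
                break  # a lone 0 ends the whole block
            current = int(fields[0])
            tables[current] = []
        else:
            row = [float(f) for f in fields]
            if row == [0.0, 0.0]:
                current = None  # the '0.  .0' row ends this table
            else:
                tables[current].append(row)
    return tables
```

Calling it once per block, starting at the line after the previous block's terminating `0`, would give one dictionary per block, which could then be stored under names such as MATc or MATT.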

bvdet
Expert Mod 2.5K+
P: 2,851
Where does MATT, MATc, and Factor come from?
Jan 6 '08 #11

bvdet
Expert Mod 2.5K+
P: 2,851
BTW, your script fails on your data because all comment lines must begin with '$'. It fails on the first line of data.
Jan 6 '08 #12

100+
P: 440
Sorry, I was explaining how the data could be stored in dictionary variables overall. It is only an example of how to store the data.

MATT, MATc and Factor are variables.

Thanks
PSB
Jan 6 '08 #13

100+
P: 440
Sorry, all the comments start with the '$' sign.
Jan 6 '08 #14

bvdet
Expert Mod 2.5K+
P: 2,851
I made a few minor changes to the code in the earlier solution. Following is the entire source code and output from your data file (with the first line commented out):
```python
import re

def indexList(s, item, i=0):
    """Return every index at which item occurs in s, searching from i."""
    i_list = []
    while True:
        try:
            i = s.index(item, i)
            i_list.append(i)
            i += 1
        except ValueError:
            break
    return i_list

def convertType(s):
    """Convert s to int or float if possible, else try eval, else return s.

    Caution: eval() executes arbitrary expressions; use on trusted files only.
    """
    for func in (int, float, eval):
        try:
            return func(s)
        except Exception:
            pass
    return s

key_patt = re.compile(r'/([A-Za-z_\-0-9]+)/')
# Note: this pattern ignores signs and bare decimal points, so '-0.05'
# is read as 0.05 and '1000.  .1' as 1000 and 1 (see the output below).
data_patt = re.compile(r'\d+\.\d+|\d+|\w+')

# function to strip comments
def strip_comments(s):
    if '$' in s:
        return s[:s.index('$')]
    elif '*' in s:
        return s[:s.index('*')]
    return s

def parse_data(fn):
    key = None
    dd = {}
    with open(fn) as f:
        lineList = [strip_comments(line.strip()) for line in f
                    if line != '\n' and not line.startswith('$')]
    for line in lineList:
        m = key_patt.search(line)
        if m:
            key = m.group(1)
            # Everything after the closing '/' of the keyword.
            line1 = line[indexList(line, '/')[1]+1:]
            if data_patt.search(line1):
                values = [convertType(item) for item in data_patt.findall(line1)]
                if key in dd:
                    dd[key] = dd[key] + values
                else:
                    dd[key] = values
            else:
                dd[key] = []
        else:
            if data_patt.search(line):
                dd[key].append([convertType(n) for n in data_patt.findall(line)])
    return dd

if __name__ == '__main__':
    #fn = r'H:\TEMP\temsys\parameter.txt'
    fn = r'H:\TEMP\temsys\sample_data1.txt'
    dataDict = parse_data(fn)
    for key in dataDict:
        print('%s = %s' % (key, dataDict[key]))
```
```
>>>
Ok = [11, 12, 1, 3]
MAT = [[1], [100, 40, 30, 2.0, 0], [2], [200, 40, 60, 2.0, 0], [3], [600, 40, 30, 5.0, 0], [4], [500, 40, 70, 2.0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [3000, 1], [5000, 2], [6000, 3], [7000, 6], [0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [13000, 1], [45000, 2], [56000, 3], [87000, 6], [0, 0], [0], [2], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [3], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [0], [4, 'ALL'], [11, 1, 1, 1, 69716, 1000], [11, 1, 1, 5, 76296, 1000], [31, 1, 1, 6, 74926, 1000], [31, 1, 1, 7, 74653, 1000]]
LineThick = [0.10000000000000001]
TOl2 = [0.01]
TOl1 = [0.050000000000000003]
STOP = ['Line', 'and', 'Circle']
Top2 = [11]
Top1 = [10]
Bot4 = [40]
Bot1 = [20]
NOT = [10, 11, 12, 1, 10, 11, 12, 2]
Flag2 = [1]
Flag1 = [1]
Parameter_range = [1, 1]
Bot2 = [30]
DummyFlag1 = [1]
>>>
```
I understand that this is not your final solution. Maybe you can come up with a way to parse the 'MAT' data.
Jan 6 '08 #15
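
The output above shows two side effects of the `data_patt` pattern: the sign of `-0.05` is lost, and values written as `1000.` or `.1` are read as the integers 1000 and 1. One possible replacement for purely numeric data lines is sketched below. `NUMBER` and `numbers` are hypothetical names, and the sketch assumes that any token containing a decimal point should become a float.

```python
import re

# Optional sign, then '12.5' / '12.' style, '.5' style, or a plain integer.
NUMBER = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)')

def numbers(line):
    """Return the numbers on a line: float if written with a '.', int otherwise."""
    return [float(tok) if '.' in tok else int(tok)
            for tok in NUMBER.findall(line)]
```

For example, `numbers('1000.  .1')` gives `[1000.0, 0.1]` and `numbers('-0.05')` gives `[-0.05]`. Because the pattern also picks digits out of words, it should only be applied to the part of a line that follows the keyword.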

100+
P: 440
```
SampleInputData

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_range/ 1 1

/Flag1/ 1
/Flag2/ 1
/DummyFlag1/ 1

/STOP/ Line and Circle

$$$$

/LineThick/ 0.1 $$$Line Thickness

$$$$

/Top1/ 10 $$Value1
/Top2/ 11 $$Value2

 $$$
/Bot1/  20 $$Comment
/Bot2/ 30 $$Comment
/Bot4/ 40 $$Comment

$$
/TOl1/ -0.05
/TOl2/ 0.01

$$$$$$Line IDs$$$$$$$$

/NOT/  10 11 12 1
/NOT/  10 11 12 2
/Ok/   11 12 1  3

/MAT/ $$
1 $Begin
100.    40. 30.  2.0   0 ****22 ksdas
2
200.    40. 60.  2.0   0 ****22 ksdas
3
600.    40. 30.  5.0   0 ****22 ksdas
4
500.    40. 70.  2.0   0 ****22 ksdas
0 $End
2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
3000.  .1
5000.  .2
6000.  .3
7000.  .6
   0.  .0 $End
0 $End

2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
13000.  .1
45000.  .2
56000.  .3
87000.  .6
    0.  .0 $End
0 $End
2 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
3 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
0 $End
/4*ALL/ $ ***
 11       1       1       1     69716.   1000
 11       1       1       5     76296.   1000
 31       1       1       6     74926.   1000
 31       1       1       7     74653.   1000
```
Jan 6 '08 #16

100+
P: 440
Description
```
>>>
------------------------------------------------------------------
Ok = [11, 12, 1, 3]  # This should be [[11, 12, 1, 3]]. In this we have one or more

MAT = [[1], [100, 40, 30, 2.0, 0], [2], [200, 40, 60, 2.0, 0], [3], [600, 40, 30, 5.0, 0], [4], [500, 40, 70, 2.0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [3000, 1], [5000, 2], [6000, 3], [7000, 6], [0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [13000, 1], [45000, 2], [56000, 3], [87000, 6], [0, 0], [0], [2], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [3], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [0], [4, 'ALL'], [11, 1, 1, 1, 69716, 1000], [11, 1, 1, 5, 76296, 1000], [31, 1, 1, 6, 74926, 1000], [31, 1, 1, 7, 74653, 1000]]

# MAT should have only these values
MAT = { 1:[100.,40.,30.,20.,0],2:[200.,40.,60.,2.0,0],3:[600,40.,30.,5.0,0],4:[500.,40.,70.,2.0,0]}
-------------------------------------------------------------------------------
# Other block data should be
{2:[[1000., 0.1],[2000. ,.2],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.1],[5000.,0.2],[6000.,.4],[7000.,.6]]}

{2:[[1000., 0.1],[2000. ,.5],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.9],[5000.,0.2],[6000.,.4],[7000.,.6]]}

{2:[[0.00,2.0],[.210,2.0],[0.235,3.0]],3:[[0.00,2.0],[.2110,2.0],[0.2135,3.0]]}

ALL = [[ 11,1,1,1,69716.,1000],[ 11,1,1,5,76296.,1000],[ 31,1,1,6,74926.,1000],[ 31, 1,1,7,74653.,1000]]

-------------------------------------------------------------------------------
LineThick = [0.10000000000000001]
TOl2 = [0.01]
TOl1 = [0.050000000000000003]
----------------------------------------------------------
STOP = ['Line', 'and', 'Circle']
# It should be STOP = 'Line and Circle'
----------------------------------------------------------
Top2 = [11]
Top1 = [10]
Bot4 = [40]
Bot1 = [20]
----------------------------------------------------------
NOT = [10, 11, 12, 1, 10, 11, 12, 2]

# It should be stored as
NOT = [[ 10,11,12,1],[10,11,12,2]]

----------------------------------------------------------
Flag2 = [1]
Flag1 = [1]
Parameter_range = [1, 1]
Bot2 = [30]
DummyFlag1 = [1]
```
The above describes how some of the variables should be stored. Where a keyword has only a single value, it need not be stored as a list.

Please help me fix the code to produce the data as described above.
Jan 6 '08 #17

bvdet
Expert Mod 2.5K+
P: 2,851

What you are saying is that the parsing must be customized for certain keywords. The code as is will parse all the data in a consistent manner. You must make an effort to solve this problem yourself. Post back your solution and we can try to help you from there.
Jan 6 '08 #18
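
One way to customize the parsing per keyword, as suggested above, is to declare up front how each keyword's values should be shaped. The sketch below covers only keywords whose data sits on the same line. `ROW_KEYS`, `TEXT_KEYS`, `convert` and `parse_key_lines` are hypothetical names, and the choice of which keywords go in which set is an assumption based on the desired output posted earlier in the thread.

```python
import re

KEY_LINE = re.compile(r'/([A-Za-z_0-9-]+)/\s*(.*)')

ROW_KEYS = {'NOT', 'Ok'}   # assumed: each occurrence becomes one row in a list of lists
TEXT_KEYS = {'STOP'}       # assumed: the rest of the line is kept as a single string

def convert(token):
    """Return token as int or float when possible, otherwise unchanged."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token

def parse_key_lines(lines):
    """Parse '/Key/ values' lines, shaping each key's value by the sets above."""
    result = {}
    for raw in lines:
        line = raw.split('$', 1)[0].strip()  # drop '$' comments
        m = KEY_LINE.match(line)
        if not m:
            continue
        key, rest = m.group(1), m.group(2).strip()
        if key in TEXT_KEYS:
            result[key] = rest
        elif key in ROW_KEYS:
            result.setdefault(key, []).append([convert(t) for t in rest.split()])
        else:
            values = [convert(t) for t in rest.split()]
            # A single value is stored as a scalar rather than a one-item list.
            result[key] = values[0] if len(values) == 1 else values
    return result
```

Keeping the per-keyword rules in two small sets means a new keyword can be given a different shape without touching the loop.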

100+
P: 440
Expand|Select|Wrap|Line Numbers
  1. SampleCode
  2. key_patt = re.compile(r'/([A-Za-z_\-0-9]+)/')
  3. data_patt = re.compile(r'\d+\.\d+|\d+|-\.\d+|\w+') 
  4. def parse_data(fn):
  5.     key = None
  6.     bMFlag = False
  7.     iCount = 0
  8.     dataList = []
  9.     dd = {}
  10.  
  11.     matDataDict = {}
  12.     matCDict = {}
  13.     matTDict = {}
  14.     comFactDict = {}
  15.     bMCFlag = False
  16.     bMTFlag = False
  17.     bMatDataBlockFlag = False
  18.     bDataFlag = False
  19.     otherFList =[]
    bmatStartFlag = True
    bmatEndFlag = False
    dataListList = []

    # Strip whitespace and trailing comments; skip blank lines and '$' banner lines.
    lineList = [strip_comments(line.strip()) for line in open(fn).readlines()
                if line != '\n' and not line.startswith('$')]

    for line in lineList:
        m = key_patt.search(line)
        if m:
            key = m.group(1)
            # Keep only the text after the second '/', which closes the keyword.
            line1 = line[indexList(line, '/')[1]+1:]
            if key == 'NOT':
                # 'NOT' can occur several times; keep one list per occurrence.
                dataList = [convertType(item) for item in
                            data_patt.findall(line1)]
                dataListList.append(dataList)
            elif data_patt.search(line1):
                # Keyword with inline values: append to any values seen earlier.
                dd[key] = dd.get(key, []) + [convertType(item) for item in
                                             data_patt.findall(line1)]
            else:
                # Keyword without inline values: a material data block follows.
                dd[key] = []
                bMFlag = True
                bMatDataBlockFlag = True
        elif 'ALL' in line:
            bDataFlag = True
            bMatDataBlockFlag = False
        elif bDataFlag:
            if line != '\n':
                otherFList.append(line.split())
            else:
                bDataFlag = False
        elif bMatDataBlockFlag:
            if line.startswith('0') and not line.startswith('0.  .0'):
                # A row starting with '0' (other than the '0.  .0' terminator)
                # switches to the next kind of table.
                bMFlag = False
                if bMCFlag:
                    bMCFlag = False
                    bMTFlag = True
                elif bMTFlag:
                    bMTFlag = False
                    iCount = 0
                else:
                    bMCFlag = True
                    iCount = 0
            elif bMFlag:
                # Material records take two rows: the ID row, then the data row.
                if data_patt.search(line):
                    if bmatStartFlag:
                        dataList = []
                        list1 = [convertType(n) for n in data_patt.findall(line)]
                        matID = list1[0]
                        bmatStartFlag = False
                    else:
                        dataList = [convertType(n) for n in data_patt.findall(line)]
                        matDataDict[matID] = dataList
                        dataList = []
                        bmatStartFlag = True
            elif bMCFlag:
                if iCount == 0:
                    # The first row of a table carries the material ID.
                    dataList = []
                    line1 = line.split()
                    matID = int(line1[0])
                    iCount = iCount + 1
                elif not line.startswith('0.  .0'):
                    line1 = line.split()
                    dataList.append([float(line1[0]), float(line1[1])])
                else:
                    # '0.  .0' closes the table.
                    matCDict[matID] = dataList
                    iCount = 0
            elif bMTFlag:
                if iCount == 0:
                    dataList = []
                    line1 = line.split()
                    matID = int(line1[0])
                    iCount = iCount + 1
                elif not line.startswith('0.  .0'):
                    line1 = line.split()
                    dataList.append([float(line1[0]), float(line1[1])])
                else:
                    matTDict[matID] = dataList
                    iCount = 0
            else:
                if iCount == 0:
                    dataList = []
                    line1 = line.split()
                    matID = int(line1[0])
                    iCount = iCount + 1
                elif not line.startswith('0.  .0'):
                    line1 = line.split()
                    # Columns are stored in reverse order for this table.
                    dataList.append([float(line1[1]), float(line1[0])])
                else:
                    comFactDict[matID] = dataList
                    iCount = 0

    dd['NOT'] = dataListList
    print 'matDataDict', matDataDict
    print 'matCDict', matCDict
    print 'matTDict', matTDict
    print 'comFactDict', comFactDict
    print ',otherFList', otherFList
    return dd

Please find my solution above. Let me know whether this can be done in a cleaner, more concise way.
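One possible direction, offered only as a sketch: the three table branches in the posted parser (the ones filling matCDict, matTDict and comFactDict) differ only in the target dictionary and the column order, so they could share one helper. The name read_tables, the swap parameter and the sample rows below are hypothetical and not part of the posted code; the example is written for Python 3 and assumes the rows have already been stripped and isolated per table type.

```python
TERMINATOR = '0.  .0'  # end-of-table marker, as used in the posted parser


def read_tables(lines, swap=False):
    """Collect {table_id: [[a, b], ...]} from consecutive table blocks.

    Each block is an ID row (only the leading integer is used), then rows
    of two numbers, then a row starting with TERMINATOR.
    """
    tables = {}
    table_id = None
    rows = []
    for line in lines:
        fields = line.split()
        if table_id is None:
            # First row of a block: remember the ID and start a new table.
            table_id = int(fields[0])
            rows = []
        elif line.startswith(TERMINATOR):
            # Terminator row: store the finished table.
            tables[table_id] = rows
            table_id = None
        else:
            a, b = float(fields[0]), float(fields[1])
            rows.append([b, a] if swap else [a, b])
    return tables


# Hypothetical rows, made up for illustration only.
sample = [
    '2 1',
    '1000. 0.1',
    '2000. 0.2',
    '0.  .0',
    '3 1',
    '3000. 0.1',
    '0.  .0',
]
tables = read_tables(sample)
# tables == {2: [[1000.0, 0.1], [2000.0, 0.2]], 3: [[3000.0, 0.1]]}
```

With a helper like this, the main loop would only need to decide which rows belong to which table type, and swap=True would cover the reversed column order used for comFactDict.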

Thanks
PSB
Jan 11 '08 #19

100+
P: 440
Output
matDataDict {1: [100, 40, 30, 2.0, 0], 2: [200, 40, 60, 2.0, 0], 3: [600, 40, 30, 5.0, 0], 4: [500, 40, 70, 2.0, 0]}
matCDict {2: [[1000.0, 0.10000000000000001], [2000.0, 0.20000000000000001], [3000.0, 0.29999999999999999], [4000.0, 0.59999999999999998]], 3: [[3000.0, 0.10000000000000001], [5000.0, 0.20000000000000001], [6000.0, 0.40000000000000002], [7000.0, 0.59999999999999998]]}
matTDict {2: [[1000.0, 0.10000000000000001], [2000.0, 0.5], [3000.0, 0.29999999999999999], [4000.0, 0.59999999999999998]], 3: [[13000.0, 0.90000000000000002], [45000.0, 0.20000000000000001], [56000.0, 0.29999999999999999], [87000.0, 0.59999999999999998]]}
comFactDict {2: [[0.0, 2.0], [0.20999999999999999, 2.0], [0.23499999999999999, 3.0]], 3: [[0.0, 2.0], [0.21099999999999999, 2.0], [0.23150000000000001, 3.0]]}
,otherFList [['11', '1', '1', '1', '69716.', '1000'], ['11', '1', '1', '5', '76296.', '1000'], ['31', '1', '1', '6', '74926.', '1000'], ['31', '1', '1', '7', '74653.', '1000']]
Ok = [11, 12, 1, 3]
MAT = []
LineThick = [0.10000000000000001]
TOl2 = [0.01]
TOl1 = [0.050000000000000003]
STOP = ['Line', 'and', 'Circle']
Top2 = [11]
Top1 = [10]
Bot4 = [40]
Bot1 = [20]
NOT = [[10, 11, 12, 1], [10, 11, 12, 2]]
Flag2 = [1]
Flag1 = [1]
Parameter_range = [1, 1]
Bot2 = [30]
DummyFlag1 = [1]
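A note on the long decimals in this output, such as 0.10000000000000001: they are presumably not parsing errors. Python versions of that time displayed floats inside lists and dictionaries with 17 significant digits, which exposes the binary rounding of values like 0.1. The sketch below assumes Python 3 and uses a made-up value; it reproduces the long form and shows a shorter display format.

```python
value = 0.1

# 17 significant digits, the form older interpreters used inside containers
long_form = '%.17g' % value    # '0.10000000000000001'

# Shorter form for display; the stored float is unchanged
short_form = '%g' % value      # '0.1'

# The long form still converts back to exactly the same float
assert float(long_form) == value
```

The parsed values are therefore the same numbers either way; only the printed representation differs.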
Jan 11 '08 #20

100+
P: 440
BV,

I would appreciate your suggestions on this.

Thanks
PSB
Jan 11 '08 #21

bvdet
Expert Mod 2.5K+
P: 2,851
I don't have time to do it right now, as work deadlines are approaching. I will try to look at it later.
Jan 12 '08 #22
