File Parsing

Hi,

Below is the file format, which contains keywords. I would like to store the data in separate variables (Parameters, Points, Lines, Circle).

Sample.txt

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_Value/ 1.0

/Point/
10.0 10.0 10.0 $ Comment: Point Data
20.0 20.0 20.0

$$$$$$Line$$$$$$$$

/Line/ $Line Data

10.0 15.0 0.0
20.0 10.0 0.0

$$$$$$Circle$$$$$$$$
/Circle/

10.0 $Radius

0.0 0.0 0.0  $Center

Can anybody help me with the best way of writing the code (optimized in terms of lines of code)?

Thanks
PSB
Aug 10 '07 #1
psbasha
The above is sample data only. I have to read the data for different unique geometry elements in that file format, each identified by its own unique keyword.

-PSB
Aug 10 '07 #2
bvdet
It's probably not the best way, but it seems to work. All dictionary values are lists:
import re

key_patt = re.compile(r'/([A-Za-z_]+)/')
data_patt = re.compile(r'\d+\.\d+')  # dot escaped so it matches a literal '.'
fn = 'data.txt'

key = None
dd = {}
# skip blank lines and full-line comments (lines starting with '$')
lineList = [line.strip() for line in open(fn).readlines()
            if line != '\n' and not line.startswith('$')]
for line in lineList:
    # drop a trailing '$' comment, if any
    if '$' in line:
        line = line[:line.index('$')]
    m = key_patt.search(line)
    if m:
        # keyword line: start a new entry, keeping a value found on the same line
        key = m.group(1)
        m1 = data_patt.search(line)
        if m1:
            dd[key] = [float(m1.group(0))]
        else:
            dd[key] = []
    elif key is not None and data_patt.search(line):
        # data line: append its numbers as one row under the current keyword
        dd[key].append([float(n) for n in data_patt.findall(line)])

for key in dd:
    print('%s = %s' % (key, dd[key]))

Did you ever resolve the point translation issue (this thread)? You never responded after I posted what I thought was a solution for you. A little feedback would be appreciated. Here's the output:
>>> Line = [[10.0, 15.0, 0.0], [20.0, 10.0, 0.0]]
Parameter_Value = [1.0]
Circle = [[10.0], [0.0, 0.0, 0.0]]
Point = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
>>>
Aug 11 '07 #3
psbasha
Thanks, BV, for the solution.
For the point translation problem, I took a portion of your code snippet and solved it with your approach. If you have a better approach than the previous one, please post it so that I can use it.

-PSB
Aug 11 '07 #4
psbasha
Hi,

I have the file format below. How can I read it in a concise way?
The file looks like this:
Sample Data
4 Types
_up,1
_low,2
_left,5
_right,6

2Flags
_low,no
_up,yes

1 Data
x, 10

4 Values
1,0,0
1,1,0
1,1,1
1,1,0

2 Planes Type-1
1,0,0
0,0,0
0,1,0
0,0,0
0,1,0
0,0,1

In the case of planes, two planes are defined, and the data for the two planes must be kept separate.
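One possible way to read this layout, offered only as a rough sketch: it assumes every section starts with a header of the form "count name" (the space is optional, as in "2Flags"), that rows are comma separated, and that when a section holds more rows than its count, the rows divide evenly into that many groups (three rows per plane here). The names `header_patt` and `parse_sections` are made up for illustration, and values are left as strings.

```python
import re

# Header lines look like "4 Types" or "2Flags": a count followed by a name.
header_patt = re.compile(r'^(\d+)\s*([A-Za-z].*)$')

def parse_sections(lines):
    """Group comma-separated rows under their "count name" header."""
    sections = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = header_patt.match(line)
        if m:
            name = m.group(2).strip()
            sections[name] = {'count': int(m.group(1)), 'rows': []}
        elif name is not None:
            # rows seen before the first header (e.g. a title line) are ignored
            sections[name]['rows'].append([f.strip() for f in line.split(',')])

    result = {}
    for sec_name, sec in sections.items():
        rows, n = sec['rows'], sec['count']
        if n and len(rows) > n and len(rows) % n == 0:
            # more rows than the count: split them evenly into n groups
            size = len(rows) // n
            result[sec_name] = [rows[i:i + size] for i in range(0, len(rows), size)]
        else:
            result[sec_name] = rows
    return result
```

With this grouping, the "Planes Type-1" entry becomes a list of two planes, each holding three rows, while sections such as "Types" stay a flat list of rows.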

Thanks
PSB
Aug 24 '07 #5
psbasha
Hi BV,

I tried the code above on some more field formats, shown below, but it does not support them. Can you please suggest how to group digits and alphanumeric values for the scenarios below?

Sample.txt
Sample.txt

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_Value/ 1.0

/Point/
10.0 10.0 10.0 $ Comment: Point Data
20.0 20.0 20.0

$$$$$$Line$$$$$$$$

/Line/ $Line Data

10.0 15.0 0.0
20.0 10.0 0.0

$$$$$$Circle$$$$$$$$
/Circle/

10.0 $Radius

0.0 0.0 0.0  $Center


/DashedLineType/            21 $Dashed Line

/XMin_XMax_YMin_YMax/        1 27 1 37 $ Min and Max value

/LineFlag/        yes $ Flag to update


/XY-Plane/ 'Planes'
1,0,0
0,1,0
0,0,0

/XY-Plane/ 'Planes'
2,0,0
0,2,0
0,0,0


/Format/
$Values
    3     3     1    50    25    28   'Yes'  1

Thanks
PSB
Sep 14 '07 #6
bvdet
When I write data to a file, I always set up a structured format that is easy to parse. You should try it. This code seems to work:
  1. import re
  2.  
  3. # thanks ilikepython!
  4. def indexList(s, item, start = 0):
  5.     return [i + start for (i, obj) in enumerate(s[start:]) if obj == item]
  6.  
  7. def convertType(s):
  8.     for func in (int, float):  # eval dropped: unsafe on file input; words stay strings anyway
  9.         try:
  10.             n = func(s)
  11.             return n
  12.         except ValueError:
  13.             pass
  14.     return s
  15.  
  16. key_patt = re.compile(r'/([A-Za-z_-]+)/')
  17. data_patt = re.compile(r'\d+\.\d+|\d+|\w+')
  18. fn = 'parameter.txt'
  19.  
  20. key = None
  21. dd = {}
  22. lineList = [line.strip() for line in open(fn).readlines() if line != '\n' and not line.startswith('$')]
  23. for line in lineList:
  24.     try:
  25.         line = line[:line.index('$')]
  26.     except ValueError:
  27.         pass
  28.     m = key_patt.search(line)
  29.     if m:
  30.         key = m.group(1)
  31.         line1 = line[indexList(line, '/')[1]+1:]
  32.         if data_patt.search(line1):
  33.             if key in dd:
  34.                 dd[key] = dd[key]+[convertType(item) for item in data_patt.findall(line1)]
  35.             else:
  36.                 dd[key] = [convertType(item) for item in data_patt.findall(line1)]
  37.         else:
  38.             dd[key] = []
  39.     else:
  40.         m1 = data_patt.search(line)
  41.         if m1:
  42.             dd[key].append([convertType(n) for n in data_patt.findall(line)])
  43.  
  44. for key in dd:
  45.     print('%s = %s' % (key, dd[key]))
>>> DashedLineType = [21]
Parameter_Value = [1.0]
Point = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
XY-Plane = ['Planes', [1, 0, 0], [0, 1, 0], [0, 0, 0], 'Planes', [2, 0, 0], [0, 2, 0], [0, 0, 0]]
Format = [[3, 3, 1, 50, 25, 28, 'Yes', 1]]
XMin_XMax_YMin_YMax = [1, 27, 1, 37]
LineFlag = ['yes']
Line = [[10.0, 15.0, 0.0], [20.0, 10.0, 0.0]]
Circle = [[10.0], [0.0, 0.0, 0.0]]
>>>
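On the remark about writing data in a structured format that is easy to parse: one option, if you control how the file is written, is JSON. The following is only a minimal sketch under that assumption; the dictionary contents are illustrative values taken from the output above, and `json.dumps`/`json.loads` are used on a string here (for a real file, `json.dump` and `json.load` do the same with a file object).

```python
import json

# Illustrative data shaped like the parsed dictionary above.
data = {
    "Parameter_Value": [1.0],
    "Point": [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]],
    "LineFlag": ["yes"],
    "XMin_XMax_YMin_YMax": [1, 27, 1, 37],
}

# Text you would write to the file.
text = json.dumps(data, indent=2, sort_keys=True)

# What you get back when reading it: no regular expressions needed,
# and ints, floats and strings keep their types.
restored = json.loads(text)
```

The trade-off is that the file is no longer in the '/Keyword/' layout, so this only helps when the producer of the file can be changed.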
Sep 14 '07 #7
psbasha
SampleTest

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_range/ 1 1

/Flag1/ 1
/Flag2/ 1
/DummyFlag1/ 1

/STOP/ Line and Circle

$$$$

/LineThick/ 0.1 $$$Line Thickness

$$$$

/Top1/ 10 $$Value1
/Top2/ 11 $$Value2

 $$$
/Bot1/  20 $$Comment
/Bot2/ 30 $$Comment
/Bot4/ 40 $$Comment

$$
/TOl1/ -0.05
/TOl2/ 0.01

$$$$$$Line IDs$$$$$$$$

/NOT/  10 11 12 1
/NOT/  10 11 12 2
/Ok/   11 12 1  3

/MAT/ $$
1 $Begin
100.    40.    30.    2.0    0 ****22 ksdas
2
200.    40.    60.    2.0    0 ****22 ksdas
3
600.    40.    30.    5.0    0 ****22 ksdas
4
500.    40.    70.    2.0    0 ****22 ksdas
0 $End
2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
3000.  .1
5000.  .2
6000.  .3
7000.  .6
   0.  .0 $End
0 $End

2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
13000.  .1
45000.  .2
56000.  .3
87000.  .6
    0.  .0 $End
0 $End
2 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
3 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
0 $End
/4*ALL/ $ ***
 11       1       1       1     69716.   1000
 11       1       1       5     76296.   1000
 31       1       1       6     74926.   1000
 31       1       1       7     74653.   1000

I am using the sample code above to read and store the data, but I am getting the error shown below. How can I customize the code to read the sample file above?

PythonWin 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32.
File "C:\Sample-Mat.py", line 42, in ?
dd[key].append([convertType(n) for n in data_patt.findall(line)])

Thanks
PSB
Jan 5 '08 #8
bvdet
You will have to explain how you need the data tabulated. I have no idea what most of the data is.
Jan 5 '08 #9
psbasha
Description
Hi BV,

We have the keywords in '/ /'. The respective data is available beside or, in some cases, below the keywords.

The data should be stored as shown below, using dicts and lists with regular expressions.

parameter_range = [1,1]

Flag1 = 1
.....

STOP = 'Line and Circle'

Top1 = 10
Top2 = 11
...

Bot1 = 20
Bot2 = 30
....

Tol1 = -0.05
Tol2 = 0.01

NOT = [[ 10,11,12,1],[10,11,12,2]]
OK = [[ 11,12,1,3]]
MAT = { 1:[100.,40.,30.,20.,0],2:[200.,40.,60.,2.0,0],3:[600,40.,30.,5.0,0],4:[500.,40.,70.,2.0,0]}

# 2 (an integer number) is the start of the block and '0. .0' is the end
MATc = {2:[[1000., 0.1],[2000. ,.2],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.1],[5000.,0.2],[6000.,.4],[7000.,.6]]}
# '0. .0' is the end of the sub-block
# '0' is the end of the block

# Similarly for the other block
# 2 (an integer number) is the start of the block and '0. .0' is the end
MATT = {2:[[1000., 0.1],[2000. ,.5],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.9],[5000.,0.2],[6000.,.4],[7000.,.6]]}
# '0. .0' is the end of the sub-block
# '0' is the end of the block

# Similarly for the other block
# 2 (an integer number) is the start of the block and '0. .0' is the end

Factor = {2:[[0.00,2.0],[.210,2.0],[0.235,3.0]],3:[[0.00,2.0],[.2110,2.0],[0.2135,3.0]]}
# '0. .0' is the end of the sub-block
# '0' is the end of the block



ALL = [[ 11,1,1,1,69716.,1000],[ 11,1,1,5,76296.,1000],[ 31,1,1,6,74926.,1000],[ 31, 1,1,7,74653.,1000]]

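The block rules described above (an integer line starts a sub-block, '0. .0' ends a sub-block, a lone '0' ends the block) could be sketched roughly as follows. This is an illustration under assumptions, not a tested solution for the real files: it assumes comments start at the first '$' or '*', that a line holding a single integer is always an ID, and that no real data row consists only of zeros. `to_rows` and `split_blocks` are hypothetical helper names.

```python
import re

# Signed numbers, including forms like "100." and ".1".
num_patt = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)')

def to_rows(lines):
    """Strip '$'/'*' comments and turn each non-empty line into a list of numbers."""
    rows = []
    for line in lines:
        line = re.split(r'[$*]', line, maxsplit=1)[0]
        toks = num_patt.findall(line)
        if toks:
            rows.append([float(t) if '.' in t else int(t) for t in toks])
    return rows

def split_blocks(rows):
    """Return one dict per block, mapping each ID to its list of data rows."""
    blocks, current, ident = [], {}, None
    for row in rows:
        single_int = len(row) == 1 and isinstance(row[0], int)
        if single_int and row[0] == 0:       # lone '0': end of the block
            blocks.append(current)
            current, ident = {}, None
        elif single_int:                     # lone integer: new ID
            ident = row[0]
            current[ident] = []
        elif all(v == 0 for v in row):       # '0. .0': end of the sub-block
            ident = None
        elif ident is not None:              # ordinary data row
            current[ident].append(row)
    if current:                              # block without a closing '0'
        blocks.append(current)
    return blocks
```

Applied to the lines that follow /MAT/, the first dict would correspond to MAT (one row per ID, so take element 0 of each list for the flat form) and the later dicts to the MATc, MATT and Factor examples.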
Jan 5 '08 #10
bvdet
Where do MATT, MATc, and Factor come from?
Jan 6 '08 #11
bvdet
BTW, your script fails on your data because all comment lines must begin with '$'. It fails on the first line of data.
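To illustrate the failure: a first line that is neither a comment nor a keyword is treated as data while `key` is still `None`, so `dd[key].append(...)` most likely raises a `KeyError`. A small sketch of a guard, assuming such lines should simply be set aside; `parse_lines` is a hypothetical name and the values are left as strings to keep the example short:

```python
import re

key_patt = re.compile(r'/([A-Za-z_\-0-9]+)/')
data_patt = re.compile(r'\d+\.\d+|\d+|\w+')

def parse_lines(lines):
    """Group data under the latest /Keyword/; collect rows seen before any keyword."""
    dd = {}
    key = None
    skipped = []
    for raw in lines:
        line = raw.split('$', 1)[0].strip()   # drop '$' comments
        if not line:
            continue
        m = key_patt.search(line)
        if m:
            key = m.group(1)
            dd.setdefault(key, [])
            dd[key].extend(data_patt.findall(line[m.end():]))
        elif key is None:
            skipped.append(line)              # no keyword yet: do not index dd[None]
        else:
            dd[key].append(data_patt.findall(line))
    return dd, skipped
```

Returning the skipped lines makes it easy to report stray text such as a title line instead of failing on it.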
Jan 6 '08 #12
psbasha
Sorry, I was only explaining how the data could be stored in dictionary variables. It is just an example of how to store the data.

MATT, MATc, and Factor are variables.

Thanks
PSB
Jan 6 '08 #13
psbasha
Sorry, all the comments start with a '$' sign.
Jan 6 '08 #14
bvdet
I made a few minor changes to the code in the earlier solution. Following is the entire source code and output from your data file (with the first line commented out):
import re

def indexList(s, item, i=0):
    """Return the index of every occurrence of item in s, starting at i."""
    i_list = []
    while True:
        try:
            i = s.index(item, i)
            i_list.append(i)
            i += 1
        except ValueError:
            break
    return i_list

def convertType(s):
    """Convert s to int or float if possible; otherwise return it unchanged."""
    # eval was dropped from the converters: it is unsafe on file input, and
    # plain words fall through to the final return anyway
    for func in (int, float):
        try:
            return func(s)
        except ValueError:
            pass
    return s

key_patt = re.compile(r'/([A-Za-z_\-0-9]+)/')
data_patt = re.compile(r'\d+\.\d+|\d+|\w+')

# function to strip comments
def strip_comments(s):
    if '$' in s:
        return s[:s.index('$')]
    elif '*' in s:
        return s[:s.index('*')]
    return s

def parse_data(fn):
    key = None
    dd = {}
    # skip blank lines and full-line comments, then strip trailing comments
    with open(fn) as f:
        lineList = [strip_comments(line.strip()) for line in f
                    if line != '\n' and not line.startswith('$')]
    for line in lineList:
        m = key_patt.search(line)
        if m:
            key = m.group(1)
            line1 = line[indexList(line, '/')[1]+1:]
            values = [convertType(item) for item in data_patt.findall(line1)]
            if values:
                # a repeated keyword extends the existing list
                dd[key] = dd.get(key, []) + values
            else:
                dd[key] = []
        else:
            values = [convertType(n) for n in data_patt.findall(line)]
            if values:
                dd[key].append(values)
    return dd

if __name__ == '__main__':
    #fn = r'H:\TEMP\temsys\parameter.txt'
    fn = r'H:\TEMP\temsys\sample_data1.txt'
    dataDict = parse_data(fn)
    for key in dataDict:
        print('%s = %s' % (key, dataDict[key]))

>>> Ok = [11, 12, 1, 3]
MAT = [[1], [100, 40, 30, 2.0, 0], [2], [200, 40, 60, 2.0, 0], [3], [600, 40, 30, 5.0, 0], [4], [500, 40, 70, 2.0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [3000, 1], [5000, 2], [6000, 3], [7000, 6], [0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [13000, 1], [45000, 2], [56000, 3], [87000, 6], [0, 0], [0], [2], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [3], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [0], [4, 'ALL'], [11, 1, 1, 1, 69716, 1000], [11, 1, 1, 5, 76296, 1000], [31, 1, 1, 6, 74926, 1000], [31, 1, 1, 7, 74653, 1000]]
LineThick = [0.10000000000000001]
TOl2 = [0.01]
TOl1 = [0.050000000000000003]
STOP = ['Line', 'and', 'Circle']
Top2 = [11]
Top1 = [10]
Bot4 = [40]
Bot1 = [20]
NOT = [10, 11, 12, 1, 10, 11, 12, 2]
Flag2 = [1]
Flag1 = [1]
Parameter_range = [1, 1]
Bot2 = [30]
DummyFlag1 = [1]
>>>
I understand that this is not your final solution. Maybe you can come up with a way to parse the 'MAT' data.
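One thing visible in the output above: `/TOl1/ -0.05` comes back as a positive 0.05, and values such as `100.` and `.1` come back as `100` and `1`, because `data_patt` only matches unsigned `digits.digits` or plain digits. A possible starting point, offered as a sketch rather than a drop-in replacement (it only handles numbers, not words like 'yes'); `num_patt` and `numbers` are illustrative names:

```python
import re

# Optional sign, then "12.5", "12.", ".5" or "12", with an optional exponent.
num_patt = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')

def numbers(line):
    """Return the numbers on a line: float if written with '.' or an exponent, else int."""
    out = []
    for tok in num_patt.findall(line):
        if any(c in tok for c in '.eE'):
            out.append(float(tok))
        else:
            out.append(int(tok))
    return out
```

This is meant for data lines only; on a keyword line it would also pick up digits inside names such as Top1, so the keyword should be removed first.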
Jan 6 '08 #15
psbasha
SampleInputData

$$$$Header$$$$$$$$$$$$
$$$$Parameter$$$$$$$$$

/Parameter_range/ 1 1

/Flag1/ 1
/Flag2/ 1
/DummyFlag1/ 1

/STOP/ Line and Circle

$$$$

/LineThick/ 0.1 $$$Line Thickness

$$$$

/Top1/ 10 $$Value1
/Top2/ 11 $$Value2

 $$$
/Bot1/  20 $$Comment
/Bot2/ 30 $$Comment
/Bot4/ 40 $$Comment

$$
/TOl1/ -0.05
/TOl2/ 0.01

$$$$$$Line IDs$$$$$$$$

/NOT/  10 11 12 1
/NOT/  10 11 12 2
/Ok/   11 12 1  3

/MAT/ $$
1 $Begin
100.    40. 30.  2.0   0 ****22 ksdas
2
200.    40. 60.  2.0   0 ****22 ksdas
3
600.    40. 30.  5.0   0 ****22 ksdas
4
500.    40. 70.  2.0   0 ****22 ksdas
0 $End
2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
3000.  .1
5000.  .2
6000.  .3
7000.  .6
   0.  .0 $End
0 $End

2 ***Values $Begin
1000.  .1
2000.  .2
3000.  .3
4000.  .6
   0.  .0 $End

3 ***Values $Begin
13000.  .1
45000.  .2
56000.  .3
87000.  .6
    0.  .0 $End
0 $End
2 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
3 $Begin
    2.0 .00
    2.0 .210
    3.0 .235
    0.  .0 $End
0 $End
/4*ALL/ $ ***
 11       1       1       1     69716.   1000
 11       1       1       5     76296.   1000
 31       1       1       6     74926.   1000
 31       1       1       7     74653.   1000

Jan 6 '08 #16
psbasha
Description
>>>
------------------------------------------------------------------
Ok = [11, 12, 1, 3]  # This should be [[11, 12, 1, 3]]. In this we have one or more

MAT = [[1], [100, 40, 30, 2.0, 0], [2], [200, 40, 60, 2.0, 0], [3], [600, 40, 30, 5.0, 0], [4], [500, 40, 70, 2.0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [3000, 1], [5000, 2], [6000, 3], [7000, 6], [0, 0], [0], [2, 'Values'], [1000, 1], [2000, 2], [3000, 3], [4000, 6], [0, 0], [3, 'Values'], [13000, 1], [45000, 2], [56000, 3], [87000, 6], [0, 0], [0], [2], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [3], [2.0, 0], [2.0, 210], [3.0, 235], [0, 0], [0], [4, 'ALL'], [11, 1, 1, 1, 69716, 1000], [11, 1, 1, 5, 76296, 1000], [31, 1, 1, 6, 74926, 1000], [31, 1, 1, 7, 74653, 1000]]

# MAT should have only these values
MAT = { 1:[100.,40.,30.,20.,0],2:[200.,40.,60.,2.0,0],3:[600,40.,30.,5.0,0],4:[500.,40.,70.,2.0,0]}
-------------------------------------------------------------------------------
# The other block data should be
{2:[[1000., 0.1],[2000. ,.2],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.1],[5000.,0.2],[6000.,.4],[7000.,.6]]}

{2:[[1000., 0.1],[2000. ,.5],[3000,0.3],[4000.,0.6]],3:[ [3000.0,0.9],[5000.,0.2],[6000.,.4],[7000.,.6]]}

{2:[[0.00,2.0],[.210,2.0],[0.235,3.0]],3:[[0.00,2.0],[.2110,2.0],[0.2135,3.0]]}

ALL = [[ 11,1,1,1,69716.,1000],[ 11,1,1,5,76296.,1000],[ 31,1,1,6,74926.,1000],[ 31, 1,1,7,74653.,1000]]

-------------------------------------------------------------------------------
LineThick = [0.10000000000000001]
TOl2 = [0.01]
TOl1 = [0.050000000000000003]
----------------------------------------------------------
STOP = ['Line', 'and', 'Circle']
# It should be STOP = 'Line and Circle'
----------------------------------------------------------
Top2 = [11]
Top1 = [10]
Bot4 = [40]
Bot1 = [20]
----------------------------------------------------------
NOT = [10, 11, 12, 1, 10, 11, 12, 2]

# It should be stored as
NOT = [[ 10,11,12,1],[10,11,12,2]]

----------------------------------------------------------
Flag2 = [1]
Flag1 = [1]
Parameter_range = [1, 1]
Bot2 = [30]
DummyFlag1 = [1]

The above describes how some of the variables should be stored. Where there is only a single value, it need not be stored as a list.

Please help me fix the code so that the data comes out as described above.
Jan 6 '08 #17
bvdet
What you are saying is that the parsing must be customized for certain keywords. The code as is will parse all the data in a consistent manner. You must make an effort to solve this problem yourself. Post back your solution and we can try to help you from there.
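As a starting point for that kind of customization, here is a rough sketch of a per-keyword post-processing step. It assumes a parser that keeps one row per keyword line (unlike the code above, which concatenates repeated keywords), and the handler names and the `HANDLERS` table are invented for illustration:

```python
def as_text(rows):
    """/STOP/ Line and Circle -> 'Line and Circle'"""
    return ' '.join(str(v) for v in rows[0])

def as_rows(rows):
    """/NOT/ given on several lines -> keep a list of rows"""
    return rows

# Keywords that need special treatment; everything else uses the defaults below.
HANDLERS = {'STOP': as_text, 'NOT': as_rows, 'Ok': as_rows}

def finalize(raw):
    """raw maps keyword -> list of rows; return the customized values."""
    out = {}
    for key, rows in raw.items():
        if key in HANDLERS:
            out[key] = HANDLERS[key](rows)
        elif len(rows) == 1 and len(rows[0]) == 1:
            out[key] = rows[0][0]        # single value: store as a scalar
        elif len(rows) == 1:
            out[key] = rows[0]           # single row: store as a flat list
        else:
            out[key] = rows
    return out
```

Keeping the special cases in one table means the generic parsing loop does not need a growing chain of flags and `if key == ...` branches.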
Jan 6 '08 #18
psbasha
  1. SampleCode
  2. key_patt = re.compile(r'/([A-Za-z_\-0-9]+)/')
  3. data_patt = re.compile(r'\d+\.\d+|\d+|-\.\d+|\w+') 
  4. def parse_data(fn):
  5.     key = None
  6.     bMFlag = False
  7.     iCount = 0
  8.     dataList = []
  9.     dd = {}
  10.  
  11.     matDataDict = {}
  12.     matCDict = {}
  13.     matTDict = {}
  14.     comFactDict = {}
  15.     bMCFlag = False
  16.     bMTFlag = False
  17.     bMatDataBlockFlag = False
  18.     bDataFlag = False
  19.     otherFList =[]
  20.     bmatStartFlag = True
  21.     bmatEndFlag = False
  22.     dataListList = []
  23.  
  24.     lineList = [strip_comments(line.strip()) for line in open(fn).readlines()\
  25.                 if line != '\n' and not line.startswith('$')]
  26.  
  27.     for line in lineList:
  28.         m = key_patt.search(line)
  29.         if m:
  30.             key = m.group(1)
  31.             line1 = line[indexList(line, '/')[1]+1:]
  32.             if key == 'NOT':
  33.                 dataList = [convertType(item) for item in \
  34.                                 data_patt.findall(line1)]
  35.                 dataListList.append(dataList)                
  36.             else:
  37.                 if data_patt.search(line1):                
  38.                     if dd.has_key(key):
  39.                         dd[key] = dd[key]+[convertType(item) for item in \
  40.                                            data_patt.findall(line1)]
  41.                     else:
  42.                         dd[key] = [convertType(item) for item in \
  43.                                    data_patt.findall(line1)]
  44.                 else:
  45.                     dd[key] = []
  46.                     bMFlag = True
  47.                     bMatDataBlockFlag = True
  48.         else:
  49.             if 'ALL' in line:
  50.                 bDataFlag = True
  51.                 bMatDataBlockFlag = False
  52.             elif bDataFlag:
  53.                 if bDataFlag and line != '\n':
  54.                     line1 = line.split()
  55.                     otherFList.append(line1)
  56.                 elif bDataFlag and '\n':
  57.                     bDataFlag = False
  58.             elif  bMatDataBlockFlag:                               
  59.                 if line.startswith('0') and  '0.  .0' != line and not line.startswith('0.  .0'):                
  60.                     bMFlag = False                
  61.                     if bMCFlag :
  62.                         bMCFlag = False
  63.                         bMTFlag = True                    
  64.                     else:
  65.                         if bMTFlag :
  66.                             bMTFlag = False
  67.                             iCount =0
  68.                         elif not bMCFlag :
  69.                             bMCFlag = True                            
  70.                             iCount =0                                                                
  71.                         else:
  72.                             pass                            
  73.                 else:
  74.                     if bMFlag:
  75.                         m1 = data_patt.search(line)
  76.                         if m1:
  77.                             if bmatStartFlag:
  78.                                 dataList = []
  79.                                 list1 = [convertType(n) for n in \
  80.                                             data_patt.findall(line)]
  81.                                 matID = list1[0]
  82.                                 bmatStartFlag = False
  83.                             else:
  84.                                 dataList = [convertType(n) for n in \
  85.                                             data_patt.findall(line)]
  86.                                 matDataDict[matID] = dataList
  87.                                 dataList = []
  88.                                 bmatStartFlag = True                                
  89.                     elif bMCFlag:                    
  90.                         if iCount ==0:
  91.                             dataList =[]
  92.                             line1 = line.split()
  93.                             matID = int(line1[0])
  94.                             iCount = iCount + 1                        
  95.                         #elif '0.  .0' != line and not line.startswith('0.  .0'):
  96.                         elif not line.startswith('0.  .0'):
  97.                             line1 = line.split()
  98.                             dataList.append([float( line1[0]),float(line1[1])])                        
  99.                         elif  '0.  .0' == line or line.startswith('0.  .0') :
  100.                             matCDict[matID] = dataList
  101.                             iCount =0
  102.                     elif bMTFlag:
  103.                         if iCount ==0:
  104.                             dataList = []
  105.                             line1 = line.split()
  106.                             matID = int(line1[0])
  107.                             iCount = iCount + 1                                                
  108.                         elif not line.startswith('0.  .0'):
  109.                             line1 = line.split()
  110.                             dataList.append([float( line1[0]),float(line1[1])])                                                
  111.                         elif '0.  .0' == line or line.startswith('0.  .0'):
  112.                             matTDict[matID] = dataList
  113.                             iCount =0
  114.                     elif not bMCFlag and not bMTFlag:
  115.                         if iCount ==0:
  116.                             dataList = []
  117.                             line1 = line.split()
  118.                             matID = int(line1[0])
  119.                             iCount = iCount + 1                                                
  120.                         elif not line.startswith('0.  .0'):
  121.                             line1 = line.split()                        
  122.                             dataList.append([float( line1[1]),float( line1[0])])                        
  123.                         elif  '0.  .0' == line or line.startswith('0.  .0'):
  124.                             comFactDict[matID] = dataList
  125.                             iCount =0                    
  126.  
  127.     dd['NOT'] =dataListList
  128.     print 'matDataDict',matDataDict                            
  129.     print 'matCDict',matCDict
  130.     print 'matTDict',matTDict
  131.     print 'comFactDict',comFactDict
  132.     print ',otherFList',otherFList
  133.     return dd
  134.  
Please find my solution above. Let me know whether this can be done in a more concise and better way.

Thanks
PSB
Jan 11 '08 #19
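One possible way to shorten the code in post #19: the three branches that fill matCDict, matTDict and comFactDict differ only in the target dictionary and in the column order. Below is a minimal sketch of a shared helper, written in Python 3 syntax. The name `read_table_blocks`, the `swap` and `terminator` parameters, and the sample lines are hypothetical. The block layout (an ID line, then two-column rows, then a line starting with `0.  .0`) is inferred from the code above, not from a documented file format, and the switching between sections that the flags handle is not modelled here.

```python
def read_table_blocks(lines, swap=False, terminator='0.  .0'):
    """Collect {id: [[a, b], ...]} from blocks of the assumed layout.

    Each block is: one line whose first token is an integer ID,
    then rows of two floats, then a line starting with `terminator`.
    With swap=True the two columns are stored in reverse order,
    as the comFactDict branch does.
    """
    tables = {}
    block_id = None          # None means "expecting an ID line"
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if block_id is None:
            # First line of a block carries the ID.
            block_id = int(line.split()[0])
            rows = []
        elif line.startswith(terminator):
            # End of block: store the rows and wait for the next ID.
            tables[block_id] = rows
            block_id = None
        else:
            first, second = (float(tok) for tok in line.split()[:2])
            rows.append([second, first] if swap else [first, second])
    return tables


# Hypothetical sample data in the assumed layout.
sample = [
    "2",
    "1000. 0.1",
    "2000. 0.2",
    "0.  .0",
    "3",
    "3000. 0.1",
    "0.  .0",
]
mat_c = read_table_blocks(sample)
com_fact = read_table_blocks(sample, swap=True)
```

With a helper like this, each of the three dictionaries would come from one call on the lines of its own section, so the `iCount` bookkeeping is no longer repeated three times.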
psbasha
440 256MB
Output:
matDataDict {1: [100, 40, 30, 2.0, 0], 2: [200, 40, 60, 2.0, 0], 3: [600, 40, 30, 5.0, 0], 4: [500, 40, 70, 2.0, 0]}
matCDict {2: [[1000.0, 0.10000000000000001], [2000.0, 0.20000000000000001], [3000.0, 0.29999999999999999], [4000.0, 0.59999999999999998]], 3: [[3000.0, 0.10000000000000001], [5000.0, 0.20000000000000001], [6000.0, 0.40000000000000002], [7000.0, 0.59999999999999998]]}
matTDict {2: [[1000.0, 0.10000000000000001], [2000.0, 0.5], [3000.0, 0.29999999999999999], [4000.0, 0.59999999999999998]], 3: [[13000.0, 0.90000000000000002], [45000.0, 0.20000000000000001], [56000.0, 0.29999999999999999], [87000.0, 0.59999999999999998]]}
comFactDict {2: [[0.0, 2.0], [0.20999999999999999, 2.0], [0.23499999999999999, 3.0]], 3: [[0.0, 2.0], [0.21099999999999999, 2.0], [0.23150000000000001, 3.0]]}
,otherFList [['11', '1', '1', '1', '69716.', '1000'], ['11', '1', '1', '5', '76296.', '1000'], ['31', '1', '1', '6', '74926.', '1000'], ['31', '1', '1', '7', '74653.', '1000']]
Ok = [11, 12, 1, 3]
MAT = []
LineThick = [0.10000000000000001]
TOl2 = [0.01]
TOl1 = [0.050000000000000003]
STOP = ['Line', 'and', 'Circle']
Top2 = [11]
Top1 = [10]
Bot4 = [40]
Bot1 = [20]
NOT = [[10, 11, 12, 1], [10, 11, 12, 2]]
Flag2 = [1]
Flag1 = [1]
Parameter_range = [1, 1]
Bot2 = [30]
DummyFlag1 = [1]
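A note on the long values such as 0.10000000000000001 in the output above: they are not parsing errors. Python versions before 2.7 show floats inside lists and dicts with 17 significant digits, and the stored value is the same double that 0.1 produces. If shorter output is wanted, the values can be formatted for display. A small sketch in Python 3 syntax, where the helper name `fmt` and its `digits` parameter are hypothetical:

```python
def fmt(values, digits=6):
    """Return a display string for a list of numbers,
    rounded to `digits` significant digits."""
    return '[' + ', '.join('%.*g' % (digits, v) for v in values) + ']'


line_thick = [0.10000000000000001]   # same double as 0.1
tol1 = [0.050000000000000003]        # same double as 0.05

shown = fmt(line_thick + tol1)
print(shown)
```

This only changes how the numbers are printed; the parsed data itself stays untouched.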
Jan 11 '08 #20
psbasha
440 256MB
BV,

I would appreciate your suggestions on the code above.

Thanks
PSB
Jan 11 '08 #21
bvdet
2,851 Expert Mod 2GB
I don't have time to do it right now, as work deadlines are approaching. I will try to look at it later.
Jan 12 '08 #22
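Until a fuller review arrives, here is one possible simplification of the keyword part of the parser in post #19. The `has_key` test with two near-identical branches can be replaced by `dict.setdefault`, which creates the list on first use. This is only a sketch in Python 3 syntax: the patterns `KEY_PATT` and `TOKEN_PATT`, the helpers `convert` and `parse_keywords`, and the sample lines are hypothetical stand-ins, since the original patterns and helpers are defined earlier in the thread. The sample is modelled on the output in post #20, assuming `/Key/ values` lines with `$` comments.

```python
import re

# Hypothetical patterns; the originals are defined earlier in the thread.
KEY_PATT = re.compile(r'^/([^/]+)/(.*)$')
TOKEN_PATT = re.compile(r'\S+')


def convert(token):
    """Return token as int or float when possible, else as the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_keywords(lines, repeated=('NOT',)):
    """Map each /Key/ to its values.

    Keys listed in `repeated` keep one sub-list per line; all other
    keys accumulate their values in a single flat list.
    """
    result = {}
    for raw in lines:
        line = raw.split('$', 1)[0].strip()   # drop '$' comments
        m = KEY_PATT.match(line)
        if not m:
            continue
        key, rest = m.group(1), m.group(2)
        values = [convert(tok) for tok in TOKEN_PATT.findall(rest)]
        if key in repeated:
            result.setdefault(key, []).append(values)
        else:
            result.setdefault(key, []).extend(values)
    return result


# Hypothetical sample lines in the assumed format.
sample = [
    "$$$$Header$$$$",
    "/Top1/ 10",
    "/LineThick/ 0.1   $ thickness",
    "/Ok/ 11 12",
    "/Ok/ 1 3",
    "/NOT/ 10 11 12 1",
    "/NOT/ 10 11 12 2",
    "/MAT/",
]
dd = parse_keywords(sample)
```

A key with no data on its line, such as `/MAT/` here, ends up with an empty list, which matches what the original code stores before it starts reading the following data block.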
