
Reading and writing a text file

100+
P: 440
Hi,

Is it necessary in Python to close a file after reading or writing data? While referring to some Python material, I saw it mentioned somewhere that there is no need to close the file. Correct me if I am wrong.

If possible, could anybody help me with sample code for reading and writing a simple text file? I have seen that there are many ways to read and write data in Python, but I want to use the most effective way of reading from and writing to a file.

Thanks in advance
PSB
Feb 28 '07 #1
42 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
This thread shows how to read and write data: http://www.thescripts.com/forum/thre...7166-1-10.html
There are several other threads on file I/O that I have participated in.

Python will close an open file during garbage collection, once the last reference to the file object is gone (for example, when the name is reassigned or set to None). It is still good practice to close every file that you open, especially when the file object is bound to a name. This brings up a subject that has puzzled me. Open a file like this:
lineLst = open('file_name').readlines()
Does Python close the file? The file object is never bound to a name, so I assume the file is closed once the end of file is reached and the object is discarded (I think).
Mar 1 '07 #2

Expert 100+
P: 511
Yes, Python closes the file when the file object gets garbage collected.
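Rather than relying on garbage collection (which CPython does promptly, but other implementations may delay), a with block guarantees the file is closed. The following is only a minimal sketch, assuming Python 3 rather than the Python 2 syntax used elsewhere in this thread; the helper names and the file name are made up for illustration:

```python
import os
import tempfile


def write_lines(path, lines):
    # The with block closes the file on exit, even if an exception is raised.
    with open(path, 'w') as f:
        for line in lines:
            f.write(line + '\n')


def read_lines(path):
    # Iterating over the file object reads one line at a time.
    with open(path) as f:
        return [line.rstrip('\n') for line in f]


# Hypothetical file name in a temporary directory, for illustration only.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
write_lines(demo_path, ['first line', 'second line'])
print(read_lines(demo_path))
```

With this pattern there is no explicit close() call to forget.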
Mar 1 '07 #3

100+
P: 440
I have file data in this format:

Employee #   Employee Name   Salary   Location
----------------------------------------------
121111       Sam             10,000   NJ
121311       Paul            20,000   NY
111111       Jim             10,000   TX

The data is in an Excel (.xls) file and we are copying it manually into a text file. After copying, the data is not organized the way we see it in Excel.

So how can I read this file data (without using slicing) and store it in the respective fields?

Could anybody provide a sample piece of code?

Thanks
PSB
Mar 2 '07 #4

bvdet
Expert Mod 2.5K+
P: 2,851
You can save the Excel worksheet as a text file. The text file will be tab delimited which can be easily parsed.
"""
Read a tab delimited file
"""

fn = 'your_file'

f = open(fn, 'r')
labelLst = f.readline().strip().split('\t')
lineLst = []

for line in f:
    if not line.startswith('#'):
        lineLst.append(line.strip().split('\t'))

f.close()

print labelLst
print lineLst
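The standard library's csv module can do the tab splitting as well, and it also copes with quoted fields. A small sketch, assuming Python 3 and using the employee rows from the question as in-memory sample data (the function name read_tab_delimited is hypothetical):

```python
import csv
import io


def read_tab_delimited(f):
    # csv.reader splits each line on tabs and keeps empty fields.
    reader = csv.reader(f, delimiter='\t')
    labels = next(reader)
    # Skip blank lines and lines whose first field starts with '#'.
    rows = [row for row in reader if row and not row[0].startswith('#')]
    return labels, rows


# Sample data taken from the question, held in memory instead of a real file.
sample = io.StringIO(
    'Employee #\tEmployee Name\tSalary\tLocation\n'
    '121111\tSam\t10,000\tNJ\n'
    '121311\tPaul\t20,000\tNY\n'
    '111111\tJim\t10,000\tTX\n'
)
labels, rows = read_tab_delimited(sample)
print(labels)
print(rows)
```

For a real file, the csv documentation recommends opening it with newline='' before passing it to csv.reader.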
Mar 2 '07 #5

100+
P: 440
Could anybody help me read this data? How do I separate the line data and read it?

SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125

If anybody could provide sample code for the above, it would be helpful.

Thanks in advance
PSB
Mar 2 '07 #6

bvdet
Expert Mod 2.5K+
P: 2,851
s = 'SET 10 = 1101 1106 1107 1108 1109 1110 1111,\n 1112 1113 1114 1115 1116 1117 1118,\n 1119 1120 1121 1122 1123 1124 1125'
sList = s.split('=')
label = sList[0].strip()
data = sList[1].strip().split(',\n')
datastr = ''.join(data)

print '%s = %s' % (label, datastr)
'''
>>> SET 10 = 1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125
'''
Mar 3 '07 #7

100+
P: 440
Thanks for the reply,

This is the input file :

$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$

I have to get the SET number (10) and all the integers from 1101 to 1125. How can I read all of those integers?

Is this possible with the solution you provided? If so, what modifications have to be made to that piece of code?
Mar 3 '07 #8

bvdet
Expert Mod 2.5K+
P: 2,851
This will give you a list of integers from the data string in my earlier post:
map(int, datastr.split())
Mar 3 '07 #9

100+
P: 440
I understand the piece of code you posted. But how do I capture the data between the keywords "SET" and "END"? My program should be generic enough to read the data between these keywords:

SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
Mar 3 '07 #10

Expert 100+
P: 511
another way

>>> import re
>>> data = open("file").read()
>>> re.compile("SET 10 = (\d+.*)\$ END OF SET 110",re.M|re.DOTALL).findall(data)[0].replace("\n","").replace(","," ")
'1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125'
>>>
Mar 3 '07 #11

bvdet
Expert Mod 2.5K+
P: 2,851
Iterate over the file with "for line in f:". Untested:
def read_set(f):
    # Collect the 'SET ...' line and the continuation lines that follow it
    in_set = False
    s = ''
    for line in f:
        if line.startswith('SET'):
            in_set = True
            s = line
        elif line.startswith('$'):
            in_set = False
        elif in_set:
            s += line
    return s
Mar 3 '07 #12

bvdet
Expert Mod 2.5K+
P: 2,851
I like that ghostdog! :)
Mar 3 '07 #13

100+
P: 440
What will this piece of code do? I am not able to understand it.
Mar 3 '07 #14

Expert 100+
P: 511
I will just briefly explain, as my English is not good. Hope you will understand.
The 're' module is Python's regular expression module. More information can be found in the Python docs.

>>> data = open("file").read()
reads in the whole file as a string to be fed to the findall() method.

>>> re.compile("SET 10 = (\d+.*)\$ END OF SET 110",re.M|re.DOTALL).findall(data)[0].replace("\n","").replace(","," ")
'1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125'
>>>
You wanted to find the numbers between "SET 10 =" and "$ END OF SET 110".
\d+ means match one or more digits. The .* after it means match any further characters, so (\d+.*) takes everything from the first number up to the end marker. Putting parentheses around (\d+.*) makes findall() return just that group.
re.compile() sets up the pattern that I want to find. re.M is multiline mode; it only changes how ^ and $ match, so it is not strictly needed here. re.DOTALL makes "." match a newline as well. In other words, the "." in (\d+.*) will match newlines, which is needed because your numbers are split over several lines.
findall() does the searching and returns the results in a list. The result still contains redundant "\n" and "," characters, so I get rid of them with replace().

Anyway, this is just another method. I suggest you use bvdet's method if you are not familiar with regular expressions.
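To see the pieces of that pattern at work, here is a minimal sketch, assuming Python 3 and using the sample file contents from the earlier post as an in-memory string. It uses search() and a raw string instead of findall() and also converts the result to integers, so it is a variation on the code above rather than the same code:

```python
import re

data = """$
$ SET 10
$
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
"""

# re.DOTALL lets '.' match newlines, so the group can span several lines.
pattern = re.compile(r"SET 10 = (\d+.*)\$ END OF SET 110", re.DOTALL)
match = pattern.search(data)

# Commas become spaces, then split() discards all whitespace, newlines included.
numbers = [int(tok) for tok in match.group(1).replace(",", " ").split()]
print(numbers)
```

The $PTITLE line is skipped because "SET 10 = " is followed there by a letter, not a digit.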
Mar 3 '07 #15

100+
P: 440
Can the above piece of code be made generic, to read the following input data?

SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 10

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11

SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15

.................
Mar 3 '07 #16

Expert 100+
P: 511
sure. one way
>>> import re
>>> data = open("file").read()
>>> pat = re.compile("SET \d+ = (\d+.*?)\$ END OF SET",re.M|re.DOTALL)
>>> for result in pat.findall(data):
...   print result.replace("\n","").replace(","," ")
...
1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125
11031 11036 11037 11038 11040 11050 11051 11052 11053 11054 11055 11056 11057 11058
110131 110136 110137 110138 110410 110510 110511 110512 110513 110514 110515 110516 110517 110518
>>>
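If the set number is needed along with the values, a second capture group and finditer() can build a dictionary. This is only a sketch, assuming Python 3 and shortened sample data; the ^ anchor with re.MULTILINE is an addition not in the pattern above, used so that only lines beginning with SET are matched:

```python
import re

data = """SET 10 = 1101 1106 1107,
1112 1113
$ END OF SET 10

SET 11 = 11031 11036,
11052 11053,
$ END OF SET 11
"""

# (\d+) captures the set number; (.*?) lazily captures up to the nearest end marker.
pattern = re.compile(r"^SET (\d+) = (.*?)\$ END OF SET", re.MULTILINE | re.DOTALL)

sets = {}
for m in pattern.finditer(data):
    sets[int(m.group(1))] = [int(tok) for tok in m.group(2).replace(",", " ").split()]

print(sets)
```

The lazy .*? matters here: a greedy .* would run on to the last end marker and merge all sets into one match.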
Mar 3 '07 #17

bvdet
Expert Mod 2.5K+
P: 2,851
ghostdog has provided an excellent solution. Here is another, less elegant one:
### READ FILE DATA

''' File Data
$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
$

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11
$
$
SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15
$
$
'''

def file_data(s):
    outStr = ''
    in_set = False
    for line in s:
        if line.startswith('SET'):
            in_set = True
            outStr += line.strip('\n').strip(',')
        elif 'END OF SET' in line:
            in_set = False
            outStr += '\n'
        elif in_set:
            outStr += ' ' + line.strip('\n').strip(',')
    return outStr.strip()

data = file_data(open('your_file').readlines())
print data, '\n'
dataDict = {}
for line in data.strip().split('\n'):
    dataDict[line.split('=')[0].strip()] = map(int, line.split('=')[1].strip().split())
for key in dataDict:
    print '%s = %s' % (key, dataDict[key])

'''>>> SET 10 = 1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125
SET 11 = 11031 11036 11037 11038 11040 11050 11051 11052 11053 11054 11055 11056 11057 11058
SET 15 = 110131 110136 110137 110138 110410 110510 110511 110512 110513 110514 110515 110516 110517 110518

SET 15 = [110131, 110136, 110137, 110138, 110410, 110510, 110511, 110512, 110513, 110514, 110515, 110516, 110517, 110518]
SET 11 = [11031, 11036, 11037, 11038, 11040, 11050, 11051, 11052, 11053, 11054, 11055, 11056, 11057, 11058]
SET 10 = [1101, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125]
>>>
>>> sum(dataDict['SET 10'])
23411
>>>
'''
Mar 3 '07 #18

bvdet
Expert Mod 2.5K+
P: 2,851
ghostdog has provided an excellent solution. Here is another, less elegant one:
............................
data = file_data(open('your_file').readlines())
print data, '\n'
dataDict = {}
for line in data.strip().split('\n'):
    dataDict[line.split('=')[0].strip()] = map(int, line.split('=')[1].strip().split())
for key in dataDict:
    print '%s = %s' % (key, dataDict[key])
.............................

data = file_data(open('H:/TEMP/temsys/strdata.txt').readlines())

dataDict = {}
for line in data.strip().split('\n'):
    dataDict[line.split('=')[0].strip()] = [int(x) for x in line.split('=')[1].strip().split()]
for key in dataDict:
    print '%s = %s' % (key, dataDict[key])
In the above snippet, 'map' has been replaced by a list comprehension.
Mar 6 '07 #19

100+
P: 440
Thanks for the reply. Sometimes the input data is given in this format:

''' File Data
$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
$

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11
$
$
SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15
$
$
SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932
$HMSET

How can I handle the above to make it generic?

-PSB
Mar 7 '07 #20

Expert 100+
P: 511
what have you done so far?
Mar 7 '07 #21

bvdet
Expert Mod 2.5K+
P: 2,851
        if line.startswith('SET'):
            if not re.findall("[^0-9 ,\n]", line.split('=')[1]):
                .....................................
Anything else?
Mar 7 '07 #22

100+
P: 440
The solution you posted really helped me. Now I am trying to implement it for the other lines which appear in the input file:

SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932
$HMSET

I am working on this. I will post what I have done so far tomorrow. Different developers have their own ways of reading file data, but I want optimized code with fewer lines, so I am looking to the forum.

I am not sure whether the approach I am following is tedious or roundabout.

If anybody has an idea for a better way, that will help me.

Thanks in advance
PSB
Mar 7 '07 #23

Expert 100+
P: 511
Both outputs should be "1 THRU 897" and "1 THRU 932", that is, what lies between SET and $HMSET, right? I.e. you don't want the SET... and $HMSET... parts?
Mar 7 '07 #24

100+
P: 440
I want the values 1 and 897 from "1 THRU 897" and
1 and 932 from "1 THRU 932".

So the output should be [1,897] and [1,932], in addition to the output for the previous result.
Mar 7 '07 #25

Expert 100+
P: 511
not to be overly complicated with regexp, you can try this.
import re
data = open("file").read()
pat = re.compile("SET \d+ = (\d+.*?)(?:\$ END OF SET| THRU (\d+.*?))",re.M|re.DOTALL)
for result in pat.findall(data):
   print result  ##do your manipulations here.
Mar 7 '07 #26

100+
P: 440
"""    def read_Sets_file_data(self,strSetsFile):

        fSets = open(strSetsFile,'r')
        strTemp = fSets.readlines()
        elementList = []

        outStr = ''
        bFlag = False
        startVal =0
        endVal = 0

        ### Yet to implement the "THRU" elements reading
        for line in strTemp:
            if line.startswith('SET'):
                bFlag = True
                outStr += line.strip('\n').strip(',')
                labelLst = line.strip().split(" ")
                for i in range(0,labelLst.__len__()):
                    if ( labelLst[i] == '=' and labelLst[i+1].isdigit()) :
                        startVal = labelLst[i+1]

                    if(labelLst[i].isalnum()):
                        if( labelLst[i] == "THRU"):
                            endVal = labelLst[i+1]

                    #print startVal,endVal
                    if( int(startVal) > 0  and int(endVal) > 0):
                        list1 =  self.get_THRU_elements(startVal,endVal)
                        print list1
                        break

            elif 'END OF SET' in line:
                bFlag = False
                outStr += '\n'
            elif bFlag:
                outStr += ' ' + line.strip('\n').strip(',')



        data = outStr.strip()
        #print data"""

        dataDict = {}

        for line in data.strip().split('\n'):
            dataDict[line.split('=')[0].strip()] = map(int, line.split('=')[1].strip().split())

        #return (dataDict)
        #return {}
  50.  
I have done the above. Please correct me if I have done it the wrong way. I have yet to complete the iteration over the SETS file; I am only reading one set so far, but I am looking for a generic solution.
Mar 7 '07 #27

bvdet
Expert Mod 2.5K+
P: 2,851
This problem is similar to a function I wrote recently to extract data from XML files. Instead of going over your code, it's much easier for me to post the following code (hopefully it will do what you want):
def file_data(s):
    outStr = ''
    in_set = False
    for line in s:
        if line.startswith('SET'):
            if 'THRU' in line:
                in_set = False
                outStr += line.replace('THRU ', '').strip('\n').strip(',')+'\n'
            else:
                in_set = True
                outStr += line.strip('\n').strip(',')
        elif 'END OF SET' in line:
            in_set = False
            outStr += '\n'
        elif in_set:
            outStr += ' ' + line.strip('\n').strip(',')
    dataDict = {}
    for line in outStr.strip().split('\n'):
        dataDict[line.split('=')[0].strip()] = [int(x) for x in line.split('=')[1].strip().split()]
    return dataDict

dd = file_data(open('your_file').readlines())
for key in dd:
    print '%s = %s' % (key, dd[key])
Mar 7 '07 #28

100+
P: 440
Sample.txt
$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
$

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11
$
$
SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15
$
$
SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932

SET 102 = 1001323 THRU 1001331,1001343 THRU 1001349,
        1001359 THRU 1001365,1001375 THRU 1001381,
        1001391 THRU 1001397,1001407 THRU 1001413,
        1001415 THRU 1001429,1001439 THRU 1001445,
        1001455 THRU 1001461,1001471 THRU 1001477,
        1001479 THRU 1001490,1001500 THRU 1001506,
        1001516 THRU 1001522,1001532 THRU 1001538,
        1001540 THRU 1001554,1001564 THRU 1001570,
        1001580 THRU 1001586,1001596 THRU 1001602,
        1001612 THRU 1001618,1001620 THRU 1001634,
        1001644 THRU 1001650,1001660 THRU 1001666,
        1009990,1009992,1009994,1009996,1009998,1010000,1010002,
        1010004,1010006,1010009,1010010,1010012,1010014,
        1010066 THRU 1010081
Could anybody help me make this more generic?

One more thing I forgot to mention in my previous discussion:

For SET 1 = 1 THRU 897, when I come across the word THRU, I have to take the values before and after THRU (i.e. 1 and 897) and create a list of numbers for this range,

say
dataList = []
for i in range(1, 898):
    dataList.append(i)

O/P:
get the list of all the numbers

[1,2,.......896,897]

Thanks
PSB
Mar 21 '07 #29

Expert 100+
P: 511
Say you have already gotten the values.
You can just use
datalist = range(1,898)
This will create a list for you.
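Note that range() stops one short of its second argument, which is why 898 is used to include 897. A tiny sketch, assuming Python 3 (where range() is lazy, so list() is needed to get an actual list); thru_range is a made-up helper name:

```python
def thru_range(start, end):
    # range() excludes its stop value, so add 1 to include `end`.
    return list(range(int(start), int(end) + 1))


values = thru_range('1', '897')
print(values[0], values[-1], len(values))
```

The int() calls let the helper accept the strings that come straight out of split().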
Mar 22 '07 #30

100+
P: 440
Thanks for the solution.

But with the file format I mentioned above, the code fails to get the set numbers when the keyword "THRU" appears between the numbers. I am looking for a more generic way of reading this sets file that handles the different set formats above.

Can anybody help me fix the above code so that it reads the different set formats generically?

-PSB
Mar 22 '07 #31

Expert 100+
P: 511
You have quite some experience now with Python, so I will just give you some ideas and you do the rest. Simple string manipulations will get you what you want eventually.
One idea for a sample line with THRU:
>>> a = "1001323 THRU 1001331,1001343 THRU 1001349"
>>> a.split(",")
['1001323 THRU 1001331', '1001343 THRU 1001349']
>>> for items in a.split(","):
...  print items.split("THRU")
...
['1001323 ', ' 1001331']
['1001343 ', ' 1001349']
You said you want the numbers on the left and right of THRU, right? The above seems to get what you want. Of course, for other redundant words on the line, you can just use replace(), string slices, etc. to get rid of them.
Mar 22 '07 #32

bvdet
Expert Mod 2.5K+
P: 2,851
I have another suggestion:
import re

line = '1001612 THRU 1001618,1001620 THRU 1001634, 1001644 THRU 1001650,1001660 THRU 1001666, 1009990,1009992,1009994,1009996,1009998,1010000,1010002'
outStr = ''
if 'THRU' in line:
    # Match either a "start THRU end" range or a single number
    lineList = re.findall(r'\d+ THRU \d+|\d+', line)
    lst = []
    for item in lineList:
        if 'THRU' in item:
            tem = item.split(' THRU ')
            # Expand the inclusive range into individual numbers
            lst += range(int(tem[0]), int(tem[1])+1)
        else:
            lst.append(int(item))
    outStr += ' '.join([str(i) for i in lst]) + ' '
else:
    outStr += ' ' + line.strip('\n').strip(',')

'''
>>> outStr
1001612 1001613 1001614 1001615 1001616 1001617 1001618 1001620 1001621 1001622 1001623 1001624 1001625 1001626 1001627 1001628 1001629 1001630 1001631 1001632 1001633 1001634 1001644 1001645 1001646 1001647 1001648 1001649 1001650 1001660 1001661 1001662 1001663 1001664 1001665 1001666 1009990 1009992 1009994 1009996 1009998 1010000 1010002
'''
It's not very pretty though. You should be able to do the rest from here.
Mar 22 '07 #33

Expert 100+
P: 511
Looking only at this set of input data provided by the OP (though the whole file may not be the same):
SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932
SET 102 = 1001323 THRU 1001331,1001343 THRU 1001349,
        1001359 THRU 1001365,1001375 THRU 1001381,
        1001391 THRU 1001397,1001407 THRU 1001413,
        1001415 THRU 1001429,1001439 THRU 1001445,
        1001455 THRU 1001461,1001471 THRU 1001477,
        1001479 THRU 1001490,1001500 THRU 1001506,
        1001516 THRU 1001522,1001532 THRU 1001538,
        1001540 THRU 1001554,1001564 THRU 1001570,
        1001580 THRU 1001586,1001596 THRU 1001602,
        1001612 THRU 1001618,1001620 THRU 1001634,
        1001644 THRU 1001650,1001660 THRU 1001666,
        1009990,1009992,1009994,1009996,1009998,1010000,1010002,
        1010004,1010006,1010009,1010010,1010012,1010014,
        1010066 THRU 1010081
This little piece of code will get what he wants (if I didn't interpret it wrongly):

import re

data = open("file").read()
# Capture the numbers on either side of each THRU keyword
pat = re.compile(r"(\d+) THRU (\d+)", re.M|re.DOTALL)
for items in pat.findall(data):
    print(items)
output:
# ./test.py
('1', '897')
('1', '932')
('1001323', '1001331')
('1001343', '1001349')
('1001359', '1001365')
('1001375', '1001381')
('1001391', '1001397')
('1001407', '1001413')
('1001415', '1001429')
('1001439', '1001445')
('1001455', '1001461')
('1001471', '1001477')
('1001479', '1001490')
('1001500', '1001506')
('1001516', '1001522')
('1001532', '1001538')
('1001540', '1001554')
('1001564', '1001570')
('1001580', '1001586')
('1001596', '1001602')
('1001612', '1001618')
('1001620', '1001634')
('1001644', '1001650')
('1001660', '1001666')
('1010066', '1010081')
Mar 22 '07 #34

bvdet
Expert Mod 2.5K+
P: 2,851
Taking it a step farther:
import re

def getThruData(s):
    # Match "start THRU end" ranges first, then single numbers
    sList = re.findall(r'\d+ THRU \d+|\d+', s)
    for item in sList:
        if 'THRU' in item:
            tem = item.split(' THRU ')
            # Yield every number in the inclusive range
            for num in range(int(tem[0]), int(tem[1])+1):
                yield num
        else:
            yield int(item.strip())

>>> s
'1001359 THRU 1001365,1001375 THRU 1001381,\n1010004,1010006,1010009,1010010,1010012,1010014,1010066 THRU 1010081'
>>> sList = [i for i in getThruData(s)]
>>> sList
[1001359, 1001360, 1001361, 1001362, 1001363, 1001364, 1001365, 1001375, 1001376, 1001377, 1001378, 1001379, 1001380, 1001381, 1010004, 1010006, 1010009, 1010010, 1010012, 1010014, 1010066, 1010067, 1010068, 1010069, 1010070, 1010071, 1010072, 1010073, 1010074, 1010075, 1010076, 1010077, 1010078, 1010079, 1010080, 1010081]
>>>
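A hedged variation on the generator above, in case the spacing around THRU is not always a single space: this sketch captures the two numbers with regex groups instead of splitting. The name `iter_set_numbers` and the tolerance for extra whitespace are assumptions for illustration; the posted data does not require them.

```python
import re

# One number, optionally followed by "THRU <number>"; \s+ tolerates
# tabs or repeated spaces around the keyword.
_ITEM = re.compile(r"(\d+)(?:\s+THRU\s+(\d+))?")


def iter_set_numbers(text):
    """Yield every integer named in text, expanding 'a THRU b' ranges."""
    for start, end in _ITEM.findall(text):
        first = int(start)
        # The second group is '' when the item is a single number
        last = int(end) if end else first
        for num in range(first, last + 1):
            yield num


s = "1001359  THRU 1001362,1010004,\n1010006,1010066 THRU\t1010068"
print(list(iter_set_numbers(s)))
```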
Mar 22 '07 #35

100+
P: 440
....,1009990,1009992,1009994,1009996,1009998,1010000,1010002,
1010004,1010006,1010009,1010010,1010012,1010014,.....

I need these numbers to be handled as well when reading the data, not only the ones with the keyword "THRU".

Thanks
PSB
Mar 22 '07 #36

Expert 100+
P: 511
Up until now, do you actually have some code yet? Show us what you did that left you "unable to handle these numbers".
Mar 23 '07 #37

bvdet
Expert Mod 2.5K+
P: 2,851
My post 'Taking it a step farther' extracts all the numbers. If you had bothered to look at my post, you could see that. I thought it was pretty neat. Let me show you AGAIN:
>>> s1 = '1009990,1009992,1009994,1009996,1009998,1010000,1010002,1010004,1010006,1010009,1010010,1010012,1010014,10010016 THRU 10010035'
>>> for i in getThruData(s1):
...     print(i)
...
1009990
1009992
1009994
1009996
1009998
1010000
1010002
1010004
1010006
1010009
1010010
1010012
1010014
10010016
10010017
10010018
10010019
10010020
10010021
10010022
10010023
10010024
10010025
10010026
10010027
10010028
10010029
10010030
10010031
10010032
10010033
10010034
10010035
>>>
Please note that ALL the numbers are in the output. We have done most of the work. You don't expect us to write the total solution, do you?
Mar 23 '07 #38

100+
P: 440
Thanks to all for providing solutions for the different file formats.

No BV, I don't expect you to give the whole solution for the problem. I would just like to know the approach/concept.

Thanks BV. You really are a guru to us in the forum.

- PSB
Mar 24 '07 #39

bvdet
Expert Mod 2.5K+
P: 2,851
The approach I used:
1. Read all lines into a list, initialize 'outStr', and iterate on the list.
2. Look for keyword 'SET' and set variable 'in_set' = True.
3. If 'THRU' is in line, split on '=', send the right side to getThruData(), put back together and append (by concatenation) to 'outStr' - otherwise just append to 'outStr'.
4. While 'in_set' is True, append 'line' to 'outStr' (send to getThruData() if 'THRU' is in 'line') until a condition is seen that indicates the end of the set. Set variable 'in_set' to False and append a newline to 'outStr'.
5. Repeat until the end of the file.
6. Create the dictionary from 'outStr', splitting on '\n'.

It seems simple when described like this. HTH :)
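A simplified sketch of the steps above, under a few stated assumptions: it collects the numbers for each set straight into a dictionary instead of building 'outStr' first, it treats only a '$' line or a blank line as the end of a set, and it assumes every SET line contains '='. The names `expand` and `read_sets` are made up for illustration.

```python
import re


def expand(text):
    """Return all integers in text, expanding 'a THRU b' into a..b."""
    numbers = []
    for item in re.findall(r"\d+ THRU \d+|\d+", text):
        if "THRU" in item:
            start, end = item.split(" THRU ")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(item))
    return numbers


def read_sets(lines):
    """Map each 'SET n' name to its list of integers."""
    sets = {}
    current = None                      # name of the set being read, if any
    for line in lines:
        if line.startswith("SET"):
            name, values = line.split("=", 1)
            current = name.strip()
            sets[current] = expand(values)
        elif line.startswith("$") or not line.strip():
            current = None              # a '$' line or blank line ends the set
        elif current is not None:
            sets[current].extend(expand(line))   # continuation line
    return sets


sample = [
    "SET 1 = 1 THRU 4\n",
    "$HMSET\n",
    "SET 102 = 10 THRU 12,20,\n",
    "        30 THRU 31\n",
]
print(read_sets(sample))
```

Because a new SET line simply starts a new dictionary entry, two sets that follow each other without a separator line are still kept apart.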
Mar 24 '07 #40

100+
P: 440
Sorry BV, I typed the message wrongly.

What I meant is that I understood the concept you explained in the earlier discussion.


-PSB
Mar 24 '07 #41

bvdet
Expert Mod 2.5K+
P: 2,851
You have probably solved this problem by now. Here's what I came up with in my spare time:
import re

def getThruData(s):
    # Match "start THRU end" ranges first, then single numbers
    sList = re.findall(r'\d+ THRU \d+|\d+', s)
    for item in sList:
        if 'THRU' in item:
            tem = item.split(' THRU ')
            for num in range(int(tem[0]), int(tem[1])+1):
                yield num
        else:
            yield int(item.strip())

def file_data(s):
    # Build one text line per set ("SET n=num num ..."), then parse it
    outStr = ''
    in_set = False
    for line in s:
        line = line.replace(',', ' ')
        if line.startswith('SET'):
            if in_set:
                # Previous set had no terminator line; end it here
                outStr += '\n'
            in_set = True
            if 'THRU' in line:
                lineList = line.strip().split('=')
                lst = [i for i in getThruData(lineList[1])]
                outStr += '%s=%s ' % (lineList[0], ' '.join([str(i) for i in lst]))
            else:
                outStr += line.strip('\n,')
        elif (line.startswith('$') or 'END OF SET' in line or line == '\n') and in_set:
            # End of the current set
            in_set = False
            outStr += '\n'
        elif in_set:
            # Continuation line of the current set
            if 'THRU' in line:
                lst = [i for i in getThruData(line)]
                outStr += ' '.join([str(i) for i in lst]) + ' '
            else:
                outStr += ' ' + line.strip('\n,')
    # Map each set name to its list of integers
    dataDict = {}
    for line in outStr.strip().split('\n'):
        dataDict[line.split('=')[0].strip()] = [int(x) for x in line.split('=')[1].strip().split()]
    return dataDict


dd = file_data(open('H:/TEMP/temsys/strdata.txt').readlines())
for key in dd:
    print('%s = %s' % (key, dd[key]))
Mar 27 '07 #42

P: 3
Yes, from the user's point of view it is necessary to close the file,
because closing explicitly forces the buffer to be saved to the
disk. Otherwise data loss can sometimes occur, because the operations we perform
act on the buffer, not directly on the disk.
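
A minimal sketch of the point about buffering, assuming Python 2.6 or later (or Python 3), where the `with` statement closes the file, and so flushes its buffer, even if an error occurs inside the block. The file name is made up for illustration.

```python
import os
import tempfile

# Hypothetical file in the system temp directory, used only for this demo.
path = os.path.join(tempfile.gettempdir(), "close_demo.txt")

# Writing: data may sit in a buffer until the file is flushed or closed.
# Leaving the 'with' block closes the file and writes the buffer out.
with open(path, "w") as out:
    out.write("first line\n")
    out.write("second line\n")

# Reading: the same pattern closes the file as soon as the block ends.
with open(path) as src:
    lines = src.readlines()

print(out.closed, src.closed)
print(lines)

os.remove(path)   # clean up the demo file
```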

I will send you a program for reading and writing after some time, because
typing the code will take a lot of time (sorry).

bye dear.
Mar 27 '07 #43
