Reading and writing a text file

Hi,

Is it necessary in Python to close a file after reading from or writing to it? While referring to some Python material, I saw it mentioned somewhere that there is no need to close the file. Correct me if I am wrong.

If possible, could anybody help me with sample code for reading and writing a simple text file? I have seen that there are many ways to read and write data in Python, but I want to use the most effective way of reading from and writing to a file.

Thanks in advance
PSB

Feb 28 '07 #1

bvdet

This thread shows how to read and write data: http://www.thescripts.com/forum/thre...7166-1-10.html
There are several other threads on file I/O that I have participated in.

Python will close an open file during garbage collection once the last reference to the file object goes away, for example when the name bound to it is reassigned or set to None and the reference count drops to zero. It is still good practice to explicitly close every file that is opened, especially when the file object was bound to a name. This brings up a subject that has puzzled me. Open a file like this:

lineLst = open('file_name').readlines()

Does Python close the file? No file object is created, so the file is closed when the end of file is reached (I think).

Mar 1 '07 #2

ghostdog74

Yes, Python does close the file when the file object gets garbage collected.
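For readers who would rather not depend on garbage collection timing, here is a minimal sketch of the explicit alternative. It assumes Python 3 (the with statement was not yet the default in the Python of this thread), and the scratch file name is made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Writing: the file is closed when the with-block ends, even if an error occurs.
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# Reading: same pattern; no explicit close() call is needed.
with open(path) as f:
    lines = f.readlines()

print(lines)
print(f.closed)  # the file object reports whether it has been closed
```

The one-liner open('file_name').readlines() still works, but the moment of closing is then left to the interpreter.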

Mar 1 '07 #3

psbasha

I have file data in this format:

Employee #    Employee Name    Salary    Location
-------------------------------------------------
121111        Sam              10,000    NJ
121311        Paul             20,000    NY
111111        Jim              10,000    TX

The data is in an .xls file and we copy it manually into a text file. After copying, the data is no longer laid out the way it appears in the spreadsheet.

How can I read this file data (without using slicing) and store it in the respective fields?

Could anybody provide a sample piece of code?

Thanks
PSB

Mar 2 '07 #4

bvdet

You can save the Excel worksheet as a text file. The text file will be tab delimited which can be easily parsed.

"""
Read a tab delimited file
"""

fn = 'your_file'

f = open(fn, 'r')
labelLst = f.readline().strip().split('\t')
lineLst = []

for line in f:
    if not line.startswith('#'):
        lineLst.append(line.strip().split('\t'))

f.close()

print labelLst
print lineLst
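As an alternative sketch, the standard csv module can do the tab splitting and pair each value with its column label. This assumes Python 3 and a worksheet saved as tab-delimited text; the sample rows below are typed in by hand to stand in for that file.

```python
import csv
import io

# Hypothetical tab-delimited text, standing in for the saved worksheet.
sample = (
    "Employee #\tEmployee Name\tSalary\tLocation\n"
    "121111\tSam\t10,000\tNJ\n"
    "121311\tPaul\t20,000\tNY\n"
    "111111\tJim\t10,000\tTX\n"
)

# DictReader takes the first row as the field names.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
records = list(reader)

for rec in records:
    # Salary still contains a thousands separator, so strip it before int().
    salary = int(rec["Salary"].replace(",", ""))
    print(rec["Employee #"], rec["Employee Name"], salary, rec["Location"])
```

With a real file, io.StringIO(sample) would be replaced by an open file object.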

Mar 2 '07 #5

psbasha

Could anybody help me read this data? How do I separate the line data and read it?

SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125

If anybody could provide sample code for the above, it would be helpful.

Thanks in advance
PSB

Mar 2 '07 #6

bvdet

s = 'SET 10 = 1101 1106 1107 1108 1109 1110 1111,\n 1112 1113 1114 1115 1116 1117 1118,\n 1119 1120 1121 1122 1123 1124 1125'
sList = s.split('=')
label = sList[0].strip()
data = sList[1].strip().split(',\n')
datastr = ''.join(data)

print '%s = %s' % (label, datastr)

'''
>>> SET 10 = 1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125
'''

Mar 3 '07 #7

psbasha

Thanks for the reply,

This is the input file :

$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$

I have to get the SET number (10) and all the integers from 1101 to 1125. How can I read all of those integers?

Is that possible with the solution you provided? If so, what modifications have to be made to that piece of code?

Mar 3 '07 #8

bvdet

This will give you a list of integers from the data string in my earlier post:

map(int, datastr.split())

Mar 3 '07 #9

psbasha

I understand the code you have posted. But how do I capture the data between the keywords "SET" and "END"? My program should be generic enough to read the data between these keywords:

SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110

Mar 3 '07 #10

ghostdog74

Another way:

>>> import re
>>> data = open("file").read()
>>> re.compile("SET 10 = (\d+.*)\$ END OF SET 110",re.M|re.DOTALL).findall(data)[0].replace("\n","").replace(","," ")
'1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125'
>>>

Mar 3 '07 #11

bvdet

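Iterate over the file with "for line in f:". Untested:

def read_set(f):
    # read_set is only a placeholder name; f is an open file (or any iterable of lines)
    in_set = False
    s = ''
    for line in f:
        if line.startswith('SET'):
            # start of a SET block: begin collecting
            in_set = True
            s = line
        elif line.startswith('$'):
            # a '$' line ends the block
            in_set = False
        elif in_set:
            # continuation line of the current block
            s += line
    return s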

Mar 3 '07 #12

bvdet

I like that ghostdog! :)

Mar 3 '07 #13

psbasha

What will this piece of code do? I am not able to understand it.

Mar 3 '07 #14

ghostdog74

I will explain just briefly, as my English is not good. I hope you will understand.
The 're' module is Python's regular expression module. More information can be found in the Python docs.

>>> data = open("file").read()

reads the whole file into one string, to be fed to the findall() method of the compiled pattern.

>>> re.compile("SET 10 = (\d+.*)\$ END OF SET 110",re.M|re.DOTALL).findall(data)[0].replace("\n","").replace(","," ")
'1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125'
>>>

You wanted to find the numbers between "SET 10 =" and "$ END OF SET 110". \d+ matches one or more digits, and .* then matches whatever follows (any characters, as many as possible). Putting parentheses around (\d+.*) makes it a group, so findall() returns the text matched by that group. re.compile() sets up the pattern that I want to find. re.M turns on multiline mode, which only changes how the ^ and $ anchors behave, so it has no real effect on this pattern. re.DOTALL makes "." match a newline as well; that is what lets the "." in (\d+.*) run across line breaks, because your numbers are split over several lines. findall() does the searching and returns the results in a list. The matched text still contains redundant "\n" and "," characters, so I get rid of them with replace().

Anyway, this is just another method. I suggest you use bvdet's method if you are not familiar with regular expressions.
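To make the explanation concrete, here is a small sketch that runs the same pattern against the sample text from the thread and then converts the captured text to integers. It assumes Python 3 and uses a raw string for the pattern; the variable names are only illustrative.

```python
import re

# Sample text copied from the input shown earlier in the thread.
data = (
    "$PTITLE = SET 10 = SET_110\n"
    "SET 10 = 1101 1106 1107 1108 1109 1110 1111,\n"
    "1112 1113 1114 1115 1116 1117 1118,\n"
    "1119 1120 1121 1122 1123 1124 1125\n"
    "$ END OF SET 110\n"
)

# Raw string so the backslashes reach the regex engine untouched.
# re.DOTALL lets "." cross line breaks; the escaped \$ is a literal dollar sign.
pattern = re.compile(r"SET 10 = (\d+.*)\$ END OF SET 110", re.DOTALL)

# findall() returns a list with the text of group 1 for every match.
captured = pattern.findall(data)[0]

# Commas become spaces, then split() discards all whitespace including newlines.
numbers = [int(tok) for tok in captured.replace(",", " ").split()]
print(numbers)
```

The $PTITLE line is skipped because "SET 10 = " is followed there by "SET_110" rather than a digit.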

Mar 3 '07 #15

psbasha

Can the above piece of code be made generic enough to read the following input data?

SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 10

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11

SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15

.................

Mar 3 '07 #16

ghostdog74

Sure. One way:

>>> import re
>>> data = open("file").read()
>>> pat = re.compile("SET \d+ = (\d+.*?)\$ END OF SET",re.M|re.DOTALL)
>>> for result in pat.findall(data):
...   print result.replace("\n","").replace(","," ")
...
1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125
11031 11036 11037 11038 11040 11050 11051 11052 11053 11054 11055 11056 11057 11058
110131 110136 110137 110138 110410 110510 110511 110512 110513 110514 110515 110516 110517 110518
>>>
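A possible variation on the generic pattern, sketched here under the assumption of Python 3: capturing the set number as a second group lets the results be stored in a dictionary keyed by set number. The sample data is shortened and the names are illustrative only.

```python
import re

# Sample input in the format posted above (shortened).
data = (
    "SET 10 = 1101 1106 1107,\n"
    "1112 1113 1114\n"
    "$ END OF SET 10\n"
    "\n"
    "SET 11 = 11031 11036 11037,\n"
    "11052 11053,\n"
    "$ END OF SET 11\n"
)

# Group 1 is the set number, group 2 the (lazy) body up to the terminator.
pattern = re.compile(r"SET (\d+) = (.*?)\$ END OF SET", re.DOTALL)

sets = {}
for set_id, body in pattern.findall(data):
    # \d+ picks out every run of digits, so commas and newlines need no cleanup.
    sets[int(set_id)] = [int(n) for n in re.findall(r"\d+", body)]

print(sets)
```

The lazy .*? matters here: a greedy .* would run on to the last "$ END OF SET" in the file and merge all the sets into one match.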

Mar 3 '07 #17

bvdet

ghostdog has provided an excellent solution. Here is another, less elegant one:

### READ FILE DATA

''' File Data
$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
$

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11
$
$
SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15
$
$
'''

def file_data(s):
    outStr = ''
    in_set = False
    for line in s:
        if line.startswith('SET'):
            in_set = True
            outStr += line.strip('\n').strip(',')
        elif 'END OF SET' in line:
            in_set = False
            outStr += '\n'
        elif in_set:
            outStr += ' ' + line.strip('\n').strip(',')
    return outStr.strip()

data = file_data(open('your_file').readlines())
print data, '\n'

dataDict = {}
for line in data.strip().split('\n'):
    dataDict[line.split('=')[0].strip()] = map(int, line.split('=')[1].strip().split())

for key in dataDict:
    print '%s = %s' % (key, dataDict[key])

'''
>>> SET 10 = 1101 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125
SET 11 = 11031 11036 11037 11038 11040 11050 11051 11052 11053 11054 11055 11056 11057 11058
SET 15 = 110131 110136 110137 110138 110410 110510 110511 110512 110513 110514 110515 110516 110517 110518

SET 15 = [110131, 110136, 110137, 110138, 110410, 110510, 110511, 110512, 110513, 110514, 110515, 110516, 110517, 110518]
SET 11 = [11031, 11036, 11037, 11038, 11040, 11050, 11051, 11052, 11053, 11054, 11055, 11056, 11057, 11058]
SET 10 = [1101, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125]
>>>
>>> sum(dataDict['SET 10'])
23411
>>>
'''

Mar 3 '07 #18

bvdet

data = file_data(open('H:/TEMP/temsys/strdata.txt').readlines())

dataDict = {}
for line in data.strip().split('\n'):
    dataDict[line.split('=')[0].strip()] = [int(x) for x in line.split('=')[1].strip().split()]

for key in dataDict:
    print '%s = %s' % (key, dataDict[key])

In the above snippet, 'map' has been replaced by a list comprehension.
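A short side-by-side sketch of the two forms, assuming Python 3, where the difference is more visible than in the Python 2 code of this thread: map() there returns a lazy iterator rather than a list. The sample line is shortened.

```python
line = "SET 10 = 1101 1106 1107"
fields = line.split("=")[1].split()

# In Python 3, map() returns a lazy iterator; wrap it in list() to get a list.
via_map = list(map(int, fields))

# The list comprehension builds the list directly.
via_comprehension = [int(x) for x in fields]

print(via_map == via_comprehension)
```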

Mar 6 '07 #19

psbasha

Thanks for the reply. Sometimes the input data is given in this format:

''' File Data
$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
$

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11
$
$
SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15
$
$
SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932
$HMSET

How can I handle the above input in a generic way?

-PSB

Mar 7 '07 #20

ghostdog74

What have you done so far?

Mar 7 '07 #21

bvdet

        if line.startswith('SET'):
            if not re.findall("[^0-9 ,\n]", line.split('=')[1]):
                .....................................

Anything else?
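The fragment above appears to test whether the text after "=" contains anything other than digits, spaces, commas and newlines. A minimal sketch of that check in isolation, assuming Python 3; the function name is hypothetical and not part of the posted code.

```python
import re

def is_plain_number_list(line):
    """Return True when the text after '=' holds only digits, spaces, commas, newlines."""
    body = line.split("=")[1]
    # The negated class matches any character that is NOT a digit, space, comma or newline.
    return not re.findall(r"[^0-9 ,\n]", body)

plain = is_plain_number_list("SET 10 = 1101 1106 1107,\n")
with_thru = is_plain_number_list("SET 1 = 1 THRU 897\n")
print(plain, with_thru)
```

A line with THRU fails the test because of its letters, so it can be routed to separate handling.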

Mar 7 '07 #22

psbasha

The solution you posted really helped me. Now I am trying to handle the other lines that appear in the input file:

SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932
$HMSET

I am working on this and will post what I have done so far tomorrow. Different developers have their own ways of reading file data, but I want optimized code that reads the file in as few lines as possible, so I am asking on the forum.

I am not sure whether the approach I am following is tedious or roundabout.

If anybody has an idea for a better way, that will help me.

Thanks in advance
PSB

Mar 7 '07 #23

ghostdog74

Both outputs should be "1 THRU 897" and "1 THRU 932", that is, the text between SET and $HMSET, right? In other words, you don't want the SET ... and $HMSET ... parts themselves?

Mar 7 '07 #24

psbasha

I want the values 1 and 897 from "1 THRU 897", and 1 and 932 from "1 THRU 932".

So the output should be [1,897] and [1,932], in addition to the output from the previous result.

Mar 7 '07 #25

ghostdog74

To keep the regexp from getting overly complicated, you can try this:

import re

data = open("file").read()
pat = re.compile("SET \d+ = (\d+.*?)(?:\$ END OF SET| THRU (\d+.*?))",re.M|re.DOTALL)
for result in pat.findall(data):
   print result  ##do your manipulations here.
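Because the pattern has two groups, findall() yields a two-item tuple per match, with an empty string for the THRU group when the set ends in "$ END OF SET". The following sketch shows one way the "manipulations" could look, assuming Python 3 and a shortened, hand-typed sample; the variable names are illustrative.

```python
import re

data = (
    "SET 10 = 1101 1106,\n"
    "1107 1108\n"
    "$ END OF SET 10\n"
    "SET 1 = 1 THRU 897\n"
    "$HMSET\n"
)

pat = re.compile(r"SET \d+ = (\d+.*?)(?:\$ END OF SET| THRU (\d+.*?))", re.DOTALL)

results = []
for first, last in pat.findall(data):
    if last:
        # THRU form: the two groups are the bounds of the range.
        values = [int(first), int(last)]
    else:
        # Plain list: group 1 holds all the numbers, group 2 is empty.
        values = [int(n) for n in first.replace(",", " ").split()]
    results.append(values)
    print(values)
```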

Mar 7 '07 #26

psbasha

"""    def read_Sets_file_data(self,strSetsFile):

        fSets = open(strSetsFile,'r')
        strTemp = fSets.readlines()
        elementList = []

        outStr = ''
        bFlag = False
        startVal =0
        endVal = 0

        ### Yet to implement the "THRU" elements reading
        for line in strTemp:
            if line.startswith('SET'):
                bFlag = True
                outStr += line.strip('\n').strip(',')
                labelLst = line.strip().split(" ")
                for i in range(0,labelLst.__len__()):
                    if ( labelLst[i] == '=' and labelLst[i+1].isdigit()) :
                        startVal = labelLst[i+1]

                    if(labelLst[i].isalnum()):
                        if( labelLst[i] == "THRU"):
                            endVal = labelLst[i+1]

                    #print startVal,endVal
                    if( int(startVal) > 0  and int(endVal) > 0):
                        list1 =  self.get_THRU_elements(startVal,endVal)
                        print list1
                        break

            elif 'END OF SET' in line:
                bFlag = False
                outStr += '\n'
            elif bFlag:
                outStr += ' ' + line.strip('\n').strip(',')

        data = outStr.strip()
        #print data"""

        dataDict = {}

        for line in data.strip().split('\n'):
            dataDict[line.split('=')[0].strip()] = map(int, line.split('=')[1].strip().split())

        #return (dataDict)
        #return {}

I have done the above. Please correct me if I have done it the wrong way. I have yet to complete the iteration over the whole sets file; I am reading only one set so far, but I am looking for a generic solution.

Mar 7 '07 #27

bvdet

This problem is similar to a function I wrote recently to extract data from XML files. Instead of going over your code, it's much easier for me to post the following code (hopefully it will do what you want):

def file_data(s):
    outStr = ''
    in_set = False
    for line in s:
        if line.startswith('SET'):
            if 'THRU' in line:
                in_set = False
                outStr += line.replace('THRU ', '').strip('\n').strip(',')+'\n'
            else:
                in_set = True
                outStr += line.strip('\n').strip(',')
        elif 'END OF SET' in line:
            in_set = False
            outStr += '\n'
        elif in_set:
            outStr += ' ' + line.strip('\n').strip(',')

    dataDict = {}
    for line in outStr.strip().split('\n'):
        dataDict[line.split('=')[0].strip()] = [int(x) for x in line.split('=')[1].strip().split()]
    return dataDict

dd = file_data(open('your_file').readlines())
for key in dd:
    print '%s = %s' % (key, dd[key])

Mar 7 '07 #28

psbasha

Sample.txt

$
$ SET 10
$
$ hjdsahclaladsalkjls
$PTITLE = SET 10 = SET_110
SET 10 = 1101 1106 1107 1108 1109 1110 1111,
1112 1113 1114 1115 1116 1117 1118,
1119 1120 1121 1122 1123 1124 1125
$ END OF SET 110
$
$

SET 11 = 11031 11036 11037 11038 11040 11050 11051,
11052 11053 11054 11055 11056 11057 11058,
$ END OF SET 11
$
$
SET 15 = 110131 110136 110137 110138 110410 110510 110511,
110512 110513 110514 110515 110516 110517 110518,
$ END OF SET 15
$
$
SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932

SET 102 = 1001323 THRU 1001331,1001343 THRU 1001349,
        1001359 THRU 1001365,1001375 THRU 1001381,
        1001391 THRU 1001397,1001407 THRU 1001413,
        1001415 THRU 1001429,1001439 THRU 1001445,
        1001455 THRU 1001461,1001471 THRU 1001477,
        1001479 THRU 1001490,1001500 THRU 1001506,
        1001516 THRU 1001522,1001532 THRU 1001538,
        1001540 THRU 1001554,1001564 THRU 1001570,
        1001580 THRU 1001586,1001596 THRU 1001602,
        1001612 THRU 1001618,1001620 THRU 1001634,
        1001644 THRU 1001650,1001660 THRU 1001666,
        1009990,1009992,1009994,1009996,1009998,1010000,1010002,
        1010004,1010006,1010009,1010010,1010012,1010014,
        1010066 THRU 1010081

Could anybody help me make this more generic?

One more thing I forgot to mention in my previous discussion:

SET 1 = 1 THRU 897: when I come across the word THRU, I have to take the values before and after THRU, i.e. 1 and 897, and create a list of numbers for this range.

Say:
dataList = []
for i in range(1,897):
    dataList.append(i)

O/P:
get the list of all the numbers

[1,2,.......896,897]

Thanks
PSB

Mar 21 '07 #29

ghostdog74

Say you have already gotten the values. Then you can just use

datalist = range(1,898)

This will create the list for you.
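The 898 is there because range() stops one short of its second argument. A small sketch of an inclusive expansion, assuming Python 3 (where range() must be wrapped in list() to get an actual list); the helper name thru is made up for this example.

```python
def thru(start, end):
    """Hypothetical helper: expand 'start THRU end' into an inclusive list."""
    # range() stops before its second argument, hence end + 1.
    return list(range(start, end + 1))

data_list = thru(1, 897)
print(data_list[0], data_list[-1], len(data_list))

# For comparison, range(1, 897) leaves out the final value.
short_list = list(range(1, 897))
print(short_list[-1])
```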

Mar 22 '07 #30

psbasha

Thanks for the solution.

But with the file format I mentioned above, the code fails to get the set numbers when the keyword "THRU" appears between the numbers. I am looking for a more generic way of reading this sets file that handles the different set formats above.

Can anybody help me fix the above code so that it reads the different sets file formats mentioned above in a more generic way?

-PSB

Mar 22 '07 #31

ghostdog74

You have quite some experience with Python now, so I will just give you some ideas and you can do the rest. Simple string manipulations will eventually get you what you want.
One idea for a sample line with THRU:

>>> a = "1001323 THRU 1001331,1001343 THRU 1001349"
>>> a.split(",")
['1001323 THRU 1001331', '1001343 THRU 1001349']
>>> for items in a.split(","):
...  print items.split("THRU")
...
['1001323 ', ' 1001331']
['1001343 ', ' 1001349']

You said you want the numbers on the left and right of THRU, right? The above seems to get what you want. Of course, for other redundant words on the line, you can use replace(), string slices, etc. to get rid of them.

Mar 22 '07 #32

bvdet

I have another suggestion:

import re

line = '1001612 THRU 1001618,1001620 THRU 1001634, 1001644 THRU 1001650,1001660 THRU 1001666, 1009990,1009992,1009994,1009996,1009998,1010000,1010002'
outStr = ''

if 'THRU' in line:
    # each match is either a 'start THRU end' pair or a single number
    lineList = re.findall('\d+ THRU \d+|\d+', line)
    lst = []
    for item in lineList:
        if 'THRU' in item:
            tem = item.split(' THRU ')
            lst += range(int(tem[0]), int(tem[1])+1)
        else:
            lst.append(int(item))
    outStr += ' '.join([str(i) for i in lst if i != '']) + ' '
else:
    outStr += ' ' + line.strip('\n').strip(',')

'''
>>> outStr
1001612 1001613 1001614 1001615 1001616 1001617 1001618 1001620 1001621 1001622 1001623 1001624 1001625 1001626 1001627 1001628 1001629 1001630 1001631 1001632 1001633 1001634 1001644 1001645 1001646 1001647 1001648 1001649 1001650 1001660 1001661 1001662 1001663 1001664 1001665 1001666 1009990 1009992 1009994 1009996 1009998 1010000 1010002
'''

It's not very pretty though. You should be able to do the rest from here.

Mar 22 '07 #33

ghostdog74

Looking only at this set of input data provided by the OP (though the whole file may not look the same):

SET 1 = 1 THRU 897
$HMSET
SET 2 = 1 THRU 932
SET 102 = 1001323 THRU 1001331,1001343 THRU 1001349,
        1001359 THRU 1001365,1001375 THRU 1001381,
        1001391 THRU 1001397,1001407 THRU 1001413,
        1001415 THRU 1001429,1001439 THRU 1001445,
        1001455 THRU 1001461,1001471 THRU 1001477,
        1001479 THRU 1001490,1001500 THRU 1001506,
        1001516 THRU 1001522,1001532 THRU 1001538,
        1001540 THRU 1001554,1001564 THRU 1001570,
        1001580 THRU 1001586,1001596 THRU 1001602,
        1001612 THRU 1001618,1001620 THRU 1001634,
        1001644 THRU 1001650,1001660 THRU 1001666,
        1009990,1009992,1009994,1009996,1009998,1010000,1010002,
        1010004,1010006,1010009,1010010,1010012,1010014,
        1010066 THRU 1010081

This little piece of code will get what he wants (if I haven't misinterpreted the request):

import re

data = open("file").read()
# Capture the numbers on either side of every THRU in the file
pat = re.compile(r"(\d+) THRU (\d+)", re.M|re.DOTALL)
for items in pat.findall(data):
    print(items)

output:

# ./test.py
('1', '897')
('1', '932')
('1001323', '1001331')
('1001343', '1001349')
('1001359', '1001365')
('1001375', '1001381')
('1001391', '1001397')
('1001407', '1001413')
('1001415', '1001429')
('1001439', '1001445')
('1001455', '1001461')
('1001471', '1001477')
('1001479', '1001490')
('1001500', '1001506')
('1001516', '1001522')
('1001532', '1001538')
('1001540', '1001554')
('1001564', '1001570')
('1001580', '1001586')
('1001596', '1001602')
('1001612', '1001618')
('1001620', '1001634')
('1001644', '1001650')
('1001660', '1001666')
('1010066', '1010081')

Mar 22 '07 #34

bvdet

2,851

Expert Mod 2GB

Taking it a step farther:

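import re

def getThruData(s):
    # Match "A THRU B" ranges first, then any remaining single numbers
    sList = re.findall(r'\d+ THRU \d+|\d+', s)
    for item in sList:
        if 'THRU' in item:
            tem = item.split(' THRU ')
            # Yield every number in the range, both end points included
            for num in range(int(tem[0]), int(tem[1])+1):
                yield num
        else:
            yield int(item.strip())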

>>> s
'1001359 THRU 1001365,1001375 THRU 1001381,\n1010004,1010006,1010009,1010010,1010012,1010014,1010066 THRU 1010081'
>>> sList = [i for i in getThruData(s)]
>>> sList
[1001359, 1001360, 1001361, 1001362, 1001363, 1001364, 1001365, 1001375, 1001376, 1001377, 1001378, 1001379, 1001380, 1001381, 1010004, 1010006, 1010009, 1010010, 1010012, 1010014, 1010066, 1010067, 1010068, 1010069, 1010070, 1010071, 1010072, 1010073, 1010074, 1010075, 1010076, 1010077, 1010078, 1010079, 1010080, 1010081]
>>>

Mar 22 '07 #35

psbasha

440

256MB

....,1009990,1009992,1009994,1009996,1009998,1010000,1010002,
1010004,1010006,1010009,1010010,1010012,1010014,.....

I also need these numbers, which appear without the keyword "THRU", to be handled when reading the data.

Thanks
PSB

Mar 22 '07 #36

ghostdog74

511

Expert 256MB

Do you actually have some code yet? Show us what you tried that left you "unable to handle these numbers".

Mar 23 '07 #37

bvdet

2,851

Expert Mod 2GB

My post 'Taking it a step farther' extracts all the numbers. If you had bothered to look at my post, you could see that. I thought it was pretty neat. Let me show you AGAIN:

>>> s1 = '1009990,1009992,1009994,1009996,1009998,1010000,1010002,1010004,1010006,1010009,1010010,1010012,1010014,10010016 THRU 10010035'
>>> for i in getThruData(s1):
...     print(i)
...
1009990
1009992
1009994
1009996
1009998
1010000
1010002
1010004
1010006
1010009
1010010
1010012
1010014
10010016
10010017
10010018
10010019
10010020
10010021
10010022
10010023
10010024
10010025
10010026
10010027
10010028
10010029
10010030
10010031
10010032
10010033
10010034
10010035
>>>

Please note that ALL the numbers are in the output. We have done most of the work. You don't expect us to write the total solution, do you?

Mar 23 '07 #38

psbasha

440

256MB

Thanks to all for providing solutions for the different file formats.

No, BV, I don't expect you to give the whole solution to the problem. I would just like to know the approach/concept.

Thanks, BV. You really are a guru to us in the forum.

- PSB

Mar 24 '07 #39

bvdet

2,851

Expert Mod 2GB

The approach I used:
1. Read all lines into a list, initialize 'outStr', and iterate on the list.
2. Look for keyword 'SET' and set variable 'in_set' = True.
3. If 'THRU' is in line, split on '=', send the right side to getThruData(), put back together and append (by concatenation) to 'outStr' - otherwise just append to 'outStr'.
4. While 'in_set' is True, append 'line' to 'outStr' (send to getThruData() if 'THRU' is in 'line') until a condition is seen that indicates the end of the set. Set variable 'in_set' to False and append a newline to 'outStr'.
5. Repeat until the end of the file.
6. Create the dictionary from 'outStr', splitting on '\n'.

It seems simple when described like this. HTH :)
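
For anyone who wants to experiment with the steps above, they could be sketched roughly as follows. This is only an illustration written for Python 3, not the code from this thread: the names expand and parse_sets and the sample lines are made up, and it collects the numbers straight into a dictionary instead of building 'outStr' first.

```python
import re

def expand(text):
    """Yield every integer in text, expanding 'A THRU B' into A..B inclusive."""
    for first, last, single in re.findall(r'(\d+) THRU (\d+)|(\d+)', text):
        if single:
            yield int(single)
        else:
            for num in range(int(first), int(last) + 1):
                yield num

def parse_sets(lines):
    """Return {'SET n': [numbers]} from an iterable of text lines."""
    sets = {}
    current = None  # name of the set being read, or None when outside a set
    for line in lines:
        line = line.strip()
        if line.startswith('SET'):
            # Only the right side of '=' holds data; the left side is the key
            name, _, rest = line.partition('=')
            current = name.strip()
            sets[current] = list(expand(rest))
        elif not line or line.startswith('$') or 'END OF SET' in line:
            # A blank line, a '$' line or 'END OF SET' closes the current set
            current = None
        elif current is not None:
            # Continuation line of the current set
            sets[current].extend(expand(line))
    return sets

# Made-up sample lines in the same layout as the data posted earlier
sample = [
    'SET 1 = 1 THRU 5\n',
    '$HMSET\n',
    'SET 102 = 1001323 THRU 1001325,1001343,\n',
    '        1009990,1009992,\n',
    '        1010066 THRU 1010068\n',
]
result = parse_sets(sample)
print(result['SET 1'])    # [1, 2, 3, 4, 5]
```

Keeping a list per set avoids converting the numbers to text and back again, at the cost of departing from the 'outStr' description in the steps.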

Mar 24 '07 #40

psbasha

440

256MB

Sorry BV, I typed the message wrongly.

What I meant is that I understood the concept you explained in the earlier discussion.

-PSB

Mar 24 '07 #41

bvdet

2,851

Expert Mod 2GB

You probably solved this problem by now. Here's what I came up with in my spare time:

import re

def getThruData(s):
    # Yield every number in s, expanding "A THRU B" ranges
    sList = re.findall(r'\d+ THRU \d+|\d+', s)
    for item in sList:
        if 'THRU' in item:
            tem = item.split(' THRU ')
            for num in range(int(tem[0]), int(tem[1])+1):
                yield num
        else:
            yield int(item.strip())

def file_data(s):
    outStr = ''
    in_set = False
    for line in s:
        line = line.replace(',', ' ')
        if line.startswith('SET'):
            # A new SET also ends the previous one if no terminator was seen
            if in_set:
                outStr += '\n'
            in_set = True
            if 'THRU' in line:
                lineList = line.strip().split('=')
                lst = [i for i in getThruData(lineList[1])]
                outStr += '%s=%s ' % (lineList[0], ' '.join([str(i) for i in lst]))
            else:
                outStr += line.strip('\n,')
        elif (line.startswith('$') or 'END OF SET' in line or line == '\n') and in_set:
            # End of the current set: start a new output line
            in_set = False
            outStr += '\n'
        elif in_set:
            # Continuation line of the current set
            if 'THRU' in line:
                lst = [i for i in getThruData(line)]
                outStr += ' '.join([str(i) for i in lst]) + ' '
            else:
                outStr += ' ' + line.strip('\n,')
    # Build {'SET n': [numbers]} from the "name=numbers" lines
    dataDict = {}
    for line in outStr.strip().split('\n'):
        dataDict[line.split('=')[0].strip()] = [int(x) for x in line.split('=')[1].strip().split()]
    return dataDict

# Close the file explicitly once the lines have been read
f = open('H:/TEMP/temsys/strdata.txt')
dd = file_data(f.readlines())
f.close()

for key in dd:
    print('%s = %s' % (key, dd[key]))

Mar 27 '07 #42

inderjeetkalra

Yes, from the user's point of view it is necessary to close the file.
Closing it explicitly forces the buffer to be written to disk. Otherwise
data can sometimes be lost, because write operations are performed on the
buffer and not directly on the disk.

I will send a program for reading and writing a little later, because
typing the code will take a lot of time (sorry).
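
Until then, here is a minimal sketch of writing and then reading a simple text file. It assumes Python 2.6 or later (or Python 3), where the with statement closes the file automatically, and the file name demo.txt in a temporary folder is made up for the demo. On older versions the same effect needs an explicit close(), ideally in a try/finally block.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for this demo
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Writing: the buffer is flushed and the file closed when the with block
# ends, even if an exception is raised inside it
with open(path, 'w') as out_file:
    out_file.write('first line\n')
    out_file.write('second line\n')

# Reading: iterating over the file object gives one line at a time
with open(path) as in_file:
    lines = [line.rstrip('\n') for line in in_file]

print(lines)            # ['first line', 'second line']
print(in_file.closed)   # True
```

Iterating over the file object reads one line at a time, so it also works for files too large to load with read() or readlines().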

bye dear.

Mar 27 '07 #43
