newb comment request

Alexandre

Hi,

Im a newb to dev and python... my first sefl assigned mission was to read a
pickled file containing a list with DB like data and convert this to
MySQL... So i wrote my first module which reads this pickled file and writes
an XML file with list of tables and fields (... next step will the module
who creates the tables according to details found in the XML file).

If anyone has some minutes to spare, suggestions and comments would be verry
much appreciated to help me make my next modules better.. and not starting
with bad habit :)

Alexandre

######################## <EIPyFormatToXML>
######################################
#pickle.loads a file containing database like data in a python list ->
#Outputs a XML file with "tables" details which will be used in a future
module
#to build MySQL tables. See the comment blocks at the end of the module
#for more details.

import sys
import pickle
import types

infile = open('cached-objects-Python-pickled-sample', 'rb')
_data = pickle.load(infile)
infile.close()

def ExtractTablesFromData(data):
"""Extracts all the table names from the Dumped items data file and
returns the list."""
tablesR = []
for tables in data:#For each tables found in 'data'
tablesR.append([tables[0]])#Appends the list with current table name
return tablesR

def ExtractFieldNamesFromData(data):
"""Extract all fields from data list (the calling function defines for
which table in 'data' argument)."""
fieldsR = []
for fields in data:
fieldsR.append([fields])
return fieldsR

def ExtractFieldValuesFromData(data, indexField):
"""Check each value of the field passed as argument to the function, it
returns [valueType, maxLength, maxValue, minValue, floatPrecision,
NoneValues(bool), sampleValue]."""
valueType, maxLength, maxValue, minValue, values , floatPrecision,
NoneValues = None, None, None, 999999999999, [], None, False
sampleValue = 'numeric value, check min and max values as sample'
for valuesD in data:#for each record...
value = valuesD[indexField]#focus value on required field
if type(value) is not types.NoneType:#if a value other than None is
found
valueType = type(value)#valueType is given the type of the
current value
else:#... if the value is None
NoneValues = True#None values exist for this field in the record
list
if valueType is str:#if type is string
minValue = None#set minValue to None (minValue and maxValue are
only for numeric types)
if len(value) > maxLength:#if current string lenght is bigger
than maxLength
maxLength = len(value)#Set maxLength value to current string
length
sampleValue = value#Sets sampleValue with the longest string
found
else:#... if not string type
if value > maxValue:#if current value bigger than maxValue
maxValue = value#Sets current value to maxValue
if value and value < minValue:#if value is not None AND smaller
than minValue
minValue = value#Sets new minValue with current value
if valueType is float and value != 0:#if value type is float and
not 0
precisionTemp = len(str(value - int(value)))-2
if precisionTemp > floatPrecision:#if the current length
after decimal point is bigger than previous
floatPrecision = precisionTemp#set current value to
precision
if valueType is float and floatPrecision == None:#if float could not be
determined because only 0.0 values were found
floatPrecision = 1#set precision to 1
if valueType is not float and floatPrecision != None:#if last value type
was not float but some float values were found
valueType = type(1.234)#set valueType to float
if valueType is str and maxLength == 0:#if value type found only ''
(empty) records
NoneValues = True#allow null values
if minValue == 999999999999:#if minValue was not set
minValue = None#then minValue is None
values[:] = [valueType, maxLength, maxValue, minValue, floatPrecision,
NoneValues, sampleValue]
return values

def AddFieldsPerTable():
"""Appends field list to each table."""
tables = ExtractTablesFromData(_data) #First extract list of tables
for i, table in enumerate(tables): #Then for each table in the list
fields = ExtractFieldNamesFromData(_data[i][1][0])#get field list
([i] as table index, [1][0] to reach field list)
tables[i].append(fields) #Appends the returned field list to current
table
return tables

def AddFieldsDetailsPerField():
"""Extend field list with details for each field."""
tables = AddFieldsPerTable()#First get table list
for iTable, table in enumerate(tables):#Then for each table
for iField, field in enumerate(table[1]):#...for each field in the
current table
values = ExtractFieldValuesFromData(_data[iTable][1][1],
iField)#Get field's details([iTable] as table index, [1][1] to reach records
list, iField to focus search on current field)
field.extend(values)#Extends the tables list with returned field
details
return tables

def AddNbOfRecordsPerTable():#Insert number of records per table.
"""Extend 'tables' details with number of records per table."""
tables = AddFieldsDetailsPerField()#get tables
for i, table in enumerate(tables):#for each table
nbOfRecords = len(_data[i][1][1])#get number of records ([i]=table
index, [1][1] = record list)
table.insert(1, nbOfRecords)#inserts the number of records in tables
list
return tables

def WriteFileTableFormat(fileName):#Creates the XML with 'tables' list
tables = AddNbOfRecordsPerTable()#get tables detailed list
f = open(fileName, 'w')
f.write("""<?xml version="1.0" encoding="ISO-8859-1"?>\n""")
f.write("<Root>\n")

for table in tables:
f.write("\t<table>\n")
f.write("\t\t<name>%s</name>\n" % table[0])
f.write("\t\t<nbOfRecords>%s</nbOfRecords>\n" % table[1])
for field in table[2][:]:
f.write("\t\t<field>\n")
f.write("\t\t\t<name>%s</name>\n" % field[0])
if str(field[1])[:7] == "<type '":
field[1] = str(field[1])[7:-2]
f.write("\t\t\t<pythonType>%s</pythonType>\n" % str(field[1]))
f.write("\t\t\t<maxLength>%s</maxLength>\n" % str(field[2]))
f.write("\t\t\t<maxValue>%s</maxValue>\n" % str(field[3]))
f.write("\t\t\t<minValue>%s</minValue>\n" % str(field[4]))
f.write("\t\t\t<floatPrecision>%s</floatPrecision>\n" %
str(field[5]))
f.write("\t\t\t<NoneValues>%s</NoneValues>\n" % str(field[6]))
f.write("\t\t\t<sampleValue>%s</sampleValue>\n" % str(field[7]))
f.write("\t\t\t<mysqlFieldType></mysqlFieldType>\n")
f.write("\t\t</field>\n")
f.write("\t</table>\n")

f.write("</Root>")
f.close

WriteFileTableFormat('EITablesFormat.xml')

############ <Help to understand '_data' structure>
#
# [['FirstTableName', (['FirstFieldName', 'nFieldName'],
[['FirstFieldFirstValue', 'nFieldFirstValue'],
# ['FirstFieldnValue', 'nFieldnValue']])], ['nTableName', (['etc..
# print _data[0][0] #[0]=FirstTable, [0]=TableName -> output :
'FirstTableName'
# print len(_data) #number of tables in 'data'
# print _data[0][1] #[0]=FirstTable, [1]=FieldList And Records
# print _data[0][1][0] #[0]=FirstTable, [1]=FieldList, [0]=FieldNames ->
output : ['FirstFieldName', 'nFieldName']
# print len(_data[0][1][0]) #number of fields in first table
# print _data[0][1][1] #[0]=FirstTable, [1]=FieldList, [1]=RecordList
# print len(_data[0][1][1]) #number of records in first table
# print _data[0][1][1][0][2] #[0]=firstTable, [1]=FieldList,
[1]=RecordList, [0] = First Record, [2]=Third Field Value
#
######################## </Help to understand '_data' structure>
############ <Final 'tables' variable format>
#
# The final 'tables' format used to build the XML should look like :
# ([tablename_1, nbOfRecords
# [
# [fieldname_1, pythonType, maxLength, maxValue, minValue,
floatPrecision, NoneValues, sampleValue],
# [fieldname_1, pythonType, maxLength, maxValue, minValue,
floatPrecision, NoneValues, sampleValue]
# ],
# [tablename_n,
# [
# [fieldname_1, ...]
# ]
# ])
#
######################## </Final 'tables' variable format>
#################################### </EIPyFormatToXML>

Jul 18 '05 #1

Subscribe Post Reply

2008

Peter Otten

Alexandre wrote:

Hi,

Im a newb to dev and python... my first sefl assigned mission was to read
a pickled file containing a list with DB like data and convert this to
MySQL... So i wrote my first module which reads this pickled file and
writes an XML file with list of tables and fields (... next step will the
module who creates the tables according to details found in the XML file).

If anyone has some minutes to spare, suggestions and comments would be
verry much appreciated to help me make my next modules better.. and not
starting with bad habit :)

I would suggest that you repost the script without the excessive comments.
Most programmers find Python very readable, your comments actually make it
harder to parse for the human eye.
Also, make it work if you can, or point to the actual errors that you cannot
fix yourself. Provide actual test data in your post instead of the
unpickling code, so that others can easily reproduce your errors.
Only then you should ask for improvements.

Random remarks:

Object oriented programming is about programming against interfaces, so
exessive type checking is a strong hint to design errors.

Avoid using global variables in your functions; rather pass them explicitly
as arguments.

Where is valuesD introduced?

"%s" % str(1.23)
is the same as
"%s" % 1.23

type(value) is not types.NoneType
is the same as
value is not None

Module level code is habitually wrapped like so:

if __name__ == "__main__":
infile = open('cached-objects-Python-pickled-sample', 'rb')
_data = pickle.load(infile)
infile.close()
# process _data...

That way, you can import the module as well as use it as a standalone
script.

I'm sure, there is more, but then again, clean up the comments, try to make
it work, and then repost.

Peter

Jul 18 '05 #2

Alexandre

"Peter Otten" <__*******@web.de> a écrit dans le message de news:bq*************@news.t-online.com...

I would suggest that you repost the script without the excessive comments.
Most programmers find Python very readable, your comments actually make it
harder to parse for the human eye.
Done, so your help will also make my next posts better :)
Also, make it work if you can, or point to the actual errors that you cannot
fix yourself. Provide actual test data in your post instead of the
unpickling code, so that others can easily reproduce your errors.
Only then you should ask for improvements.
Also available in my new version :)
Random remarks:

Object oriented programming is about programming against interfaces, so
exessive type checking is a strong hint to design errors.
I'm not sure i understand... well, let's put it that way : i'm sure i don't understand :)
I guess i'll have to read about OO design... this is my first program, although i think i understand what OO means, i'm
not abble to write OO yet :/
Avoid using global variables in your functions; rather pass them explicitly
as arguments.
I'll try that !
Where is valuesD introduced?
Not sure i understand the question... i used this variable name "ValuesD" (meaning Values from Data) not to conflict
with the other variable named "values" in the same function.
"%s" % str(1.23)
is the same as
"%s" % 1.23
Ok

type(value) is not types.NoneType
is the same as
value is not None
Shame on me :)
Module level code is habitually wrapped like so:

if __name__ == "__main__":
infile = open('cached-objects-Python-pickled-sample', 'rb')
_data = pickle.load(infile)
infile.close()
# process _data...

That way, you can import the module as well as use it as a standalone
script.
Oh ? but, i can use it as standalone script ?!
If i type myScriptName.py in my dos prompt the script is working !
I'm sure, there is more, but then again, clean up the comments, try to make
it work, and then repost.

I'm sure there's more :)
Thanks a lot for your comments Peter !
Best regards,
Alexandre

Jul 18 '05 #3

Alexandre

######################## <EIPyFormatToXML> ######################################
import sys
import types

_data = [['Table1',(['Field01','Field02','Field03',],
[['a string', 12345, 1.000123],
['a second string', None, 3406.3],
['', 64654564, 35]])],
['Table2', (['Field04', 'Field05'], [[None, -0.3]])],
['Table3',(['Field06', 'Field07', 'Field08'],
[['', 0, 0.001],
['', None, 646464.0],
['', 6546, 0.1],
['', -6444, 0.2],
['', 0, 0.3]])]]

def ExtractTablesFromData(data):
"""Extracts all the table names from the Dumped items data file and returns the list."""
tablesR = []
for tables in data:
tablesR.append([tables[0]])
return tablesR

def ExtractFieldNamesFromData(data):
"""Extract all fields from data list (the calling function defines for which table in 'data' argument)."""
fieldsR = []
for fields in data:
fieldsR.append([fields])
return fieldsR

def ExtractFieldValuesFromData(data, indexField):
"""Check each value of the field passed as argument to the function."""
values , floatPrecision, NoneValues = [], None, False
valueType, maxLength, maxValue, minValue = None, None, None, 999999999999
sampleValue = 'numeric value, check min and max values as sample'
for valuesD in data:
value = valuesD[indexField]
if type(value) is not types.NoneType:
valueType = type(value)
else:
NoneValues = True
if valueType is str:
minValue = None
if len(value) > maxLength:
maxLength = len(value)
sampleValue = value
else:
if value > maxValue:
maxValue = value
if value and value < minValue:
minValue = value
if valueType is float and value != 0:
precisionTemp = len(str(value - int(value)))-2
if precisionTemp > floatPrecision:
floatPrecision = precisionTemp
if valueType is float and floatPrecision == None:
floatPrecision = 1
if valueType is not float and floatPrecision != None:
valueType = type(1.234)
if valueType is str and maxLength == 0:
NoneValues = True
if minValue == 999999999999:
minValue = None
values[:] = [valueType, maxLength, maxValue, minValue, floatPrecision, NoneValues, sampleValue]
return values

def AddFieldsPerTable():
"""Appends field list to each table."""
tables = ExtractTablesFromData(_data)
for i, table in enumerate(tables):
fields = ExtractFieldNamesFromData(_data[i][1][0])
tables[i].append(fields)
return tables

def AddFieldsDetailsPerField():
"""Extend field list with details for each field."""
tables = AddFieldsPerTable()
for iTable, table in enumerate(tables):
for iField, field in enumerate(table[1]):
values = ExtractFieldValuesFromData(_data[iTable][1][1], iField)
field.extend(values)
return tables

def AddNbOfRecordsPerTable():
"""Extend 'tables' details with number of records per table."""
tables = AddFieldsDetailsPerField()
for i, table in enumerate(tables):
nbOfRecords = len(_data[i][1][1])
table.insert(1, nbOfRecords)
return tables

def WriteFileTableFormat(fileName):
tables = AddNbOfRecordsPerTable()
f = open(fileName, 'w')
f.write("""<?xml version="1.0" encoding="ISO-8859-1"?>\n""")
f.write("<Root>\n")

for table in tables:
f.write("\t<table>\n")
f.write("\t\t<name>%s</name>\n" % table[0])
f.write("\t\t<nbOfRecords>%s</nbOfRecords>\n" % table[1])
for field in table[2][:]:
f.write("\t\t<field>\n")
f.write("\t\t\t<name>%s</name>\n" % field[0])
if str(field[1])[:7] == "<type '":
field[1] = str(field[1])[7:-2]
f.write("\t\t\t<pythonType>%s</pythonType>\n" % str(field[1]))
f.write("\t\t\t<maxLength>%s</maxLength>\n" % str(field[2]))
f.write("\t\t\t<maxValue>%s</maxValue>\n" % str(field[3]))
f.write("\t\t\t<minValue>%s</minValue>\n" % str(field[4]))
f.write("\t\t\t<floatPrecision>%s</floatPrecision>\n" % str(field[5]))
f.write("\t\t\t<NoneValues>%s</NoneValues>\n" % str(field[6]))
f.write("\t\t\t<sampleValue>%s</sampleValue>\n" % str(field[7]))
f.write("\t\t\t<mysqlFieldType></mysqlFieldType>\n")
f.write("\t\t</field>\n")
f.write("\t</table>\n")

f.write("</Root>")
f.close

WriteFileTableFormat('EITablesFormat.xml')

#################################### </EIPyFormatToXML>

-> result xml

<?xml version="1.0" encoding="ISO-8859-1"?>

<Root>

<table>

<name>Table1</name>

<nbOfRecords>3</nbOfRecords>

<field>

<name>Field01</name>

<pythonType>str</pythonType>

<maxLength>15</maxLength>

<maxValue>None</maxValue>

<minValue>None</minValue>

<floatPrecision>None</floatPrecision>

<NoneValues>False</NoneValues>

<sampleValue>a second string</sampleValue>

<mysqlFieldType/>

</field>

<field>

<name>Field02</name>

<pythonType>int</pythonType>

<maxLength>None</maxLength>

<maxValue>64654564</maxValue>

<minValue>12345</minValue>

<floatPrecision>None</floatPrecision>

<NoneValues>True</NoneValues>

<sampleValue>numeric value, check min and max values as sample</sampleValue>

<mysqlFieldType/>

</field>

<field>

<name>Field03</name>

<pythonType>float</pythonType>

<maxLength>None</maxLength>

<maxValue>3406.3</maxValue>

<minValue>1.000123</minValue>

<floatPrecision>6</floatPrecision>

<NoneValues>False</NoneValues>

<sampleValue>numeric value, check min and max values as sample</sampleValue>

<mysqlFieldType/>

</field>

</table>

<table>

<name>Table2</name>

<nbOfRecords>1</nbOfRecords>

<field>

<name>Field04</name>

<pythonType>None</pythonType>

<maxLength>None</maxLength>

<maxValue>None</maxValue>

<minValue>None</minValue>

<floatPrecision>None</floatPrecision>

<NoneValues>True</NoneValues>

<sampleValue>numeric value, check min and max values as sample</sampleValue>

<mysqlFieldType/>

</field>

<field>

<name>Field05</name>

<pythonType>float</pythonType>

<maxLength>None</maxLength>

<maxValue>-0.3</maxValue>

<minValue>-0.3</minValue>

<floatPrecision>2</floatPrecision>

<NoneValues>False</NoneValues>

<sampleValue>numeric value, check min and max values as sample</sampleValue>

<mysqlFieldType/>

</field>

</table>

<table>

<name>Table3</name>

<nbOfRecords>5</nbOfRecords>

<field>

<name>Field06</name>

<pythonType>str</pythonType>

<maxLength>0</maxLength>

<maxValue>None</maxValue>

<minValue>None</minValue>

<floatPrecision>None</floatPrecision>

<NoneValues>True</NoneValues>

<sampleValue/>

<mysqlFieldType/>

</field>

<field>

<name>Field07</name>

<pythonType>int</pythonType>

<maxLength>None</maxLength>

<maxValue>6546</maxValue>

<minValue>-6444</minValue>

<floatPrecision>None</floatPrecision>

<NoneValues>True</NoneValues>

<sampleValue>numeric value, check min and max values as sample</sampleValue>

<mysqlFieldType/>

</field>

<field>

<name>Field08</name>

<pythonType>float</pythonType>

<maxLength>None</maxLength>

<maxValue>646464.0</maxValue>

<minValue>0.001</minValue>

<floatPrecision>3</floatPrecision>

<NoneValues>False</NoneValues>

<sampleValue>numeric value, check min and max values as sample</sampleValue>

<mysqlFieldType/>

</field>

</table>

</Root>

Jul 18 '05 #4

Alexandre

Object oriented programming is about programming against interfaces, so
exessive type checking is a strong hint to design errors.
I'm not sure i understand... well, let's put it that way : i'm sure i don't understand :)
I guess i'll have to read about OO design... this is my first program, although i think i understand what OO means,

i'm not abble to write OO yet :/

ok... a little shower always help, now i understand your remark :)
The thing is, this first module is retrieving '_data' from another app.
This other app is not written to share '_data' with other apps.

So my first module should be the only one which deals with types checking because those data will then be available to
my app from a DB (... once i've written the second module :)

Thx again and regards,
Alexandre

Jul 18 '05 #5

Peter Otten

Alexandre wrote:

[Peter]

Object oriented programming is about programming against interfaces, so
exessive type checking is a strong hint to design errors.
[Alexandre]
I'm not sure i understand... well, let's put it that way : i'm sure i
don't understand :) I guess i'll have to read about OO design... this is
my first program, although i think i understand what OO means, i'm not
abble to write OO yet :/

Type checking is generally best avoided, e. g. if you test for

isinstance(mystream, file)

this may fail on mystream objects that have all the methods needed to
replace a file in subsequent code and you thus unnecessarily limit its
usage.
As your script is explicitly in the "type checking business", my remark was
a bit off, though.
Where is valuesD introduced?

Not sure i understand the question... i used this variable name "ValuesD"
(meaning Values from Data) not to conflict with the other variable named
"values" in the same function.

I spotted an error where there was none - nothing to understand here :-(

Module level code is habitually wrapped like so:

if __name__ == "__main__":
infile = open('cached-objects-Python-pickled-sample', 'rb')
_data = pickle.load(infile)
infile.close()
# process _data...

That way, you can import the module as well as use it as a standalone
script.

Oh ? but, i can use it as standalone script ?!
If i type myScriptName.py in my dos prompt the script is working !

Yes, but you can *only* use it as a standalone script. If you import it into
another module, it will try to read the pickled data from the hard-coded
file before you can do anything else.
Peter

Jul 18 '05 #6

newb comment request

Similar topics