Bytes | Software Development & Data Engineering Community

Is it just me, or is Sqlite3 goofy?

Probably just me. I've only been using Access and SQL Server
for 12 years, so I'm sure my opinions don't count for anything.

I was, nevertheless, looking forward to Sqlite3. And now
that gmpy has been upgraded, I can go ahead and install
Python 2.5.

So I open the manual to Section 13.13 where I find the first
example of how to use Sqlite3:

<code>
conn = sqlite3.connect(':memory:')

c = conn.cursor()

# Create table
c.execute('''create table stocks
(date timestamp, trans varchar, symbol varchar,
qty decimal, price decimal)''')

# Insert a row of data
c.execute("""insert into stocks
values ('2006-01-05','BUY','RHAT',100,35.14)""")
</code>

Seems pretty simple, yet I was befuddled and bewildered
by this example. So much so that I added a query to see exactly
what was going on.

<code>
# added by me
c.execute('select * from stocks')
d = c.fetchone()
for t in d:
    print type(t), t
</code>

Original code - what's wrong with this picture?
<type 'unicode'> 2006-01-05
<type 'unicode'> BUY
<type 'unicode'> RHAT
<type 'int'> 100
<type 'float'> 35.14

Why is the date returning as a string? Aren't the built in
converters supposed to handle that? Yes, if you enable
detect_types.

Added detect_types=sqlite3.PARSE_DECLTYPES

Traceback (most recent call last):
  File "C:\Python25\sqlite_first_example.py", line 30, in <module>
    c.execute('select * from stocks')
  File "C:\Python25\lib\sqlite3\dbapi2.py", line 66, in convert_timestamp
    datepart, timepart = val.split(" ")
ValueError: need more than 1 value to unpack

Aha, that explains why they weren't enabled.

This failed because
- the value inserted was in the wrong format?
- and the builtin converter can't split it because it has no spaces?
When it worked, it was because
- detect_types was not set, so the converter was not invoked on the query?

Yes, the format in the example was a plain date, and the field type
should have been cast as [date], not [timestamp], which needs HH:MM:SS
for the converter to work properly (but only if detect_types is
enabled).

If a correct format was inserted, converter would have worked

<type 'datetime.datetime'> 2006-09-04 13:30:00
<type 'unicode'> BUY
<type 'unicode'> RHAT
<type 'int'> 100
<type 'float'> 35.14

Or, had the field been properly cast as [date] instead of [timestamp]
it would also have worked.

<type 'datetime.date'> 2006-09-04
<type 'unicode'> BUY
<type 'unicode'> RHAT
<type 'int'> 100
<type 'float'> 35.14

Ah, this now partly explains the original result: since detect_types
is off by default, the field cast, not being a native sqlite3 type,
was ignored and the data queried back as TEXT.

<type 'unicode'> 2006-09-04
<type 'unicode'> BUY
<type 'unicode'> RHAT
<type 'int'> 100
<type 'float'> 35.14
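For the record, here's how the corrected insert behaves. This is my own sketch in Python 3 syntax (the thread predates it): the column is cast [date], a real datetime.date is bound as a parameter, and detect_types is enabled, so the default adapter/converter pair round-trips the value.

```python
import sqlite3
import datetime

# Declare the column [date], insert a real datetime.date, and enable
# detect_types: the default adapter writes ISO text, and the default
# converter turns it back into a date on the way out.
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table stocks "
            "(date date, trans varchar, symbol varchar, qty decimal, price decimal)")
cur.execute("insert into stocks values (?, ?, ?, ?, ?)",
            (datetime.date(2006, 1, 5), "BUY", "RHAT", 100, 35.14))
row = cur.execute("select * from stocks").fetchone()
print(type(row[0]), row[0])  # a datetime.date comes back, not a string
```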

Ok, next issue, what the fuck are [varchar] and [decimal]?
They are certainly not Sqlite3 native types. If they are
user defined types, where are the converters and adapters?
Does Sqlite3 simply ignore a cast that isn't registered?

Note that although both qty and price were cast as [decimal]
they queried back as int and float.
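That behaviour is reproducible. A minimal sketch (Python 3 syntax): SQLite maps any unrecognized declared type onto one of its affinities by substring rules, and [decimal] lands on NUMERIC affinity, which stores each value as int or float, whichever preserves it.

```python
import sqlite3

# [decimal] is not a native SQLite type; it gets NUMERIC affinity,
# so each value is stored in the cheapest representation that keeps it:
# 100 as an integer, 35.14 as a float.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table t (qty decimal, price decimal)")
cur.execute("insert into t values (100, 35.14)")
qty, price = cur.execute("select qty, price from t").fetchone()
print(type(qty), type(price))  # int and float, despite the same cast
```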

Note also that it's "obvious" from the query example in section 13.13
that the example is FUBAR:
- the date is a unicode string, not a datetime.date type
- the price is binary floating point, not decimal

<quote>
This example uses the iterator form:
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
...     print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.140000000000001)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
</quote>

Here we have an example of things apparently working
for the wrong reason. A classic example of the programmer
who *thinks* he knows how it works because he wrote it.
This kind of sloppiness wouldn't last 5 minutes in a production
environment.

But why did Sqlite3 make qty an int and price a float?
Hard to say since THE FURTHER EXAMPLES IN THE DOCS don't
even bother to cast the field types. I guess I'm supposed
to guess how things are supposed to work. Can I trust that
default settings will be what I want?

Ha! Can I trust the baby with a hammer?

First, note that the script example in section 13.13.3

<quote>
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
create table person(
firstname,
lastname,
age
);

create table book(
title,
author,
published
);

insert into book(title, author, published)
values (
'Dirk Gently''s Holistic Detective Agency
'Douglas Adams',
1987
);
""")
</quote>

contains not one but TWO syntax errors! A single quote after
the word Agency is missing, as is the comma that should be at
the end of that line. Seems that no one actually ever tried
this example.

That's easily fixed. But I was curious about why the fields
don't have type casts. After the previous debacle and
knowing that this code was never tested, I am not going to
assume it works. Better add a query to make sure.

cur.execute("select title, author, published from book")
d = cur.fetchall()
for i in d: print i
print

(u"Dirk Gently's Holistic Detective Agency", u'Douglas Adams', 1987)

Ok, so if not cast, the fields must default (and probably also when
a cast is made that hasn't been defined).

But watch this: being clueless (but not stupid) is a gift I have
for troubleshooting. I tried (incorrectly) to insert another record:

cur.execute("insert into book(title, author, published) values ('Dirk
Gently''s Holistic Detective Agency','Douglas Adams','1987')")

(u"Dirk Gently's Holistic Detective Agency", u'Douglas Adams', 1987)
(u"Dirk Gently's Holistic Detective Agency", u'Douglas Adams', u'1987')

Uhh...how can a database have a different field type for each record?

Simple: without a cast when the table is created, the field type is
whatever you insert into it. That's how the default must work;
each record has a data structure independent of every other record!
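A minimal sketch of that per-record typing (Python 3 syntax, a hypothetical single-column table): with no declared type, the column has no affinity, and each stored value simply keeps the type it arrived with.

```python
import sqlite3

# No cast on the column, so there is no affinity: an int and a string
# inserted into the same field come back as an int and a string.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table book (published)")
cur.execute("insert into book values (1987)")
cur.execute("insert into book values ('1987')")
types = [type(r[0]) for r in cur.execute("select published from book")]
print(types)  # one int and one str from the same column
```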

Wow. Just think of the kind of bugs *that* must cause.

Bugs?

Here's MY example, creating a Cartesian Product

<code>
import sqlite3

letters = [(2,), ('10',), ('20',), (200,)]
con = sqlite3.connect(":memory:")
con.text_factory = str
con.execute("create table letter(c integer)")
con.executemany("insert into letter(c) values (?)", letters)
print 'Queried: ',
for row in con.execute("select c from letter"):
    print row,
print
print
print 'Sorted: ',
for row in con.execute("select c from letter order by c"):
    print row[0],
print
print
print 'Cartesian Product: ',
for row in con.execute("select a.c, b.c, c.c from letter as a, letter as b, letter as c"):
    print row[0] + row[1] + row[2],
</code>

Note that the list of data to be inserted contains both strings and
ints. But because the field was correctly cast as [integer], Sqlite3
actually stored integers in the db. We can tell that from how the
"order by" returned the records.

Queried: (2,) (10,) (20,) (200,)

Sorted: 2 10 20 200

Cartesian Product: 6 14 24 204 14 22 32 212 24 32 42 222 204 212
222 402 14 22 32 212 22 30 40 220 32 40 50 230 212 220 230 410 24
32 42 222 32 40 50 230 42 50 60 240 222 230 240 420 204 212 222
402 212 220 230 410 222 230 240 420 402 410 420 600

Because if I cast them as [text] the sort order changes (and my
Cartesian Product becomes concatenation instead of summation).

Queried: ('2',) ('10',) ('20',) ('200',)

Sorted: 10 2 20 200

Cartesian Product: 222 2210 2220 22200 2102 21010 21020 210200
2202 22010 22020 220200 22002 220010 220020 2200200 1022 10210
10220 102200 10102 101010 101020 1010200 10202 102010 102020
1020200 102002 1020010 1020020 10200200 2022 20210 20220 202200
20102 201010 201020 2010200 20202 202010 202020 2020200 202002
2020010 2020020 20200200 20022 200210 200220 2002200 200102
2001010 2001020 20010200 200202 2002010 2002020 20020200 2002002
20020010 20020020 200200200

But if I omit the cast altogether, then the db stores the input
exactly as it was inserted, so the c field contains both
text and integers wreaking havoc with my sort order, making
records un-queryable using "where" and causing my Cartesian
Product to crash.

Queried: (2,) ('10',) ('20',) (200,)

Sorted: 2 200 10 20

Cartesian Product: 6

Traceback (most recent call last):
  File "C:\Python25\user\sqlite_test2.py", line 225, in <module>
    print row[0]+row[1]+row[2],
TypeError: unsupported operand type(s) for +: 'int' and 'str'
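The mixed-type sort order above isn't random, and a short sketch (Python 3 syntax) shows the rule: SQLite orders by storage class first (all integers before all text), then within each class, which is exactly why 2 and 200 come out ahead of '10' and '20'.

```python
import sqlite3

# No cast on the column, so values keep their inserted types; ORDER BY
# then sorts by storage class first (integers before text), producing
# the interleaved order from the post.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table letter (c)")
cur.executemany("insert into letter(c) values (?)",
                [(2,), ('10',), ('20',), (200,)])
order = [r[0] for r in cur.execute("select c from letter order by c")]
print(order)  # [2, 200, '10', '20']
```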

Yeah, I know, I've heard it before.

"This behavior is by design."

It's still fuckin' goofy.

Sep 5 '06
Magnus Lycka wrote:
While I can understand your frustration, I think it is
important to think about the tone in our postings here.
Hydrocephalus is one of the most common birth defects,
and it's not terribly unlikely that someone who reads
this has a family member or someone else in his proximity
who suffers from this condition.
Ok, I apologize. No more Larry the Cable Guy humor.
I'll stick with Monty Python and Douglas Adams.
>
me********@aol.com wrote:
"Fixed"? Up until now, I didn't think it was possible for
crackpot theories to be implemented in computer science.
This is absolutely the craziest thing I've ever heard.

Still, many people with lots of experience in databases
use it, and prefer it for certain kinds of applications.
All systems have limitations and deviations, and those
limitations and deviations are stated more clearly for
SQLite than for most commercial products at least.
But they're not stated up front. The deviations from
standard SQL are extreme enough to warrant mentioning
in the Python docs.
The
market leader Oracle still can't store empty strings in
VARCHAR fields for instance. They are silently converted
to NULL. I'm pretty sure that has been in clear violation
to the official spec since 1986 at least.
But does Oracle claim the specification is a bug?
>
As far as I understand, noone here is forcing you to use
SQLite,
As long as it's included in the standard library, I'm going
to use it. There is nothing wrong with the idea of a "lite"
database. It is very misleading, though, to claim it's SQL.
and with your long experience of MS Access I'd
expect you to be fairly used to "almost SQL"... It's
some time since I used Jet/Access now, but I had much
more problems with that than I've had with SQLite.

SQLite is built in Tcl, by someone who appreciates the
way Tcl works, with its weak typing. I don't think Tcl's
type handling is nearly as clever as Python's, but I
think it's a good thing that Python's standard lib finally
has a DB-API compliant module, and while I would have
preferred something that was closer to standard SQL, I
don't know of a better candidate than SQLite.

It's good that it's usable without a server setup, and
that it's very light weight. A Jet engine is obviously
not an option, and I would have preferred SQLite even
if Jet was open source and worked on all platforms.
(Well, if JET *was* open source, I suspect it would
have been fixed by now.) It's possible that one could
have used the embedded version of Firebird instead, but
in my experience that's not nearly as lean or easy to
deploy.

With your long experience of Access and SQL Server I'm
sure you know well that any attempt to build a working
database application requires extensive knowledge of
the backend to understand its peculiarities and
limitations.

The list of software projects where not quite competent
developers built Access applications that worked ok in
small scale tests and failed catastrophically in real
life is looong...

Of course, if you've stayed with one vendor for 15 years,
I can imagine that you've forgotten how long it took you
Having worked with half a dozen backends or so, I'm no
longer surprised that SQL can be interpreted in so many
ways... I agree that SQLite is unique in its approach
to typing, but if you are aware of this, it's really not
a big problem.
Ok, I'm now aware of it.

I'm aware that all my WHERE clauses will have to be
modified to allow for text that may show up in my numeric
fields.

I'm aware that all my ORDER BY clauses will have to
be modified for the same reason.

And I'm aware that INNER JOIN doesn't even work in
theory with dynamic typed fields and can't even be
worked around like the WHERE and ORDER BY clauses
can.

And given a typical query I use:

SELECT tblLocations.SiteID,
tblSites.SiteName,
tblLocations.IEPALocationNumber,
tblZones.Zone,
tblSampleEvent.SampleEventCode,
tblSampleAnalyses.SampleDate,
tblSamples.SampleMatrixID,
tblSamples.SampleNumber,
tblRefQAQCTypes.QAQCType,
tblResults.LabSampleNumber,
tblRefParameterGroups.ParameterGroupCode,
tblSampleAnalyses.AnalysisID,
tblRefAnalyses.AnalysisTypeCode,
tblRefParameters.ParameterReportLabel,
tblRefParameters.CASNumber,
tblResults.Result,
tblResults.Qualifier,
tblRefUnitOfMeasure.Unit,
Val(Format$(IIf(([tblResults].[unitid]=5) Or
([tblResults].[unitid]=7),
[result]/1000,[result]),
"0.0000"))
AS the_ppm_result,
IIf([tblResults].[unitid]=7,"mg/kg",
IIf([tblResults].[unitid]=5,"mg/L",
[unit]))
AS the_ppm_unit,
Val(Format$(IIf(([tblResults].[unitid]=5) Or
([tblResults].[unitid]=7),
[quantitationlimit]/1000,
[quantitationlimit]),"0.0000"))
AS the_ppm_dl,
IIf(IsNull([qualifier]) Or
([qualifier] Not Like "*U*"),1,0)
AS detect,
IIf([detect] And
[the_ppm_result]>[tbl_GW_classi],"1","0")
AS x,
IIf([detect] And
[the_ppm_result]>[tbl_GW_classi],"!","_")
AS xflag,
Master_Lookup.Tbl_pH_range,
Master_Lookup.Tbl_GW_units,
Master_Lookup.Tbl_GW_ClassI,
tblResults.Comment,
IIf([detect],[the_ppm_result],[the_ppm_result]/2)
AS detected_result,
IIf([detect],[the_ppm_result],Null)
AS detected_max_result
FROM tblSites
INNER JOIN ((tblSampleEvent
INNER JOIN (tblRefQAQCTypes
INNER JOIN ((tblSamples
INNER JOIN tblLocations
ON tblSamples.TrueLocation = tblLocations.LocationID)
INNER JOIN tblZones
ON tblLocations.LocationID = tblZones.LocationID)
ON tblRefQAQCTypes.QAQCID = tblSamples.QAQCID)
ON tblSampleEvent.EventID = tblSamples.EventID)
INNER JOIN ((tblRefAnalyses
INNER JOIN tblSampleAnalyses
ON tblRefAnalyses.AnalysisID = tblSampleAnalyses.AnalysisID)
INNER JOIN (tblRefUnitOfMeasure
INNER JOIN ((tblRefParameterGroups
INNER JOIN (tblRefParameters
LEFT JOIN Master_Lookup
ON tblRefParameters.CASNumber = Master_Lookup.Tbl__B_cas)
ON tblRefParameterGroups.ParameterGroupID =
tblRefParameters.ParameterGroupID)
INNER JOIN tblResults
ON tblRefParameters.ParameterID = tblResults.ParameterID)
ON tblRefUnitOfMeasure.UnitID = tblResults.UnitID)
ON tblSampleAnalyses.SampleAnalysisID = tblResults.SampleAnalysisID)
ON tblSamples.SampleID = tblSampleAnalyses.SampleID)
ON tblSites.SiteID = tblLocations.SiteID
WHERE (((tblLocations.SiteID)<51)
AND (Not (tblLocations.IEPALocationNumber)="G116")
AND ((tblZones.Zone)="UMAquifer")
AND ((tblSampleEvent.SampleEventCode) Like "200[2-6]Q*")
AND ((tblSamples.SampleMatrixID)=1)
AND ((tblSampleAnalyses.AnalysisID)>1)
AND ((tblResults.Qualifier) Is Null)
AND ((Master_Lookup.Tbl_pH_range)="pH 6.9 to 7.24" Or
(Master_Lookup.Tbl_pH_range) Is Null)
AND ((tblSamples.QAQCID)=1 Or (tblSamples.QAQCID)=4))
OR (((tblLocations.SiteID)<51)
AND (Not (tblLocations.IEPALocationNumber)="G116")
AND ((tblZones.Zone)="UMAquifer")
AND ((tblSampleEvent.SampleEventCode) Like "200[2-6]Q*")
AND ((tblSamples.SampleMatrixID)=1)
AND ((tblSampleAnalyses.AnalysisID)>1)
AND ((tblResults.Qualifier) Not Like "*Z*" And
(tblResults.Qualifier) Not Like "*R*")
AND ((Master_Lookup.Tbl_pH_range)="pH 6.9 to 7.24" Or
(Master_Lookup.Tbl_pH_range) Is Null)
AND ((tblSamples.QAQCID)=1 Or
(tblSamples.QAQCID)=4))
ORDER BY tblLocations.SiteID,
tblLocations.IEPALocationNumber,
tblSampleEvent.SampleEventCode,
tblRefParameterGroups.ParameterGroupCode,
tblRefParameters.ParameterReportLabel,
tblSampleAnalyses.SampleDate;

you're saying I don't have a big problem?

Sep 8 '06 #21
"me********@aol.com" <me********@aol.com> writes:
As long as it's included in the standard library, I'm going
to use it. There is nothing wrong with the idea of a "lite"
database. It is very misleading, though, to claim it's SQL.
Maybe it could be renamed by changing the "t" in "lite" to "k".
Sep 8 '06 #22
In <11**********************@m73g2000cwd.googlegroups.com>,
me********@aol.com wrote:
Ben Sizer wrote:
>I don't think so... it doesn't take much to say that the module
implements a subset of SQL but ignores data types.

So I can't just say

WHERE qty=12

I have to say

WHERE (qty=12) OR (qty="12")
No you just write the first and don't stuff strings into that column.
It's your choice after all.
Do you know what INNER JOIN means?

Do you know how important it is to a relational database?

Can you explain how an INNER JOIN can even work, in theory,
with dynamic data types?
The same as with static types!? Where's the problem?

table_a:

id name
----- ----------
42 bob
'foo' methusalem
'42' gaylord

table_b:

id age
----- ------
42 23
'foo' 1000
0.5 None

Join on `id`:

id name age
----- ---------- ----
42 bob 23
'foo' methusalem 1000
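The join sketched above can be run directly (my sketch, Python 3 syntax): with untyped columns, comparisons are strict across storage classes, so the integer 42 matches the integer 42 but not the text '42', and the 'gaylord' row silently drops out of the result.

```python
import sqlite3

# Reproduce the hand-drawn join: int 42 joins int 42, text 'foo' joins
# text 'foo', but text '42' does NOT join int 42.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table table_a (id, name)")
cur.execute("create table table_b (id, age)")
cur.executemany("insert into table_a values (?, ?)",
                [(42, 'bob'), ('foo', 'methusalem'), ('42', 'gaylord')])
cur.executemany("insert into table_b values (?, ?)",
                [(42, 23), ('foo', 1000), (0.5, None)])
rows = cur.execute("select a.id, a.name, b.age from table_a a "
                   "inner join table_b b on a.id = b.id "
                   "order by b.age").fetchall()
print(rows)  # [(42, 'bob', 23), ('foo', 'methusalem', 1000)]
```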
The implications of violating the SQL Language Specification
are much more serious than you think.
Or not as serious as you think.

Ciao,
Marc 'BlackJack' Rintsch
Sep 8 '06 #23
me********@aol.com wrote:
Do you know what INNER JOIN means?

Do you know how important it is to a relational database?

Can you explain how an INNER JOIN can even work, in theory,
with dynamic data types?
Let's stop the pissing contest and just see how it works. After all,
this is Python and we can easily just try it out. Here's my example.
Please tell me how this causes unexpected results, and why it isn't SQL.
Please modify my example to get it to cause a catastrophe, and post it
here so we can see the errors of our ways and be properly humbled.

#-- Preliminaries:
>>> from pysqlite2 import dbapi2 as sqlite
>>> con = sqlite.connect("test.db")
>>> cur = con.cursor()
#-- Create 3 tables for a M:M relationship between customers
#-- and categories:
>>> cur.execute("create table customers (id integer primary key
autoincrement, name char)")
>>> cur.execute("create table categories (id integer primary key
autoincrement, name char)")
>>> cur.execute("create table cust_cat (id integer primary key
autoincrement, cust_id integer, cat_id integer)")

#-- Insert some test data into customer and categories:
>>> cur.execute("insert into customers (name) values ('Ziggy Marley')")
>>> cur.execute("insert into customers (name) values ('David Bowie')")
>>> cur.execute("insert into categories (name) values ('Glam Rock')")
>>> cur.execute("insert into categories (name) values ('Nuevo Reggae')")
>>> cur.execute("insert into categories (name) values ('Male Singers')")
#-- Take a look at the data (and retrieve the pk's):
>>> cur.execute("select * from customers")
>>> cur.fetchall()
[(1, u'Ziggy Marley'), (2, u'David Bowie')]
>>> cur.execute("select * from categories")
>>> cur.fetchall()
[(1, u'Glam Rock'), (2, u'Nuevo Reggae'), (3, u'Male Singers')]

#-- Relate some customers to some categories. Note how I send strings
#-- in some places and ints in others:
>>> cur.execute("insert into cust_cat (cust_id, cat_id) values (1, 3)")
>>> cur.execute("insert into cust_cat (cust_id, cat_id) values (1, '2')")
>>> cur.execute("insert into cust_cat (cust_id, cat_id) values ('2', '1')")
>>> cur.execute("insert into cust_cat (cust_id, cat_id) values ('2', 3)")
#-- Run some queries:
>>> cur.execute("select customers.id as cust_id, customers.name as
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id order by 2,4")
>>> cur.fetchall()
[(2, u'David Bowie', 1, u'Glam Rock'), (2, u'David Bowie', 3, u'Male
Singers'), (1, u'Ziggy Marley', 3, u'Male Singers'), (1, u'Ziggy
Marley', 2, u'Nuevo Reggae')]
>>> cur.execute("select customers.id as cust_id, customers.name as
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
1 order by 2,4")
>>> cur.fetchall()
[(2, u'David Bowie', 1, u'Glam Rock')]
>>> cur.execute("select customers.id as cust_id, customers.name as
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
'1' order by 2,4")
>>> cur.fetchall()
[(2, u'David Bowie', 1, u'Glam Rock')]
>>> cur.execute("select customers.id as cust_id, customers.name as
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
'2' order by 2,4")
>>> cur.fetchall()
[(1, u'Ziggy Marley', 2, u'Nuevo Reggae')]
>>> cur.execute("select customers.id as cust_id, customers.name as
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
'3' order by 2,4")
>>> cur.fetchall()
[(2, u'David Bowie', 3, u'Male Singers'), (1, u'Ziggy Marley', 3, u'Male
Singers')]
>>> cur.execute("select customers.id as cust_id, customers.name as
cust_name, categories.id as cat_id, categories.name as cat_name from
customers inner join cust_cat on cust_cat.cust_id = customers.id inner
join categories on categories.id = cust_cat.cat_id where categories.id =
3 order by 2,4")
>>> cur.fetchall()
[(2, u'David Bowie', 3, u'Male Singers'), (1, u'Ziggy Marley', 3, u'Male
Singers')]

If I have skipped the test case that will fail, please enlighten me.

--
Paul McNett
http://paulmcnett.com
http://dabodev.com

Sep 8 '06 #24
Marc 'BlackJack' Rintsch wrote:
In <11**********************@m73g2000cwd.googlegroups.com>,
me********@aol.com wrote:
Ben Sizer wrote:
I don't think so... it doesn't take much to say that the module
implements a subset of SQL but ignores data types.
So I can't just say

WHERE qty=12

I have to say

WHERE (qty=12) OR (qty="12")

No you just write the first and don't stuff strings into that column.
It's your choice after all.
Not when I don't control the source of the data.
Despite the laboratory having been given a static
type definition for data deliverables, I sporadically
find text in my numeric fields.

And surely you don't think I write INSERT statements
for each of the over 500,000 records I have? The records
are inserted programmatically.

Static types mean the exceptions...raise exceptions.
How am I supposed to identify the exceptions if SQLite
simply quietly converts the non-numeric data to text?
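For what it's worth, there is one way to get the "raise on bad data" behaviour being asked for here. This is my own workaround sketch (Python 3 syntax), not anything from the thread: a CHECK constraint on typeof() rejects any value that didn't coerce to the intended storage class.

```python
import sqlite3

# A CHECK constraint on typeof() makes SQLite raise instead of silently
# storing text in a numeric column: 'N/A' can't be coerced to an
# integer, so it stays text, the constraint fails, and pysqlite raises.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table results "
            "(qty integer check (typeof(qty) = 'integer'))")
cur.execute("insert into results values (100)")  # fine
try:
    cur.execute("insert into results values ('N/A')")
    raised = False
except sqlite3.IntegrityError:
    raised = True
print(raised)  # the bad record is rejected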
>
Do you know what INNER JOIN means?

Do you know how important it is to a relational database?

Can you explain how an INNER JOIN can even work, in theory,
with dynamic data types?

The same as with static types!? Where's the problem?

table_a:

id name
----- ----------
42 bob
'foo' methusalem
'42' gaylord

table_b:

id age
----- ------
42 23
'foo' 1000
0.5 None

Join on `id`:

id name age
----- ---------- ----
42 bob 23
'foo' methusalem 1000
Numbers JOINing numbers and text JOINing text doesn't illustrate
the problem. The problem is when SQLite allows bad data to be
inserted. The problem isn't that fields JOIN when they are not
supposed to, it's when they fail to JOIN when they are supposed
to.
>
The implications of violating the SQL Language Specification
are much more serious than you think.

Or not as serious as you think.
Well, I can make up examples also.

Bill for customer: Tom Smith
------------------------------------------------------------------
0003 Olsen's Standard Book of British Birds (Expurgated) $ 99.95
Bill for customer: Dick Smith
------------------------------------------------------------------
0002 A Sale of Two Titties $ 20.00
0005 David Coperfield $ 3.95
Bill for customer: Harry Smith
------------------------------------------------------------------
0001 101 Ways to Start A Fight $ 19.95
Hmm...looks like item 4 isn't being billed to anyone. That's no
way to run a business. If I do a LEFT JOIN instead of an INNER JOIN:
Bill for customer: None None
------------------------------------------------------------------
0004 Ethel the Aardvark Goes Quantity Surveying $ 9.99
Bill for customer: Tom Smith
------------------------------------------------------------------
0003 Olsen's Standard Book of British Birds (Expurgated) $ 99.95
Bill for customer: Dick Smith
------------------------------------------------------------------
0002 A Sale of Two Titties $ 20.00
0005 David Coperfield $ 3.95
Bill for customer: Harry Smith
------------------------------------------------------------------
0001 101 Ways to Start A Fight $ 19.95
I see the missing item, but still don't know who to bill it to.
What if I dumped the invoice table?

4
54
Ethel the Aardvark Goes Quantity Surveying
None
1975
9.99

3
001
Olsen's Standard Book of British Birds (Expurgated)
Olsen
None
99.95

2
032
A Sale of Two Titties
Charles Dikkens
1855
20.0

5
032
David Coperfield
Edmund Wells
1955
3.95

1
066
101 Ways to Start A Fight
some Irish gentleman
1919
19.95

Aha, it was sold to customer 54, so now I just need to dump
the customer table:

001
Tom
Smith
42

032
Dick
Smith
28

066
Harry
Smith
66

only to discover there is no customer 54! How can that happen?

invoices = [(1,'066','101 Ways to Start A Fight','some Irish
gentleman',1919,19.95), \
(2,'032','A Sale of Two Titties','Charles Dikkens',1855,20.00), \
(3,'001',"Olsen's Standard Book of British Birds
(Expurgated)","Olsen",None,99.95), \
(4,066,'Ethel the Aardvark Goes Quantity Surveying',None,1975,9.99), \
(5,'032','David Coperfield','Edmund Wells',1955,3.95)]

Oops! Forgot the quotes around the customer id for item 4.
But why didn't it become 66? Because the leading 0 made
it octal! A little static typing would have helped here.
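A quick check of the octal trap (the equivalence holds in Python 2, where 066 was legal; Python 3 rejects a bare leading zero at parse time and requires the explicit 0o66 spelling, which would have caught this typo immediately):

```python
# Python 2 read a leading zero as octal, so 066 silently meant 54.
# Python 3 makes the old spelling a SyntaxError; 0o66 is the explicit form.
assert 0o66 == 54          # the value Python 2 produced for 066
assert int('066') == 66    # the customer id the quoted string carries
print(0o66, int('066'))
```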

Now, IF the same error is repeated in EVERY table that's JOINed,
THEN the dynamic typing won't be a problem. But that never happens
in practice. I populate one table and ensure it's correct. Someone
else populates the other table and screws it up so the JOIN never
works.

Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murphy's Law, I will have a much bigger problem with data
integrity.
>
Ciao,
Marc 'BlackJack' Rintsch
Sep 9 '06 #25
Ben Sizer wrote:
It's not a crackpot theory. It's a completely reasonable theory. SQL is
based on relational algebra, which provides a mathematical set of
operators for grouping data that is stored in separate sets. That data
is selected and projected according to its value, and nothing else. The
concept of it having a 'type' has been overlaid on top of this,
presumably to facilitate efficient implementation, which tends to
require fixed-width rows (and hence columns). It's not necessary in any
sense, and it's reasonable to argue that if it was trivial to implement
variable width columns as efficiently as fixed width columns, that
explicit data types might never have needed to exist.
The mathematical definition of the relational model includes
that data values are drawn from specific sets. Implementing
variable width columns has nothing to do with it. Here's the
reference:

1.3. A Relational View of Data

The term relation is used here in its accepted mathematical
sense. Given sets S1, S2, ···, Sn, (not necessarily
distinct), R is a relation on these n sets if it is a set
of n-tuples each of which has its first element from S1,
its second element from S2, and so on [1]. We shall refer to
Sj as the jth domain of R.

[Footnote 1:] More concisely, R is a subset of the Cartesian
product S1* S2 * ... * Sn.

-- E. F. Codd. "A Relational Model of Data for Large Shared
Data Banks." /Communications of the ACM/, Vol. 13, No. 6,
June 1970, pp. 377-387.
--
--Bryan
Sep 9 '06 #26
me********@aol.com wrote:
Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murphy's Law, I will have a much bigger problem with data
integrity.
If this was a java or c++ list, all this rant would be more
understandable, but bashing dynamic typing in a dynamic language list
seems pointless at best (as this has been beaten to death over and over
again), flamebait at worst. It should be clear by now that there are
two (at least) alternatives:
1. Validate the data in python before (or at the same time when)
feeding the DB.
2. Forget sqlite and use a statically typed DBMS; it's not like there
is a shortage of them.

Massaging your SQL statements to make up for the lack of type checking
(even if this is always possible) would be a bad idea for more than
one reason (complexity, portability, performance), so you'd better not
go down this road.
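The first alternative above, validating in Python before feeding the DB, can be as small as this sketch. The field names and rules are hypothetical, purely for illustration:

```python
# A minimal pre-insert validator: type-check each record in Python and
# quarantine the bad ones instead of letting SQLite store them silently.
def validate(record):
    cust_id, qty, price = record
    if not isinstance(qty, int):
        raise TypeError("qty must be int, got %r" % (qty,))
    if not isinstance(price, float):
        raise TypeError("price must be float, got %r" % (price,))
    return record

rows = [(1, 100, 35.14), (2, '100', 35.14)]
good, bad = [], []
for r in rows:
    try:
        good.append(validate(r))
    except TypeError:
        bad.append(r)
print(len(good), len(bad))  # one clean record, one quarantined
```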

George

Sep 9 '06 #27
It's not a bug, it's a feature. And answered as third point in the
FAQ:
>
http://www.sqlite.org/faq.html#q3

I think your whole experience is based on it. Live with it or use a
real RDBMS.

If you are so fond of static typing, why are you using Python in the first
place? Just see it as consistency -- dynamically typed language →
dynamically typed DB columns. ;-)

Ciao,
Marc 'BlackJack' Rintsch
I have to admit I find this bogus too. It has by no means anything to
do with static typing but letting "errors pass silently" i.e.
deactivating runtime type checks as well. The problem here is that
fields are not dynamically type checked but completely untyped and only
coercion hints are present. Using a clever coercion / "type-affinity"
does not justify that there is no error case handling when the coercion
fails. This might be handled by user code ( or better by the pysqlite
wrapper ) but it appears to be redundant.

Sep 9 '06 #28

George Sakkis wrote:
me********@aol.com wrote:
Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murphy's Law, I will have a much bigger problem with data
integrity.

If this was a java or c++ list, all this rant would be more
understandable, but bashing dynamic typing in a dynamic language list
seems pointless at best (as this has been beaten to death over and over
again), flamebait at worst.
But I'm not bashing Python's use of dynamic typing. If the
SQL Language Specification says static typing, then static typing
it is. Period.
It should be clear by now that there are
two (at least) alternatives:
1. Validate the data in python before (or at the same time when)
feeding the DB.
Data integrity is an issue even with static typing. It's a bigger
issue with dynamic typing.
2. Forget sqlite and use a statically typed DBMS; it's not like there
is a shortage of them.
I have no intention of forgetting sqlite simply because it's
now part of the standard library. I have no qualms about
using it *now* because I understand it better. But reaching
that level of understanding was like pulling teeth.

Documentation shouldn't talk down to the reader. It's always
bad when you confuse the smart people. The ignorant are
supposed to be confused. It's job of the documentation to
educate the ignorant. Hiding the idiosyncrasies of Sqlite3
from the user who's already familiar with SQL is simply
unacceptable.
>
Massaging your SQL statements to make up for the lack of type checking
(even if this is always possible) would be a bad idea for more than
one reason (complexity, portability, performance), so you'd better not
go down this road.

George
Sep 9 '06 #29

Dennis Lee Bieber wrote:
On 8 Sep 2006 16:46:03 -0700, "me********@aol.com" <me********@aol.com>
declaimed the following in comp.lang.python:

<snip>

After a sequence of hypothetical results of occult SQL you show
this...

invoices = [(1,'066','101 Ways to Start A Fight','some Irish gentleman',1919,19.95), \
(2,'032','A Sale of Two Titties','Charles Dikkens',1855,20.00), \
(3,'001',"Olsen's Standard Book of British Birds (Expurgated)","Olsen",None,99.95), \
(4,066,'Ethel the Aardvark Goes Quantity Surveying',None,1975,9.99), \
(5,'032','David Coperfield','Edmund Wells',1955,3.95)]

... A Python list of tuples!
Oops! Forgot the quotes around the customer id for item 4.
But why didn't it become 66? Because the leading 0 made
it octal! A little static typing would have helped here.
Do you routinely populate your databases by editing python lists of
tuples?
I don't routinely do anything, as I've only been playing with it for
4 days.
And if you do, why not show us the table definition and insert
statements that go with the above data?
I was simply following the examples in the Python manuals.
If the examples are poor usage, maybe they shouldn't be in
the manuals.
Or do you get the data from some file... What format is that file?
No file. Point is moot.
Would you agree that the following is a clean representation of your
example data, when considered as a CSV data source? (I left off the
primary key -- we'll just let SQLite add that value). I DID, however,
normalize the quoting by changing ' to ". (please ignore the line wrap
on the (Expurgated) -- it does not exist in the data file)

invoice.csv
-=-=-=-=-=-
"066","101 Ways to Start A Fight","some Irish gentleman",1919,19.95
"032","A Sale of Two Titties","Charles Dikkens",1855,20.00
"001","Olsen"s Standard Book of British Birds (Expurgated)","Olsen",None,99.95
066,"Ethel the Aardvark Goes Quantity Surveying",None,1975,9.99
"032","David Coperfield","Edmund Wells",1955,3.95
-=-=-=-=-=-

Now, since you seem to believe that a "customer number" is a string
data type (implied by the presence of quotes around all but the "bad
data" record), I've so defined it in the table definition... Ah, but
what the heck, let's create a table with it defined as an integer too...

SQLiteTest.py
-=-=-=-=-=-=-=-
from pysqlite2 import dbapi2 as sql
import csv

TABLE_DEF_1 = """
create table invoice_1
(
    ID integer primary key,
    CustNo char,
    Title char,
    Author char,
    Year integer,
    Price float
) """

TABLE_DEF_2 = """
create table invoice_2
(
    ID integer primary key,
    CustNo integer,
    Title char,
    Author char,
    Year integer,
    Price float
) """

db = sql.connect("test.db")
cr = db.cursor()
try:
    rs = cr.execute(TABLE_DEF_1)
    db.commit()
except: #I know, I should be explicit
    pass #assume table already exists

try:
    rs = cr.execute(TABLE_DEF_2)
    db.commit()
except: #I know, I should be explicit
    pass #assume table already exists

fin = open("invoice.csv", "rb")
indata = csv.reader(fin)

print "\nInserting:"
for r in indata:
    print r
    if len(r) != 5:
        print "^^^Bad Record"
    else:
        rs = cr.execute("""insert into invoice_1
                            (CustNo, Title, Author, Year, Price)
                            values (?,?,?,?,?)""",
                        r)
        rs = cr.execute("""insert into invoice_2
                            (CustNo, Title, Author, Year, Price)
                            values (?,?,?,?,?)""",
                        r)
db.commit()

fin.close()

print "\nSelect all from Invoice_1 (CustNo is CHARACTER)"
rs = cr.execute("select * from invoice_1")
for r in cr:
    print r

print "\nSelect all from Invoice_2 (CustNo is INTEGER)"
rs = cr.execute("select * from invoice_2")
for r in cr:
    print r

db.close()
-=-=-=-=-=-=-=-

Now, let us run the above program, using the above data file! Again,
watch out for line wrapping (my comments will be blocked off with """ )

E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>python SQLiteTest.py

Inserting:
['066', '101 Ways to Start A Fight', 'some Irish gentleman', '1919', '19.95']
['032', 'A Sale of Two Titties', 'Charles Dikkens', '1855', '20.00']
['001', 'Olsens Standard Book of British Birds (Expurgated)"', 'Olsen', 'None', '99.95']
['066', 'Ethel the Aardvark Goes Quantity Surveying', 'None', '1975', '9.99']
['032', 'David Coperfield', 'Edmund Wells', '1955', '3.95']

"""
Well, look at that... ALL the data from the file is coming in as
character strings... customer number, year, price, title, author... It's
ALL character! The difference between quoted and unquoted numbers has
been lost.
"""
How 'bout that? Maybe I should try harder to make a
better example.
Select all from Invoice_1 (CustNo is CHARACTER)
(1, u'066', u'101 Ways to Start A Fight', u'some Irish gentleman', 1919, 19.949999999999999)
(2, u'032', u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, u'001', u'Olsens Standard Book of British Birds (Expurgated)"', u'Olsen', u'None', 99.950000000000003)
(4, u'066', u'Ethel the Aardvark Goes Quantity Surveying', u'None', 1975, 9.9900000000000002)
(5, u'032', u'David Coperfield', u'Edmund Wells', 1955, 3.9500000000000002)

"""
No strange results there -- the year and price aren't stored as
strings, even though they were string data when inserted.
"""
Select all from Invoice_2 (CustNo is INTEGER)
(1, 66, u'101 Ways to Start A Fight', u'some Irish gentleman', 1919, 19.949999999999999)
(2, 32, u'A Sale of Two Titties', u'Charles Dikkens', 1855, 20.0)
(3, 1, u'Olsens Standard Book of British Birds (Expurgated)"', u'Olsen', u'None', 99.950000000000003)
(4, 66, u'Ethel the Aardvark Goes Quantity Surveying', u'None', 1975, 9.9900000000000002)
(5, 32, u'David Coperfield', u'Edmund Wells', 1955, 3.9500000000000002)

"""
And look here... Again no strange results -- SQLite didn't even
treat the leading 0 as a signal that the customer number is octal. They
come out as integers without leading 0s though -- but then again, I'm
not using a formatting statement on the output...
E:\UserData\Dennis Lee Bieber\My Documents\Python Progs>
"""

Sure, errors happen with static typing. After all, the values still
have to match. Dynamic typing allows for more potential errors and,
thanks to Murphy's Law, I will have a much bigger problem with data
integrity.
SQLite's "dynamic typing" does not mean "random" or "unpredictable"
typing.

The basic rules are fairly simple.

IF the data field is declared as a numeric type, AND the input data
can be coerced to numeric without error, it is stored and returned as a
numeric value -- one would have to pass in a data value that contained
non-numeric characters for it to become a character string.
As if that never happens.
IF the data field is declared as a character type, AND the input
data is a numeric, it is converted to a character representation and
stored/returned as character.

(less basic involves the application of data type converters which are
probably user supplied)
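(Those two rules are easy to verify interactively. A minimal sketch, using a throwaway in-memory table -- it reproduces exactly the behavior seen in the invoice_1/invoice_2 runs above:)

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table t (n integer, c char)")

# Rule 1: a numeric-looking string fed to an integer column is
# coerced to a number. Rule 2: a number fed to a character column
# is stored as its text representation.
conn.execute("insert into t values (?, ?)", ("066", 123))
n, c = conn.execute("select n, c from t").fetchone()
print(type(n), n)   # the integer column turned '066' into the integer 66
print(type(c), c)   # the character column turned 123 into the text '123'
```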
As for your example of invoices and customers, surely the
application isn't trusting the user to type in a raw "customer number"
for the invoice without first validating it by attempting to retrieve
that customer from a customer table. Or, more likely, using the customer
name to look up the number in the customer table, meaning the customer
number -- whatever it is -- /will/ match the invoice data as it was
taken directly from the database.

Any data supplied to you in file format, if humanly readable, is
going to be character strings when fed to SQLite UNLESS your code first
performs some sort of conversion on it -- and I suspect you'll detect
the bad data when doing that conversion. And, by the way, the octal
detection only happens for numeric literals IN a Python statement, not
in conversion of an external string data item to numeric.
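(A minimal sketch of that idea -- the "None" convention is assumed from the CSV example above. The conversion step itself is the type check: any non-numeric junk surfaces as a ValueError in Python, before the data reaches SQLite, and int() on a string is always base 10, so there is no octal surprise:)

```python
def to_year(field):
    """Convert a CSV string field to an integer year, or None."""
    if field == "None":
        return None
    return int(field)  # raises ValueError on non-numeric junk

print(to_year("1919"))  # -> 1919
print(to_year("066"))   # -> 66, not 54: string conversion is base 10
```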
Ok, it was a bad example.
>>> int("066")
66
>>> int(066)
54
>>>

Of course, if you're being supplied binary data files, you are
probably using the struct module to extract the numeric data fields...
But how, I wonder, would one get a non-numeric value using a numeric
specification on a string of raw bytes?
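(Indeed, one couldn't -- a minimal sketch: a numeric format code in struct can only yield a number, whatever the raw bytes held. The field layout here is made up for illustration:)

```python
import struct

# Pack a customer number and a price into 8 raw bytes...
raw = struct.pack("<if", 66, 19.95)

# ...and extract them again: '<i' and '<f' always produce an int
# and a float; there is no path to a non-numeric result.
custno, price = struct.unpack("<if", raw)
print(type(custno), custno)
print(type(price), round(price, 2))
```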
--
Wulfraed Dennis Lee Bieber KD6MOG
wl*****@ix.netcom.com wu******@bestiaria.com
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: we******@bestiaria.com)
HTTP://www.bestiaria.com/
Sep 9 '06 #30
