Drawbacks of using BYTEA for PK?

David Garamond

Are there any drawbacks of using BYTEA for PK compared to using a
primitive/atomic data types like INT/SERIAL? (like significant
performance hit, peculiar FK behaviour, etc).

I plan to use BYTEA for GUID (of course, temporarily I hope, until
PostgreSQL officially supports GUID data type), since it seems to be the
most convenient+compact compared to other data types currently
available. I use GUIDs for most PK columns.

--
dave

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 12 '05 #1

Subscribe Reply

3813

D. Dante Lorenso

David Garamond wrote:

Are there any drawbacks of using BYTEA for PK compared to using a
primitive/atomic data types like INT/SERIAL? (like significant
performance hit, peculiar FK behaviour, etc).

I plan to use BYTEA for GUID (of course, temporarily I hope, until
PostgreSQL officially supports GUID data type), since it seems to be
the most convenient+compact compared to other data types currently
available. I use GUIDs for most PK columns.

GUID? Isn't that really nothing more than an MD5 on a sequence?

SELECT (MD5(NEXTVAL('my_table_seq'))) AS my_guid;

Since 7.4 has the md5 function built-in, there's your support ;-)
Now just add that to your table's trigger and your good to go.
I think in MS products, they format the guid with dashes in the
style 8-4-4-4-12 but it still looks to me like a 32 character hex
string or a 16 byte (128 bit) value. You can choose to store the
value however you like, I'm not sure what would be optimal, but
bits are bits, right?

Dante

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 12 '05 #2

Richard Huxton

On Sunday 11 January 2004 22:05, D. Dante Lorenso wrote:

David Garamond wrote:
Are there any drawbacks of using BYTEA for PK compared to using a
primitive/atomic data types like INT/SERIAL? (like significant
performance hit, peculiar FK behaviour, etc).

I plan to use BYTEA for GUID (of course, temporarily I hope, until
PostgreSQL officially supports GUID data type), since it seems to be
the most convenient+compact compared to other data types currently
available. I use GUIDs for most PK columns.

GUID? Isn't that really nothing more than an MD5 on a sequence?

SELECT (MD5(NEXTVAL('my_table_seq'))) AS my_guid;

I think the point of a GUID is it's supposed to be unique across any number of
machines without requiring those machines to coordinate their use of GUID
values.

I think the typical approach is to use something like:
hash_fn( network_mac_address || other_hopefully_unique_constant ||
sequence_val )
and make sure that the probability of getting collisions is acceptably low.

ISTR a long discussion a year or two back on one of the lists, for those that
are interested.
--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #3

David Garamond

D. Dante Lorenso wrote:

GUID? Isn't that really nothing more than an MD5 on a sequence?

SELECT (MD5(NEXTVAL('my_table_seq'))) AS my_guid;
I know there are several algorithms to generate GUID, but this is
certainly inadequate :-) You need to make sure that the generated GUID
will be unique throughout cyberspace (or to be more precise, the GUID
should have a very very small chance of colliding with other people's
GUID). Even OID is not a good seed at all.

Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
unique seed like MD5 of '/sbin/ifconfig' output)...
Since 7.4 has the md5 function built-in, there's your support ;-)

Well, until there's a GUID or INT128 or BIGBIGINT builtin type I doubt
many people will regard PostgreSQL as fully supporting GUID. I believe
there's the pguuid project in GBorg site that does something like this.

--
dave
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 12 '05 #4

Tom Lane

David Garamond <li***@zara.6.isreserved.com> writes:

Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
unique seed like MD5 of '/sbin/ifconfig' output)...

Adding an MD5 hash contributes *absolutely zero*, except waste of space,
to any attempt to make a GUID. The hash will add no uniqueness that was
not there before.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #5

David Garamond

Tom Lane wrote:

David Garamond <li***@zara.6.isreserved.com> writes:
Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
unique seed like MD5 of '/sbin/ifconfig' output)...

Adding an MD5 hash contributes *absolutely zero*, except waste of space,
to any attempt to make a GUID. The hash will add no uniqueness that was
not there before.

Of course, in the above case, MD5 is used to compress the "uniqueness"
(which should be more than 128-bit, comprised of: a) [good] random
number; b) timestamp; c) a "node ID" element, either from /sbin/config
output which contain MAC address, or from the hash of harddisk content,
etc) into a 128-bit space.

--
dave
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #6

D. Dante Lorenso

Tom Lane wrote:
Adding an MD5 hash contributes *absolutely zero*, except waste of space,
to any attempt to make a GUID. The hash will add no uniqueness that was
not there before.

The cool thing about a 'GUID' (or in my example a hashed sequence number
[sure
toss in some entropy if you want it]) is that if you happen to reference
that
value as a primary key on a table, the URL that passes the argument can not
be guessed at easily. For example using a sequence:

http://domain.com/application/load_r...tomer_id=12345

Then, users of the web will assume that you have at most 12345
customers. And
they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

....basically walking the sequence. Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you
want
to happen. NOW, if you use a GUID:
http://domain.com/application/load_r...3-1b8ce9dcccc1

Right, so now try to guess the next value in this sequence. It's a little
more protective and obfuscated (an advantage in using GUIDs).

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #7

scott.marlowe

On Mon, 12 Jan 2004, D. Dante Lorenso wrote:

Tom Lane wrote:
Adding an MD5 hash contributes *absolutely zero*, except waste of space,
to any attempt to make a GUID. The hash will add no uniqueness that was
not there before.

The cool thing about a 'GUID' (or in my example a hashed sequence number
[sure
toss in some entropy if you want it]) is that if you happen to reference
that
value as a primary key on a table, the URL that passes the argument can not
be guessed at easily. For example using a sequence:

http://domain.com/application/load_r...tomer_id=12345

Then, users of the web will assume that you have at most 12345
customers. And
they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

...basically walking the sequence. Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you
want
to happen. NOW, if you use a GUID:

Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security. Real security comes from secure code, period.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #8

Greg Stark

"scott.marlowe" <sc***********@ihs.com> writes:

they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

...basically walking the sequence. Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you
want
to happen. NOW, if you use a GUID:

Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security. Real security comes from secure code, period.

Well, uh, you're both wrong.

On the one hand if your GUIDs are just an MD5 of a sequence then they're just
as guessable as the sequence. The attacker can try MD5 of various numbers
until he finds the one he is (it's probably on the web site somewhere anyways)
and then run MD5 himself on whatever number he feels.

On the other hand it is possible to do this right. Include a secret of some
kind in the MD5 hash, something that's not publically available. That secret
is in essence the password to the scheme. Now it's not really "obscurity" any
more than any password based scheme is "security through obscurity".

However even that isn't ideal, since you have to be able to change the
password periodically in case it's leaked. I believe there are techniques to
solve this though I can' think of any off the top of my head.

But if your only threat model is people attacking based on the publicly
visible information then an MD5 of the combination of a sequence and a secret
is a perfectly reasonable approach.

In the past I happily exposed the sequence but used an MD5 of the sequence and
a secret as a protection against spoofing. I find exposing the sequence is
very convenient for programming and debugging problems. Spoofing is a serious
security hazard, but worrying about leaking information like the size of the
customer database is usually a sign of people hoping for security through
obscurity.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #9

David Garamond

Greg Stark wrote:

On the other hand it is possible to do this right. Include a secret of some
kind in the MD5 hash, something that's not publically available. That secret
is in essence the password to the scheme. Now it's not really "obscurity" any
more than any password based scheme is "security through obscurity".

However even that isn't ideal, since you have to be able to change the
password periodically in case it's leaked. I believe there are techniques to
solve this though I can' think of any off the top of my head.

But if your only threat model is people attacking based on the publicly
visible information then an MD5 of the combination of a sequence and a secret
is a perfectly reasonable approach.

We're originally talking about using MD5 as a means to generate unique
ID right (and not to store password hash to be checked against later)?

Then this "secret key" is unnecessary. Just get some truly random bits
(if the number of bits is 128, then you can use it as it is. If the
number of bits is > 128, you can hash it using MD5 to get 128 bit. If
the number of bits is < 128, you're "screwed" anyway :-)

--
dave
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 12 '05 #10

Keith C. Perry

Quoting Greg Stark <gs*****@mit.edu>:

"scott.marlowe" <sc***********@ihs.com> writes:
they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

...basically walking the sequence. Sure, you will protect against this with access rights, BUT...seeing the sequence is a risk and not something you want
to happen. NOW, if you use a GUID:

Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security. Real security comes from secure code, period.

Well, uh, you're both wrong.

On the one hand if your GUIDs are just an MD5 of a sequence then they're
just
as guessable as the sequence. The attacker can try MD5 of various numbers
until he finds the one he is (it's probably on the web site somewhere
anyways)
and then run MD5 himself on whatever number he feels.

On the other hand it is possible to do this right. Include a secret of some
kind in the MD5 hash, something that's not publically available. That secret
is in essence the password to the scheme. Now it's not really "obscurity"
any
more than any password based scheme is "security through obscurity".

However even that isn't ideal, since you have to be able to change the
password periodically in case it's leaked. I believe there are techniques to
solve this though I can' think of any off the top of my head.

But if your only threat model is people attacking based on the publicly
visible information then an MD5 of the combination of a sequence and a
secret
is a perfectly reasonable approach.

In the past I happily exposed the sequence but used an MD5 of the sequence
and
a secret as a protection against spoofing. I find exposing the sequence is
very convenient for programming and debugging problems. Spoofing is a
serious
security hazard, but worrying about leaking information like the size of the
customer database is usually a sign of people hoping for security through
obscurity.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Its not a question of right or wrong. Its the method. One thing I see here is
a failing to use several security methods at different layers. That really is
necessary for a production environment. If you want customer id's kept private,
then you need a private connection or to not expose them. Using an MD5 hash to
"hide" them will slow your app down by some delta and not protect your
connection. Granted garbling that id with a password is somewhat more secure
but your connection could still be attacked or even hijacked.

In the URL's you gave above, why are you not using HTTPS (i.e. authentication)?
What about using a crytographic cookies to identify your session and link that
to you userid (after authorization)?

'Just seems like you're not using the right tool (method) for the job here.

$-0.02

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #11

Alex Satrapa

David Garamond wrote:

Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
unique seed like MD5 of '/sbin/ifconfig' output)...

As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32 bit
timestamp and you have a 64bit unique identifier that is "universally"
unique (to one second).
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 12 '05 #12

Alex Satrapa

Greg Stark wrote:

... worrying about leaking information like the size of the
customer database is usually a sign of people hoping for security through
obscurity.

To prevent the size of your database being guessed at from the serial
numbers of your customers' accounts, don't issue the numbers sequentially.

One simplistic method of non-sequential assignment is: generate a random
number between "00...00" and "99...99"*, check if it's already in use -
if not, issue it, if so, regenerate. When presenting the number, always
format it as an N-digit number with leading zeroes - for Perl
programmers, this would be achieved along the lines of printf("%010d",
$account_number)

Thus you will end up with customer numbers evenly spread over the number
space. This will prevent people inferring the size of your database (or
company) through the account numbers they observe.

To protect the customer's account from being accessed by unauthorised
persons, use form-based password access (not HTTP basic**) and/or X.509
certificates over a secure connection.

As Scotty says, "use the right tool for the right job!"

HTH
Alex Satrapa

*make the number space much larger than your expected number of
accounts. This reduces collisions in random number generation. Another
option is to increment through the number space in the event of a
collision, rather than generating another random number.

**using form-based access, the user can log out when leaving the
terminal. Using HTTP basic, the browser is likely to remember their
login for the entire session, and sometimes even between sessions.
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #13

Kragen Sitaker

On Tue, Jan 13, 2004 at 10:15:47AM +1100, Alex Satrapa wrote:

**using form-based access, the user can log out when leaving the
terminal. Using HTTP basic, the browser is likely to remember their
login for the entire session, and sometimes even between sessions.

You can persuade the browser to forget the password just by sending it
a 401. Unfortunately, the user then has to know to hit 'cancel' on the
resulting dialog box.

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #14

D. Dante Lorenso

they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

...basically walking the sequence. Sure, you will protect against this
to happen. NOW, if you use a GUID:
Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security. Real security comes from secure code, period.

Well, uh, you're both wrong.
On the one hand if your GUIDs are just an MD5 of a sequence then they're
just as guessable as the sequence.

Its not a question of right or wrong. Its the method. One thing I see here is
a failing to use several security methods at different layers....why are you not using HTTPS (i.e. authentication)?
What about using a crytographic cookies to identify your session and link that
to you userid (after authorization)?

Ok, my point is not one of security as much as the obscurity. I have the
security aspect already covered whereby I only select the customer
record from
the database where the logged in account has access to the record. So, if
you are not the admin or the actual customer, the select will return a code
indicating that you do not have permission to view the given record.

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc... If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?

For instance, if I generate sequential numbers for invoice ids and the
customer
sees #123 as an invoice number one month and sees #128 the next month,
it might
imply that there are only 4 customers getting invoiced each month.

Another example ... let's say customers can create 'Widgets' in their
account.
There might be a page that lists all their 'widgets'. If you click on the
widget, you can edit it. A link to do this might look as follows:

http://.../account/widget_list.html
http://.../account/widget_edit.html?widget_id=12345

Well, if the widget_id is a sequence (global to the widget table), then
by creating
one widget, customer would get widget id (WIDG_1) and another widget
(WIDG_2),
the customer could see that the widget_id increased by only an amount of

N = WIDG_2 - WIDG_1

and would therefore provide the assumption that the number of customers
creating
widgets in total does not exceed N. I don't see this as much of a
problem about
'security' in the respect of who can access the data as much as who can make
conclusions about the company beind the data.

See what I mean? What do you propose as the best solution for this?
Not expose
the sequences to the user and use user-enumerated ids? Then a trigger
on the
table would assign ids like:

SELECT (MAX(widget_id)+1) INTO NEW.widget_id
WHERE cust_id = NEW.cust_id;

But I think after several hundred customer records, this trigger would start
getting slow. I don't know really, any ideas?

Dante

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #15

Greg Stark

"D. Dante Lorenso" <da***@lorenso.com> writes:

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc... If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?

Except that's exactly the way business has always been done. Though people
usually start new accounts with check# 50000 or something like that for
precisely that reason. But it's still pretty transparent, and they don't
really worry about it too much.

What you're saying is fundamentally valid, but I tend to think these kinds of
concerns are just generically overblown.

My only comment was that just taking an MD5 of the sequence gives you no
security. At the very least you have to include a secret. Even then I suspect
there are further subtle cryptographic issues. There always are.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #16

Chris Travers

----- Original Message -----
From: "Alex Satrapa" <al**@lintelsys.com.au>

As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32 bit
timestamp and you have a 64bit unique identifier that is "universally"
unique (to one second).

Aarrgh... So if you have 2 inserts in the same second, you have key
collision? Why not append a sequence to that so you have: Unique address
|| timestamp || sequence value. In a case such as this I can see why you
might want to use md5() to hash that value.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 22 '05 #17

Chris Travers

From: "Keith C. Perry" <ne******@vcsn.com>

Using an MD5 hash to
"hide" them will slow your app down by some delta and not protect your
connection. Granted garbling that id with a password is somewhat more secure but your connection could still be attacked or even hijacked.

In the URL's you gave above, why are you not using HTTPS (i.e. authentication)? What about using a crytographic cookies to identify your session and link that to you userid (after authorization)?

Https I can see. I am having difficulty understanding how you could use
cryptographic cookies to prevent session hijacking though given the current
setup. Also you could use ssl between the web server and PostgreSQL to
secure that connection.

As a side question: Does PostgreSQL support using Kerberos for encrypted
connections (beyond authentication), or do you need to use SSL for that?

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 22 '05 #18

Chris Travers

Answers inline.

----- Original Message -----
From: "Greg Stark" <gs*****@mit.edu>

On the one hand if your GUIDs are just an MD5 of a sequence then they're just as guessable as the sequence. The attacker can try MD5 of various numbers
until he finds the one he is (it's probably on the web site somewhere anyways) and then run MD5 himself on whatever number he feels.

On the other hand it is possible to do this right. Include a secret of some kind in the MD5 hash, something that's not publically available. That secret is in essence the password to the scheme. Now it's not really "obscurity" any more than any password based scheme is "security through obscurity".

You still have the following problem: the PK is not really used for very
much in this case except referencing data. This is done internally
(invoices, etc), so the application is presumed to know the ID when looking
up a customer. Nothing you do will prevent any attack based on searching
the database, i.e. select customer_id from customers; if such an attack is
possible in an application. I actually think that developers should enforce
security as far back (towards the database) as possible, so if this needs to
be prevented, using a view which only provides access to the customers
required is the preferred solution. You could also use triggers.

If, however, you want a global unique id which will never collide with any
other records (f. ex. for distributed server solutions), then you have
another problem-- MD5 is NOT guaranteed to be unique. Think about it-- if
the return digest is of a set length, then there must be many different
values which will create that same digest. Instead MD5 is designed to
prevent deliberate duplication, which is not what we are talking about here
(accidental duplication) and so you may want to be cautious about hashing
your keys. In this case, a more open, transparent key would be better. For
example:

machine identifier || sequence.

You *could* hash these, but it is unnecessary and may actually create
collisions if the machine identifier is sufficiently large. However,
mac_address || ipv4 address should be sufficient, I would think. It would
still be attackable in your view, so you could add a timestamp :-) but
again, I see limited utility of guids as a security feature.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 22 '05 #19

Chris Travers

Sounds to me you have concerns more along the lines of counterintelligence.

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc... If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company underneath, is it not?
IMO, the solution here is to start your sequences at an arbitrary value
(preferably not round) such as 1543691. Therefore the first customer
doesn't know that you don't have 1.5M other customers :-) This could be
calculated for each sequence with a formula such as
SELECT (random() * 1000000 + 1000000)::bigint;

For instance, if I generate sequential numbers for invoice ids and the
customer
sees #123 as an invoice number one month and sees #128 the next month,
it might
imply that there are only 4 customers getting invoiced each month.
Another solution I have seen is to use a formula for your invoices based on:
Letter key for invoice type followed by YYYYMMDD followed by a numeric
sequence. This also helps to obscure things since the customer may not know
how often you reset the sequence (could be every month, or every day). The
letter key can uniquely identify your server on your network thereby
creating a GUID. In other words your sequence need only be unique to a
given time frame. You could even add a timestamp and a sequence that wraps
around after 9 :-) That way as long as you don't create 10 invoices in the
same second you are OK.
http://.../account/widget_list.html
http://.../account/widget_edit.html?widget_id=12345
Provided that each customer is only creating one widget at a time, you could
then take the customer_id and append to it a value of a customer-specific
sequence. You could even have this as a compound primary key. That way,
each customer can only determine how many widgets they have created :-)
See what I mean? What do you propose as the best solution for this?

Create GUIDS which contain only the information you want. No need to hash.
See above for examples.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 22 '05 #20

John Sidney-Woollett

Careful...

If two (or more) clients (in the same network) are going through a
firewall that performs NAT, then they could appear to have the same IP
address if the NAT address pool is small (single address).

Appending a sequence would help resolve that issue though.

John Sidney-Woollett

Chris Travers said:

----- Original Message -----
From: "Alex Satrapa" <al**@lintelsys.com.au>
As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32 bit
timestamp and you have a 64bit unique identifier that is "universally"
unique (to one second).

Aarrgh... So if you have 2 inserts in the same second, you have key
collision? Why not append a sequence to that so you have: Unique address
|| timestamp || sequence value. In a case such as this I can see why you
might want to use md5() to hash that value.

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 22 '05 #21

David Garamond

Alex Satrapa wrote:

As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32 bit
timestamp and you have a 64bit unique identifier that is "universally"
unique (to one second).

Remember that /sbin/ifconfig output usually include MAC address too. Not
that MAC addresses are 100% unique, but that should increase the uniqueness.

--
dave

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 22 '05 #22

Chris Travers

From: "John Sidney-Woollett" <jo****@wardbrook.com>

Careful... If two (or more) clients (in the same network) are going through a
firewall that performs NAT, then they could appear to have the same IP
address if the NAT address pool is small (single address).

Sorry, I should have been more specific about what I meant by "Unique
address." The unique address is some identifier which can be reasonably
considered to be unique to the system md5(serial number of the first CPU ||
make/model of the CPU) or md5(mac address || ipv4 address) or ipv6 address.
Concatinating this with a timestamp and a sequence should provide very
little chance of a collision.

Here is a more interesting application though-- suppose you have a
distributed environment and want a way of transparently authenticating where
a record is supposed to be. In this scenario, you have a situation where
you need some sort of GUID for the database server, which can be specified
when an addition is made or where the referring record exists. If the
record exists in a different location the other child records could be
deflected there too (perhaps via dblink). In this way, the pseudo-guid can
contain the customer number along with the location guid. Any suggestions
on handling this in a manageable and opaque fashion (i.e. not giving
customers case numbers with the mac address fo the server appended ;-))

Best Wishes,
Chris Travers
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 22 '05 #23

Bruno Wolff III

On Mon, Jan 12, 2004 at 20:24:10 -0600,
"D. Dante Lorenso" <da***@lorenso.com> wrote:

See what I mean? What do you propose as the best solution for this?
Not expose
the sequences to the user and use user-enumerated ids? Then a trigger
on the
table would assign ids like:

SELECT (MAX(widget_id)+1) INTO NEW.widget_id
WHERE cust_id = NEW.cust_id;

But I think after several hundred customer records, this trigger would start
getting slow. I don't know really, any ideas?

I think it would be better to have a per customer counter. Then the GID
would be customer, customer_sequence. You probably wouldn't want to use
postgres sequences for this. I would expect that generating new ID numbers
isn't so common that updating a counter row in a customer counter table
would be a real problem.

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 22 '05 #24

Keith C. Perry

Quoting Chris Travers <ch***@travelamericas.com>:

From: "Keith C. Perry" <ne******@vcsn.com>
Using an MD5 hash to
"hide" them will slow your app down by some delta and not protect your
connection. Granted garbling that id with a password is somewhat more secure
but your connection could still be attacked or even hijacked.

In the URL's you gave above, why are you not using HTTPS (i.e.

authentication)?
What about using a crytographic cookies to identify your session and link

that
to you userid (after authorization)?

Https I can see. I am having difficulty understanding how you could use
cryptographic cookies to prevent session hijacking though given the current
setup.

Cryptographic cookies are actually how TCP SYN flood protection is done on Linux
and I think Solaris so in my case the OS is handling that. What is implemented
there could be implemented at the application layer but I don't think that
becomes valid once you are using HTTPS since is provide similar facilities.

In my applications, I simply have Apache push a cookie to the browser (during
authorization) which is then used as the session key. Additionally, I almost
always use POST methods instead of GET (I hate exposing application logic that
way). Ever time a user does something, the presence of that cookie is checked
in the database.
Also you could use ssl between the web server and PostgreSQL to
secure that connection.
True but that is only half the story. You're client interface is what is
public. I would SSL the web <--> db connection as a standard but I would be
less concerned about (what I'm assumming is) a local connection behind the DMZ.
As a side question: Does PostgreSQL support using Kerberos for encrypted
connections (beyond authentication), or do you need to use SSL for that?

Best Wishes,
Chris Travers

Not sure about that one but if so, I'm sure someone will speak up :)

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 22 '05 #25

Greg Stark

"Keith C. Perry" <ne******@vcsn.com> writes:

Additionally, I almost always use POST methods instead of GET
(I hate exposing application logic that way).

Well this is pretty far afield from postgres. But I hope you're not using POST
for idempotent operations like search or record lookups. I hate it when I'm on
a site and i can't bookmark pages usefully or use the back button or open
pages in new windows etc.

POST is ideally only used for non idempotent operations like updates, deletes,
etc. Things that you wouldn't want to be repeated when the user hits reload,
and wouldn't want repeated every time a user revisits a bookmark.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #26

Alex Satrapa

David Garamond wrote:

Remember that /sbin/ifconfig output usually include MAC address too. Not
that MAC addresses are 100% unique, but that should increase the
uniqueness.

How do you increase uniqueness? Either a value is unique or it isn't -
if you've got multiple hosts on the network with the same network
address, you're in big trouble!

32 bits for an IP address is a huge number space... but why you'd really
need that much space as a base for your GUID is beyond me. The "host"
part of the address (eg: the last 8 bits in a /24 network block) would
be enough to uniquely identify the 254 hosts on your network. Then add a
32 bit timestamp, and you have 24 bits left for uniquely identifying
things that are created within the same second on the same server -
that's 16M things per second. Busy little shop you'd be running to
exhaust that unique space ;)

Adding extra number space doesn't increase the uniqueness of any
particular key - you have to know how little you can get away with to be
unique. Like distinguishing two humans from each other - you don't need
to unravel the DNA to 3,000 base pairs (3k bits!) if you can settle for
"blonde" versus "auburn" (1 bit!).

I can't remember who said it, but there's a nice quote that's relevant
in this situation: "The true mark of a well designed system is not that
there's nothing left to add, it's that there's nothing left to take away!"

Regards
Alex Satrapa
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 22 '05 #27

Nigel J. Andrews

On Wed, 14 Jan 2004, Alex Satrapa wrote:

David Garamond wrote:
Remember that /sbin/ifconfig output usually include MAC address too. Not
that MAC addresses are 100% unique, but that should increase the
uniqueness.
How do you increase uniqueness? Either a value is unique or it isn't -
if you've got multiple hosts on the network with the same network
address, you're in big trouble!

It's easily done; you've misread it. David said _MAC_ address. Of course your
comment still stands if you've got the same MAC address on a segment more than
once.

32 bits for an IP address is a huge number space... but why you'd really
need that much space as a base for your GUID is beyond me. The "host"
part of the address (eg: the last 8 bits in a /24 network block) would
be enough to uniquely identify the 254 hosts on your network.

I might have more that 256 hosts.

I can't comment on the real content of this discussion though since a) I
haven't be reading it and b) I probably wouldn't know what it was on about if
I had been.
--
Nigel J. Andrews
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #28

Karsten Hilbert

> >Remember that /sbin/ifconfig output usually include MAC address too. Not

that MAC addresses are 100% unique, but that should increase the
uniqueness.
How do you increase uniqueness? Either a value is unique or it isn't -

It increases the *likelihood* of uniqueness, IOW the expected
collision frequency.
if you've got multiple hosts on the network with the same network
address, you're in big trouble!

He is talking about the MAC which in itself is supposed to be
globally unique. Nothing to do with IP numbers.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 22 '05 #29

Karsten Hilbert

> > How do you increase uniqueness? Either a value is unique or it isn't -

It increases the *likelihood* of uniqueness, IOW the expected
collision frequency.

...., IOW ... decreases

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 22 '05 #30

David Garamond

Alex Satrapa wrote:

David Garamond wrote:
Remember that /sbin/ifconfig output usually include MAC address too.
Not that MAC addresses are 100% unique, but that should increase the
uniqueness.
How do you increase uniqueness? Either a value is unique or it isn't -

Ok, let's say "mathematically unique" and "practically unique". I was
referring to the second one.
if you've got multiple hosts on the network with the same network
address, you're in big trouble!

32 bits for an IP address is a huge number space... but why you'd really
need that much space as a base for your GUID is beyond me.

The point of using /sbin/ifconfig output (and then hashing it to get a
128bit number) is because the output contains MAC address and IP
address. IP addresses are not unique, particularly in intranet or
clustering situation. MAC addresses are more likely to be unique, but
I've heard stories about cheap ethernet cards having duplicate MAC's,
and beside in many cases MAC addresses can be altered by software.

The probability of both MAC and IP address (and other strings in
/sbin/ifconfig output) being totally the same for two hosts is of course
much smaller, and thus the term "increase [the practical] uniqueness."

--
dave
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 22 '05 #31

David Garamond

Nigel J. Andrews wrote:

I can't comment on the real content of this discussion though since a) I
haven't be reading it and b) I probably wouldn't know what it was on about if
I had been.

Um, any insight on the original question (see subject)? :-)

--
dave
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 22 '05 #32

Bruce Momjian

David Garamond wrote:

Nigel J. Andrews wrote:
I can't comment on the real content of this discussion though since a) I
haven't be reading it and b) I probably wouldn't know what it was on about if
I had been.

Um, any insight on the original question (see subject)? :-)

A bytea join will use a bytea indexed column just fine, I think.

--
Bruce Momjian | http://candle.pha.pa.us
pg***@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 22 '05 #33

Similar topics