Drawbacks of using BYTEA for PK?

Are there any drawbacks of using BYTEA for PK compared to using a
primitive/atomic data types like INT/SERIAL? (like significant
performance hit, peculiar FK behaviour, etc).

I plan to use BYTEA for GUID (of course, temporarily I hope, until
PostgreSQL officially supports GUID data type), since it seems to be the
most convenient+compact compared to other data types currently
available. I use GUIDs for most PK columns.
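
For a sense of the size difference being weighed here, a minimal Python sketch (not from the thread; the value is a randomly generated example) compares the 16 raw bytes a BYTEA column would hold with the text forms of the same GUID:

```python
import uuid

# Hypothetical example value; any GUID/UUID behaves the same way.
guid = uuid.uuid4()

raw = guid.bytes      # 16 raw bytes, what a BYTEA column would store
hex_only = guid.hex   # 32 hex characters, no hyphens
text = str(guid)      # 36 characters in the usual hyphenated form

print(len(raw), len(hex_only), len(text))  # prints: 16 32 36

# The raw bytes alone are enough to rebuild the same GUID.
restored = uuid.UUID(bytes=raw)
```

The raw form is less than half the size of either text form, which is the "compact" part of the argument; whether that matters for index performance is something to measure rather than assume.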

--
dave

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 12 '05
Quoting Greg Stark <gs*****@mit.edu>:
"scott.marlowe" <sc***********@ihs.com> writes:
they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

...basically walking the sequence. Sure, you will protect against this with access rights, BUT...seeing the sequence is a risk and not something you want
to happen. NOW, if you use a GUID:


Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security. Real security comes from secure code, period.


Well, uh, you're both wrong.

On the one hand, if your GUIDs are just an MD5 of a sequence then they're just
as guessable as the sequence. The attacker can try the MD5 of various numbers
until he finds the one that matches his own ID (the number is probably on the
web site somewhere anyway) and then run MD5 himself on whatever number he likes.
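
The attack described above can be sketched in a few lines of Python. This is an illustration only: it assumes the token is the MD5 of the decimal ID string, and the function names and the ID 12345 are hypothetical.

```python
import hashlib

def md5_of_id(n: int) -> str:
    """Token scheme under attack: plain MD5 of the decimal ID (an assumption)."""
    return hashlib.md5(str(n).encode("ascii")).hexdigest()

def recover_id(token: str, max_id: int = 1_000_000):
    """Try small integers until one hashes to the observed token."""
    for n in range(max_id + 1):
        if md5_of_id(n) == token:
            return n
    return None

observed = md5_of_id(12345)          # token the attacker sees for their own record
own_id = recover_id(observed)        # the underlying sequence value is recovered
neighbour = md5_of_id(own_id + 1)    # token for the next record in the sequence
```

Because nothing secret goes into the hash, the search space is just the range of the sequence, which is small.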

On the other hand it is possible to do this right. Include a secret of some
kind in the MD5 hash, something that's not publicly available. That secret
is in essence the password to the scheme. Now it's not really "obscurity" any
more than any password based scheme is "security through obscurity".

However even that isn't ideal, since you have to be able to change the
password periodically in case it's leaked. I believe there are techniques to
solve this though I can't think of any off the top of my head.

But if your only threat model is people attacking based on the publicly
visible information then an MD5 of the combination of a sequence and a secret
is a perfectly reasonable approach.
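
A minimal sketch of combining a sequence value with a secret, in Python. Note the substitution: instead of MD5 over a plain concatenation it uses HMAC with SHA-256 from the standard library, which is the usual way to key a hash today. The secret value and function name are placeholders, not anything from the thread.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real secret would come from configuration.
SECRET = b"replace-with-a-real-secret"

def opaque_id(seq: int, secret: bytes = SECRET) -> str:
    """Derive an unguessable identifier from a sequence value and a secret."""
    return hmac.new(secret, str(seq).encode("ascii"), hashlib.sha256).hexdigest()

a = opaque_id(12345)
b = opaque_id(12346)
```

Without the secret, an observer who sees `a` has no practical way to compute `b`, even knowing that the underlying values are consecutive.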

In the past I happily exposed the sequence but used an MD5 of the sequence and
a secret as a protection against spoofing. I find exposing the sequence is
very convenient for programming and debugging problems. Spoofing is a serious
security hazard, but worrying about leaking information like the size of the
customer database is usually a sign of people hoping for security through
obscurity.
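
The "expose the sequence, add a keyed check value" idea might look like the following Python sketch. It is an assumption-laden illustration: the parameter names `customer_id` and `sig`, the secret, and the use of HMAC-SHA-256 in place of MD5 are all choices made here, not details given in the post.

```python
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical placeholder

def sign(customer_id: int) -> str:
    """Keyed check value for a visible, sequential ID."""
    msg = str(customer_id).encode("ascii")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def make_query(customer_id: int) -> str:
    """Query string carrying both the readable ID and its check value."""
    return f"customer_id={customer_id}&sig={sign(customer_id)}"

def verify(customer_id: int, sig: str) -> bool:
    """Reject requests whose ID was altered without a matching check value."""
    expected = sign(customer_id).encode("ascii")
    return hmac.compare_digest(expected, sig.encode("utf-8"))

good_sig = sign(12345)
```

The ID stays readable for debugging, while changing `12345` to `12346` in the URL fails verification because the attacker cannot produce the matching `sig`.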

--
greg


It's not a question of right or wrong. It's the method. One thing I see here is
a failure to use several security methods at different layers. That really is
necessary for a production environment. If you want customer IDs kept private,
then you need a private connection or to not expose them. Using an MD5 hash to
"hide" them will slow your app down by some delta and not protect your
connection. Granted, garbling that ID with a password is somewhat more secure,
but your connection could still be attacked or even hijacked.

In the URLs you gave above, why are you not using HTTPS (i.e. authentication)?
What about using cryptographic cookies to identify your session and link that
to your user ID (after authorization)?
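
One reading of the cookie suggestion, sketched in Python: the cookie carries only a long random session token, and the server maps that token to the logged-in user. The in-memory `SESSIONS` store and the function names are hypothetical, and the rule "a user may view only their own record" is an assumption made to keep the example small.

```python
import secrets

# Hypothetical in-memory session store; a real application would persist this.
SESSIONS = {}

def login(user_id: int) -> str:
    """Create an unguessable session token after successful authentication."""
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = user_id
    return token

def current_user(cookie_value):
    """Look up the user bound to the session cookie, or None if unknown."""
    return SESSIONS.get(cookie_value)

def may_view(cookie_value, customer_id: int) -> bool:
    """Allow access only to the record belonging to the session's own user."""
    user = current_user(cookie_value)
    return user is not None and user == customer_id

sid = login(12345)
```

With this in place the customer ID in the URL no longer decides what is shown; it is only checked against what the session is allowed to see.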

'Just seems like you're not using the right tool (method) for the job here.

$-0.02

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com


Nov 12 '05 #11
David Garamond wrote:
Perhaps I can make a GUID by MD5( two random numbers || a timestamp || a
unique seed like MD5 of '/sbin/ifconfig' output)...


As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32-bit
timestamp and you have a 64-bit unique identifier that is "universally"
unique (to one second).
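
A rough Python sketch of that layout, assuming the address comes first and the timestamp is seconds since the Unix epoch, both big-endian. The address 192.0.2.10 is a documentation-range placeholder and the function name is made up for illustration.

```python
import ipaddress
import struct
import time
from typing import Optional

def host_time_id(ip: str, now: Optional[float] = None) -> bytes:
    """4 bytes of IPv4 address followed by a 32-bit timestamp in seconds."""
    ts = int(time.time() if now is None else now) & 0xFFFFFFFF
    return ipaddress.IPv4Address(ip).packed + struct.pack(">I", ts)

first = host_time_id("192.0.2.10", now=1073949347)
second = host_time_id("192.0.2.10", now=1073949347.9)  # same second
later = host_time_id("192.0.2.10", now=1073949348)
```

`first` and `second` are identical because the timestamp has one-second resolution, which is exactly the "(to one second)" caveat.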

Nov 12 '05 #12
Greg Stark wrote:
... worrying about leaking information like the size of the
customer database is usually a sign of people hoping for security through
obscurity.


To prevent the size of your database being guessed at from the serial
numbers of your customers' accounts, don't issue the numbers sequentially.

One simplistic method of non-sequential assignment is: generate a random
number between "00...00" and "99...99"*, check if it's already in use -
if not, issue it, if so, regenerate. When presenting the number, always
format it as an N-digit number with leading zeroes - for Perl
programmers, this would be achieved along the lines of
printf("%010d", $account_number)

Thus you will end up with customer numbers evenly spread over the number
space. This will prevent people inferring the size of your database (or
company) through the account numbers they observe.
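
The generate, check, retry loop described above could be sketched in Python as follows. The `in_use` set stands in for a lookup against the accounts table, and the function name is hypothetical; in a real database the uniqueness check would need to be backed by a unique constraint so that two concurrent sessions cannot issue the same number.

```python
import secrets

def new_account_number(in_use: set, digits: int = 10) -> str:
    """Pick a random unused number and format it with leading zeroes."""
    while True:
        n = secrets.randbelow(10 ** digits)   # 0 .. 10**digits - 1
        if n not in in_use:                    # regenerate on collision
            in_use.add(n)
            return f"{n:0{digits}d}"           # same effect as Perl's %010d

issued = set()
numbers = [new_account_number(issued) for _ in range(100)]
```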

To protect the customer's account from being accessed by unauthorised
persons, use form-based password access (not HTTP basic**) and/or X.509
certificates over a secure connection.

As Scotty says, "use the right tool for the right job!"

HTH
Alex Satrapa

*make the number space much larger than your expected number of
accounts. This reduces collisions in random number generation. Another
option is to increment through the number space in the event of a
collision, rather than generating another random number.
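
The alternative in the first footnote, stepping forward through the number space after a collision, might look like this in Python. The function name is hypothetical and the `in_use` set again stands in for the accounts table.

```python
import secrets

def new_account_number_probing(in_use: set, digits: int = 10) -> int:
    """Start at a random number and step forward until a free one is found."""
    space = 10 ** digits
    n = secrets.randbelow(space)
    for _ in range(space):            # at most one full pass over the space
        if n not in in_use:
            in_use.add(n)
            return n
        n = (n + 1) % space           # wrap around at the top of the space
    raise RuntimeError("number space exhausted")

# Tiny space to show the probing: with 0..8 taken, only 9 is left.
used = set(range(9))
last_free = new_account_number_probing(used, digits=1)
```

One trade-off worth noting: stepping forward tends to produce runs of adjacent numbers once the space fills up, so it hides the issue order less well than pure regeneration.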

**using form-based access, the user can log out when leaving the
terminal. Using HTTP basic, the browser is likely to remember their
login for the entire session, and sometimes even between sessions.

Nov 12 '05 #13
On Tue, Jan 13, 2004 at 10:15:47AM +1100, Alex Satrapa wrote:
**using form-based access, the user can log out when leaving the
terminal. Using HTTP basic, the browser is likely to remember their
login for the entire session, and sometimes even between sessions.


You can persuade the browser to forget the password just by sending it
a 401. Unfortunately, the user then has to know to hit 'cancel' on the
resulting dialog box.
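
As an illustration of the 401 trick, here is a minimal WSGI application in Python that could sit behind a "log out" link. The realm name and message text are placeholders, and whether a given browser actually discards the cached credentials is the poster's observation, not something this sketch guarantees.

```python
def logout_app(environ, start_response):
    """Answer every request with 401 so the browser re-prompts for credentials."""
    start_response(
        "401 Unauthorized",
        [
            ("WWW-Authenticate", 'Basic realm="example"'),  # hypothetical realm
            ("Content-Type", "text/plain; charset=utf-8"),
        ],
    )
    return [b"Logged out. Press Cancel if the browser asks for a password.\n"]
```

It can be served for a quick experiment with `wsgiref.simple_server.make_server` from the standard library.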


Nov 22 '05 #14
they can try to look up information on other customers by doing:

http://domain.com/application/load_r...tomer_id=12346
http://domain.com/application/load_r...tomer_id=12344

...basically walking the sequence. Sure, you will protect against this with
access rights, BUT...seeing the sequence is a risk and not something you want
to happen. NOW, if you use a GUID:
Security != obscurity.

While using GUIDs may make it harder to get hacked, it in no way actually
increases security. Real security comes from secure code, period.

Well, uh, you're both wrong.
On the one hand if your GUIDs are just an MD5 of a sequence then they're
just as guessable as the sequence.

It's not a question of right or wrong. It's the method. One thing I see here is
a failing to use several security methods at different layers....why are you not using HTTPS (i.e. authentication)?
What about using cryptographic cookies to identify your session and link that
to your user ID (after authorization)?

Ok, my point is not one of security as much as the obscurity. I have the
security aspect already covered whereby I only select the customer record from
the database where the logged in account has access to the record. So, if
you are not the admin or the actual customer, the select will return a code
indicating that you do not have permission to view the given record.

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc... If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?

For instance, if I generate sequential numbers for invoice ids and the customer
sees #123 as an invoice number one month and sees #128 the next month, it might
imply that there are only 4 customers getting invoiced each month.

Another example ... let's say customers can create 'Widgets' in their account.
There might be a page that lists all their 'widgets'. If you click on the
widget, you can edit it. A link to do this might look as follows:

http://.../account/widget_list.html
http://.../account/widget_edit.html?widget_id=12345

Well, if the widget_id is a sequence (global to the widget table), then by
creating one widget (WIDG_1) and later another widget (WIDG_2), the customer
could see that the widget_id increased by only

N = WIDG_2 - WIDG_1

and could therefore conclude that the number of customers creating widgets in
total does not exceed N. I don't see this as a problem of 'security' in the
sense of who can access the data so much as of who can draw conclusions about
the company behind the data.
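
The inference is simple arithmetic, sketched here in Python with hypothetical function names. Since sequences can skip values, the difference is an upper bound rather than an exact count.

```python
def rows_created_between(first_id: int, second_id: int) -> int:
    """Upper bound on rows inserted after first_id, up to and including second_id."""
    return second_id - first_id

def rows_created_by_others(first_id: int, second_id: int) -> int:
    """Same bound, excluding the observer's own second row."""
    return max(second_id - first_id - 1, 0)

# Using the invoice numbers from the example above: #123 one month, #128 the next.
total = rows_created_between(123, 128)
others = rows_created_by_others(123, 128)
```

Here `total` is 5 and `others` is 4, matching the "only 4 customers" reading of the invoice example.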

See what I mean? What do you propose as the best solution for this? Not expose
the sequences to the user and use user-enumerated ids? Then a trigger on the
table would assign ids like:

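-- assumes the table is named widget; COALESCE covers a customer's first row
SELECT COALESCE(MAX(widget_id), 0) + 1 INTO NEW.widget_id
FROM widget
WHERE cust_id = NEW.cust_id;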

But I think after several hundred customer records, this trigger would start
getting slow. I don't know really, any ideas?
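
To make the per-customer numbering concrete, here is a small Python sketch using the standard-library sqlite3 module as a stand-in for PostgreSQL. The table layout and names are assumptions. The composite primary key on (cust_id, widget_id) gives the MAX lookup an index to use, so it should not need to scan other customers' rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE widget (
        cust_id   INTEGER NOT NULL,
        widget_id INTEGER NOT NULL,   -- numbered per customer, not globally
        name      TEXT,
        PRIMARY KEY (cust_id, widget_id)
    )
    """
)

def add_widget(conn, cust_id: int, name: str) -> int:
    """Insert a widget with the next free per-customer ID and return that ID."""
    with conn:  # one transaction: commit on success, roll back on error
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(widget_id), 0) + 1 FROM widget WHERE cust_id = ?",
            (cust_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO widget (cust_id, widget_id, name) VALUES (?, ?, ?)",
            (cust_id, next_id, name),
        )
    return next_id

ids = [add_widget(conn, 1, "a"), add_widget(conn, 1, "b"), add_widget(conn, 2, "c")]
```

On a multi-user server, two sessions inserting for the same customer at once could compute the same next ID; the primary key would reject the second insert, so the caller would need to retry or serialize those inserts.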

Dante






Nov 22 '05 #15

"D. Dante Lorenso" <da***@lorenso.com> writes:
Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc... If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?


Except that's exactly the way business has always been done. Though people
usually start new accounts with check# 50000 or something like that for
precisely that reason. But it's still pretty transparent, and they don't
really worry about it too much.

What you're saying is fundamentally valid, but I tend to think these kinds of
concerns are just generically overblown.

My only comment was that just taking an MD5 of the sequence gives you no
security. At the very least you have to include a secret. Even then I suspect
there are further subtle cryptographic issues. There always are.

--
greg

Nov 22 '05 #16

----- Original Message -----
From: "Alex Satrapa" <al**@lintelsys.com.au>
As long as you don't use RFC1918 addresses, the IPv4 address(es) of the
host should be unique for the Internet. Append/prepend a 32 bit
timestamp and you have a 64bit unique identifier that is "universally"
unique (to one second).


Aarrgh... So if you have 2 inserts in the same second, you have key
collision? Why not append a sequence to that so you have: Unique address
|| timestamp || sequence value. In a case such as this I can see why you
might want to use md5() to hash that value.
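
A Python sketch of the address || timestamp || sequence layout suggested here. The field widths (4 + 4 + 4 bytes), the placeholder address and the in-process counter are assumptions; in a database the last field would come from a real sequence.

```python
import ipaddress
import itertools
import struct
import time
from typing import Optional

# Stand-in for a database sequence; only unique within this process.
_sequence = itertools.count()

def make_key(ip: str, now: Optional[float] = None) -> bytes:
    """Concatenate IPv4 address, 32-bit timestamp and 32-bit sequence value."""
    ts = int(time.time() if now is None else now) & 0xFFFFFFFF
    seq = next(_sequence) & 0xFFFFFFFF
    return ipaddress.IPv4Address(ip).packed + struct.pack(">II", ts, seq)

k1 = make_key("192.0.2.10", now=1073949347)
k2 = make_key("192.0.2.10", now=1073949347)  # same host, same second
```

`k1` and `k2` differ only in the trailing sequence field, which is what removes the same-second collision.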

Best Wishes,
Chris Travers

Nov 22 '05 #17
From: "Keith C. Perry" <ne******@vcsn.com>
Using an MD5 hash to
"hide" them will slow your app down by some delta and not protect your
connection. Granted garbling that id with a password is somewhat more secure but your connection could still be attacked or even hijacked.

In the URLs you gave above, why are you not using HTTPS (i.e. authentication)? What about using cryptographic cookies to identify your session and link that to your user ID (after authorization)?


HTTPS I can see. I am having difficulty understanding how you could use
cryptographic cookies to prevent session hijacking though, given the current
setup. Also you could use SSL between the web server and PostgreSQL to
secure that connection.

As a side question: Does PostgreSQL support using Kerberos for encrypted
connections (beyond authentication), or do you need to use SSL for that?

Best Wishes,
Chris Travers

Nov 22 '05 #18
Answers inline.

----- Original Message -----
From: "Greg Stark" <gs*****@mit.edu>
On the one hand if your GUIDs are just an MD5 of a sequence then they're just as guessable as the sequence. The attacker can try MD5 of various numbers
until he finds the one that matches his own ID (it's probably on the web site somewhere anyways) and then run MD5 himself on whatever number he feels.

On the other hand it is possible to do this right. Include a secret of some kind in the MD5 hash, something that's not publicly available. That secret is in essence the password to the scheme. Now it's not really "obscurity" any more than any password based scheme is "security through obscurity".


You still have the following problem: the PK is not really used for very
much in this case except referencing data. This is done internally
(invoices, etc), so the application is presumed to know the ID when looking
up a customer. Nothing you do will prevent any attack based on searching
the database, i.e. select customer_id from customers; if such an attack is
possible in an application. I actually think that developers should enforce
security as far back (towards the database) as possible, so if this needs to
be prevented, using a view which only provides access to the customers
required is the preferred solution. You could also use triggers.

If, however, you want a global unique id which will never collide with any
other records (e.g. for distributed server solutions), then you have
another problem: MD5 is NOT guaranteed to be unique. Think about it: if
the return digest is of a set length, then there must be many different
values which will create that same digest. Instead MD5 is designed to
prevent deliberate duplication, which is not what we are talking about here
(accidental duplication), so you may want to be cautious about hashing
your keys. In this case, a more open, transparent key would be better. For
example:

machine identifier || sequence.

You *could* hash these, but it is unnecessary and may actually create
collisions if the machine identifier is sufficiently large. However,
mac_address || ipv4 address should be sufficient, I would think. It would
still be attackable in your view, so you could add a timestamp :-) but
again, I see limited utility of guids as a security feature.
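
The fixed-length-digest argument can be made visible with a deliberately shortened hash. This Python sketch keeps only the first two bytes of each MD5 digest (an artificial truncation chosen for illustration) and searches for two different inputs that share it.

```python
import hashlib

def short_digest(value: int, nbytes: int = 2) -> bytes:
    """First nbytes of the MD5 digest; truncated only to make collisions easy to see."""
    return hashlib.md5(str(value).encode("ascii")).digest()[:nbytes]

def first_collision(nbytes: int = 2):
    """Return the first pair of distinct integers with the same short digest."""
    seen = {}
    n = 0
    while True:
        d = short_digest(n, nbytes)
        if d in seen:
            return seen[d], n
        seen[d] = n
        n += 1

a, b = first_collision()
```

With two bytes there are only 65,536 possible digests, so a collision must appear within 65,537 inputs and in practice appears much sooner. The same counting argument applies to the full 128-bit digest, although accidental collisions there are vastly less likely.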

Best Wishes,
Chris Travers

Nov 22 '05 #19
Sounds to me you have concerns more along the lines of counterintelligence.

Maybe a better example of my problem is with records throughout the system
like invoices, customer data, etc... If any of these items use a sequence
and that sequence is global to the table in the database and the number is
exposed externally, then it is possible to infer the success of the company
underneath, is it not?

IMO, the solution here is to start your sequences at an arbitrary value
(preferably not round) such as 1543691. Therefore the first customer
doesn't know that you don't have 1.5M other customers :-) This could be
calculated for each sequence with a formula such as
SELECT (random() * 1000000 + 1000000)::bigint;
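
The same calculation done on the application side, as a Python sketch. The sequence name `customer_id_seq` is hypothetical, and the generated statement is only built as a string here, not executed.

```python
import secrets

def random_sequence_start(low: int = 1_000_000, high: int = 2_000_000) -> int:
    """Pick a non-round starting value in [low, high), like the SQL formula above."""
    return low + secrets.randbelow(high - low)

start = random_sequence_start()

# Hypothetical sequence name; run the statement once when setting up the schema.
statement = f"ALTER SEQUENCE customer_id_seq RESTART WITH {start}"
```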

For instance, if I generate sequential numbers for invoice ids and the customer
sees #123 as an invoice number one month and sees #128 the next month, it might
imply that there are only 4 customers getting invoiced each month.

Another solution I have seen is to use a formula for your invoices based on:
Letter key for invoice type followed by YYYYMMDD followed by a numeric
sequence. This also helps to obscure things since the customer may not know
how often you reset the sequence (could be every month, or every day). The
letter key can uniquely identify your server on your network thereby
creating a GUID. In other words your sequence need only be unique to a
given time frame. You could even add a timestamp and a sequence that wraps
around after 9 :-) That way as long as you don't create 10 invoices in the
same second you are OK.
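
A small Python sketch of the letter + YYYYMMDD + sequence format. The four-digit sequence width and the function name are assumptions; the post does not say how wide the numeric part should be.

```python
import datetime

def invoice_number(type_key: str, day: datetime.date, seq: int, width: int = 4) -> str:
    """Build e.g. 'A' + '20040113' + '0007' from type key, date and sequence."""
    return f"{type_key}{day:%Y%m%d}{seq:0{width}d}"

example = invoice_number("A", datetime.date(2004, 1, 13), 7)
print(example)  # prints: A200401130007
```

Because the date is part of the key, the sequence only has to be unique within whatever period it is reset for.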
http://.../account/widget_list.html
http://.../account/widget_edit.html?widget_id=12345

Provided that each customer is only creating one widget at a time, you could
then take the customer_id and append to it a value of a customer-specific
sequence. You could even have this as a compound primary key. That way,
each customer can only determine how many widgets they have created :-)

See what I mean? What do you propose as the best solution for this?


Create GUIDs which contain only the information you want. No need to hash.
See above for examples.

Best Wishes,
Chris Travers

Nov 22 '05 #20
