
Bytea/Large Objects/external files best practices

I think those best practices threads are a treat to follow (might even
consider archiving some of them in a sort of best-practices faq), so
here's one more.

While coding a game asset server, I want to keep a large number of file
revisions of varying sizes (1 KB to 50 MB) inside the database.

Naturally I want to avoid having to allocate whole buffers of 50Mb too
often.

I am also throwing around rsync-style deltas (with librsync), which I
run through functions to generate new objects. The librsync functions
have an elegant buffer-oriented approach to reading & writing data, so I
need a way to read through my objects inside the database, again
without having them allocated in RAM too often.

I tried browsing around but couldn't find any good discussion of this
(or just plain good advice), though that may be due to
google-impairment on my part.

I see several options:
1) Bytea column. Seems the cleanest solution, but
*) it includes lots of escaping-unescaping with associated allocation
of huge buffers.
*) I seem to remember reading in a discussion in the [hackers] list
that the TOAST column substring operation had been optimized so that
partial fetches might be viable.
*) Incremental inserts (using a series of "update file set data = data
|| moredata where id=666") probably won't scale nicely.
*) However it's fully ACID,
*) and recommended by the docs, if that's a good reason.
*) TOAST compression. Can't quite figure out if this happens, but in
many cases it would save a lot of disk space & access.
*) passing them as arguments to the librsync functions may be a hassle.
open questions: I wonder what the timeframe for this becoming viable
is. Two things seem to be in the works that would fix it: a wire
protocol that avoids having to escape all communications, and methods
for fast incremental update. Both seem to be at least a year away (Pg
7.5, and someone even suggested jumping to 8.0 with the wire-protocol
changes). A sketch of the partial-fetch idea follows.
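
To make the partial-fetch point concrete, here's a minimal sketch,
assuming a hypothetical table file(id serial, data bytea) with one
revision per row:

    -- read 64 KB starting at byte offset 1048576 without shipping
    -- (or escaping) the whole value; bytea offsets are 1-based
    SELECT substring(data FROM 1048577 FOR 65536)
      FROM file
     WHERE id = 666;

Whether the server really touches only the needed TOAST slices depends
on compression being disabled for the column.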

2) Bytea column with many rows per file.
*) Also some escaping-unescaping, but with small buffers
*) Simple and fast to do partial inserts/selects.
*) Full ACID (and with a couple of triggers, also easy to keep
completely correct)
*) Butt ugly solution, but what we used to do with ye olde versions of
MSQL.
*) Compression, but probably lesser space savings with smaller chunks.
open questions: What would the chunk-size be? I would imagine 5-20Kb
per row.
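
For the record, a minimal sketch of this, with made-up names and a
fixed chunk size of 8192 bytes:

    CREATE TABLE file_chunk (
        file_id integer NOT NULL,
        seq     integer NOT NULL,  -- chunk number, starting at 0
        data    bytea   NOT NULL,
        PRIMARY KEY (file_id, seq)
    );

    -- appending is just inserting the next chunk
    INSERT INTO file_chunk (file_id, seq, data)
        VALUES (666, 0, 'first chunk of bytes');

    -- partial read: fetch only the chunks covering bytes
    -- [1048576, 1114112) of file 666
    SELECT seq, data
      FROM file_chunk
     WHERE file_id = 666
       AND seq BETWEEN 1048576 / 8192 AND (1114112 - 1) / 8192
     ORDER BY seq;

Incremental inserts then scale fine, since each statement only touches
one small chunk.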

3) Large Objects.
*) Requires a special interface to talk to them; not that it's so bad.
*) Need a trigger to garbage collect them (simple to do though;
sketched below)
*) Fully ACID.
*) I've actually implemented the librsync functions using this
already, and it seems to work nicely with large files.
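
The garbage-collection trigger might look roughly like this (a sketch
with made-up names; IIRC contrib/lo also ships a ready-made lo_manage
trigger for exactly this, and plpgsql must be installed):

    CREATE TABLE asset (
        id      serial PRIMARY KEY,
        content oid    -- large object holding the revision data
    );

    -- unlink the large object when its row is deleted or repointed
    CREATE FUNCTION asset_lo_gc() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''DELETE'' THEN
            PERFORM lo_unlink(OLD.content);
            RETURN OLD;
        END IF;
        IF OLD.content IS DISTINCT FROM NEW.content THEN
            PERFORM lo_unlink(OLD.content);
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER asset_lo_gc
        BEFORE UPDATE OR DELETE ON asset
        FOR EACH ROW EXECUTE PROCEDURE asset_lo_gc();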

4) External files. I really don't like the idea of this.
*) As high performance as anything.
*) Need an alternate mechanism to access the server, with security
implications etc.
*) Complications trying to maintain more-or-less full ACIDity
open questions: What tricks to use to keep this ACID-ish? I won't have
any updating of content once it's committed, so just keeping the files
read-only most of the time would do most of the trick. Also, I imagine
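
A minimal sketch of one way to do the bookkeeping (made-up names;
assumes files are written and fsynced before the transaction that
records them commits, and never modified afterwards):

    -- the row, not the file, is the commit point
    CREATE TABLE external_file (
        id   serial PRIMARY KEY,
        path text NOT NULL,  -- relative to the asset root
        md5  text NOT NULL   -- checksum taken at write time
    );

    -- writer protocol, client side:
    --   1. write bytes to a temp name, fsync, rename into place
    --   2. INSERT the row above and COMMIT
    -- done in that order, a committed row always has its file;
    -- a file without a matching row is garbage and can be swept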

I'm going with 3) for now, but am even tempted by 2) when looking this
over.

At one point I was playing around with 4), and made a plperlu server to
run inside the database to ease this. The security was implemented by
the client receiving a ticket from the database when requesting to
create a file, and then having to send this ticket before being allowed
to send any data to the perl-server. Not an un-clever system, I think,
and one I'd be happy to share.

I wonder what other people are doing and if anyone has other arguments.

David Helgason,
Over the Edge Entertainments


> I think those best practices threads are a treat to follow (might even
> consider archiving some of them in a sort of best-practices faq), so
> here's one more.

Please do.

> 1) Bytea column. Seems the cleanest solution, but
> *) I seem to remember reading in a discussion in the [hackers] list
> that the TOAST column substring operation had been optimized so that
> partial fetches might be viable.

Works. Not sure what happens on the wire, though.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


David Helgason <da***@uti.is> writes:
> *) I seem to remember reading in a discussion in the [hackers] list
> that the TOAST column substring operation had been optimized so that
> partial fetches might be viable.
I believe that works now for both text and bytea columns. However the
cost is that you have to store the data uncompressed (by disabling the
normal automatic compression, see ALTER COLUMN SET STORAGE). Pick
your poison...
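
For the archives, that knob looks like this (hypothetical table and
column names):

    -- EXTERNAL = out-of-line storage allowed, compression disabled,
    -- which is what makes cheap substring() slices possible
    ALTER TABLE file ALTER COLUMN data SET STORAGE EXTERNAL;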
> open questions: I wonder what the timeframe for this becoming viable
> is. Two things seem to be in the works that would fix it: a wire
> protocol that avoids having to escape all communications, and methods
> for fast incremental update.
Binary wire protocol is done in 7.4 (at least if you are using libpq;
not sure about the state of binary protocol support in other client
libraries). Incremental update does remain an unsolved problem.
> 2) Bytea column with many rows per file.
> open questions: What would the chunk-size be? I would imagine 5-20Kb
> per row.
You'd want to keep it under 2Kb per row to avoid triggering TOASTing of
any single row. What you're doing here amounts to hand TOASTing, and
you don't want to pay the penalty of another layer of the same mechanism
underneath. (Alternatively you might be able to set the storage option
so that compression is allowed but out-of-line storage is not, even for
rows up to 8K. I forget what the options are exactly.)
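
A sketch of that alternative, assuming the chunk table from 2) is
called file_chunk as sketched upthread, and that the option in question
is MAIN (which, per the docs, allows compression but not out-of-line
storage):

    -- compress if useful, but keep chunk data in the main table
    ALTER TABLE file_chunk ALTER COLUMN data SET STORAGE MAIN;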
> 4) External files. I really don't like the idea of this.
> open questions: What tricks to use to keep this ACID-ish?


AFAICS, you can't guarantee much of anything if you do this.

regards, tom lane

