473,657 Members | 2,550 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How can mySQL only use 1 byte for storage?

>From the manual "Storage Requirements":

"ENUM('value1', 'value2',...) =1 or 2 bytes, depending on the number
of enumeration values (65,535 values maximum)"

This seems to mean:

"a" = 1 byte of storage
"prestidigitati on" = 1 byte of storage

and

enum("a","b","c ","d") = 4 bytes of storage
enum("prestidig itation","sleig ht of hand","disappea rances","card
tricks") = 4 bytes of storage

How is this possible? I'm trying to get my head around WHY these values
in an ENUM-type column occupy the same amount (1 byte or 4 bytes) of
storage space (<256 enumeration values).

Am I misreading the manual? Is it just referring to the index, and if
so, what accounts for the string storage? If not, doesn't the index
take up 1 byte per value, itself? How can these strings occupy the same
amount of storage? Is there some sort of serialization going on inside
a MyISAM table?

Thanks in advance for any illumination.

Aug 14 '06 #1
7 4182
StupidScript wrote:
enum("a","b","c ","d") = 4 bytes of storage
No, that would only take 1 byte of storage. For example:
"a" = 0000 in binary,
"b" = 0001,
"c" = 0010,
"d" = 0011

Actually only 2 bits are used to record which of four values are used.
But it's easier to round up to whole bytes, so even though only 2 bits
are being used, the implementation "generously " allocates 1 byte.

Somewhere else, the mapping from binary values to the four strings is
stored, but it is stored only once. No need to store it for every row
that has an ENUM column. When you query the column, it looks up the
string corresponding to the binary pattern, and returns that string.
enum("prestidig itation","sleig ht of hand","disappea rances","card
tricks") = 4 bytes of storage
That does require more storage for the mapping between binary values and
the strings, but only once. It requires only 2 bits (or round up to 1
whole byte) per row.

Regards,
Bill K.
Aug 14 '06 #2
>>From the manual "Storage Requirements":

This is the storage requirements *FOR A ROW OF DATA*. The table
definition is not included in each row of data. Strings are not
included in each row of data because they do not appear there.

It is my observation that table definitions take between 8.5K and
10.5K for the tables I've got, which don't go overboard with strings
in enums. Table definitions don't change size with the number of
rows stored.
>"ENUM('value1' ,'value2',...) =1 or 2 bytes, depending on the number
of enumeration values (65,535 values maximum)"

This seems to mean:

"a" = 1 byte of storage
"prestidigitat ion" = 1 byte of storage

and

enum("a","b"," c","d") = 4 bytes of storage
enum("prestidi gitation","slei ght of hand","disappea rances","card
tricks") = 4 bytes of storage

How is this possible?
It's easy. Number the possible values. Store the index in the record.
MySQL has to do some translation on the way in and out, a tradeoff
of a little CPU time for reduced disk storage.

>I'm trying to get my head around WHY these values
in an ENUM-type column occupy the same amount (1 byte or 4 bytes) of
storage space (<256 enumeration values).
How come "Sue Li" and "Alexander Brownschweiger
Supercalifragil isticexpialidoc ious" both have 9-digit Social Security
numbers?
>Am I misreading the manual? Is it just referring to the index, and if
so, what accounts for the string storage?
As I understand it, it's referring to a data row (*.MYD for MyISAM).
It does *NOT* include the index (*.MYI for MyISAM), if the enum field
is in an index. And the table definition (*.frm for MyISAM) also isn't
included. If you really need to sweat over the disk storage used
by a table definition, you have much more serious problems to worry
about (like logs).

Do you really care about the length of the strings, which are stored
once *per table*, in a table with a few hundred million rows where the
data field is stored *per row*?
>If not, doesn't the index
take up 1 byte per value, itself?
No. Values not indexed don't take any storage at all and values that
are indexed certainly take more than one byte per value even if that
value is only one byte.
>How can these strings occupy the same
amount of storage?
The strings aren't stored in each row.
>Is there some sort of serialization going on inside
a MyISAM table?

Thanks in advance for any illumination.

Aug 14 '06 #3
Bill:
>Somewhere else, the mapping from binary values
to the four strings is stored, but it is stored
only once."
But where is it stored only once? And why doesn't that data retention
add to the storage requirements for a set of variable string data, like
an ENUM row?

Gordon:
>It's easy. Number the possible values. Store the
index in the record. MySQL has to do some translation
on the way in and out, a tradeoff of a little CPU
time for reduced disk storage.
Translation ... from the index mySQL stores to the string value ...
which is stored ... where? And how is it reduced, if there needs to be
an additional 1-2 byte associative element generated and stored for
each element in the ENUM array?
>How come "Sue Li" and "Alexander Brownschweiger
Supercalifragi listicexpialido cious" both have 9-digit
Social Security numbers?
Unless every string is reduced to some predictable-length value, such
as that used to count bytes-stored, as we are discussing, how could a
simple binary string of 1-2 byte length be capable of associating a
predictable 9-digit code (SSN) with a variable User Name? I mean, the
second name you used in your example is 22 characters! I can imagine
much longer strings. Doesn't that need to be stored, somewhere?

Somewhere the name is associated with the number, and the length of
that name's characters would seem to take up additional storage space
.... somewhere. Maybe there's serialization occuring, in order to
standardize the string length. I dunno. But there's more than 1-2 bytes
involved, isn't there?

I'm delighted with the avenues these comments are making into my brain,
and thank you both very much. I'm just wondering where my understanding
of 'storage' should break with the laws of physics?

Seriously, though, thanks. Good stuff! :)

Aug 15 '06 #4
Okay, how about this? (No C.S. degree ... just guessing ...)

1) mySQL does, indeed, serialize both the individual elements of the
ENUM row (!) and the aggregated total of those values and then adds an
index for itself to minimize its own seek-deliver activity ... dunno
exactly how, but that's my working theory ... all taking up 1-2 bytes
of total storage space per ENUM row (1 value).

2) Therefore, the ENUM-type column is better-suited for short,
relatively predictable types of values, like "one", "two", "three" and
"true", "false", and not-so-good for pages of text or image data, etc.

Here's why:

Jimmy Crunch gets it into his head that he should store EVERY value he
needs in ENUM-type columns because, after all, it only uses 1-2 bytes
for the whole row! (1 value) The problem is, serializing gets lossy ...
it can't quite reconstruct the long-ish strings right, so it fudges a
little. (Y'all know what I mean ...)

So Mr. Crunch decides to use ENUM columns for short, predictable string
values, and BLOB and TEXT and VARCHAR and, et al. for the other
datatypes, because those types of storage simply do the job better, for
his data storage needs.

I know it seems like I'm hooked on serial, but whaddya think? Something
like that?

Regards,

James

Aug 15 '06 #5
StupidScript wrote:
Bill:
>Somewhere else, the mapping from binary values
to the four strings is stored, but it is stored
only once."

But where is it stored only once? And why doesn't that data retention
add to the storage requirements for a set of variable string data, like
an ENUM row?

Somewhere the name is associated with the number, and the length of
that name's characters would seem to take up additional storage space
... somewhere.
They're stored as part of the table definition. Just as when you store
a column with type VARCHAR(200), the table definition stores the value
200 as the maximum length of that column. An ENUM column must store all
the possible values (strings).

In the case of MySQL, it goes in the .FRM file for a given table, which
is where it stores all information about the columns for the respective
table. Try it:

CREATE TABLE test.test_enum (
col1 ENUM("prestidig itation", "sleight of hand",
"disappearances ","card tricks")
);
EXIT;

Now go look at "<datadir>/test/test_enum.FRM":

$ cd <datadir>/test
$ od -c test_enum.FRM
.. . .
0020520 004 \0 005 c o l 1 \0 004 005 020 020 \0 002 \0 \0
0020540 \b 201 020 \0 001 367 \b \0 \0 377 c o l 1 377 \0
0020560 377 p r e s t i d i g i t a t i o
0020600 n 377 s l e i g h t o f h a n
0020620 d 377 d i s a p p e a r a n c e s
0020640 377 c a r d t r i c k s 377 \0
.. . .

"od" is the UNIX command for "octal dump", or output a binary file with
unprintable characters represented by their base-eight value.

Notice how the values are separated with a byte with octal value 377.
This is apparently how MySQL knows where each enum value terminates.
Then at the end of all these values, there's an 0 byte value. I assume
that MySQL uses the ordering of these strings to determine the value to
store in rows of data.

I don't know the purpose of all those other bytes, but one can assume
they say other things about the column, like the fact that it is an
ENUM, the position of the column within the table, whether it is NOT
NULL or not, etc.
I'm delighted with the avenues these comments are making into my brain,
and thank you both very much. I'm just wondering where my understanding
of 'storage' should break with the laws of physics?
I think you are making this harder than it needs to be! :-)

A similar technique is used by GIF image files. You can have 256
distinct colors in the palette in a GIF file, but you can choose any
24-bit color for each of those palette entries. The palette is stored
near the beginning of the file, before the image data, that maps values
in the image data to the 24-bit RGB values. Therefore a GIF image
stores 1 byte per pixel, even though the color is a 3-byte RGB value.
There's a small amount of overhead (256 x 3 bytes) near the beginning of
the file. Any software that reads GIF images knows to use that palette
data as an index for the image data, and therefore knows how to display
the right color.

Regards,
Bill K.
Aug 15 '06 #6
StupidScript wrote:
Jimmy Crunch gets it into his head that he should store EVERY value
he needs in ENUM-type columns because, after all, it only uses 1-2
bytes for the whole row!
I wouldn't use ENUM for storing a large number of distinct strings.
You'd be likely to make the .FRM file very large, and that could have
its own scalability problems. The strings should never be lossy, but
they are probably managed with linear search, instead of hashing or
indexing.

Also, any time you add a new value would constitute a schema change. In
MySQL, this causes the table to be _restructured_ (making a copy of the
whole table and then dropping the original). Changing the values of an
ENUM shouldn't require this, but MySQL isn't very smart about optimizing
schema changes. There are many cases which shouldn't require a table
restructure, but MySQL does one anyway.

An ENUM should be used for a column that has a small number of distinct
values, which change very seldom (preferably never). For example:
yes/no, male/female, red/yellow/green (states of a traffic light),
january through december (month names), etc. I wouldn't use ENUM for an
arbitrary text column.

In fact, I don't think I've ever actually used an ENUM in a real
project. It's not worth the trouble.

Anything that ENUM would be used for, usually is better implemented as a
foreign key referencing a lookup table. Then it's very easy to add a
new value to the set, query the set of permitted values, add attributes
or descriptions to each value, etc.

Being overly meticulous with storage efficiency is not worth the effort.
Disk drives are incredibly cheap these days -- on the order of pennies
per gigabyte. You're better off spending your effort on designing a
simple physical data model, and implementing it in a way that makes it
easy to work on.

Regards,
Bill K.
Aug 15 '06 #7
Many, many thanks, Bill.

I believe I have a handle on how ENUM fields operate, and why they only
take 1-2 bytes in the table's DATA storage. The table definitions
storage requirement (which would be of a variable amount depending on
the ENUM's definition values) is insignificant.

I also appreciate the insight into using alternative field types for
efficiency, scalability and portability.

Great answers! Thanks.

James

Aug 15 '06 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
3253
by: Xenophobe | last post by:
I have a series of checkboxes and need to store their values in a MySQL table. I'm wondering what data type people recommend: varchar or text, exp. "True/False", "Yes/No" tinyint, exp. "0/1" Maybe it depends on the usage or maybe it doesn't matter and/or is a choice of personal preference.
4
25450
by: Ced | last post by:
Hi, i'm not an expert in C but i try to compile BTNG software under linux kernel 2.4.2-2. I get these errors at the very first stage. Does someone could have a rapid look on this and tell me what's wrong regards I get this error:
9
2433
by: CptDondo | last post by:
I am working on an embedded platform which has a block of battery-backed RAM. I need to store various types of data in this block of memory - for example, bitmapped data for control registers, strings for logging, and structures for data points. I want to use one function to read data from this block and one function to write data, for example: sram_read(OBJECT_IDENTIFIER) would return a pointer to the appriate object and
1
2563
by: Eþref DURNA | last post by:
hello, I have a mysql database(which I could not modify) and there is a Field type of blob which exuals to byte at mysql. I could not read the data from mysql to get that byte values and convert to string. Thanks for your help Esref DURNA esref_durna@mircosoft.com
110
10553
by: alf | last post by:
Hi, is it possible that due to OS crash or mysql itself crash or some e.g. SCSI failure to lose all the data stored in the table (let's say million of 1KB rows). In other words what is the worst case scenario for MyISAM backend? Also is it possible to not to lose data but get them corrupted?
3
1905
by: ScarletPimpernal | last post by:
Here are my Storage Engine Status. mysql> SHOW ENGINES; +------------+---------+----------------------------------------------------------------+ | Engine | Support | Comment | +------------+---------+----------------------------------------------------------------+ | MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | | MEMORY | YES |...
0
12885
Coldfire
by: Coldfire | last post by:
Since i cannot show the differences in a two-column like table. I am first putting MS SQL Server 2005 and then MySQL 5.x. MS SQL Server 2005 Brief Overview - SQL Server is a full-fledged database system developed specifically for large enterprise databases. All advanced features of a relational database are fully implemented. - Once you purchase the product, you are only limited to the Sybase-derived engine.
3
18155
by: blackevanuz | last post by:
Hi I want to storage a txt file into mysql using a blob and after that extract it, the code i use to save it is: fs = new FileStream("C:\\Documents and Settings\\Pru.txt", FileMode.Open, FileAccess.Read); FileSize = fs.Length; rawData = new byte; fs.Read(rawData, 0, Convert.ToInt32(FileSize)); fs.Close(); SQL = "INSERT INTO EEG (eeg)values ('" + rawData...
2
1358
TheServant
by: TheServant | last post by:
Hi guys, Hope this finds you all well. I have been away (getting married and taking a break from everything) but I am back now and I have need of some more experienced php guru's: I am trying to make an in-game chat system which will create a new chatroom when a user creates a group. I have a couple options and I want to know which is best in your opinion: 1. Create a single table for all chatrooms and include 100 columns for messages...
0
8425
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8743
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
8622
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7355
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6177
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5647
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4173
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
2
1973
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
2
1736
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.