
writing bits to file

Hello,

I am working on a compression project and I want to write ASCII characters using
the minimum number of bits. Since I will only be writing ASCII characters in the
range 0-127, I need just 7 bits to represent each character. However, if I write
one character at a time, I end up writing 8 bits per character.

One method would be to somehow concatenate all the 7-bit words I am trying to
write and just pad the last byte.

E.g. take the first 7-bit word, append one bit from the next word, and write
the byte. Then take the remaining six bits of that word, append two bits from
the word after it, and write the byte, and so on until I'm finished.

The method seems overly complicated, and I was wondering whether there are any
bit libraries out there that I could take advantage of. Maybe you know of
another way for me to achieve this.

Thanks,
Matt
May 21 '06 #1
Write a function

int squashASCII(unsigned char *out, char *in)

to squash a string. Return the number of bytes, which will be 1/8th less
than the input, -1 on error (if someone passes a non-ASCII character).

It should not be at all difficult to do.
--
Buy my book 12 Common Atheist Arguments (refuted)
$1.25 download or $7.20 paper, available www.lulu.com/bgy1mm
May 21 '06 #2
Malcolm wrote:
Write a function

int squashASCII(unsigned char *out, char *in)

to squash a string. Return the number of bytes, which will be 1/8th less
than the input, -1 on error (if someone passes a non-ASCII character).

It should not be at all difficult to do.


Looks like that's what I will have to do. What I really want is a bitbuffer! I
was browsing around on the internet and I found one used by tooLAME (an mpeg
encoder) which hopefully won't be too complicated to integrate into my project.
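
A bit buffer of the kind mentioned here can be quite small. The following is a minimal sketch, not tooLAME's code: the names bitwriter, bw_init, bw_put and bw_flush are made up for illustration, and it assumes 8-bit bytes, fields of at most 16 bits, and most-significant-bit-first packing with zero padding in the last byte.

```c
#include <stdio.h>

/* Hypothetical bit buffer: collects bits in an accumulator and writes
 * each completed 8-bit byte to a stdio stream. */
typedef struct {
    FILE *fp;
    unsigned long acc;   /* pending bits, newest in the low end */
    int nbits;           /* number of pending bits, < 8 between calls */
} bitwriter;

void bw_init(bitwriter *bw, FILE *fp)
{
    bw->fp = fp;
    bw->acc = 0;
    bw->nbits = 0;
}

/* Append the low n bits of value (1 <= n <= 16), high bit first.
 * Returns 0 on success, -1 on a write error. */
int bw_put(bitwriter *bw, unsigned value, int n)
{
    bw->acc = (bw->acc << n) | (value & ((1UL << n) - 1));
    bw->nbits += n;
    while (bw->nbits >= 8) {
        bw->nbits -= 8;
        if (putc((int)((bw->acc >> bw->nbits) & 0xFF), bw->fp) == EOF)
            return -1;
    }
    bw->acc &= (1UL << bw->nbits) - 1;   /* keep only unwritten bits */
    return 0;
}

/* Pad the final partial byte with zero bits and write it.
 * Returns 0 on success, -1 on a write error. */
int bw_flush(bitwriter *bw)
{
    if (bw->nbits > 0) {
        int byte = (int)((bw->acc << (8 - bw->nbits)) & 0xFF);
        bw->acc = 0;
        bw->nbits = 0;
        if (putc(byte, bw->fp) == EOF)
            return -1;
    }
    return 0;
}
```

Typical use would be bw_init(&bw, fp), then bw_put(&bw, c, 7) for each character, then bw_flush(&bw) before closing the file. The reader has to know how many characters were stored, since the zero padding is otherwise indistinguishable from data.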
May 21 '06 #3
A few years ago, a guy on sci.crypt defended his use of gcc extensions on
the grounds that his desire to use 19-bit integers could not be met in ISO
C. Naturally enough, then, I spent a few minutes cutting some ISO C code
that would let him do just that. (It's a bit of a hack, as you'll see in a
moment, but it should work fine on a C8S16IL32 system, which is almost
certainly what you're using.)

It occurs to me that you could easily adapt it to your purposes. Set up a
buffer - an array of unsigned char - that is at least (7 * N + 7) / 8 bytes in
size, where N is the number of ASCII characters you want to crunch down (N
characters at 7 bits each, rounded up to whole 8-bit bytes), and write to it
using Put_nBit_Int with n set to 7. Once you've finished, the array of unsigned
char holds your "packed" data (and possibly some spare bits, so you should
beware of that - maybe retain (binary) 0000000 as a terminator). You can
store or send that. At the receiving end, use Get_nBit_Int to retrieve the
crunched characters.

The main() function at the bottom should give you a rough idea of how to use
the nbit functions.

You are welcome to use what follows, without payment, provided you credit me
in the source code.

#include <stdio.h>
#include <limits.h>

#define SET_BIT(a, n) (a)[(n) / CHAR_BIT] |= \
(unsigned char)(1U << ((n) % CHAR_BIT))

#define CLEAR_BIT(a, n) (a)[(n) / CHAR_BIT] &= \
(unsigned char)(~(1U << ((n) % CHAR_BIT)))

#define TEST_BIT(a, n) (((a)[(n) / CHAR_BIT] & \
(unsigned char)(1U << ((n) % CHAR_BIT))) ? 1 : 0)

/* Debugging function, used for printing len * CHAR_BIT
 * bits from s.
 */
int print_bits(unsigned char *s, int len)
{
    int i, j;
    for(i = 0; i < len; i++)
    {
        for(j = 0; j < CHAR_BIT; j++)
        {
            printf("%d", TEST_BIT(s, i * CHAR_BIT + j));
        }
        printf(" ");
    }
    printf("\n");
    return 0;
}

unsigned int BitsInUnsignedInt(void)
{
    static unsigned int answer = 0;
    unsigned int testval = UINT_MAX;
    if(answer == 0)
    {
        while(testval > 0)
        {
            ++answer;
            testval >>= 1;
        }
    }

    return answer;
}

/* This function gets the Indexth n-bit unsigned int field from the bit
 * array. To do this, it builds the unsigned int value bit by bit.
 *
 * Example call:
 *
 * unsigned int val;
 * val = Get_nBit_Int(MyBitArray,  this is the base address
 *                    19,          get a 19-bit number
 *                    13,          get the 14th number (0 to max - 1)
 *                    7);          skip 7 leading bits at the start of
 *                                 the array
 */
unsigned int Get_nBit_Int(unsigned char *BitArray,
                          unsigned int n,
                          unsigned int Index,
                          unsigned int BaseBit)
{
    unsigned int Value = 0;
    unsigned int j;
    unsigned int i = Index * n;

    if(n <= BitsInUnsignedInt())
    {
        i += BaseBit;
        BitArray += i / CHAR_BIT;
        i %= CHAR_BIT;

        for(j = 0; j < n; j++)
        {
            /* Move the populated bits out of the way.
             * Yes, this means that the first iteration
             * of the loop does a useless shift. I think
             * I can live with that. :-)
             */
            Value <<= 1;

            /* Populate the low bit */
            Value |= TEST_BIT(BitArray, i + j);
        }
    }

    return Value;
}

void Put_nBit_Int(unsigned char *BitArray,
                  unsigned int n,
                  unsigned int Index,
                  unsigned int BaseBit,
                  unsigned int Value)
{
    unsigned int j;
    unsigned int i = Index * n;

    /* Same width check as Get_nBit_Int, rather than a hard-coded 32 */
    if(n <= BitsInUnsignedInt())
    {
        i += BaseBit;

        BitArray += i / CHAR_BIT;
        i %= CHAR_BIT;

        j = n;
        while(j--)
        {
            /* Use the rightmost bit */
            if(Value & 1)
            {
                SET_BIT(BitArray, i + j);
            }
            else
            {
                CLEAR_BIT(BitArray, i + j);
            }
            /* Throw the rightmost bit away, moving the next bit into
             * position. On the last iteration of the loop, this
             * instruction is pointless. <shrug>
             */
            Value >>= 1;
        }
    }
}

int main(void)
{
    unsigned char test_array[9] = {0};

    print_bits(test_array, 9);
    printf("Storing the 19-bit value 0x7FFFF starting at bit 3.\n");
    Put_nBit_Int(test_array, 19, 0, 3, 0x7FFFF);
    print_bits(test_array, 9);
    printf("Retrieving the 19-bit value starting at bit 3: %X\n",
           Get_nBit_Int(test_array, 19, 0, 3));
    printf("Storing the 19-bit value 0x7EDCB starting at bit 3 + (2 * 19).\n");
    Put_nBit_Int(test_array, 19, 2, 3, 0x7EDCB);
    print_bits(test_array, 9);
    printf("Retrieving the 19-bit value starting at bit 3 + (2 * 19): %X\n",
           Get_nBit_Int(test_array, 19, 2, 3));

    return 0;
}
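
For the 7-bit ASCII case, the same bit-by-bit idea might be applied as below. This is only a sketch under stated assumptions: packed_size, pack7 and unpack7 are hypothetical names, the two macros are simplified copies of the ones above so that the example stands alone, and CHAR_BIT is assumed to be 8 when sizing the buffer.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SET_BIT(a, n)  ((a)[(n) / CHAR_BIT] |= \
                        (unsigned char)(1U << ((n) % CHAR_BIT)))
#define TEST_BIT(a, n) (((a)[(n) / CHAR_BIT] >> ((n) % CHAR_BIT)) & 1U)

/* Bytes needed to hold n 7-bit codes, assuming 8-bit bytes. */
size_t packed_size(size_t n)
{
    return (7 * n + 7) / 8;
}

/* Pack len characters from in into out, which must have room for
 * packed_size(len) bytes. Returns 0 on success, or -1 if a character
 * lies outside 0..127. */
int pack7(unsigned char *out, const char *in, size_t len)
{
    size_t i, bit = 0;
    int k;

    memset(out, 0, packed_size(len));
    for (i = 0; i < len; i++) {
        unsigned c = (unsigned char)in[i];
        if (c > 127)
            return -1;
        for (k = 6; k >= 0; k--, bit++)   /* high bit of the code first */
            if ((c >> k) & 1U)
                SET_BIT(out, bit);
    }
    return 0;
}

/* Unpack len 7-bit codes from in into out and NUL-terminate the result.
 * out must have room for len + 1 chars. */
void unpack7(char *out, const unsigned char *in, size_t len)
{
    size_t i, bit = 0;
    int k;

    for (i = 0; i < len; i++) {
        unsigned c = 0;
        for (k = 0; k < 7; k++, bit++)
            c = (c << 1) | TEST_BIT(in, bit);
        out[i] = (char)c;
    }
    out[len] = '\0';
}
```

Here the character count is passed explicitly instead of relying on a 0000000 terminator, which avoids confusing padding bits in the last byte with data.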

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 21 '06 #4
Richard Heathfield <in*****@invalid.invalid> writes:
[...]
A few years ago, a guy on sci.crypt defended his use of gcc extensions on
the grounds that his desire to use 19-bit integers could not be met in ISO
C. Naturally enough, then, I spent a few minutes cutting some ISO C code
that would let him do just that. (It's a bit of a hack, as you'll see in a
moment, but it should work fine on a C8S16IL32 system, which is almost
certainly what you're using.) [...]

I haven't studied the code in any depth, but I'll bet you could make
it portable, at least to systems with CHAR_BIT==8, by judicious use of
<stdint.h> or equivalent.

[...]
You are welcome to use what follows, without payment, provided you credit me
in the source code.

#include <stdio.h>
#include <limits.h>

#define SET_BIT(a, n) (a)[(n) / CHAR_BIT] |= \
(unsigned char)(1U << ((n) % CHAR_BIT))

#define CLEAR_BIT(a, n) (a)[(n) / CHAR_BIT] &= \
(unsigned char)(~(1U << ((n) % CHAR_BIT)))

#define TEST_BIT(a, n) (((a)[(n) / CHAR_BIT] & \
(unsigned char)(1U << ((n) % CHAR_BIT))) ? 1 : 0)
[snip]
unsigned int BitsInUnsignedInt(void)
{
    static unsigned int answer = 0;
    unsigned int testval = UINT_MAX;
    if(answer == 0)
    {
        while(testval > 0)
        {
            ++answer;
            testval >>= 1;
        }
    }

    return answer;
}
[snip]
unsigned int Get_nBit_Int(unsigned char *BitArray,
                          unsigned int n,
                          unsigned int Index,
                          unsigned int BaseBit)
{
    unsigned int Value = 0;
    unsigned int j;
    unsigned int i = Index * n;

    if(n <= BitsInUnsignedInt())
[snip]
    return Value;
}


BitsInUnsignedInt() always returns the same value. I'd call it once,
store the value, and use the stored value.

Note that it differs from CHAR_BIT*sizeof(unsigned int) only if
unsigned int has padding bits.
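
The difference between the two counts can be checked directly. A small sketch, with hypothetical function names; on most current systems the two numbers agree:

```c
#include <limits.h>

/* Count the value bits in unsigned int by shifting UINT_MAX down to zero. */
unsigned value_bits_unsigned(void)
{
    unsigned bits = 0;
    unsigned v = UINT_MAX;

    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Returns 1 if unsigned int has padding bits, i.e. if its storage holds
 * more bits than take part in its value; otherwise returns 0. */
int unsigned_has_padding(void)
{
    return value_bits_unsigned() != CHAR_BIT * sizeof(unsigned);
}
```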

[snip]

In both Get_nBit_Int() and Put_nBit_Int(), you process one bit at a
time. It occurs to me that you could probably work on a byte at a
time, using shifts and masks to grab any partial bytes at the
beginning and end of the bit sequence; the resulting code might be
significantly faster. Alas, I'm too lazy to write it.
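
A byte-at-a-time version might look roughly like the sketch below. It is not a drop-in replacement for the posted functions: put_bits_msb and get_bits_msb are hypothetical names, fields are limited to 16 bits, bytes are assumed to be 8 bits wide, and bit 0 is taken to be the most significant bit of the first byte, which is a different layout from the one the SET_BIT and TEST_BIT macros produce.

```c
#include <stddef.h>

/* Store the low n bits of value (1 <= n <= 16) at absolute bit offset
 * bitpos, touching each affected byte once with a shift and a mask. */
void put_bits_msb(unsigned char *buf, size_t bitpos, unsigned n,
                  unsigned value)
{
    size_t byte = bitpos / 8;
    unsigned skip = (unsigned)(bitpos % 8);  /* bits already used in byte */
    unsigned left = n;                       /* bits still to store */

    while (left > 0) {
        unsigned room = 8 - skip;                   /* free bits in byte */
        unsigned take = left < room ? left : room;  /* bits stored now */
        unsigned shift = room - take;               /* position in byte */
        unsigned mask = ((1U << take) - 1) << shift;
        unsigned bits = (value >> (left - take)) & ((1U << take) - 1);

        buf[byte] = (unsigned char)((buf[byte] & ~mask) | (bits << shift));
        left -= take;
        skip = 0;
        byte++;
    }
}

/* Read an n-bit field (1 <= n <= 16) starting at bit offset bitpos. */
unsigned get_bits_msb(const unsigned char *buf, size_t bitpos, unsigned n)
{
    unsigned long acc = 0;
    size_t byte = bitpos / 8;
    unsigned skip = (unsigned)(bitpos % 8);  /* leading bits to ignore */
    unsigned have = 0;                       /* bits gathered so far */

    /* Gather whole bytes until the accumulator covers the field. */
    while (have < skip + n) {
        acc = (acc << 8) | buf[byte++];
        have += 8;
    }
    /* Drop the trailing bits, then mask off the leading ones. */
    acc >>= have - (skip + n);
    return (unsigned)(acc & ((1UL << n) - 1));
}
```

Each call touches at most three bytes for a 16-bit field, instead of looping once per bit.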

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
May 21 '06 #5
Keith Thompson said:
It occurs to me that you could probably work on a byte at a
time, using shifts and masks to grab any partial bytes at the
beginning and end of the bit sequence; the resulting code might be
significantly faster. Alas, I'm too lazy to write it.


Likewise. Your comments are noted and appreciated, but since this is only
the second time in N years that the code could conceivably have been the
slightest use to anyone, I prefer to focus my programming efforts
elsewhere. In other words, I'm too lazy to fix it. :-)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
May 21 '06 #6
"Matt Kowalczyk" <ma******@comcast.net> wrote

Looks like that's what I will have to do. What I really want is a bitbuffer! I
was browsing around on the internet and I found one used by tooLAME (an mpeg
encoder) which hopefully won't be too complicated to integrate into my project.

You really shouldn't have to raid an MPEG library to write this simple
function.

int squashASCII(unsigned char *out, char *in)
{
    int rack = 0; /* buffer to hold bits */
    int racklen = 0; /* number of bits in buffer */
    int answer = 0; /* number of bytes written */
    while(*in)
    {
        if(!isascii(*in))
            return -1;
        rack += *in;
        rack <<= 7;
        racklen += 7;
        if(racklen > 8)
        {
            out++ = (rack & 0xFF) >> 8;
            racklen -= 8;
            answer++;
        }
        in++;
    }
    /* write the last few to output */
    if(racklen > 0)
    {
        rack <<= (16 - racklen);
        *out++ = (rack & 0xFF) >> 8;
        answer++;
        if(racklen > 8)
        {
            *out++ = (rack & 0xFF);
            answer++;
        }
    }

    return answer;
}

It hasn't been tested.
--
Buy my book 12 Common Atheist Arguments (refuted)
$1.25 download or $7.20 paper, available www.lulu.com/bgy1mm
May 21 '06 #7

"Malcolm" <re*******@btinternet.com> wrote in message
news:y4********************@bt.com...
[snip]
It hasn't been tested.

In fact it's a sleepy Sunday night function.
It won't work properly and you'll have to fiddle with it to get it to work
(too tired now to fix it).
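
One way the function might be repaired is sketched below. It is not Malcolm's final code: it keeps the rack/racklen idea, but shifts the rack before adding each new character, emits a byte as soon as at least 8 bits are pending, and replaces isascii (a POSIX function, not ISO C) with an explicit range check. The const qualifier on in, the most-significant-bit-first order and the zero padding of the last byte are assumptions of this sketch.

```c
/* Pack the 7-bit codes of a NUL-terminated string into out, high bit
 * first. out needs room for (7 * strlen(in) + 7) / 8 bytes.
 * Returns the number of bytes written, or -1 if a character lies
 * outside 0..127. */
int squashASCII(unsigned char *out, const char *in)
{
    unsigned long rack = 0;   /* buffer to hold bits */
    int racklen = 0;          /* number of bits in buffer */
    int answer = 0;           /* number of bytes written */

    while (*in) {
        unsigned c = (unsigned char)*in++;
        if (c > 127)
            return -1;
        rack = (rack << 7) | c;
        racklen += 7;
        if (racklen >= 8) {
            racklen -= 8;
            *out++ = (unsigned char)((rack >> racklen) & 0xFF);
            answer++;
        }
        rack &= (1UL << racklen) - 1;   /* keep only bits not yet written */
    }
    /* write the last few bits, padded with zeros on the right */
    if (racklen > 0) {
        *out++ = (unsigned char)((rack << (8 - racklen)) & 0xFF);
        answer++;
    }
    return answer;
}
```

For an 8-character string this writes exactly 7 bytes, which is the "1/8th less than the input" mentioned earlier in the thread; other lengths round up to the next whole byte.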
--
www.personal.leeds.ac.uk/~bgy1mm
May 21 '06 #8
