I'm trying to do this for work. I am to design a binary code for encoding words. The words can be upper or lower case, with basic punctuation, 16 different font styles, bold, italicized, underlined or regular, and font sizes (10-30, even numbers).
I need the least number of bits to represent each aspect of the given scenario.
I need to specify codes for representing each aspect and to code a sentence.
Also, I need to specify the coding scheme so anyone can follow it to uniquely decode and read the word/sentence.
Finally, I need to give an example of a coded sentence and explain how the codes work so that the sentence can be interpreted.
Thanks for any help.
Well, let's see here ....
There are 2^4 = 16 font styles, so 4 bits there.
Font sizes 10-30 in even numbers only gives 11 sizes, and 2^3 < 11 < 2^4, so 4 bits there but with some waste.
bold, italicized, underlined or regular: 4 = 2^2, so 2 bits with no waste (assuming only one of the four applies at a time).
2^4 < 26 < 2^5 letters, so 5 bits there with room for 6 punctuation symbols. I guess tab, space, and newline might be considered 3 of these punctuation symbols.
lowercase or uppercase: 1 bit.
Total: 4 + 4 + 2 + 5 + 1 = 16 bits, with a slight amount of waste on the font sizes. This is exactly 2 bytes.
You should probably make some sort of struct that is 2 bytes long, such as a single 16-bit unsigned integer. You can access individual bits with bit shift and boolean operations.
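The counts above all follow one rule: n options need the smallest number of bits b with 2^b >= n. Here is a small sketch of that rule. The name bits_needed is mine, not part of the problem, and for font sizes it shows both ways of counting: only the even sizes from 10 to 30, or every integer size from 10 to 30.

```python
def bits_needed(n_values: int) -> int:
    """Smallest b such that 2**b >= n_values (integer-exact, no floats)."""
    return (n_values - 1).bit_length()

even_sizes = len(range(10, 31, 2))   # 10, 12, ..., 30
all_sizes = len(range(10, 31))       # 10, 11, ..., 30

print("font styles :", bits_needed(16))
print("b/i/u/r     :", bits_needed(4))
print("upper/lower :", bits_needed(2))
print("letters     :", bits_needed(26))
print("even sizes  :", even_sizes, "values ->", bits_needed(even_sizes), "bits")
print("all sizes   :", all_sizes, "values ->", bits_needed(all_sizes), "bits")
```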
One thing to note: This assumes that every single character that you encode can have a different font size, style, type, etc. If font size, style, upper/lowercase, etc. are "state variables" (current font size, style, shape, etc.), you could condense this somewhat. One bit can say whether a unit holds formatting or data.
Then, formatting takes at most 12 bits (1 for case, 2 for b/i/u/r, 4 for font style, 5 set aside for font size), and a character takes 5 bits. The great thing about that is that either one, plus the mode bit, fits inside a 2 byte struct!
bit 0: format or character
if format mode:
bit 1: caps or lowercase
bits 2-3: bold, italics, underline, regular
bits 4-7: font style
bits 8-12: font size
bits 13-15: unused. might as well allow more font sizes
if character mode:
bits 1-7: extended characters (yen symbol, etc.)
bits 8-15: one regular byte, so just take the unsigned character and put it here
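A rough sketch of how the 2-byte layout could be packed and unpacked with shifts and masks. Several things here are my assumptions, not part of the scheme as stated: bit 0 is taken to be the least significant bit, a mode bit of 0 means format and 1 means character, the font size is stored as an index rather than the size itself, and all function names are made up for illustration.

```python
FORMAT, CHARACTER = 0, 1   # assumption: meaning of the mode bit (bit 0)

def pack_format(upper, emphasis, style, size_index):
    """Build a 16-bit format unit: bit 1 case, bits 2-3 b/i/u/r,
    bits 4-7 font style, bits 8-12 font size index."""
    assert 0 <= emphasis < 4 and 0 <= style < 16 and 0 <= size_index < 32
    return (FORMAT
            | (int(upper) << 1)
            | (emphasis << 2)
            | (style << 4)
            | (size_index << 8))

def pack_char(ch):
    """Build a 16-bit character unit: the plain byte goes in bits 8-15."""
    code = ord(ch)
    assert code < 256
    return CHARACTER | (code << 8)

def unpack(unit):
    """Return ('format', fields) or ('char', character)."""
    if unit & 1 == FORMAT:
        return ("format", {
            "upper": bool((unit >> 1) & 1),
            "emphasis": (unit >> 2) & 0b11,
            "style": (unit >> 4) & 0xF,
            "size_index": (unit >> 8) & 0x1F,
        })
    return ("char", chr((unit >> 8) & 0xFF))

print(unpack(pack_format(True, 1, 3, 5)))
print(unpack(pack_char("A")))
```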
Super efficient, with more options than called for. However, it might be possible to make it even more efficient. Notice how we got it into two bytes by allowing formatting and character modes. Now, consider 4 modes instead of two. That takes up 2 bits. Is it possible to squeeze this into one byte?
Notice that there are three types of fields that are relatively large: font size, font style, and what letter to encode. The largest of these is 5 bits. That's 3 categories.
Hmm ...
category 0: font style
category 1: font size
category 2: b/i/u/r and upper/lower
category 3: character encoding
bits 0-1: choose a category
bits 2-7: you get 6 bits to encode the stuff, and only need at most 5.
Yep. This fits the whole thing in 1 byte. So, imagine this:
byte 1: category 0 - set the current font style
byte 2: category 1 - set the current font size
byte 3: category 2 - set the current b/i/u/r and upper/lower
byte 4: category 3 - encode the first letter, probably capitalized.
byte 5: category 2 - set current b/i/u/r and upper/lower
byte 6+: category 3 - encode the remaining letters.
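Here is a minimal sketch of that byte sequence as an encoder and decoder. Everything specific in it is an assumption for illustration: the category sits in the two lowest bits, the 32-symbol table (26 letters plus six punctuation marks) is one I picked, font size is stored as (size - 10) / 2, and in category 2 the lowest payload bit is upper/lower with b/i/u/r in the two bits above it.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz .,?!'"   # hypothetical 5-bit symbol table
STYLE, SIZE, LOOK, CHAR = 0, 1, 2, 3            # the four categories

def unit(category, payload):
    """One byte: category in bits 0-1, payload in bits 2-7."""
    assert 0 <= category < 4 and 0 <= payload < 64
    return category | (payload << 2)

def encode(text, style=0, size=12, emphasis=0):
    out = [unit(STYLE, style), unit(SIZE, (size - 10) // 2)]
    upper = None
    for ch in text:
        # Emit a category 2 byte at the start and whenever the case changes.
        if upper is None or (ch.isalpha() and ch.isupper() != upper):
            upper = ch.isupper()
            out.append(unit(LOOK, (emphasis << 1) | int(upper)))
        # ALPHABET.index raises ValueError for symbols outside the table.
        out.append(unit(CHAR, ALPHABET.index(ch.lower())))
    return bytes(out)

def decode(data):
    state = {"style": 0, "size": 10, "emphasis": 0, "upper": False}
    chars = []
    for b in data:
        category, payload = b & 0b11, b >> 2
        if category == STYLE:
            state["style"] = payload
        elif category == SIZE:
            state["size"] = 10 + 2 * payload
        elif category == LOOK:
            state["upper"] = bool(payload & 1)
            state["emphasis"] = payload >> 1
        else:
            ch = ALPHABET[payload]
            chars.append(ch.upper() if state["upper"] else ch)
    return "".join(chars), state

encoded = encode("Hello, world.", style=3, size=12)
print(" ".join(f"{b:08b}" for b in encoded))
print(decode(encoded)[0])
```

For "Hello, world." this follows the byte order listed above: style, size, a category 2 byte for the capital H, the H itself, a category 2 byte to switch back to lowercase, then the remaining characters.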
See what I mean? -- Paul