Compress ASCII text as Hex?

Hi -

I was speaking with someone who mentioned that it's possible to encode
an ascii string as hex(?) in order to fit more data into the same # of
chars. Can anyone enlighten me?

The scenario is - I've got a CSV with a field that has a 16 character
limit. I need to fit potentially 24 ASCII characters into it.

Thanks.
-Ben
--
to reply, remove .s.p.a.m. from email
Nov 16 '05 #1
Ben,

You can't do that unless you limit the range of characters that can be
used in the 24 character string. Without doing that, you have to accept the
full range of characters and you can't just squeeze them in there without
some loss.
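
To put rough numbers on that, here is a minimal sketch. It assumes the field only accepts the 95 printable ASCII characters, which is a guess about the CSV consumer and not something stated in the thread; `CapacityDemo`, `Bits` and `MaxSourceAlphabet` are names made up for the illustration.

```csharp
using System;

static class CapacityDemo
{
    // Bits of information carried by `length` characters drawn from an
    // alphabet of `alphabetSize` distinct symbols.
    public static double Bits(int length, int alphabetSize)
    {
        return length * Math.Log(alphabetSize, 2);
    }

    // Largest source alphabet for which every `dataLength`-character string
    // still fits into `fieldLength` characters of a `fieldAlphabet`-symbol field.
    // Uses floating point, so treat the result as an estimate near exact powers.
    public static int MaxSourceAlphabet(int dataLength, int fieldLength, int fieldAlphabet)
    {
        return (int)Math.Floor(Math.Pow(fieldAlphabet, (double)fieldLength / dataLength));
    }

    static void Main()
    {
        // 24 characters of full 7-bit ASCII (128 symbols)
        Console.WriteLine("Needed:    {0:F1} bits", Bits(24, 128));
        // 16 characters of printable ASCII (95 symbols, assumed)
        Console.WriteLine("Available: {0:F1} bits", Bits(16, 95));
        Console.WriteLine("Largest usable source alphabet: {0} symbols",
            MaxSourceAlphabet(24, 16, 95));
    }
}
```

Under those assumptions the data needs 168 bits and the field offers roughly 105, so the 24 characters would have to come from an alphabet of about 20 symbols or fewer before a lossless fit is possible.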

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

Nov 16 '05 #2
Thanks Nicholas,

The 24 character string is a concatenation of a number (8-10 digits, I
believe) and two other string fields. Would I have more success if I
tried to shrink the number only?

-Ben

--
to reply, remove .s.p.a.m. from email
Nov 16 '05 #3
If you are using a subset of characters, try to fit two characters into each
character written to the CSV file.
Say, for example, you were only interested in the character codes 0-127:
you could write the string "me", i.e. hex codes 6d and 65, into one character.
[pseudo]
char c = (char)((0x6d << 8) | 0x65); // 'm' in the high byte, 'e' in the low byte: 0x6d65
[/pseudo]

Write that single char to the text file,
then when you read it back, break it up again.
Hope that helps.
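
A fuller sketch of that pack/unpack round trip might look like the following. `PairPacker` is a hypothetical helper, and padding odd-length input with `'\0'` is an assumption that only holds if the data never contains that character.

```csharp
using System;
using System.Text;

static class PairPacker
{
    // Packs each pair of 7-bit characters into one 16-bit char:
    // first character in the high byte, second in the low byte.
    // With codes 0-127 the packed values never exceed 0x7F7F, so they
    // stay clear of the UTF-16 surrogate range (0xD800-0xDFFF).
    public static string Pack(string text)
    {
        if (text.Length % 2 != 0) text += '\0'; // pad odd-length input (assumed unused in data)
        StringBuilder packed = new StringBuilder(text.Length / 2);
        for (int i = 0; i < text.Length; i += 2)
        {
            if (text[i] > 127 || text[i + 1] > 127)
                throw new ArgumentException("Only character codes 0-127 can be packed.");
            packed.Append((char)((text[i] << 8) | text[i + 1]));
        }
        return packed.ToString();
    }

    // Splits each packed char back into its two original characters.
    public static string Unpack(string packed)
    {
        StringBuilder text = new StringBuilder(packed.Length * 2);
        foreach (char c in packed)
        {
            text.Append((char)(c >> 8));
            text.Append((char)(c & 0xFF));
        }
        return text.ToString().TrimEnd('\0'); // drop the padding again
    }

    static void Main()
    {
        string original = "12345678,ABCDEFG,abcdefg"; // 24 characters
        string packed = Pack(original);
        Console.WriteLine("Packed length: {0}", packed.Length);
        Console.WriteLine("Round trip ok: {0}", Unpack(packed) == original);
    }
}
```

This halves the character count (24 becomes 12), but only if whatever enforces the 16-character limit counts UTF-16 characters and not bytes.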

Nov 16 '05 #4
Note that that will only work if your CSV file is written in a Unicode-
supporting encoding. There's also no absolute guarantee that it won't
end up forming invalid characters, or characters which the reader might
normalize to a different but equivalent form as far as Unicode is
concerned. I doubt that it'll be a problem, but it's worth bearing in
mind.
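
A quick way to check the encoding concern is to round-trip a packed character through the encoding the file would use. This is only a sketch: `EncodingCheck` is a made-up helper, and which encoding the real CSV reader and writer use is unknown here.

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // True if `text` is unchanged after being encoded to bytes with
    // `encoding` and decoded again.
    public static bool SurvivesRoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes) == text;
    }

    static void Main()
    {
        string packed = "\u6d65"; // "me" packed into a single char

        // ASCII cannot represent U+6D65, so the character is replaced on write.
        Console.WriteLine("ASCII round trip: {0}", SurvivesRoundTrip(packed, Encoding.ASCII));
        Console.WriteLine("UTF-8 round trip: {0}", SurvivesRoundTrip(packed, Encoding.UTF8));

        // The packed char is one character but several bytes in UTF-8.
        Console.WriteLine("UTF-8 byte count: {0}", Encoding.UTF8.GetByteCount(packed));
    }
}
```

Note that a packed character in this range takes three bytes in UTF-8, so if the 16-character limit is really a 16-byte limit, the packing costs more space than it saves.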

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #5
Yes, you're right.
Encoding could present a problem, but my description was also more than
slightly incorrect.
If we were limited to the 0-127 characters of the ASCII table, then we
would be using 7 bits to represent a character, so for every 7 bytes
written we could squeeze in an eighth character.
More trouble than it's worth, I guess.
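
For what it's worth, the 7-bit idea can be sketched like this. `SevenBitPacker` is a hypothetical helper, and the result is raw bytes, so it would still need a text-safe representation before going into a CSV field.

```csharp
using System;

static class SevenBitPacker
{
    // Writes the 7 significant bits of each character (codes 0-127)
    // back to back, most significant bit first.
    public static byte[] Pack(string text)
    {
        byte[] packed = new byte[(text.Length * 7 + 7) / 8];
        int bitPos = 0;
        foreach (char c in text)
        {
            if (c > 127) throw new ArgumentException("Only codes 0-127 can be packed.");
            for (int bit = 6; bit >= 0; bit--, bitPos++)
            {
                if (((c >> bit) & 1) != 0)
                    packed[bitPos / 8] |= (byte)(0x80 >> (bitPos % 8));
            }
        }
        return packed;
    }

    // Reads `charCount` 7-bit codes back out of the packed bytes.
    public static string Unpack(byte[] packed, int charCount)
    {
        char[] text = new char[charCount];
        int bitPos = 0;
        for (int i = 0; i < charCount; i++)
        {
            int code = 0;
            for (int bit = 0; bit < 7; bit++, bitPos++)
            {
                int b = (packed[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                code = (code << 1) | b;
            }
            text[i] = (char)code;
        }
        return new string(text);
    }

    static void Main()
    {
        string original = "12345678,ABCDEFG,abcdefg"; // 24 characters
        byte[] packed = Pack(original);
        Console.WriteLine("Packed bytes: {0}", packed.Length);
        Console.WriteLine("Round trip ok: {0}", Unpack(packed, original.Length) == original);
    }
}
```

24 characters at 7 bits each come to 168 bits, i.e. 21 bytes, so this alone still does not reach 16.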

regards
Brian.

Nov 16 '05 #6
Certainly when the only necessity is to squeeze 24 characters into 16
:)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #7
There's a method, but it's a bit snarky....

There's an encoding format called Base64 (a relative of uuencoding, though
not the same format). It takes fully binary data (0-255) and converts it to
a set of 64 printable characters (digits, uppercase, lowercase, plus the two
symbols + and /). Since email messages were required to be pure printable
text (due to some ancient hardware, which is almost certainly no longer on
the 'net), binary attachments are typically Base64 encoded. It converts
3 binary bytes into 4 characters, so encoded blocks increase 33% in size.

So, how does this affect you? Well, as long as your "encoded" string meets
the criteria of Base64 encoding, you can "decode" it into a smaller block of
binary data. 4 characters will become 3 bytes, or in your case, 20
characters can become 15 bytes.

using System;

class Program
{
    static void Main()
    {
        string origString = "123456,abcdef,ghijkl"; // 20-character CSV text

        // Replace commas with plus signs so every character is in the Base64 alphabet
        string prepareText = origString.Replace(',', '+');
        byte[] compressedText = Convert.FromBase64String(prepareText);
        Console.WriteLine("Length of compressed text = {0}", compressedText.Length);
        // Save compressedText to your store.
        // :
        // Later read it back
        string alteredText = Convert.ToBase64String(compressedText);
        string finalString = alteredText.Replace('+', ',');

        Console.WriteLine("Text: {0}, this {1} the same as the original",
            finalString, finalString == origString ? "IS" : "IS NOT");
    }
}

Running the above, I get:
Length of compressed text = 15
Text: 123456,abcdef,ghijkl, this IS the same as the original
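
The snippet above only works when the input happens to satisfy Base64's rules, so a guarded variant may be safer. This is a sketch under assumptions: `Base64Squeeze` and its method names are invented here, and the original text is assumed never to contain a literal '+' (it would come back as a comma).

```csharp
using System;

static class Base64Squeeze
{
    // True if `text` can be "decoded" as Base64 after swapping ',' for '+':
    // length a multiple of 4, and only letters, digits, '/' and ','.
    public static bool CanShrink(string text)
    {
        if (text.Length % 4 != 0) return false;
        foreach (char c in text)
        {
            bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                   || (c >= '0' && c <= '9') || c == '/' || c == ',';
            if (!ok) return false;
        }
        return true;
    }

    // 4 characters in, 3 bytes out.
    public static byte[] Shrink(string text)
    {
        if (!CanShrink(text))
            throw new ArgumentException("Text does not fit the Base64 alphabet and length rules.");
        return Convert.FromBase64String(text.Replace(',', '+'));
    }

    // Reverses Shrink; only meaningful for byte arrays that Shrink produced.
    public static string Restore(byte[] data)
    {
        return Convert.ToBase64String(data).Replace('+', ',');
    }

    static void Main()
    {
        string original = "123456,abcdef,ghijkl";
        byte[] shrunk = Shrink(original);
        Console.WriteLine("Shrunk to {0} bytes", shrunk.Length);
        Console.WriteLine("Round trip ok: {0}", Restore(shrunk) == original);
    }
}
```

The output is still binary, so it needs somewhere that can hold arbitrary bytes; a full 24-character input would shrink to 18 bytes, not 16.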

--
Truth,
James Curran
[erstwhile VC++ MVP]
Home: www.noveltheory.com Work: www.njtheater.com
Blog: www.honestillusion.com Day Job: www.partsearch.com

Nov 16 '05 #8
Yes... it does mean you can only have 63 distinct characters though
(IIRC, '=' is used for end padding, which you also need to work out).

It also doesn't get 24 characters down to 16 :( Possibly a combination
of that (if it all applies appropriately) with something clever to do
with the 8 digits (which can be represented as a 4 byte integer, which
should help) could help.
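
One way to do "something clever" with the number, sketched here as an assumption and not as anything proposed in the thread, is to write it in base 62 using only digits and letters, which are safe inside a CSV field. `NumberShrinker` is a hypothetical helper, and leading zeros in the original number are not preserved.

```csharp
using System;
using System.Text;

static class NumberShrinker
{
    // 62 symbols: digits, uppercase, lowercase. No commas or quotes,
    // so the result needs no CSV escaping.
    const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Writes a non-negative number in base 62, most significant digit first.
    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException("value");
        if (value == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Digits[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    // Parses a base-62 string produced by Encode.
    public static long Decode(string text)
    {
        long value = 0;
        foreach (char c in text)
        {
            int digit = Digits.IndexOf(c);
            if (digit < 0) throw new FormatException("Not a base-62 digit: " + c);
            value = value * 62 + digit;
        }
        return value;
    }

    static void Main()
    {
        long number = 9999999999L; // largest 10-digit value
        string encoded = Encode(number);
        Console.WriteLine("{0} -> {1} ({2} chars)", number, encoded, encoded.Length);
        Console.WriteLine("Round trip ok: {0}", Decode(encoded) == number);
    }
}
```

A 10-digit number fits in 6 base-62 characters and an 8-digit one in 5, so this saves 3 or 4 characters on its own; the remaining savings would have to come from the two string fields.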

It all sounds like something which should be redesigned rather than
munged like this though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9
