Compress ASCII text as Hex?

Hi -

I was speaking with someone who mentioned that it's possible to encode
an ascii string as hex(?) in order to fit more data into the same # of
chars. Can anyone enlighten me?

The scenario is - I've got a CSV with a field that has a 16 character
limit. I need to fit potentially 24 ASCII characters into it.

Thanks.
-Ben
--
to reply, remove .s.p.a.m. from email
Nov 16 '05 #1
Ben,

You can't do that unless you limit the range of characters that can be
used in the 24 character string. Without doing that, you have to accept the
full range of characters and you can't just squeeze them in there without
some loss.
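
To put a number on "limit the range of characters", here is a rough
counting sketch. It assumes each of the 16 field characters can hold one of
targetAlphabet distinct values (256 for raw 8-bit characters, 95 for
printable ASCII); MaxSourceAlphabet is a hypothetical helper name, not
something from this thread.

```csharp
using System;
using System.Numerics;

// Largest alphabet size n such that every n-symbol string of sourceLength
// characters can be mapped to a distinct string of targetLength characters
// drawn from targetAlphabet symbols, i.e. n^sourceLength <= targetAlphabet^targetLength.
int MaxSourceAlphabet(int sourceLength, int targetLength, int targetAlphabet)
{
    BigInteger capacity = BigInteger.Pow(targetAlphabet, targetLength);
    int n = 1;
    while (BigInteger.Pow(n + 1, sourceLength) <= capacity)
    {
        n++;
    }
    return n;
}

Console.WriteLine(MaxSourceAlphabet(24, 16, 256)); // prints 40
Console.WriteLine(MaxSourceAlphabet(24, 16, 95));  // prints 20
```

So 24 characters only fit losslessly into 16 eight-bit characters if the
input uses at most 40 distinct symbols, and at most 20 if the output must
itself be printable ASCII.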

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

Nov 16 '05 #2
Thanks Nicholas,

The 24 character string is a concatenation of a number (8-10 digits, I
believe) and two other string fields. Would I have more success if I
tried to shrink the number only?
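
For the number on its own, writing it in a larger radix is the simplest
way to save characters. The sketch below compares hexadecimal with base 36
(digits plus uppercase letters); ToBase36 and FromBase36 are hypothetical
helpers, and it assumes the number is non-negative and fits in a long.

```csharp
using System;
using System.Text;

const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative number using the 36 symbols 0-9 and A-Z.
string ToBase36(long value)
{
    if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
    if (value == 0) return "0";
    var sb = new StringBuilder();
    while (value > 0)
    {
        sb.Insert(0, Digits[(int)(value % 36)]);
        value /= 36;
    }
    return sb.ToString();
}

// Decode a base-36 string produced by ToBase36.
long FromBase36(string text)
{
    long value = 0;
    foreach (char c in text)
    {
        int digit = Digits.IndexOf(char.ToUpperInvariant(c));
        if (digit < 0) throw new FormatException("Not a base-36 digit: " + c);
        value = value * 36 + digit;
    }
    return value;
}

long number = 9999999999;                          // largest 10-digit value
Console.WriteLine(number.ToString("X").Length);    // prints 9 (hex: 2540BE3FF)
Console.WriteLine(ToBase36(number).Length);        // prints 7
Console.WriteLine(FromBase36(ToBase36(number)) == number); // prints True
```

Hex saves only one character on a 10-digit number; base 36 saves three,
which still leaves the other two fields to deal with.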

-Ben

--
to reply, remove .s.p.a.m. from email
Nov 16 '05 #3
If you are using a subset of characters, try fitting 2 characters into each
character written to the CSV file.
Say, for example, you were only interested in the character codes 0-127:
you could write the string "me", i.e. hex codes 6d and 65, into one character
[pseudo]
char c = (char)0x6d65; // 'm' (0x6d) in the high byte, 'e' (0x65) in the low byte
[/pseudo]

and write that single char to the text file,
then when you read it, you break it up again.
Hope that helps.
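
A minimal sketch of that pairing idea, assuming the input is 7-bit ASCII
with no NUL characters and the file is written in an encoding that can
represent every 16-bit char (for example UTF-8 or UTF-16). PackPairs and
UnpackPairs are hypothetical names.

```csharp
using System;
using System.Text;

// Pack two 7-bit ASCII characters into each 16-bit char:
// the first goes in the high byte, the second in the low byte.
string PackPairs(string ascii)
{
    var sb = new StringBuilder((ascii.Length + 1) / 2);
    for (int i = 0; i < ascii.Length; i += 2)
    {
        char first = ascii[i];
        // An odd-length input is padded with NUL in the low byte.
        char second = i + 1 < ascii.Length ? ascii[i + 1] : '\0';
        if (first > 127 || second > 127)
            throw new ArgumentException("Only 7-bit ASCII input is supported.");
        sb.Append((char)((first << 8) | second));
    }
    return sb.ToString();
}

// Split each packed char back into its two ASCII characters.
string UnpackPairs(string packed)
{
    var sb = new StringBuilder(packed.Length * 2);
    foreach (char c in packed)
    {
        sb.Append((char)(c >> 8));
        char low = (char)(c & 0xFF);
        if (low != '\0') sb.Append(low); // skip the padding NUL
    }
    return sb.ToString();
}

string original = "123456,abcdef,ghijkl,abc";       // 24 characters
string packed = PackPairs(original);
Console.WriteLine(packed.Length);                   // prints 12
Console.WriteLine(UnpackPairs(packed) == original); // prints True
```

Because both halves are at most 0x7F, the largest packed value is 0x7F7F,
which stays below the UTF-16 surrogate range that starts at 0xD800.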

Nov 16 '05 #4
Brian Keating EI9FXB <csharp at briankeating.net> wrote:
If you are using a subset of characters, try fitting 2 characters into each
character written to the CSV file.
Say, for example, you were only interested in the character codes 0-127:
you could write the string "me", i.e. hex codes 6d and 65, into one character
[pseudo]
char c = (char)0x6d65; // 'm' (0x6d) in the high byte, 'e' (0x65) in the low byte
[/pseudo]

and write that single char to the text file,
then when you read it, you break it up again.


Note that that will only work if your CSV file is written in a Unicode-
supporting encoding. There's also no absolute guarantee that it won't
end up forming invalid characters, or characters which the reader might
normalize to a different but equivalent form as far as Unicode is
concerned. I doubt that it'll be a problem, but it's worth bearing in
mind.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #5
Yes, you're right.
Encoding could present a problem, but my description was slightly (actually
more than slightly) incorrect.
If we were limited to the 0-127 characters of the ASCII table, then we
would be using 7 bits to represent a character; therefore, for every 7
bytes written we could squeeze in an extra (eighth) character.
More trouble than it's worth, I guess.
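
A rough sketch of that 7-bit packing, assuming the input is pure 7-bit
ASCII and the character count is stored or known separately. Pack7 and
Unpack7 are hypothetical names.

```csharp
using System;
using System.Text;

// Write each character as 7 bits, most significant bit first,
// into a contiguous bit stream.
byte[] Pack7(string s)
{
    var output = new byte[(s.Length * 7 + 7) / 8];
    int bitPos = 0;
    foreach (char ch in s)
    {
        if (ch > 127) throw new ArgumentException("Only 7-bit ASCII input is supported.");
        for (int b = 6; b >= 0; b--)
        {
            if (((ch >> b) & 1) != 0)
                output[bitPos / 8] |= (byte)(0x80 >> (bitPos % 8));
            bitPos++;
        }
    }
    return output;
}

// Read charCount 7-bit values back out of the bit stream.
string Unpack7(byte[] data, int charCount)
{
    var sb = new StringBuilder(charCount);
    int bitPos = 0;
    for (int i = 0; i < charCount; i++)
    {
        int value = 0;
        for (int b = 0; b < 7; b++)
        {
            int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
            value = (value << 1) | bit;
            bitPos++;
        }
        sb.Append((char)value);
    }
    return sb.ToString();
}

string original = "123456,abcdef,ghijkl,abc";           // 24 characters
byte[] packed = Pack7(original);
Console.WriteLine(packed.Length);                       // prints 21
Console.WriteLine(Unpack7(packed, 24) == original);     // prints True
```

24 characters at 7 bits each come to 168 bits, or 21 bytes, so this alone
does not reach 16.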

regards
Brian.
Nov 16 '05 #6
Brian Keating EI9FXB <csharp at briankeating.net> wrote:
Yes, you're right.
Encoding could present a problem, but my description was slightly (actually
more than slightly) incorrect.
If we were limited to the 0-127 characters of the ASCII table, then we
would be using 7 bits to represent a character; therefore, for every 7
bytes written we could squeeze in an extra (eighth) character.
More trouble than it's worth, I guess.


Certainly when the only necessity is to squeeze 24 characters into 16
:)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #7
There's a method, but it's a bit snarky....

There's an encoding format called Base64 (a close relative of uuencoding,
which uses a different character set). It takes fully binary data (byte
values 0-255) and converts it to a set of 64 printable characters (digits,
uppercase, lowercase, plus the two symbols + and /). Since email messages
are required to be pure printable text (due to some ancient hardware, which
is almost certainly no longer on the 'net), binary attachments are usually
Base64 encoded. It converts 3 binary bytes into 4 characters, so encoded
blocks increase 33% in size.

So, how does this affect you? Well, as long as your "encoded" string meets
the criteria of Base64 encoding, you can "decode" it into a smaller block of
binary data. 4 characters will become 3 bytes, or in your case, 20
characters can become 15 bytes.

string origString = "123456,abcdef,ghijkl"; // 20-character CSV text

// Replace commas with plus signs so every character is in the Base64 alphabet
string prepareText = origString.Replace(',', '+');
byte[] compressedText = Convert.FromBase64String(prepareText);
Console.WriteLine("Length of compressed text = {0}", compressedText.Length);
// Save compressedText to your store.
// :
// Later read it back
string alteredText = Convert.ToBase64String(compressedText);
string finalString = alteredText.Replace('+', ',');

Console.WriteLine("Text: {0}, this {1} the same as the original",
    finalString, finalString == origString ? "IS" : "IS NOT");

Running the above, I get:
Length of compressed text = 15
Text: 123456,abcdef,ghijkl, this IS the same as the original

--
Truth,
James Curran
[erstwhile VC++ MVP]
Home: www.noveltheory.com Work: www.njtheater.com
Blog: www.honestillusion.com Day Job: www.partsearch.com

Nov 16 '05 #8
James Curran <Ja*********@mvps.org> wrote:
There's a method, but it's a bit snarky....

There's an encoding format called Base64 (a close relative of uuencoding,
which uses a different character set). It takes fully binary data (byte
values 0-255) and converts it to a set of 64 printable characters (digits,
uppercase, lowercase, plus the two symbols + and /). Since email messages
are required to be pure printable text (due to some ancient hardware, which
is almost certainly no longer on the 'net), binary attachments are usually
Base64 encoded. It converts 3 binary bytes into 4 characters, so encoded
blocks increase 33% in size.

So, how does this affect you? Well, as long as your "encoded" string meets
the criteria of Base64 encoding, you can "decode" it into a smaller block of
binary data. 4 characters will become 3 bytes, or in your case, 20
characters can become 15 bytes.


Yes... it does mean you can only have 64 distinct characters though
(IIRC, '=' is a 65th symbol used for end padding, which you also need to
work out).

It also doesn't get 24 characters down to 16 :( Possibly a combination
of that (if it all applies appropriately) with something clever to do
with the 8 digits (which can be represented as a 4 byte integer, which
should help) could help.
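
As a sketch of how that combination could look: store the number as a
4-byte unsigned integer and Base64-decode the remaining text. This assumes
the number fits in a uint (at most 4,294,967,295, so every 8- or 9-digit
value but not every 10-digit one), that the other two fields together are
16 characters from the Base64 alphabet, and that the field can actually
hold 16 arbitrary byte values. PackRecord and UnpackRecord are hypothetical
names.

```csharp
using System;

// 4 bytes for the number + 3 bytes per 4 Base64 characters of text.
byte[] PackRecord(uint number, string text)
{
    // Throws FormatException unless text is valid Base64
    // (A-Z, a-z, 0-9, '+', '/', with a length that is a multiple of 4).
    byte[] textBytes = Convert.FromBase64String(text);
    byte[] numberBytes = BitConverter.GetBytes(number);
    byte[] packed = new byte[numberBytes.Length + textBytes.Length];
    numberBytes.CopyTo(packed, 0);
    textBytes.CopyTo(packed, numberBytes.Length);
    return packed;
}

// Reverse of PackRecord: first 4 bytes are the number, the rest is the text.
(uint Number, string Text) UnpackRecord(byte[] packed)
{
    uint number = BitConverter.ToUInt32(packed, 0);
    string text = Convert.ToBase64String(packed, 4, packed.Length - 4);
    return (number, text);
}

byte[] record = PackRecord(12345678u, "abcdefghijklmnop"); // 8 + 16 = 24 characters
Console.WriteLine(record.Length);                          // prints 16
var (number, text) = UnpackRecord(record);
Console.WriteLine(number);                                 // prints 12345678
Console.WriteLine(text);                                   // prints abcdefghijklmnop
```

The result is 16 bytes rather than 16 text characters, so it only helps if
the field can store raw bytes.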

It all sounds like something which should be redesigned rather than
munged like this though...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #9