Unicode Optimization

I have a program with really big embedded text resources.
Because of internationalization I have to save the embedded text in UTF-8,
but it is more than three times bigger than the original file.
The compiled file size is the lesser problem; the REAL problem is the amount
of memory used by the program because of the embedded Unicode file. It loads
the values of the file into some hashes.
With ASCII the program takes about 17k of RAM at runtime.
With UTF-8 the program takes at least 70k (!!!) of RAM.
1) How (if at all) is it possible to optimize the text embedded in the program?
2) How is it possible at least to shrink the memory usage of the program?

TNX

--
Tamir Khason
You want dot.NET? Just ask:
"Please, www.dotnet.us "
Nov 16 '05 #1
70K is hardly huge. However, if you *can* store your file correctly as
ASCII, then the same text in UTF-8 should be exactly the same file, as
every ASCII character has the same encoding in UTF-8.
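
A minimal sketch of that point, added for illustration and not part of the original reply. The sample strings are assumptions: the ASCII text mimics the file format shown later in the thread, and the four Hebrew letters are only a hypothetical stand-in for the real resource text.

```csharp
using System;
using System.Linq;
using System.Text;

class AsciiVsUtf8
{
    static void Main()
    {
        // Pure ASCII text: both encodings produce exactly the same bytes.
        string ascii = "[Title]\nSome Title";
        byte[] asAscii = Encoding.ASCII.GetBytes(ascii);
        byte[] asUtf8 = Encoding.UTF8.GetBytes(ascii);
        Console.WriteLine(asAscii.SequenceEqual(asUtf8)); // True

        // Hypothetical non-ASCII sample (four Hebrew letters, written as
        // escapes so the encoding of this source file does not matter).
        string hebrew = "\u05E9\u05DC\u05D5\u05DD";
        Console.WriteLine(Encoding.UTF8.GetByteCount(hebrew)); // 8 (two bytes per letter)
    }
}
```

So a UTF-8 file can only be larger than its ASCII counterpart if it contains characters that ASCII cannot represent at all.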

It's not at all clear exactly what's going on here. Could you post a
short but complete program which demonstrates the problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

My guess is that something else is going on beyond what you're aware
of.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2
Jon, thanks for the reply.
Here is a sample of such files:
BOF file1---------------------
[Title]
Some Title
[Value]
Whatever
EOF file1---------------------

BOF file2----------------------
[Title]
????? ?????
[Value]
?? ??? ????
EOF file2----------------------

BOF file3---------------------
[Title]
?????????
[Value]
? ????? ???????
EOF file3---------------------

Here is the sample code that reads those files:

    if (_fileEmbedded == null)
    {
        // No embedded resource stream: read the file from disk.
        fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
        sr = new StreamReader(fs, Encoding.UTF8); // Today - have to change encoding
    }
    else
    {
        // Read from the embedded resource stream.
        sr = new StreamReader(_fileEmbedded, Encoding.UTF8); // Today - have to change encoding
    }

Only Encoding.UTF8 (with the file saved as UTF-8) returns correct results.

If I save the files and create the readers with any other encoding, it does
not work.

--
Tamir Khason
Nov 16 '05 #3
Tamir Khason <ta**********@tcon-NOSPAM.co.il> wrote:
> Following the sample of such files:

<snip>

That still doesn't describe the situation adequately. Yes, if you've
saved the file as UTF-8 you obviously need to use Encoding.UTF8 to read
it. You haven't really explained why that's a problem though, or how
you've determined that it *is* a problem.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
The problem is as follows.
Current situation: I'm using a Unicode (UTF-8) text file as an embedded
resource in my application. This works, but the problem is the amount of
memory it needs (because the UTF-8 file is huge).
After some research I tried converting this file to ANSI encoding, and the
file became 3x smaller, as did the RAM needed, BUT the program works BADLY
due to the encoding issue (we spoke about it in a previous thread about all
strings in .NET being Unicode, remember?)
I want to do either of these:
1) Optimize the Unicode
2) Convert it properly into ANSI and use it as ANSI from C#

Please advise

Nov 16 '05 #5
Tamir Khason <ta**********@tcon-NOSPAM.co.il> wrote:
> The problem is as follows.
> Current situation: I'm using a Unicode (UTF-8) text file as an embedded
> resource in my application. This works, but the problem is the amount of
> memory it needs (because the UTF-8 file is huge).
> After some research I tried converting this file to ANSI encoding, and the
> file became 3x smaller

That's very strange in itself. If it's 3x smaller, that suggests you're
losing a lot of data. Which ANSI encoding are you using, and what
proportion of the file is actually in just plain ASCII? Could you
mail me both the UTF-8 and the ANSI files?

> as did the RAM needed

It should make no difference to the amount of RAM needed. By the time
you've loaded the strings into memory, they'll be in Unicode (UTF-16 in
.NET) anyway, whatever encoding the file used.

> BUT the program works BADLY due to the encoding issue (we spoke about it
> in a previous thread about all strings in .NET being Unicode, remember?)
> I want to do either of these:
> 1) Optimize the Unicode
> 2) Convert it properly into ANSI and use it as ANSI from C#

I suspect you'll find that your ANSI file actually doesn't have nearly
as much real data in as the UTF-8 file, which is why it's using less
memory.
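
A rough sketch of how such a loss can happen, added for illustration only. It assumes a hypothetical four-letter Hebrew sample and uses Encoding.ASCII as a stand-in for an encoding that cannot represent the text; the actual ANSI code page used in the thread is not known.

```csharp
using System;
using System.Text;

class LossyConversion
{
    static void Main()
    {
        // Hypothetical sample text (four Hebrew letters, written as escapes).
        string original = "\u05E9\u05DC\u05D5\u05DD";

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        // ASCII cannot represent these letters, so each one becomes '?'.
        byte[] asciiBytes = Encoding.ASCII.GetBytes(original);

        Console.WriteLine(utf8Bytes.Length);  // 8
        Console.WriteLine(asciiBytes.Length); // 4

        string fromUtf8 = Encoding.UTF8.GetString(utf8Bytes);
        string fromAscii = Encoding.ASCII.GetString(asciiBytes);

        Console.WriteLine(fromUtf8 == original); // True
        Console.WriteLine(fromAscii);            // ????

        // Once decoded, both strings hold the same number of UTF-16 chars,
        // so the on-disk encoding does not change the in-memory string size.
        Console.WriteLine(fromUtf8.Length == fromAscii.Length); // True
    }
}
```

The smaller file here is smaller only because the original letters are gone. Comparing the decoded strings, rather than the file sizes, is one way to check whether a conversion preserved the text.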

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #6
