Bytes IT Community

using in-memory zlib deflate from c# (with max performance :-)

Hi, I need to implement in-memory zlib compression in C# to replace an
old C++ app.

Prerequisites:
1) The performance must be FAST (with source memory sizes from a few KB
to a meg).
2) The output must exactly match the compression generated by the C++
zlib.org code with default compression.

The zlib.org C# code is slow as hell! Not sure if I'm doing anything
wrong, but it crawls - on the plus side, its output matches what's
generated with the C++ libraries exactly.

System.IO.Compression's deflate is fast, as is SharpZLib, but the
output is different from the zlib.org version.

Any help would be GREATLY appreciated!

Cheers,

Tom
Jun 30 '08 #1
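For reference, the "default compression" bytes Tom needs to match can be reproduced with Python's zlib module, which wraps the same zlib.org C library - a minimal cross-checking sketch (the payload is illustrative, not from the thread), assuming default settings throughout:

```python
import zlib

# Compress a buffer entirely in memory with zlib's default settings
# (level 6, 32 KB window, zlib header + adler32 trailer). CPython's
# zlib module wraps the zlib.org C library, so this output is the
# reference bytes a matching C#/C++ implementation should reproduce.
payload = b"example payload " * 1000

compressed = zlib.compress(payload)   # default level (-1 maps to 6)
restored = zlib.decompress(compressed)

assert restored == payload
print(len(payload), "->", len(compressed), "bytes")
```

Dumping `compressed` to a file and diffing it against the C++ app's output is a quick way to confirm which settings the old code actually used.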
5 Replies


On Jun 30, 10:45 am, tombrog...@googlemail.com wrote:
> Hi, I need to implement in-memory zlib compression in C# to replace an
> old C++ app.
>
> Prerequisites:
> 1) The performance must be FAST (with source memory sizes from a few KB
> to a meg).
> 2) The output must exactly match the compression generated by the C++
> zlib.org code with default compression.

That second point is an odd one, and likely to give you issues. What's
the basis of that requirement? Obviously the decompressed data should
be the same, but do you really need the compressed version to be
identical?

Jon
Jun 30 '08 #2

On 30 Jun, 11:14, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
> On Jun 30, 10:45 am, tombrog...@googlemail.com wrote:
> > Hi, I need to implement in-memory zlib compression in C# to replace an
> > old C++ app.
> > Prerequisites:
> > 1) The performance must be FAST (with source memory sizes from a few KB
> > to a meg).
> > 2) The output must exactly match the compression generated by the C++
> > zlib.org code with default compression.
>
> That second point is an odd one, and likely to give you issues. What's
> the basis of that requirement? Obviously the decompressed data should
> be the same, but do you really need the compressed version to be
> identical?
>
> Jon

Hi, unfortunately yes.

I'm compressing the data and then writing it to a file (the file has an
extremely proprietary format, which means I can't just compress into it
directly).

The file will then be read by another C++ process.

Obviously, if both match the deflate spec then the C++ side will be able
to read it, but my solution will be a lot more "acceptable" if the
output files are the same for C# and C++.

Thanks,

Tom
Jun 30 '08 #3

On Jun 30, 11:20 am, tombrog...@googlemail.com wrote:
> > That second point is an odd one, and likely to give you issues. What's
> > the basis of that requirement? Obviously the decompressed data should
> > be the same, but do you really need the compressed version to be
> > identical?
>
> Hi, unfortunately yes.
>
> I'm compressing the data and then writing it to a file (the file has an
> extremely proprietary format, which means I can't just compress into it
> directly).
>
> The file will then be read by another C++ process.
>
> Obviously, if both match the deflate spec then the C++ side will be able
> to read it, but my solution will be a lot more "acceptable" if the
> output files are the same for C# and C++.

In that case you may find yourself digging into the zlib.org code in
the normal profiling kind of way. It's unlikely that other compressors
will produce *exactly* the same output, although you can try tweaking
options (window sizes etc) to see if that will help.

Personally, I'd try to push back on the "identical output" requirement,
satisfying myself instead with a comprehensive set of tests for the
"compress and then uncompress" cycle. I realise that may be futile in
some situations, but it may be worth pointing out that if the C++ zlib
code is ever patched, that may well change the output in a harmless
manner too.

Jon
Jun 30 '08 #4
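Jon's point about tweaking options can be illustrated with Python's zlib (the same underlying zlib.org library): changing the window size, level, or strategy changes the bytes on the wire, even though every variant still round-trips. A sketch, with an illustrative payload and an arbitrarily chosen smaller window:

```python
import zlib

payload = b"the quick brown fox " * 500

# Default parameters: level 6, 32 KB window (wbits=15), zlib wrapper.
default_out = zlib.compress(payload)

# Same data as a raw deflate stream with a 2 KB window (wbits=-11).
co = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-11)
raw_out = co.compress(payload) + co.flush()

# Different parameters -> different compressed bytes...
assert raw_out != default_out

# ...but both round-trip back to the original data.
assert zlib.decompress(default_out) == payload
assert zlib.decompressobj(wbits=-11).decompress(raw_out) == payload
```

This is why byte-identical output effectively pins you to one library at one parameter set: any compressor (or version) that picks different matches emits different, equally valid, deflate streams.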

On 30 Jun, 11:45, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
> On Jun 30, 11:20 am, tombrog...@googlemail.com wrote:
> > That second point is an odd one, and likely to give you issues. What's
> > the basis of that requirement? Obviously the decompressed data should
> > be the same, but do you really need the compressed version to be
> > identical?
> > Hi, unfortunately yes.
> > I'm compressing the data and then writing it to a file (the file has an
> > extremely proprietary format, which means I can't just compress into it
> > directly).
> > The file will then be read by another C++ process.
> > Obviously, if both match the deflate spec then the C++ side will be able
> > to read it, but my solution will be a lot more "acceptable" if the
> > output files are the same for C# and C++.

> In that case you may find yourself digging into the zlib.org code in
> the normal profiling kind of way. It's unlikely that other compressors
> will produce *exactly* the same output, although you can try tweaking
> options (window sizes etc) to see if that will help.
>
> Personally, I'd try to push back on the "identical output" requirement,
> satisfying myself instead with a comprehensive set of tests for the
> "compress and then uncompress" cycle. I realise that may be futile in
> some situations, but it may be worth pointing out that if the C++ zlib
> code is ever patched, that may well change the output in a harmless
> manner too.

> Jon

Cheers Jon, I'll do that.

You don't happen to know any way to get maximum performance, do you?

I'm having to iterate through many structures (up to 1 million),
compressing them one at a time, with source data sizes ranging from a
few KB to a meg.

Do you think threading would help? (It will run on multi-processor
machines.)

Thanks,

Tom
Jun 30 '08 #5

On Jun 30, 12:08 pm, tombrog...@googlemail.com wrote:
> > Personally, I'd try to push back on the "identical output" requirement,
> > satisfying myself instead with a comprehensive set of tests for the
> > "compress and then uncompress" cycle. I realise that may be futile in
> > some situations, but it may be worth pointing out that if the C++ zlib
> > code is ever patched, that may well change the output in a harmless
> > manner too.
>
> Cheers Jon, I'll do that.
>
> You don't happen to know any way to get maximum performance, do you?

Find a bottleneck, squish it. Lather, rinse, repeat :)

The exact details of squishing the bottleneck depend on the kind of
bottleneck, but basically profiling is your friend. Don't expect a
profiler to necessarily give you accurate results - the various
techniques used by different profilers always skew results somewhat,
but they can still help a lot. (Basically, you need to make sure
you've got a benchmark which runs in release mode, not under a
profiler, to see the *actual* improvements gained by making changes
suggested by the profiler.)
> I'm having to iterate through many structures (up to 1 million),
> compressing them one at a time, with source data sizes ranging from a
> few KB to a meg.
>
> Do you think threading would help? (It will run on multi-processor
> machines.)

Threading should help in that case, if you've got a naturally parallel
system - if you can compress two data sources independently, without
caring about which ends up being written first, for instance. If it's
not naturally parallel it may be harder, but still feasible.

If you're not close to release and don't mind using beta software,
Parallel Extensions makes life a lot simpler in my experience.

Jon
Jun 30 '08 #6
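The naturally parallel case Jon describes - many independent buffers, compressed separately, results kept in input order - can be sketched in Python (buffer sizes and worker count are illustrative). In CPython, zlib releases the GIL while deflating large buffers, so even a plain thread pool scales across cores here:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Many independent buffers, as in the "up to 1 million structures"
# scenario (scaled down here). Each is compressed on its own, and
# pool.map returns results in input order, so write order is preserved.
buffers = [bytes([i % 256]) * 50_000 for i in range(64)]

with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(zlib.compress, buffers))

# Parallel output is byte-identical to sequential compression -
# parallelism changes scheduling, not the deflate stream itself.
assert compressed == [zlib.compress(b) for b in buffers]
assert all(zlib.decompress(c) == b for c, b in zip(compressed, buffers))
```

The same shape applies in C#: compress each structure into its own in-memory buffer on a worker thread, then write the buffers out in order on a single writer.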

This discussion thread is closed

Replies have been disabled for this discussion.