Hi,
I need to convert XML files from a Japanese encoding to UTF-8.
I was using the following code:
    using (FileStream fs = File.OpenRead(fromFile))
    {
        int fileSize = (int)fs.Length;
        int buffer = fileSize;
        byte[] b = new byte[buffer];
        using (StreamWriter sw = new StreamWriter(toFile, true, toEnc))
        {
            while (fs.Read(b, 0, buffer) > 0)
            {
                byte[] utf8Bytes = Encoding.Convert(fromEnc, toEnc, b);
                // Convert the new byte[] into a char[] and then into a string.
                char[] utf8Chars = new char[toEnc.GetCharCount(utf8Bytes, 0, utf8Bytes.Length)];
                toEnc.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0);
                string utfString = new string(utf8Chars);
                sw.Write(replaceXmlEncodingHeader(utfString, fromEncHeader, toEncHeader));
            }
        }
    }
Everything worked fine until we got a 100MB file, at which point I got an
OutOfMemory exception.
I've tried to read it in pieces:
    if (fileSize > 30000000)
    {
        buffer = 1024;
    }
but then the end of the buffer is filled with other bytes (say the last
chunk is 900 bytes; 124 bytes get added from somewhere else), so my
converted XML is not well formed.
Please help.
Thanks in advance.
Regards
Your code to read in chunks isn't correct. You are assuming that the
buffer is filled completely on every read, when in reality it is not.
Chances are you are adding the 124 bytes yourself, instead of writing only
what was read.
You might also want to consider some mechanism other than XML for storing
the data in these files. At 100MB, it's going to be very difficult to
process this file for updates if the time comes.
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Nicholas,
Thanks for the reply.
Unfortunately I cannot choose the format (XML); that's given.
100MB files appear only once in a while, but they need to be processed.
I understand that I'm doing it the wrong way.
Can you please show me how to do it correctly?
Thanks in advance.
Regards
bbb,
You aren't checking the return value of the call to Read. That value
tells you how many bytes were read into the buffer. You should then
convert only that number of bytes, not the whole buffer.
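As a minimal sketch of that advice, the loop can keep the count returned by Read and hand only that many bytes to Encoding.Convert. The class name ChunkedConverter, the method name ConvertStream and the Stream-based signature are made up for illustration; they are not from the original code.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkedConverter
{
    // Hypothetical helper: re-encodes src into dst one chunk at a time.
    public static void ConvertStream(Stream src, Stream dst,
                                     Encoding fromEnc, Encoding toEnc,
                                     int bufferSize)
    {
        byte[] b = new byte[bufferSize];
        int bytesRead;

        // Read returns how many bytes it actually placed in the buffer;
        // the last chunk is usually shorter than the buffer.
        while ((bytesRead = src.Read(b, 0, b.Length)) > 0)
        {
            // Convert only the bytes that were read, not the whole array.
            byte[] converted = Encoding.Convert(fromEnc, toEnc, b, 0, bytesRead);
            dst.Write(converted, 0, converted.Length);
        }
    }
}
```

Note that this still converts each chunk on its own. With a multi-byte source encoding, a character that straddles a chunk boundary may not be decoded correctly, so the chunk size alone does not guarantee a clean result.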
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
    int bytesread = fs.Read(b, 0, buffer);
On the last chunk bytesread != 1024; instead bytesread == 900,
but you then process the entire array:
    byte[] utf8Bytes = Encoding.Convert(fromEnc, toEnc, b);
Just make this process only bytesread bytes of the buffer. The overload described at http://msdn.microsoft.com/library/de...vertTopic1.asp
should handle this for you, ending up with:
    byte[] utf8Bytes = Encoding.Convert(fromEnc, toEnc, b, 0, bytesread);
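Another way to sketch the same conversion, assuming nothing is needed per chunk beyond re-encoding, is to let StreamReader decode and StreamWriter encode. StreamReader.Read also returns a count (of chars this time), and the reader keeps its decoder state between reads, so a multi-byte character split across its internal buffer is still decoded whole. The names StreamingConverter and ConvertStream are hypothetical.

```csharp
using System.IO;
using System.Text;

static class StreamingConverter
{
    // Hypothetical helper: decodes src as fromEnc and writes it to dst as toEnc.
    public static void ConvertStream(Stream src, Stream dst,
                                     Encoding fromEnc, Encoding toEnc)
    {
        char[] chars = new char[4096];

        using (StreamReader reader = new StreamReader(src, fromEnc))
        using (StreamWriter writer = new StreamWriter(dst, toEnc))
        {
            int charsRead;
            // Write only the chars that Read reported, never the whole array.
            while ((charsRead = reader.Read(chars, 0, chars.Length)) > 0)
            {
                writer.Write(chars, 0, charsRead);
            }
        }
    }
}
```

The XML declaration would still name the old encoding after this, so a step like the original replaceXmlEncodingHeader is still needed on the start of the text. Also keep in mind that a StreamWriter may emit a byte order mark, depending on the Encoding instance it is given.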
Cheers,
Greg Young
MVP - C# http://codebetter.com/blogs/gregyoung
Thank you very much for your help.
It works perfectly.
Regards,