Storing a web page written in Chinese in a file

Hi,
I want to read an HTML page written in Chinese and store it in a file
with the extension .aspx. I'm not sure where the problem is. I use the
following code:

using System;
using System.IO;
using System.Net;
using System.Text;

String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
// Decode the response body as UTF-8
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8);
String sHtmlTradotto = reader.ReadToEnd();
reader.Close();

// Save the page text as UTF-8, overwriting any existing prova.aspx
StreamWriter writer = new StreamWriter("prova.aspx", false, Encoding.UTF8);
writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

But the file produced doesn't contain the Chinese characters. How
can I solve the problem?
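One way to find out whether the saving step loses anything is to take the network out of the picture. A minimal sketch, not the poster's code: it pushes a sample of the Chinese text through the same StreamWriter/StreamReader and Encoding.UTF8 combination in memory. The sample string and variable names are hypothetical. If the text survives the round trip, the problem more likely lies in what the server sends or in how the response is decoded.

```csharp
using System;
using System.IO;
using System.Text;

string sample = "大学 经济 工作"; // hypothetical sample text from the translated page

// Write exactly as the original code does: StreamWriter + Encoding.UTF8
var buffer = new MemoryStream();
using (var writer = new StreamWriter(buffer, Encoding.UTF8))
{
    writer.Write(sample);
}
byte[] written = buffer.ToArray(); // ToArray still works after the writer closes the stream

// Read it back the way the original code reads the response
string readBack;
using (var reader = new StreamReader(new MemoryStream(written), Encoding.UTF8))
{
    readBack = reader.ReadToEnd(); // the leading UTF-8 BOM is detected and skipped
}

Console.WriteLine(readBack == sample); // True
```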

Many thanks in advance...

Ing. Antonio D'Ottavio
Jul 21 '05 #1
Antonio <et*******@libero.it> wrote:
But the file produced doesn't contain the Chinese characters. How
can I solve the problem?


Are you sure that it's returning the data in UTF-8? How are you
checking whether or not the file contained Chinese characters?

I'd look in more depth myself, but using the code above, it's
complaining that the server committed an HTTP protocol violation :(
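On the question of how to check the file: an editor that guesses the wrong encoding can display garbage even when the bytes on disk are correct, so it may help to inspect the bytes directly. A minimal sketch, using a hypothetical temporary file in place of prova.aspx: it looks for the UTF-8 byte order mark that StreamWriter writes when given Encoding.UTF8, then counts characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF, which covers common Chinese characters but not every CJK range).

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical test file; in the thread the real output file is "prova.aspx"
string path = Path.Combine(Path.GetTempPath(), "prova-check.aspx");
using (var writer = new StreamWriter(path, false, Encoding.UTF8))
{
    writer.Write("<p>大学</p>");
}

// Inspect the raw bytes instead of trusting an editor's display
byte[] bytes = File.ReadAllBytes(path);

// With Encoding.UTF8, StreamWriter emits the byte order mark EF BB BF first
bool hasUtf8Bom = bytes.Length >= 3
    && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;

// Decode as UTF-8 and count CJK Unified Ideographs
// (the BOM decodes to U+FEFF, which is outside that range)
string text = Encoding.UTF8.GetString(bytes);
int cjkCount = text.Count(c => c >= '\u4E00' && c <= '\u9FFF');

Console.WriteLine($"BOM: {hasUtf8Bom}, CJK characters: {cjkCount}"); // BOM: True, CJK characters: 2
```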

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
Hi,
I simply tried connecting to the URL

http://babelfish.altavista.com/babel.../EN/index.aspx

with Internet Explorer. This is the result; you can see that the
charset is UTF-8, and I can see the Chinese characters normally:
<html><meta http-equiv="content-type" content="text/html;
charset=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!-- removed --><meta http-equiv="Content-Type" content="text/html ;
CHARSET=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head>
<title>Etantonio</title>
<meta name="author" content="Antonio DOttavio">
<meta name="description" content="Etantonio Index">
<link href="Stili.css" rel="stylesheet" type="text/css">
</head>
<body>

<script language=JavaScript src="menu_array.js"
type=text/javascript></script>
<script language=JavaScript src="mmenu.js"
type=text/javascript></script>

<table width="750" height="430" border="0" cellpadding="0"
cellspacing="0" background="/images/EsserSpettatoriNonEstSerioElefante.jpg">
<tr>
<td valign="top">

<table width="90%" border="0" align="center" cellspacing="12">
<tr height="70" valign="top">
<td>&nbsp;</td>
<td width="25%" rowspan="2">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fUniversita%2findex.aspx"
class="testoMedioVerde">大学</a></p>
<p align="center"
class="testoPiccolissimoVerde">学士路线的笔记在工程学电子,
论文、研究方法和适当尊敬对起源村庄。
</p>
</td>
<td width="25%" rowspan="2">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEconomia%2findex.aspx"
class="testoMedioVerde">经济</a> </p>
<p align="center"
class="testoPiccolissimoVerde">委员会、为财政社区的技术和仪器,
详尽阐述对您在,
在供选择变迁之间,
持续从1994
年个人经验的基地。</p></td>
<td width="25%">&nbsp;</td>
</tr>
<tr height="140" valign="top">
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fLavoro%2findex.aspx"
class="testoMedioVerde">工作</a> </p>
<p align="center"
class="testoPiccolissimoVerde">简历,
图象证实对您,
和一些仪器和参考为工作机会查寻。
</p>
</td>
<td width="25%">
<p align="center" ><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fWeb%2fGifAnimate%2findex.aspx"
class="testoMedioVerde">网</a> </p>
<p align="center"
class="testoPiccolissimoVerde">搜索引擎在无数GIF
赋予生命从我选择了和详尽阐述了,
随后将来网的被插入的实验。
</p>
</td>
</tr>
<tr valign="top">
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fVarie%2findex.aspx"
class="testoMedioVerde">数</a> </p>
<p align="center"
class="testoPiccolissimoVerde">巨大我的利益发现这里出气孔,
艺术, 旅行,
激情以远对我的热点表的链接。
</p>
</td>
<td width="25%"> <div align="center"></div></td>
<td width="25%"> <div align="center"></div></td>
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fContatti%2findex.aspx"
class="testoMedioVerde">联络</a></p>
<p align="center"
class="testoPiccolissimoVerde">这里它是可能接触对我为每必要或理事会是 通过编写形式或插入消息nel
论坛delle 想法的邮件。
</p>
</td>
</tr>
</table>

</td>
</tr>
</table>
<script>InserisciFooter();</script>
<br>
<a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente">Universita 用 </a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fFR%2fUniversita%2findex.aspx"
class="trasparente">Universita</a>
<a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fFR%2fUniversita%2findex.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fEN%2fUniversita%2findex.aspx"
class="trasparente">英语
</a><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http%3a%2f%2fwww.etantonio.it%2fEN%2fFR%2fUniversita%2findex.aspx"
class="trasparente">用法语</a>
</td>
</a>
<td>
</body>
</html>
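Note that StreamReader decodes with the encoding passed to it (or one detected from a byte order mark), not with the charset named in the page's meta tag, and the charset the server declares in the HTTP Content-Type header may differ from the meta tag. A minimal sketch, assuming one wants to honour the declared charset instead of hard-coding UTF-8: the hypothetical helper EncodingForCharset maps a charset name, such as the value of HttpWebResponse.CharacterSet, to an Encoding and falls back to UTF-8 when the name is missing or unknown.

```csharp
using System;
using System.Text;

// Hypothetical helper: map a charset name to an Encoding,
// falling back to UTF-8 when the name is missing or unknown.
static Encoding EncodingForCharset(string charset)
{
    if (string.IsNullOrWhiteSpace(charset))
        return Encoding.UTF8;
    try
    {
        return Encoding.GetEncoding(charset.Trim());
    }
    catch (ArgumentException)
    {
        // Unknown or unsupported charset name
        return Encoding.UTF8;
    }
}

Console.WriteLine(EncodingForCharset("UTF-8").WebName);       // utf-8
Console.WriteLine(EncodingForCharset(null).WebName);          // utf-8
Console.WriteLine(EncodingForCharset("no-such-set").WebName); // utf-8

// Hypothetical use with the `result` variable from the original code:
// var http = (HttpWebResponse)result;
// var reader = new StreamReader(http.GetResponseStream(), EncodingForCharset(http.CharacterSet));
```

Falling back to UTF-8 is a choice made for this page, which declares UTF-8 in its meta tag; other pages may call for a different default.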

I'm trying to read it and store it in a file
with the extension .aspx, but many characters are not
decoded correctly. I use the following code:

using System;
using System.IO;
using System.Net;
using System.Text;

String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
// Decode the response body as UTF-8
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8);
String sHtmlTradotto = reader.ReadToEnd();
reader.Close();

// Save the page text as UTF-8, overwriting any existing prova.aspx
StreamWriter writer = new StreamWriter("prova.aspx", false, Encoding.UTF8);
writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

Can you help me solve the problem?
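Characters that come out wrong are the usual symptom of decoding bytes with a different encoding from the one they were written in. A minimal sketch, assuming the page really is UTF-8 as its meta tag says: it takes two characters from the page, encodes them as UTF-8, and decodes the bytes once as ISO-8859-1 (chosen here only as an example of a wrong guess) and once as UTF-8.

```csharp
using System;
using System.Text;

string original = "大学"; // sample text from the translated page

// What a UTF-8 server sends: 3 bytes per character for these two characters
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

// Wrong guess: ISO-8859-1 maps each byte to one character,
// so two Chinese characters turn into six unrelated Latin-1 characters
string wrong = Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);

// Decoding with the encoding the bytes were written in recovers the text
string right = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine($"{utf8Bytes.Length} bytes, wrong: {wrong.Length} chars, right: {right.Length} chars");
// 6 bytes, wrong: 6 chars, right: 2 chars
```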

Many thanks in advance...

Ing. Antonio D'Ottavio
Jul 21 '05 #3
Hi Jon,
the problem you are getting can be resolved by updating one entry
for unsafe header parsing in the machine.config file:

<httpWebRequest useUnsafeHeaderParsing="true" />

Put this entry under the <system.net><settings>
section of the machine.config file.
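For reference, a sketch of where the entry sits in a .NET Framework configuration file. The element names are case-sensitive (system.net, settings), and the same block can usually be placed in the application's own app.config or web.config instead of machine.config, so that it does not relax header parsing for every application on the machine:

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Tolerate responses whose headers violate the HTTP specification -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```

This setting makes HttpWebRequest accept malformed response headers, which is a security trade-off, so it is worth limiting it to the application that needs it.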
"Jon Skeet [C# MVP]" wrote:
I'd look in more depth myself, but using the code above, it's
complaining that the server committed an HTTP protocol violation :(


Jul 21 '05 #4
