
GB2312 encoding in PHP

I have language text stored as variables in text files, which are
'included' by my PHP scripts (is there a better way?). However, I seem
to have a problem with the Simplified Chinese GB2312 encoding format.

I thought that most foreign encoding mechanisms would avoid the use of
the quotation mark - however, I saved something out as GB2312 and now
effectively get parse errors (due to premature quotes which appear to
form parts of the Chinese characters themselves).

A few other websites I tested don't seem to have a problem sending me
mail in GB2312, though ... so they must have managed somehow.

Any ideas anyone?

Thanks

Terence

Jul 17 '05 #1
te************@gmail.com wrote:
> I have language text stored as variables in text files, which are
> 'included' by my PHP scripts (is there a better way?). However, I seem
> to have a problem with the Simplified Chinese GB2312 encoding format.
I am not sure whether by 'included' you mean that you use
"include('somefile')". If so, then "readfile" (or "file_get_contents")
might be what you want to use.

If the text files don't, or ought not to, contain any PHP code, then
"include" shouldn't be used, as the file will be evaluated, which
obviously can be a bad thing if it is not intended.
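A minimal sketch of the readfile/file_get_contents suggestion, assuming the message text lives in a file of its own. The function name load_body() and the file name body_gb2312.txt are hypothetical. Because file_get_contents() returns the file's bytes without parsing them as PHP, a 0x22 byte in the text cannot end a string literal early. readfile() does the same job when the text only needs to be sent straight to the output.

```php
<?php
// Sketch: load message text as raw bytes instead of include()-ing it.
// file_get_contents() never parses the file as PHP, so a stray 0x22 byte
// (a double quote) inside the text cannot cause a parse error.
// load_body() is a hypothetical helper name.
function load_body(string $path): string
{
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // The bytes are returned untouched; no charset conversion happens here.
    return $text;
}
```

Usage: $body = load_body(__DIR__ . '/body_gb2312.txt');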

> I thought that most foreign encoding mechanisms would avoid the use of
> the quotation mark - however, I saved something out as GB2312 and now
> effectively get parse errors (due to premature quotes which appear to
> form parts of the Chinese characters themselves).

[snip]

If PHP gives you parse errors, then the above (include) might be the
problem. If your browser complains about the returned HTML, then using
"htmlentities()" could probably fix it for you.
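A short sketch of escaping GB2312 text for an HTML page, assuming the page itself is served as GB2312. escape_gb2312_html() is a hypothetical helper name. It uses htmlspecialchars() rather than htmlentities(): both take the charset as the third argument, and, as far as I know, for a multi-byte charset other than UTF-8, htmlentities() performs only the same basic substitutions anyway.

```php
<?php
// Sketch: escape &, <, >, " and ' in GB2312 text for HTML output.
// Passing 'GB2312' as the charset makes PHP read the input as two-byte
// characters instead of treating each byte separately.
function escape_gb2312_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'GB2312');
}
```

Note that if the input is not valid GB2312, htmlspecialchars() returns an empty string unless ENT_IGNORE or ENT_SUBSTITUTE is passed. An unexpectedly blank page can therefore be a hint that the text is damaged.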
/Bent
Jul 17 '05 #2
Thanks for that.

I included the file using "include" - I did this because the file
contains several variables relevant to the mail I'm using it to send
out ($from = "XXX"; $body = "Text body"). The problem is when I
have Chinese text in the $body variable which happens to include
quotation marks in its composition.

I suppose I can try using file_get_contents instead, but that would
mean moving everything else out of the file and defining it
elsewhere. If it solves my problem, I suppose that's acceptable, so
I'll give it a go.
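If splitting things across several files is a nuisance, one possible layout keeps the settings and the body together in a single plain text file: a few "Name: value" header lines, a blank line, then the GB2312 body. This format and the helper load_mail_template() are assumptions for illustration, not anything PHP requires. The file is only read, never executed, so quote bytes in the body are harmless.

```php
<?php
// Sketch for a hypothetical template format:
//   From: someone@example.com
//   Subject: ...
//   <blank line>
//   <GB2312 body, stored verbatim>
function load_mail_template(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Split once on the first blank line; no /u modifier, so the regex
    // works on raw bytes and leaves GB2312 characters untouched.
    $parts = preg_split("/\r?\n\r?\n/", $raw, 2);
    $headers = [];
    foreach (preg_split("/\r?\n/", $parts[0]) as $line) {
        if (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return ['headers' => $headers, 'body' => $parts[1] ?? ''];
}
```

Usage: $mail = load_mail_template('mail_gb2312.txt'); $from = $mail['headers']['from']; $body = $mail['body'];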

Thanks for the suggestion.

Terence

Jul 17 '05 #3
IIRC, GB2312 doesn't use the 0-127 (ASCII) range for Chinese characters.
Both bytes of a two-byte character have their most significant bit
set. The quotation mark itself can still be used in Chinese text, though.
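The byte rule described above can be checked mechanically. This sketch treats the file as EUC-CN, the usual byte form of GB2312: bytes 0x00-0x7F are ASCII, and each Chinese character is a lead byte followed by a trail byte, both in 0xA1-0xFE. gb2312_invalid_offsets() is a hypothetical name. It returns the offsets of high bytes that do not start a valid pair. In well-formed GB2312, a quote can only appear as a standalone ASCII character, so a flagged offset immediately followed by a 0x22 byte points to damaged text rather than to GB2312 itself.

```php
<?php
// Sketch: walk a GB2312 (EUC-CN) byte string and report offsets of bytes
// >= 0x80 that are not the first byte of a valid two-byte character.
function gb2312_invalid_offsets(string $bytes): array
{
    $bad = [];
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            continue; // plain ASCII, including a genuine quotation mark
        }
        $next = $i + 1 < $len ? ord($bytes[$i + 1]) : -1;
        if ($b >= 0xA1 && $b <= 0xFE && $next >= 0xA1 && $next <= 0xFE) {
            $i++; // well-formed two-byte character: skip its trail byte
            continue;
        }
        $bad[] = $i; // high byte without a valid partner
    }
    return $bad;
}
```

Usage: an empty array from gb2312_invalid_offsets(file_get_contents('body_gb2312.txt')) means every high byte is correctly paired.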

Jul 17 '05 #4
I didn't think that GB2312 included quotation marks within the Chinese
characters either, as I know that Big5 does not have this problem, and
nor does UTF-8. However, what I tried was writing some text, saving it
as GB2312, and then having PHP process it. When PHP complained, I
opened the Chinese text file in standard ISO-8859-1 mode and could
clearly see a " amongst the other garbled characters in several
locations. It is not visible when viewing the text as Chinese, which
suggests it was forming part of the characters themselves.
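To see exactly where those quotes sit without relying on an editor's rendering, here is a sketch that returns the raw bytes around every 0x22 byte as hex. quote_contexts() and its $radius parameter are hypothetical. If the byte just before a quote is an unpaired high byte (0xA1-0xFE), the second byte of a character has probably been damaged. A quote with ASCII on both sides is just a quote.

```php
<?php
// Sketch: map each offset of a '"' byte to a hex dump of the bytes
// around it ($radius bytes on each side, clipped at the string edges).
function quote_contexts(string $bytes, int $radius = 2): array
{
    $out = [];
    $offset = 0;
    while (($pos = strpos($bytes, '"', $offset)) !== false) {
        $start = max(0, $pos - $radius);
        $out[$pos] = bin2hex(substr($bytes, $start, 2 * $radius + 1));
        $offset = $pos + 1;
    }
    return $out;
}
```

Usage: print_r(quote_contexts(file_get_contents('body_gb2312.txt')));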

If what you say is true, that the 0-127 range is not used (which I can
believe, since I didn't expect it to be used either), then I am
totally puzzled as to why quotation marks are appearing within my
Chinese text.

Strange

Terence

Jul 17 '05 #5
It's possible that the text is corrupted. An editor might simply drop
the character if it sees that the second byte of a two-byte
character is invalid. I know IE does that with UTF-8 text.

Curly quotation marks appear in the 128-255 range in the CP1251
codepage. It's possible that when the text was copied and pasted from
one application to another, the curly quotes were replaced by straight
ones.

Jul 17 '05 #6
