Bytes IT Community

GB2312 encoding in PHP

I have language text stored as variables in text files, which are
'included' by my PHP scripts (is there a better way?). However, I seem
to have a problem with the Simplified Chinese GB2312 encoding format.

I thought that most foreign encoding mechanisms would avoid the use of
the quotation mark - however, I saved something out as GB2312 and now
effectively get parse errors (due to premature quotes which appear to
form parts of the Chinese characters themselves).

A few other websites I tested don't seem to have a problem sending me
mail in GB2312, though, so they must have managed somehow.

Any ideas anyone?

Thanks

Terence

Jul 17 '05 #1
5 Replies


te************@gmail.com wrote:
> I have language text stored as variables in text files, which are
> 'included' by my PHP scripts (is there a better way?). However, I seem
> to have a problem with the simplified chinese GB2312 encoding format.

I am not sure whether by 'included' you mean that you use
"include('somefile')". If so, then "readfile" (or "file_get_contents")
might be what you want to use instead.

If the text files don't, or shouldn't, contain any PHP code, then
"include" shouldn't be used, because the file will be evaluated as PHP,
which obviously can be a bad thing if it is not intended.
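
A minimal sketch of the difference, assuming the message body lives in its own plain-text file. The temporary file and its contents are made up for illustration; the point is only that "file_get_contents" hands back the bytes untouched, so a quote inside the text has nothing to terminate.

```php
<?php
// Hypothetical message file, created here only so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'msg');

// GB2312 bytes for two Chinese characters (D6 D0 CE C4), followed by
// text that contains literal ASCII double quotes.
$rawBody = "\xD6\xD0\xCE\xC4 \"quoted\" text";
file_put_contents($path, $rawBody);

// file_get_contents() returns the file's bytes as a string. Nothing is
// parsed as PHP, so quotes in the text cannot cause a parse error.
$body = file_get_contents($path);
unlink($path);

var_dump($body === $rawBody);
```

The trade-off is that the file can then hold only the text itself, not assignments such as $from or $body.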

> I thought that most foreign encoding mechanisms would avoid the use of
> the quotation mark - however, I saved something out as GB2312 and now
> effectively get parse errors (due to premature quotes which appear to
> form parts of the Chinese characters themselves).

[snip]

If PHP gives you parse errors, then the above (include) might be the
problem. If your browser complains about the returned HTML, then using
"htmlentities()" could probably fix it for you.
/Bent
Jul 17 '05 #2

Thanks for that.

I included the file using "include" - I did this because the file
contains several variables relevant to the mail I'm using it to send
out (so $from = "XXX" ; $body="Text body") ... the problem arises when I
have Chinese text in the $body variable which happens to include
quotation marks in its composition.

I suppose I can try using file_get_contents instead, but that would
mean moving everything else out of the file and defining it
elsewhere. If it solves my problem I suppose that's acceptable - so
I'll give it a go.
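
If moving the other variables out of the file is the sticking point, one alternative is to generate the included file with "var_export", which escapes quotes for you. This is not something suggested in the thread, just a sketch under that assumption; the array keys and the temporary file are hypothetical.

```php
<?php
// Hypothetical mail settings. The body mixes GB2312 bytes (D6 D0 CE C4)
// with both kinds of ASCII quote.
$messages = [
    'from' => 'XXX',
    'body' => "\xD6\xD0\xCE\xC4 \"quoted\" 'text'",
];

// var_export() emits a valid PHP literal with quotes and backslashes
// escaped, so the generated file can be included without parse errors.
$code = '<?php return ' . var_export($messages, true) . ';';

$path = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($path, $code);

// include returns whatever the file returns, here the original array.
$loaded = include $path;
unlink($path);

var_dump($loaded === $messages);
```

This keeps everything in one includable file, at the cost of the file no longer being comfortable to edit by hand.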

Thanks for the suggestion.

Terence

Jul 17 '05 #3

IIRC, GB2312 doesn't use the 0-127 range for Chinese characters.
Both bytes of a two-byte character have their most-significant
bit set. The quotation mark itself can be used in Chinese text, though.
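
A quick way to see this for yourself, as a sketch only: the helper name below is made up, and the sample is the usual two-byte GB2312 encoding of one Chinese character (D6 D0).

```php
<?php
// Hypothetical helper: true when every byte of $char has its
// most-significant bit set (that is, every byte is 0x80 or above).
function allHighBit(string $char): bool
{
    for ($i = 0, $n = strlen($char); $i < $n; $i++) {
        if ((ord($char[$i]) & 0x80) === 0) {
            return false;
        }
    }
    return $n > 0;
}

$zhong = "\xD6\xD0"; // one Chinese character as two GB2312 bytes
$quote = '"';        // ASCII quotation mark, byte 0x22

var_dump(allHighBit($zhong)); // bool(true)
var_dump(allHighBit($quote)); // bool(false)
printf("%s %s\n", bin2hex($zhong), bin2hex($quote)); // d6d0 22
```

Since 0x22 is below 0x80, a straight quote byte in such a file is a quote in its own right, not half of a Chinese character.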

Jul 17 '05 #4

I didn't think that GB2312 included quotation marks within the Chinese
characters either - I know that BIG5 does not have this problem, and
nor does UTF-8. However, what I tried doing was writing some text,
saving it out as GB2312, and then having PHP process it. When PHP
complained, I opened the Chinese text file in standard ISO-8859-1
mode and I could visibly see a " included amongst the other garbled
characters in several locations. This is not visible when viewing the
Chinese - suggesting it was forming part of the characters themselves.

If what you say is true and the 0-127 range is not used (which I can
believe, since I didn't expect it to be used either), then I am
totally puzzled as to why quotation marks are appearing within my
Chinese text.
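
One way to narrow this down would be to scan the raw bytes rather than trusting what an editor displays. The sketch below assumes the file really is two-byte GB2312 as described earlier in the thread; the function name and sample bytes are hypothetical.

```php
<?php
// Walks a byte string assumed to be two-byte GB2312 and reports:
//  - 'quotes': offsets of ASCII double quotes (byte 0x22)
//  - 'broken': offsets of high-bit bytes NOT followed by another
//              high-bit byte, which suggests a damaged character
function scanGb2312(string $bytes): array
{
    $quotes = [];
    $broken = [];
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b >= 0x80) {
            if ($i + 1 < $len && ord($bytes[$i + 1]) >= 0x80) {
                $i++; // well-formed pair: skip its second byte
            } else {
                $broken[] = $i;
            }
        } elseif ($b === 0x22) {
            $quotes[] = $i;
        }
    }
    return ['quotes' => $quotes, 'broken' => $broken];
}

// Sample: one intact character (D6 D0), a quote, then a lone D6 whose
// second byte is missing, then another quote.
$sample = "\xD6\xD0\"\xD6\"";
$report = scanGb2312($sample);
print_r($report);
```

For a real file you would pass in file_get_contents() of that file. Quotes sitting right after a 'broken' offset would point towards damaged text rather than quotes that belong there.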

Strange

Terence

Jul 17 '05 #5

It's possible that the text is corrupted. An editor might simply toss
out the character if it sees that the second byte of a two-byte
character is incorrect. I know IE does that with UTF-8 text.

Curly quotation marks appear in the 128-255 range in the CP1251
codepage. It's possible that when the text was copied and pasted from
one application to another, the curly quotes were replaced by straight
ones.

Jul 17 '05 #6
