an idiot question: if ord() returns a value between 0 and 127, then the character is ASCII?


I know I'm missing something obvious, but I looked hard at this page
and did not see the format of the return specified:

http://us3.php.net/manual/en/function.ord.php
From the limited example, I assume it is the decimal (not hex or binary)
value of the character being returned?

So long as the returned value is between 0 and 127, I can treat it as
ascii?

If, for some reason, I had to restrict my output to ascii, then any
time I encountered a value outside of 0 to 127, then I'd have to
replace it with something in range?

Despite all the terrific help I've gotten on comp.lang.php regarding
this issue, my RSS feeds continue to fail when users write weblog posts
in Microsoft Word and then copy and paste their post to a form and
input that. So I'm resorting to severe measures. I don't care if I end
up with a lot of garbage characters, that is fine, I just want the RSS
feed to validate.

Aug 18 '05 #1
lk******@geocities.com wrote:

: I know I'm missing something obvious, but I looked hard at this page
: and did not see the format of the return specified:

: http://us3.php.net/manual/en/function.ord.php

: From the limited example I assume it is the decimal (not hex or binary)
: value of the character being returned?

The value returned is a number; it has no format unless you try to use it
as a string, at which point php has to decide how to format the value.

: So long as the returned value is between 0 and 127, I can treat it as
: ascii?

As ascii yes, but as a character no.

ord() returns the numeric value of a byte, not the byte itself.

You can use the numeric value to do compares, but if you try to put the
numeric value into a string then you get the formatted number representing
the value, not the original byte.

If you want to get the original byte then you use chr() to convert the
numeric value into a byte that php can insert as-is into a string.
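A small sketch of the ord()/chr() round trip described above; the variable names are just for illustration:

```php
<?php
$byte  = 'A';
$value = ord($byte);        // a plain integer, not a string
var_dump($value);           // prints int(65)

// Putting the number into a string formats it as decimal digits ...
$asText = "value: " . $value;       // "value: 65"

// ... whereas chr() turns the number back into the original byte.
$asByte = "byte: " . chr($value);   // "byte: A"

echo $asText, "\n", $asByte, "\n";

// Because the value is a number, range checks are plain comparisons.
$isAscii = ($value >= 0 && $value <= 127);
var_dump($isAscii);         // prints bool(true)
```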

: If, for some reason, I had to restrict my output to ascii, then any
: time I encountered a value outside of 0 to 127, then I'd have to
: replace it with something in range?

Sure, I wouldn't use those words, but I guess you could say that.

: Despite all the terrific help I've gotten on comp.lang.php regarding
: this issue, my RSS feeds continue to fail when users write weblog posts
: in Microsoft Word and then copy and paste their post to a form and
: input that. So I'm resorting to severe measures. I don't care if I end
: up with a lot of garbage characters, that is fine, I just want the RSS
: feed to validate.

You need to do more than check ascii, you need to check for printable
characters. Binary data can contain ascii control codes, and those are
not allowed in xml.

Without checking all the details of the specs, a quick start would be to
check each character and replace it with a . or ? whenever its value is
below that of a space character or greater than or equal to 127
(decimal).
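<?php
# untested
// Sample input (hypothetical): a control code and a Windows-1252 smart quote.
$the_input_data = "Hello\x01 world\x93";
$len = strlen($the_input_data);
$space = ' ';
for ($i = 0; $i < $len; $i++)
{
    $ch = substr($the_input_data, $i, 1);   // the byte at position $i
    $value = ord($ch);                      // its numeric value, 0 to 255
    // Note: tab, newline and CR are replaced too, although XML allows them.
    if ($value >= ord($space) && $value < 127)
    {
        echo chr($value);                   // printable ASCII: output the byte unchanged
    }
    else
    {
        echo '?';                           // control code or byte >= 127: replace it
    }
}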
php has lots of functions that would do various parts of this, perhaps
even the entire thing, so personally I would expect to do this without
any of the above. On the other hand, I sometimes need to do it by hand
before I properly appreciate what the existing functions would be doing
for me if I used them, so I show the above anyway.
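As one example of a built-in that does roughly the whole job, PHP's filter extension (bundled since PHP 5.2) can strip, rather than replace, the low and high bytes. This is a sketch, not necessarily the function the poster had in mind; note that, like the loop above, it also removes tab, newline and carriage return.

```php
<?php
// Sample input: a control code (\x01) and a UTF-8 "é" (\xC3\xA9).
$input = "caf\xC3\xA9\x01 ok";

// FILTER_UNSAFE_RAW leaves the string alone except for the flags:
//   FILTER_FLAG_STRIP_LOW  removes bytes with values below 32
//   FILTER_FLAG_STRIP_HIGH removes high bytes (the manual says values above 127)
$clean = filter_var($input, FILTER_UNSAFE_RAW,
                    FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH);

echo $clean, "\n";   // prints "caf ok"
```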

--

This space not for rent.
Aug 18 '05 #2
On 18/08/2005 18:33, lk******@geocities.com wrote:
> I know I'm missing something obvious, but I looked hard at [the ord()
> function] and did not see the format of the return specified:
A number between 0 and 255. This is the range of characters represented
in PHP strings, as mentioned in the first paragraph of the description
of the string type.

[snip]
> So long as the returned value is between 0 and 127, I can treat it as
> ascii?
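Not necessarily. It depends upon the origin of the string, and how that
string is encoded.

Some encoding schemes correspond directly to ASCII in the 0 to 127 range,
but not all. For instance, in a UTF-8 string a byte from 0 to 127 is
always the ASCII character with the same value, but bytes from 128 to
255 only occur as parts of multi-byte sequences, so a single byte may or
may not be a whole character. In a UTF-16 string, individual bytes never
map to ASCII this way: each character is encoded as a 16-bit code unit
(or, outside the Basic Multilingual Plane, a pair of them), so a byte
with a value below 128 may be only half of a unit.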
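To make the encoding point concrete, here is a sketch that dumps ord() for each byte of the same text in UTF-8 and in UTF-16LE. The byte strings are written out by hand, so no mbstring or iconv extension is assumed; the helper name byte_values() is hypothetical.

```php
<?php
// Return the ord() value of every byte in $s.
function byte_values($s)
{
    $values = array();
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $values[] = ord($s[$i]);
    }
    return $values;
}

// "Aé" in UTF-8: 'A' is one byte (65, same as ASCII);
// 'é' (U+00E9) is two bytes, both above 127.
$utf8 = "A\xC3\xA9";

// "Aé" in UTF-16LE: each of these characters takes two bytes.
$utf16le = "A\x00\xE9\x00";

echo implode(' ', byte_values($utf8)), "\n";    // prints "65 195 169"
echo implode(' ', byte_values($utf16le)), "\n"; // prints "65 0 233 0"
```

In the UTF-16 dump the 0 bytes are halves of 16-bit code units, not NUL characters, which is why a 0 to 127 test on individual bytes says nothing useful there.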
> If, for some reason, I had to restrict my output to ascii, then any
> time I encountered a value outside of 0 to 127, then I'd have to
> replace it with something in range?
That would qualify as restricting the output, but I wouldn't say that
was sensible. On the face of it, you'd have no idea what you're
converting, nor what the consequences will be should you take that action.

Perhaps you should explain the background of this issue. What sort of
data is being received that's causing these problems? Characters from a
foreign language? Bizarre characters that seem out-of-place amongst
something entirely recognisable (a paragraph of English, for example)?
> [...] my RSS feeds continue to fail when users write weblog posts
> in Microsoft Word and then copy and paste their post to a form and
> input that. [...]
So exactly what's being sent that's invalid? Perhaps "smart" quotes that
haven't been converted to their Unicode counterparts (the characters
represented by the entities &ldquo; and &rdquo;)?
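If the pasted text turns out to be Windows-1252 (a common assumption for text copied out of Word on Windows, but still an assumption), the smart punctuation sits in bytes 0x80 to 0x9F, which are C1 control codes in ISO-8859-1 and Unicode. A minimal sketch that maps the most common ones to XML numeric character references with strtr(); the function name is hypothetical and the table is deliberately incomplete.

```php
<?php
// Assumption: $s is Windows-1252 text, NOT UTF-8. Running this on UTF-8
// input would corrupt it, because UTF-8 continuation bytes share this range.
function cp1252_punctuation_to_ncr($s)
{
    // Windows-1252 byte => numeric character reference for the same character.
    $map = array(
        "\x85" => '&#8230;', // horizontal ellipsis (U+2026)
        "\x91" => '&#8216;', // left single quotation mark (U+2018)
        "\x92" => '&#8217;', // right single quotation mark (U+2019)
        "\x93" => '&#8220;', // left double quotation mark (U+201C)
        "\x94" => '&#8221;', // right double quotation mark (U+201D)
        "\x96" => '&#8211;', // en dash (U+2013)
        "\x97" => '&#8212;', // em dash (U+2014)
    );
    return strtr($s, $map);
}

$pasted = "\x93Smart\x94 quotes \x96 from Word\x85";
echo cp1252_punctuation_to_ncr($pasted), "\n";
// prints "&#8220;Smart&#8221; quotes &#8211; from Word&#8230;"
```

Numeric references are used instead of &ldquo; and friends because XML, and therefore RSS, predefines only five named entities. Escape & and < in the user's text before this step, or the references themselves will be escaped a second time.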
> I don't care if I end up with a lot of garbage characters, that is
> fine, I just want the RSS feed to validate.


I would be more worried about the data. Validity might play a part in
that, but it's not the be-all and end-all. Indeed, well-formedness is
certainly critical with regard to XML, but what use is valid data if
it's complete garbage?

Mike

--
Michael Winter
Prefix subject with [News] before replying by e-mail.
Aug 18 '05 #3
> That would qualify as restricting the output, but I wouldn't say that
> was sensible. On the face of it, you'd have no idea what you're
> converting, nor what the consequences will be should you take that action.
RSS validators insist that the output declare a character encoding. I
decided I would go with UTF-8. However, sometimes my users write their
weblog posts in Microsoft Word, or WordPerfect, or MacWrite, or
OpenOffice, and then they copy and paste their entry into the input form
and hit input. What they've input is not UTF-8, and then my RSS feeds
fail validation because they contain byte sequences that are not valid
UTF-8. I'm trying to get my RSS feeds to validate no matter what people
input. I see that many services on the web have solved this problem, but
I haven't yet figured out how they do it. If I had more resources, I
could probably develop more extensive tests for character encoding.
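One way to find out whether a submission is UTF-8 before it reaches the feed is to let PCRE check it: with the u modifier, preg_match() returns false when the subject is not valid UTF-8. A sketch, with a hypothetical function name:

```php
<?php
// Returns true if $s is well-formed UTF-8, false otherwise.
// With the /u modifier, preg_match() returns false on invalid UTF-8 input;
// otherwise the empty pattern always matches, so it returns 1.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9"));    // "café" in UTF-8: bool(true)
var_dump(is_valid_utf8("caf\xE9"));        // "café" in ISO-8859-1: bool(false)
var_dump(is_valid_utf8("\x93quoted\x94")); // Windows-1252 smart quotes: bool(false)
```

What to do when it returns false (reject the post, convert from an assumed source encoding, or scrub) is a separate decision.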

I keep trying to fix this problem but I never get it fixed, and I feel
like I've harassed people on comp.lang.php for quite a bit of help
already.

> Perhaps you should explain the background of this issue. What sort of
> data is being received that's causing these problems? Characters from a
> foreign language? Bizarre characters that seem out-of-place amongst
> something entirely recognisable (a paragraph of English, for example)?


Characters outside of my chosen character encoding are the problem. I'm
trying to scrub them out of the posts.
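For the scrubbing itself, a sketch that keeps every well-formed UTF-8 sequence and replaces each byte that is not part of one with '?'. The byte ranges follow the UTF-8 definition in RFC 3629; the function name is hypothetical, and it does not remove control characters, which XML also forbids.

```php
<?php
// Replace every byte that is not part of a well-formed UTF-8 sequence with '?'.
function scrub_utf8($s)
{
    // Group 1 matches one complete, valid UTF-8 character (RFC 3629 ranges);
    // the final "." (with /s so it can match any byte) takes one stray byte.
    $pattern = '/('
        . '[\x00-\x7F]'                        // ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'            // 2-byte sequence
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // 3-byte, excluding overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'        // 3-byte, excluding surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // 4-byte, excluding overlongs
        . '|[\xF1-\xF3][\x80-\xBF]{3}'         // 4-byte
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'     // 4-byte, up to U+10FFFF
        . ')|./s';

    return preg_replace_callback($pattern, function ($m) {
        // When the "." branch matched, group 1 did not take part.
        return (isset($m[1]) && $m[1] !== '') ? $m[1] : '?';
    }, $s);
}

echo scrub_utf8("caf\xC3\xA9"), "\n";  // valid UTF-8 "café": unchanged
echo scrub_utf8("caf\xE9"), "\n";      // ISO-8859-1 "café": prints "caf?"
echo scrub_utf8("\x93Word\x94"), "\n"; // Windows-1252 quotes: prints "?Word?"
```

Scrubbing throws information away (a Latin-1 "é" becomes "?"), which is the trade-off the poster says he is willing to accept.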

Aug 19 '05 #4
> I would be more worried about the data. Validity might play a part in
> that, but it's not the be-all and end-all. Indeed, well-formedness is
> certainly critical with regards to XML, but what use is valid data if
> it's complete garbage?


Frankly, I think I would be delighted if one of my users ended up with
an RSS feed where every single character was a garbage character. Maybe
then they'd finally listen to me and stop inputting stuff from
Microsoft Word, encoded in who-knows-what encoding.

Aug 19 '05 #5
You can do this far more efficiently with a regular expression. Something
like preg_replace('/[\x80-\xFF]/', '?', $s) should do.
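The suggested pattern replaces only the high bytes, so control codes such as \x01, which are not allowed in XML, pass straight through. A sketch comparing it with a variant that keeps only tab, newline, carriage return and printable ASCII; the variant's character class is an assumption, based on the XML 1.0 Char production restricted to the ASCII range.

```php
<?php
$s = "Word\x93s\x94 text\x01\twith a tab";

// The pattern from the post: replace every byte from 0x80 to 0xFF.
$highOnly = preg_replace('/[\x80-\xFF]/', '?', $s);

// Variant: replace anything that is not tab, LF, CR, or space through '~'.
$xmlAscii = preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '?', $s);

echo json_encode($highOnly), "\n"; // prints "Word?s? text\u0001\twith a tab"
echo json_encode($xmlAscii), "\n"; // prints "Word?s? text?\twith a tab"
```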

Aug 19 '05 #6
