
allowed characters in a string (stripping it)

I'd like to have a set of "allowed characters", and strip a string
from everything besides those.

I've tried and tried but so far every time I enter strings containing
unicode, it goes mad and output makes no sense.

I'm sure I'm missing something but no idea what.

$ACCENTED_ALL_LOW="àèìòùéíóúâêîôûäëïöüãõñçæøåăāĕēĭīŏŭūəß";
$ACCENTED_ALL_BIG="ÀÈÌÒÙÉÍÓÚÂÊÎÔÛÄËÏÖÜÃÕÑÇÆØÅĂĀĔĒĬĪŎŬŪƏß";
$ACCENTED_ALL=$ACCENTED_ALL_LOW.$ACCENTED_ALL_BIG;
$ALPHABET_LOW="qwertyuiopasdfghjklzxcvbnm";
$ALPHABET_BIG="QWERTYUIOPASDFGHJKLZXCVBNM";
$ALPHABET_ALL=$ALPHABET_LOW.$ALPHABET_BIG;
$SYMBOLS_NAME=".'- ";

first time I tried using something like this:
$name=preg_replace("/([a-zA-Z]|-|[$al])|./",'$1',$name);
(bear in mind I do *NOT* know regexp, a friend wrote this line)

now I tried instead using str_split:
function clear_name_complex ($name, $ok_chars) {
    $ass=str_split($ok_chars);
    $al=array();
    foreach ($ass as $a) {
        $al[$a]=TRUE;
    }

    $s=str_split($name);
    $ret="";
    foreach ($s as $c) {
        if (!$al[$c]) continue;
        $ret.=$c;
    }

    return $ret;
}

still nothing: as soon as the string contains unicode, it goes mad and
the output makes no sense.
I believe that's because in both cases it treats unicode characters by
splitting them into single bytes, but still, I'm clueless about what I
am supposed to do.
Dec 11 '07
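
A minimal sketch of the byte-splitting problem described above, assuming the input and the list of allowed characters are both UTF-8 and that PHP 7 or later is available. The function names `utf8_chars` and `clear_name_utf8` are made up for illustration and are not from the thread. `str_split` cuts a string into bytes, so an accented letter ends up as two or more array entries; `preg_split` with the `u` modifier splits on code points instead. Letters written as a base letter plus a combining mark would still be treated as two separate characters here.

```php
<?php
// Hypothetical helper (not from the thread): split a UTF-8 string into
// characters (code points) rather than bytes.
function utf8_chars(string $s): array
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    return $chars;
}

// Hypothetical helper: keep only the characters of $name that also
// occur in $ok_chars, comparing whole characters instead of bytes.
function clear_name_utf8(string $name, string $ok_chars): string
{
    // Use the allowed characters as array keys for a fast lookup.
    $allowed = array_flip(utf8_chars($ok_chars));

    $ret = '';
    foreach (utf8_chars($name) as $c) {
        if (isset($allowed[$c])) {
            $ret .= $c;
        }
    }
    return $ret;
}

$ok = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZëÑ .'-";
echo clear_name_utf8("Zoë O'Brien-Ñu#42!", $ok), "\n"; // prints: Zoë O'Brien-Ñu
```

This only works if the script file itself is saved as UTF-8 and the incoming string uses the same encoding; if the two differ, the lookup will miss even with character-aware splitting.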

"Tim Roberts" <ti**@probo.com> wrote in message
news:i2********************************@4ax.com...
> Jerry Stuckle <js*******@attglobal.net> wrote:
>> Tim Roberts wrote:
>>> "Lo'oris" <lo****@gmail.com> wrote:
>>>>
>>>> I'd like to have a set of "allowed characters", and strip a string
>>>> from everything besides those.
>>>> I've tried and tried but so far every time I enter strings containing
>>>> unicode, it goes mad and output makes no sense.
>>>
>>> How are you entering "strings containing unicode"? Browsers don't send
>>> Unicode.
>>
>> Excuse me? They sure can, depending on the language being used.
>
> Yes, I know better. That was not the sentiment I intended to convey.
>
>> So the rest of your post is immaterial. Steve's suggestion is a lot
>> closer.
>
> Damn you, Stuckle. How can you see anything at all from up there on your
> high horse?

with jerry, it's a matter of people in glass houses. except when you start
throwing rocks at his, he will claim you have no rock and that, in fact,
you've not broken any windows. :)

> Despite my faux pas, my suggestion was also correct, your invective
> notwithstanding.

good word, invective...he likes doing that apparently. at least we encounter
it often in his posts.

cheers.
Dec 13 '07 #11

"Steve" <no****@example.com> wrote in message
news:HI*************@newsfe07.lga...
>
> "Tim Roberts" <ti**@probo.com> wrote in message
> news:dp********************************@4ax.com...
>> "Steve" <no****@example.com> wrote:
>>>
>>>> $name=preg_replace("/([a-zA-Z]|-|[$al])|./",'$1',$name);
>>>
>>> it's not expensive at all. and a dot is any single character...not a
>>> greedy wild card. the only reason he wouldn't want a dot is because it
>>> could be an 'illegal' character that he's trying to get rid of anyway.
>>> as it is, he just didn't escape the dot so that it is the character
>>> (period) and not the directive (any single character).
>>
>> That statement as written will replace each character with itself, one by
>> one, repeatedly, for each character in $name.
>>
>> It is an expensive no-op.
>
> that is true, however each character is analyzed *as a single character*.
> there is no marker being set and a pattern being sought beyond that marker
> to see if there is another pattern match. markers are set, the replacement
> is made to those characters marked, the process is done. one of the least
> expensive operations one could ask of preg.
>
> may be a good idea to write a pattern you think would be less expensive
> that does similar things...see if you can time-test compare the two. you
> can also measure memory consumption too. i don't think you'll find any
> significant consumption of resources running the above, esp.
> comparatively.

sorry tim, i needed to make it clear - as i'd mentioned in one of my first
responses to you - that i think the dot in his preg is just mistakenly not
escaped. i don't think he means "any single character", rather, "a period".
anyway, my comments above are made under that assumption. otherwise you are
more right than before, however more in the line of "that's dumb to put
or'ed patterns when one of those will basically make the other
conditions/patterns moot". still, in this case, the expense is nominal since
all conditions/patterns work over a single character.

just thought i'd clarify.

cheers
Dec 13 '07 #12
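
For what it's worth, a pattern of the shape `(allowed)|.` with `'$1'` as the replacement can be checked directly: when the first branch matches, the character is put back; when only the dot matches, `$1` is empty and the character is dropped. The sketch below shows that, plus a shorter way to get the same effect with a single negated character class. It assumes UTF-8 input, PHP 7 or later, and a non-empty list of allowed characters; `strip_disallowed` is a made-up name, not something from the thread. The `u` modifier is what makes PCRE treat an accented letter as one character instead of several bytes.

```php
<?php
// Hypothetical helper (not from the thread): remove every character of
// $name that does not occur in $ok_chars. Both are assumed to be UTF-8.
function strip_disallowed(string $name, string $ok_chars): string
{
    // preg_quote() escapes regex metacharacters such as '.', '-', ']' and
    // '^', plus the '/' delimiter, so the list is safe inside [...].
    $class = preg_quote($ok_chars, '/');

    // [^...] matches any single character NOT in the list; /u = UTF-8 mode.
    $result = preg_replace('/[^' . $class . ']/u', '', $name);
    if ($result === null) {
        throw new InvalidArgumentException('invalid UTF-8 or empty allow-list');
    }
    return $result;
}

$ok = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZëÑ .'-";
echo strip_disallowed("Zoë O'Brien-Ñu#42!", $ok), "\n"; // prints: Zoë O'Brien-Ñu

// Same shape as the pattern quoted in the thread, reduced to ASCII:
// allowed characters are captured and kept, everything else is dropped.
echo preg_replace('/([a-zA-Z]|-)|./u', '$1', 'a1-b?'), "\n"; // prints: a-b
```

Note that an unescaped dot does not match a newline unless the `s` modifier is added, so the `(allowed)|.` form would leave line breaks in place, while the negated class removes them unless they are on the list.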
