a -very- case sensitive search

Ola K

Hi,
I am pretty new to Python and I want to make a script that will search
for the following options:
1) words made of uppercase characters -only- (like "YES")
2) words made of lowercase characters -only- (like "yes")
3) and words with only the first letter capitalized (like "Yes")
* and I need to do all these considering the fact that not all letters
are indeed English letters.

I went through different documentation sections but couldn't find the right
condition, function or method for it.
Suggestions will be very much appreciated...
--Ola

Nov 25 '06 #1

Goofy666

> * and I need to do all these considering the fact that not all letters
> are indeed English letters.

You mean letters from the English alphabet (derived from the Latin/Roman
alphabet, fyi)? I'm sorry for the nitpicking, but 'English letters'
sounds a bit too awkward to me.

> I went through different documentation sections but couldn't find the right
> condition, function or method for it.
> Suggestions will be very much appreciated...

I'm still learning it myself, but you could look into regular
expressions. There's a standard module for them (re); see the Python
Library Reference for details: http://docs.python.org/lib/module-re.html

--Laurens

Nov 25 '06 #2

Dustan

I'm not sure exactly what you mean by "considering the fact that not
all letters are indeed English letters"; you could mean you don't care
about the non-English characters, or you could mean you don't want any
non-English characters at all (so the function should return False in
that case). If it's the former, there's a simple test for each:

>>> word = 'hi'
>>> word.upper() == word # evaluates to True if the word is all caps
False
>>> word.lower() == word # evaluates to True if the word is all lowercase
True
>>> word.title() == word # evaluates to True if the word is in a title format
False
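One caveat with the comparison tests above, as a minimal sketch. It assumes Python 3 (the session above is Python 2), and the helper name looks_upper_by_comparison is made up for illustration. A string with no cased characters at all, such as digits or Hebrew letters, is equal to its own upper() and lower() forms, so the comparison accepts it, while str.isupper() insists on at least one cased character.

```python
def looks_upper_by_comparison(word):
    """Comparison test: True if upper() leaves the word unchanged."""
    return word.upper() == word

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom"; Hebrew has no letter case
samples = ["YES", "yes", "1234", hebrew]

# Map each sample to (result of the comparison test, result of str.isupper()).
results = {w: (looks_upper_by_comparison(w), w.isupper()) for w in samples}

for w, (by_comparison, by_method) in results.items():
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(w), by_comparison, by_method)
```

The two tests agree on "YES" and "yes" but disagree on "1234" and on the Hebrew word, which is worth knowing if the text mixes scripts.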

Nov 25 '06 #3

Steven D'Aprano

On Sat, 25 Nov 2006 13:39:55 -0800, Ola K wrote:

> I went through different documentation sections but couldn't find the right
> condition, function or method for it.

At the command prompt:

>>> dir('')
# result edited for clarity
[ ... 'isalnum', 'isalpha', 'isdigit', 'islower',
'isspace', 'istitle', 'isupper', ... ]

Then do this:

>>> help(''.islower)

and read the text it provides. Then experiment on the command line:

>>> 'abcd1234'.islower()
True
>>> 'aBcd1234'.islower()
False

Then come back to us if they aren't suitable, and tell us WHY they aren't
suitable.
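
For anyone who wants to see the three methods side by side, here is a rough sketch. It assumes Python 3, and the function name classify is made up for this example, not something from the standard library.

```python
def classify(word):
    """Return 'upper', 'lower', 'title' or None for a single word."""
    if word.isupper():
        return "upper"
    if word.islower():
        return "lower"
    if word.istitle():
        return "title"
    return None  # mixed case, or no cased characters at all

labels = [classify(w) for w in ["YES", "yes", "Yes", "yEs", "1234"]]
print(labels)  # ['upper', 'lower', 'title', None, None]
```

Note that these methods ignore digits and punctuation, which is why 'abcd1234'.islower() is True. If a word must consist of letters only, combining the test with word.isalpha() is one option.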
--
Steven.

Nov 25 '06 #4

Dustan

If you're using Google Groups, it for some reason thought my example
code was 'quoted text', which it certainly isn't, seeing as it's not
found anywhere prior to my message.

Nov 25 '06 #5

Dustan

Steven D'Aprano wrote:

> >>> 'abcd1234'.islower()
> True
> >>> 'aBcd1234'.islower()
> False

Forget what I said; I didn't know about the str.is* methods.

Nov 25 '06 #6

John Machin

Dustan wrote:

> Forget what I said; I didn't know about the str.is* methods.

... or about the unicode.is* [same bunch of] methods, which may
possibly address the OP's "not all letters are indeed English letters"
concerns. If he's stuck with 8-bit str objects, he may need to ensure
the locale is set properly, as vaguely hinted at in the docs that Steven
pointed him to.
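
A small sketch of what the Unicode-aware methods do with non-English letters. It assumes Python 3, where every str is a Unicode string, so these are the same is* methods with no locale setup involved. The sample words are my own and are written with escapes so they survive any editor encoding.

```python
words = [
    "\u00c9COLE",                # 'ECOLE' with an acute accent on the E, all uppercase
    "\u00e9cole",                # the same word, all lowercase
    "\u00c9cole",                # the same word, first letter capitalized
    "\u05e9\u05dc\u05d5\u05dd",  # a Hebrew word; Hebrew letters have no case
]

# Map each word to (isupper, islower, istitle).
flags = {w: (w.isupper(), w.islower(), w.istitle()) for w in words}

for w, result in flags.items():
    print(ascii(w), result)
```

The accented words behave just like their plain Latin counterparts, and the Hebrew word fails all three tests because it contains no cased characters.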

Cheers,
John

Nov 25 '06 #7

John Machin

Dustan wrote:

> If you're using Google Groups, it for some reason thought my example
> code was 'quoted text', which it certainly isn't, seeing as it's not
> found anywhere prior to my message.

Sigh. And if we're NOT using Google Groups, it still thinks so ...

The reason is that your "example code" was in fact a screen-dump of a
Python interactive session, in which lines are preceded by ">>>" which
Google Groups simplistically thinks is quoted text from a previous
message.

HTH,
John

Nov 26 '06 #8

Paul McGuire

"Ola K" <ol***@walla.co.il> wrote in message
news:11**********************@45g2000cws.googlegroups.com...

> I went through different documentation sections but couldn't find the right
> condition, function or method for it.
> Suggestions will be very much appreciated...

Ola,

You may be new to Python, but are you new to regular expressions too? I am
no wiz at them, but here is a script that takes a stab at what you are
trying to do. (For more regular expression info, see
http://www.amk.ca/python/howto/regex/.)

The script has these steps:
- create strings containing all unicode chars that are considered "lower"
and "upper", using the unicode.is* methods
- use these strings to construct 3 regular expressions (or "re"s), one for
words of all lowercase letters, one for words of all uppercase letters, and
one for words that start with an uppercase letter followed by at least one
lowercase letter.
- use each re to search the string u"YES yes Yes", and print the found
matches

I've used unicode strings throughout, so this should be applicable to your
text consisting of letters beyond the basic Latin set (since Outlook Express
is trying to install Israeli fonts when reading your post, I assume these
are the characters you are trying to handle). You may have to do some setup
of your locale for proper handling of unicode.isupper, etc., but I hope this
gives you a jump start on your problem.

-- Paul
# Python 2 code (unichr, ur"" literals, print statement)
import sys
import re

uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
                   if unichr(i).isupper() )
lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
                   if unichr(i).islower() )

allUpperRe = ur"\b[%s]+\b" % uppers
allLowerRe = ur"\b[%s]+\b" % lowers
capWordRe = ur"\b[%s][%s]+\b" % (uppers,lowers)

regexes = [
    (allUpperRe, "all upper"),
    (allLowerRe, "all lower"),
    (capWordRe, "title case"),
    ]
for reString,label in regexes:
    # re.UNICODE makes \b treat non-ASCII letters as word characters
    reg = re.compile(reString, re.UNICODE)
    result = reg.findall(u" YES yes Yes ")
    print label,":",result

Prints:
all upper : [u'YES']
all lower : [u'yes']
title case : [u'Yes']
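
The script above is Python 2 only (unichr, ur"" literals and the print statement are all gone in Python 3). Below is a rough Python 3 sketch of the same idea, not a definitive port. The helper name cased_chars is invented for this example, and re.escape is added purely as a precaution when pasting the characters into a character class.

```python
import re
import sys

def cased_chars(predicate):
    """Join every code point for which predicate (e.g. str.isupper) is true."""
    return "".join(c for c in map(chr, range(sys.maxunicode + 1)) if predicate(c))

# Escape the characters so none of them can be misread as regex syntax.
uppers = re.escape(cased_chars(str.isupper))
lowers = re.escape(cased_chars(str.islower))

patterns = [
    ("all upper", re.compile(r"\b[%s]+\b" % uppers)),
    ("all lower", re.compile(r"\b[%s]+\b" % lowers)),
    ("title case", re.compile(r"\b[%s][%s]+\b" % (uppers, lowers))),
]

found = {label: rx.findall(" YES yes Yes ") for label, rx in patterns}
for label, words in found.items():
    print(label, ":", words)
```

In Python 3, str patterns are Unicode-aware by default, so \b already treats non-ASCII letters as word characters and no extra flag is needed.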

Nov 26 '06 #9

John Machin

Paul McGuire wrote:

> I've used unicode strings throughout, so this should be applicable to your
> text consisting of letters beyond the basic Latin set (since Outlook Express
> is trying to install Israeli fonts when reading your post, I assume these
> are the characters you are trying to handle).

I'd guessed the OP was in Israel from his e-mail address. If that's
what Outlook Express is doing, then that's conclusive proof :-)

An aside to the OP: Pardon my ignorance, but does Hebrew have upper and
lower case?

> You may have to do some setup
> of your locale for proper handling of unicode.isupper, etc.,

Whatever gave you that impression?

> uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
>                    if unichr(i).isupper() )
> lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
>                    if unichr(i).islower() )

Just in case the OP is running a 32-bit unicode implementation, you
might want to make that xrange, not range :-)

Cheers,
John

Nov 26 '06 #10

Paul McGuire

"John Machin" <sj******@lexicon.net> wrote in message
news:11**********************@l12g2000cwl.googlegroups.com...

John -

Thanks for the updates. Comments below...

-- Paul

> Paul McGuire wrote:
>
>> You may have to do some setup
>> of your locale for proper handling of unicode.isupper, etc.,
>
> Whatever gave you that impression?

Nothing. Just my own ignorance of unicode and i18n. This post really is
just string mechanics and re's - I wasn't sure I had all the underlying
unicode stuff right.

>> uppers = u"".join( unichr(i) for i in range(sys.maxunicode)
>>                    if unichr(i).isupper() )
>> lowers = u"".join( unichr(i) for i in range(sys.maxunicode)
>>                    if unichr(i).islower() )
>
> Just in case the OP is running a 32-bit unicode implementation, you
> might want to make that xrange, not range :-)

Good tip. I rarely use xrange, it seems like such a language wart. Isn't
"range" going to become what "xrange" is in Py3k?
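
For the record, that is how Python 3 turned out: range is lazy there and xrange no longer exists. A minimal sketch, assuming Python 3.3 or later, where sys.maxunicode is always 0x10FFFF:

```python
import sys

# A lazy range object: no list of 1,114,112 integers is ever built.
code_points = range(sys.maxunicode + 1)
print(type(code_points).__name__, len(code_points))

# Iterating produces one integer at a time, as xrange did in Python 2.
upper_count = sum(1 for i in code_points if chr(i).isupper())
print("uppercase code points:", upper_count)
```

The "+ 1" makes the loop include the last code point as well; the original range(sys.maxunicode) stops one short, which is harmless here because that code point is not a letter.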

Nov 26 '06 #11

Ola K

Thank you! This was really helpful. Also, the bit about .istitle()
was the missing piece of the puzzle for me... So now my script is nice
and working :)

And as an aside, yes, I am from Israel, and no, we don't have
upper case and lower case letters. Hebrew has only one set of letters.
So my script was actually for the English letters inside the Hebrew
text...
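
In case it helps someone with the same task, here is a rough sketch of that kind of script. It is a guess at the general shape, not the script I actually ended up with. It assumes Python 3, and the function name english_words_by_case and the sample sentence are invented for the example.

```python
import re

def english_words_by_case(text):
    """Group runs of ASCII letters in text by their capitalization."""
    groups = {"upper": [], "lower": [], "title": [], "other": []}
    for word in re.findall(r"[A-Za-z]+", text):
        if word.isupper():
            groups["upper"].append(word)
        elif word.islower():
            groups["lower"].append(word)
        elif word.istitle():
            groups["title"].append(word)
        else:
            groups["other"].append(word)  # mixed case such as 'McGuire'
    return groups

# Hebrew words (written as escapes) with English words in between.
sample = "\u05e9\u05dc\u05d5\u05dd YES \u05d5 yes \u05d0\u05d5 Yes, McGuire"
groups = english_words_by_case(sample)
print(groups)
```

The [A-Za-z]+ pattern simply skips the Hebrew letters, so only the English words reach the is* tests.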

--Ola

Dec 2 '06 #12
