Hi, I'm writing a Python script that creates directories from user
input.
Sometimes the user inputs characters that aren't valid characters for a
file or directory name.
Here are the characters that I consider to be valid:
valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
If I have a string called fname, I want to go through each character in
the filename and, if it is not a valid character, replace
it with a space.
This is what I have:
def fixfilename(fname):
    valid = ':.\,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
    chars = list(fname)  # strings are immutable, so edit a list copy
    for i in range(len(chars)):
        if valid.find(chars[i]) < 0:
            chars[i] = ' '
    return ''.join(chars)
Anyone think of a simpler solution?
I would suggest something like string.maketrans (http://docs.python.org/lib/node41.html). I don't remember exactly how
it works, but I think it's something like
>>> import string
>>> invalid_chars = "abc"
>>> replace_chars = "123"
>>> char_map = string.maketrans(invalid_chars, replace_chars)
>>> filename = "abc123.txt"
>>> filename.translate(char_map)
'123123.txt'
--
Jerry
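For readers on Python 3, where string.maketrans no longer exists, the same idea can be sketched with str.maketrans. This is only an illustration: INVALID_CHARS and replace_invalid are hypothetical names, and the sketch uses a blacklist of characters to replace, which is the reverse of the whitelist the original question asks for.

```python
# Python 3 sketch: str.maketrans takes the place of string.maketrans.
# INVALID_CHARS is a hypothetical blacklist chosen for illustration only.
INVALID_CHARS = '<>:"/\\|?*'

# Map every blacklisted character to a single space.
CHAR_MAP = str.maketrans(INVALID_CHARS, " " * len(INVALID_CHARS))


def replace_invalid(fname):
    """Return fname with each blacklisted character replaced by a space."""
    return fname.translate(CHAR_MAP)


print(repr(replace_invalid("a:b/c.txt")))
```

A translation table works best when the set of characters to replace is known up front; for a whitelist, the join-based replies later in the thread are a more direct fit.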
SpreadTooThin wrote:
Anyone think of a simpler solution?
If you want to strip 'em:
>>> valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> filename = '!"£!£$"$££$%$£%$£lasfjalsfjdlasfjasfd()()()somethingelse.dat'
>>> stripped = ''.join(c for c in filename if c in valid)
>>> stripped
'lasfjalsfjdlasfjasfdsomethingelse.dat'
If you want to replace them with something, be careful of the regex
string being built (ie a space character).
>>> import re
>>> re.sub(r'[^%s]' % valid, ' ', filename)
' lasfjalsfjdlasfjasfd somethingelse.dat'
Jon.
Sometimes the user inputs characters that aren't valid
characters for a file or directory name. Here are the
characters that I consider to be valid characters...
valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
Just a caveat, as colons and slashes can give grief on various
operating systems...combined with periods, it may be possible to
cause trouble too...
Anyone think of a simpler solution?
I don't know if it's simpler, but you can use
>>> fname = "this is a test & it ain't expen$ive.py"
>>> ''.join(c in valid and c or ' ' for c in fname)
'this is a test   it ain t expen ive.py'
It does use the "it's almost a ternary operator, but not quite"
method concurrently being discussed/lambasted in another thread.
Treat accordingly, with all that may entail. Should be good in
this case though.
If you're doing it on a time-critical basis, it might help to
make "valid" a set, which should have O(1) membership testing,
rather than using the "in" test with a string. I don't know how
well the find() method of a string performs in relationship to
"in" testing of a set. Test and see, if it's important.
-tkc
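As a hedged sketch of the same idea: from Python 2.5 onward the "and/or" idiom can be written as a real conditional expression, and the whitelist can be stored as a frozenset as suggested above. The names VALID, fix_filename and filler are made up for this example; the whitelist itself is the one quoted in the thread.

```python
# Whitelist quoted in the thread, stored as a frozenset for membership tests.
VALID = frozenset(
    ":./,^0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
)


def fix_filename(fname, valid=VALID, filler=" "):
    """Replace every character of fname that is not in valid with filler."""
    # "c if c in valid else filler" is the conditional expression that
    # replaces the "c in valid and c or ' '" idiom.
    return "".join(c if c in valid else filler for c in fname)


print(repr(fix_filename("this is a test & it ain't expen$ive.py")))
```

The and/or form happens to be safe here because a one-character string is never false, but the conditional expression does not rely on that.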
Hi,
On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
You don't specify the OS platform, but these are not all the characters
that may occur in a filename: '[]{}-=", etc. And '/' is NOT valid
on a Unix platform. It should be easy to scan the filename and
check every character against the 'valid-string'.
HTH, cu l8r, Edgar.
--
\|||/
(o o) Just curious...
----ooO-(_)-Ooo---------------------------------------------------------
On 2006-10-17, Tim Chase <py*********@tim.thechases.com> wrote:
If you're doing it on a time-critical basis, it might help to
make "valid" a set, which should have O(1) membership testing,
rather than using the "in" test with a string. I don't know
how well the find() method of a string performs in relationship
to "in" testing of a set. Test and see, if it's important.
The find() method of (8-bit) strings is really, really fast. My
guess is that a set can't beat it. I tried to beat it recently with
a binary search function. Even after applying psyco, find() was
still faster (though I could beat the bisect functions by a
little bit by replacing a divide with a shift).
--
Neil Cerutti
This is not a book to be put down lightly. It should be thrown
with great force. --Dorothy Parker
In "theory" (you know...that little town in west Texas where
everything goes right), a set-membership test should be O(1). A
binary search function would be O(log N). A linear search of a
string for a member should be O(N).
In practice, however, for such small strings as the given
whitelist, the underlying find() operation likely doesn't put a
blip on the radar. If your whitelist were some huge document
that you were searching repeatedly, it could have worse
performance. Additionally, the find() in the underlying C code
is likely about as bare-metal as it gets, whereas the set
membership aspect of things may go through some more convoluted
setup/teardown/hashing and spend a lot more time further from the
processor's op-codes.
And I know that a number of folks have done some hefty
optimization of Python's string-handling abilities. There's
likely a tradeoff point where it's better to use one over the
other depending on the size of the whitelist. YMMV
-tkc
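To make the three complexity classes concrete, here is a small sketch of the three membership tests discussed above: a linear scan of the string, a binary search over a sorted list, and a hash lookup in a set. The function names are hypothetical, each function assumes a single-character argument, and nothing here says which one is fastest in practice; that still has to be measured.

```python
from bisect import bisect_left

WHITELIST = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "
SORTED_CHARS = sorted(set(WHITELIST))  # sorted list for binary search
CHAR_SET = frozenset(WHITELIST)        # hash set for O(1) average lookup


def in_string(c, chars=WHITELIST):
    """Linear scan of the string: O(N) in the length of the whitelist."""
    return c in chars


def in_sorted(c, chars=SORTED_CHARS):
    """Binary search over a sorted list: O(log N)."""
    i = bisect_left(chars, c)
    return i < len(chars) and chars[i] == c


def in_set(c, chars=CHAR_SET):
    """Hash lookup in a frozenset: O(1) on average."""
    return c in chars
```

All three return the same answer for any single character, so they can be swapped freely while benchmarking.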
On 2006-10-17, Edgar Matzinger <ed***@edgar-matzinger.nl> wrote:
Hi,
On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
>valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
not specifying the OS platform, these are not all the
characters that may occur in a filename: '[]{}-=", etc. And '/'
is NOT valid. On a unix platform. And it should be easy to
scan the filename and check every character against the
'valid-string'.
In the interactive fiction world where I come from, a portable
filename is only 8 chars long and matches the regex
[A-Z][A-Z0-9]*, i.e., capital letters and numbers, with no
extension. That way it'll work on old DOS machines and on
Risc-OS. Wait... is there Python for Risc-OS?
--
Neil Cerutti
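The portable-filename convention described above (at most 8 characters, matching [A-Z][A-Z0-9]*, no extension) could be checked roughly like this. It is a sketch only: is_portable and PORTABLE_NAME are hypothetical names, and the 8-character limit is folded into the pattern as a {0,7} repeat.

```python
import re

# One capital letter followed by up to seven capitals or digits (8 chars max).
PORTABLE_NAME = re.compile(r"[A-Z][A-Z0-9]{0,7}")


def is_portable(name):
    """Return True if name fits the 8-character, no-extension convention."""
    return PORTABLE_NAME.fullmatch(name) is not None


print(is_portable("ZORK1"), is_portable("game.dat"))
```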
Matthew Warren wrote:
>>> import re
>>> badfilename = '£"%^"£^"£$^ihgeroighroeig3645^£$^"k novin98u4#346#1461461'
>>> valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> goodfilename = re.sub('[^'+valid+']', ' ', badfilename)
To create arbitrary character sets, it's usually best to run the character string through
re.escape() before passing it to the RE engine.
</F>
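A minimal sketch of that advice, assuming the whitelist quoted in the thread: the string is passed through re.escape() before being placed inside the negated character class, so characters such as '^', ']' or a backslash cannot change the meaning of the pattern. VALID, INVALID_RE and fix_filename are illustrative names, not part of anyone's posted code.

```python
import re

VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "

# Escape the whitelist first, then wrap it in a negated character class.
INVALID_RE = re.compile("[^" + re.escape(VALID) + "]")


def fix_filename(fname, filler=" "):
    """Replace each character outside VALID with filler."""
    return INVALID_RE.sub(filler, fname)


print(repr(fix_filename("a&b$c.txt")))
```

Compiling the pattern once also avoids rebuilding the character class on every call.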
Tim Chase:
In practice, however, for such small strings as the given
whitelist, the underlying find() operation likely doesn't put a
blip on the radar. If your whitelist were some huge document
that you were searching repeatedly, it could have worse
performance. Additionally, the find() in the underlying C code
is likely about as bare-metal as it gets, whereas the set
membership aspect of things may go through some more convoluted
setup/teardown/hashing and spend a lot more time further from the
processor's op-codes.
With this specific test (half good, half bad), on Py2.5, on my PC, sets
start to be faster than the string search when the string "good" is
about 5-6 chars long (this means sets are quite fast, I presume).
from random import choice, seed
from time import clock

def main(choice=choice):
    seed(1)
    n = 100000
    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        # Half of the candidate characters are in "good", half are not.
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) = ", len(good)
        # Time membership tests against the string.
        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)
        # Time membership tests against a set (construction included).
        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()
Bye,
bearophile
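The benchmark above is Python 2 code (print statement, xrange, time.clock). A rough Python 3 equivalent using the timeit module might look like the sketch below. The function name compare and its parameters are assumptions made for this example, the set is built outside the timed region (unlike the original), and absolute numbers will vary by machine and interpreter.

```python
import random
import timeit


def compare(good, n=10000, repeats=5):
    """Time n membership tests against a string and against a set.

    Returns (string_seconds, set_seconds), each the best of `repeats` runs.
    """
    rng = random.Random(1)       # fixed seed so runs are comparable
    poss = good + good.upper()   # roughly half hits, half misses
    data = [rng.choice(poss) for _ in range(n)]
    sgood = set(good)            # built once, outside the timed region

    def scan(container):
        for c in data:
            c in container

    t_str = min(timeit.repeat(lambda: scan(good), number=1, repeat=repeats))
    t_set = min(timeit.repeat(lambda: scan(sgood), number=1, repeat=repeats))
    return t_str, t_set


if __name__ == "__main__":
    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        t_str, t_set = compare(good)
        print("len(good) = %2d  str: %.4fs  set: %.4fs"
              % (len(good), t_str, t_set))
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the two timings are close.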
On 2006-10-18, be************@lycos.com <be************@lycos.com> wrote:
With this specific test (half good half bad), on Py2.5, on my PC, sets
start to be faster than the string search when the string "good" is
about 5-6 chars long (this means set are quite fast, I presume).
On my Python 2.4 for Windows, they are often still neck-and-neck
for len(good) = 26. The set's disadvantage of having to be
constructed is heavily amortized over the 1,000,000 membership
tests (100,000 random characters repeated 10 times). Without
knowing the usage pattern, it'd be hard to choose between them.
--
Neil Cerutti