making a valid file name...

Hi, I'm writing a Python script that creates directories from user
input.
Sometimes the user inputs characters that aren't valid characters for a
file or directory name.
Here are the characters that I consider to be valid characters...

valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

If I have a string called fname, I want to go through each character in
the filename and, if it is not a valid character, replace it with a
space.

This is what I have:

def fixfilename(fname):
    valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
    chars = list(fname)  # strings are immutable, so edit a list copy
    for i in range(len(chars)):
        if valid.find(chars[i]) < 0:
            chars[i] = ' '
    return ''.join(chars)

Anyone think of a simpler solution?

Oct 17 '06 #1
I would suggest something like string.maketrans
(http://docs.python.org/lib/node41.html). I don't remember exactly how
it works, but I think it's something like

>>> import string
>>> invalid_chars = "abc"
>>> replace_chars = "123"
>>> char_map = string.maketrans(invalid_chars, replace_chars)
>>> filename = "abc123.txt"
>>> filename.translate(char_map)
'123123.txt'

--
Jerry
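
For readers on Python 3, where string.maketrans no longer exists, the same idea can be sketched with str.maketrans and str.translate. The bad_chars blacklist and the replace_bad helper are illustrative assumptions, not part of the post above. Note that this is a blacklist, whereas the original question asks for a whitelist.

```python
# Python 3 sketch of the translate-based approach.

# One-to-one character mapping, as in the example above.
char_map = str.maketrans("abc", "123")
translated = "abc123.txt".translate(char_map)

# Hypothetical blacklist: map every listed character to a space.
bad_chars = "&$'!\"%()"
to_space = str.maketrans(bad_chars, " " * len(bad_chars))


def replace_bad(fname):
    """Replace each blacklisted character in fname with a space."""
    return fname.translate(to_space)


print(translated)
print(replace_bad("expen$ive(1).py"))
```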

Oct 17 '06 #2

SpreadTooThin wrote:
> if I have a string called fname I want to go through each character in
> the filename and if it is not a valid character, then I want to replace
> it with a space.
> [...]
> Anyone think of a simpler solution?

If you want to strip 'em:

>>> valid = ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>>> filename = '!"£!£$"$££$%$£%$£lasfjalsfjdlasfjasfd()()()somethingelse.dat'
>>> stripped = ''.join(c for c in filename if c in valid)
>>> stripped
'lasfjalsfjdlasfjasfdsomethingelse.dat'

If you want to replace them with something, be careful of the regex
string being built (i.e. a space character).

>>> import re
>>> re.sub(r'[^%s]' % valid, ' ', filename)
' lasfjalsfjdlasfjasfd somethingelse.dat'

Jon.

Oct 17 '06 #3
> Sometimes the user inputs characters that aren't valid
> characters for a file or directory name. Here are the
> characters that I consider to be valid characters...
>
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

Just a caveat, as colons and slashes can give grief on various
operating systems...combined with periods, it may be possible to
cause trouble too...

> This is what I have:
> [...]
> Anyone think of a simpler solution?

I don't know if it's simpler, but you can use

>>> fname = "this is a test & it ain't expen$ive.py"
>>> ''.join(c in valid and c or ' ' for c in fname)
'this is a test   it ain t expen ive.py'

It does use the "it's almost a ternary operator, but not quite"
method concurrently being discussed/lambasted in another thread.
Treat accordingly, with all that may entail. Should be good in
this case though.

If you're doing it on a time-critical basis, it might help to
make "valid" a set, which should have O(1) membership testing,
rather than using the "in" test with a string. I don't know how
well the find() method of a string performs in relationship to
"in" testing of a set. Test and see, if it's important.

-tkc
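
A sketch of the same replacement using the conditional expression that arrived in Python 2.5, with the whitelist held in a frozenset as suggested above. The names VALID and fix_filename are assumptions for illustration. The last lines show why the and/or idiom is "not quite" a ternary: it falls through to the or branch whenever the selected value is falsy, which cannot happen here because every c is a one-character string.

```python
VALID = frozenset(
    ":./,^0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
)


def fix_filename(fname, valid=VALID, filler=" "):
    """Replace every character not in `valid` with `filler`."""
    return "".join(c if c in valid else filler for c in fname)


print(fix_filename("this is a test & it ain't expen$ive.py"))

# Pitfall of the and/or idiom: a falsy "true" value is skipped.
flag = True
and_or = flag and "" or "fallback"    # picks 'fallback', not ''
ternary = "" if flag else "fallback"  # picks ''
```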

Oct 17 '06 #4
Hi,

On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
> valid =
> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '

Not specifying the OS platform, these are not all the characters
that may occur in a filename: '[]{}-=", etc. And '/' is NOT valid
on a Unix platform. And it should be easy to scan the filename and
check every character against the 'valid-string'.

HTH, cu l8r, Edgar.
--
\|||/
(o o) Just curious...
----ooO-(_)-Ooo---------------------------------------------------------
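
The caveats in this thread about '/' and ':' can be sketched as a check on a single path component. This is a deliberately conservative sketch, not a complete portability rule: the SEPARATORS set, the clean_component name and the decision to reject "." and ".." are all assumptions made for illustration.

```python
import os

# Assumption: we are cleaning ONE path component (a single directory or
# file name), so no path separator may survive, whatever the platform.
SEPARATORS = {os.sep, "/", "\\", ":"}
if os.altsep:
    SEPARATORS.add(os.altsep)


def clean_component(name, filler=" "):
    """Replace separators; reject names that end up empty, '.' or '..'."""
    cleaned = "".join(filler if c in SEPARATORS else c for c in name)
    if cleaned.strip() in ("", ".", ".."):
        raise ValueError("not a usable file name: %r" % name)
    return cleaned


print(clean_component("reports/2006:october"))
```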
Oct 17 '06 #5
On 2006-10-17, Tim Chase <py*********@tim.thechases.com> wrote:
> If you're doing it on a time-critical basis, it might help to
> make "valid" a set, which should have O(1) membership testing,
> rather than using the "in" test with a string. I don't know
> how well the find() method of a string performs in relationship
> to "in" testing of a set. Test and see, if it's important.

The find method of (8-bit) strings is really, really fast. My
guess is that set can't beat it. I tried to beat it recently with
a binary search function. Even after applying psyco, find was
still faster (though I could beat the bisect functions by a
little bit by replacing a divide with a shift).

--
Neil Cerutti
This is not a book to be put down lightly. It should be thrown
with great force. --Dorothy Parker
Oct 17 '06 #6
>> If you're doing it on a time-critical basis, it might help to
>> make "valid" a set, which should have O(1) membership testing,
>> rather than using the "in" test with a string. I don't know
>> how well the find() method of a string performs in relationship
>> to "in" testing of a set. Test and see, if it's important.
>
> The find method of (8-bit) strings is really, really fast. My
> guess is that set can't beat it. I tried to beat it recently with
> a binary search function. Even after applying psyco find was
> still faster (though I could beat the bisect functions by a
> little bit by replacing a divide with a shift).

In "theory" (you know...that little town in west Texas where
everything goes right), a set-membership test should be O(1). A
binary search function would be O(log N). A linear search of a
string for a member should be O(N).

In practice, however, for such small strings as the given
whitelist, the underlying find() operation likely doesn't put a
blip on the radar. If your whitelist were some huge document
that you were searching repeatedly, it could have worse
performance. Additionally, the find() in the underlying C code
is likely about as bare-metal as it gets, whereas the set
membership aspect of things may go through some more convoluted
setup/teardown/hashing and spend a lot more time further from the
processor's op-codes.

And I know that a number of folks have done some hefty
optimization of Python's string-handling abilities. There's
likely a tradeoff point where it's better to use one over the
other depending on the size of the whitelist. YMMV

-tkc
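
The three strategies compared above can be written side by side. This is a minimal Python 3 sketch for checking that they agree, not a benchmark; the names WHITELIST, in_string, in_sorted and in_set are made up for illustration.

```python
from bisect import bisect_left

WHITELIST = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "

as_set = frozenset(WHITELIST)  # hash lookup, O(1) on average
as_sorted = sorted(WHITELIST)  # sorted list for binary search, O(log N)


def in_string(c):
    """Linear scan of the whitelist string, O(N)."""
    return WHITELIST.find(c) >= 0


def in_sorted(c):
    """Binary search over the sorted characters."""
    i = bisect_left(as_sorted, c)
    return i < len(as_sorted) and as_sorted[i] == c


def in_set(c):
    """Hash-based membership test."""
    return c in as_set


# All three must give the same answer for any single character.
for c in "aZ9 &$":
    assert in_string(c) == in_sorted(c) == in_set(c)
```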



Oct 17 '06 #7
On 2006-10-17, Edgar Matzinger <ed***@edgar-matzinger.nl> wrote:
> On 10/17/2006 06:22:45 PM, SpreadTooThin wrote:
>> valid =
>> ':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
>
> not specifying the OS platform, these are not all the
> characters that may occur in a filename: '[]{}-=", etc. And '/'
> is NOT valid on a unix platform. And it should be easy to
> scan the filename and check every character against the
> 'valid-string'.

In the interactive fiction world where I come from, a portable
filename is only 8 chars long and matches the regex
[A-Z][A-Z0-9]*, i.e., capital letters and numbers, with no
extension. That way it'll work on old DOS machines and on
Risc-OS. Wait... is there Python for Risc-OS?

--
Neil Cerutti
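
The portable-name rule described above could be checked roughly like this. The sketch reads "8 chars long" as "at most 8 characters", which is an assumption, and is_portable is a made-up helper name.

```python
import re

# Assumption: 1 to 8 characters, a capital letter followed by capital
# letters or digits, with no extension.
PORTABLE = re.compile(r"[A-Z][A-Z0-9]{0,7}")


def is_portable(name):
    """True if the whole of `name` matches the portable pattern."""
    return PORTABLE.fullmatch(name) is not None


for candidate in ("ZORK1", "zork1", "SAVE.DAT", "TOOLONGNAME"):
    print(candidate, is_portable(candidate))
```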
Oct 17 '06 #8
Matthew Warren wrote:
> >>> import re
> >>> badfilename='£"%^"£^"£$^ihgeroighroeig3645^£$^"k novin98u4#346#1461461'
> >>> valid=':./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '
> >>> goodfilename=re.sub('[^'+valid+']',' ',badfilename)

To create arbitrary character sets, it's usually best to run the character string through
re.escape() before passing it to the RE engine.

</F>
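
A sketch of that advice in Python 3: the whitelist is passed through re.escape() before being placed inside the negated character class, so characters such as ']', '-', '^' or a backslash in the whitelist cannot change the meaning of the pattern. The make_cleaner helper and the second whitelist are hypothetical, and the whitelist is assumed to be non-empty.

```python
import re

VALID = ":./,^0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "


def make_cleaner(valid, filler=" "):
    """Return a function that replaces chars outside `valid` with `filler`."""
    # re.escape() neutralises regex metacharacters in the whitelist.
    pattern = re.compile("[^%s]" % re.escape(valid))
    return lambda fname: pattern.sub(filler, fname)


clean = make_cleaner(VALID)
# A whitelist that would break an unescaped character class.
clean_brackets = make_cleaner("abc]-[")

print(clean("a&b$c.txt"))
print(clean_brackets("a]b-c[d"))
```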

Oct 18 '06 #9
Tim Chase:
> In practice, however, for such small strings as the given
> whitelist, the underlying find() operation likely doesn't put a
> blip on the radar. If your whitelist were some huge document
> that you were searching repeatedly, it could have worse
> performance. Additionally, the find() in the underlying C code
> is likely about as bare-metal as it gets, whereas the set
> membership aspect of things may go through some more convoluted
> setup/teardown/hashing and spend a lot more time further from the
> processor's op-codes.

With this specific test (half good half bad), on Py2.5, on my PC, sets
start to be faster than the string search when the string "good" is
about 5-6 chars long (this means sets are quite fast, I presume).

from random import choice, seed
from time import clock

def main(choice=choice):
    seed(1)
    n = 100000

    for good in ("ab", "abc", "abcdef", "abcdefgh",
                 "abcdefghijklmnopqrstuvwxyz"):
        poss = good + good.upper()
        data = [choice(poss) for _ in xrange(n)] * 10
        print "len(good) = ", len(good)

        t = clock()
        for c in data:
            c in good
        print round(clock()-t, 2)

        t = clock()
        sgood = set(good)
        for c in data:
            c in sgood
        print round(clock()-t, 2), "\n"

main()

Bye,
bearophile

Oct 18 '06 #10
On 2006-10-18, be************@lycos.com <be************@lycos.com> wrote:
> With this specific test (half good half bad), on Py2.5, on my PC, sets
> start to be faster than the string search when the string "good" is
> about 5-6 chars long (this means sets are quite fast, I presume).

On my Python2.4 for Windows, they are often still neck-and-neck
for len(good) = 26. set's disadvantage of having to be
constructed is heavily amortized over 100,000 membership
tests. Without knowing the usage pattern, it'd be hard to choose
between them.

--
Neil Cerutti
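
To repeat this comparison on a current interpreter, the benchmark needs porting: time.clock was removed in Python 3.8, xrange is gone, and print is a function. The following is a rough Python 3 sketch using timeit, with an assumed helper named bench. The set is built once outside the timed region, which matches heavy reuse; include the construction in the timing if the whitelist is rebuilt on every call. Results will vary by machine and Python version.

```python
import random
import timeit


def bench(good, n=100_000, repeat=3):
    """Time n membership tests against a str and against a set."""
    rng = random.Random(1)
    poss = good + good.upper()  # roughly half hits, half misses
    data = [rng.choice(poss) for _ in range(n)]
    sgood = set(good)  # built once, outside the timed region

    def scan(container):
        for c in data:
            c in container

    t_str = min(timeit.repeat(lambda: scan(good), number=1, repeat=repeat))
    t_set = min(timeit.repeat(lambda: scan(sgood), number=1, repeat=repeat))
    return t_str, t_set


if __name__ == "__main__":
    for good in ("ab", "abcdef", "abcdefghijklmnopqrstuvwxyz"):
        t_str, t_set = bench(good)
        print("len(good) = %2d  str: %.4fs  set: %.4fs"
              % (len(good), t_str, t_set))
```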
Oct 19 '06 #11
