Refactoring; arbitrary expression in lists


As a continuation of a previous thread, "PyChecker messages", I have a question
regarding code refactoring, prompted by the following message:
runner.py:200: Function (detectMimeType) has too many returns (11)

The function is simply a long "else-if" clause, branching out to
different return statements. What's wrong? Is it simply "probably ugly
code" advice?


That is also advice. Generally you use a dict of functions, or some other
structure to lookup what you want to do.


More specifically, my function looks like this:

#--------------------------------------------------------------
def detectMimeType( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)

    if extension == "php":
        return "application/x-php"

    elif extension == "cpp" or extension.endswith("cc"):
        return "text/x-c++-src"
    # etcetera
    elif extension == "xsl":
        return "text/xsl"

    elif basename.find( "Makefile" ) != -1:
        return "text/x-makefile"
    else:
        raise NoMimeError
#--------------------------------------------------------------
(don't bother if the MIME detection looks like stone age, it's temporary until
PyXDG gets support for the XDG mime type spec..)

I'm now wondering if it's possible to write this in a more compact way, such
that the if-clause isn't necessary? Of course, the current code works, but
perhaps it could be prettier.

I'm thinking along the lines of nested lists, but the obstacle for me is
that both the test and the return statement are simple expressions, not
functions or a particular data type. Any ideas?
Cheers,

Frans

Jul 18 '05 #1
"Frans Englich" <fr***********@ telia.com> wrote in message
news:ma******** *************** *************** @python.org...

<snip>

Since the majority of your tests will be fairly direct 'extension "XYZ"
means mimetype "aaa/bbb"', this really sounds like a dictionary type
solution is called for. Still, you might have some desire to do some
order-dependent testing. Here are two ideas - the first iterates over a
list of expressions and resulting types, the other uses a dictionary lookup.

-- Paul
import os

extToMimeMap = [
    ('"php"', "application/x-php"),
    ('"cpp" or extension.endswith("cc")', "text/x-c++-src"),
    ('"xsl"', "text/xsl"),
    ]

def detectMimeType1( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)

    for exp,mimetype in extToMimeMap:
        if eval("extension =="+exp): return mimetype

    # do other non-extension-related tests here
    if basename.find( "Makefile" ) != -1:
        return "text/x-makefile"
    else:
        raise NoMimeError

extToMimeDict = {
    "php": "application/x-php",
    "cpp": "text/x-c++-src",
    "xsl": "text/xsl",
    }

def detectMimeType2( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)

    # check for straight extension matches
    try:
        return extToMimeDict[extension]
    except KeyError:
        pass

    # do more complex extension and other non-extension-related tests here
    if extension.endswith("cc"):
        return extToMimeDict["cpp"]

    if basename.find( "Makefile" ) != -1:
        return "text/x-makefile"

    raise NoMimeError

for detectMimeType in (detectMimeType1, detectMimeType2):
    for s in ("a.php","z.acc","Makefile","blork.xsl"):
        print s,"->",detectMimeType(s)
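
The first idea above stores each test as a string and runs it through eval. As a rough alternative, the same ordered table can hold callables instead, which also answers the original question of how to put arbitrary expressions in a list. This is only a sketch in Python 3: the names RULES and detect_mime_type are made up for illustration, and NoMimeError is defined here as a stand-in for the exception the thread assumes.

```python
import os


class NoMimeError(Exception):
    """Stand-in for the NoMimeError the thread refers to (assumed definition)."""


# Ordered (test, mimetype) pairs. Each test is a callable that receives the
# extension and the basename, so any expression can live in the list.
RULES = [
    (lambda ext, base: ext == "php", "application/x-php"),
    (lambda ext, base: ext == "cpp" or ext.endswith("cc"), "text/x-c++-src"),
    (lambda ext, base: ext == "xsl", "text/xsl"),
    (lambda ext, base: "Makefile" in base, "text/x-makefile"),
]


def detect_mime_type(filename):
    extension = filename[-3:]  # same crude slice as the original post
    basename = os.path.basename(filename)
    for test, mimetype in RULES:
        if test(extension, basename):
            return mimetype
    raise NoMimeError(filename)
```

The tests run in list order, so order-dependent rules keep working, and no string is ever evaluated as code.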

Jul 18 '05 #2
I can not break the original code in 2.4, if I try this:
import os, sys
class NoMimeError(Exception):
    pass

def detectMimeType( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)
    if extension == "php":
        return "application/x-php"
    elif extension == "cpp" or extension.endswith("cc"):
        return "text/x-c++-src"
    elif extension == "1":
        return 'BlahBlah'
    elif extension == "2":
        return 'BlahBlah'
    elif extension == "3":
        return 'BlahBlah'
    elif extension == "4":
        return 'BlahBlah'
    elif extension == "5":
        return 'BlahBlah'
    elif extension == "6":
        return 'BlahBlah'
    elif extension == "7":
        return 'BlahBlah'
    elif extension == "8":
        return 'BlahBlah'
    elif extension == "9":
        return 'BlahBlah'
    elif extension == "10":
        return 'BlahBlah'
    elif extension == "11":
        return 'BlahBlah'
    elif extension == "12":
        return 'BlahBlah'
    elif extension == "13":
        return 'BlahBlah'
    elif extension == "14":
        return 'BlahBlah'
    elif extension == "15":
        return 'BlahBlah'
    elif extension == "16":
        return 'BlahBlah'
    elif extension == "17":
        return 'BlahBlah'
    elif extension == "18":
        return 'BlahBlah'
    elif extension == "19":
        return 'BlahBlah'
    elif extension == "20":
        return 'BlahBlah'

    elif extension == "xsl":
        return "text/xsl"

    elif basename.find( "Makefile" ) != -1:
        return "text/x-makefile"
    else:
        raise NoMimeError

try:
    print detectMimeType( r'c:\test.php')
    print detectMimeType( 'c:\test.xsl')
    print detectMimeType( 'c:\test.xxx')
except Exception, e:
    print >> sys.stderr, '%s: %s' %(e.__class__.__name__, e)

I get
application/x-php
text/xsl
NoMimeError:


So although the dictionary solution is much nicer nothing seems wrong
with your code as it is - or am I missing something?

Jul 18 '05 #3
On Wednesday 12 January 2005 18:56, wi******@hotmai l.com wrote:
I can not break the original code in 2.4, if I try this:
[...]

So although the dictionary solution is much nicer nothing seems wrong
with your code as it is - or am I missing something?


Nope, the current code works. I'm just looking at Python's cool ways of
solving problems. (the matter about 11 returns was a coding-style report from
PyChecker).
Cheers,

Frans

Jul 18 '05 #4
Paul McGuire wrote:
"Frans Englich" <fr***********@ telia.com> wrote in message
news:ma******** *************** *************** @python.org...
#--------------------------------------------------------------
def detectMimeType( filename ):

    extension = filename[-3:]

You might consider using os.path.splitext() here, instead of always
assuming that the last three characters are the extension. That way
you'll be consistent even with extensions like .c, .cc, .h, .gz, etc.

Note that os.path.splitext() does include the extension separator (the
dot), so that you'll need to test against, e.g., ".php" and ".cpp".

Since the majority of your tests will be fairly direct 'extension "XYZ"
means mimetype "aaa/bbb"', this really sounds like a dictionary type
solution is called for.

I strongly agree with this. The vast majority of your cases seem to
be a direct mapping of extension-string to mimetype-string; using a
dictionary (i.e. mapping ;) ) for this is ideal. For those cases
where you can't key off of an extension string (such as makefiles),
you can do special-case processing if the dictionary lookup fails.

    if extension.endswith("cc"):
        return extToMimeDict["cpp"]


If the intent of this is to catch .cc files, it's easy to add an extra
entry into the dict to map '.cc' to the same string as '.cpp'.
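
The suggestions in this reply can be put together roughly as follows. This is a minimal Python 3 sketch, not code from the thread: EXT_TO_MIME and detect_mime_type are illustrative names, and NoMimeError is an assumed definition of the exception the original function raises.

```python
import os


class NoMimeError(Exception):
    """Assumed definition of the exception used in the thread."""


# Keys include the dot because os.path.splitext() keeps it.
EXT_TO_MIME = {
    ".php": "application/x-php",
    ".cpp": "text/x-c++-src",
    ".cc": "text/x-c++-src",  # extra entry instead of an endswith("cc") test
    ".xsl": "text/xsl",
}


def detect_mime_type(filename):
    extension = os.path.splitext(filename)[1]  # "" when there is no extension
    try:
        return EXT_TO_MIME[extension]
    except KeyError:
        pass
    # Special cases that cannot be keyed off an extension.
    if "Makefile" in os.path.basename(filename):
        return "text/x-makefile"
    raise NoMimeError(filename)
```

Note one behavioural difference from the three-character slice: a name such as "z.acc" has the extension ".acc" here and no longer counts as C++ source.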

Jeff Shannon
Technician/Programmer
Credit International

Jul 18 '05 #5
On Wed, 12 Jan 2005 18:16:23 +0000, Frans Englich <fr***********@ telia.com> wrote:

[...]

I think I would refactor along these lines: (untested)

extensiondict = dict(
    php = 'application/x-php',
    cpp = 'text/x-c-src',
    # etcetera
    xsl = 'text/xsl'
)

def detectMimeType(filename):
    extension = os.path.splitext(filename)[1].replace('.', '')
    try: return extensiondict[extension]
    except KeyError:
        basename = os.path.basename(filename)
        if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
        raise NoMimeError

Regards,
Bengt Richter
Jul 18 '05 #6
On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <bo**@oz.net> wrote:
[...]


Why not use a regexp based approach.
extensionlist = [
    (re.compile(r'.*\.php') , "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
for regexp, mimetype in extensionlist:
    if regexp.match(filename):
        return mimetype

if you were really concerned about efficiency, you could use something like:
class SimpleMatch:
    def __init__(self, pattern): self.pattern = pattern
    def match(self, subject): return subject[-len(self.pattern):] == self.pattern

Regards,
Stephen Thorne
Jul 18 '05 #7
On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <st************ @gmail.com> wrote:
On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <bo**@oz.net> wrote:
extensiondict = dict(
    php = 'application/x-php',
    cpp = 'text/x-c-src',
    # etcetera
    xsl = 'text/xsl'
)

def detectMimeType(filename):
    extension = os.path.splitext(filename)[1].replace('.', '')
    extension = os.path.splitext(filename)[1].replace('.', '').lower() # better
    try: return extensiondict[extension]
    except KeyError:
        basename = os.path.basename(filename)
        if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
        raise NoMimeError

Why not use a regexp based approach.

ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
might as well write (untested)

flowerew = filename.lower().endswith
for ext, mimetype in extensionlist:
    if flowerew(ext): return mimetype
else:
    if 'makefile' in filename.lower(): return 'text/x-makefile'
    raise NoMimeError

using a lower case extension list including the dot. I think it would run faster
than a regex, and not scare anyone unnecessarily ;-)

The dict eliminates the loop, and is easy to understand, so IMO it's a better choice.
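
For comparison, the suffix loop described above might look like this as a complete function. It is a hedged Python 3 sketch: EXTENSION_LIST, its entries and detect_mime_type are illustrative names, and NoMimeError is an assumed definition.

```python
class NoMimeError(Exception):
    """Assumed definition of the exception used in the thread."""


# Lower-case suffixes including the dot, checked in order.
EXTENSION_LIST = [
    (".php", "application/x-php"),
    (".cpp", "text/x-c++-src"),
    (".cc", "text/x-c++-src"),
    (".xsl", "text/xsl"),
]


def detect_mime_type(filename):
    lowered = filename.lower()
    for ext, mimetype in EXTENSION_LIST:
        if lowered.endswith(ext):
            return mimetype
    # Reached only when no suffix matched.
    if "makefile" in lowered:
        return "text/x-makefile"
    raise NoMimeError(filename)
```

The loop still visits every entry in the worst case, which is the cost the dict lookup avoids.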
extensionlist = [
    (re.compile(r'.*\.php') , "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
for regexp, mimetype in extensionlist:
    if regexp.match(filename):
        return mimetype

if you were really concerned about efficiency, you could use something like:
class SimpleMatch:
    def __init__(self, pattern): self.pattern = pattern
    def match(self, subject): return subject[-len(self.pattern):] == self.pattern


I'm not clear on what you are doing here, but if you think you are going to compete
with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
(Posting Under the Influence ;-)

Regards,
Bengt Richter
Jul 18 '05 #8
On Thu, 13 Jan 2005 05:18:57 GMT, Bengt Richter <bo**@oz.net> wrote:
On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <st************ @gmail.com> wrote:
On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <bo**@oz.net> wrote:
[...]


Why not use a regexp based approach.

ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
might as well write (untested)

<code snipped>

*shrug*, O(n*m) actually, where n is the number of mime-types and m is
the length of the extension.
[...]


I'm not clear on what you are doing here, but if you think you are going to compete
with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
(Posting Under the Influence ;-)


Sorry about that, what I was trying to say was something along the lines of:

extensionlist = [
    (re.compile(r'.*\.php') , "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
can be made more efficient by doing something like this:
extensionlist = [
    (SimpleMatch(".php"), "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
]
Where SimpleMatch uses a slice and a comparison instead of a regular
expression engine. SimpleMatch and re.compile both return an object
that when you call .match(s) returns a value that can be interpreted
as a boolean.
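
A runnable sketch of that idea in Python 3 follows. The MATCHERS table, its patterns and the MIME strings are illustrative choices rather than the ones posted; only the SimpleMatch class follows the thread.

```python
import re


class SimpleMatch:
    """Suffix test that exposes the same .match() method as a compiled regex."""

    def __init__(self, pattern):
        self.pattern = pattern

    def match(self, subject):
        # Slice and compare, as in the thread; returns a plain bool.
        return subject[-len(self.pattern):] == self.pattern


# Either kind of object can sit in the table: the loop only calls .match().
MATCHERS = [
    (SimpleMatch(".php"), "application/x-php"),
    (re.compile(r".*\.(cpp|cc)$"), "text/x-c++-src"),
    (re.compile(r"(.*/)?[Mm]akefile$"), "text/x-makefile"),
]


def detect_mime_type(filename):
    for matcher, mimetype in MATCHERS:
        if matcher.match(filename):  # bool or match object, both truth-testable
            return mimetype
    return None
```

Because re's match() only anchors at the start of the string, the regular expressions here carry their own leading `.*` and trailing `$`.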

As for the overall efficiency concerns, I feel that talking about any
of this is premature optimisation. The optimisation that is really
required in this situation is the same as with any
large-switch-statement idiom, be it C or Python. First one must do a
frequency analysis of the inputs to the switch statement in order to
discover the optimal order of tests!
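
One possible way to do such a frequency analysis, sketched in Python 3 with made-up sample data: SAMPLE_FILENAMES and RULES are hypothetical, and in practice the sample would come from the real inputs.

```python
import os
from collections import Counter

# Hypothetical sample of filenames the function has been asked about.
SAMPLE_FILENAMES = ["a.php", "b.php", "c.php", "d.cpp", "e.xsl", "f.php", "g.cpp"]

# Rules in their original, arbitrary order.
RULES = [
    (".xsl", "text/xsl"),
    (".cpp", "text/x-c++-src"),
    (".php", "application/x-php"),
]

# Count how often each extension occurs in the sample.
counts = Counter(os.path.splitext(name)[1] for name in SAMPLE_FILENAMES)

# Put the most frequently seen extensions first.
ordered_rules = sorted(RULES, key=lambda rule: counts[rule[0]], reverse=True)
```

This only pays off for a linear chain of tests; a dict lookup does not depend on the order of its entries.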

Regards,
Stephen Thorne
Jul 18 '05 #9
Stephen Thorne <st************ @gmail.com> wrote:
Why not use a regexp based approach.


Good idea... You could also use sre.Scanner which is supposed to be
fast like this...

import re, sre

scanner = sre.Scanner([
    (r"\.php$", "application/x-php"),
    (r"\.(cc|cpp)$", "text/x-c++-src"),
    (r"\.xsl$", "xsl"),
    (r"Makefile", "text/x-makefile"),
    (r".", None),
])

def detectMimeType( filename ):
    t = scanner.scan(filename)[0]
    if len(t) < 1:
        return None
        # raise NoMimeError
    return t[0]

for f in ("index.php", "index.php3", "prog.cc", "prog.cpp", "flodge.xsl", "Makefile", "myMakefile", "potato.123"):
    print f, detectMimeType(f)

....

prints

index.php application/x-php
index.php3 None
prog.cc text/x-c++-src
prog.cpp text/x-c++-src
flodge.xsl xsl
Makefile text/x-makefile
myMakefile text/x-makefile
potato.123 None
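
For readers on Python 3, where the sre module is gone, roughly the same thing can be written with re.Scanner. That class is present in CPython's re module but is not covered by the official documentation, so treat this as a sketch resting on that assumption. The name detect_mime_type is illustrative, and the group is made non-capturing as a precaution.

```python
import re

# Each lexicon entry is (pattern, action). A string action is appended to the
# token list when its pattern matches; a None action discards the match.
scanner = re.Scanner([
    (r"\.php$", "application/x-php"),
    (r"\.(?:cc|cpp)$", "text/x-c++-src"),
    (r"\.xsl$", "text/xsl"),
    (r"Makefile", "text/x-makefile"),
    (r".", None),  # any other character: consumed, produces no token
])


def detect_mime_type(filename):
    tokens, _remainder = scanner.scan(filename)
    return tokens[0] if tokens else None
```

The scanner walks the filename from left to right, so the catch-all "." entry has to come last or it would swallow the characters the other patterns need.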

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jul 18 '05 #10
