Bytes | Software Development & Data Engineering Community

Python Data Utils

In an effort to experiment with open source, I put a couple of my
utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>. What do you think?
Apr 6 '08 #1
On Sun, 06 Apr 2008 01:43:29 -0300, Jesse Aldridge
<Je***********@gmail.com> wrote:
> In an effort to experiment with open source, I put a couple of my
> utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>. What do you think?

Some names are a bit obscure: "universify"?
Docstrings would help too, as would blank lines and, in general, following
the PEP 8 style guide.
find_string is a much slower version of the find method of string objects;
the same goes for find_string_last, contains and others.
And I don't see what you gain from things like:

    def count( s, sub ):
        return s.count( sub )

It's slower and harder to read (because one has to *know* what S.count,
the count function in your S.py module, does).
Other functions may be useful but without even a docstring it's hard to
tell what they do.
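For comparison, the built-in str methods (plus the `in` operator) already cover what wrappers like find_string, find_string_last, contains and count appear to do. Mapping each wrapper to a method is an assumption, since the thread doesn't show their bodies; this is a Python 3 sketch:

```python
s = "data utils for data people"

# str.find returns the index of the first match, or -1 if absent
first = s.find("data")
# str.rfind searches from the right (the likely equivalent of find_string_last)
last = s.rfind("data")
# the `in` operator replaces a "contains" helper
has_utils = "utils" in s
# str.count counts non-overlapping occurrences
n = s.count("data")

print(first, last, has_utils, n)  # → 0 15 True 2
```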
delete_string, as a function, looks like it should delete some string, not
return a character; I'd use a string constant DELETE_CHAR, or just DEL,
its name in ASCII.
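A minimal sketch of the constant Gabriel suggests, assuming the character in question is ASCII DEL (code point 127). The strip_del helper is hypothetical, only there to show the constant in use:

```python
# ASCII DEL is code point 127 (0x7F); define it once at module level
DEL = chr(127)

def strip_del(s):
    """Return s with every DEL character removed (hypothetical helper)."""
    return s.replace(DEL, "")

print(repr(strip_del("ab\x7fc")))  # → 'abc'
```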

In general, None should be compared using `is` instead of `==`, and
instead of `type(x) is type(0)` or `type(x) == type(0)` I'd use
`isinstance(x, int)` (unless you use Python 2.1 or older, int, float, str,
list, ... are types themselves, so they can be passed directly to isinstance).
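A small Python 3 sketch of both points; the function name describe is made up for illustration. Note that isinstance also accepts subclasses, so bool values pass an int check, which a `type(x) == type(0)` comparison would reject:

```python
def describe(x):
    # None is a singleton, so an identity test with `is` is the right check
    if x is None:
        return "missing"
    # isinstance accepts subclasses: bool is a subclass of int, so True
    # lands here, whereas type(True) == type(0) is False
    if isinstance(x, int):
        return "integer"
    return "other"

print(describe(None), describe(3), describe(True), describe("3"))
# → missing integer integer other
```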

Files.py is similar: a lot of more or less common things under different
names, and a few reinvented wheels :)

Don't feel bad, but I would not use those modules because there is no net
gain, and even a loss in legibility. If you develop your code alone,
that's fine, you know what you wrote and can use it whenever you please.
But for others to use it, it means that they have to learn new ways to say
the same old thing.

--
Gabriel Genellina

Apr 6 '08 #2
On Sun, Apr 6, 2008 at 7:43 AM, Jesse Aldridge <Je***********@gmail.com> wrote:
> In an effort to experiment with open source, I put a couple of my
> utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>. What do you think?

Would you search for, install, learn and use these modules if *someone
else* created them?

--
kv
Apr 6 '08 #3
Thanks for the detailed feedback. I made a lot of modifications based
on your advice. Mind taking another look?

> Some names are a bit obscure - "universify"?
> Docstrings would help too, and blank lines

I changed the name of universify and added a docstring to every
function.

> ...PEP8

I made a few changes in this direction; feel free to take it the rest
of the way ;)

> find_string is a much slower version of the find method of string objects,

Got rid of find_string and contains. What are the others?

> And I don't see what you gain from things like:
>     def count( s, sub ):
>         return s.count( sub )

Yeah, got rid of that stuff too. I ported these files from Java a
while ago, so there was a bit of junk like this lying around.

> delete_string, as a function, looks like it should delete some string, not
> return a character; I'd use a string constant DELETE_CHAR, or just DEL,
> its name in ASCII.

Got rid of that too :)

> In general, None should be compared using `is` instead of `==`, and
> instead of `type(x) is type(0)` or `type(x) == type(0)` I'd use
> `isinstance(x, int)` (unless you use Python 2.1 or older, int, float, str,
> list... are types themselves)

Changed.

So, yeah, hopefully things are better now.

Soon developers will flock from all over the world to build this into
the greatest data manipulation library the world has ever seen! ...or
not...

I'm tired. Making code for other people is too much work :)
Apr 6 '08 #4
On Apr 6, 6:14 am, "Konstantin Veretennicov" <kveretenni...@gmail.com>
wrote:
> On Sun, Apr 6, 2008 at 7:43 AM, Jesse Aldridge <JesseAldri...@gmail.com> wrote:
>> In an effort to experiment with open source, I put a couple of my
>> utility files up <a href="http://github.com/jessald/python_data_utils/tree/master">here</a>. What do you think?
>
> Would you search for, install, learn and use these modules if *someone
> else* created them?
>
> --
> kv

Yes, I would. I searched a bit for a library that offered similar
functionality. I didn't find anything. Maybe I'm just looking in the
wrong place. Any suggestions?
Apr 6 '08 #5
> Docstrings go *after* the def statement.

Fixed.
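For anyone following along, a sketch of correct docstring placement (count_words is a hypothetical example, not one of the functions from the repository): the string literal must be the first statement inside the function body, and only then does Python attach it to the function's __doc__ attribute.

```python
def count_words(s):
    """Return the number of whitespace-separated words in s.

    Because this string is the first statement in the body,
    Python stores it in count_words.__doc__.
    """
    return len(s.split())

# The docstring is available at runtime, e.g. to help() and doc tools
print(count_words.__doc__.splitlines()[0])
```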
> changing "( " to "(" and " )" to ")".
Changed.
I attempted to take out everything that could be trivially implemented
with the standard library.
This has left me with... 4 functions in S.py. One of them is used
internally, and the others aren't terribly awesome :\ But I think the
ones that remain are at least a bit useful :)
> The penny drops :-)
yeah, yeah
> Not in all places ... look at the ends_with function. BTW, this should
> be named something like "fuzzy_ends_with".
fixed
> fuzzy_match(None, None) should return False.
changed
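The thread doesn't show the body of fuzzy_match, so the following is a hypothetical Python 3 sketch that assumes "fuzzy" means ignoring case and runs of whitespace; the point is the None check John asks for:

```python
def fuzzy_match(a, b):
    """Hypothetical sketch: compare strings ignoring case and extra whitespace.

    Missing values (None) never match anything, not even each other.
    """
    if a is None or b is None:
        return False
    # Lower-case and collapse whitespace before comparing
    return " ".join(a.lower().split()) == " ".join(b.lower().split())

print(fuzzy_match(None, None))             # → False
print(fuzzy_match("Foo  Bar", "foo bar"))  # → True
```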
> 2. make_fuzzy function: first two statements should read "s =
> s.replace(.....)" instead of "s.replace(.....)".
fixed
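Why the assignment matters: Python strings are immutable, so str.replace returns a new string and leaves the original untouched. The characters replaced below are assumptions, since the thread elides the real make_fuzzy arguments:

```python
def make_fuzzy_broken(s):
    s.replace("-", " ")   # new string is discarded; s is unchanged
    s.replace("_", " ")
    return s.lower()

def make_fuzzy(s):
    # Rebind s to the new string returned by each replace() call
    s = s.replace("-", " ")
    s = s.replace("_", " ")
    return s.lower()

print(make_fuzzy_broken("Data-Utils_v2"))  # → data-utils_v2
print(make_fuzzy("Data-Utils_v2"))         # → data utils v2
```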
> 3. Fuzzy matching functions are specialised to an application; I can't
> imagine that anyone would be particularly interested in those that you
> provide.
I think it's useful in many cases. I use it all the time. It helps
guard against annoying input errors.
> A basic string normalisation-before-comparison function would
> usefully include replacing multiple internal whitespace characters by
> a single space.
I added this functionality.

> 5. Casual inspection of your indentation function gave the impression
> that it was stuffed
Fixed

Thanks for the feedback.
Apr 7 '08 #6
On Apr 7, 4:22 pm, Jesse Aldridge <JesseAldri...@gmail.com> wrote:
>> changing "( " to "(" and " )" to ")".
>
> Changed.
But then you introduced more.

> I attempted to take out everything that could be trivially implemented
> with the standard library.
> This has left me with... 4 functions in S.py. One of them is used
> internally, and the others aren't terribly awesome :\ But I think the
> ones that remain are at least a bit useful :)
If you want to look at stuff that can't be implemented trivially using
str/unicode methods, and is more than a bit useful, google for
mxTextTools.

>> A basic string normalisation-before-comparison function would
>> usefully include replacing multiple internal whitespace characters by
>> a single space.
>
> I added this functionality.

Not quite. I said "whitespace", not "space".

The following is the standard Python idiom for removing leading and
trailing whitespace and replacing one or more whitespace characters
with a single space:

    def normalise_whitespace(s):
        return ' '.join(s.split())
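To illustrate the "whitespace" versus "space" distinction, here's a Python 3 comparison against a space-only version (collapse_spaces is hypothetical; the thread doesn't show the code Jesse actually added). split() with no argument splits on any run of whitespace, including tabs and newlines, and drops leading and trailing whitespace:

```python
import re

def collapse_spaces(s):
    # Hypothetical space-only version: handles runs of ' ' but not tabs/newlines
    return re.sub(' +', ' ', s).strip(' ')

def normalise_whitespace(s):
    # John's idiom: split() with no argument splits on any whitespace run
    return ' '.join(s.split())

messy = "  Jesse\tAldridge \n  data   utils "
print(repr(collapse_spaces(messy)))       # → 'Jesse\tAldridge \n data utils'
print(repr(normalise_whitespace(messy)))  # → 'Jesse Aldridge data utils'
```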

If your data is obtained by web scraping, you may find some people use
'\xA0' aka NBSP to pad out fields. The above code will get rid of
these if s is unicode; if s is str, you need to chuck
a .replace('\xA0', ' ') in there somewhere.
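John's note assumes Python 2, where str is a byte string. A Python 3 sketch of the same idea: str is Unicode there and '\xa0'.isspace() is true, so plain split() already handles NBSP, while bytes.split() only recognises ASCII whitespace. The bytes version below assumes a single-byte encoding such as Latin-1, where NBSP is the byte 0xA0; UTF-8 input (where NBSP is b'\xc2\xa0') should be decoded to str first.

```python
def normalise_whitespace(s):
    # Python 3 str: split() treats U+00A0 (NBSP) as whitespace
    return ' '.join(s.split())

def normalise_whitespace_bytes(b):
    # bytes.split() only knows ASCII whitespace, so NBSP (0xA0 in Latin-1)
    # must be replaced first, as John suggests for Python 2 str
    return b' '.join(b.replace(b'\xa0', b' ').split())

print(repr(normalise_whitespace("name:\xa0\xa0Jesse")))         # → 'name: Jesse'
print(repr(normalise_whitespace_bytes(b"name:\xa0\xa0Jesse")))  # → b'name: Jesse'
```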

HTH,
John

Apr 7 '08 #7
