Building Time Based Bins

MCD

Hello, I'm new to python and this group and am trying to build some
bins and was wondering if any of you could kindly help me out. I'm a
bit lost on how to begin.

I have some text files that have a time field along with 2 other fields
formatted like this >>

1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94

This goes on throughout a 12hr. period. I'd like to be able to place
the low and high values of the additional fields in a single line
divided into 5min intervals. So it would look something like this >>

1235 22 88
1240 31 94

I hope that makes sense. Should I be using a module like numarray for
this, or is it possible to just use the native functions? Any ideas
would help me very much.

Thank you - Marcus

Jul 18 '05 #1


Michael Spencer

This sort of thing would do it:

from itertools import groupby

def splitter(iterable):
    """Takes a line-based iterator, yields a list of values per line
    edit this for more sophisticated line-based parsing if required"""
    for line in iterable:
        yield [int(item) for item in line.split()]

def groupkey(data):
    """Groups times by 5 min resolution. Note this version doesn't work
    exactly like the example - so fix if necessary"""
    time = data[0]
    return time / 100 * 100 + (time % 100) / 5 * 5

def grouper(iterable):
    """Groups and summarizes the lines"""
    for time, data in groupby(iterable, groupkey):
        data_x = zip(*data) #transform the data from cols to rows
        print time, min(data_x[1]), max(data_x[2])

# Exercise it:

source = """1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

>>> grouper(splitter(source.splitlines()))
1230 23 88
1235 22 94

Note this groups by the time at the beginning of each 5 mins, rather than the
end as in your example. If this needs changing, fix groupkey.
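
For anyone running this on Python 3, a port might look like the sketch below. It is not Michael's code, just an adaptation under a few assumptions: `//` replaces `/` (plain `/` gives floats on Python 3), `zip` is wrapped in `list` so the columns can be indexed, and `grouper` yields tuples instead of printing. For the sample data it keeps the same bin labels as the output above (1230 and 1235).

```python
from itertools import groupby

def splitter(lines):
    """Yield one list of ints per non-blank line."""
    for line in lines:
        if line.strip():
            yield [int(item) for item in line.split()]

def groupkey(row):
    """Floor an HHMM time to a 5-minute boundary, e.g. 1234 -> 1230."""
    time = row[0]
    return time // 100 * 100 + (time % 100) // 5 * 5

def grouper(rows):
    """Yield (bin_label, low of field 1, high of field 2) for each run of rows."""
    for label, group in groupby(rows, groupkey):
        columns = list(zip(*group))  # transpose rows into columns
        yield label, min(columns[1]), max(columns[2])

source = """1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

result = list(grouper(splitter(source.splitlines())))
print(result)
```

Note that `groupby` only merges consecutive rows with the same key, so this relies on the times being in ascending order.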

HTH

Michael

Jul 18 '05 #2

John Machin

On 19 Mar 2005 19:01:05 -0800, "MCD" <mc*******@walla.com> wrote:

Hello, I'm new to python and this group and am trying to build some
bins and was wondering if any of you could kindly help me out. I'm a
bit lost on how to begin.

Are you (extremely) new to computer programming? Is this school
homework? The reason for asking is that the exercise requires no data
structure more complicated than a one-dimensional array of integers
(if one doubts that the times will always be in ascending order), and
*NO* data structures if one is trusting. It can be done easily without
any extra modules or libraries in just about any computer language
ever invented. So, it's not really a Python question. Perhaps you
should be looking at some basic computer programming learning. Python
*is* a really great language for that -- check out the Python website.

Anyway here's one way of doing it -- only the input and output
arrangements are Python-specific. And you don't need iter*.*.* (yet)
:-)

HTH,
John
===========================
C:\junk>type mcd.py
# Look, Ma, no imports!
lines = """\
1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""
DUMMY = 9999
bintm = DUMMY
for line in lines.split('\n'): # in practice, open('input_file', 'r'):
    if not line: continue
    ilist = [int(fld) for fld in line.strip().split()]
    print "ilist:", ilist
    klock, lo, hi = ilist
    newbintm = ((klock + 4) // 5 * 5) % 2400
    print "bintm = %d, klock = %d, newbintm = %d" % (bintm, klock,
        newbintm)
    if newbintm != bintm:
        if bintm != DUMMY:
            print "==>> %04d %02d %02d" % (bintm, binlo, binhi)
        bintm, binlo, binhi = newbintm, lo, hi
    else:
        binlo = min(binlo, lo)
        binhi = max(binhi, hi)
print "end of file ..."
if bintm != DUMMY:
    print "==>> %4d %2d %2d" % (bintm, binlo, binhi)

C:\junk>python mcd.py
ilist: [1231, 23, 56]
bintm = 9999, klock = 1231, newbintm = 1235
ilist: [1232, 25, 79]
bintm = 1235, klock = 1232, newbintm = 1235
ilist: [1234, 26, 88]
bintm = 1235, klock = 1234, newbintm = 1235
ilist: [1235, 22, 34]
bintm = 1235, klock = 1235, newbintm = 1235
ilist: [1237, 31, 85]
bintm = 1235, klock = 1237, newbintm = 1240
==>> 1235 22 88
ilist: [1239, 35, 94]
bintm = 1240, klock = 1239, newbintm = 1240
end of file ...
==>> 1240 31 94

C:\junk>
================================

Jul 18 '05 #3

MCD

John Machin wrote:

Are you (extremely) new to computer programming? Is this school
homework?

Lol, yes, I am relatively new to programming... and very new to python.
I have experience working with loops, if thens, and boolean operations,
but I haven't worked with lists or arrays as of yet... so this is my
first foray. This isn't homework, I'm long out of school. I've been
wanting to extend my programming abilities and I chose python as the
means to achieving that goal... so far I really like it :-)

Thank you both for the code. I ended up working with John's because
it's a bit easier for me to get through. I very much appreciate the
code... it taught me quite a few things about how python converts
strings to integers and vice versa. I didn't expect to get through it,
but after looking at it a bit, I did, and was able to modify it so that
I could work with my own files. Yeah!

The only question I have is in regards to being able to sum a field in
a bin. Using sum(hi) returns only the last value... I'm uncertain how
to cumulatively add up the values as the script runs through each line.
Any pointers?

Thank you again for all your help.
Marcus

Jul 18 '05 #4

MCD

Never mind about the summing... I learned that you can do this:

sumhi = 0
sumhi += hi

Cool!
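
For later readers, the one subtlety with a running total per bin is that the accumulator has to be reset each time a new bin starts. A minimal sketch in Python 3 syntax follows; `sum_per_bin` is a made-up name, and it assumes each row has already been reduced to a (bin label, value) pair in time order. The sample pairs use the bin labels from John's output and the third column of the sample data.

```python
def sum_per_bin(pairs):
    """Return [(bin_label, total), ...] for (bin_label, value) pairs in time order."""
    totals = []
    current = None   # label of the bin being accumulated
    running = 0      # running total for that bin
    for label, value in pairs:
        if label != current:
            if current is not None:
                totals.append((current, running))  # previous bin is finished
            current, running = label, 0            # new bin: reset the total
        running += value
    if current is not None:
        totals.append((current, running))          # flush the last bin
    return totals

pairs = [(1235, 56), (1235, 79), (1235, 88), (1235, 34), (1240, 85), (1240, 94)]
totals = sum_per_bin(pairs)
print(totals)
```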

Thanks again.

Jul 18 '05 #5

alessandro -oggei- ogier

MCD wrote:

This goes on throughout a 12hr. period. I'd like to be able to place
the low and high values of the additional fields in a single line
divided into 5min intervals. So it would look something like this >>

1235 22 88
1240 31 94

what about a sane list comprehension madness ? <g>

lines = """\
1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""

input = lines.split('\n') # this is your input

div = lambda x: (x-1)/5

l = dict([
    (div(x), []) for x,y,z in [
        tuple([int(x) for x in x.split()]) for x in input if x
    ]
])

[
    l[x[0]].append(x[1]) for x in
    [
        [div(x), (x,y,z)] for x,y,z in
        [
            tuple([int(x) for x in x.split()]) for x in input if x
        ]
    ]
]

print [
    [max([x[0] for x in l[j]]),
     min([x[1] for x in l[j]]),
     max([x[2] for x in l[j]])
    ] for j in dict([
        (div(x), []) for x,y,z in [
            tuple([int(x) for x in x.split()]) for x in input
            if x
        ]
    ]).keys()
]

I think it's a bit memory hungry, though
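
If memory is a concern, the same bins can probably be built in a single pass that keeps only three numbers per bin. This is a sketch in Python 3 syntax, not Alessandro's code; `bin_extremes` is a hypothetical name, and it reuses the `(x-1)/5` key from above as `(t - 1) // 5`. Like the comprehension version, it reports the last time seen in each bin rather than the nominal bin end.

```python
def bin_extremes(lines):
    """One pass over the lines; returns [latest time, low, high] per bin."""
    bins = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        t, a, b = (int(field) for field in line.split())
        key = (t - 1) // 5  # same grouping key as div() above
        if key not in bins:
            bins[key] = [t, a, b]
        else:
            entry = bins[key]
            entry[0] = max(entry[0], t)
            entry[1] = min(entry[1], a)
            entry[2] = max(entry[2], b)
    return [bins[key] for key in sorted(bins)]

sample = """\
1231 23 56
1232 25 79
1234 26 88
1235 22 34
1237 31 85
1239 35 94
"""
extremes = bin_extremes(sample.split('\n'))
print(extremes)
```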

cya,
--
Alessandro "oggei" Ogier <al**************@unimib.it>
gpg --keyserver pgp.mit.edu --recv-keys EEBB4D0D

Jul 18 '05 #6

MCD

Thanks Alessandro... I'll have to try that as well.

I have a modified working version of John's code (thanks John!). I'm
able to output the bins by 5min intervals, sum one of the fields, and
get the high and low of each field. So far I'm really happy with how it
works. Thank you to everybody.

The only thing that I'd like to do, which I've been racking my brain on
how to do in python... is how to keep track of the bins, so that I can
refer back to them. For instance, if I wanted to get "binlo" from two
bins back... in the scripting language I was working with (pascal
based) you could create a counting series:

for binlo = binlo - 1 do
begin

2binlosBack = (binlo - 2)

# if it was 12:00, I'd be looking back to 11:50

I would really appreciate it if anyone could explain to me how this could
be accomplished using python grammar... or perhaps some other method to
"look back" which I'm unable to conceive of.

Many thanks,
Marcus

Jul 18 '05 #7

Michael Spencer

Just append the results to a list as you go:

bins = []

for bin in ...: # whichever method you use to get each new bin
    bins.append(bin)

Then refer to previous bins using negative index (starting at -1 for the most
recent):
e.g., two_binlos_back = bins[-3] # Python names can't start with a digit
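
As an illustration of the look-back idea, here is a small sketch in Python 3 syntax. It swaps the plain list for `collections.deque` with `maxlen`, which drops the oldest entry automatically, so only the bins needed for the look-back are kept. The name `lookback` and the sample (time, low) values are invented for the example.

```python
from collections import deque

def lookback(bins, distance=2):
    """Pair each finished bin with the bin `distance` places earlier (None until available)."""
    recent = deque(maxlen=distance + 1)  # oldest entry falls off automatically
    pairs = []
    for finished in bins:
        recent.append(finished)
        # recent[0] is `distance` bins back once the window is full
        earlier = recent[0] if len(recent) == distance + 1 else None
        pairs.append((finished, earlier))
    return pairs

# hypothetical (bin time, low) values
finished_bins = [(1150, 20), (1155, 18), (1200, 25), (1205, 30)]
pairs = lookback(finished_bins)
for current, earlier in pairs:
    print(current, earlier)
```

With these values the bin at 1200 is paired with the one at 1150, which matches the "12:00 looking back to 11:50" case.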

Michael

Jul 18 '05 #8

MCD

Hi Michael, thanks for responding. I actually don't use a method to get
each bin... the bin outputs are nested in the loop. Here's my code:

data_file = open('G:\file.txt')
DUMMY = 9999
bintm = DUMMY
for line in data_file:
    fields = line.strip().split()
    if not line: continue
    ilist = [int(time), int(a)]
    # print "ilist:", ilist
    klock, a = ilist
    newbintm = ((klock + 4) // 5 * 5 ) % 2400
    print "bintm = %d, newbintm = %d, a = %d" % (bintm, newbintm, a)
    # the above is the raw data and now the bin loop
    if bintm == 9999:
        bintm = newbintm
        binlo = a
    elif bintm == newbintm:
        binlo = min(binl, t)
    else:
        print " ==>> %04d %2d" % (bintm, binl) ## this is the bin
        bintm = newbintm
        binl = a

#-------------------

the input file is in my first post in this thread, the output looks
like:

bintm = 9999, newbintm = 1235, a = 23
bintm = 1235, newbintm = 1235, a = 25
bintm = 1235, newbintm = 1235, a = 26
bintm = 1235, newbintm = 1240, a = 22
==>> 1235 23
bintm = 1240, newbintm = 1240, a = 31
bintm = 1240, newbintm = 1240, a = 35

#---------------------

I'm not sure where I could create the new list without it getting
overwritten in the bin loop. Confused as to how to add the append
method in a for loop without a defined method for the current bin.
Anyway, I'll keep at it, but I'm not sure how to execute it. Thank you
very much for your suggestion.

Marcus

Jul 18 '05 #9

Michael Spencer

MCD wrote:

Hi Michael, thanks for responding. I actually don't use a method to get
each bin...

That's because you picked the wrong suggestion ;-) No, seriously, you can do it
easily with this approach:

the bin outputs are nested in the loop. Here's my code:

data_file = open('G:\file.txt')
DUMMY = 9999
bintm = DUMMY

bins = []

for line in data_file:
    fields = line.strip().split()
    if not line: continue
    ilist = [int(time), int(a)]

(BTW, there must be more to your code than you have shared for the above line to
execute without raising an exception - where are 'time' and 'a' initially bound?
BTW2, 'time' is the name of a stdlib module, so it's bad practice to use it as
an identifier)

    # print "ilist:", ilist
    klock, a = ilist
    newbintm = ((klock + 4) // 5 * 5 ) % 2400
    print "bintm = %d, newbintm = %d, a = %d" % (bintm, newbintm, a)
    # the above is the raw data and now the bin loop
    if bintm == 9999:
        bintm = newbintm
        binlo = a
    elif bintm == newbintm:
        binlo = min(binl, t)
    else:
        print " ==>> %04d %2d" % (bintm, binl) ## this is the bin

This is where you've declared that you have a bin, so add it to the bins cache:

        bins.append((bintm, binl))

        bintm = newbintm
        binl = a

Michael

Jul 18 '05 #10

MCD

Ok, thanks Michael, I got it sorted out now. It was just a question of
placing the append statement and the new list in the right place. I
also added a delete command so the list doesn't become too huge,
especially when there's no need to keep it. Here's the corrected code:

if bintm == 9999:
    bintm = newbintm
    binlo = a
    lastbinlo = [binlo] ## new bin creation
elif bintm == newbintm:
    binlo = min(binlo, a)
else:
    if len(lastbinlo) > 1: ## check for append data
        del lastbinlo[0] ## delete extras
    lastbinlo.append(binlo) ## append new data here
    print lastbinlo[-2]
    print " ==>> %04d %2d" % (bintm, binlo) ## this is the bin
    bintm = newbintm
    binlo = a

Anyway, many thanks to everyone who helped with this code.

Best regards,
Marcus

Jul 18 '05 #11

MCD

Michael Spencer wrote:

(BTW, there must be more to your code than you have shared for the above line to
execute without raising an exception - where are 'time' and 'a' initially bound?
BTW2, 'time' is the name of a stdlib module, so it's bad practice to use it as
an identifier)

Yes there is more, I was copy/pasting a bit haphazardly as I see now.
You're right about the identifier, I changed it in my current code to
"t".

print " ==>> %04d %2d" % (bintm, binl) ## this is the bin

This is where you've declared that you have a bin, so add it to the bins cache:

bins.append((bintm, binl))
bintm = newbintm
binl = a

Michael

Thanks Michael, I haven't been able to read my mail so I ended up
placing the append a bit differently than the way you described, and
somehow got it working... your way looks much easier :-). I'm going to
try that right now.

I've mostly been racking my brain with this bit of code:

newtm = ((klock + 4) // 5 * 5 ) % 2400

It works ok until you get to the last five minutes of the hour. For
instance, 956 will return 960... oops, that's not gonna work :). I
don't completely understand how this code is doing what it's doing...
I've played around with different values, but it's still a bit of a
mystery in coming up with a solution. My only work around that I've
been able to come up with is to add 40 to newtm when the last 2 digits
are at 60, but I'm still working on how to do that.

Anyway, thanks for your help, mentioning the append function... that
really opened up a lot of solutions/possibilities for me.

Take care,
Marcus

Jul 18 '05 #12

Michael Spencer

MCD wrote:

I've mostly been racking my brain with this bit of code:

newtm = ((klock + 4) // 5 * 5 ) % 2400

You might want to take another look at the first reply I sent you: it contains a
function that does this:

def groupkey(data):
    """Groups times by 5 min resolution. Note this version doesn't work
    exactly like the example - so fix if necessary"""
    time = data[0]
    return time / 100 * 100 + (time % 100) / 5 * 5

# test it:

>>> for i in range(900,959): print groupkey([i]),
...
900 900 900 900 900 905 905 905 905 905 910 910 910 910 910 915 915 915 915
915 920 920 920 920 920 925 925 925 925 925 930 930 930 930 930 935 935 935 935
935 940 940 940 940 940 945 945 945 945 945 950 950 950 950 950 955 955 955 955

It rounds down, for the reason you have come across
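
If labelling each bin by the end of its interval is still wanted, one possible approach is to do the rounding in minutes since midnight rather than on the raw HHMM number, so that 956 rolls over to 1000 instead of 960. The sketch below is in Python 3 syntax; `bin_end` is a hypothetical helper, and it assumes times are HHMM integers within a single day.

```python
def bin_end(klock, width=5):
    """Round an HHMM time up to the end of its `width`-minute bin, e.g. 956 -> 1000."""
    hours, minutes = divmod(klock, 100)
    total = hours * 60 + minutes                   # minutes since midnight
    total = (total + width - 1) // width * width   # round up to a multiple of width
    hours, minutes = divmod(total, 60)
    return hours % 24 * 100 + minutes              # wrap 2400 back to 0

for klock in (1231, 1235, 955, 956, 2358):
    print(klock, bin_end(klock))
```

For comparison, the original expression `((klock + 4) // 5 * 5) % 2400` gives 960 for 956, because it rounds the HHMM number as if an hour had 100 minutes.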

Michael

Jul 18 '05 #13
