Set like feature

Hi,

I have a list of space delimited strings ending in a newline.
Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']

Now inside each row, I have a space delimited list of fields.

Now I want to compare the fields in each row of the array and see which
fields do not match.

Think of it as a 2 dimensional array of size m x n, and comparing each
element on a column by column basis.

I am using Python 2.2 so no sets. Can anyone think of an efficient way
to do this?

Thanks,

Hari

Jul 18 '05 #1
On 15 Nov 2004 13:01:52 -0800, Hari Pulapaka <ha****@gmail.com> wrote:
> Hi,
>
> I have a list of space delimited strings ending in a newline.
> Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']
>
> Now inside each row, I have a space delimited list of fields.
>
> Now I want to compare the fields in each row of the array and see which
> fields do not match.
>
> Think of it as a 2 dimensional array of size m x n, and comparing each
> element on a column by column basis.
>
> I am using Python 2.2 so no sets. Can anyone think of an efficient way
> to do this?


If I understand the problem correctly, splitting the lines up and sorting
them before comparison _is_ much better than a naive approach, though I
don't know if that's what's best.

--
Mitja
Jul 18 '05 #2
Hari Pulapaka <ha****@gmail.com> wrote:
> I have a list of space delimited strings ending in a newline.
> Eg: a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']
>
> Now inside each row, I have a space delimited list of fields.
>
> Now I want to compare the fields in each row of the array and see which
> fields do not match.
>
> Think of it as a 2 dimensional array of size m x n, and comparing each
> element on a column by column basis.
>
> I am using Python 2.2 so no sets. Can anyone think of an efficient way
> to do this?


Do you want to compare corresponding fields? That's the only way I can
read that 'column by column basis', and thus I don't see what sets could
possibly have to do with it.

Do you want to compare each row with every other row? I also note in
your example that the number of fields in each row appears to be
variable, so how do you want to deal with 'missing' fields?

Too many unanswered questions, I guess. But for some specified set of
answers to those questions, you might do...:

import sys

def compare_fields(i, j, base, other):
    # zip stops at the shortest argument, so xrange(sys.maxint) just
    # supplies the column index k (2.2 has no enumerate)
    for k, f1, f2 in zip(xrange(sys.maxint), base, other):
        if f1 != f2:
            print 'DIFF', i, j, k, repr(f1), repr(f2)

def lots_of_compares(list_of_strings):
    list_of_lists_of_fields = [row.split() for row in list_of_strings]
    num_rows = len(list_of_lists_of_fields)
    # compare each row i with every later row j
    for i in xrange(num_rows):
        base_row = list_of_lists_of_fields[i]
        for j in xrange(i+1, num_rows):
            compare_fields(i, j, base_row, list_of_lists_of_fields[j])

You can do better with enumerate, itertools and other things which 2.2
didn't have, but sets wouldn't help. Now, I hope this clarifies the
many unanswered questions which your 'specs' leave open, so you can work
out exactly what you want.
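
For readers on a current Python, the "enumerate, itertools" remark above could be sketched roughly as follows. This is only an illustration and assumes Python 3, so it will not run on 2.2. The name `field_diffs` is made up here, and like the code above it ignores trailing fields that only the longer row has.

```python
from itertools import combinations

def field_diffs(rows):
    """Yield (i, j, k, f1, f2) for each pair of rows i < j whose k-th fields differ."""
    fields = [row.split() for row in rows]
    # combinations(..., 2) visits every unordered pair of rows exactly once
    for (i, base), (j, other) in combinations(enumerate(fields), 2):
        # zip stops at the shorter row, so 'missing' fields are not reported
        for k, (f1, f2) in enumerate(zip(base, other)):
            if f1 != f2:
                yield i, j, k, f1, f2

a = ['a sfds sdf s df 34 ew\n', 'df sdf s f s ssf\n']
for diff in field_diffs(a):
    print('DIFF', *diff)
```

Yielding tuples instead of printing inside the loop leaves it to the caller to decide what to do with each mismatch.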

And, btw: upgrade to 2.4. Sets or no sets, the performance enhancement
by itself will be vastly sufficient to repay whatever inconvenience you
think the upgrade might cause.
Alex
Jul 18 '05 #3
Alex Martelli wrote:

> Do you want to compare corresponding fields? That's the only way I can
> read that 'column by column basis', and thus I don't see what sets could
> possibly have to do with it.
>
> Do you want to compare each row with every other row? I also note in
> your example that the number of fields in each row appears to be
> variable, so how do you want to deal with 'missing' fields?

I want to compare every element in each row with the element in the
remaining rows having the same column position. The rows need not have
the same number of elements, in which case I have to do some more
thinking :)
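
One possible way to treat rows of unequal length, sketched here under assumptions that go beyond this thread: pad the shorter row and report the padded positions as differences. The code assumes Python 3 (`itertools.zip_longest` does not exist in 2.2, where `map(None, base, other)` pads with None in a similar way), and the name `padded_diffs` is hypothetical.

```python
from itertools import zip_longest

def padded_diffs(base, other):
    """Return [(k, f1, f2)] for each column k where the two rows differ.

    A field missing from the shorter row shows up as None; str.split()
    never produces None, so it cannot be confused with a real field.
    """
    return [(k, f1, f2)
            for k, (f1, f2) in enumerate(zip_longest(base, other))
            if f1 != f2]

print(padded_diffs('a b c'.split(), 'a x'.split()))
```

Whether a missing field should count as a mismatch at all is a choice the problem statement leaves open.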

I was thinking of making each row of the array as a set and then
comparing each row of the array with the compare function being the set
intersection operation.

You have pretty much captured what I was thinking, and my solution is
also similar to what you showed.

> Too many unanswered questions, I guess. But for some specified set of
> answers to those questions, you might do...:
>
> import sys
>
> def compare_fields(i, j, base, other):
>     for k, f1, f2 in zip(xrange(sys.maxint), base, other):
>         if f1 != f2:
>             print 'DIFF', i, j, k, repr(f1), repr(f2)
>
> def lots_of_compares(list_of_strings):
>     list_of_lists_of_fields = [row.split() for row in list_of_strings]
>     num_rows = len(list_of_lists_of_fields)
>     for i in xrange(num_rows):
>         base_row = list_of_lists_of_fields[i]
>         for j in xrange(i+1, num_rows):
>             compare_fields(i, j, base_row, list_of_lists_of_fields[j])

Thanks for your help.

> You can do better with enumerate, itertools and other things which 2.2
> didn't have, but sets wouldn't help. Now, I hope this clarifies the
> many unanswered questions which your 'specs' leave open, so you can work
> out exactly what you want.
>
> And, btw: upgrade to 2.4. Sets or no sets, the performance enhancement
> by itself will be vastly sufficient to repay whatever inconvenience you
> think the upgrade might cause.

Not in my hands.

- Hari


Jul 18 '05 #4
Hari Pulapaka <ha****@gmail.com> wrote:
> I want to compare every element in each row with the element in the
> remaining rows having the same column position. The rows need not have
> the same number of elements, in which case I have to do some more
> thinking :)
>
> I was thinking of making each row of the array as a set and then
> comparing each row of the array with the compare function being the set
> intersection operation.


Sets have no order, so that just wouldn't work the way you state it.
Rows 'a b' and 'b a' would appear identical, so the "having the same
column position" condition would not be respected.

You could maybe use a set(enumerate(therow.split())) -- but intersecting
such sets would be of dubious utility. Maybe you mean symmetric
difference (union minus intersection), but even then you'd still have to
proceed in order to investigate which item of that difference comes from
which of the two rows (assuming you do care -- hard to tell from here).
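
A rough illustration of the two points above, assuming a Python that has the built-in `set` and `enumerate` (so not 2.2; the snippet is written for Python 3). The sample rows and variable names are invented for the example.

```python
row1 = 'a b c'
row2 = 'b a c'

# Plain sets forget positions, so these two rows look identical.
print(set(row1.split()) == set(row2.split()))

# Pairing each field with its column index keeps the position.
s1 = set(enumerate(row1.split()))
s2 = set(enumerate(row2.split()))

# Symmetric difference: (column, field) pairs found in exactly one row.
diff = s1 ^ s2

# The set itself does not say which row a pair came from, so each pair
# has to be looked up again; sorted() only makes the output order stable.
for k, field in sorted(diff):
    origin = 'row1' if (k, field) in s1 else 'row2'
    print(k, repr(field), origin)
```

The extra lookup in the last loop is the "proceed in order to investigate" step, which is why the sets buy little over a direct column by column comparison.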

I believe gadfly comes with a fast C-coded extension called kjbuckets
which might help with this kind of thing (and a Python-coded
'fallback', not all that fast but easily portable, too). You might want
to investigate that, if there's a chance you could get C-coded
extensions installed on your Python 2.2 installation.
Alex
Jul 18 '05 #5
Mitja <nu*@example.com> wrote:
>> each element on a column by column basis.
>>
>> I am using python2.2 so no sets. Can anyone think of an efficient way
>> to do this?
>
> If I understand the problem correctly, splitting the lines up and sorting
> them before comparison _is_ much better than a naive approach, though I
> don't know if that's what's best.


Splitting, sure. Sorting would destroy the 'column by column basis'.
Alex
Jul 18 '05 #6
On Tue, 16 Nov 2004 00:05:04 +0100, Alex Martelli <al*****@yahoo.com>
wrote:
> Mitja <nu*@example.com> wrote:
>>> each element on a column by column basis.
>>>
>>> I am using python2.2 so no sets. Can anyone think of an efficient way
>>> to do this?
>>
>> If I understand the problem correctly, splitting the lines up and sorting
>> them before comparison _is_ much better than a naive approach, though I
>> don't know if that's what's best.
>
> Splitting, sure. Sorting would destroy the 'column by column basis'.


I wasn't sure what the OP really wanted; I saw both the "column by column"
thing and the bit about sets, which are contradictory, so I assumed he was
after set-like behavior. [wrongly, as later posts clarified]

--
Mitja
Jul 18 '05 #7
