splitting a long string into a list

I have a single long string - I'd like to split it into a list of
unique keywords. Sadly, the database wasn't designed to do this, so I
must do this in Python - I'm having some trouble using the .split()
function, it doesn't seem to do what I want it to - any ideas?

thanks very much for your help.

r-sr-
longstring = 'Agricultural subsidies; Foreign aidAgriculture;
Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
Childhood Development, Birth Defects; Toxic ChemicalsAntibiotics,
AnimalsAgricultural Subsidies, Global TradeAgricultural
SubsidiesBiodiversityCitizen ActivismCommunity
GardensCooperativesDietingAgriculture, CottonAgriculture, Global
TradePesticides, MonsantoAgriculture, SeedCoffee, HungerPollution,
Water, FeedlotsFood PricesAgriculture, WorkersAnimal Feed, Corn,
PesticidesAquacultureChemical
WarfareCompostDebtConsumerismFearPesticides, US, Childhood Development,
Birth DefectsCorporate Reform, Personhood (Dem. Book)Corporate Reform,
Personhood, Farming (Dem. Book)Crime Rates, Legislation,
EducationDebt, Credit CardsDemocracyPopulation, WorldIncomeDemocracy,
Corporate Personhood, Porter Township (Dem. Book)Disaster
ReliefDwellings, SlumsEconomics, MexicoEconomy, LocalEducation,
ProtestsEndangered Habitat, RainforestEndangered SpeciesEndangered
Species, Extinctionantibiotics, livestockAgricultural subsidies;
Foreign aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
Toxic Chemicals;Antibiotics, Animals;Agricultural Subsidies, Global
Trade;Agricultural Subsidies;Biodiversity;Citizen Activism;Community
Gardens;Cooperatives;Dieting;Agriculture, Cotton;Agriculture, Global
Trade;Pesticides, Monsanto;Agriculture, Seed;Coffee, Hunger;Pollution,
Water, Feedlots;Food Prices;Agriculture, Workers;Animal Feed, Corn,
Pesticides;Aquaculture;Chemical
Warfare;Compost;Debt;Consumerism;Fear;Pesticides, US, Childhood
Development, Birth Defects;Corporate Reform, Personhood (Dem.
Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
Legislation, Education;Debt, Credit Cards;Democracy;Population,
World;Income;Democracy, Corporate Personhood, Porter Township (Dem.
Book);Disaster Relief;Dwellings, Slums;Economics, Mexico;Economy,
Local;Education, Protests;Endangered Habitat, Rainforest;Endangered
Species;Endangered Species, Extinction;antibiotics,
livestock;Pesticides, Water;Environment, Environmentalist;Food, Hunger,
Agriculture, Aid, World, Development;Agriculture, Cotton
Trade;Agriculture, Cotton, Africa;Environment, Energy;Fair Trade (Dem.
Book);Farmland, Sprawl;Fast Food, Globalization, Mapping;depression,
mental illness, mood disorders;Economic Democracy, Corporate
Personhood;Brazil, citizen activism, hope, inspiration, labor
issues;citizen activism, advice, hope;Pharmaceuticals, Medicine,
Drugs;Community Investing;Environment, Consumer Waste Reduction,
Consumer Behavior and Taxes;Hunger, US, Poverty;FERTILITY,
Women;Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development, Birth Defects; Toxic Chemicals;Antibiotics,
Animals;Agricultural Subsidies, Global Trade;Agricultural
Subsidies;Biodiversity;Citizen Activism;Community
Gardens;Cooperatives;Dieting;Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
Toxic Chemicals;Antibiotics, Animals;Agricultural Subsidies, Global
Trade;Agricultural Subsidies;Biodiversity;Citizen Activism;Community
Gardens;Cooperatives;Dieting;Agriculture, Cotton;Agriculture, Global
Trade;Pesticides, Monsanto;Agriculture, Seed;Coffee, Hunger;Pollution,
Water, Feedlots;Food Prices;Agriculture, Workers;Animal Feed, Corn,
Pesticides;Aquaculture;Chemical
Warfare;Compost;Debt;Consumerism;Fear;Pesticides, US, Childhood
Development, Birth Defects;Corporate Reform, Personhood (Dem.
Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
Legislation, Education;Debt, Credit Cards;'

Nov 28 '06 #1
What exactly seems to be the problem?
"ronrsr" <ro****@gmail.com> wrote in message
news:11*********************@14g2000cws.googlegroups.com...
> I have a single long string - I'd like to split it into a list of
> unique keywords. Sadly, the database wasn't designed to do this, so I
> must do this in Python - I'm having some trouble using the .split()
> function, it doesn't seem to do what I want it to - any ideas?
>
> thanks very much for your help.
>
> r-sr-

Nov 28 '06 #2
ronrsr wrote:
> I have a single long string - I'd like to split it into a list of
> unique keywords. Sadly, the database wasn't designed to do this, so I
> must do this in Python - I'm having some trouble using the .split()
> function, it doesn't seem to do what I want it to - any ideas?

Did you follow the recommendations given to you the last time you asked this
question? What did you try? What results do you want to get?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Nov 28 '06 #3
still having a heckuva time with this.

here's where it stands - the split function doesn't seem to work the way
I expect it to.
longkw1,type(longkw): Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
<type 'list'> 1

longkw.replace(',',';')

Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development
kw = longkw.split("; ,") #kw is now a list of len 1

kw,typekw= ['Agricultural subsidies; Foreign aid;Agriculture;
Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
Childhood Development, Birth Defects; Toxic Chemicals;Antibiotics,
Animals;Agricultural Subsidies
what I would like is to break the string into a list of the delimited
words, but have had no luck doing that - I thought split would do that,
but it doesn't.

bests,

-rsr-

Robert Kern wrote:
> ronrsr wrote:
>> I have a single long string - I'd like to split it into a list of
>> unique keywords. Sadly, the database wasn't designed to do this, so I
>> must do this in Python - I'm having some trouble using the .split()
>> function, it doesn't seem to do what I want it to - any ideas?
>
> Did you follow the recommendations given to you the last time you asked this
> question? What did you try? What results do you want to get?
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless enigma
> that is made terrible by our own mad attempt to interpret it as though it had
> an underlying truth."
> -- Umberto Eco
Nov 28 '06 #4
"ronrsr" <ro****@gmail.com> wrote:
> I have a single long string - I'd like to split it into a list of
> unique keywords. Sadly, the database wasn't designed to do this, so I
> must do this in Python - I'm having some trouble using the .split()
> function, it doesn't seem to do what I want it to - any ideas?
>
> thanks very much for your help.
>
> r-sr-
> longstring = 'Agricultural subsidies; Foreign aidAgriculture;
> Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
> Childhood Development, Birth Defects; Toxic ChemicalsAntibiotics,
> AnimalsAgricultural Subsidies, Global TradeAgricultural
> SubsidiesBiodiversityCitizen ActivismCommunity...

What do you want out of this? It looks like there are several levels
crammed together here. At first blush, it looks like topics separated by
"; ", so this should get you started:

topics = longstring.split("; ")
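
As a rough illustration of that starting point, here is a minimal sketch on a short made-up sample (the `sample` string is an assumption, cut down from the posted data, and the code is written for Python 3). It shows that `"; "` is matched as one two-character delimiter, so a bare `;` with no following space is not split:

```python
# Hypothetical short sample in the same shape as the posted data.
sample = "Agricultural subsidies; Foreign aid;Agriculture; Pesticides, US"

# split() treats its whole argument as a single delimiter: semicolon + space.
topics = sample.split("; ")

for topic in topics:
    print(repr(topic))
```

The middle piece still contains `;` and the last one still contains `,`, which is why a single `split()` call is only a first step here.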
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Nov 28 '06 #5

ronrsr wrote:
> I have a single long string - I'd like to split it into a list of
> unique keywords. Sadly, the database wasn't designed to do this, so I
> must do this in Python - I'm having some trouble using the .split()
> function, it doesn't seem to do what I want it to - any ideas?
>
> thanks very much for your help.
>
> r-sr-
> longstring = 'Agricultural subsidies; Foreign aidAgriculture;
> Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
> [snip most of VERY long string]
> Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
> Legislation, Education;Debt, Credit Cards;'

Hi ronster,

As far as I recall, without digging in the archives:

We would probably agree (if shown the schema) that the database wasn't
designed. However it seems to have changed. Last time you asked, it
was at least queryable and producing rows, each containing one column
(a string of structure unknown to us and not divulged by you). You were
given extensive advice: how to use split(), plus some questions to
answer about the data e.g. the significance (if any) of semicolon
versus comma. You were also asked about the SQL that was used. You were
asked to explain what you meant by "keywords". All of those questions
were asked so that we could understand your problem, and help you.
Since then, nothing.

Now you have what appears to be something like your previous results
stripped of newlines and smashed together (are the newlines of no
significance at all?), and you appear to be presenting it as a new
problem.

What's going on?

Regards,
John

Nov 28 '06 #6
ronrsr wrote:
> still having a heckuva time with this.

You don't seem to get it.

> here's where it stands - the split function doesn't seem to work the way
> I expect it to.
> longkw1,type(longkw): Agricultural subsidies; Foreign
> aid;Agriculture; Sustainable Agriculture - Support; Organic
> Agriculture; Pesticides, US, Childhood Development, Birth Defects;
> <type 'list'> 1
>
> longkw.replace(',',';')

>>> sample = "eat, drink; man, woman"
>>> sample.replace(";", ",")
'eat, drink, man, woman'
>>> sample
'eat, drink; man, woman'

Aha, Python doesn't replace in place, it creates a new string instead.

> Agricultural subsidies; Foreign aid;Agriculture; Sustainable
> Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
> Development
> kw = longkw.split("; ,") #kw is now a list of len 1

>>> sample = "eat+-drink+man-woman"
>>> sample.split("+-")
['eat', 'drink+man-woman']
>>> sample.split("+")
['eat', '-drink', 'man-woman']

Aha, Python interprets the complete split() argument as the delimiter, not
each of its characters.

Do you think you can combine these two findings to make your code work? You
will have to replace() first and then split().
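
One way the two findings might be combined, as a minimal sketch on the same kind of sample string (Python 3; the variable names `normalized` and `parts` are only illustrative):

```python
sample = "eat, drink; man, woman"

# replace() returns a new string, so the result has to be assigned to be kept.
normalized = sample.replace(",", ";")

# Only one kind of delimiter is left, so a single split() is enough.
# strip() removes the spaces that followed each original delimiter.
parts = [piece.strip() for piece in normalized.split(";")]

print(parts)
print(sample)  # the original string is unchanged
```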

Peter
Nov 28 '06 #7
ronrsr wrote:
> still having a heckuva time with this.
>
> here's where it stands - the split function doesn't seem to work the way
> I expect it to.
> longkw1,type(longkw): Agricultural subsidies; Foreign
> aid;Agriculture; Sustainable Agriculture - Support; Organic
> Agriculture; Pesticides, US, Childhood Development, Birth Defects;
> <type 'list'> 1
>
> longkw.replace(',',';')
>
> Agricultural subsidies; Foreign aid;Agriculture; Sustainable
> Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
> Development

Here you have discovered that string.replace() returns a string and does
NOT modify the original string. Try this for clarification:

>>> a = "DAWWIJFWA,,,,,,dwadw;djwkajdw"
>>> a
'DAWWIJFWA,,,,,,dwadw;djwkajdw'
>>> a.replace(",",";")
'DAWWIJFWA;;;;;;dwadw;djwkajdw'
>>> a
'DAWWIJFWA,,,,,,dwadw;djwkajdw'
>>> b = a.replace(',',';')
>>> b
'DAWWIJFWA;;;;;;dwadw;djwkajdw'

> kw = longkw.split("; ,") #kw is now a list of len 1

Yes, because it is trying to split longkw wherever it finds the whole
string "; ," and NOT wherever it finds ";" or " " or ",". This has been
stated before by NickV, Duncan Booth, Fredrik Lundh and Paul McGuire
amongst others. You will need to do either:

a.)

# First split on every semicolon
a = longkw.split(";")
b = []
# Then split those results on whitespace
# (the default action for string.split())
for item in a:
    b.extend(item.split())
# Then split on commas
kw = []
for item in b:
    kw.extend(item.split(","))

or b.)

# First replace commas with spaces
longkw = longkw.replace(",", " ")
# Then replace semicolons with spaces
longkw = longkw.replace(";", " ")
# Then split on white space, (default args)
kw = longkw.split()

Note that we did:
    longkw = longkw.replace(",", " ")
and not just:
    longkw.replace(",", " ")
You will find that method A may give empty strings as some elements of
kw. If so, use method b.
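
Since the original question asked for unique keywords, here is a hedged sketch of one way to get there in Python 3. It assumes (this is not confirmed anywhere in the thread) that multi-word phrases such as "Birth Defects" should stay together, so it splits only on `;` and `,` rather than on whitespace; `unique_keywords` is a hypothetical helper name.

```python
def unique_keywords(text):
    """Split text on ';' and ',' and return distinct keywords in first-seen order."""
    # Turn both delimiters into one, then split once.
    normalized = text.replace(",", ";")
    pieces = (piece.strip() for piece in normalized.split(";"))
    # Drop empty strings, then drop repeats; dict.fromkeys keeps insertion
    # order on Python 3.7+.
    return list(dict.fromkeys(piece for piece in pieces if piece))


# Made-up sample with a repeated keyword and a trailing delimiter.
longkw = "Pesticides, US, Birth Defects; Toxic Chemicals;Pesticides, Water;"
print(unique_keywords(longkw))
```

Comparison here is case-sensitive, so "antibiotics" and "Antibiotics" would count as different keywords unless the pieces are normalized first.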
Finally, if you have further problems, please please do the following:

1.) Provide your input data clearly, exactly as you have it.
2.) Show exactly what you want the output to be, including any special
cases.
3.) If something doesn't work the way you expect it to, tell us how you
expect it to work so we know what you mean by "doesn't work how I expect
it to"
4.) Read all the replies carefully and if you don't understand the
reply, ask for clarification.
5.) Read the help functions carefully - what the input parameters have
to be and what the return value will be, and whether or not it changes
the parameters or original object. Strings in Python are NOT mutable, so
any functions that operate on strings return the result as a new
string and leave the original string intact.

I really hope this helps,

Cameron.
Nov 28 '06 #8
ronrsr wrote:
> still having a heckuva time with this.
>
> here's where it stands - the split function doesn't seem to work the way
> I expect it to.
> longkw1,type(longkw): Agricultural subsidies; Foreign
> aid;Agriculture; Sustainable Agriculture - Support; Organic
> Agriculture; Pesticides, US, Childhood Development, Birth Defects;
> <type 'list'> 1
>
> longkw.replace(',',';')
>
> Agricultural subsidies; Foreign aid;Agriculture; Sustainable
> Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
> Development
> kw = longkw.split("; ,") #kw is now a list of len 1
>
> kw,typekw= ['Agricultural subsidies; Foreign aid;Agriculture;
> Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
> Childhood Development, Birth Defects; Toxic Chemicals;Antibiotics,
> Animals;Agricultural Subsidies
> what I would like is to break the string into a list of the delimited
> words, but have had no luck doing that - I thought split would do that,
> but it doesn't.
>
> bests,
>
> -rsr-

>>> import SE # http://cheeseshop.python.org/pypi/SE/2.3
>>> Split_Marker = SE.SE (' ,=| ;=| ') # Translates both ',' and ';' into an arbitrary split mark ('|')
>>> for item in Split_Marker (longstring).split ('|'): print item
Agricultural subsidies
 Foreign aidAgriculture
 Sustainable Agriculture - Support
 Organic Agriculture

.... etc.

To get rid of the leading space on some lines simply add
corresponding replacements. SE does any number of substitutions in one
pass. Defining them is a simple matter of writing them up in one single
string from which the translator object is made:

>>> Split_Marker = SE.SE (' ,=| ;=| ", =|" "; =|" ')
>>> for item in Split_Marker (longstring).split ('|'): print item
Agricultural subsidies
Foreign aidAgriculture
Sustainable Agriculture - Support
Organic Agriculture
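
For readers without the SE package, a similar one-pass split can be sketched with the standard library's `re` module in Python 3. This is a swapped-in technique and not the SE approach shown above; `split_keywords` is a hypothetical helper name and the sample string is made up.

```python
import re


def split_keywords(text):
    """Split on ',' or ';' in one pass, swallowing whitespace around each delimiter."""
    pieces = re.split(r"\s*[;,]\s*", text.strip())
    # A trailing delimiter leaves an empty string at the end; filter it out.
    return [piece for piece in pieces if piece]


sample = "Agricultural subsidies; Foreign aid;Agriculture ; Cotton, Africa;"
for item in split_keywords(sample):
    print(item)
```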
Regards

Frederic
Nov 29 '06 #9
