I have a single long string - I'd like to split it into a list of
unique keywords. Sadly, the database wasn't designed to do this, so I
must do this in Python - I'm having some trouble using the .split()
function, it doesn't seem to do what I want it to - any ideas?
thanks very much for your help.
r-sr-
longstring = 'Agricultural subsidies; Foreign aidAgriculture;
Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
Childhood Development, Birth Defects; Toxic ChemicalsAntibiotics,
AnimalsAgricultural Subsidies, Global TradeAgricultural
SubsidiesBiodiversityCitizen ActivismCommunity
GardensCooperativesDietingAgriculture, CottonAgriculture, Global
TradePesticides, MonsantoAgriculture, SeedCoffee, HungerPollution,
Water, FeedlotsFood PricesAgriculture, WorkersAnimal Feed, Corn,
PesticidesAquacultureChemical
WarfareCompostDebtConsumerismFearPesticides, US, Childhood Development,
Birth DefectsCorporate Reform, Personhood (Dem. Book)Corporate Reform,
Personhood, Farming (Dem. Book)Crime Rates, Legislation,
EducationDebt, Credit CardsDemocracyPopulation, WorldIncomeDemocracy,
Corporate Personhood, Porter Township (Dem. Book)Disaster
ReliefDwellings, SlumsEconomics, MexicoEconomy, LocalEducation,
ProtestsEndangered Habitat, RainforestEndangered SpeciesEndangered
Species, Extinctionantibiotics, livestockAgricultural subsidies;
Foreign aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
Toxic Chemicals;Antibiotics, Animals;Agricultural Subsidies, Global
Trade;Agricultural Subsidies;Biodiversity;Citizen Activism;Community
Gardens;Cooperatives;Dieting;Agriculture, Cotton;Agriculture, Global
Trade;Pesticides, Monsanto;Agriculture, Seed;Coffee, Hunger;Pollution,
Water, Feedlots;Food Prices;Agriculture, Workers;Animal Feed, Corn,
Pesticides;Aquaculture;Chemical
Warfare;Compost;Debt;Consumerism;Fear;Pesticides, US, Childhood
Development, Birth Defects;Corporate Reform, Personhood (Dem.
Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
Legislation, Education;Debt, Credit Cards;Democracy;Population,
World;Income;Democracy, Corporate Personhood, Porter Township (Dem.
Book);Disaster Relief;Dwellings, Slums;Economics, Mexico;Economy,
Local;Education, Protests;Endangered Habitat, Rainforest;Endangered
Species;Endangered Species, Extinction;antibiotics,
livestock;Pesticides, Water;Environment, Environmentalist;Food, Hunger,
Agriculture, Aid, World, Development;Agriculture, Cotton
Trade;Agriculture, Cotton, Africa;Environment, Energy;Fair Trade (Dem.
Book);Farmland, Sprawl;Fast Food, Globalization, Mapping;depression,
mental illness, mood disorders;Economic Democracy, Corporate
Personhood;Brazil, citizen activism, hope, inspiration, labor
issues;citizen activism, advice, hope;Pharmaceuticals, Medicine,
Drugs;Community Investing;Environment, Consumer Waste Reduction,
Consumer Behavior and Taxes;Hunger, US, Poverty;FERTILITY,
Women;Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development, Birth Defects; Toxic Chemicals;Antibiotics,
Animals;Agricultural Subsidies, Global Trade;Agricultural
Subsidies;Biodiversity;Citizen Activism;Community
Gardens;Cooperatives;Dieting;Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
Toxic Chemicals;Antibiotics, Animals;Agricultural Subsidies, Global
Trade;Agricultural Subsidies;Biodiversity;Citizen Activism;Community
Gardens;Cooperatives;Dieting;Agriculture, Cotton;Agriculture, Global
Trade;Pesticides, Monsanto;Agriculture, Seed;Coffee, Hunger;Pollution,
Water, Feedlots;Food Prices;Agriculture, Workers;Animal Feed, Corn,
Pesticides;Aquaculture;Chemical
Warfare;Compost;Debt;Consumerism;Fear;Pesticides, US, Childhood
Development, Birth Defects;Corporate Reform, Personhood (Dem.
Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
Legislation, Education;Debt, Credit Cards;'
What exactly seems to be the problem?
ronrsr wrote:
I have a single long string - I'd like to split it into a list of
unique keywords. Sadly, the database wasn't designed to do this, so I
must do this in Python - I'm having some trouble using the .split()
function, it doesn't seem to do what I want it to - any ideas?
Did you follow the recommendations given to you the last time you asked this
question? What did you try? What results do you want to get?
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
still having a heckuva time with this.
here's where it stands - the split function doesn't seem to work the way
I expect it to.
longkw1,type(longkw): Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
<type 'list'1
longkw.replace(',',';')
Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development
kw = longkw.split("; ,") #kw is now a list of len 1
kw,typekw= ['Agricultural subsidies; Foreign aid;Agriculture;
Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
Childhood Development, Birth Defects; Toxic Chemicals;Antibiotics,
Animals;Agricultural Subsidies
what I would like is to break the string into a list of the delimited
words, but have had no luck doing that - I thought split would do that,
but it doesn't.
bests,
-rsr-
"ronrsr" <ro****@gmail.com> wrote:
>I have a single long string - I'd like to split it into a list of unique keywords. Sadly, the database wasn't designed to do this, so I must do this in Python - I'm having some trouble using the .split() function, it doesn't seem to do what I want it to - any ideas?
thanks very much for your help.
r-sr-
longstring = 'Agricultural subsidies; Foreign aidAgriculture; Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood Development, Birth Defects; Toxic ChemicalsAntibiotics, AnimalsAgricultural Subsidies, Global TradeAgricultural SubsidiesBiodiversityCitizen ActivismCommunity...
What do you want out of this? It looks like there are several levels
crammed together here. At first blush, it looks like topics separated by
"; ", so this should get you started:
topics = longstring.split("; ")
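A minimal sketch of that starting point, assuming a shortened stand-in for the long string. The names `split_topics` and `sample` are illustrative and not from the thread. It splits on ";" alone and strips each piece, so entries separated by ";" with no following space are handled too:

```python
def split_topics(text):
    """Split on semicolons, trim whitespace, and drop empty pieces."""
    pieces = (piece.strip() for piece in text.split(";"))
    return [piece for piece in pieces if piece]

# Shortened stand-in for the poster's long string (assumed sample data).
sample = "Agricultural subsidies; Foreign aid;Agriculture; Pesticides, US;"
topics = split_topics(sample)
print(topics)
```

Each topic may still contain comma-separated keywords; whether those should be split further depends on what the poster means by "keywords".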
--
Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
ronrsr wrote:
I have a single long string - I'd like to split it into a list of
unique keywords. Sadly, the database wasn't designed to do this, so I
must do this in Python - I'm having some trouble using the .split()
function, it doesn't seem to do what I want it to - any ideas?
thanks very much for your help.
r-sr-
longstring = 'Agricultural subsidies; Foreign aidAgriculture;
Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
[snip most of VERY long string]
Book);Corporate Reform, Personhood, Farming (Dem. Book);Crime Rates,
Legislation, Education;Debt, Credit Cards;'
Hi ronster,
As far as I recall, without digging in the archives:
We would probably agree (if shown the schema) that the database wasn't
designed. However it seems to have changed. Last time you asked, it
was at least queryable and producing rows, each containing one column
(a string of structure unknown to us and not divulged by you). You were
given extensive advice: how to use split(), plus some questions to
answer about the data e.g. the significance (if any) of semicolon
versus comma. You were also asked about the SQL that was used. You were
asked to explain what you meant by "keywords". All of those questions
were asked so that we could understand your problem, and help you.
Since then, nothing.
Now you have what appears to be something like your previous results
stripped of newlines and smashed together (are the newlines of no
significance at all?), and you appear to be presenting it as a new
problem.
What's going on?
Regards,
John
ronrsr wrote:
still having a heckuva time with this.
You don't seem to get it.
here's where it stands - the split function doesn't seem to work the way
I expect it to.
longkw1,type(longkw): Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
<type 'list'1
longkw.replace(',',';')
>>> sample = "eat, drink; man, woman"
>>> sample.replace(";", ",")
'eat, drink, man, woman'
>>> sample
'eat, drink; man, woman'
Aha, Python doesn't replace in place, it creates a new string instead.
Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development
kw = longkw.split("; ,") #kw is now a list of len 1
>>> sample = "eat+-drink+man-woman"
>>> sample.split("+-")
['eat', 'drink+man-woman']
>>> sample.split("+")
['eat', '-drink', 'man-woman']
Aha, Python interprets the complete split() argument as the delimiter, not
each of its characters.
Do you think you can combine these two findings to make your code work? You
will have to replace() first and then split().
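One way the two findings might be combined, as a sketch only. The function name `split_keywords` is made up for illustration, and treating commas and semicolons as equivalent delimiters is an assumption about what the poster wants:

```python
def split_keywords(text):
    """Normalize commas to semicolons, then split on the single delimiter."""
    # str.replace returns a new string, so the result must be kept.
    normalized = text.replace(",", ";")
    # split(";") uses ";" as the whole delimiter; strip() trims stray spaces.
    return [part.strip() for part in normalized.split(";") if part.strip()]

sample = "eat, drink; man, woman"
print(split_keywords(sample))  # ['eat', 'drink', 'man', 'woman']
```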
Peter
ronrsr wrote:
still having a heckuva time with this.
here's where it stands - the split function doesn't seem to work the way
I expect it to.
longkw1,type(longkw): Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
<type 'list'1
longkw.replace(',',';')
Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development
Here you have discovered that string.replace() returns a string and does
NOT modify the original string. Try this for clarification:
>>> a = "DAWWIJFWA,,,,,,dwadw;djwkajdw"
>>> a
'DAWWIJFWA,,,,,,dwadw;djwkajdw'
>>> a.replace(",", ";")
'DAWWIJFWA;;;;;;dwadw;djwkajdw'
>>> a
'DAWWIJFWA,,,,,,dwadw;djwkajdw'
>>> b = a.replace(',', ';')
>>> b
'DAWWIJFWA;;;;;;dwadw;djwkajdw'
kw = longkw.split("; ,") #kw is now a list of len 1
Yes, because it is trying to split longkw wherever it finds the whole
string "; ," and NOT wherever it finds ";" or " " or ",". This has been
stated before by NickV, Duncan Booth, Fredrik Lundh and Paul McGuire
amongst others. You will need to do either:
a.)
# First split on every semicolon
a = longkw.split(";")
b = []
# Then split those results on whitespace
# (the default action for string.split()).
# extend() keeps b a flat list of strings rather than a list of lists.
for item in a:
    b.extend(item.split())
# Then split on commas
kw = []
for item in b:
    kw.extend(item.split(","))
or b.)
# First replace commas with spaces
longkw = longkw.replace(",", " ")
# Then replace semicolons with spaces
longkw = longkw.replace(";", " ")
# Then split on white space, (default args)
kw = longkw.split()
Note that we did:
longkw = longkw.replace(",", " ")
and not just:
longkw.replace(",", " ")
You will find that method A may give empty strings as some elements of
kw. If so, use method b.
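Both methods above also break multi-word entries such as "Foreign aid" into separate words. If whole phrases should survive, and the list should be unique as the original question asks, a sketch along the following lines may be closer. It uses `re.split` from the standard library. The helper name `unique_keywords`, the sample string, and the choice to compare keywords case-insensitively are all assumptions, since the desired output was never specified:

```python
import re

def unique_keywords(text):
    """Split on ';' or ',', trim each piece, and keep the first spelling seen."""
    seen = set()
    result = []
    for piece in re.split(r"[;,]", text):
        keyword = piece.strip()
        key = keyword.lower()  # assumption: case differences do not matter
        if keyword and key not in seen:
            seen.add(key)
            result.append(keyword)
    return result

# Shortened stand-in for the poster's data (assumed sample).
sample = ("Agricultural subsidies; Foreign aid;Pesticides, US, Birth Defects;"
          "antibiotics, livestock;Antibiotics, Animals;")
print(unique_keywords(sample))
```

The order of first appearance is preserved; wrapping the result in `sorted()` would give an alphabetical list instead.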
Finally, if you have further problems, please please do the following:
1.) Provide your input data clearly, exactly as you have it.
2.) Show exactly what you want the output to be, including any special
cases.
3.) If something doesn't work the way you expect it to, tell us how you
expect it to work so we know what you mean by "doesn't work how I expect
it to"
4.) Read all the replies carefully and if you don't understand the
reply, ask for clarification.
5.) Read the help functions carefully - what the input parameters have
to be and what the return value will be, and whether or not it changes
the parameters or original object. Python strings are immutable, so
string methods return the result as a new string and leave the original
string intact.
I really hope this helps,
Cameron.
ronrsr wrote:
still having a heckuva time with this.
here's where it stands - the split function doesn't seem to work the way
I expect it to.
longkw1,type(longkw): Agricultural subsidies; Foreign
aid;Agriculture; Sustainable Agriculture - Support; Organic
Agriculture; Pesticides, US, Childhood Development, Birth Defects;
<type 'list'1
longkw.replace(',',';')
Agricultural subsidies; Foreign aid;Agriculture; Sustainable
Agriculture - Support; Organic Agriculture; Pesticides, US, Childhood
Development
kw = longkw.split("; ,") #kw is now a list of len 1
kw,typekw= ['Agricultural subsidies; Foreign aid;Agriculture;
Sustainable Agriculture - Support; Organic Agriculture; Pesticides, US,
Childhood Development, Birth Defects; Toxic Chemicals;Antibiotics,
Animals;Agricultural Subsidies
what I would like is to break the string into a list of the delimited
words, but have had no luck doing that - I thought split would do that,
but it doesn't.
bests,
-rsr-
>>> import SE # http://cheeseshop.python.org/pypi/SE/2.3
>>> # Translates both ',' and ';' into an arbitrary split mark ('|')
>>> Split_Marker = SE.SE (' ,=| ;=| ')
>>> for item in Split_Marker (longstring).split ('|'): print item
Agricultural subsidies
Foreign aidAgriculture
Sustainable Agriculture - Support
Organic Agriculture
.... etc.
To get rid of the leading space on some lines simply add
corresponding replacements. SE does any number of substitutions in one
pass. Defining them is a simple matter of writing them up in one single
string from which the translator object is made:
>>> Split_Marker = SE.SE (' ,=| ;=| ", =|" "; =|" ')
>>> for item in Split_Marker (longstring).split ('|'): print item
Agricultural subsidies
Foreign aidAgriculture
Sustainable Agriculture - Support
Organic Agriculture
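For readers without the SE package, a comparable one-pass substitution can be sketched with the standard library's `str.translate` (Python 3). This is a stand-in for the idea, not SE's own API; `TO_SPLIT_MARK`, `split_on_marks`, and the sample string are illustrative names and data:

```python
# Map both ',' and ';' to the arbitrary split mark '|' in a single pass.
TO_SPLIT_MARK = str.maketrans({",": "|", ";": "|"})

def split_on_marks(text):
    """Translate delimiters to '|', split, and trim surrounding spaces."""
    marked = text.translate(TO_SPLIT_MARK)
    return [item.strip() for item in marked.split("|") if item.strip()]

# Shortened stand-in for longstring (assumed sample data).
sample = "Agricultural subsidies; Foreign aid;Organic Agriculture; Pesticides, US"
for item in split_on_marks(sample):
    print(item)
```

Stripping each item after the split removes the leading spaces without needing extra substitution rules.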
Regards
Frederic