splitting a string into 2 new strings

Hi,
I have a string, e.g. 'C6 H12 O6', that I wish to split up to give 2 strings,
'C H O' and '6 12 6'. I have played with string.split() and the re module,
but can't quite get there.

Any help would be greatly appreciated.

Thanks,

Mark.


Jul 18 '05 #1
trp
Mark Light wrote:
> Hi,
> I have a string e.g. 'C6 H12 O6' that I wish to split up to give 2 strings
> 'C H O' and '6 12 6'. I have played with string.split() and the re module
> - but can't quite get there.
>
> Any help would be greatly appreciated.
>
> Thanks,
>
> Mark.

I'm assuming that these are chemical compounds, so you're not limited to
one-character symbols.

Here's how I'd do it:

import re

# raw string so that \d reaches the regex engine unchanged
re_pat = re.compile(r'([A-Z]+)(\d+)')
text = 'C6 H12 O6'

# find each component, returns list of tuples (e.g. [('C', '6'), ...])
component = re_pat.findall(text)

# split into separate tuples of symbols and counts
symbols, counts = zip(*component)

# create the strings
symbols = ' '.join(symbols)
counts = ' '.join(counts)

--Andy

Jul 18 '05 #2
That works great - many thanks.


Jul 18 '05 #3
P
Mark Light wrote:
> Hi,
> I have a string e.g. 'C6 H12 O6' that I wish to split up to give 2 strings
> 'C H O' and '6 12 6'. I have played with string.split() and the re module -
> but can't quite get there.
>
> Any help would be greatly appreciated.


import re

# shortest possible prefix followed by a run of digits, e.g. 'H12' -> ('H', '12')
molecule_re = re.compile("(.+?)([0-9]+)")

def processMolecule(molecule):
    elements = []
    numbers = []

    # handle each whitespace-separated item ('C6', 'H12', ...) on its own
    for item in molecule.split():
        element, number = molecule_re.findall(item)[0]
        elements.append(element)
        numbers.append(number)

    elements = ' '.join(elements)
    numbers = ' '.join(numbers)

    return (elements, numbers)

print(processMolecule('C6 H12 O6'))

Jul 18 '05 #4
trp wrote:
> I'm assuming that these are chemical compounds, so you're not limited to
> one-character symbols.

The problem is underspecified. Usually 2-character symbols (or 3-character
ones for some elements with high atomic number, if you're not assuming the
newer IUPAC names like "Dubnium", which was also called Unnilpentium (Unp) or,
depending on your political persuasion, Joliotium (Jl) or Hahnium (Ha)) have
the first letter capitalized and the rest in lower case.

> re_pat = re.compile('([A-Z]+)(\d+)')

So this should be written ([A-Z][A-Za-z]*)(\d+), where I explicitly allow
both lower and upper case trailing letters to be more accepting. (In some
systems, "CU" is "1 carbon + 1 uranium" and in others it's an alternate way
to write "1 copper", though I suspect it's not allowed in the OP's problem.)

Andrew
da***@dalkescientific.com
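
To make the difference between the two patterns concrete, here is a rough sketch that runs both over the same input. The helper name split_formula and the sample string 'Na2 S1 O4' are made up for illustration and do not come from the thread.

```python
import re

# Pattern from earlier in the thread: upper-case letters only.
upper_only = re.compile(r'([A-Z]+)(\d+)')
# Suggested pattern: one capital letter followed by any letters.
mixed_case = re.compile(r'([A-Z][A-Za-z]*)(\d+)')


def split_formula(pattern, text):
    """Return (symbols, counts) as two space-separated strings.

    Hypothetical helper, written only to compare the two patterns.
    """
    pairs = pattern.findall(text)
    if not pairs:
        return '', ''
    symbols, counts = zip(*pairs)
    return ' '.join(symbols), ' '.join(counts)


print(split_formula(mixed_case, 'C6 H12 O6'))   # ('C H O', '6 12 6')
print(split_formula(mixed_case, 'Na2 S1 O4'))   # ('Na S O', '2 1 4')
print(split_formula(upper_only, 'Na2 S1 O4'))   # ('S O', '1 4')
```

The upper-case-only pattern silently drops 'Na2', because 'N' is not directly followed by a digit. Both patterns still require an explicit count, so a bare symbol such as 'Cl' would be skipped by either one.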
Jul 18 '05 #5
Anton Vredegoor wrote:
> The issue seems to be resolved already, but I haven't seen the split
> and strip combination:
>
> from string import letters,digits

Use "ascii_letters" instead of "letters". The latter is based on the locale,
so it might not work on some machines where "C" (or rather, byte 67) isn't
a letter in the local alphabet.

Andrew
da***@dalkescientific.com
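
Anton's code is not quoted here beyond the import line, so the following is only a guess at what a split and strip combination might look like, using ascii_letters as suggested. The function name split_and_strip is hypothetical. Note also that string.letters no longer exists in Python 3, while string.ascii_letters is available in both Python 2 and 3.

```python
from string import ascii_letters, digits


def split_and_strip(text):
    """Return (symbols, counts) using only str.split and str.strip.

    Hypothetical helper; assumes each token is letters followed by digits.
    """
    tokens = text.split()
    # Stripping digits from both ends of a token leaves the symbol.
    symbols = ' '.join(token.strip(digits) for token in tokens)
    # Stripping ASCII letters from both ends leaves the count.
    counts = ' '.join(token.strip(ascii_letters) for token in tokens)
    return symbols, counts


print(split_and_strip('C6 H12 O6'))   # ('C H O', '6 12 6')
```

This avoids the re module entirely, at the cost of not validating the input: a token such as '6C' would be accepted just as readily as 'C6'.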
Jul 18 '05 #6
