473,396 Members | 2,036 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

identifying and parsing string in text file

I have a large file that has many lines like this,

<element tag="300a,0014" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITE</element>

I would like to identify the line by the tag (300a,0014) and then grab
the name (DoseReferenceStructureType) and value (SITE).

I would like to create a file that would have the structure,

DoseReferenceStructureType = Site
...
...

Also, there is a possibility that there are multiple lines with the
same tag, but different values. These all need to be recorded.

So far, I have a little bit of code to look at everything that is
available,

for line in open(str(sys.argv[1])):
i_line = line.split()
if i_line:
if i_line[0] == "<element":
a = i_line[1]
b = i_line[5]
print "%s | %s" %(a, b)

but do not see a clever way of doing what I would like.

Any help or guidance would be appreciated.

Bryan
Mar 8 '08 #1
2 1017
Br***********@gmail.com wrote:
I have a large file that has many lines like this,

<element tag="300a,0014" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITE</element>

I would like to identify the line by the tag (300a,0014) and then grab
the name (DoseReferenceStructureType) and value (SITE).

I would like to create a file that would have the structure,

DoseReferenceStructureType = Site
...
...
You should try with Regular Expressions or if it is something like xml there
is for sure a library you can you to parse it ...
anyway you can try something simpler like this:

elem_dic=dict()
for line in open(str(sys.argv[1])):
line_splitted=line.split()
for item in line_splitted:
item_splitted=item.split("=")
if len(item_splitted)>1:
elem_dic[item_splitted[0]]=item_splitted[1]

.... then you have to retrieve from the dict the items you need, for example,
with the line you posted you obtain these items splitted:

['<element']
['tag', '"300a,0014"']
['vr', '"CS"']
['vm', '"1"']
['len', '"4"']
['name', '"DoseReferenceStructureType">SITE</element>']

and elem_dic will contain the last five, with the keys
'tag','vr','vm','len','name' and teh values 300a,0014 etc etc
i.e. this:

{'vr': '"CS"', 'tag': '"300a,0014"', 'vm': '"1"', 'len': '"4"', 'name': '"DoseReferenceStructureType">SITE</element>'}


--
Age is not a particularly interesting subject. Anyone can get old. All
you have to do is live long enough.

Mar 8 '08 #2
On 8 mar, 20:49, "Bryan.Fodn...@gmail.com" <Bryan.Fodn...@gmail.com>
wrote:
I have a large file that has many lines like this,

<element tag="300a,0014" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITE</element>

I would like to identify the line by the tag (300a,0014) and then grab
the name (DoseReferenceStructureType) and value (SITE).
It's obviously an XML file, so use a XML parser - there are SAX and
DOM parsers in the stdlib, as well as the ElementTree module.

Mar 9 '08 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Michael Hogan | last post by:
I want to pars a playlist file for three different varibles, so I can save them as mp3 files. I am using: strTEMPURL = GetUrlSource(Text1.Text) to put the entire .pls file into a strTEMPURL...
7
by: Kylotan | last post by:
I have a text file where the fields are delimited in various different ways. For example, strings are terminated with a tilde, numbers are terminated with whitespace, and some identifiers are...
1
by: hokiegal99 | last post by:
This is not really a Python-centric question, however, I am using Python to solve this problem (as of now) so I thought it appropiate to pose the question here. I have some functions that search...
4
by: Gert Van den Eynde | last post by:
Hi all, Could you give me some pointers on how to parse a text input file in C++? Most will be config-file style input (keyword = data), but some maybe 'structures' like material{ name = n,...
4
by: ralphNOSPAM | last post by:
Is there a function or otherwise some way to pull out the target text within an XML tag? For example, in the XML tag below, I want to pull out 'CALIFORNIA'. ...
3
by: Eric Lilja | last post by:
Hello, I'm creating a small utility for an online game. It involves parsing a text file of "tradesskill recipes" and inserting these recipes in a gui tree widget (similar to gui file browsers if...
11
by: .Net Sports | last post by:
In VB.net, I'm trying to do a couple of things in a couple of different blocks of code. I need to take the first 25 characters of a text file, then append at the end some ellipses and a MORE link...
3
by: toton | last post by:
Hi, I have some ascii files, which are having some formatted text. I want to read some section only from the total file. For that what I am doing is indexing the sections (denoted by .START in...
3
by: Anup Daware | last post by:
Hi Group, I am facing a strange problem here: I am trying to read xml response from a servlet using XmlTextWriter. I am able to read the read half of the xml and suddenly an exception:...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.