Hi list,
I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
class reportHandler(ContentHandler):
def __init__(self):
self.isReport = 0
def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''
def characters(self, ch):
if self.isReport:
self.reportText += ch
def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText
parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/') 6 1393
IamIan wrote:
I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!
Sounds like a problem with the data to me rather than SAX.
However, SAX tends to make things much more complex than necessary, so you
loose the sight on the real problems. Try a library like ElementTree or lxml
to make your life easier. You might especially like lxml.objectify. http://effbot.org/zone/element.htm http://effbot.org/zone/element-iterparse.htm http://codespeak.net/lxml/dev/ http://codespeak.net/lxml/dev/objectify.html
Stefan
Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?
On 6/8/07, IamIan <ia****@gmail.comwrote:
Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?
I don't think you've showed us any examples of the code you're having
trouble with. I don't see anything in your original post that tries
to split strings. If you just want to know how split works, here's an
example:
>>t = 'Title1:Description' key, value = t.split(':') print key
Title1
>>print value
Description
>>>
If that doesn't help, show us a sample of some of the data you're
working with, what you've
tried so far, and what the end result is supposed to look like.
--
Jerry
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.
The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
tracker = [] # Option 1
tracker = {} # Option 2
class reportHandler(ContentHandler):
def __init__(self):
self.isReport = 0
def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''
def characters(self, ch):
if self.isReport:
self.reportText += ch
tracker.append(ch) # Option 1
key, value = ch.split (':') # Option 2
tracker[key] = value
def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText
parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')
print tracker
Option 1 returns a list with the markup included, looking like:
[u'Title1:", u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]
Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
key, value = ch.split(':')
ValueError: need more than 1 value to unpack
Thank you for the help!
En Tue, 12 Jun 2007 16:16:45 -0300, IamIan <ia****@gmail.comescribió:
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.
The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):
Forget about SAX. Use ElementTree instead
pyimport xml.etree.cElementTree as ET
pyf = open("x.xml","r")
pytree = ET.parse(f)
pyfor item in tree.getiterator('item'):
.... print item.findtext('title')
....
Title1:Description
Title2:Description
ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>
--
Gabriel Genellina
Gabriel Genellina wrote:
Forget about SAX. Use ElementTree instead
ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>
That's what I told him/her already :)
Rephrasing a famous word:
Being faced with an XML problem, you might think "Ok, I'll just use SAX". And
now you have two problems.
SAX is a great way to hide your real problems behind a wall of unreadable
code. If you want my opinion, lxml is currently the straightest way to get XML
work done in Python.
Stefan This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: qwweeeit |
last post by:
Hi all,
I am writing a script to visualize (and print)
the web references hidden in the html files as:
'<a href="web reference"> underlined reference</a>'
Optimizing my code, I found that an...
|
by: Earl Anderson |
last post by:
I have a A97/XP applet I've developed for my own use in my department. My
boss "suggests" that since I built it, I share it with and instruct the
other 6 members of my department on its use. I've...
|
by: fatted |
last post by:
I'm trying to write a function which splits a string (possibly multiple
times) on a particular character and returns the strings which has been
split. What I have below is kind of (oh dear!)...
|
by: ronrsr |
last post by:
I'm trying to break up the result tuple into keyword phrases. The
keyword phrases are separated by a ; -- the split function is not
working the way I believe it should be. Can anyone see what I"m...
|
by: =?Utf-8?B?R3VoYW5hdGg=?= |
last post by:
Hi,
Please suggest a solution for the following problem.
We get a input xml that need to be processed for grouping and sorting.
The example of one such xml
<Requests>
<Request>
......
|
by: ryjfgjl |
last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
|
by: taylorcarr |
last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: ryjfgjl |
last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
|
by: ryjfgjl |
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
| |