Splitting SAX results

IamIan

Hi list,

I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class reportHandler(ContentHandler):
def __init__(self):
self.isReport = 0

def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''

def characters(self, ch):
if self.isReport:
self.reportText += ch

def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

Jun 7 '07 #1

Subscribe Post Reply

1393

Stefan Behnel

IamIan wrote:

I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

Sounds like a problem with the data to me rather than SAX.

However, SAX tends to make things much more complex than necessary, so you
loose the sight on the real problems. Try a library like ElementTree or lxml
to make your life easier. You might especially like lxml.objectify.

http://effbot.org/zone/element.htm
http://effbot.org/zone/element-iterparse.htm

http://codespeak.net/lxml/dev/
http://codespeak.net/lxml/dev/objectify.html

Stefan

Jun 7 '07 #2

IamIan

Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?

Jun 8 '07 #3

Jerry Hill

On 6/8/07, IamIan <ia****@gmail.comwrote:

Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?

I don't think you've showed us any examples of the code you're having
trouble with. I don't see anything in your original post that tries
to split strings. If you just want to know how split works, here's an
example:

>>t = 'Title1:Description'
key, value = t.split(':')
print key

Title1

>>print value

Description

>>>

If that doesn't help, show us a sample of some of the data you're
working with, what you've
tried so far, and what the end result is supposed to look like.

--
Jerry

Jun 8 '07 #4

IamIan

I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):

def __init__(self):
self.isReport = 0

def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''

def characters(self, ch):
if self.isReport:
self.reportText += ch
tracker.append(ch) # Option 1
key, value = ch.split (':') # Option 2
tracker[key] = value

def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

print tracker
Option 1 returns a list with the markup included, looking like:
[u'Title1:", u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]

Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
key, value = ch.split(':')
ValueError: need more than 1 value to unpack

Thank you for the help!

Jun 12 '07 #5

Gabriel Genellina

En Tue, 12 Jun 2007 16:16:45 -0300, IamIan <ia****@gmail.comescribió:

I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

Forget about SAX. Use ElementTree instead

pyimport xml.etree.cElementTree as ET
pyf = open("x.xml","r")
pytree = ET.parse(f)
pyfor item in tree.getiterator('item'):
.... print item.findtext('title')
....
Title1:Description
Title2:Description

ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>

--
Gabriel Genellina

Jun 13 '07 #6

Stefan Behnel

Gabriel Genellina wrote:

Forget about SAX. Use ElementTree instead
ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>

That's what I told him/her already :)

Rephrasing a famous word:

Being faced with an XML problem, you might think "Ok, I'll just use SAX". And
now you have two problems.

SAX is a great way to hide your real problems behind a wall of unreadable
code. If you want my opinion, lxml is currently the straightest way to get XML
work done in Python.

Stefan

Jun 20 '07 #7

Similar topics

Splitting on a word

by: qwweeeit | last post by:

Hi all, I am writing a script to visualize (and print) the web references hidden in the html files as: '<a href="web reference"> underlined reference</a>' Optimizing my code, I found that an...

Python

Splitting Database Question

by: Earl Anderson | last post by:

I have a A97/XP applet I've developed for my own use in my department. My boss "suggests" that since I built it, I share it with and instruct the other 6 members of my department on its use. I've...

Microsoft Access / VBA

splitting a string (and memory allocation)

by: fatted | last post by:

I'm trying to write a function which splits a string (possibly multiple times) on a particular character and returns the strings which has been split. What I have below is kind of (oh dear!)...

C / C++

Why isn't SPLIT splitting my strings

by: ronrsr | last post by:

I'm trying to break up the result tuple into keyword phrases. The keyword phrases are separated by a ; -- the split function is not working the way I believe it should be. Can anyone see what I"m...

Python

Splitting and Renaming Xml Nodes

by: =?Utf-8?B?R3VoYW5hdGg=?= | last post by:

Hi, Please suggest a solution for the following problem. We get a input xml that need to be processed for grouping and sorting. The example of one such xml <Requests> <Request> ......

.NET Framework

One-click Importing Excel Data into a*Database

by: ryjfgjl | last post by:

In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...

Microsoft Excel

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++