By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
448,652 Members | 1,694 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 448,652 IT Pros & Developers. It's quick & easy.

Calling pcre with ctypes

P: n/a
Recently I discovered the re module doesn't support POSIX character
classes:

Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>import re
r = re.compile('[:alnum:]+')
print r.match('123')
None

So I thought I'd try out pcre through ctypes, to recreate pcredemo.c
in python. The c code is at:
http://vcs.pcre.org/viewvc/code/trun....c?view=markup

Partial Python code is below

I'm stuck, from what I can tell a c array, such as:
int ovector[OVECCOUNT];

translates to
ovector = ctypes.c_int * OVECOUNT

but when I pass ovector to a function I get the traceback
$ python pcredemo.py [a-z] fred
Traceback (most recent call last):
File "pcredemo.py", line 65, in <module>
compiled_re, None, subject, len(subject), 0, 0, ovector, OVECOUNT
ctypes.ArgumentError: argument 7: <type 'exceptions.TypeError'>: Don't
know how to convert parameter 7

What is the correct way to construct and pass ovector?

With thanks, Alex

# PCRE through ctypes demonstration program

import ctypes
import getopt
import sys

import pcre_const

OVECOUNT = 30 # Should be a multiple of 3

pcre = ctypes.cdll.LoadLibrary('libpcre.so')

compiled_re = None
error = ctypes.c_char_p()
pattern = ''
subject = ''
name_table = ctypes.c_ubyte()
erroffset = ctypes.c_int()
find_all = 0
namecount = 0
name_entry_size = 0
ovector = ctypes.c_int * OVECOUNT
options = 0

# First, sort out the command line. There is only one possible option
at
# the moment, "-g" to request repeated matching to find all
occurrences,
# like Perl's /g option. We set the variable find_all to a non-zero
value
# if the -g option is present. Apart from that, there must be exactly
two
# arguments.

opts, args = getopt.getopt(sys.argv[1:], 'g')
for o, v in opts:
if o == '-g': find_all = 1

# After the options, we require exactly two arguments, which are the
# pattern, and the subject string.

if len(args) != 2:
print 'Two arguments required: a regex and a subject string'
sys.exit(1)

pattern = args[0]
subject = args[1]
subject_length = len(subject)

# Now we are going to compile the regular expression pattern, and
handle
# and errors that are detected.

compiled_re = pcre.pcre_compile(
pattern, options, ctypes.byref(error),
ctypes.byref(erroffset),
None
)

Jun 27 '08 #1
Share this Question
Share on Google+
1 Reply


P: n/a
moreati schrieb:
Recently I discovered the re module doesn't support POSIX character
classes:

Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>import re
r = re.compile('[:alnum:]+')
print r.match('123')
None

So I thought I'd try out pcre through ctypes, to recreate pcredemo.c
in python. The c code is at:
http://vcs.pcre.org/viewvc/code/trun....c?view=markup

Partial Python code is below

I'm stuck, from what I can tell a c array, such as:
int ovector[OVECCOUNT];

translates to
ovector = ctypes.c_int * OVECOUNT

but when I pass ovector to a function I get the traceback
$ python pcredemo.py [a-z] fred
Traceback (most recent call last):
File "pcredemo.py", line 65, in <module>
compiled_re, None, subject, len(subject), 0, 0, ovector, OVECOUNT
ctypes.ArgumentError: argument 7: <type 'exceptions.TypeError'>: Don't
know how to convert parameter 7

What is the correct way to construct and pass ovector?
'ctypes.c_int * OVECOUNT' does not create an array *instance*, it creates an array *type*,
comparable to a typedef in C. You can create an array instance by calling the type, with
zero or more initializers:

ovector = (ctypes.c_int * OVECOUNT)(0, 1, 2, 3, ...)

Thomas
Jun 27 '08 #2

This discussion thread is closed

Replies have been disabled for this discussion.