matching a string to extract substrings for which some function returns true

Hello All,

Say you have some string: "['a', 'b', 1], foobar ['d', 4, ('a', 'e')]"
Now I want to extract all substrings for which
"isinstance(eval(substr), list)" is "True".
One way is to walk through the whole sample string and check the
condition, but I was wondering if there is a smarter way of doing the
same, maybe using regular expressions.
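The standard re module cannot match arbitrarily nested brackets, so a regular expression alone will not do this; a small bracket-depth scanner is one alternative. What follows is a minimal sketch, assuming Python 3 and swapping eval for ast.literal_eval (a later standard-library addition that evaluates only literal values and never calls functions). The name extract_lists is hypothetical, and the sketch only reports outermost lists, so "[[[[[]]]]]" counts as one.

```python
import ast

def extract_lists(text):
    """Return the values of top-level [...] substrings that parse as list literals."""
    results = []
    depth = 0        # current bracket nesting depth
    start = None     # index of the '[' that opened the current top-level group
    quote = None     # quote character while inside a string literal, else None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == quote:
                quote = None        # end of the string literal
        elif depth > 0 and ch in "'\"":
            quote = ch              # brackets inside strings are ignored
        elif ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                candidate = text[start:i + 1]
                try:
                    # literal_eval accepts only literals, never names or calls
                    value = ast.literal_eval(candidate)
                except (ValueError, SyntaxError, TypeError):
                    pass
                else:
                    if isinstance(value, list):
                        results.append(value)
        i += 1
    return results
```

On the example string above, extract_lists returns [['a', 'b', 1], ['d', 4, ('a', 'e')]].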

Thanks,
amit.

----
Endless the world's turn, endless the sun's spinning
Endless the quest;
I turn again, back to my own beginning,
And here, find rest.
Nov 22 '05 #1
On Tue, 22 Nov 2005 16:57:41 +0530, Amit Khemka wrote:
> Hello All,
>
> say you have some string: "['a', 'b', 1], foobar ['d', 4, ('a', 'e')]"
> Now i want to extract all substrings for which
> "isinstance(eval(substr), list)" is "True" .
That's an awfully open-ended question. Is there some sort of structure to
the string? What defines a substring? What should you get if you extract
from this string?

"[[[[[]]]]]"

Is that one list or five?

> now one way is to walk through the whole sample string and check the
> condition,

Yes. Where does the string come from? Can a hostile user pass bad strings
to you and crash your code?
--
Steven.

Nov 22 '05 #2
Well, actually the problem is that I have a list of tuples which I cast
as a string and then put in an HTML page as the value of a hidden
variable. When I get the string back, I want to cast it back to a list
of tuples, e.g.:

input:  "('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)), ('foo2', 2, 'foobar2', (3, 2))"
output: [('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)), ('foo2', 2, 'foobar2', (3, 2))]

I hope that explains it better...
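Since the string is a comma-separated run of tuple literals, wrapping it in brackets turns it into a single list literal. A minimal sketch, assuming Python 3: ast.literal_eval accepts only literals (strings, numbers, tuples, lists, dicts, sets, booleans, None), so input such as os.system(...) raises ValueError instead of executing. The helper name parse_tuples is hypothetical, and the Python documentation notes that extremely large or deeply nested input can still exhaust memory or the interpreter's stack.

```python
import ast

def parse_tuples(text):
    """Turn "(...), (...)" back into a list of tuples without calling eval()."""
    # Wrapping in brackets makes the comma-separated tuples a single list literal.
    value = ast.literal_eval("[" + text + "]")
    # Reject anything that parsed but is not a list made only of tuples.
    if not isinstance(value, list) or not all(isinstance(item, tuple) for item in value):
        raise ValueError("expected a comma-separated sequence of tuples")
    return value

s = ("('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)), "
     "('foo2', 2, 'foobar2', (3, 2))")
print(parse_tuples(s))
```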

cheers,
Nov 22 '05 #3
Amit Khemka wrote:
> Well actually the problem is I have a list of tuples which i cast as
> string and then put in a html page as the value of a hidden variable.
> And when i get the string again, i want to cast it back as list of tuples:
> ex:
> input: "('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)),
> ('foo2', 2, 'foobar2', (3, 2))"
> output: [('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)),
> ('foo2', 2, 'foobar2', (3, 2))]
>
> I hope that explains it better...


what do you think happens if the user manipulates the field values
so they contain, say

os.system('rm -rf /')

or

"'*'*1000000*2*2*2*2*2*2*2*2*2"

or something similar?

if you cannot cache session data on the server side, I'd
recommend inventing a custom record format, and doing your
own parsing. turning your data into e.g.

"foo:1:foobar:3:0+foo1:2:foobar1:3:1+foo2:2:foobar2:3:2"

is trivial, and the resulting string can be trivially parsed by a couple
of string splits and int() calls.
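Such a format can be sketched in a few lines of Python 3. This assumes every record has the (name, int, label, (int, int)) shape of the example and that no text field contains ':' or '+'; encode_records and decode_records are hypothetical names.

```python
def encode_records(records):
    """Serialize [(name, n, label, (a, b)), ...] as "name:n:label:a:b+..."."""
    # The :d format spec also rejects non-integer numeric fields.
    return "+".join(
        f"{name}:{n:d}:{label}:{a:d}:{b:d}" for name, n, label, (a, b) in records
    )

def decode_records(text):
    """Parse the "+"/":" format back using only split() and int()."""
    records = []
    for chunk in text.split("+"):
        name, n, label, a, b = chunk.split(":")  # ValueError if field count is wrong
        records.append((name, int(n), label, (int(a), int(b))))
    return records

data = [('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)),
        ('foo2', 2, 'foobar2', (3, 2))]
encoded = encode_records(data)
print(encoded)  # foo:1:foobar:3:0+foo1:2:foobar1:3:1+foo2:2:foobar2:3:2
assert decode_records(encoded) == data
```

Malformed input makes the unpacking or an int() call raise ValueError; nothing in the string is ever executed.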

to make things a little less obvious, and make it less likely that some
character in your data causes problems for the HTML parser, you can
use base64.encodestring on the result (this won't stop a hacker, of
course, so you cannot put sensitive data in this field).
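base64.encodestring is the Python 2 spelling; in Python 3 it was deprecated and later removed, and base64.b64encode or urlsafe_b64encode fills the same role on bytes. A minimal sketch of the wrapping step, assuming the field value is a str; to_field and from_field are hypothetical names. As the paragraph says, this obscures the data but does not protect it.

```python
import base64

def to_field(text):
    """Encode a str so it is safe to place in an HTML attribute value."""
    # urlsafe_b64encode output uses only A-Z, a-z, 0-9, '-', '_' and '='.
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")

def from_field(value):
    """Reverse to_field; malformed input raises ValueError or a subclass of it."""
    return base64.urlsafe_b64decode(value.encode("ascii")).decode("utf-8")
```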

</F>

Nov 22 '05 #4
Fredrik, thanks for your suggestion. The HTML pages that are generated
are for internal use, though, and input is verified before processing.

And more than a solution in the current context, I was actually more
curious about how one can do this in Python.

cheers,
amit.

Nov 22 '05 #5
Amit Khemka <kh********@gmail.com> writes:
> Well actually the problem is I have a list of tuples which i cast as
> string and then
> put in a html page as the value of a hidden variable. And when i get
> the string again,
> i want to cast it back as list of tuples:
> ex:
> input: "('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)),
> ('foo2', 2, 'foobar2', (3, 2))"
> output: [('foo', 1, 'foobar', (3, 0)), ('foo1', 2, 'foobar1', (3, 1)),
> ('foo2', 2, 'foobar2', (3, 2))]
>
> I hope that explains it better...


This is a serious security risk, as you can't trust the data not to do
arbitrary things to your system when eval'ed.

I'd look into pickling the list of tuples to get the string. You'll
want to use protocol 0 (the text format), and may need to encode the
string in any case. You'll also want to investigate the security
implications of using pickle.
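A minimal sketch of the pickling suggestion in Python 3, assuming the same list of tuples. Protocol 0 is pickle's original text-based format, but its output is still bytes and can contain newlines and quote characters, hence the extra base64 step. pickle.loads on data that came back from a client can execute arbitrary code (the pickle documentation warns about exactly this), so the load side should only ever see data whose integrity has been verified. dump_field and load_field are hypothetical names.

```python
import base64
import pickle

def dump_field(data):
    """Pickle with protocol 0, then base64 so the result is plain ASCII."""
    raw = pickle.dumps(data, protocol=0)
    return base64.b64encode(raw).decode("ascii")

def load_field(value):
    # WARNING: unpickling untrusted input can run arbitrary code.
    # Only call this on values whose integrity has been verified first.
    return pickle.loads(base64.b64decode(value))
```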

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 22 '05 #6
Mike Meyer <mw*@mired.org> writes:
>> put in a html page as the value of a hidden variable. And when i get
>> the string again, i want to cast it back as list of tuples:...
>
> This is a serious security risk, as you can't trust the data not to do
> arbitrary things to your system when eval'ed.
> I'd look into pickling the list of tuples to get the string.


The whole scheme of putting the stuff on the html page and then
getting it back from the client is ill-advised. Keep the info on the
server and just have the client send back some token (session ID
usually) saying where to find it on the server. If you absolutely
have to put this sort of data on the client, append a cryptographic
authentication code using the hmac module, and don't believe the data
unless the authentication verifies.
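A minimal sketch of the HMAC check in Python 3, assuming a secret key that lives only on the server (SECRET_KEY below is a placeholder) and SHA-256 as the hash; sign and verify are hypothetical helper names. The client receives "payload.mac", and the server recomputes the MAC before trusting the payload.

```python
import hashlib
import hmac

# Placeholder: a real key must be long, random, and never sent to the client.
SECRET_KEY = b"replace-me-with-a-long-random-secret"

def _mac(payload):
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()

def sign(payload):
    """Return 'payload.mac' for embedding in the hidden field."""
    return payload + "." + _mac(payload)

def verify(token):
    """Return the payload if its MAC checks out, else raise ValueError."""
    payload, sep, mac = token.rpartition(".")  # hex MACs never contain '.'
    if not sep:
        raise ValueError("missing authentication code")
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(mac.encode("utf-8"), _mac(payload).encode("ascii")):
        raise ValueError("authentication failed")
    return payload
```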
Nov 22 '05 #7
I wrote:
> if you cannot cache session data on the server side, I'd
> recommend inventing a custom record format, and doing your
> own parsing. turning your data into e.g.
>
> "foo:1:foobar:3:0+foo1:2:foobar1:3:1+foo2:2:foobar2:3:2"
>
> is trivial, and the resulting string can be trivially parsed by a couple
> of string splits and int() calls.


on the other hand, the "myeval" function I posted here

http://article.gmane.org/gmane.comp....general/433160

should be able to deal with your data, as well as handle data from
malevolent sources without bringing down your computer.

just add

    if token[1] == "(":
        out = []
        token = src.next()
        while token[1] != ")":
            out.append(_parse(src, token))
            token = src.next()
            if token[1] == ",":
                token = src.next()
        return tuple(out)

after the corresponding "[" part, and call it like:

data = myeval("[" + input + "]")

</F>
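The original myeval is not reproduced in this thread. As a rough, hypothetical Python 3 sketch of the same tokenizer-driven idea (not Fredrik's actual code), the parser below accepts only lists, tuples, string literals and numbers, so names, calls and operators are rejected before anything is evaluated. parse_literal and _parse_value are made-up names; the sketch treats any parenthesized group as a tuple, so (1) becomes (1,), and it does not handle negative numbers.

```python
import ast
import io
import tokenize

def _parse_value(tokens, token):
    """Parse one value starting at token; tokens yields the rest."""
    if token.string in ("[", "("):
        close = "]" if token.string == "[" else ")"
        out = []
        token = next(tokens)
        while token.string != close:
            out.append(_parse_value(tokens, token))
            token = next(tokens)
            if token.string == ",":
                token = next(tokens)        # allows "a, b" and a trailing comma
            elif token.string != close:
                raise SyntaxError("expected ',' or %r, got %r" % (close, token.string))
        return out if close == "]" else tuple(out)
    if token.type in (tokenize.STRING, tokenize.NUMBER):
        # A single STRING or NUMBER token is a plain literal, so this is safe.
        return ast.literal_eval(token.string)
    raise SyntaxError("malformed expression near %r" % token.string)

def parse_literal(source):
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    result = _parse_value(tokens, next(tokens))
    for token in tokens:                    # only end-of-input tokens may follow
        if token.type not in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER):
            raise SyntaxError("unexpected trailing %r" % token.string)
    return result
```

Called as parse_literal("[" + text + "]"), it mirrors the myeval call shown in the post.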

Nov 22 '05 #8
Thanks for your suggestions :-)

cheers,

Nov 23 '05 #9