Splitting the words of a line

Hi,
I am trying to split a line which has the format below:

AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
SelectProducts.aspx?
p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

Here are all the strings I want to split it into:
---------------------------------
AzAccept
PLYSSTM01
[23/Sep/2005:16:14:28 -0500]
162.44.245.32
CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
Secure,DC=customer,DC=rxcorp,DC=com"
GET
/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
d4b62ca2-09a0-4334622b-0e1c-03c42ba5
0
--------------------------------

I am trying to use the re.split() method to split them, but I am unable to get
the exact result.

Any help on this is highly appreciated.

Thanks
Sumit
Dec 6 '07 #1
On Dec 7, 2:21 am, Sumit <sumit.na...@gmail.com> wrote:
> Hi,
> I am trying to split a line which has the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

Because lines are mangled in transmission, it is rather difficult to
guess exactly what you have in your input and what your expected
results are.

Also you don't show exactly what you have tried.

At the end is a small script that contains my guess as to your input
and expected results, shows an example of what the re.VERBOSE flag is
intended for, and how you might debug your results.
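
One way to debug a long pattern, sketched here as an illustration rather than taken from the script in this post, is to build it from small pieces and test ever longer prefixes until the match breaks. The helper name first_failing_piece, the pieces list and the shortened sample line are all hypothetical, and the last piece contains a deliberate bug so that there is something to find.

```python
import re

# Hypothetical helper (not from the thread): report which piece of a
# multi-part pattern is the first one that stops the match from succeeding.
def first_failing_piece(pieces, text):
    """Return the index of the first piece that breaks the match, or None."""
    for i in range(1, len(pieces) + 1):
        prefix = "".join(pieces[:i])
        if re.match(prefix, text, re.VERBOSE) is None:
            return i - 1
    return None

# With re.VERBOSE the spaces inside each piece are ignored.
pieces = [
    r"(\S+) \s+",          # AzAccept
    r"(\S+) \s+",          # PLYSSTM01
    r"\[ ([^]]+) ] \s+",   # timestamp in square brackets
    r"' (\S+)",            # deliberate bug: the sample uses ", not '
]

# Shortened, made-up variant of the log line, for illustration only.
sample = 'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd"'

print(first_failing_piece(pieces, sample))  # prints 3
```

Once the failing piece is known, only that fragment has to be compared against the text at the point where the previous prefix stopped matching.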

So that you don't get your homework done 100% for free, I haven't
corrected the last mistake I made.

As usual, re may not be the best way of doing this exercise. Your
*single* piece of evidence may not be enough. It appears to be a
horrid conglomeration of instances of different things, each with its
own grammar. You may find that something like PyParsing would be more
legible and more robust.
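
The "different things, each with its own grammar" remark suggests a two-stage approach, which can be sketched with the standard library alone. This is only an illustration under assumptions: the line is taken to be a single unwrapped string, and the names TOKEN, top_level_tokens and the field names (kind, host, stamp and so on) are invented here, not taken from the log format's documentation.

```python
import re

# The sample line as one string (assumed to be unwrapped in the real log).
line = ('AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
        '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
        'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
        '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
        'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]')

# Stage 1: a top-level token is a [bracketed] run, a "quoted" run,
# or a bare word. Exactly one group takes part in each match.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

def top_level_tokens(text):
    """Return the tokens with their surrounding brackets or quotes removed."""
    return [m.group(m.lastindex) for m in TOKEN.finditer(text)]

# Stage 2: each composite token is split by its own simple rule.
kind, host, stamp, who, request, session, code = top_level_tokens(line)
ip, dn = who.split(None, 1)            # IP address, then the rest
server, method, url = request.split()  # three whitespace-separated words

fields = [kind, host, stamp, ip, dn, server, method, url, session, code]
for field in fields:
    print(field)
```

The tuple unpacking raises ValueError when a line does not have exactly seven top-level tokens, which is a cheap way to notice lines that do not fit the assumed layout.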

> Here are all the strings I want to split it into:
> ---------------------------------
> AzAccept
> PLYSSTM01
> [23/Sep/2005:16:14:28 -0500]
> 162.44.245.32
> CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com"
> GET
> /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
> d4b62ca2-09a0-4334622b-0e1c-03c42ba5
> 0
> --------------------------------
>
> I am trying to use the re.split() method to split them, but I am unable to get
> the exact result.

C:\junk>type sumit.py
import re

# The sample log line, built as one string from adjacent literals.
textin = \
    """AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd """ \
    """cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk """ \
    """Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/""" \
    """SelectProducts.aspx?""" \
    """p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

expected = [
    "AzAccept",
    "PLYSSTM01",
    "23/Sep/2005:16:14:28 -0500",
    "162.44.245.32",
    "CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com",
    "plysmhc03zp",
    "GET",
    "/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc",
    "d4b62ca2-09a0-4334622b-0e1c-03c42ba5",
    "0",
    ]

# With re.VERBOSE, whitespace in the pattern is ignored and # starts a comment.
pattern = r"""
(\S+)       # AzAccept
\s+
(\S+)       # PLYSSTM01
\s+\[
([^]]+)     # 23/Sep/2005:16:14:28 -0500
]\s+"
(\S+)       # 162.44.245.32
\s+
([^"]+)     # CN=dddd cojack (890),OU=1, etc etc,DC=rxcorp,DC=com
"\s+"
(\S+)       # plysmhc03zp
\s+
(\S+)       # GET
\s+
(\S+)       # /mci/performance/ ... menu=adhoc
\s+\[
([^]]+)     # d4b62ca2-09a0-4334622b-0e1c-03c42ba5
]\s+\[
([^]]+)     # 0
]$
"""

mobj = re.match(pattern, textin, re.VERBOSE)
if not mobj:
    print("Bzzzt!")
else:
    result = mobj.groups()
    print("len check", len(result) == len(expected), len(result), len(expected))
    # Compare each captured group with the expected value, side by side.
    for a, b in zip(result, expected):
        print(a == b, repr(a), repr(b))

C:\junk>python sumit.py
len check True 10 10
True 'AzAccept' 'AzAccept'
True 'PLYSSTM01' 'PLYSSTM01'
True '23/Sep/2005:16:14:28 -0500' '23/Sep/2005:16:14:28 -0500'
True '162.44.245.32' '162.44.245.32'
True 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com' 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com'
True 'plysmhc03zp' 'plysmhc03zp'
True 'GET' 'GET'
False '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"' '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc'
True 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5' 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5'
True '0' '0'

C:\junk>
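
The single False row shows the URL group ending in a double quote: (\S+) matches any non-whitespace character, so it also swallows the quote that closes the second quoted string. One possible correction, offered as a sketch and not as the poster's intended answer, is to exclude the quote from that group and match it explicitly afterwards. Only the tail of the line is used here to keep the example short; the names tail and tail_pattern are made up for the illustration.

```python
import re

# Tail of the sample line, starting at the second quoted string.
tail = ('"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
        'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]')

tail_pattern = r"""
    "(\S+)       # plysmhc03zp
    \s+
    (\S+)        # GET
    \s+
    ([^"\s]+)    # URL: stops before whitespace or the closing quote
    "\s+\[
    ([^]]+)      # d4b62ca2-09a0-4334622b-0e1c-03c42ba5
    ]\s+\[
    ([^]]+)      # 0
    ]$
"""

mobj = re.match(tail_pattern, tail, re.VERBOSE)
groups = mobj.groups()
for g in groups:
    print(repr(g))
```

The same two changes, ([^"\s]+) for the URL group and a literal " before \s+\[, would apply to the corresponding lines of the full pattern.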
Dec 6 '07 #2
On Dec 6, 9:21 am, Sumit <sumit.na...@gmail.com> wrote:
> Hi,
> I am trying to split a line which has the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

As John Machin mentioned, pyparsing may be helpful to you. Here is a
simple version:

# The sample log line as a single string (adjacent literals are concatenated)
data = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)

# Version 1 - simple
from pyparsing import *

LBRACK, RBRACK, COMMA = map(Suppress, "[],")
num = Word(nums)
# originalTextFor keeps the date and its UTC offset together as one string
date = originalTextFor(
    Combine(num + "/" + Word(alphas) + "/" + num + ":" + num + ":" + num + ":" + num)
    + oneOf("+ -") + num)
uuid = delimitedList(Word(hexnums), "-", combine=True)
logString = Word(alphas, alphanums) + Word(alphas, alphanums) + \
    LBRACK + date + RBRACK + quotedString + quotedString + \
    LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

print(logString.parseString(data))

Prints out:
['AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500', '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"', '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"', 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0']

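
Once the timestamp is isolated as a single token, as in the output above, it can be turned into a datetime with the standard library. This is a minimal sketch that assumes Python 3 and English month abbreviations, since the %b directive follows the current locale; the variable names stamp and when are illustrative.

```python
from datetime import datetime, timedelta

stamp = "23/Sep/2005:16:14:28 -0500"

# %d/%b/%Y matches 23/Sep/2005, %H:%M:%S the time, %z the -0500 offset.
when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

print(when.isoformat())                        # 2005-09-23T16:14:28-05:00
print(when.utcoffset() == timedelta(hours=-5))  # True
```

The result is timezone-aware, so timestamps from servers in different zones can be compared directly.
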
And here is a slightly fancier version, which parses the quoted
strings (uses the pprint pretty-printing module to show structure of
the parsed results):

# Version 2 - fancy
from pyparsing import *

LBRACK, RBRACK, COMMA = map(Suppress, "[],")
num = Word(nums)
# originalTextFor keeps the date and its UTC offset together as one string
date = originalTextFor(
    Combine(num + "/" + Word(alphas) + "/" + num + ":" + num + ":" + num + ":" + num)
    + oneOf("+ -") + num)
uuid = delimitedList(Word(hexnums), "-", combine=True)

# Grammar for the first quoted string: an IP address, then key=value pairs
ipAddr = delimitedList(Word(nums), ".", combine=True)
keyExpr = Word(alphas.upper())
valExpr = CharsNotIn(',')
qs1Expr = ipAddr + Group(delimitedList(Combine(keyExpr + '=' + valExpr)))

def parseQS1(t):
    # t[0] is the first quoted string with its quotes already removed
    return qs1Expr.parseString(t[0])

def parseQS2(t):
    # the second quoted string is just whitespace-separated words
    return t[0].split()

# Parse actions run in order: strip the quotes, then parse the contents
qs1 = quotedString.copy().setParseAction(removeQuotes, parseQS1)
qs2 = quotedString.copy().setParseAction(removeQuotes, parseQS2)

logString = Word(alphas, alphanums) + Word(alphas, alphanums) + \
    LBRACK + date + RBRACK + qs1 + qs2 + \
    LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

from pprint import pprint
pprint(logString.parseString(data).asList())

Prints:
['AzAccept',
 'PLYSSTM01',
 '23/Sep/2005:16:14:28 -0500',
 '162.44.245.32',
 ['CN=dddd cojack (890)',
  'OU=1',
  'OU=Customers',
  'OU=ISM-Users',
  'OU=kkk Secure',
  'DC=customer',
  'DC=rxcorp',
  'DC=com'],
 'plysmhc03zp',
 'GET',
 '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5',
 '0']
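
The nested list of key=value strings can be post-processed with plain Python. The sketch below is independent of pyparsing and assumes the list has already been extracted as shown in the output above; dn_pairs and org_units are hypothetical names. A list of pairs is used instead of a dict because keys such as OU and DC repeat.

```python
# The nested list from the parsed output, copied here so the sketch
# runs on its own.
dn_parts = [
    'CN=dddd cojack (890)',
    'OU=1',
    'OU=Customers',
    'OU=ISM-Users',
    'OU=kkk Secure',
    'DC=customer',
    'DC=rxcorp',
    'DC=com',
]

def dn_pairs(parts):
    """Split each 'KEY=value' string at its first '=' into a (key, value) pair."""
    pairs = []
    for part in parts:
        key, _, value = part.partition("=")
        pairs.append((key, value))
    return pairs

pairs = dn_pairs(dn_parts)

# Collect every value for a repeated key, preserving order.
org_units = [value for key, value in pairs if key == "OU"]
print(org_units)  # ['1', 'Customers', 'ISM-Users', 'kkk Secure']
```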

Find more about pyparsing at http://pyparsing.wikispaces.com.

-- Paul
Dec 7 '07 #3
