Splitting the words of a line

Sumit

Hi,
I am trying to split a line which has the format below:

AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
SelectProducts.aspx?
p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

Here are all the strings I want to split it into:
---------------------------------
AzAccept
PLYSSTM01
[23/Sep/2005:16:14:28 -0500]
162.44.245.32
CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
Secure,DC=customer,DC=rxcorp,DC=com"
GET
/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
d4b62ca2-09a0-4334622b-0e1c-03c42ba5
0
--------------------------------

I am trying to use the re.split() method to split them, but I am unable to
get the exact result.

Any help on this is highly appreciated.

Thanks
Sumit

Dec 6 '07 #1


John Machin

On Dec 7, 2:21 am, Sumit <sumit.na...@gmail.com> wrote:

> Hi,
> I am trying to split a line which has the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

Because lines are mangled in transmission, it is rather difficult to
guess exactly what you have in your input and what your expected
results are.

Also you don't show exactly what you have tried.

At the end is a small script that contains my guess as to your input
and expected results, shows an example of what the re.VERBOSE flag is
intended for, and how you might debug your results.

So that you don't get your homework done 100% for free, I haven't
corrected the last mistake I made.

As usual, re may not be the best way of doing this exercise. Your
*single* piece of evidence may not be enough. It appears to be a
horrid conglomeration of instances of different things, each with its
own grammar. You may find that something like PyParsing would be more
legible and more robust.
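
As a point of comparison with re.split(): splitting on whitespace alone cannot work here, because spaces also occur inside the bracketed and quoted fields. A minimal sketch of one alternative, assuming the line has been rejoined into a single physical line and that brackets and quotes never nest, is to tokenize the top-level fields with re.finditer and split the quoted fields further afterwards. The names `line`, `TOKEN` and `top_level_fields` are illustrative only, and the code is written for Python 3.

```python
import re

# Sample line from the original post, rejoined into one physical line.
line = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)

# One alternative per top-level field shape: [bracketed], "quoted", or bare word.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def top_level_fields(text):
    """Return the top-level fields with their brackets or quotes removed."""
    fields = []
    for m in TOKEN.finditer(text):
        # Exactly one group takes part in each match; lastindex says which.
        fields.append(m.group(m.lastindex))
    return fields


fields = top_level_fields(line)
for field in fields:
    print(repr(field))
```

This yields seven top-level fields; the two quoted ones still need their own second-level split (for example `fields[4].split()` for the request part), which is where the "different grammar per field" problem shows up.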

> Here are all the strings I want to split it into:
> ---------------------------------
> AzAccept
> PLYSSTM01
> [23/Sep/2005:16:14:28 -0500]
> 162.44.245.32
> CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com"
> GET
> /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
> d4b62ca2-09a0-4334622b-0e1c-03c42ba5
> 0
> --------------------------------
>
> I am trying to use the re.split() method to split them, but I am unable
> to get the exact result.

C:\junk>type sumit.py
import re

textin = \
"""AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd """ \
"""cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk """ \
"""Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/""" \
"""SelectProducts.aspx?""" \
"""p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

expected = [
    "AzAccept",
    "PLYSSTM01",
    "23/Sep/2005:16:14:28 -0500",
    "162.44.245.32",
    "CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com",
"plysmhc03zp",
"GET",
"/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc",
"d4b62ca2-09a0-4334622b-0e1c-03c42ba5",
"0",
]

pattern = r"""
(\S+) # AzAccept
\s+
(\S+) # PLYSSTM01
\s+\[
([^]]+) # 23/Sep/2005:16:14:28 -0500
]\s+"
(\S+) # 162.44.245.32
\s+
([^"]+) # CN=dddd cojack (890),OU=1, etc etc,DC=rxcorp,DC=com
"\s+"
(\S+) # plysmhc03zp
\s+
(\S+) # GET
\s+
(\S+) # /mci/performance/ ... menu=adhoc
\s+\[
([^]]+) # d4b62ca2-09a0-4334622b-0e1c-03c42ba5
]\s+\[
([^]]+) # 0
]$
"""

mobj = re.match(pattern, textin, re.VERBOSE)
if not mobj:
    print "Bzzzt!"
else:
    result = mobj.groups()
    print "len check", len(result) == len(expected), len(result), len(expected)
    for a, b in zip(result, expected):
        print a == b, repr(a), repr(b)

C:\junk>python sumit.py
len check True 10 10
True 'AzAccept' 'AzAccept'
True 'PLYSSTM01' 'PLYSSTM01'
True '23/Sep/2005:16:14:28 -0500' '23/Sep/2005:16:14:28 -0500'
True '162.44.245.32' '162.44.245.32'
True 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com' 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com'
True 'plysmhc03zp' 'plysmhc03zp'
True 'GET' 'GET'
False '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"' '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc'
True 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5' 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5'
True '0' '0'

C:\junk>
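
The single False line in the output above comes from the URL group: `(\S+)` is greedy and also swallows the closing double quote of the second quoted string. One possible way to close that gap, offered as a sketch rather than as the author's intended answer, is to capture the URL with a class that excludes both whitespace and the quote, and then match the quote explicitly. The version below is written for Python 3, and the variable names and the regrouping of the pattern lines are my own.

```python
import re

# Sample line from the original post, rejoined into one physical line.
line = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)

pattern = r"""
    (\S+) \s+ (\S+) \s+             # AzAccept, PLYSSTM01
    \[ ([^\]]+) \] \s+              # timestamp, brackets dropped
    " (\S+) \s+ ([^"]+) " \s+       # IP address, then the DN up to the quote
    " (\S+) \s+ (\S+) \s+           # host name, HTTP method
    ([^"\s]+) " \s+                 # URL: stop before the closing quote
    \[ ([^\]]+) \] \s+              # bracketed id
    \[ ([^\]]+) \] $                # trailing bracketed number
"""

mobj = re.match(pattern, line, re.VERBOSE)
fields = mobj.groups() if mobj else ()
print(fields[7] if fields else "no match")
```

Only the URL line differs in substance from the pattern in the script; everything else is the same ten-group layout.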

Dec 6 '07 #2

Paul McGuire

On Dec 6, 9:21 am, Sumit <sumit.na...@gmail.com> wrote:

> Hi,
> I am trying to split a line which has the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

As John Machin mentioned, pyparsing may be helpful to you. Here is a
simple version:

data = """AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] \
"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,\
OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" \
"plysmhc03zp GET /mci/performance/SelectProducts.aspx?\
p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

# Version 1 - simple
from pyparsing import *
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
       oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)
logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
LBRACK + date + RBRACK + quotedString + quotedString + \
LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

print logString.parseString(data)

Prints out:
['AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500',
'"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"',
'"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"',
'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0']

And here is a slightly fancier version, which parses the quoted
strings (uses the pprint pretty-printing module to show structure of
the parsed results):

# Version 2 - fancy
from pyparsing import *
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
       oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)

ipAddr = delimitedList(Word(nums),".",combine=True)
keyExpr=Word(alphas.upper())
valExpr=CharsNotIn(',')
qs1Expr = ipAddr + Group(delimitedList(Combine(keyExpr + '=' + valExpr)))
def parseQS1(t):
    return qs1Expr.parseString(t[0])
def parseQS2(t):
    return t[0].split()

qs1 = quotedString.copy().setParseAction(removeQuotes, parseQS1)
qs2 = quotedString.copy().setParseAction(removeQuotes, parseQS2)

logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
LBRACK + date + RBRACK + qs1 + qs2 + \
LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

from pprint import pprint
pprint(logString.parseString(data).asList())

Prints:
['AzAccept',
 'PLYSSTM01',
 '23/Sep/2005:16:14:28 -0500',
 '162.44.245.32',
 ['CN=dddd cojack (890)',
  'OU=1',
  'OU=Customers',
  'OU=ISM-Users',
  'OU=kkk Secure',
  'DC=customer',
  'DC=rxcorp',
  'DC=com'],
 'plysmhc03zp',
 'GET',
 '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5',
 '0']
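
For readers without pyparsing installed, the second-level split of the KEY=value list can also be approximated with plain string methods. This is only a sketch under a loud assumption: no value contains a comma or an escaped comma, which real directory names are allowed to have. The function name `split_dn` and the `grouped` dictionary are hypothetical and are not part of the scripts above; the code is written for Python 3.

```python
# The KEY=value portion of the first quoted string, as shown in the output above.
dn = ("CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,"
      "OU=kkk Secure,DC=customer,DC=rxcorp,DC=com")


def split_dn(text):
    """Split 'KEY=value,KEY=value,...' into (key, value) pairs, in order.

    Assumes values never contain a comma (illustrative simplification).
    """
    pairs = []
    for part in text.split(","):
        key, _, value = part.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


pairs = split_dn(dn)

# OU and DC each occur more than once, so collect the values per key in a list.
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)

print(grouped)
```

Keeping the pairs as an ordered list first, and grouping only afterwards, preserves the original order of the components, which a plain dictionary keyed on OU or DC would lose.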

Find more about pyparsing at http://pyparsing.wikispaces.com.

-- Paul

Dec 7 '07 #3
