Regex: How do I match...

...all but the first x chars in a string of arbitrary length?

Apologies if this is the wrong forum; I wasn't sure of the best place to post about regex.

Background: I am new to Regex for pattern matching.
I have started using Yahoo Pipes to manipulate RSS feeds. Yahoo Pipes includes a regex gadget to do string manipulation, but no "stateful" programming.

My goal is to truncate a string of arbitrary length to a fixed length (of, say, the first 15 chars), to produce something like:
"Man bites dog..." from a starting string that looks like "Man bites dog in dark alley."
Counting from the end of the string and doing it from that direction is a problem, because each string will be of a different length.
The Yahoo Pipes regex parser only allows the replacement of static text (including deletions) for matched text.

So I can find the first 10 chars as ^.{10} but I'm struggling with negation and other tricks in order to match all but the first x chars!

Is this an easy problem for a resident regex expert?
Thanks in advance to anyone who can crack this for me!

Paul M.
Apr 23 '07 #1
bartonc
6,596 Expert 4TB
I just got a great book: Mastering Regular Expressions. I'll look and see if you can match the end of a string that has no newline at the end.

PS Sorry you're not a Pythoneer.
Apr 23 '07 #2
ghostdog74
511 Expert 256MB
Can you provide samples of input strings and your desired output? I have a problem understanding your requirement. At first I thought you could simply take the first x chars off, like

    >>> s = "Man bites dog in dark alley"
    >>> s[15:]
    'n dark alley'

then use whatever regexp you need on this result. But I suspect it's not so simple, right?
Apr 23 '07 #3
PaulM
3
OK, real examples include the following RSS feed titles from boardgamenews.com:

Test Your Gaming Knowledge
More About Chicago Poker from Bruno Faidutti
Gone Cardboard News: BattleLore Call to Arms Rules Available for Downloading

The desired output (assuming 16 chars kept) would be:

Test Your Gaming...
More About Chica...
Gone Cardboard N...

(BTW, don't worry about the '...'; this can be easily appended to the resulting string once everything but the first 16 chars is deleted.)

The trouble is that the length of each string is arbitrary, and the regex widget does not allow me to count the chars and then work from that count; i.e., it is effectively stateless.
So no obvious scripting is possible, such as s=right(s,16,(length(s)-16)), or s=substr(s,16,$) etc. But basically, I want to achieve the same effect in a single regex statement.
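
For readers who want to match "all but the first x chars" literally, one possible approach is a fixed-width lookbehind, which lets the replacement stay static text. The sketch below uses Python's `re` module only to illustrate the pattern; whether the Yahoo Pipes regex widget supports lookbehind is an assumption that would need testing, and the names `KEEP`, `TAIL` and `truncate` are hypothetical.

```python
import re

KEEP = 16  # number of leading characters to keep (illustrative value)

# (?<=^.{16}) asserts that exactly 16 characters lie between the start of
# the string and the current position; .+ then matches everything after them.
TAIL = re.compile(r"(?<=^.{%d}).+" % KEEP)

def truncate(title: str) -> str:
    """Replace everything after the first KEEP characters with '...'."""
    return TAIL.sub("...", title)

titles = [
    "Test Your Gaming Knowledge",
    "More About Chicago Poker from Bruno Faidutti",
    "Gone Cardboard News: BattleLore Call to Arms Rules Available for Downloading",
]
for t in titles:
    print(truncate(t))
```

Because the pattern uses `.+` instead of `.*`, titles of 16 characters or fewer are left unchanged.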
Apr 23 '07 #4
ghostdog74
511 Expert 256MB
Well, you don't need a regular expression in this case, as it's overkill. If you have the sample data in a file called file, you can do this:

    # Print the first 16 characters of each line.
    for line in open("file"):
        print(line[:16])

output:

    # ./test.py
    Test Your Gaming
    More About Chica
    Gone Cardboard N

If you are inclined to use a regular expression:

    import re
    for line in open("file"):
        # Assumes each line has at least 16 characters; a shorter line
        # yields an empty list and [0] raises IndexError.
        print(re.findall("^.{16}", line)[0])

output:

    # ./test.py
    Test Your Gaming
    More About Chica
    Gone Cardboard N

Apr 23 '07 #5
PaulM
3
Thanks GhostDog, I appreciate the reply, although that doesn't seem to solve the specific issue I've had.
The challenge is that Yahoo Pipes does not provide the depth of functionality that any other scripting language might (which means I can't actually carry out the simple function you describe).

However, in the meantime I have discovered more about the regex match-replace widget that is not (yet) documented, and this has in fact allowed me to solve my problem.

In particular, it seems one can parenthesise part of the match query and refer back to the captured text as $1 in the replace field.
So match (^.{16}).*$ replace $1...
for string: "Gone Cardboard News: BattleLore Call to Arms Rules Available for Downloading"
returns:
"Gone Cardboard N..."
which is exactly the result I was after, and perhaps even does the same thing internally that you describe (it was just non-intuitive and undocumented).
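
For anyone wanting to try the same match/replace pair outside Pipes, here is a minimal sketch in Python. It assumes Python's `re` module, where the backreference is written `\1` and not `$1` as in Pipes; the function name `truncate_with_group` is hypothetical.

```python
import re

def truncate_with_group(title: str, keep: int = 16) -> str:
    """Keep the first `keep` characters and replace the rest with '...'."""
    # Same shape as the Pipes match field: (^.{16}).*$
    pattern = r"(^.{%d}).*$" % keep
    # Python spells the Pipes replacement "$1..." as r"\1..."
    return re.sub(pattern, r"\1...", title)

print(truncate_with_group(
    "Gone Cardboard News: BattleLore Call to Arms Rules Available for Downloading"
))
```

Titles shorter than 16 characters do not match and come back unchanged. A title of exactly 16 characters does match, because `.*` can match nothing, so it gets "..." appended.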

Furthermore, match (^.{25}).*(\([^(]*\)$) replace $1... $2
for string: "Gone Cardboard News: BattleLore Call to Arms Rules Available for Downloading (Boardgamenews, 2007-4-22)"
returns:
"Gone Cardboard News: Batt... (Boardgamenews, 2007-4-22)"

...which is even better.
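
The two-group version can be checked the same way. This is again only a sketch using Python's `re` module (with `\1` and `\2` standing in for `$1` and `$2`); the names `PATTERN` and `truncate_keep_suffix` are hypothetical.

```python
import re

# Group 1: the first 25 characters.
# Group 2: a trailing parenthesised suffix with no nested "(" inside it.
PATTERN = re.compile(r"(^.{25}).*(\([^(]*\)$)")

def truncate_keep_suffix(title: str) -> str:
    """Shorten the title to 25 characters but keep the trailing '(...)' part."""
    return PATTERN.sub(r"\1... \2", title)

print(truncate_keep_suffix(
    "Gone Cardboard News: BattleLore Call to Arms Rules Available "
    "for Downloading (Boardgamenews, 2007-4-22)"
))
```

A title that does not end in a parenthesised suffix, or that has fewer than 25 characters before it, does not match and is returned unchanged.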

Thanks again,
Paul M.
Apr 23 '07 #6
Hey Paul!

I'm a Yahoo Pipes person as well, found your posts while googling for regex answers on the same problem you had. Couldn't get your string to work for me; any chance you can post your Pipe link here, and I could poke around the source?

thanks much in advance...!
d
Jul 10 '07 #7
