Bytes | Developer Community

Regular Expression 100% cpu - please help

I have an expression that, when run, uses 100% CPU for over 1 minute.
I can change the expression so this does not happen, but could someone explain why this happens so that I don't do it again?
expression -->
Departing:</td>.*?</span>(?<departingAirportName>.*?)\(
(?<airportCode>\w+)\).*?<li>
(?<departingCity>[\w\s]+),\s*
(?<departingCountry>[\w\s]+).*?
(?<departingTimeHours>\d+):
(?<departingTimeMins>\d+).*?Arriving:.*?</span>
(?<arrivalAirportName>.*?)\(
(?<arrivelAirportCode>\w+)\).*?<li>
(?<arrivalCity>[\w\s]+),\s*
(?<arrivalCountry>[\w\s]+).*?
(?<arrivalTimeHours>\d+):
(?<arrivalTimeMins>\d+).*?href=".*?\(
(?<linkURL>.*?)\).*?>
(?<carrier>[\w\s]+)\(
(?<flightNumber>.*?)\).*?
text to search -->
l10 bb2"><span class="textBold">Wed 16 March 05</span>, 1stop(s)</td>
</tr>
<tr class="h32 dotsbottom canvas">
<td class="text l10">Duration:</td>
<td class="textBold l10">14h00</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Departing:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
</span>
Newcastle Int'l (NCL),</li><li>Newcastle, United Kingdom
</li>
<li>
<span class="bold">12:05</span> Wed
</li>
</ul>
</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Arriving:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
Terminal 1,
</span>
Heathrow (LHR),</li><li>London, United Kingdom
</li>
<li>
<span class="bold">13:20</span> Wed
</li>
</ul>
</td>
</tr>
<tr>
<td class="textBold l10 t10 b10 vtop"><img src='/images/en/FE/BE/Tailfin/smBA.gif' alt="tailfin" width="30" height="25" alt="" /></td>
<td class="text l10 t10 b10">
<ul class="list">
<li>
Non-stop
</li>
<li>
<a href="javascript:popupWithNoReturn('/otpbvpl/Jsp/opodo/FlifoInfoServlet?BV_SessionID=@@@@1016786011.1110564120@@@@&BV_EngineID=ccdeaddediigiijcefecenhdhhldfnk.0&locale=en_GB&FLIGHT_NUMBER=1327&AIRLINE_CODE=BA&B_DATE=200503161205', 'opodo', 700, 450)" class="link">British Airways (BA 1327) ></a>
</li>
<li>
Airplane type - 320
</li>
<li>
Economy restricted
</li>
<li>
<script type="text/javascript" language="JavaScript">
// work around for netscape 7.0.1/2 encoded characters in link
var eticketURL ='http://www.opodo.co.uk:80/otpbvpl/Global/Page/logObs_FS.jsp?locale=en_GB&FrameSetRequired=Yes&sLoc=https%3A%2F%2Fopodouk.custhelp.com%2Fcgi-bin%2Fopodouk.cfg%2Fphp%2Fenduser%2Fstd_alp.php%3Fp_prod_lvl1%3D1%26p_prod_lvl2%3D30&sURLType=RightNow';
document.write('<a href="javascript:popupWithNoReturn(eticketURL,\\'faq\\',750,600)" class="link">e-ticket available ></a>');
</script>
</li>
<li>
</li>
</ul>
</td>
</tr>
<tr class="h32">
<td class="textBold l10 beigeBG dotstop bb2" colspan="2">Connection:</td>
</tr>
<tr height="152">
<td class="text l10 t10 b10 beigeBG dotsbottom vtop">
<ul class="list">
<li>

</li>
<li>

</li>
<li>

</li>
<li>

</li>
</ul>
</td>
<td class="text l10 t10 b10 beigeBG dotsbottom vtop" width="100%">
<ul class="list">
<li><span class="bold">13:20</span> Wed - <span class="bold">14:35</span> Wed</li>
<li>
Change plane
</li>
<li class="pt10">
Stop-over duration: 1h15
</li>
</ul>
</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Departing:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
Terminal 1,
</span>
Heathrow (LHR),</li><li>London, United Kingdom
</li>
<li>
<span class="bold">14:35</span> Wed
</li>
</ul>
</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Arriving:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
Terminal 1,
</span>
Narita (NRT),</li><li>Tokyo, Japan
</li>
<li>
<span class="bold">11:05</span> Thu
</li>
</ul>
</td>
</tr>

<tr>
<td class="textBold l10 t10 b10 vtop"><img src='/images/en/FE/BE/Tailfin/smBA.gif' alt="tailfin" width="30" height="25" alt="" /></td>
<td class="text l10 t10 b10">
<ul class="list">
<li>
Non-stop
</li>
<li>
<a href="javascript:popupWithNoReturn('/otpbvpl/Jsp/opodo/FlifoInfoServlet?BV_SessionID=@@@@1016786011.1110564120@@@@&BV_EngineID=ccdeaddediigiijcefecenhdhhldfnk.0&locale=en_GB&FLIGHT_NUMBER=7&AIRLINE_CODE=BA&B_DATE=200503161435', 'opodo', 700, 450)" class="link">British Airways (BA 7) ></a>
</li>
<li>
Airplane type - 744

</li>
<li>
World Traveller Plus
</li>
<li>
<script type="text/javascript" language="JavaScript">
// work around for netscape 7.0.1/2 encoded characters in link
var eticketURL ='http://www.opodo.co.uk:80/otpbvpl/Global/Page/logObs_FS.jsp?locale=en_GB&FrameSetRequired=Yes&sLoc=https%3A%2F%2Fopodouk.custhelp.com%2Fcgi-bin%2Fopodouk.cfg%2Fphp%2Fenduser%2Fstd_alp.php%3Fp_prod_lvl1%3D1%26p_prod_lvl2%3D30&sURLType=RightNow';
document.write('<a href="javascript:popupWithNoReturn(eticketURL,\\'faq\\',750,600)" class="link">e-ticket available ></a>');
</script>
</li>
<li>
</li>
</ul>
</td>
</tr>

--------------------------------
From: Gareth James

-----------------------
Posted by a user from .NET 247 (http://www.dotnet247.com/)

Jul 21 '05 #1
"Gareth James via .NET 247" <an*******@dotnet247.com> wrote in
news:ev**************@TK2MSFTNGP10.phx.gbl...
> I have an expression that when run uses 100% cpu for over 1 minute.
> I can change the expression so this does not happen, but could some one
> explain why this happens so that I don't do it again

I just entered your expression and your sample string into Expresso, and it
didn't even take a second to run. I guess it takes forever on *other* input
strings, am I right? For example, when the expression can't find a match?
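A small Python sketch can illustrate that guess. It uses Python rather than .NET purely for illustration. The pattern is a hypothetical, much shorter stand-in for the posted expression, and `time_search` is a made-up helper. The sketch times the same search on text that contains the final `!` the pattern needs and on the same text without it. When a match exists, the engine stops at the first success. When none exists, it has to exhaust every way of splitting the text before it can report failure.

```python
import re
import time

# Hypothetical stand-in, not the original expression: two "[\w\s]+ then .*?"
# pairs followed by a literal "!". re.DOTALL lets "." cross newlines (assumed
# here to mirror RegexOptions.Singleline in the original .NET code).
PATTERN = re.compile(r"[\w\s]+.*?[\w\s]+.*?!", re.DOTALL)


def time_search(text: str) -> tuple[bool, float]:
    """Run one search and return (found a match?, seconds elapsed)."""
    start = time.perf_counter()
    found = PATTERN.search(text) is not None
    return found, time.perf_counter() - start


if __name__ == "__main__":
    for n in (15, 30, 60):
        words = ("ab cd " * n)[:n]               # n characters of letters and spaces
        hit, t_hit = time_search(words + "!")    # the final "!" is present
        miss, t_miss = time_search(words)        # no "!": every split is tried
        print(f"n={n:2d}  match={hit} in {t_hit:.6f}s  "
              f"no match={miss} in {t_miss:.6f}s")
```

Exact timings depend on the machine and Python version. The no-match column should still grow much faster than n, while the match column stays tiny.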

I'm not 100% certain, but I think constructs like "(...[\w\s]+).*?" are what
make it take forever. The [\w\s]+ part will first match all the word/space
characters that follow (like "United Kingdom"), and then ".*?" will eat up
as many characters as needed until the next subexpression matches. Now, if
any subexpression after this one doesn't match, the regex engine has to
backtrack: it'll match "United Kingdo" for [\w\s]+, try all the
subexpressions after this one again, and so on. If you have more than one
subexpression of that kind, every possible combination will be tried. The
number of combinations multiplies with each one you add, so a failed match
can take a very long time...
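Here is one way to put rough numbers on "every possible combination". Assume a stretch of n characters that both [\w\s] and . can match (letters and spaces, say), and assume the remainder of the expression fails everywhere. With k consecutive `[\w\s]+` / `.*?` pairs, every choice of lengths a1 ≥ 1, b1 ≥ 0, ..., ak ≥ 1, bk ≥ 0 totalling at most n is a separate path the backtracking engine has to rule out. From a single start position there are C(n + k, 2k) of them. The Python sketch below checks that count against a brute-force enumeration. Its function names are made up for illustration.

```python
from math import comb


def paths_formula(n: int, k: int) -> int:
    r"""Length tuples for k ([\w\s]+, .*?) pairs over n characters both can match.

    Each [\w\s]+ takes a >= 1 characters and each .*? takes b >= 0, with the
    total at most n: C(n + k, 2k) tuples, counted from one start position.
    """
    return comb(n + k, 2 * k)


def paths_brute_force(n: int, k: int) -> int:
    """Count the same tuples by trying every length, as a backtracker would."""
    if k == 0:
        return 1                       # nothing left to place: one path
    total = 0
    for a in range(1, n + 1):          # [\w\s]+ must take at least one char
        for b in range(0, n - a + 1):  # .*? may take zero or more chars
            total += paths_brute_force(n - a - b, k - 1)
    return total


if __name__ == "__main__":
    for k in range(1, 5):
        print(f"k={k}: {paths_formula(100, k):,} paths over 100 characters")
```

For n = 100 and k = 4, the count is already about 2.6 × 10^11 from just one start position. The posted expression has many more `.*?` segments over several kilobytes of text. The literals in between (`</span>`, `<li>`, `\(`) prune a lot of these paths, so treat this as an illustration of the growth rather than a prediction of the real cost.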

If you can, don't use ".*" or ".*?" at all, because they often allow many
more combinations than you really want. Use a more suitable character class
if there is one, e.g. [^(]* instead of .*? in front of a literal \(. Also,
you can (I think) forbid backtracking into many of those subexpressions by
wrapping them in a nonbacktracking ("atomic") group, like
(?>[\w\s]+)
or
(?>Departing:</td>.*?</span>(?<departingAirportName>.*?)\()
This way the engine won't revisit these subexpressions once it has found a
match for them.
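Both suggestions can be tried on a cut-down piece of the posted expression. The Python sketch below is hypothetical and makes these assumptions:

- It keeps only the departing airport/city/country fields.
- It assumes the original ran with RegexOptions.Singleline, approximated here with re.DOTALL.
- It uses Python's `(?P<name>...)` group syntax.

`CLASSES` replaces the `.*?` segments with negated character classes. `ATOMIC` emulates `(?>[\w\s]+)` with a lookahead plus backreference, which behaves atomically in every Python 3 version (Python 3.11+ and .NET also accept `(?>...)` directly). All three patterns should pull the same fields out of the sample.

```python
import re

# Excerpt of the posted sample HTML around the departing airport.
SAMPLE = ("</span>\nNewcastle Int'l (NCL),</li><li>Newcastle, United Kingdom\n"
          "</li>")

# As posted (departing fields only); DOTALL stands in for RegexOptions.Singleline.
LAZY = re.compile(
    r"</span>(?P<name>.*?)\((?P<code>\w+)\).*?<li>"
    r"(?P<city>[\w\s]+),\s*(?P<country>[\w\s]+)",
    re.DOTALL,
)

# Suggestion 1: negated classes that cannot run past the delimiter ending each
# field, so each field has only one position where the next token can match.
CLASSES = re.compile(
    r"</span>(?P<name>[^(<]*)\((?P<code>\w+)\)[^<]*</li><li>"
    r"(?P<city>[^,<]+),\s*(?P<country>[^<]+)"
)

# Suggestion 2: (?>[\w\s]+) emulated as (?=(?P<x>[\w\s]+))(?P=x); the engine
# never backtracks into a lookahead that has already succeeded.
ATOMIC = re.compile(
    r"</span>(?P<name>.*?)\((?P<code>\w+)\).*?<li>"
    r"(?=(?P<city>[\w\s]+))(?P=city),\s*(?=(?P<country>[\w\s]+))(?P=country)",
    re.DOTALL,
)


def fields(pattern: re.Pattern, text: str):
    """Return (name, code, city, country) with whitespace trimmed, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    return (m["name"].strip(), m["code"], m["city"], m["country"].strip())


if __name__ == "__main__":
    for label, pattern in (("lazy", LAZY), ("classes", CLASSES), ("atomic", ATOMIC)):
        print(label, fields(pattern, SAMPLE))
    # The trade-off: an atomic group never hands characters back.
    print(re.search(r"\w+d", "abcd"))          # matches: \w+ backs off to "abc"
    print(re.search(r"(?=(\w+))\1d", "abcd"))  # None: the atomic \w+ keeps the "d"
```

The last two lines show the trade-off. Because an atomic group never gives characters back, it can reject text that ordinary backtracking would match. It is therefore only safe where giving characters back could never lead to a match anyway.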

Niki
Jul 21 '05 #2
