ereg - regexp for NOT matching certain filename extensions



Is there some way of using ereg to detect when certain filename extensions
are supplied and to return false if so, WITHOUT using the ! operator
before ereg () ?

I have an API that allows as an input a regular expression, enabling the
administrator to ensure a file upload matches a certain pattern. For
instance, supplying the string

'.exe$|.com$|.bat$|.zip$|.doc$'

means that the file must end with any of these five extensions.

Is there a way that the regexp could be rewritten to say that the file
must NOT end with any of these, without changing the ereg to !ereg - I
can't do the latter because it's within the class.

Any ideas?
Martin

Jul 17 '05 #1
On Mon, 26 Jan 2004 18:12:59 +0000, Martin Lucas-Smith <mv***@cam.ac.uk> wrote:
> Is there some way of using ereg to detect when certain filename extensions
> are supplied and to return false if so, WITHOUT using the ! operator
> before ereg () ?
>
> I have an API that allows as an input a regular expression, enabling the
> administrator to ensure a file upload matches a certain pattern. For
> instance, supplying the string
>
> '.exe$|.com$|.bat$|.zip$|.doc$'
>
> means that the file must end with any of these five extensions.
>
> Is there a way that the regexp could be rewritten to say that the file
> must NOT end with any of these, without changing the ereg to !ereg - I
> can't do the latter because it's within the class.


Not neatly; that'd require a negative lookahead assertion, which is only
supported in Perl-compatible regexes. Or just using ! ... ;-p

I suppose you could take the perverse approach of enumerating all other
three-letter extensions, except those. So, have a series of three character
classes containing all but the 1st, 2nd then 3rd character of each extension.
But you could only check one extension at a time; if you had an alternation,
it'd always match (if it doesn't match the complement of one extension's three
characters, then it must match one of the other patterns).

e.g. for matching extensions except .exe, letting 0,1,2 and 4+ letter
extensions through:

\.[^eE][^xX][^eE]$|\..{0,2}$|\..{4,}$

(yuk!)
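
A note on the pattern above, offered as a sketch rather than as the poster's own fix. As written it accepts a three-letter extension only when all three positions differ from e, x, e, so 'file.eps' is turned away; and because '.' also matches a dot, 'foo.bar.exe' gets through via the 4+ branch. The variant below needs only one position to differ and never lets an extension span a dot. It sticks to syntax that POSIX extended regexes also accept (no lookahead), but is run through preg_match() because ereg() was removed in PHP 7. The helper name notThreeLetterExt and the sample filenames are made up for illustration.

```php
<?php
// Hypothetical helper: builds the "three characters, but not $ext" branch
// for a three-letter extension, using only POSIX ERE syntax (no lookahead).
function notThreeLetterExt(string $ext): string
{
    $branches = [];
    for ($i = 0; $i < 3; $i++) {
        // Three non-dot characters, where position $i must differ from $ext[$i].
        $parts = ['[^.]', '[^.]', '[^.]'];
        $parts[$i] = '[^' . strtolower($ext[$i]) . strtoupper($ext[$i]) . '.]';
        $branches[] = implode('', $parts);
    }
    return '\.(' . implode('|', $branches) . ')$';
}

// The pattern as posted, and a variant that needs only ONE position to
// differ and restricts every branch to non-dot characters.
$posted  = '/\.[^eE][^xX][^eE]$|\..{0,2}$|\..{4,}$/';
$variant = '/' . notThreeLetterExt('exe') . '|\.[^.]{0,2}$|\.[^.]{4,}$/';

// OR-ing the complements of two extensions: each forbidden extension is
// accepted by the other one's branch, which is why only one extension can
// be checked at a time with this approach.
$either = '/' . notThreeLetterExt('exe') . '|' . notThreeLetterExt('com') . '/';

foreach (['file.exe', 'file.eps', 'foo.bar.exe', 'photo.jpeg'] as $name) {
    printf("%-12s posted=%d variant=%d either=%d\n",
        $name,
        preg_match($posted, $name),
        preg_match($variant, $name),
        preg_match($either, $name));
}
```

The "either" column shows the alternation problem described above: 'file.exe' matches because it is a perfectly good "not .com" name.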

--
Andy Hassall <an**@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>
Jul 17 '05 #2
Andy Hassall wrote:
> On Mon, 26 Jan 2004 18:12:59 +0000, Martin Lucas-Smith <mv***@cam.ac.uk> wrote:
>> Is there some way of using ereg to detect when certain filename extensions
>> are supplied and to return false if so, WITHOUT using the ! operator
>> before ereg () ?
>
> Not neatly; that'd require a negative lookahead assertion, which is only
> supported in Perl-compatible regexes.


But wouldn't it require more than one assertion? You can't merely
apply one negative lookahead assertion to the characters following a
FULL STOP, because if the filename contains more than one FULL STOP,
and the characters after the last FULL STOP constitute a forbidden
extension, the pattern would match. For example, imagining "exe" is
the only forbidden extension, then

$string = 'foo.bar.exe';
if (preg_match('`\.(?!exe$)`i',$string))

would return true, since there is present a FULL STOP that isn't
immediately followed by the anchored character sequence "exe".

What you'd need to do if you want to check filename extensions would
be to apply two assertions: one positive lookahead assertion, making
sure the characters following the FULL STOP are at the end of the
string, ensuring that you're dealing with the filename extension and
not another part of the filename; and one negative lookahead
assertion, making sure those characters don't constitute a forbidden
extension. Now, for one- to four-letter extensions,

$string = 'foo.bar.exe';
if (preg_match('`\.(?=[a-z]{1,4}$)(?!exe)`i',$string))

, where the character class denotes possible characters in filename
extensions, will return false.
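
To make the difference concrete, here is a small check of the two patterns from this post against a few made-up filenames. The third pattern, with `$` inside the negative lookahead, is an added variation and not something from the post: without that anchor, `(?!exe)` also turns away any extension that merely starts with "exe", such as ".exec".

```php
<?php
// Patterns 1 and 2 are the ones quoted in the post; pattern 3 is an
// illustrative variation that anchors the forbidden extension with $.
$single   = '`\.(?!exe$)`i';
$double   = '`\.(?=[a-z]{1,4}$)(?!exe)`i';
$anchored = '`\.(?=[a-z]{1,4}$)(?!exe$)`i';

// Hypothetical sample names.
$names = ['foo.exe', 'foo.bar.exe', 'foo.exe.txt', 'foo.exec'];

foreach ($names as $name) {
    printf("%-12s single=%d double=%d anchored=%d\n",
        $name,
        preg_match($single, $name),
        preg_match($double, $name),
        preg_match($anchored, $name));
}
```

Note that preg_match() returns 1 or 0 rather than true or false, which is why the output is printed with %d.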

That's all hypothetical of course, because we're saved by the NOT
operator. Please castigate me for any errors.

--
Jock
Jul 17 '05 #3
On Mon, 26 Jan 2004 22:14:57 -0000, John Dunlop <jo*********@johndunlop.info>
wrote:
> Andy Hassall wrote:
>> On Mon, 26 Jan 2004 18:12:59 +0000, Martin Lucas-Smith <mv***@cam.ac.uk> wrote:
>>> Is there some way of using ereg to detect when certain filename extensions
>>> are supplied and to return false if so, WITHOUT using the ! operator
>>> before ereg () ?
>>
>> Not neatly; that'd require a negative lookahead assertion, which is only
>> supported in Perl-compatible regexes.
>
> But wouldn't it require more than one assertion? You can't merely
> apply one negative lookahead assertion to the characters following a
> FULL STOP, because if the filename contains more than one FULL STOP,
> and the characters after the last FULL STOP constitute a forbidden
> extension, the pattern would match. For example, imagining "exe" is
> the only forbidden extension, then
>
> $string = 'foo.bar.exe';
> if (preg_match('`\.(?!exe$)`i',$string))
>
> would return true, since there is present a FULL STOP that isn't
> immediately followed by the anchored character sequence "exe".
>
> What you'd need to do if you want to check filename extensions would
> be to apply two assertions: one positive lookahead assertion, making
> sure the characters following the FULL STOP are at the end of the
> string, ensuring that you're dealing with the filename extension and
> not another part of the filename; and one negative lookahead
> assertion, making sure those characters don't constitute a forbidden
> extension. Now, for one- to four-letter extensions,
>
> $string = 'foo.bar.exe';
> if (preg_match('`\.(?=[a-z]{1,4}$)(?!exe)`i',$string))
>
> , where the character class denotes possible characters in filename
> extensions, will return false.


Indeed :-) Perhaps even, removing the 1-4 char restriction:

/\.(?=[^.]+$)(?!bad$|worse$|evil$)/i

i.e. a '.' followed by a sequence of one or more non-dots up to the end of the
string, where that sequence is not any of 'bad', 'evil' or 'worse', each
followed by end of string.

So putting it all together:

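<pre>
<?php
$goodExts = array('c', 'h', 'jpeg', 'png', 'torrent', 'xyz', 'z');
$badExts  = array('exe', 'com', 'bat', 'doc', 'vbscript', 'x', 'zyx');

// Build "exe$|com$|..." for the negative lookahead. A closure replaces
// create_function(), which was removed in PHP 8; preg_quote() keeps any
// regex metacharacters in an extension literal.
$re = '/\.(?=[^.]+$)(?!' .
      join('|',
           array_map(function ($a) { return preg_quote($a, '/') . '$'; },
                     $badExts)) .
      ')/i';

print("regex = $re\n\n");

$allExts   = array_merge($goodExts, $badExts);
$fileNames = array('thingy', 'foo', 'weasel', 'earwig');

for ($i = 0; $i < 42; $i++) {
    $str = $fileNames[array_rand($fileNames)];

    // Append one to three random extensions. The count is picked once so
    // the loop bound does not change between iterations.
    $extCount = mt_rand(1, 3);
    for ($j = 0; $j < $extCount; $j++)
        $str .= '.' . $allExts[array_rand($allExts)];

    $matched = preg_match($re, $str);

    printf("%-64s %s\n",
           $str,
           $matched ? 'match' : '<b>no match</b>');
}

?>
</pre>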

It rejects files without an extension, though.
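
If files without an extension should be let through as well, one option (a sketch, not part of the script above) is to anchor a single negative lookahead at the start of the string instead of hanging the checks off a dot. The helper name buildAllowPattern and the sample names are hypothetical; preg_quote() is used so that extensions containing regex metacharacters stay literal.

```php
<?php
// Hypothetical helper: returns a PCRE pattern that matches any name that
// does NOT end in one of the given extensions (case-insensitive).
function buildAllowPattern(array $badExts): string
{
    $alternatives = array_map(
        function ($ext) {
            return preg_quote($ext, '/');
        },
        $badExts
    );

    // ^(?!.*\.(?:exe|com)$) : starting at the beginning of the string, it
    // must be impossible to reach ".<bad ext>" right before the end.
    return '/^(?!.*\.(?:' . implode('|', $alternatives) . ')$)/i';
}

$re = buildAllowPattern(['exe', 'com', 'bat', 'doc']);

foreach (['README', 'foo.exe', 'foo.bar.exe', 'foo.exe.txt', 'FOO.COM'] as $name) {
    printf("%-12s %s\n", $name, preg_match($re, $name) ? 'allowed' : 'rejected');
}
```

Because the lookahead only fails when a forbidden extension sits at the very end, a name with no dot at all simply has nothing to fail on and is accepted.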

> That's all hypothetical of course, because we're saved by the NOT
> operator. Please castigate me for any errors.


A single ! character vs. the insanity above... hmm.

--
Andy Hassall <an**@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>
Jul 17 '05 #4
