
Image and text: how to tell the difference

Hi,
Depending on whether a certain URL returns an image or text, I want to do
something different. I don't know in advance which of the two I'll get.

This is a URL that returns an image:
http://indicator.amessage.info/indic...mp;param4=.png

This is one that returns text:
http://indicator.amessage.info/indic...mp;param4=.png
How can I tell the two apart with PHP code?
Hoping that somebody can help me out with this,
greetings,
Mattias

Jul 17 '05 #1
13 Replies


Use fsockopen and a regex to search for Content-Type in the header.

Regards,
Shawn
--
Shawn Wilson
sh***@glassgiant.com
http://www.glassgiant.com
Jul 17 '05 #2

The links look the same; the only difference is that there are two "o"s in
"coobnet" in the second one ... ??? If so, you can do if ($_GET["param1"] ==
"coobnet%40jabber.org") {...} else {}

If that was just a typo, then use the Content-Type in the header.

Savut

Jul 17 '05 #3

Shawn Wilson wrote:
> Use fsockopen and a regex to search for Content-Type in the header.


thx a lot! After figuring out how fsockopen worked, I managed to make it
work like this:

// Open a plain TCP connection to the web server on port 80.
$fp = fsockopen("indicator.amessage.info", 80, $errno, $errstr);
if (!$fp) {
    echo "$errstr ($errno)<br>\n";
} else {
    // HEAD asks for the response headers only, not the body.
    fputs($fp, "HEAD /indicator.php?param1=cobnet%40jabber.org&amp;param2=bounce&amp;param3=http%3A%2F%2Fstudent.ugent.be%2Fastrid%2Fpics%2Fjabber%2F&amp;param4=.png HTTP/1.0\r\nHost: indicator.amessage.info\r\n\r\n");
    $string = "";
    while (!feof($fp)) {
        $string = $string . fgets($fp, 128);
    }
    echo $string;
    // strpos() returns an offset or false, so compare strictly against false.
    if (strpos($string, "Content-Type: image/png") !== false)
        echo "We have an image";
    else
        echo "We don't have an image";
    fclose($fp);
}
Do you think it looks good? I don't want to do "an attack" on that
server by a stupid mistake :-)

Greetings,
Mattias Campe

Jul 17 '05 #4
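
An editorial aside on the presence check in the post above: strpos() returns an integer offset or false, so such checks are usually written with a strict comparison against false. The following is only a minimal sketch, assuming a current PHP version (stripos() needs PHP 5 or later) and a header block that has already been read into a string. The function name hasContentType is hypothetical and the sample response is made up.

```php
<?php
// Hypothetical helper: reports whether a raw header block declares the given
// media type. stripos() ignores case, which suits header field names.
function hasContentType($headers, $type)
{
    // stripos() returns the match offset (possibly 0) or false when absent,
    // so compare strictly against false instead of "" or 0.
    return stripos($headers, "Content-Type: " . $type) !== false;
}

// Made-up response headers used only for this demonstration.
$sample = "HTTP/1.0 200 OK\r\nContent-Type: image/png\r\n\r\n";
echo hasContentType($sample, "image/png") ? "We have an image\n" : "We don't have an image\n";
```

A plain substring test like this still matches "image/png" followed by parameters, which may or may not be what is wanted.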

Savut wrote:
> The links look the same; the only difference is that there are two "o"s in
> "coobnet" in the second one ... ??? If so, you can do if ($_GET["param1"] ==
> "coobnet%40jabber.org") {...} else {}

Well, it could be that coobnet%40jabber.org is also correct :-). Maybe I
wasn't too clear, but I can't tell by looking at param1 whether
there will be an image or not.

Still, thx for the remark!

[...]

Jul 17 '05 #5

Mattias Campe wrote:

> Do you think it looks good? I don't want to do "an attack" on that
> server by a stupid mistake :-)


Just a few observations:

There is nothing in that code that would constitute an attack in itself. If,
however, you put that code in a poorly thought-out loop, you could inadvertently
"attack" a site. But I don't know what you're doing with the code, so this may
be moot.

With the code shown, any page with the text "Content-Type: image/png" in it will
claim it's an image. This could be a problem if you're building a bot to crawl
the web. Websites with code examples like php.net, Google groups, devshed, etc.
would occasionally have that string in the text. Again, I don't know the
intended use, but if it's similar to that just described, you might want to
adjust your code to something like the following (the regular expression may or
may not work; I haven't tested it). It should display only the first instance
of the Content-Type text. It also has the advantage that it doesn't go through
the entire file, just enough to determine the type.

while (!feof($fp)) {
    $string .= fgets($fp, 128);
    if (preg_match("/^Content-Type: ([^\r\n]+)[\r|\n]/", $string, $matches)) {
        echo "File is type: " . $matches[1];
        fclose($fp);
        exit();
    }
}
fclose($fp);

I don't know if the headers are case- and whitespace- sensitive or not. In
other words, if it's possible to have

"content-type: image/png"

you'll have to adjust the regex.

And it's a little thing, but you can use "$variable .= $somestring" instead of
"$variable = $variable . $somestring"

Regards,
Shawn

--
Shawn Wilson
sh***@glassgiant.com
http://www.glassgiant.com
Jul 17 '05 #6
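
The worry in the post above, that body text could contain the string "Content-Type", can also be addressed by reading only up to the blank line that ends the header block. The sketch below is illustrative only: readHeaderBlock is a hypothetical name, the response is canned, and an in-memory stream stands in for the socket so the example runs without a network connection.

```php
<?php
// Hypothetical helper: reads lines from an open stream until the blank line
// that ends the HTTP header block, so a response body is never scanned.
function readHeaderBlock($fp)
{
    $headers = "";
    while (!feof($fp)) {
        $line = fgets($fp, 1024);
        if ($line === false || rtrim($line, "\r\n") === "") {
            break; // end of stream, or the blank line after the headers
        }
        $headers .= $line;
    }
    return $headers;
}

// Stand-in for a socket: an in-memory stream holding a made-up GET response
// whose body happens to mention a Content-Type.
$fp = fopen("php://memory", "w+");
fwrite($fp, "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\nContent-Type: image/png in the body\r\n");
rewind($fp);
echo readHeaderBlock($fp);
fclose($fp);
```

With a real connection, the same function would presumably be called on the handle returned by fsockopen() after the request has been written.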

Shawn Wilson wrote:
> [...]
> Just a few observations:
>
> There is nothing in that code that would constitute an attack in itself. If,
> however, you put that code in a poorly thought-out loop, you could inadvertently
> "attack" a site. But I don't know what you're doing with the code, so this may
> be moot.

It will run in a loop, but I'll take care that it's not an infinite one
;) ...

> With the code shown, any page with the text "Content-Type: image/png" in it will
> claim it's an image.

That should be "any *header* of a page" ;p, because I'm not sending a GET
request but a HEAD request...

Still, I like the principle of your code more :D. Only one problem: the
regexp doesn't work (it never matches) and I can't figure out why :-s.

Another question: is the exit() required?

> [...]
> while (!feof($fp)) {
>     $string .= fgets($fp, 128);
>     if (preg_match("/^Content-Type: ([^\r\n]+)[\r|\n]/", $string, $matches)) {
>         echo "File is type: " . $matches[1];
>         fclose($fp);
>         exit();
>     }
> }
> fclose($fp);
>
> I don't know if the headers are case- and whitespace- sensitive or not. In
> other words, if it's possible to have
>
> "content-type: image/png"
>
> you'll have to adjust the regex.

Normally it's like "Content-Type: text/html"

> And it's a little thing, but you can use "$variable .= $somestring" instead of
> "$variable = $variable . $somestring"

thx for the hint!

> Regards,
> Shawn


Greetings,
Mattias

Jul 17 '05 #7

Mattias Campe wrote:
> [...]
>> With the code shown, any page with the text "Content-Type: image/png" in it will
>> claim it's an image.
>
> That should be "any *header* of a page" ;p, because I'm not sending a GET
> request but a HEAD request...

Whoops, didn't notice that. My mistake.

> Still, I like the principle of your code more :D. Only one problem: the
> regexp doesn't work (it never matches) and I can't figure out why :-s.

Yeah, I wasn't really confident in it. Looking at it again, I put the "^" in
there, which means it would only match if the first header was Content-Type.
Take that out and it may work (too busy to test).

> Another question: is the exit() required?

You can take that or leave it. I had it in there so the entire page wouldn't be
opened, while I was thinking you were using GET. You might want to break out of
the while loop, though, to save a bit of time.


Regards,
Shawn
--
Shawn Wilson
sh***@glassgiant.com
http://www.glassgiant.com
Jul 17 '05 #8
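
To illustrate the "^" behaviour discussed above: without the m modifier, "^" matches only at the very start of the subject string (where an HTTP response has its status line), while with m it matches at the start of every line. A small sketch on a made-up header block; the Server value is invented for the example.

```php
<?php
// Made-up response headers; "Server: example" is a placeholder value.
$headers = "HTTP/1.0 200 OK\r\nServer: example\r\nContent-Type: image/png\r\n\r\n";

// Without the m modifier, ^ only matches at the very start of the string,
// so this pattern cannot match a header that follows the status line.
$anchored = preg_match('/^Content-Type: ([^\r\n]+)/', $headers, $m1);

// With m, ^ matches at the start of every line; i ignores case in the name.
$multiline = preg_match('/^Content-Type: ([^\r\n]+)/mi', $headers, $m2);

echo "without m: $anchored, with m: $multiline\n";
if ($multiline) {
    echo "File is type: " . $m2[1] . "\n";
}
```

Keeping the anchor and adding the m modifier has the side benefit that a header such as "X-Content-Type: ..." would not be picked up by accident.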

Shawn Wilson wrote:
> [...]
>>> With the code shown, any page with the text "Content-Type: image/png" in it will
>>> claim it's an image.
>>
>> That should be "any *header* of a page" ;p, because I'm not sending a GET
>> request but a HEAD request...
>
> Whoops, didn't notice that. My mistake.

np ;-)

>> Still, I like the principle of your code more :D. Only one problem: the
>> regexp doesn't work (it never matches) and I can't figure out why :-s.
>
> Yeah, I wasn't really confident in it. Looking at it again, I put the "^" in
> there, which means it would only match if the first header was Content-Type.
> Take that out and it may work (too busy to test).

Indeed, it works, great!!! But it seems that the Content-Type can
*sometimes* be "text/plain;charset=UTF-8", and then I only need
"text/plain". I tried "/Content-Type: ([^\r\n]+)[\r|\n];/", but
apparently I don't understand much of those regexps :-s.

If you don't have the time to look into the ";", no problem, I'm already
very (very) glad that you could help this far!

>> Another question: is the exit() required?
>
> You can take that or leave it. I had it in there so the entire page wouldn't be
> opened, while I was thinking you were using GET. You might want to break out of
> the while loop, though, to save a bit of time.

Okay, I see, you're right. But I changed it to "break;" because there is still
some code after the while. It still saves a bit of time :).

Greetings,
Mattias

Jul 17 '05 #9

Just grab the file and see what you have. Image files will almost always have
chr(0), while text files (the exception being UTF-16 encoded ones) will normally
not have it. Hence the following:

$url1 = "http://indicator.amessage.info/indicator.php?param1=cobnet%40jabber.org&amp;param2=bounce&amp;param3=http%3A%2F%2Fstudent.ugent.be%2Fastrid%2Fpics%2Fjabber%2F&amp;param4=.png";
$url2 = "http://indicator.amessage.info/indicator.php?param1=coobnet%40jabber.org&amp;param2=bounce&amp;param3=http%3A%2F%2Fstudent.ugent.be%2Fastrid%2Fpics%2Fjabber%2F&amp;param4=.png";

// Downloads the URL and guesses its type from the presence of a NUL byte.
function TasteTest($url) {
    $data = file_get_contents($url);
    return strchr($data, "\x00") ? "Image" : "Text";
}

echo TasteTest($url1); echo "<br>";
echo TasteTest($url2); echo "<br>";


Jul 17 '05 #10
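
As a variation on the NUL-byte idea above, the first bytes of the data can also be compared against the 8-byte PNG file signature. This is only a sketch that works on bytes already held in a string. The helper names isPng and containsNul are hypothetical, and the sample data is made up rather than fetched from the URLs in the thread.

```php
<?php
// Hypothetical helpers operating on bytes that were already downloaded.

// True when the data starts with the 8-byte PNG file signature.
function isPng($data)
{
    return strncmp($data, "\x89PNG\r\n\x1a\n", 8) === 0;
}

// Heuristic from the post above: a NUL byte suggests binary content.
function containsNul($data)
{
    return strpos($data, "\x00") !== false;
}

// PNG signature followed by the start of the first chunk (made-up sample).
$pngStart = "\x89PNG\r\n\x1a\n\x00\x00\x00\x0dIHDR";
// Made-up text response.
$text = "status: offline\n";

echo (isPng($pngStart) || containsNul($pngStart)) ? "Image" : "Text", "\n";
echo (isPng($text) || containsNul($text)) ? "Image" : "Text", "\n";
```

The signature check is specific to PNG; other image formats would need their own signatures, so the NUL test remains the more general (if rougher) heuristic.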

Chung Leong wrote:
> Just grab the file and see what you have. Image files will almost always have
> chr(0), while text files (the exception being UTF-16 encoded ones) will normally
> not have it. Hence the following: [...] $data = file_get_contents($url);

Damn: although this is the shortest solution, it turns out that I need
PHP 4 >= 4.3.0 for file_get_contents and I have 4.1.2 :-s (which I can't
change, because I don't have the rights). Thx anyway!

Greetings,
Mattias

Jul 17 '05 #11
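
Where file_get_contents() is unavailable, a similar result can presumably be had from fopen() and fread(), which are older functions. The sketch below is an assumption-laden illustration: readAll is a hypothetical name, it has not been tried on PHP 4.1.2, and reading a remote URL this way depends on the allow_url_fopen setting. The demonstration lines at the bottom use a local temporary file and newer functions purely for convenience.

```php
<?php
// Hypothetical stand-in for file_get_contents() built from fopen()/fread().
// Returns the full contents as a string, or false if opening fails.
function readAll($location)
{
    $fp = @fopen($location, "rb");
    if (!$fp) {
        return false;
    }
    $data = "";
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break; // read error: return what was collected so far
        }
        $data .= $chunk;
    }
    fclose($fp);
    return $data;
}

// Local demonstration with a temporary file instead of a remote URL.
// (tempnam/sys_get_temp_dir/file_put_contents are used only for the demo.)
$tmp = tempnam(sys_get_temp_dir(), "taste");
file_put_contents($tmp, "plain text\n");
echo readAll($tmp);
unlink($tmp);
```

The returned string could then be passed to a NUL-byte test like the one in the earlier post.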

Mattias Campe wrote:
>>> Still, I like the principle of your code more :D. Only one problem: the
>>> regexp doesn't work (it never matches) and I can't figure out why :-s.
>>
>> Yeah, I wasn't really confident in it. Looking at it again, I put the "^" in
>> there, which means it would only match if the first header was Content-Type.
>> Take that out and it may work (too busy to test).
>
> Indeed, it works, great!!! But it seems that the Content-Type can
> *sometimes* be "text/plain;charset=UTF-8", and then I only need
> "text/plain". I tried "/Content-Type: ([^\r\n]+)[\r|\n];/", but
> apparently I don't understand much of those regexps :-s.
>
> If you don't have the time to look into the ";", no problem, I'm already
> very (very) glad that you could help this far!

Try "/Content-Type: ([^\r\n;]+)(;[^\r\n]*)?[\r|\n];/"
or "/Content-Type: ([^\r\n\;]+)(;[^\r\n]*)?[\r|\n];/"
I think one of those should work, though they're untested. I can't remember if
you have to escape the ";" or not...

Regards,
Shawn

--
Shawn Wilson
sh***@glassgiant.com
http://www.glassgiant.com
Jul 17 '05 #12

Shawn Wilson wrote:
>>>> Still, I like the principle of your code more :D. Only one problem: the
>>>> regexp doesn't work (it never matches) and I can't figure out why :-s.
>>>
>>> Yeah, I wasn't really confident in it. Looking at it again, I put the "^" in
>>> there, which means it would only match if the first header was Content-Type.
>>> Take that out and it may work (too busy to test).
>>
>> Indeed, it works, great!!! But it seems that the Content-Type can
>> *sometimes* be "text/plain;charset=UTF-8", and then I only need
>> "text/plain". I tried "/Content-Type: ([^\r\n]+)[\r|\n];/", but
>> apparently I don't understand much of those regexps :-s.
>>
>> If you don't have the time to look into the ";", no problem, I'm already
>> very (very) glad that you could help this far!
>
> Try "/Content-Type: ([^\r\n;]+)(;[^\r\n]*)?[\r|\n];/"
> or "/Content-Type: ([^\r\n\;]+)(;[^\r\n]*)?[\r|\n];/"
> I think one of those should work, though they're untested. I can't remember if
> you have to escape the ";" or not...

It doesn't work, but that's okay, it will only be "less good code" ;),
but it will work. Thanks a lot for the help you offered!!!

Greetings,
Mattias

Jul 17 '05 #13
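
The patterns tried above end in "[\r|\n];", which demands a semicolon after the line break, so they cannot match an ordinary header line. One way to get just the media type is to stop the capture at the first ";" or whitespace. This is a hedged sketch, not the thread's solution: mediaType is a hypothetical name and the sample headers are made up.

```php
<?php
// Hypothetical helper: returns the media type from a raw header block
// without parameters such as ";charset=UTF-8", or null if none is found.
function mediaType($headers)
{
    // ^ with the m modifier matches at the start of any line; i ignores case.
    // [ \t]* skips optional blanks after the colon.
    // ([^;\s]+) captures everything up to the first ";" or whitespace.
    if (preg_match('/^Content-Type:[ \t]*([^;\s]+)/mi', $headers, $matches)) {
        return strtolower($matches[1]);
    }
    return null;
}

echo mediaType("HTTP/1.0 200 OK\r\nContent-Type: text/plain;charset=UTF-8\r\n\r\n"), "\n";
echo mediaType("HTTP/1.0 200 OK\r\nContent-Type: image/png\r\n\r\n"), "\n";
```

Lower-casing the result is a design choice made here so that later comparisons such as `=== "image/png"` do not depend on how the server capitalises the type.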

Shawn Wilson wrote:
> I don't know if the headers are case- and whitespace- sensitive or not.


Parts of them might be. In the case of the Content-Type header, only
parameter attribute values may be case sensitive; all other
components, such as MIME media types and subtypes, and parameter
attribute names are case insensitive. Header field names are always
case insensitive.

But I'm afraid it's even more complicated than that. Whitespace, or
rather LWS (Linear White Space -- an optional CRLF sequence followed by
one or more spaces or tabs), can appear in certain places and must
not appear in others. The ABNF of RFC2616, your handy authoritative
source, explains it all. It's spread out over numerous sections
though, so there'll be a lot of flicking back and forth.

Here's what I came up with to match Content-Type headers:

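// Building blocks from the RFC 2616 grammar. Single quotes keep the backslash
// escapes intact for PCRE; the x modifier ignores whitespace in the pattern.
$LWS = '(?:(?:\r\n)?(?:\x20|\x09)+)';
$CHAR = '[\x00-\x7f]';
$TEXT = '(?:[^\x00-\x1f\x7f]|' . $LWS . ')'; // defined for completeness, unused below
$TOKEN = '[^\x00-\x1f\x7f()<>@,;:\\\\\/"\[\]?={}\x20\x09]+';
$QDTEXT = '(?:[^\x00-\x1f\x7f"\\\\]|' . $LWS . ')';
$QUOTEDPAIR = '(?:\\\\' . $CHAR . ')';
$QUOTEDSTRING = '(?:"(?:' . $QDTEXT . '|' . $QUOTEDPAIR . ')*")';

preg_match_all(
    "`^content-type: $LWS* $TOKEN/$TOKEN $LWS*
    (?:;$LWS*$TOKEN=(?:$TOKEN|$QUOTEDSTRING)$LWS*)*`mix",
    $string,
    $matches);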

This does strip some whitespace in some circumstances. Although the
captured string may not be identical to the actual header in terms
of whitespace, the semantics will be the same.

There are likely to be ghastly solecisms in the above as I haven't
thoroughly read through it, tested it, or taken any great time to
throw it together. But the reader should get the general idea. ;-)

--
Jock
Jul 17 '05 #14
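
A lighter-weight alternative to matching the full grammar is to unfold continuation lines first and then index the fields by lower-cased name, which handles the case-insensitivity of field names described above. This is a simplified sketch rather than a complete RFC 2616 parser; parseHeaders is a hypothetical name and the sample header block is made up.

```php
<?php
// Hypothetical helper: turns a raw header block into an array keyed by
// lower-cased field name, after unfolding continuation lines.
function parseHeaders($raw)
{
    // A line break followed by spaces or tabs continues the previous line
    // (header folding); replace the whole run with a single space.
    $unfolded = preg_replace('/\r?\n[ \t]+/', ' ', $raw);

    $fields = array();
    foreach (preg_split('/\r?\n/', $unfolded) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // status line or blank line: no field name here
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $fields[$name] = trim(substr($line, $pos + 1));
    }
    return $fields;
}

// Made-up response with odd capitalisation and a folded Content-Type value.
$raw = "HTTP/1.0 200 OK\r\ncontent-TYPE: text/plain;\r\n\tcharset=UTF-8\r\n\r\n";
$fields = parseHeaders($raw);
echo $fields['content-type'], "\n";
```

Note that a repeated field name simply overwrites the earlier value in this sketch, which is fine for Content-Type but would lose information for headers that may legitimately appear more than once.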
