
Array problems

Hello,

How can I do the following?
I have this code:

<?php
$url     = "http://www.URL.com";   // page to scan
$content = file($url);             // read it as an array of lines

// Matches absolute URLs ending in .html, .php, .shtml, .htm, .xhtml or .xml
$pattern = "/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/](\.(html|php|shtml|htm|xhtml|xml)))/i";

foreach ($content as $line) {
    $count = 0;
    if (preg_match_all($pattern, $line, $urls_back_array)) {
        foreach ($urls_back_array[0] as $url_back) {
            $count++;
            echo $url_back;
        }
    }
}
?>

Now I want to make a loop: my script should count all links on all my *.html
pages. But it must not count any link twice! And it should still count every
link on every HTML page correctly!
Example:
Home
|-Web
|-Forum
|--Site 1
|--Site 2
|--Site 3
|-Download
It should count 7 and list them all! =)

Greetings from Germany.
Jul 17 '05 #1
3 Replies



"Sven Dzepina" <ma**@styleswitch.de> wrote in message
news:3f***********************@newsread4.arcor-online.net...
[...]


I'm playing around here trying to do what you want to do... I'm not good
with regular expressions and the preg tools, but I am using a mixture of
implode and explode to get at the URL of each link (i.e. the href="..." bit
in the <A HREF> tag)... Once I have the website address that the link is
targeted at, I plan on using a mix of parse_url() and pathinfo() to identify
HTML-type files. And in order to avoid duplicates, the addresses will be
written into an array which I will then run through array_unique().

Do these ideas help any?
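
If it's any use, here is the rough shape of what I mean (untested, and the
pattern for pulling out the href values is only a quick guess):

<?php
// Rough sketch (untested): pull the href targets out of a page, keep only
// the HTML-like ones, and drop duplicates with array_unique().
$page = file_get_contents('http://www.URL.com/');

// Grab the href="" value of every <a> tag (quick-and-dirty pattern).
preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $page, $matches);

$html_links = array();
foreach ($matches[1] as $href) {
    $parts = parse_url($href);               // scheme, host, path, query...
    if (!isset($parts['path'])) {
        continue;                            // e.g. "#anchor" or "?page=2"
    }
    $info = pathinfo($parts['path']);
    $ext  = isset($info['extension']) ? strtolower($info['extension']) : '';
    if (in_array($ext, array('html', 'htm', 'shtml', 'xhtml', 'xml', 'php'))) {
        $html_links[] = $href;
    }
}

$html_links = array_unique($html_links);     // no double counting
echo count($html_links), " unique HTML links:\n";
foreach ($html_links as $link) {
    echo $link, "\n";
}
?>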
Jul 17 '05 #2

Sven Dzepina wrote:
[...]
$count = 0;
if(preg_match_all($pattern,$line,$urls_back_array) ){
foreach($urls_back_array[0] as $url_back){
$count++;
echo $url_back;

[...]

I didn't check your regex.

I'd do it somewhat differently:

after preg_match_all(), put the URLs into the index (key) part of an array:

        ### should this be __1__ ?
        foreach ($urls_back_array[0] as $url_back) {
            $large_url_array[$url_back]++;   ## no echo here
        }
    }   ## closes your if
}       ## closes your outer foreach

## echo now!
$total_count = 0;
$unique_urls = 0;
foreach ($large_url_array as $url => $count) {
    echo $url, ' : appears ', $count, ' times<br />';
    $total_count += $count;
    $unique_urls++;
}
echo '<br />Unique URLs: ', $unique_urls, '<br />';
echo '<br />Total links: ', $total_count, '<br />';


NOTE: This was typed directly in the editor and not tested.
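
In case it helps to see it in one piece, a full version of that approach
might look like this (equally untested; it just re-uses your original
$pattern as posted):

<?php
// Untested sketch: the same counting idea assembled into a complete script.
$url     = "http://www.URL.com";
$content = file($url);                        // page as an array of lines

$pattern = "/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/](\.(html|php|shtml|htm|xhtml|xml)))/i";

$large_url_array = array();                   // URL => how often it appears
foreach ($content as $line) {
    if (preg_match_all($pattern, $line, $urls_back_array)) {
        foreach ($urls_back_array[0] as $url_back) {
            if (!isset($large_url_array[$url_back])) {
                $large_url_array[$url_back] = 0;
            }
            $large_url_array[$url_back]++;    // count now, echo later
        }
    }
}

$total_count = 0;
foreach ($large_url_array as $u => $count) {
    echo $u, ' : appears ', $count, " times<br />\n";
    $total_count += $count;
}
echo '<br />Unique URLs: ', count($large_url_array), "<br />\n";
echo '<br />Total links: ', $total_count, "<br />\n";
?>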

--
I have a spam filter working.
To mail me include "urkxvq" (with or without the quotes)
in the subject line, or your mail will be ruthlessly discarded.
Jul 17 '05 #3

Hello Randell,

perhaps I've explained my aim imprecisely.
I want to count all the pages that are linked from a homepage, and list them.
My earlier solution was that I scanned all the links and then listed them
all in a database.
But it was a loop, and so I also fetched all the links back out of the
database to scan them, too!
I hadn't thought about this problem:
if I scan all the links and insert them into a database, and then fetch them
again in the same loop, I always get the same links back.

Greetings.
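
What I probably need is something like this: keep a list of pages I have
already scanned, and only queue links I have not seen before. Just a rough
idea, no database involved, and get_html_links() only stands in for my own
link-extraction code:

<?php
// Sketch of a simple crawl loop that never scans the same page twice.
// get_html_links($url) is a placeholder for the actual link extractor.
$queue   = array('http://www.URL.com/');     // pages still to scan
$visited = array();                          // pages already scanned

while (!empty($queue)) {
    $current = array_shift($queue);          // take the next page
    if (isset($visited[$current])) {
        continue;                            // already done, skip it
    }
    $visited[$current] = true;

    foreach (get_html_links($current) as $link) {
        if (!isset($visited[$link])) {
            $queue[] = $link;                // new link: scan it later
        }
    }
}

echo count($visited), " pages found:\n";
foreach (array_keys($visited) as $page) {
    echo $page, "\n";
}
?>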

"Randell D." <yo**************************@yahoo.com> schrieb im Newsbeitrag
news:NQ_ib.98843$6C4.43373@pd7tw1no...

"Sven Dzepina" <ma**@styleswitch.de> wrote in message
news:3f***********************@newsread4.arcor-online.net...
Hello,

How I can realize that?
I have this code:

<?php
$url = "http://www.URL.com;
$content = file($url);
foreach($content as $line){
$pattern =

"/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/](\.(html|php|shtml|htm|xhtml|xml)))/i";

$count = 0;
if(preg_match_all($pattern,$line,$urls_back_array) ){
foreach($urls_back_array[0] as $url_back){
$count++;
echo $url_back;
}
}
}
?>

Now I want to make a loop - My script should count all links of all my

*html
sites. But the script are not allowed to count double! Also the script

shall
count all links on html sites correctly!
Example:
Home
|-Web
|-Forum
|--Site 1
|--Site 2
|--Site 3
|-Download
It should count 7 and list me all! =)

Gretting from Germany.


I'm playing around here trying to do what you want to do... I'm not good
with my regular expressions using preg tools but I am using a mixture of
implode and explode to get at the url of each link (ie the "href=" bit in
the <A HREF" tag)... Once I have the website address that the link is
targeted at, I plan on using a mix of parse_url() and pathinfo() to

identify html type files. And in order to avoid duplices, the address will be
written in to an array which I will then run against array_unique.

Do these ideas help any?

Jul 17 '05 #4
