Array problems

 
  #1  
July 17th, 2005, 12:32 AM
Sven Dzepina

Hello,

How can I achieve this?
I have this code:

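<?php
$url = "http://www.URL.com";
$content = file($url);

// The hyphen in the character class is escaped so that it is always read as a literal.
$pattern = "/([\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/](\.(html|php|shtml|htm|xhtml|xml)))/i";

// Start the counter once, before the loop, so it is not reset on every line.
$count = 0;
foreach ($content as $line) {
    if (preg_match_all($pattern, $line, $urls_back_array)) {
        foreach ($urls_back_array[0] as $url_back) {
            $count++;
            echo $url_back;
        }
    }
}
?>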

Now I want to make a loop: my script should count all links on all of my
*html pages. But the script must not count any link twice! It should also
count all links on the html pages correctly!
Example:
Home
|-Web
|-Forum
|--Site 1
|--Site 2
|--Site 3
|-Download
It should count 7 and list them all! =)

Greetings from Germany.



  #2  
July 17th, 2005, 12:32 AM
Randell D.

I'm playing around here trying to do what you want to do. I'm not good
with regular expressions and the preg functions, so I am using a mixture of
implode() and explode() to get at the URL of each link (i.e. the href= part
of the <a href="..."> tag). Once I have the address that the link points to,
I plan on using a mix of parse_url() and pathinfo() to identify HTML-type
files. To avoid duplicates, the addresses will be written into an array,
which I will then run through array_unique().

Do these ideas help any?
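
A minimal sketch of the idea described above, under a few assumptions: it uses a small regular expression rather than implode()/explode() to pull out the href values, it only handles quoted href attributes, and the function name extract_html_links() and the sample markup are made up for illustration. The list of extensions is taken from the pattern in the first post.

```php
<?php
// Hypothetical helper: collect the unique links to HTML-type pages found in a chunk of HTML.
function extract_html_links(string $html): array
{
    // Extensions treated as "HTML-type" pages (same list as the pattern in the first post).
    $wanted = ['html', 'php', 'shtml', 'htm', 'xhtml', 'xml'];

    // Grab the value of every quoted href attribute, ignoring case (href or HREF).
    preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);

    $links = [];
    foreach ($matches[1] as $href) {
        // parse_url() returns null or false when there is no usable path component.
        $path = parse_url($href, PHP_URL_PATH);
        if (!is_string($path)) {
            continue;
        }
        // pathinfo() gives the file extension without the dot ('' if there is none).
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $wanted, true)) {
            $links[] = $href;
        }
    }

    // array_unique() keeps the first occurrence; array_values() reindexes from 0.
    return array_values(array_unique($links));
}

// Sample markup, made up for illustration.
$sample = '<a href="http://example.com/web.html">Web</a>'
        . '<A HREF="http://example.com/forum.php?id=1">Forum</A>'
        . '<a href="http://example.com/web.html">Web again</a>'
        . '<a href="http://example.com/logo.png">Logo</a>';

$links = extract_html_links($sample);
print_r($links);
```

Note that duplicates are detected on the full href string, so two links that differ only in their query string count as two different addresses in this sketch.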


  #3  
July 17th, 2005, 12:32 AM
Pedro

Sven Dzepina wrote:
[...]
> $count = 0;
> if(preg_match_all($pattern,$line,$urls_back_array) ){
> foreach($urls_back_array[0] as $url_back){
> $count++;
> echo $url_back;
[...]

I didn't check your regex.

I'd do it somewhat differently:

After preg_match_all(), put the URLs into the index part of an array:

## $large_url_array = array(); goes before the outer foreach

## should this be $urls_back_array[1]?
foreach ($urls_back_array[0] as $url_back) {
    ## start the counter the first time a URL is seen
    if (!isset($large_url_array[$url_back])) {
        $large_url_array[$url_back] = 0;
    }
    $large_url_array[$url_back]++;
    ## no echo
}
} ## if
} ## foreach

## echo now!
$total_count = 0;
$unique_urls = 0;
foreach ($large_url_array as $url => $count) {
    echo $url, ' : appears ', $count, ' times<br />';
    $total_count += $count;
    $unique_urls++;
}
echo '<br />Unique URLs: ', $unique_urls, '<br />';
echo '<br />Total links: ', $total_count, '<br />';

NOTE: This was typed directly in the editor and not tested.
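
For reference, a self-contained variant of the counting idea above that can be run on its own. It is only a sketch: the $found array is made-up sample data standing in for the URLs that preg_match_all() would return across all lines.

```php
<?php
// Made-up sample data: URLs as they might come back from preg_match_all() over several lines.
$found = [
    'http://example.com/web.html',
    'http://example.com/forum.php',
    'http://example.com/web.html',
    'http://example.com/download.htm',
    'http://example.com/forum.php',
    'http://example.com/web.html',
];

// Use each URL as an array key; the value is how often it was seen.
$large_url_array = [];
foreach ($found as $url_back) {
    if (!isset($large_url_array[$url_back])) {
        $large_url_array[$url_back] = 0;   // first sighting: start the counter
    }
    $large_url_array[$url_back]++;
}

$unique_urls = count($large_url_array);      // number of distinct keys
$total_count = array_sum($large_url_array);  // sum of all per-URL counters

foreach ($large_url_array as $url => $count) {
    echo $url, ' : appears ', $count, " times\n";
}
echo 'Unique URLs: ', $unique_urls, "\n";
echo 'Total links: ', $total_count, "\n";
```

PHP's built-in array_count_values($found) builds the same URL-to-count array in one call, which may be simpler when all matches are already collected in a single list.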
  #4  
July 17th, 2005, 12:33 AM
Sven Dzepina

Hello Randell,

perhaps I explained my aim imprecisely.
I want to count all pages that are linked from a homepage and list them.
My earlier solution was to scan all links and store them in a database.
But that ran in a loop, so I also fetched all links back from the database
to scan them, too!
I didn't think of this problem:
If I scan all links, insert them into a database, and fetch them again in
the same loop, then I always get the same links.
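
One common way around this kind of endless re-scanning is to keep a "seen" set next to the queue of pages still to be scanned, so that every page is queued at most once. The following is only a sketch under stated assumptions: the crawl() function and the $site array are hypothetical, and $site is made-up data that stands in for "fetch a page and extract its links" (it mirrors the example tree from the first post).

```php
<?php
// Made-up data: each page mapped to the pages it links to.
$site = [
    'home'     => ['web', 'forum', 'download'],
    'web'      => ['home'],
    'forum'    => ['site1', 'site2', 'site3', 'home'],
    'site1'    => ['forum'],
    'site2'    => ['forum', 'site3'],
    'site3'    => [],
    'download' => ['home', 'web'],
];

// Hypothetical helper: visit every page reachable from $start exactly once.
function crawl(string $start, callable $get_links): array
{
    $seen  = [$start => true];  // every page ever queued, used as a set
    $queue = [$start];          // pages still waiting to be scanned

    while ($queue) {
        $page = array_shift($queue);
        foreach ($get_links($page) as $link) {
            if (isset($seen[$link])) {
                continue;        // already queued or scanned: do not visit it again
            }
            $seen[$link] = true;
            $queue[] = $link;
        }
    }
    return array_keys($seen);
}

$pages = crawl('home', function (string $page) use ($site): array {
    return $site[$page] ?? [];
});

echo count($pages), " pages\n";
print_r($pages);
```

With a database instead of arrays, the same idea would amount to marking each stored link as scanned and only fetching links that are not yet marked.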

Greetings.
 
