Importing an external relative link to your page

shishisu
I figure this might be a fun challenge for some of you out there.
I appreciate all the help. Thanks in advance.

Here we go...
I have 2 web pages (on different servers) that I am working on. Here, we'll call them Page-A and Page-B.

I want to save some work by maintaining only 1 web page and having the changes made to the source page reflected on the other one. In this case, Page-A is the source, and Page-B should just have a content-grabbing PHP script.

I thought of using ob_get_contents() to grab Page-A's HTML code and insert it into Page-B. However, doing that means I would have to create a whole mirror set of files and images on both Page-A's server and Page-B's server.
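For reference, here is a minimal sketch of what output buffering does. Note that ob_get_contents() and its relatives only capture what the currently running script prints, so they help when Page-A's content is generated on the same server. Pulling HTML from a different server would need an HTTP request instead (for example file_get_contents() with a URL, which depends on allow_url_fopen being enabled). The helper name capture_output is hypothetical.

```php
<?php
// Hypothetical helper: run a callable and return whatever it printed.
function capture_output(callable $render): string
{
    ob_start();                      // start buffering output
    $render();                       // anything echoed here lands in the buffer
    return (string) ob_get_clean();  // return the buffer contents and stop buffering
}

// Stand-in for the code that renders Page-A's content.
$html = capture_output(function () {
    echo '<p>Hello from Page-A</p>';
});
```

After this runs, $html holds the markup as a string, ready to be searched for links before it is echoed on Page-B.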

Page-A has both links to other domains (absolute URLs) and links to local pages/files (relative URLs), so the code needed here will have to:

1. Find all the URLs on Page-A.

2. Determine whether each one is a relative URL or an absolute URL.

3a. If it is an absolute URL, Page-B just goes ahead and grabs that URL.

3b. If it is a relative link, Page-B will have to convert that link into an absolute URL (adding Server-B's domain name to the URL).
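
Step 2 above could be sketched roughly like this. It is only an illustration: the function name is_absolute_url is made up, and it treats anything that starts with a scheme ("http:", "https:", "mailto:", ...) or with "//" as absolute, and everything else as relative.

```php
<?php
// Hypothetical helper: decide whether a link already names its own host/scheme.
function is_absolute_url(string $url): bool
{
    // Protocol-relative links ("//host/path") already include a host.
    if (strncmp($url, '//', 2) === 0) {
        return true;
    }
    // A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
    return preg_match('/^[a-z][a-z0-9+.\-]*:/i', $url) === 1;
}
```

With this check, "images/logo.png", "/about.html" and "../index.html" all count as relative, while "http://www.example.com/" counts as absolute.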
Jun 6 '07 #1
pbmods
Changed thread title: Removed superfluous 'fun challenge'.

Heya, shishisu. Welcome to TSDN!

Sounds like an ambitious project. How far have you gotten?
Jun 6 '07 #2
jx2
Well, it's not difficult; all you need is some regular expressions :-)
But I don't have your code, and it would take more than a while :-)

good luck
Jun 6 '07 #3
Hahah, thanks. I thought most challenges should be fun.

What I have so far...

I can now import sections of the HTML content from Page-A and display them on Page-B by using the output buffer.

I figured out how to explode a URL and append the server domain.

Now my trouble is just writing the code to:
1. find all the a (href) and img tags
2. grab these link paths
3. figure out if they are local links (relative) or external links (absolute).
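
One possible way to do steps 1 and 2 without regular expressions is PHP's DOMDocument class (this assumes the DOM extension is available, which it usually is). This is a rough sketch, not the only approach; the function name extract_links and the shape of the returned array are made up for the example.

```php
<?php
// Hypothetical helper: list the href of every <a> and the src of every <img>.
function extract_links(string $html): array
{
    if ($html === '') {
        return [];                                   // nothing to parse
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);    // keep warnings about sloppy HTML quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    // Map each tag name to the attribute that carries its URL.
    foreach (['a' => 'href', 'img' => 'src'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            if ($node->hasAttribute($attr)) {
                $links[] = ['tag' => $tag, 'attr' => $attr, 'url' => $node->getAttribute($attr)];
            }
        }
    }
    return $links;
}
```

The result lists all the a links first and then all the img links, each with its tag, attribute and URL, so step 3 can be run on every 'url' entry.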
Jun 7 '07 #4
pbmods
Ok. Here's what I'm thinking:
  1. preg_split your HTML by '/<a|img/' (this assumes well-formed HTML; you might need to get a little tricky... '/<\s*(a|img)/i'). Then you know where your a|img tags are.
  2. preg_match each element in the resulting array (starting with the 1st [not 0th] element) for... hm. Something like '/(http:\/\/www\..+?\.\w+)?([^"\s])*/'. This is a REALLY rough (and inefficient!) regexp; you can probably come up with a better one.
  3. In terms of figuring out which is which... hm. You could preg_match_all the original string again to determine the order of the a|img tags. You could check the extension of the file in the URL (not much of a guarantee these days). I'm out of ideas. Assuming, of course, that knowing which are HREFs and which are SRCs is important.
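
A rough sketch of the regular-expression route, done in a single preg_match_all pass instead of a split followed by a match. Two notes: the tag names need to be grouped, because '/<a|img/' on its own matches either "<a" or a bare "img" anywhere; and capturing the tag and attribute names along with the URL answers the "which are HREFs and which are SRCs" question directly. The pattern and the function name find_link_attributes are assumptions for illustration, they only handle quoted attribute values, and a regexp will not cope with every HTML quirk.

```php
<?php
// Hypothetical helper: find href/src values inside <a> and <img> tags with one regexp.
function find_link_attributes(string $html): array
{
    // 1: tag name, 2: attribute name, 3: opening quote, 4: the URL itself.
    $pattern = '/<\s*(a|img)\b[^>]*?\b(href|src)\s*=\s*(["\'])(.*?)\3/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'tag'  => strtolower($m[1]),   // normalise <IMG> to img
            'attr' => strtolower($m[2]),
            'url'  => $m[4],
        ];
    }
    return $links;
}
```

Because PREG_SET_ORDER returns matches in document order, the resulting array also preserves the order of the a and img tags.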
Jun 7 '07 #5
I am thinking that after finding the tags, I will check whether the link starts with a '/' or a '.', which shows that it is a relative link.

If a link is relative, then I will combine it with Server-B's domain name.
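
A minimal sketch of that last step, with some assumptions spelled out. Links such as "images/logo.png" are also relative even though they do not start with '/' or '.', so this version checks for a scheme instead. The function name to_absolute_url and the example.com addresses are hypothetical; $base stands for the address of whichever page the relative links should be resolved against. Query strings, fragment-only links and a trailing "." are not handled.

```php
<?php
// Hypothetical helper: turn a relative link into an absolute one, given a base page URL.
function to_absolute_url(string $url, string $base): string
{
    // Already absolute (has a scheme) or protocol-relative: leave it alone.
    if (preg_match('/^([a-z][a-z0-9+.\-]*:|\/\/)/i', $url) === 1) {
        return $url;
    }
    $parts = parse_url($base);
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    if ($url !== '' && $url[0] === '/') {
        $path = $url;                                  // root-relative link
    } else {
        $basePath = $parts['path'] ?? '/';
        // Keep the base path up to and including its last "/", then append the link.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $url;
    }
    // Resolve "." and ".." segments; never pop the leading empty segment (the root).
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '.') {
            continue;
        }
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $segment;
    }
    return $root . implode('/', $out);
}
```

For example, with a base of http://www.example.com/docs/page.html, the link ../img/a.gif resolves to http://www.example.com/img/a.gif.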
Jun 7 '07 #6
