If this is not the right group for this question, please advise me of a
better one.
I have a collection of simple HTML files, many of which just contain a
paragraph or two of text. Some contain just an IMG and a one line
caption. I would like to find a tool that will load the first page
(call it toc.html) and anywhere it finds a link, it should replace the
link with the BODY of the linked page. If that BODY contains further
links, it should be similarly processed. (I don't care if the file is
processed recursively or if I have to run several passes of the tool.)
For example, suppose I have 4 files like so (I'm not showing the HTML
tags):
toc.html
-------
Table of Contents
Article 1
[link to article1.html]
Article 2
[link to article2.html]
article1.html
-----------
This is article 1. It is very short.
article2.html
-----------
This is article 2. It contains a tip.
[link to tip.html]
tip.html
-------
Don't put things in your ears.
I want to be able to process toc.html and end up with an HTML file like
this:
newtoc.html
----------
Table of Contents
Article 1
This is article 1. It is very short.
Article 2
This is article 2. It contains a tip.
Don't put things in your ears.
In my case, the nesting goes a few levels deeper. Some articles have 20
categories of tips; each category page has one or more tips linked by
the title of the tip. I would like to retain the HTML markup (it is
just very simple headings, paragraphs, italics, etc.) so that I can
process the combined file with html2latex and end up with a nice
looking PDF I can print and read away from a computer.
I am sure I am not the first person wanting to do something like this,
but so far I have not been able to come up with the right input to
Google to find a premade tool.
ma*****@gmail.com wrote:
[snip..]
Do you really want to use links, as in <a href=...>, or are you just
looking for a method? If you have PHP installed on the server, you can
just call these other files like so: <?php include "filename.ext" ?>
Any server-side language will do, though, and then there's Server Side
Includes (SSI) (which I know nothing about :-) )
--
Els http://locusmeus.com/
Sonhos vem. Sonhos vão. O resto é imperfeito.
- Renato Russo -
Now playing: Pearl Jam - Dirty Frank
I already have the files, and they do use links. I didn't make them
this way -- I'm just trying to make the best of a bad situation. I
think they were probably made this way because they are from the early
1990s and were probably accessed over slow dialup connections.
I forgot to mention in my first post that I am using Mac OS X, but I
have access to Linux, Windows, and DOS platforms too.
ma*****@gmail.com wrote:
[snip..]
I have a good memory, and I do remember my post of half an hour ago,
and even yours (although not literally). But not everybody who sees
your message has seen or remembered the previous one in the thread.
So, please quote the relevant bits of the post you are replying to,
and reply underneath.
Back to your question: am I understanding you correctly, that
basically you want to change a bunch of regular links to in-page
anchor links?
I think with a bit of good thinking and some regex in combination with
includes you can actually do that. Personally I'd call a friend with
programming skills to do it for me ;-)
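For the in-page anchor reading of the question, the link rewrite itself is a single regex substitution. A minimal Python sketch, assuming every local link is written as href="name.html" in double quotes and that each inlined page will later be wrapped in an element whose id is its file name without the extension (a hypothetical convention, not something the existing files have):

```python
import re

# href="name.html" or href="name.htm" for a file in the same directory.
# Double quotes only; hrefs containing '/', '#' or ':' (e.g. http:) are left alone.
_LOCAL_HREF_RE = re.compile(r'href="([^"#/:]+?)\.html?"', re.IGNORECASE)


def to_anchor_links(html):
    """Rewrite href="pageX.html" as href="#pageX" (an in-page anchor)."""
    return _LOCAL_HREF_RE.sub(lambda m: 'href="#%s"' % m.group(1), html)
```

Each included page then still needs a matching wrapper such as <div id="pageX">, which is where the includes come in.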
If you just want to have them all in one file, as if the links were
replaced by the files, then why not just replace
<a href="pageX.html">Page X</a> with <?php include "pageX.html" ?> ?
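That substitution does not have to be done by hand. A rough Python sketch using a regular expression, assuming double-quoted hrefs and that only relative links to .html/.htm files should become includes (anything containing a colon, such as http: or mailto:, is skipped); the function name is made up for illustration:

```python
import re

# A whole <a ...>...</a> element whose href is a relative .html/.htm file.
_LINK_RE = re.compile(
    r'<a\s[^>]*?href="([^":]+\.html?)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def links_to_php_includes(html):
    """Replace each local <a href="X.html">...</a> with a PHP include of X.html."""
    return _LINK_RE.sub(lambda m: '<?php include "%s" ?>' % m.group(1), html)
```

Note that PHP's include outputs the whole file, <html> and <head> included, so the included pages would still need their wrappers stripped.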
--
Els http://locusmeus.com/
On 24 Mar 2005 11:39:36 -0800, ma*****@gmail.com wrote: I would like to find a tool that will load the first page (call it toc.html) and anywhere it finds a link, it should replace the link with the BODY of the linked page.
This is one reason I always author as XHTML. This would be pretty
easy with XSLT.
You should be able to do it with Perl, or most other scripting
languages. The ease of doing it depends on how "clean" the original
code is.
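The point about XHTML is that well-formed XML can be handled by a standard XML parser, with no tag-soup guessing. This is not XSLT; it is a rough Python equivalent using the standard xml.etree.ElementTree module, assuming the pages really are well-formed XHTML in the usual XHTML namespace. Named entities such as &nbsp; that XML does not predefine will likely make the parser fail.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def body_inner_xml(xhtml):
    """Return the markup inside <body> of a well-formed XHTML document."""
    root = ET.fromstring(xhtml)
    body = root.find(XHTML_NS + "body")
    if body is None:
        raise ValueError("document has no XHTML <body>")
    # Drop the namespace so children serialize as plain <p>, <em>, ...
    for el in body.iter():
        if isinstance(el.tag, str) and el.tag.startswith(XHTML_NS):
            el.tag = el.tag[len(XHTML_NS):]
    parts = [body.text or ""]
    for child in body:
        # ET.tostring also writes the child's tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).strip()
```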
Andy Dingley wrote: This is one reason I always author as XHTML. This would be pretty easy with XSLT.
And this could also be done very easily (I should say, in an easier way
:o) ) using plain HTML and any programming language that has a RegExp or
DOM API, and there are a lot of them.
ma*****@gmail.com wrote: I would like to find a tool that will load the first page (call it toc.html) and anywhere it finds a link, it should replace the link with the BODY of the linked page.
What you describe is the third example on the page describing markup
macros in mod_publisher. At its simplest you'd use
MLMacro a replace url @href
to replace all <a ...> links with the contents of a page referenced in
the href attribute.
Don't forget that if you're inserting HTML, you need to preprocess it
to remove everything that isn't body contents. To do that you'd apply
several macros to the included page:
MLMacro html replace start ""
MLMacro html replace end ""
MLMacro head hide
MLMacro body replace start "<div class=\"included\">"
MLMacro body replace end </div>
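Without Apache and mod_publisher, the same preprocessing (keep only what is inside BODY, then wrap it in <div class="included">) can be approximated in Python. This is a regex-based sketch, not mod_publisher's implementation; it assumes the markup is clean enough that <body> and </body> appear at most once, and falls back to stripping <head>, <html>, <body> and DOCTYPE when a page has no BODY tag:

```python
import re

_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
_HEAD_RE = re.compile(r"<head\b.*?</head\s*>", re.IGNORECASE | re.DOTALL)
_WRAPPER_RE = re.compile(r"<!DOCTYPE[^>]*>|</?(?:html|body)\b[^>]*>", re.IGNORECASE)


def included_fragment(page):
    """Keep only the body contents of `page`, wrapped in <div class="included">."""
    m = _BODY_RE.search(page)
    if m:
        inner = m.group(1)
    else:
        # No <body> tag: remove <head>...</head> and any html/body/DOCTYPE tags.
        inner = _WRAPPER_RE.sub("", _HEAD_RE.sub("", page))
    return '<div class="included">%s</div>' % inner.strip()
```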
If you're processing badly broken markup, you might also need
to apply MLExtendedFixups. But for anything half-decent, the
above should be sufficient.
http://apache.webthing.com/mod_publisher/
--
Nick Kew
ma*****@gmail.com wrote:
[snip..]
This would definitely be (reasonably) easy in most modern scripting
languages. My personal favourite is Python. There is a Python HTML
parser called Beautiful Soup that would almost make this trivial.
Are the links relative (i.e. should they be loaded from a filesystem)
or absolute URLs (loaded from the internet)?
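Either way, each href has to be resolved against the page it appears in before it can be loaded. A minimal sketch using Python's urllib.parse.urljoin; the helper name resolve_link is hypothetical, and it assumes local files are given as POSIX-style paths (Windows backslash paths are not handled):

```python
from urllib.parse import urljoin, urlparse


def resolve_link(base, href):
    """Resolve `href` relative to the page `base` it appears in.

    Returns (resolved, is_remote): is_remote is True for http(s) URLs that
    must be fetched from the network, False for paths on the local filesystem.
    """
    resolved = urljoin(base, href)
    return resolved, urlparse(resolved).scheme in ("http", "https")
```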
I've already used BeautifulSoup to write a link checker that crawls all
URLs within a single domain. See http://www.voidspace.org.uk/python/programs.shtml
I can hack it over the weekend to do what you need.... You'll need to
wait until Tuesday though - I'm off the internet until then. The first
version will fetch files from a local filesystem and just insert the
contents of the BODY tag (recursively) instead of the link. If you want
any additional processing we can discuss it.
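A first version along those lines (read local files, recursively put the BODY of each linked page in place of the link) can be sketched with the standard library alone. This uses regular expressions instead of Beautiful Soup, so it assumes fairly clean markup: double-quoted href attributes and at most one <body> per page. The names inline_links and file_loader, and Latin-1 as the encoding of the old files, are assumptions for illustration:

```python
import posixpath
import re
from pathlib import Path

# Contents of <body>...</body>; a page without a BODY tag is used whole.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
# A complete <a ...>...</a> whose href is a relative .html/.htm file.
# hrefs containing ':' (http:, mailto:) or '#' are deliberately skipped.
_LINK_RE = re.compile(
    r'<a\s[^>]*?href="([^":#]+\.html?)"[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL
)


def body_of(html):
    m = _BODY_RE.search(html)
    return m.group(1) if m else html


def inline_links(name, load, _seen=frozenset()):
    """Return the BODY of page `name`, with each local link replaced
    (recursively) by the BODY of the page it points to.

    `load(name)` returns a page's HTML, or None if the page does not exist.
    `_seen` holds the chain of pages currently being expanded, so a page
    that links back to one of its ancestors does not recurse forever.
    """
    seen = _seen | {name}
    html = load(name)
    if html is None:
        raise FileNotFoundError(name)
    base = posixpath.dirname(name)

    def expand(m):
        target = posixpath.normpath(posixpath.join(base, m.group(1)))
        if target in seen or load(target) is None:
            return m.group(0)  # cycle or missing page: keep the original link
        return inline_links(target, load, seen)

    return _LINK_RE.sub(expand, body_of(html))


def file_loader(root):
    """Loader that reads pages from a directory (Latin-1 assumed)."""
    root = Path(root)

    def load(name):
        path = root / name
        return path.read_text(encoding="latin-1") if path.is_file() else None

    return load
```

For a directory of pages, something like inline_links("toc.html", file_loader("/path/to/pages")) (a placeholder path) would return the combined body, which can be wrapped in <html><body>...</body></html> and handed to html2latex. Passing a dict's .get as the loader makes it easy to try on in-memory pages first.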
Regards,
Fuzzy http://www.voidspace.org.uk/python