XSL for removing words less than 4 letters in a sitemap

I need to transform this:

<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://localhost/index.php/index./Paths-for-the-extreme-player</loc>
</url>
<url>
<loc>http://localhost/index.php/index.php...e-edge-of-the-wall</loc>
</url>
</urlset>

into this:

<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://localhost/index.php/index./Books/Paths-for-the-extreme-player</loc>
<news:news>
<news:keywords>Books, Paths, extreme, player</news:keywords>
</news:news>
</url>
<url>
<loc>http://localhost/index.php/index.php...e-edge-of-the-wall</loc>
<news:news>
<news:keywords>Games, edge, wall</news:keywords>
</news:news>
</url>
</urlset>

I mean, I need a template for creating a <news:keywords> tag which
contains all the words from the <loc> tag that have more than 3
letters.
Apr 1 '08 #1
Olagato wrote:
> I mean, I need a template for creating a <news:keywords> tag which
> contains all the words from the <loc> tag that have more than 3
> letters.

Do you want to use XSLT 2.0 or 1.0?
What about words like 'localhost' or 'index', how do you decide that
those are not taken?

Here is an XSLT 2.0 stylesheet that should show you an approach using
the tokenize function:

<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:news="http://example.com/2008/news"
xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
exclude-result-prefixes="sm"
version="2.0">

<xsl:output method="xml" indent="yes"/>

<xsl:strip-space elements="*"/>

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="sm:url">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
<news:news>
<news:keywords>
<xsl:value-of
select="for $s in tokenize(sm:loc, '/')[position() &gt; 5]
return tokenize($s, '[\-/]')[string-length(.) &gt; 3]"
separator=", "/>
</news:keywords>
</news:news>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>

The result with Saxon 9 when run against your posted input sample (with a
'root' element added and a namespace chosen for the 'news' prefix) is
<root>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>

<loc>http://localhost/index.php/index./Paths-for-the-extreme-player</loc>
<news:news xmlns:news="http://example.com/2008/news">
<news:keywords>Paths, extreme, player</news:keywords>
</news:news>
</url>
<url>

<loc>http://localhost/index.php/index.php/Games/The-edge-of-the-wall</loc>
<news:news xmlns:news="http://example.com/2008/news">
<news:keywords>Games, edge, wall</news:keywords>
</news:news>
</url>
</urlset>
</root>
--

Martin Honnen
http://JavaScript.FAQTs.com/
Apr 2 '08 #2
Olagato wrote:
>> Do you want to use XSLT 2.0 or 1.0?
> I'm using XSLT 1.0.
>> What about words like 'localhost' or 'index', how do you decide that those are not taken?
> That's not a problem for now. Maybe an expression like the following:
> translate(translate(substring-after(sm:loc, 'http://localhost/index.php/index.php/'), '-', ','), '/', ',')
>
> I'm trying your XSL from PHP without success:

PHP only supports XSLT 1.0, so my posted stylesheet, which uses XSLT and
XPath 2.0 functionality, does not work with PHP's XSLT processor.
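
For readers who have to stay on XSLT 1.0, one possible route (a sketch, not something posted in the thread) is a recursive named template that splits the string by hand. The `prefix` parameter and the `keywords` template name are assumptions made for illustration, and the 'news' namespace is the same placeholder used in the stylesheet above. The stripping of the leading part of the URL follows the substring-after idea quoted above.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:news="http://example.com/2008/news"
    xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
    exclude-result-prefixes="sm"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumption: every loc starts with this prefix; adjust it to the real site. -->
  <xsl:param name="prefix" select="'http://localhost/index.php/index.php/'"/>

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:keywords>
          <xsl:call-template name="keywords">
            <!-- Turn '/' into '-' so only one delimiter has to be handled. -->
            <xsl:with-param name="text"
              select="translate(substring-after(sm:loc, $prefix), '/', '-')"/>
          </xsl:call-template>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: emits the words of $text longer than 3 characters,
       separated by ', '. $first stays true until a word has been written. -->
  <xsl:template name="keywords">
    <xsl:param name="text"/>
    <xsl:param name="first" select="true()"/>
    <xsl:variable name="word" select="substring-before(concat($text, '-'), '-')"/>
    <xsl:variable name="rest" select="substring-after($text, '-')"/>
    <xsl:variable name="keep" select="string-length($word) &gt; 3"/>
    <xsl:if test="$keep">
      <xsl:if test="not($first)">, </xsl:if>
      <xsl:value-of select="$word"/>
    </xsl:if>
    <xsl:if test="$rest != ''">
      <xsl:call-template name="keywords">
        <xsl:with-param name="text" select="$rest"/>
        <xsl:with-param name="first" select="$first and not($keep)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

For a loc such as http://localhost/index.php/index.php/Games/The-edge-of-the-wall this should give "Games, edge, wall". A loc that does not start with the assumed prefix yields an empty keyword list, because substring-after then returns an empty string.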
--

Martin Honnen
http://JavaScript.FAQTs.com/
Apr 3 '08 #3
On 3 Apr, 13:06, Martin Honnen <mahotr...@yahoo.de> wrote:
> PHP only supports XSLT 1.0 so my posted stylesheet using XSLT and XPath
> 2.0 functionality does not work with PHP's XSLT processor.

Implementing your posted version with only XSLT 1.0 functionality seems
quite difficult because of the lack of advanced functions (at least for an
XSL newbie like me), so my only alternative would be to use an external
XSLT processor. I'll try Xalan on the server: http://xalan.apache.org/
Any other ideas using XSLT 1.0 would be appreciated.
Apr 3 '08 #4
Olagato wrote:
> So my only alternative would be to use an external XSLT processor.
> I'll try Xalan on the server: http://xalan.apache.org/
> Any other ideas using XSLT 1.0 would be appreciated.

Xalan does not do XSLT 2.0 so if you want to use XSLT 2.0 then try Saxon
(http://saxon.sourceforge.net/) or Gestalt
(http://gestalt.sourceforge.net/) or AltovaXML
(http://www.altova.com/altovaxml.html).

If you want to use PHP then I think PHP supports EXSLT so you could try
to use http://www.exslt.org/str/functions/tokenize/index.html
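
A minimal sketch of how the EXSLT route might look, assuming the processor implements str:tokenize (libxslt, which PHP's XSL extension is built on, does; in PHP, XSLTProcessor::hasExsltSupport() can be used to check). The `prefix` parameter is an assumption for illustration, and the 'news' namespace is the same placeholder as in the XSLT 2.0 stylesheet.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    xmlns:news="http://example.com/2008/news"
    xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
    extension-element-prefixes="str"
    exclude-result-prefixes="sm"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumption: every loc starts with this prefix; adjust it to the real site. -->
  <xsl:param name="prefix" select="'http://localhost/index.php/index.php/'"/>

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:keywords>
          <!-- str:tokenize treats each character of '-/' as a delimiter and
               returns one token element per word; keep those longer than 3. -->
          <xsl:for-each
            select="str:tokenize(substring-after(sm:loc, $prefix), '-/')[string-length(.) &gt; 3]">
            <xsl:if test="position() &gt; 1">, </xsl:if>
            <xsl:value-of select="."/>
          </xsl:for-each>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the XPath 2.0 tokenize function, str:tokenize takes a plain list of delimiter characters rather than a regular expression, so '-/' here means "split on '-' or '/'".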

--

Martin Honnen
http://JavaScript.FAQTs.com/
Apr 3 '08 #5
On 3 Apr, 16:45, Martin Honnen <mahotr...@yahoo.de> wrote:
> Xalan does not do XSLT 2.0 so if you want to use XSLT 2.0 then try Saxon
> (http://saxon.sourceforge.net/) or Gestalt
> (http://gestalt.sourceforge.net/) or AltovaXML
> (http://www.altova.com/altovaxml.html).

Thank you very much, Martin.
It's now working fine with Altova XMLSpy and Saxon 9 as the external XSLT
processor:
http://216.239.59.104/search?q=cache...ient=firefox-a

There are only 2 little issues left:

My XML input is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://localhost/index.php/index.php...site/Rutas-de-verano-en-España</loc>
<lastmod>2008-03-13</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://localhost/index.php/index.php...site/Rutas/El-Camino-de-Santiago-en-el-Sobrarbe</loc>
<lastmod>2008-02-12</lastmod>
<changefreq>weekly</changefreq>
<priority>0.7</priority>
</url>
</urlset>

Your XSLT 2.0 is:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
exclude-result-prefixes="sm" version="2.0">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="sm:url">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
<news:news>
<news:publication_date>
<xsl:value-of select="sm:lastmod"/>
</news:publication_date>
<news:keywords>
<xsl:value-of select="for $s in tokenize(sm:loc, '/')[position()
&gt; 5]
return tokenize($s, '[\-/]')[string-length(.)
&gt; 3]" separator=", "/>
</news:keywords>
</news:news>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

The output is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://localhost/index.php/index.php...site/Rutas-de-verano-en-España</loc>
<lastmod>2008-03-13</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
<news:news xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<news:publication_date>2008-03-13</news:publication_date>
<news:keywords>ezwebin_site, Rutas, verano, España</news:keywords>
</news:news>
</url>
<url>
<loc>http://localhost/index.php/index.php...site/Rutas/El-Camino-de-Santiago-en-el-Sobrarbe</loc>
<lastmod>2008-02-12</lastmod>
<changefreq>weekly</changefreq>
<priority>0.7</priority>
<news:news xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<news:publication_date>2008-02-12</news:publication_date>
<news:keywords>ezwebin_site, Rutas, Camino, Santiago, rt</news:keywords>
</news:news>
</url>
</urlset>

But I need an output like defined by News Sitemap Protocol:
http://www.google.com/support/webmas...y?answer=42738

So there are 2 things left:
1- <lastmod> tags should disappear from the <url> output because a
<news:publication_date> tag has already been defined.
2- The xmlns:news namespace declaration should disappear from the
<news:news> tags and move to the
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> tag at the top.

A good output file would be:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>http://localhost/index.php/index.php...site/Rutas-de-verano-en-España</loc>
<changefreq>daily</changefreq>
<priority>0.8</priority>
<news:news>
<news:publication_date>2008-03-13</news:publication_date>
<news:keywords>ezwebin_site, Rutas, verano, España</news:keywords>
</news:news>
</url>
<url>
<loc>http://localhost/index.php/index.php...site/Rutas/El-Camino-de-Santiago-en-el-Sobrarbe</loc>
<changefreq>weekly</changefreq>
<priority>0.7</priority>
<news:news>
<news:publication_date>2008-02-12</news:publication_date>
<news:keywords>ezwebin_site, Rutas, Camino, Santiago, rt</news:keywords>
</news:news>
</url>
</urlset>

Any ideas?


Apr 8 '08 #6
Olagato wrote:
> So there are 2 things left:
> 1- <lastmod> tags should disappear from the <url> output because a
> <news:publication_date> tag has already been defined.
> 2- The xmlns:news namespace declaration should disappear from the
> <news:news> tags and move to the
> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> tag at the top.

Both are easy adaptations: use the predicate [not(self::sm:lastmod)] to
skip the lastmod elements, and use xsl:namespace to make sure a
namespace declaration is created on the root element:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
    xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
    exclude-result-prefixes="sm"
    version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="sm:urlset">
    <xsl:copy>
      <xsl:namespace name="news"
        select="'http://www.google.com/schemas/sitemap-news/0.9'"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()[not(self::sm:lastmod)]"/>
      <news:news>
        <news:publication_date>
          <xsl:value-of select="sm:lastmod"/>
        </news:publication_date>
        <news:keywords>
          <xsl:value-of
            select="for $s in tokenize(sm:loc, '/')[position() &gt; 5]
                    return tokenize($s, '[\-/]')[string-length(.) &gt; 3]"
            separator=", "/>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
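
As a side note, the lastmod elements can also be dropped with an empty template rule instead of the predicate. This is a different technique from the one in the stylesheet above, sketched here on a reduced stylesheet that handles only the lastmod part (no keywords, no root namespace declaration). It uses nothing beyond XSLT 1.0.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
    xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
    exclude-result-prefixes="sm"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything that has no more specific rule. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty rule: a matched lastmod element produces no output. It has a
       higher default priority than the identity template, so it wins. -->
  <xsl:template match="sm:lastmod"/>

  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <!-- The value is read from the source tree, so it is still
             available even though lastmod itself is not copied. -->
        <news:publication_date>
          <xsl:value-of select="sm:lastmod"/>
        </news:publication_date>
      </news:news>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The predicate keeps the decision local to the url template, while the empty rule removes lastmod wherever templates are applied to it; which one reads better is largely a matter of taste.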
--

Martin Honnen
http://JavaScript.FAQTs.com/
Apr 9 '08 #7
