XSL for removing words shorter than 4 letters in a sitemap

Olagato

I need to transform this:

<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://localhost/index.php/index./Paths-for-the-extreme-player</loc>
  </url>
  <url>
    <loc>http://localhost/index.php/index.php...e-edge-of-the-wall</loc>
  </url>
</urlset>

into this:

<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://localhost/index.php/index./Books/Paths-for-the-extreme-player</loc>
    <news:news>
      <news:keywords>Books, Paths, extreme, player</news:keywords>
    </news:news>
  </url>
  <url>
    <loc>http://localhost/index.php/index.php...e-edge-of-the-wall</loc>
    <news:news>
      <news:keywords>Games, edge, wall</news:keywords>
    </news:news>
  </url>
</urlset>

I mean, I need a template that creates a <news:keywords> tag
containing all the words from the <loc> tag that have more than 3
letters.

Apr 1 '08 #1

Martin Honnen

Olagato wrote:

> I mean, I need a template that creates a <news:keywords> tag
> containing all the words from the <loc> tag that have more than 3
> letters.

Do you want to use XSLT 2.0 or 1.0?
What about words like 'localhost' or 'index', how do you decide that
those are not taken?

Here is an XSLT 2.0 stylesheet that should show you an approach using
the tokenize function:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:news="http://example.com/2008/news"
  xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
  exclude-result-prefixes="sm"
  version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy each url and append the keywords built from its loc -->
  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:keywords>
          <!-- Skip the first five '/'-separated parts of the URL, split
               the rest on '-' and keep words longer than 3 characters -->
          <xsl:value-of
            select="for $s in tokenize(sm:loc, '/')[position() > 5]
                    return tokenize($s, '[\-/]')[string-length(.) > 3]"
            separator=", "/>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Result with Saxon 9 when run against your posted input sample (with a
'root' element added and a namespace chosen for the 'news' prefix) is

<root>
  <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
    <url>
      <loc>http://localhost/index.php/index./Paths-for-the-extreme-player</loc>
      <news:news xmlns:news="http://example.com/2008/news">
        <news:keywords>Paths, extreme, player</news:keywords>
      </news:news>
    </url>
    <url>
      <loc>http://localhost/index.php/index.php/Games/The-edge-of-the-wall</loc>
      <news:news xmlns:news="http://example.com/2008/news">
        <news:keywords>Games, edge, wall</news:keywords>
      </news:news>
    </url>
  </urlset>
</root>
--

Martin Honnen
http://JavaScript.FAQTs.com/

Apr 2 '08 #2

Martin Honnen

Olagato wrote:

>> Do you want to use XSLT 2.0 or 1.0?
> I'm using XSLT 1.0.
>
>> What about words like 'localhost' or 'index', how do you decide that those are not taken?
> That's not a problem for now. Maybe an expression like this:
> translate(translate(substring-after(sm:loc, 'http://localhost/index.php/index.php/'), '-', ','), '/', ',')
>
> I'm trying your XSL from PHP without success:

PHP only supports XSLT 1.0 so my posted stylesheet using XSLT and XPath
2.0 functionality does not work with PHP's XSLT processor.

Apr 3 '08 #3

Olagato

On 3 Apr, 13:06, Martin Honnen <mahotr...@yahoo.de> wrote:

> PHP only supports XSLT 1.0 so my posted stylesheet using XSLT and XPath
> 2.0 functionality does not work with PHP's XSLT processor.

Implementing your posted version with XSLT 1.0 functionality seems
quite difficult because of the lack of advanced functions (at least for
an XSL newbie like me), so my only alternative would be to use a separate
XSLT processor. I'll try Xalan on the server: http://xalan.apache.org/
Any other idea using XSLT 1.0 would be appreciated.
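
One XSLT 1.0 idea, given here only as a minimal sketch and not as something posted in the thread: the tokenizing can be emulated with a recursive named template built on substring-before and substring-after. The template name "keywords" and its parameters "text" and "first" are hypothetical, the URL prefix is the one from the translate() expression quoted above, and the 'news' namespace is the placeholder from the 2.0 stylesheet. It needs no extension functions, so it should run on any XSLT 1.0 processor.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:news="http://example.com/2008/news"
  xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
  exclude-result-prefixes="sm">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:keywords>
          <xsl:call-template name="keywords">
            <!-- Assumption: every loc starts with this prefix. '/' is
                 mapped to '-' so a single delimiter is enough, and a
                 trailing '-' is appended to terminate the last word. -->
            <xsl:with-param name="text"
              select="concat(translate(substring-after(sm:loc,
                      'http://localhost/index.php/index.php/'), '/', '-'), '-')"/>
          </xsl:call-template>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: emits each '-'-terminated word longer than
       3 characters, separated by ', ' -->
  <xsl:template name="keywords">
    <xsl:param name="text"/>
    <xsl:param name="first" select="true()"/>
    <xsl:if test="contains($text, '-')">
      <xsl:variable name="word" select="substring-before($text, '-')"/>
      <xsl:variable name="keep" select="string-length($word) &gt; 3"/>
      <xsl:if test="$keep">
        <xsl:if test="not($first)">, </xsl:if>
        <xsl:value-of select="$word"/>
      </xsl:if>
      <!-- Recurse on the remainder; 'first' stays true until a word is kept -->
      <xsl:call-template name="keywords">
        <xsl:with-param name="text" select="substring-after($text, '-')"/>
        <xsl:with-param name="first" select="$first and not($keep)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

If a loc does not contain the assumed prefix, substring-after returns an empty string and the keywords element is left empty, so the prefix would need adjusting to the real site.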

Apr 3 '08 #4

Martin Honnen

Olagato wrote:

> Implementing your posted version with XSLT 1.0 functionality seems
> quite difficult because of the lack of advanced functions (at least for
> an XSL newbie like me), so my only alternative would be to use a separate
> XSLT processor. I'll try Xalan on the server: http://xalan.apache.org/
> Any other idea using XSLT 1.0 would be appreciated.

Xalan does not do XSLT 2.0 so if you want to use XSLT 2.0 then try Saxon
(http://saxon.sourceforge.net/) or Gestalt
(http://gestalt.sourceforge.net/) or AltovaXML
(http://www.altova.com/altovaxml.html).

If you want to use PHP then I think PHP supports EXSLT so you could try
to use http://www.exslt.org/str/functions/tokenize/index.html
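
As a rough sketch of how that EXSLT function might be used from XSLT 1.0 (an illustration under assumptions, not a tested PHP setup): str:tokenize takes the string and a second argument whose characters are each treated as a delimiter, and returns a node-set of token elements that can be filtered by length. The URL prefix below is taken from the translate() expression quoted earlier in the thread, the 'news' namespace is the same placeholder as in the 2.0 stylesheet, and the processor is assumed to implement the EXSLT strings module.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:str="http://exslt.org/strings"
  xmlns:news="http://example.com/2008/news"
  xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
  exclude-result-prefixes="sm str">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:keywords>
          <!-- Assumption: every loc starts with this prefix. The rest is
               split on '-' and '/', keeping tokens longer than 3 characters. -->
          <xsl:for-each select="str:tokenize(
                                  substring-after(sm:loc,
                                    'http://localhost/index.php/index.php/'),
                                  '-/')[string-length(.) &gt; 3]">
            <!-- XSLT 1.0 has no separator attribute, so add ', ' by hand -->
            <xsl:if test="position() &gt; 1">, </xsl:if>
            <xsl:value-of select="."/>
          </xsl:for-each>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Whether a given PHP installation exposes EXSLT depends on how its XSLT library was built, so it is worth checking before relying on this.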

Apr 3 '08 #5

Olagato

On 3 Apr, 16:45, Martin Honnen <mahotr...@yahoo.de> wrote:

> Xalan does not do XSLT 2.0 so if you want to use XSLT 2.0 then try Saxon
> (http://saxon.sourceforge.net/) or Gestalt
> (http://gestalt.sourceforge.net/) or AltovaXML
> (http://www.altova.com/altovaxml.html).
>
> If you want to use PHP then I think PHP supports EXSLT so you could try
> to use http://www.exslt.org/str/functions/tokenize/index.html

Thank you very much, Martin.
It's now working fine with Altova XMLSpy and Saxon 9 as external XSLT
processors:
http://216.239.59.104/search?q=cache...ient=firefox-a

There are only 2 little issues left:

My XML input is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost/index.php/index.php...site/Rutas-de-verano-en-España</loc>
    <lastmod>2008-03-13</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://localhost/index.php/index.php...site/Rutas/El-Camino-de-Santiago-en-el-Sobrarbe</loc>
    <lastmod>2008-02-12</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>

Your XSLT 2.0 is:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
  exclude-result-prefixes="sm"
  version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:publication_date>
          <xsl:value-of select="sm:lastmod"/>
        </news:publication_date>
        <news:keywords>
          <xsl:value-of
            select="for $s in tokenize(sm:loc, '/')[position() > 5]
                    return tokenize($s, '[\-/]')[string-length(.) > 3]"
            separator=", "/>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

The output is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost/index.php/index.php...site/Rutas-de-verano-en-España</loc>
    <lastmod>2008-03-13</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
    <news:news xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
      <news:publication_date>2008-03-13</news:publication_date>
      <news:keywords>ezwebin_site, Rutas, verano, España</news:keywords>
    </news:news>
  </url>
  <url>
    <loc>http://localhost/index.php/index.php...site/Rutas/El-Camino-de-Santiago-en-el-Sobrarbe</loc>
    <lastmod>2008-02-12</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
    <news:news xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
      <news:publication_date>2008-02-12</news:publication_date>
      <news:keywords>ezwebin_site, Rutas, Camino, Santiago, rt</news:keywords>
    </news:news>
  </url>
</urlset>

But I need the output defined by the News Sitemap protocol:
http://www.google.com/support/webmas...y?answer=42738

So there are 2 things left:
1- The <lastmod> tags should disappear from the <url> output, because a
<news:publication_date> tag has already been defined.
2- The xmlns:news namespace declaration should disappear from the
<news:news> tags and move to the
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> tag at the top.

A good output file would be:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://localhost/index.php/index.php...site/Rutas-de-verano-en-España</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
    <news:news>
      <news:publication_date>2008-03-13</news:publication_date>
      <news:keywords>ezwebin_site, Rutas, verano, España</news:keywords>
    </news:news>
  </url>
  <url>
    <loc>http://localhost/index.php/index.php...site/Rutas/El-Camino-de-Santiago-en-el-Sobrarbe</loc>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
    <news:news>
      <news:publication_date>2008-02-12</news:publication_date>
      <news:keywords>ezwebin_site, Rutas, Camino, Santiago, rt</news:keywords>
    </news:news>
  </url>
</urlset>

Any ideas?

Apr 8 '08 #6

Martin Honnen

Olagato wrote:

> So there are 2 things left:
> 1- The <lastmod> tags should disappear from the <url> output, because a
> <news:publication_date> tag has already been defined.
> 2- The xmlns:news namespace declaration should disappear from the
> <news:news> tags and move to the
> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> tag at the top.

Both are easy adaptations: use the predicate [not(self::sm:lastmod)]
to skip the lastmod elements when copying, and use xsl:namespace to make
sure a namespace declaration is created on the root element:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:sm="http://www.sitemaps.org/schemas/sitemap/0.9"
  exclude-result-prefixes="sm"
  version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Declare the news namespace once, on the root element -->
  <xsl:template match="sm:urlset">
    <xsl:copy>
      <xsl:namespace name="news"
        select="'http://www.google.com/schemas/sitemap-news/0.9'"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="sm:url">
    <xsl:copy>
      <!-- Copy all children except lastmod -->
      <xsl:apply-templates select="@* | node()[not(self::sm:lastmod)]"/>
      <news:news>
        <news:publication_date>
          <xsl:value-of select="sm:lastmod"/>
        </news:publication_date>
        <news:keywords>
          <xsl:value-of
            select="for $s in tokenize(sm:loc, '/')[position() > 5]
                    return tokenize($s, '[\-/]')[string-length(.) > 3]"
            separator=", "/>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Apr 9 '08 #7
