Getting kind of abstract text snippets from text nodes

Hi everybody,

I am about to implement a little search engine that searches for a phrase
in XML text nodes. I have that all working fine, but what I want as the
result is not the complete text of the text node. I would like an
abstract-like result list (the kind of output you get with Google searches).

For example:

... I am the <b>substring</b> from a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract every part of the
complete text node where the search term occurs, with the term
highlighted and surrounded by about 30 characters of text.

I found an interesting post, "cut down text", which is almost what I am
looking for, but there the text is just trimmed by x characters.

Does anybody here have an "elegant" way to solve this, or some hints
that would get me to the solution? I am not able to use regex (it would
be nice, though). My parser is Sablotron, so I am restricted to the
functions it provides (XSLT 1.0).

Any help is greatly appreciated.

Regards,
Andreas W Wylach

Mar 8 '07 #1
Think about dividing the text into three parts: before your target, the
target itself, and after the target. Process each appropriately. If you
want to report multiple instances within the same block of text, look at
the standard examples of recursive text processing.
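
A minimal sketch of that idea in XSLT 1.0, using only core XPath string functions (no regex). It is an illustration, not code from this thread: the parameter names `term` and `width`, the template name `snippets`, and the choice to treat every text node containing the term as a hit are all assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical parameters: the search term and the context width. -->
  <xsl:param name="term" select="'substring'"/>
  <xsl:param name="width" select="30"/>

  <!-- Assumption: every text node that contains the term is a hit. -->
  <xsl:template match="/">
    <xsl:for-each select="//text()[contains(., $term)]">
      <p>
        <xsl:call-template name="snippets">
          <xsl:with-param name="text" select="normalize-space(.)"/>
        </xsl:call-template>
      </p>
    </xsl:for-each>
  </xsl:template>

  <!-- Emits one snippet per occurrence of $term in $text. -->
  <xsl:template name="snippets">
    <xsl:param name="text"/>
    <!-- The empty-term guard prevents endless recursion. -->
    <xsl:if test="$term != '' and contains($text, $term)">
      <!-- The three parts: before the target, the target, after it. -->
      <xsl:variable name="before" select="substring-before($text, $term)"/>
      <xsl:variable name="after" select="substring-after($text, $term)"/>
      <xsl:variable name="len" select="string-length($before)"/>

      <xsl:text>... </xsl:text>
      <!-- Last $width characters of the text before the term;
           a start position below 1 simply yields the whole string. -->
      <xsl:value-of select="substring($before, $len - $width + 1)"/>
      <b><xsl:value-of select="$term"/></b>
      <!-- First $width characters of the text after the term. -->
      <xsl:value-of select="substring($after, 1, $width)"/>
      <xsl:text> ... </xsl:text>

      <!-- Recurse on the remainder to report further occurrences. -->
      <xsl:call-template name="snippets">
        <xsl:with-param name="text" select="$after"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Note that `contains()` is case-sensitive, and that each recursive call only sees the text after the previous match, so the leading context of a later snippet never reaches back past the previous occurrence. The cut is made at a fixed character count, so words at the edges of a snippet may be split.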
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Mar 8 '07 #2

"Andreas W. Wylach" <aw@ioc3.de> wrote in message
news:11**********************@8g2000cwh.googlegroups.com...
Hi everybody,

I am about to implement a little search engine that searches for a phrase
in XML text nodes. I have that all working fine, but what I want as the
result is not the complete text of the text node. I would like an
abstract-like result list (the kind of output you get with Google searches).

For example:

... I am the <b>substring</b> from a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract every part of the
complete text node where the search term occurs, with the term
highlighted and surrounded by about 30 characters of text.

FXSL gives you exactly that (look for testConcordance.xsl).

As first shown here a year and a half ago:
http://www.stylusstudio.com/xsllist/...post00560.html

this was used to create a concordance of the text of the New Testament for
every word longer than three characters whose frequency in the document
does not exceed a given frequency parameter (1280, which in practice
leaves out mainly pronouns).

The code itself is 95 lines, and on a 3GHz, 2GB Pentium IV PC with Saxon 8.6
(at that time) it needed less than 92 seconds to produce the complete (huge)
concordance. The source XML document, "ot Ending Spaces.xml", is almost
50,000 (fifty thousand) lines long.

This is just one illustration of what can really be done with XSLT,
dispelling the myths that "XSLT cannot do this or that
efficiently/elegantly".

Hope this helped.
Cheers,
Dimitre Novatchev


Mar 10 '07 #3
