473,545 Members | 2,715 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Getting kind of abstract text snippets from text nodes

Hi everybody,

I am about implementing a little search engine that searches a phrase
over xml text nodes. I got
that all working fine but what I want as the results is not the
complete text of the textnode,
I would like to make an abstract like result list (such output that
you get with google searches.

For eg

.... I am the <b>substring</bfrom a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract all the text parts
of the complete text node,
where search searchterm is highlighted, surrounded by the text like
30
characters.

I found an intersting post "cut down text" which is almost that what
I
am looking for, but there the
text is just trimmed by x characters.

Is anybody here, that has an "elegant" way to solve that or some
hints
that get me to the solution? I am not able to use regex (would be
nice
though)
My parser is Sablotron so I am restricted to the functions that I
get.
(1.0).
Any help is greatly appreciated.
regards,
Andreas W Wylach

Mar 8 '07 #1
2 1512
Think about dividing the text into three parts: before your target, the
target itself, and after the target. Process each appropriately. If you
want to report multiple instances within the same block of text, look at
the standard examples of recursive text processing.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Mar 8 '07 #2

"Andreas W. Wylach" <aw@ioc3.dewrot e in message
news:11******** **************@ 8g2000cwh.googl egroups.com...
Hi everybody,

I am about implementing a little search engine that searches a phrase
over xml text nodes. I got
that all working fine but what I want as the results is not the
complete text of the textnode,
I would like to make an abstract like result list (such output that
you get with google searches.

For eg

... I am the <b>substring</bfrom a complete text node ...

where "substring" is the search term.

The problem is simple (I think): I want to extract all the text parts
of the complete text node,
where search searchterm is highlighted, surrounded by the text like
30
characters.

FXSL gives you exactly that (look for testConcordance .xsl).

As first shown here a year and a half ago:
http://www.stylusstudio.com/xsllist/...post00560.html

this was used to create a concordance of the text of the New Testament for
any word longer than three characters with frequency count in the document
not exceeding a given frequency count parameter (1280, which practically
leaves out mainly pronouns).

The code itself is 95 lines and on a 3GHz, 2GB Pentium IV PC with Saxon 8.6
(at that time) needed less than 92 seconds to produce the complete (huge)
concordance. The source xml document: "ot Ending Spaces.xml" is almost 50
000 (fifty thousand) lines long.

This is just one illustration of the reality of what can be done with XSLT,
disspelling the myths of "XSLT cannot do this or that
efficiently/elegantly".

Hope this helped.
Cheers,
Dimitre Novatchev


Mar 10 '07 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

303
17452
by: mike420 | last post by:
In the context of LATEX, some Pythonista asked what the big successes of Lisp were. I think there were at least three *big* successes. a. orbitz.com web site uses Lisp for algorithms, etc. b. Yahoo store was originally written in Lisp. c. Emacs The issues with these will probably come up, so I might as well mention them myself (which...
7
2103
by: F. Da Costa | last post by:
Hi, I' looking to retrieve ProdName1 form the <tr> below. <tr id="1-1-1" class="even"> <td> <div class="tier4"> <a href="#" class="leaf"></a> ProdName1 </div>
4
12201
by: leodippolito | last post by:
Hello sirs, I am trying to send a POST request to a webservice on the click of a button. This will return me an XML document with a list of combo box items. The problem: in FIREFOX, when the get the XmlDocument from the XmlHttpRequest object, I can't access its contents. I keep getting empty strings and "null".
4
2360
by: Pavils Jurjans | last post by:
Hello, I am interested in getting the XML contents as text between two XML elements that I know follow each other. They could be in completely different levels, but in XML file the first is above the second. Of course, a code could be written that would first read InnerXml property of the start tag, extract the opening parens <starttag>,...
7
3802
by: Sashi | last post by:
Two questions: (1) I can pull the text of an XML element as a string just fine using code as such: strSomeString = myXmlDoc.SelectSingleNode("/Element1/Element2/Element3", myXmlNSMgr).InnerText;
22
1842
by: gabon | last post by:
Hi guys, I'm facing the bug about the failure of innterHTML while reading xhtml content inside a DIV, in fact it has passed as html removing the closing of some nodes. Is there a way to read the content of the div perfectly how it is in the page, kind of: var obj = document.getElementById("myDIV"); var realContent = obj.toString(); ...
4
4475
by: R.Manikandan | last post by:
Hi In my code, one string variable is subjected to contain more amount of characters. If it cross certain limit, the string content in the varabile is automatically getting truncated and i am not getting the full data. Is there any max limit for an variable to store in javascript? Can anyone please help me to solve this prob?
2
3551
by: rustyc | last post by:
Well, here's my first post in this forum (other than saying 'HI' over in the hi forum ;-) As I said over there: ... for a little side project at home, I'm writing a ham radio web site in uby/Rails. I started it in Perl and gave up on Perl as I went from the 'display the database information on the web page' to the 're-display the information...
2
1629
by: Glich | last post by:
"""Hi, how can I extend the code shown below so that I can identify any "CallFunc" in "func.code" and identify the value of "node" in "CallFunc"? Thanks. This is my code so far: """ """ Given a python file, this program prints out each function's name in that file, the line number and the ast code of that function.
0
7496
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7428
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language...
0
7685
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
7941
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that...
1
7452
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
7784
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
1
5354
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
3485
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3467
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.