I'm working with Java and XML documents, searching for keywords in a
given element, e.g. element name 'author' == "jo blogs".
The problem is that the XML documents are downloaded (this process is automated)
from different websites, so the element name used for the author may differ!
Is there a way of dealing with this, such as a standard adopted by,
say, educational websites to agree on element names?
Thanks very much
PS: I'm also looking for a good, simple search method, both by element name and
by just searching an XML document as a regular text document.
Options
1) If you have a limited number of schemas (for the differing XML documents), then
you could transform these documents
into your own common format and then write an XQuery / XPath expression
to search for keywords in a given element
name.
2) A second option is to store all the keywords that you encounter in a
master file and then launch a process that does your
search (multi-threaded for efficiency).
3) Use a common standard directly (that would mean all the
websites generate the info in a common format).
I would not be able to help further without more information.
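As a rough illustration of the XPath search in option 1, and of the plain-text search asked about in the PS, here is a minimal sketch using only the JDK's built-in XML APIs. The class name AuthorSearch, the method names, and the alias list ("author", "name", "creator") are assumptions made up for this example; you would replace the list with the element names your sites actually use.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AuthorSearch {
    // Hypothetical alias list: element names different sites might use for the author.
    static final List<String> AUTHOR_NAMES = List.of("author", "name", "creator");

    // Parses an XML string into a DOM document (namespace-aware so local-name() is reliable).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // True if any element named in AUTHOR_NAMES contains the keyword (case-insensitive).
    static boolean authorMatches(String xml, String keyword) {
        try {
            Document doc = parse(xml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String name : AUTHOR_NAMES) {
                // local-name() ignores any namespace prefix on the element.
                NodeList hits = (NodeList) xpath.evaluate(
                        "//*[local-name()='" + name + "']", doc, XPathConstants.NODESET);
                for (int i = 0; i < hits.getLength(); i++) {
                    String text = hits.item(i).getTextContent().toLowerCase();
                    if (text.contains(keyword.toLowerCase())) {
                        return true;
                    }
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("XPath search failed", e);
        }
    }

    // Treats the document as regular text: searches all text content, ignoring element names.
    static boolean textMatches(String xml, String keyword) {
        String allText = parse(xml).getDocumentElement().getTextContent().toLowerCase();
        return allText.contains(keyword.toLowerCase());
    }

    public static void main(String[] args) {
        String xml = "<article><name>Jo Blogs</name><title>XML notes</title></article>";
        System.out.println(authorMatches(xml, "jo blogs"));
        System.out.println(textMatches(xml, "xml notes"));
    }
}
```

Note that the alias list still has to be maintained by hand; the code only saves you from writing a separate query per site.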
> 1) If you have a limited number of schemas (for the differing XML documents),
> then you could transform these documents into your own common format
> and then write an XQuery / XPath expression to search for keywords in a
> given element name.
Thanks Martin, the option above makes sense to me (I'm new to Java/XML): I
could transform the different formats into a common one. How easy would that
be?
The common format of my XML documents would be Date, Title, Author and
articleBody.
How would one go about transforming the documents?
Considering that element names would differ from site to site, how would an
automated process recognise, for instance, that 'name' is the same as
'author'?
Thanks very much
sal
There are several ways to go about this...
1) Use a standard data transformation toolkit that transforms text / XML
into a given XML format. Visual GUI toolkits usually make the job easier.
2) Use XSLT to transform from one XML format to a standard XML
format.
Again, you can try GUI tools such as Stylus Studio to build the XSLT
transform and then verify the results.
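To make the XSLT route concrete, here is a minimal sketch using the JDK's javax.xml.transform API. The source format (item, name, headline, published, body), the class name ToCommonFormat and the stylesheet SITE_A_XSLT are all invented for illustration; only the target element names Date, Title, Author and articleBody come from this thread. The idea is that each source site gets its own small stylesheet, and that stylesheet is where a person records that, say, 'name' on that site means Author.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ToCommonFormat {
    // Hypothetical stylesheet for one source site: maps its element names
    // (published, headline, name, body) onto the common format.
    static final String SITE_A_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/item'>"
      + "<article>"
      + "<Date><xsl:value-of select='published'/></Date>"
      + "<Title><xsl:value-of select='headline'/></Title>"
      + "<Author><xsl:value-of select='name'/></Author>"
      + "<articleBody><xsl:value-of select='body'/></articleBody>"
      + "</article>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        String siteA = "<item><name>Jo Blogs</name><headline>XML notes</headline>"
                     + "<published>2004-02-20</published><body>Some text</body></item>";
        System.out.println(transform(siteA, SITE_A_XSLT));
    }
}
```

Once every document is in the common format, a single XPath expression such as //Author is enough for the search.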