Searching random XML documents

I'm working with Java and XML documents, searching for keywords in a
given element, e.g. element name 'author' == "jo blogs".

The problem is that the XML documents are downloaded (this process is automated)
from different websites, so the element name used for the author may differ!

Is there a way of dealing with this, such as a standard adopted by,
say, educational websites to agree on element names?

Thanks very much

P.S. I'm also looking for a good, simple search method, both by element name and
by just searching an XML document as a regular text document.
Jul 20 '05 #1
Options
1) If you have a limited number of schemas (for the differing XML documents), then
you could transform these documents into your own common format and then
write an XQuery / XPath expression to search for keywords in a given element
name.
2) Store all the keywords that you encounter in a master file and then
launch a process that does your search (multi-threaded for efficiency).
3) Use a common standard format directly (that would mean all the
websites generate the info in a common format).
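As a rough illustration of searching by element name when the names differ between sites, here is a minimal Java sketch. It is not from the original posts: the class name `ElementSearch`, the alias list (`author`, `name`, `creator`) and the sample XML are all assumptions made up for the example. It uses the JDK's built-in `javax.xml.xpath` API to collect every element carrying one of the candidate names and checks its text for the keyword. A plain-text search is included for comparison.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementSearch {

    // Hypothetical aliases: element names that different sites might use for the author.
    public static final List<String> AUTHOR_ALIASES = List.of("author", "name", "creator");

    /** True if any element with one of the given names contains the keyword (case-insensitive). */
    public static boolean elementContains(String xml, List<String> elementNames, String keyword) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The documents come from other sites, so refuse DTDs to avoid entity-expansion tricks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Build a union expression such as "//author | //name | //creator".
            String expression = "//" + String.join(" | //", elementNames);
            NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);

            String needle = keyword.toLowerCase(Locale.ROOT);
            for (int i = 0; i < matches.getLength(); i++) {
                String text = matches.item(i).getTextContent().toLowerCase(Locale.ROOT);
                if (text.contains(needle)) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }

    /** Plain-text search: treats the XML as an ordinary string, so it also matches markup. */
    public static boolean textContains(String xml, String keyword) {
        return xml.toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String xml = "<article><title>XML search</title><name>Jo Blogs</name></article>";
        System.out.println(elementContains(xml, AUTHOR_ALIASES, "jo blogs"));
        System.out.println(elementContains(xml, List.of("title"), "jo blogs"));
        System.out.println(textContains(xml, "article"));
    }
}
```

The plain-text search is simpler, but because it also matches tag names and attribute values, searching for "author" would hit any document that merely contains an author element. The alias list only works when the set of source sites is small enough to maintain by hand.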
I would not be able to help further without more information.
Jul 20 '05 #2
> 1) If you have a limited number of schemas (for the differing XML documents), then
> you could transform these documents into your own common format and then
> write an XQuery / XPath expression to search for keywords in a given element name.

Thanks Martin, the option above makes sense to me (I'm new to Java/XML): I
could transform the different formats into a common one. How easy would that
be?

The common format of my XML documents would be Date, Title, Author and
articleBody.

How would one go about transforming the documents?

Considering that element names would differ from site to site, how would an
automated process recognise, for instance, that 'name' is the same as
'author'?

Thanks very much

sal
Jul 20 '05 #3
There are several ways to go about this:
1) Use a standard data transformation toolkit that converts text / XML
to a given XML format. Visual GUI toolkits usually make the job easier.
2) Use XSLT transforms to convert from one XML format to a standard XML
format.
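To give an idea of what option 2 might look like from Java, here is a minimal sketch using the JDK's `javax.xml.transform` API. The source layout (`item`, `posted`, `headline`, `name`, `text`), the `article` wrapper element and the class name `CommonFormat` are hypothetical; the target fields Date, Title, Author and articleBody are the ones mentioned earlier in the thread. Note that the mapping from 'name' to 'Author' is written by hand in the stylesheet, one stylesheet per source site. Nothing here detects the equivalence automatically.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CommonFormat {

    // Hypothetical stylesheet for one source site whose documents look like
    // <item><posted/><headline/><name/><text/></item>.
    public static final String SITE_A_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/item'>"
      + "<article>"
      + "<Date><xsl:value-of select='posted'/></Date>"
      + "<Title><xsl:value-of select='headline'/></Title>"
      + "<Author><xsl:value-of select='name'/></Author>"
      + "<articleBody><xsl:value-of select='text'/></articleBody>"
      + "</article>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the given XSLT stylesheet to the XML and returns the result as a string. */
    public static String toCommonFormat(String xml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Input comes from other sites, so ask the processor to run in secure mode.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String siteA = "<item><posted>2004-02-20</posted><headline>Hello</headline>"
                     + "<name>Jo Blogs</name><text>Body text</text></item>";
        System.out.println(toCommonFormat(siteA, SITE_A_XSLT));
    }
}
```

Once every document is in the common format, a single XPath expression on the Author element is enough for the keyword search, whichever site the document came from.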

Again, you can usually try GUI tools such as Stylus Studio to do the XSLT
transform and then verify the results.
Jul 20 '05 #4
