473,386 Members | 1,943 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Searching/replacing multiple words in input.

I need to search and replace multiple words in one pass of an input
stream or string. For example, given the input:

"The quick brown fox jumped over the lazy dog's back"

and given the replacements

quick -> slow
jump -> walk

I'd like to get the string

"The slow brown fox walked over the lazy dog's back."

I've done some Google searching and learned that the Aho-Corasick and
Commentz-Walter algorithms do this, but I can't find any Java
implementations on the net. These algorithms look time-consuming to
implement and test, and I don't want to go ahead and implement them if
I could just download an already working and debugged implementation.

Thank you.

---Chris

=========================
Good Fig - News for Christians
http://www.goodfig.org
Jul 17 '05 #1
3 13644
Using JUnit to test (and why wouldn't you?):

import junit.framework.TestCase;

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexTest extends TestCase
{
public RegexTest(String s)
{
super(s);
}

public void testRegex()
{
String s = "The quick brown fox jumped over the lazy dog's back";
String check = "The slow brown fox walked over the lazy dog's back";

Pattern p = Pattern.compile("quick");
Matcher m = p.matcher(s);
s = m.replaceFirst("slow");

p = Pattern.compile("jump");
m = p.matcher(s);
s = m.replaceFirst("walk");

assertEquals(check, s);
}
}

/qb

Christopher R. Barry wrote:
I need to search and replace multiple words in one pass of an input
stream or string. For example, given the input:

"The quick brown fox jumped over the lazy dog's back"

and given the replacements

quick -> slow
jump -> walk

I'd like to get the string

"The slow brown fox walked over the lazy dog's back."

I've done some Google searching and learned that the Aho-Corasick and
Commentz-Walter algorithms do this, but I can't find any Java
implementations on the net. These algorithms look time-consuming to
implement and test, and I don't want to go ahead and implement them if
I could just download an already working and debugged implementation.

Thank you.

---Chris

=========================
Good Fig - News for Christians
http://www.goodfig.org


Jul 17 '05 #2
John Kordyback <jk********@hotmail.com> wrote in message news:<KHJhb.65806$pl3.6728@pd7tw3no>...
[...]

Your example scans the input once for each word to replace though.
However, once I realized that java.util.regex.Pattern supports
patterns like "foo|bar" to mean match "foo" or match "bar", I realized
a somewhat inefficient solution to my problem would be to build a
string containing all the words I want to replace delimited by "|" and
use that as the match pattern.

Here's some example code that's rather wasteful with memory but works.
This will be adequate for my purposes in the short term but eventually
I'll want an efficient version that works on streams. You construct a
MultipleSearch object by passing it a hashtable containing as keys the
words to replace and as values the replacements, and then you call the
resulting MultipleSearch object's replaceAll method on the input
string. I've included a main method with an example. The
MultipleSearch class doesn't bother to copy the Hashtable, so the
hashtable shouldn't be modified after a MultipleSearch is constructed
from it unless you pass a copy to the MultipleSearch
constructor. Since Hashtable is synchronized MultipleSearch should be
thread-safe. This code is placed in the public domain.

---Chris

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Hashtable;
import java.util.Enumeration;
import java.lang.StringBuffer;

public class MultipleSearch
{
private Pattern wordsToReplacePattern;
private Hashtable replacementsTable;

public MultipleSearch(Hashtable ht) {

StringBuffer regexbuf = new StringBuffer(ht.size()*7);
Enumeration wordsToReplace = ht.keys();
if (wordsToReplace.hasMoreElements()) {
regexbuf.append(wordsToReplace.nextElement());
}
while (wordsToReplace.hasMoreElements()) {
regexbuf.append("|" + wordsToReplace.nextElement());
}
wordsToReplacePattern = Pattern.compile(regexbuf.toString());
replacementsTable = ht;
}

public String replaceAll(String input) {
StringBuffer sb = new StringBuffer(input.length());
Matcher m = wordsToReplacePattern.matcher(input);

int end = 0;
while (m.find()) {
String wordToReplace = m.group();
String replacement =
(String)replacementsTable.get(wordToReplace);
m.appendReplacement(sb, replacement);
end = m.end();
}
sb.append(input.substring(end));

return sb.toString();
}

public static void main(String args[]) {
String s =
"The quick brown fox jumped over the lazy dog's back.";

Hashtable ht = new Hashtable(10);
ht.put("quick", "slow");
ht.put("jump", "walk");
ht.put("lazy", "hard working");
ht.put("brown", "red");

MultipleSearch ms = new MultipleSearch(ht);
System.out.println(ms.replaceAll(s));
}
}
Jul 17 '05 #3
Oh absolutely. My example wasn't meant to be really useful or efficient.
IntelliJ almost screamed "Extract the method you moron!" at me as I wrote it.

Good luck...qb

Christopher R. Barry wrote:
John Kordyback <jk********@hotmail.com> wrote in message news:<KHJhb.65806$pl3.6728@pd7tw3no>...
[...]

Your example scans the input once for each word to replace though.
However, once I realized that java.util.regex.Pattern supports
patterns like "foo|bar" to mean match "foo" or match "bar", I realized
a somewhat inefficient solution to my problem would be to build a
string containing all the words I want to replace delimited by "|" and
use that as the match pattern.

Here's some example code that's rather wasteful with memory but works.
This will be adequate for my purposes in the short term but eventually
I'll want an efficient version that works on streams. You construct a
MultipleSearch object by passing it a hashtable containing as keys the
words to replace and as values the replacements, and then you call the
resulting MultipleSearch object's replaceAll method on the input
string. I've included a main method with an example. The
MultipleSearch class doesn't bother to copy the Hashtable, so the
hashtable shouldn't be modified after a MultipleSearch is constructed
from it unless you pass a copy to the MultipleSearch
constructor. Since Hashtable is synchronized MultipleSearch should be
thread-safe. This code is placed in the public domain.

---Chris

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.Hashtable;
import java.util.Enumeration;
import java.lang.StringBuffer;

public class MultipleSearch
{
private Pattern wordsToReplacePattern;
private Hashtable replacementsTable;

public MultipleSearch(Hashtable ht) {

StringBuffer regexbuf = new StringBuffer(ht.size()*7);
Enumeration wordsToReplace = ht.keys();
if (wordsToReplace.hasMoreElements()) {
regexbuf.append(wordsToReplace.nextElement());
}
while (wordsToReplace.hasMoreElements()) {
regexbuf.append("|" + wordsToReplace.nextElement());
}
wordsToReplacePattern = Pattern.compile(regexbuf.toString());
replacementsTable = ht;
}

public String replaceAll(String input) {
StringBuffer sb = new StringBuffer(input.length());
Matcher m = wordsToReplacePattern.matcher(input);

int end = 0;
while (m.find()) {
String wordToReplace = m.group();
String replacement =
(String)replacementsTable.get(wordToReplace);
m.appendReplacement(sb, replacement);
end = m.end();
}
sb.append(input.substring(end));

return sb.toString();
}

public static void main(String args[]) {
String s =
"The quick brown fox jumped over the lazy dog's back.";

Hashtable ht = new Hashtable(10);
ht.put("quick", "slow");
ht.put("jump", "walk");
ht.put("lazy", "hard working");
ht.put("brown", "red");

MultipleSearch ms = new MultipleSearch(ht);
System.out.println(ms.replaceAll(s));
}
}


Jul 17 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: dpg | last post by:
How do site searches work? I want to create a MySQL database with a field called "keywords". Then a form with a search phrase input box. I can't figure how to get the results with multiple...
13
by: yaipa | last post by:
What would be the common sense way of finding a binary pattern in a ..bin file, say some 200 bytes, and replacing it with an updated pattern of the same length at the same offset? Also, the...
7
by: Drew | last post by:
I have a db table like the following, UID, int auto-increment RegNo Person Relation YearsKnown Now here is some sample data from this table,
1
by: Robert Oschler | last post by:
I read a while back that MySQL will only use one index per query. (If this is not so, please tell me and point me to a doc that gives a good explanation of MySQL's current index usage policy). ...
12
by: Adam J. Schaff | last post by:
I am writing a quick program to edit a binary file that contains file paths (amongst other things). If I look at the files in notepad, they look like: ...
5
by: Jason | last post by:
Is there a mechanism in VB.NET that allows something like: If myVar In ("A","B","C") Then... The way I'm doing it now is: Select Case myVar Case "A","B","C" Or like this:
7
by: pbd22 | last post by:
Hi. I am somewhat new to this and would like some advice. I want to search my xml file using "keyword" search and return results based on "proximity matching" - in other words, since the search...
7
by: aine_canby | last post by:
Hi, Im totally new to Python so please bare with me. Data is entered into my program using the folling code - str = raw_input(command) words = str.split() for word in words:
7
by: Qbert16 | last post by:
I'm having a lot of trouble with the following task for one of my assignments in python. Basically I need to take the user input and replace any words in the input and replace them with words from a...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.