I've learned a programming language in C/C++. I want to make a program which try to compare two arrays (random chars)and notify the user about the differrents. The program have to notify both the subtitution and insertion/deletion.
Here are the examples :
array 1 : xxxxxxxxx...(as standard)
array 2 : xxxyxxxxx...
the program will notify that,
"there are subtitution in 4th"
and
array 1 : xxxxxxx..
array 2 : xxyyxxxxx..
"there are insertion yy at 3rd"
I've designed the algorithm with conditional if, but they are still many bugs when i'm used on variant data (array2)
16 5319
What errors you are getting? What does the code look like. Please post the appropriate code snippet along with code tags.
Regards
Dheeraj Joshi
i don't know the error mechanism,
this algorithm results the wrong analyze output..
here is the code (in C#) but i want to porting it to C++ - /* here is the fragment of my program (abz)
-
the program kindly array comparer
-
-
*/
-
public void algoritma(string dt, string ds, ref string dm, int n, int m, int array2_counts, int array1_counts, string filename)
-
{
-
// algorithm attributtes
-
int different_counts = 0;
-
int deletion_counts = 0;
-
int insertion_counts = 0;
-
array2_counts = ds.Length;
-
dm += "array filename: " + filename + "\r\n" + "total char: " + array2_counts + " \r\n ";
-
ds += "XXXXXXXXXXXX";
-
dt += "XXXXXXXXXXXX";
-
array1_counts = dt.Length;
-
array2_counts = ds.Length;
-
this.progressBar1.Maximum = array1_counts;
-
if (this.Button1.Clicked)
-
{
-
#conditional region
-
try
-
{
-
while (n < array2_counts - 12 && m < array1_counts)
-
{
-
if (ds[n] != dt[m])
-
{
-
if (n + 5 < array2_counts && m + 6 < array1_counts)
-
{
-
if ((ds[n] == dt[m + 1]) && (ds[n + 1] == dt[m + 2]) && (ds[n + 2] == dt[m + 3]) && (ds[n + 3] == dt[m + 4]) && (ds[n + 4] == dt[m + 5]) && (ds[n + 5] == dt[m + 6]))
-
{
-
dm += "there are deletion at, " + dt[m + 1] + " " + (m + 1) + " D";
-
dm += "\r\n";
-
m++;
-
different_counts++;
-
deletion_counts++;
-
}
-
else if (n + 5 < array2_counts && m + 5 < array1_counts)
-
{
-
if ((ds[n + 1] == dt[m + 1]) && (ds[n + 2] == dt[m + 2]) && (ds[n + 3] == dt[m + 3]) && (ds[n + 4] == dt[m + 4]) && (ds[n + 5] == dt[m + 5]))
-
{
-
dm += "there are substitution at, " + dt[m] + " " + (m + 1) + " " + ds[n];
-
dm += "\r\n";
-
different_counts++;
-
}
-
else if (n + 5 < array2_counts && m + 5 < array1_counts)
-
{
-
if ((ds[n + 1] == dt[m + 1]) && (ds[n + 2] != dt[m + 2]) && (ds[n + 3] == dt[m + 3]) && (ds[n + 4] == dt[m + 4]) && (ds[n + 5] == dt[m + 5]))
-
{
-
dm += "there are substitution at, " + dt[m] + " " + (m + 1) + " " + ds[n];
-
dm += "\r\n";
-
different_counts++;
-
}
-
else if (n + 8 < array2_counts && m + 8 < array1_counts)
-
{
-
if ((ds[n + 2] == dt[m + 2]) && (ds[n + 3] == dt[m + 3]) && (ds[n + 4] == dt[m + 4]) && (ds[n + 5] == dt[m + 5]) && (ds[n + 6] == dt[m + 6]) && (ds[n + 7] != dt[m + 7]) && (ds[n + 8] == dt[m + 8]))
-
{
-
dm += "there are substitution at, " + dt[m + 1] + " " + (m + 1) + " " + ds[n];
-
dm += "\r\n";
-
-
dm += "ada substitusii di posisi, " + dt[m + 1] + " " + (m + 2) + " " + ds[n + 1];
-
dm += "\r\n";
-
m++;
-
n++;
-
different_counts = different_counts + 2;
-
}
-
else if (n + 4 < array2_counts && m + 6 < array1_counts)
-
{
-
if ((ds[n] == dt[m + 2]) && (ds[n + 1] == dt[m + 3]) && (ds[n + 2] == dt[m + 4]) && (ds[n + 3] == dt[m + 5]) && (ds[n + 4] == dt[m + 6]))
-
{
-
dm += "there are deletion at, " + dt[m + 1] + " " + (m + 1) + "D";
-
dm += "\r\n";
-
dm += "there are deletion at, " + dt[m + 2] + " " + (m + 2) + "D";
-
dm += "\r\n";
-
m = m + 2;
-
different_counts = different_counts + 2;
-
deletion_counts = deletion_counts + 2;
-
}
-
else if (n + 4 < array2_counts && m + 13 < array1_counts)
-
{
-
if ((ds[n] == dt[m + 9]) && (ds[n + 1] == dt[m + 10]) && (ds[n + 2] == dt[m + 11]) && (ds[n + 3] == dt[m + 12]) && (ds[n + 4] == dt[m + 13]))
-
{
-
dm += "there are deletion at, " + dt[m + 1] + " " + (m + 1) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 2] + " " + (m + 2) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 3] + " " + (m + 3) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 4] + " " + (m + 4) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 5] + " " + (m + 5) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 6] + " " + (m + 6) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 7] + " " + (m + 7) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 8] + " " + (m + 8) + " D" + "\r\n";
-
dm += "there are deletion at, " + dt[m + 9] + " " + (m + 9) + " D" + "\r\n";
-
m = m + 9;
-
different_counts = different_counts + 9;
-
deletion_counts = deletion_counts + 9;
-
}
-
else if (n + 5 < array2_counts && m + 4 < array1_counts)
-
{
-
if ((ds[n + 1] == dt[m]) && (ds[n + 2] == dt[m + 1]) && (ds[n + 3] == dt[m + 2]) && (ds[n + 4] == dt[m + 3]) && (ds[n + 5] == dt[m + 4]))
-
{
-
dm += "there are insertion at, " + "I " + m + ".1" + " " + ds[n + 1] + "\r\n";
-
m = m - 1;
-
different_counts++;
-
insertion_counts++;
-
}
-
else if (n + 7 < array2_counts && m + 5 < array1_counts)
-
{
-
if ((ds[n + 2] == dt[m]) && (ds[n + 3] == dt[m + 1]) && (ds[n + 4] == dt[m + 2]) && (ds[n + 5] == dt[m + 3]) && (ds[n + 6] == dt[m + 4]) && (ds[n + 7] == dt[m + 5]))
-
{
-
dm += "there are insertion at, " + "I " + m + ".2" + " " + ds[n + 2] + "\r\n";
-
m = m - 1;
-
n++;
-
-
different_counts = different_counts + 2;
-
insertion_counts = insertion_counts + 2;
-
}
-
else if (n + 8 < array2_counts && m + 4 < array1_counts)
-
{
-
if ((ds[n + 4] == dt[m]) && (ds[n + 5] == dt[m + 1]) && (ds[n + 6] == dt[m + 2]) && (ds[n + 7] == dt[m + 3]) && (ds[n + 8] == dt[m + 4]))
-
{
-
dm += "there are insertion at, " + "I " + m + ".4" + " " + ds[n + 4] + "\r\n";
-
m = m - 1;
-
n = n + 3;
-
different_counts = different_counts + 4;
-
insertion_counts = insertion_counts + 4;
-
}
-
else if (n + 13 < array2_counts && m + 7 < array1_counts)
-
{
-
if ((ds[n + 6] == dt[m]) && (ds[n + 7] == dt[m + 1]) && (ds[n + 8] == dt[m + 2]) && (ds[n + 9] == dt[m + 3]) && (ds[n + 10] == dt[m + 4]) && (ds[n + 11] == dt[m + 5]) && (ds[n + 12] == dt[m + 6]) && (ds[n + 13] == dt[m + 7]))
-
{
-
dm += "there are insertion at, " + "I " + m + ".6" + " " + ds[n + 6] + "\r\n";
-
m = m - 1;
-
n = n + 5;
-
different_counts = different_counts + 6;
-
insertion_counts = insertion_counts + 6;
-
}
-
else if (n + 13 < array2_counts && m + 7 < array1_counts)
-
{
-
if ((ds[n + 6] == dt[m]) && (ds[n + 7] == dt[m + 1]) && (ds[n + 8] == dt[m + 2]) && (ds[n + 9] == dt[m + 3]) && (ds[n + 10] == dt[m + 4]) && (ds[n + 11] == dt[m + 5]) && (ds[n + 12] == dt[m + 6]) && (ds[n + 13] == dt[m + 7]))
-
{
-
dm += "there are insertion at, " + "I " + m + ".6" + " " + ds[n + 6] + "\r\n";
-
m = m - 1;
-
n = n + 5;
-
different_counts = different_counts + 6;
-
insertion_counts = insertion_counts + 6;
-
}
-
else if (n + 2 < array2_counts && m + 1 < array1_counts)
-
{
-
if ((ds[n + 1] == dt[m]) && (ds[n + 2] == dt[m + 1]))
-
{
-
dm += "there are insertion at, " + "I" + m + ".1" + " " + ds[n + 1] + "\r\n";
-
m = m - 1;
-
different_counts++;
-
insertion_counts++;
-
}
-
else if (n + 1 < array2_counts && m < array1_counts)
-
{
-
if ((ds[n + 1] == dt[m]))
-
{
-
dm += "there are insertion at, " + "I " + m + ".1" + " " + ds[n + 1] + "\r\n";
-
m = m - 1;
-
different_counts++;
-
insertion_counts++;
-
}
-
else if (n + 1 < array2_counts && m + 1 < array1_counts)
-
{
-
if (ds[n + 1] == dt[m + 1])
-
{
-
-
dm += "ada substitusi diposisi, " + dt[m] + " " + (m + 1) + " " + ds[n] + "\r\n";
-
different_counts++;
-
}
-
else
-
{
-
dm += "there are substitution at, " + dt[m] + " " + (m + 1) + " " + ds[n] + "\r\n";
-
different_counts++;
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
}
-
n++;
-
m++;
-
if you know the total index of two arrays, you can compare the each and every index and print the output, which one is getting differed until end of the index.
Find out the size of 2 arrays.
If sizes are not same no need to do further calculation. You can exit.
Is sizes are same, traverse through each element of array and compare them. If character at position "n" are not same exit. You can also count the position of difference(which you can calculate using loop index).
Regards
Dheeraj Joshi
yupz, my algorithm do..
it compare the array length to find out the insertion/deletion
after the array length are the same, continue to substitution compare..
but my algorithm have some error on some data..
can somebody suggest the effective algorithm?? please
thank you.. :)
Effective in time and space?
Some algorithm experts may help you.
If the input set is small, almost all algorithms works ok. But as the input set grows the complexity changes.
Regards
Dheeraj Joshi
By any chance do you have the code in pseudocode? You've haven't really commented your code so it's hard to see what's going on.
Also, let's say you have
bnanan
banana
Is this considered an insertion of an a, then a deletion of a n,
OR a deletion of an n, then an insertion of an a?
@dheerajjoshim
hmm, my algo is just conditional if, which prior to insertion/deletion then substitution..
@jkmyoung
owh i cannot commented the code well, my english was bad, so i can't descripting it fine :(
at your example, it will insertion of an a, then a deletion of a n
because the array 1 is a standard which to be fixed (may not changed or reassembly)
thx a lot everybody :)
Ok, so you prefer insertion over deletion.
Where are you setting n and m? If not set they probably have the old values in them. Actually looking at the current algorithm, it is painful.
offhand pseudo algorithm: bound checking excluded. - Comparing ds to dt
-
m, n set to 0
-
mmax = ds.length - 1
-
nmax = dt.length - 1
-
while (m <= mmax and n <= nmax)
-
if ds[m] = dt[n], m++, n++, continue
-
//did not match.
-
invalidChars = 1
-
// try pure insertion first
-
for (insert = invalidChars; insert >= 0; insert --){
-
if dt[n+insert] = ds[m+invalidChars - insert]{
-
***print off list of inserted and invalid characters
-
n += 1 + insert
-
m += 1 + invalidChars - insert
-
break;
-
}
-
}
-
To explain this: for (insert = invalidChars; insert >= 0; insert --){
Lets say you have
abcde
fcghi
We list the comparisons in order below:
compare a -> f, no
invalidChars = 1, insert = 1
a -> c? no, insert = 0
b -> f? no.
invalidChars = 2, insert = 2
a -> g? no. insert = 1
b-> c? no. insert = 0
c -> f? no.
invalidChars = 3, insert = 3
a -> h? no. insert = 2
b-> g? no. insert = 1
c -> c? MATCH!
So,
a deleted
b deleted
f inserted.
===
This biggest problem with this is that it is iterative. Eg if you have
abcbde
acbde
the algorithm will match the wrong b's and say:
c inserted at 2
c deleted at 3
b deleted at 4.
As opposed to the simple:
b deleted at 2
If you want to avoid that, you need to implement saving states and backtracking with some sort of recursion.
my mean is not prefer insertion over deletion..
at your example, if i prefer deletion over insertion, the array1 lenght will changed :)
the algo have not changed the array1 but array2, cause array1 act as standard, so the array2 have to matched with array1
at my algo, i make the 'fragmen' to compare
cause the real array can up to thousands,
this algo will create a fragments with 13char (array) for each
to make possible char matching process go to the end
here are the links, my algo with a little comment http://pastebin.com/chvTQq0w
sorry cause it too ugly to post here
my english was bad too :(
@jkmyoung
no not like this dude..
insertion is the insertion data to array1 so the array1 will changed
eg:
array1 : abcde < 5 char
array2 : abc fde < 6 char
for deletion:
array1 : abcde < 5 char
array2 : abde < 4 char, d deleted
so substitution will executed after the lenght of both array are same..
here is a insertion and subt case:
array1 : abcdef < 6 char
array2 : abc ghef < 7 char
at array2 : g will deleted to reduce the array length
array1 : abcdef
array2 : abc hef
the insertion/deletion have higher prior than subs, so after lenght data matched the subs comparison will executed..
thx a lot :D
I know what you mean, but what I mean is coding up the algorithm smartly will not be so simple: Take the core of this example again:
bcb
cb
To the human eye, we can easily see that we just need to delete b, to get the string. To the computer's eye, it's not that easy. It only looks at what we tell it, so it might match the wrong b's first. If we tell it to look for deletion first, we get the reverse problem. eg:
cb
bcb
Again the wrong b's are matched up.
===
To do this smartly, you need to save the results of multiple methods, recursively.
public void algorithma(string dt, string ds, ref string dm, int n, int m, int array2_counts, int array1_counts, string filename)
Is this the only method declaration you are allowed to use?
The algorithm would be more like: -
Algorithm(..arguments...){
-
while matching, increment m and n
-
once we get to our first mismatch, break;
-
-
If we reached the end of one of the strings, finish results and return them.
-
-
Otherwise:
-
Recursively find all possible solutions from here on.
-
Pick the best solution.
-
return the best solution
-
}
-
In order to not go through every case, have a variable, which holds the best results so far, passed in to the function. Once we reach that number, we end output and return null.
Eg a saved state might look something like -
class State
-
{
-
String mismatchOutputString;
-
int errors;
-
-
public State() {
-
errors = 0;
-
mismatchOutputString = "";
-
}
-
-
public void Insert(int i, char c)
-
{
-
mismatchOutputString += "inserted at " + i + ":" + c + "\n";
-
errors++;
-
}
-
}
Then when you recurse, you check the number of errors. If worse than your best case so far, throw it away.
thx a lot to the responses
i've found the algorithm!
what i need is just "levensthein distance"
:D
now i'm trying on it..
hope i can :p
interesting! Note, that the algorithm uses replacement as a single operation, but you can easily modify it.
owh my...
i been worked at levenshtein algo in weeks..
and i got nothing! T.T
yesterday i've found what i'm exactly want! Needleman–Wunsch algorithm
i been trying to see the naligner source code too, but i don't understand, how stupid i am T.T
anyone can help me to write this algo in c#, so i can insert the conditional if when insertion/deletion/subs happened,...please, thx u
That algorithm is much more complicated. Even if you have a match, the best thing to do might not be to match them!
At each stage, you calculate 3 possible values:
a. use characters from both strings
b. use 1 char from string 1. (gap penalty)
c. use 1 char from string 2. (gap penalty)
Unfortunately, because of the nature of the algorithm, the truncation detection is much more difficult.
A bad approximation for maximum possible value of 2 remaining strings is
Take the maximum value in that array. max1Value
Assuming length(m) <= length (n), the max value is
max1Value * length(m) - gapPenalty * (length (n - m))
(if n is short, just swap m and n)
After calculating the first possible value, you use that as a best value so far, and truncate branches if they don't fit.
If you want more help, please show how you're laying out this matrix.
Sign in to post your reply or Sign up for a free account.
Similar topics
by: gsv2com |
last post by:
One of my weaknesses has always been pattern matching. Something I
definitely need to study up on and maybe you guys can give me a pointer
here.
I'm looking to remove all of this code and just...
|
by: Thomas Reichelt |
last post by:
Moin,
short question: is there any language combining the syntax, flexibility and
great programming experience of Python with static typing? Is there a
project to add static typing to Python?
...
|
by: Xah Lee |
last post by:
# -*- coding: utf-8 -*-
# Python
# Matching string patterns
#
# Sometimes you want to know if a string is of
# particular pattern. Let's say in your website
# you have converted all images...
|
by: Henry |
last post by:
I have a table that stores a list of zip codes using a varchar column type,
and I need to perform some string prefix pattern matching search. Let's say
that I have the columns:
94000-1235
94001...
|
by: bpontius |
last post by:
The GES Algorithm
A Surprisingly Simple Algorithm for Parallel Pattern Matching
"Partially because the best algorithms presented in the literature
are difficult to understand and to implement,...
|
by: olaufr |
last post by:
Hi,
I'd need to perform simple pattern matching within a string using a
list of possible patterns. For example, I want to know if the substring
starting at position n matches any of the string I...
|
by: Jim Lewis |
last post by:
Anyone have experience with string pattern matching?
I need a fast way to match variables to strings. Example:
string - variables
============
abcaaab - xyz
abca - xy
eeabcac - vxw
x...
|
by: Ole Nielsby |
last post by:
First, bear with my xpost. This goes to
comp.lang.c++
comp.lang.functional
with follow-up to comp.lang.c++
- I want to discuss an aspect of using C++ to implement a
functional language, and...
|
by: VanKha |
last post by:
I write this program for pattern-matching,but it gives wrong result:
#include<iostream>
#include<conio.h>
#include<string.h>
using namespace std;
main()
{
char text,pat;...
|
by: pramodkh |
last post by:
Hi All
I am trying to match a pattern in a file and insert a line. If the pattern matches then insert a line before the matching pattern line.
for example,
I have the following content in a...
|
by: CloudSolutions |
last post by:
Introduction:
For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
|
by: Faith0G |
last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
|
by: isladogs |
last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM).
In this session, we are pleased to welcome former...
|
by: ryjfgjl |
last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: ryjfgjl |
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
| |