Sequence of characters problem

I have a sequence of characters in chronological order. I am looking for an algorithm that groups the sequence and removes errors in the data. I am having a hard time explaining the requirement in words or maths, but it is something like "group as many characters as possible into runs of a single repeated letter", with the minimum run length as a setting. The examples below show what I mean (minimum of 3 characters):

AAACCAACA => AAAAAAAAA

ACACAAAAA => AAAAAAAAA

AYYYYYYYA => YYYYYYYYY

YYYYAAYYY => YYYYYYYYY

AYAYAYYYY => YYYYYYYYY

For longer sequences it becomes a little more tricky:

AYAYAYAYAAAAAAAAAAYAYYYYYYYYYYYYYY => AAAAAAAAAAAAAAAAAAYYYYYYYYYYYYYYYY

HTHTTHHHTTHHH => TTTTTHHHHHHHH

TTOAOAOOAATTA => OOOOOOOOAAAAA

This is a bit tricky because even though the 3 O's aren't directly next to each other, they are still the most probable uninterrupted sequence of characters.

Does anyone know of an algorithm (machine learning?), or a name for this kind of problem, that can do this type of error correction?

Thank you in advance
Feb 11 '20 #1
Rabbit
12,512 Expert Mod 8TB
If you want to write a rules-based engine, then you'll need to come up with more precise rules. For example, you will need to define why AYAYAYYYY maps to YYYYYYYYY and not AAAAAYYYY.
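To show what a "precise rule" could look like, here is a minimal sketch of one candidate rule. It is only an assumption on my part, not a rule stated in the question: change as few characters as possible so that every run of identical letters is at least min_run long. The function name smooth_runs and its parameters are made up for this illustration.

```python
def smooth_runs(seq, min_run=3):
    """Return (changes, corrected) for one hypothetical rule:
    every run in `corrected` is at least `min_run` long, and
    `corrected` differs from `seq` in as few positions as possible.
    Ties between equally cheap answers are broken arbitrarily."""
    n = len(seq)
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    if n == 0:
        return 0, ""
    if n < min_run:
        raise ValueError("sequence is shorter than min_run")

    letters = sorted(set(seq))
    # prefix[c][i] = how many times letter c occurs in seq[:i]
    prefix = {c: [0] * (n + 1) for c in letters}
    for i, ch in enumerate(seq):
        for c in letters:
            prefix[c][i + 1] = prefix[c][i] + (ch == c)

    INF = float("inf")
    best = [INF] * (n + 1)   # best[j] = fewest changes for seq[:j]
    back = [None] * (n + 1)  # back[j] = (start of last block, its letter)
    best[0] = 0
    for j in range(min_run, n + 1):
        # The last block covers seq[i:j] and has length >= min_run.
        for i in range(0, j - min_run + 1):
            if best[i] == INF:
                continue
            for c in letters:
                mismatches = (j - i) - (prefix[c][j] - prefix[c][i])
                cost = best[i] + mismatches
                if cost < best[j]:
                    best[j] = cost
                    back[j] = (i, c)

    # Walk the back-pointers to rebuild the corrected string.
    pieces = []
    j = n
    while j > 0:
        i, c = back[j]
        pieces.append(c * (j - i))
        j = i
    return best[n], "".join(reversed(pieces))


print(smooth_runs("AYYYYYYYA"))  # (2, 'YYYYYYYYY')
print(smooth_runs("YYYYAAYYY"))  # (1, 'YYYAAAYYY')
```

Note that this rule agrees with your example for AYYYYYYYA but not for YYYYAAYYY, where it keeps a block of three A's because that costs only one change. For AYAYAYYYY the cheapest answer under this rule costs 2 changes (AAAAAYYYY is one such answer), while YYYYYYYYY costs 3. So your real rule must include something more, such as a penalty for switching letters, and that is the part you would need to pin down.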

If you want to write a fuzzy matching algorithm, then there are plenty to choose from; n-gram matching and Levenshtein distance come to mind.
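For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming version is below; the function name levenshtein is just a label I chose, and how you would pick the candidate strings to compare against is left open.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, where each insertion,
    deletion, or substitution costs 1."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


print(levenshtein("AYAYAYYYY", "YYYYYYYYY"))  # 3
print(levenshtein("AYAYAYYYY", "AAAAAYYYY"))  # 2
```

By this measure AYAYAYYYY is closer to AAAAAYYYY than to YYYYYYYYY, so distance alone would not reproduce your example either.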

If you want to use machine learning, as in a neural network, then you will need tens of thousands, if not more, of manually verified examples to feed the network for training.

Or don't fix the data; instead, fix the source of the error. Go back to the system that produces the data before it reaches you and fix whatever is sending you incorrect data.
Feb 12 '20 #2
