473,699 Members | 2,734 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Get regular expression

I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
matches. I would like to get what the actual regular expression is.
In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)". How
do I gain access to the expression (not the matches) at runtime?

Thanks,

Mike

Jun 20 '06 #1
25 5151
Mike wrote:
I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
matches. I would like to get what the actual regular expression is.
In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)". How
do I gain access to the expression (not the matches) at runtime?


you want to access the expression "HEART (CONDUCTION DEFECT)" or the
regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
the latter one though, for the previous one, you can use named capture,
like

^(?<expr>.+)(?= \s*).*\k<expr>

and access the variable "expr" at run time?

Xicheng

Jun 20 '06 #2
Xicheng Jia wrote:
Mike wrote:
I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
matches. I would like to get what the actual regular expression is.
In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)". How
do I gain access to the expression (not the matches) at runtime?


you want to access the expression "HEART (CONDUCTION DEFECT)" or the
regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
the latter one though, for the previous one, you can use named capture,
like

^(?<expr>.+)(?= \s*).*\k<expr>

and access the variable "expr" at run time?

Xicheng


I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
your suggestion first off in the morning.

Thanks,

Jun 20 '06 #3
Mike wrote:
Xicheng Jia wrote:
Mike wrote:
I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
matches. I would like to get what the actual regular expression is.
In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)". How
do I gain access to the expression (not the matches) at runtime?


you want to access the expression "HEART (CONDUCTION DEFECT)" or the
regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
the latter one though, for the previous one, you can use named capture,
like

^(?<expr>.+)(?= \s*).*\k<expr>

and access the variable "expr" at run time?

Xicheng


I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
your suggestion first off in the morning.


err, my bad. you need the match object's Groups property, like
Group("expr") or Group(1) to access the captured values..(Groups[...]
for C#)

Xicheng

Jun 20 '06 #4
> I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
your suggestion first off in the morning.
First, "HEART (CONDUCTION DEFECT)" is not an expression. That is a
substring of the original string. The regular expression is the string
"^(.+)(?=\s*).* \1" that you are using to get your match. Assuming that
"HEART (CONDUCTION DEFECT)" is your match (which it is not), you could call
it a match for the regular expression (which may match more than once in a
string). But it is a substring of the original string. It may seem picky,
but in order to communicate effectively, one must use the right terms. As an
example, if I told you that I ate a car for breakfast, would you know that I
ate an apple?

Second, the string you posted contains 2 instances of the substring "HEART
(CONDUCTION DEFECT)". Do you want to get both of them? If so, what exactly
are your pattern-matching rules? A regular expression matches a pattern.
Obviously, not all of the strings you will be working with will be:

" HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 "

In fact, probably due to this being a newsgroup, and my using a newsreader,
I would doubt that the line breaks in the string are where they are, if they
are. And I have to wonder whether the string actually begins and ends with a
space.

In other words, you're going to be using a regular expression to isolate
substrings of various strings (most probably). A regular expression is
shorthand for a set of rules that defines a pattern you're looking for.
Whether the strings contain line breaks, for example, is important. Your
regular expression begins with the caret '^' character. This character can
indicate the beginning of a string, or the beginning of a line *or* a
string, depending upon what options you use. You didn't specify the
option(s) you're using, so we have no way to know.

In addition, your pattern is not likely to work in the way you expect. for
example, the following would match:

THIS IS NOT WHAT THIS IS SUPPOSED TO BE. (Matches the phrase "THIS IS ")

And in addition, if there are line breaks, like your example (as split by
the newsreader), the matching substring would be:

DEFECT) 37.33/2 HEART (CONDUCTION DEFECT)

So, can you explain what your rules are, and what you are trying to match
here? I'm just guessing that you're parsing medical transcriptions, but
beyond that, I'm stumped.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.
"Mike" <ms********@cha rter.net> wrote in message
news:11******** **************@ p79g2000cwp.goo glegroups.com.. . Xicheng Jia wrote:
Mike wrote:
> I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
> matches. I would like to get what the actual regular expression is.
> In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART (CONDUCTION
> DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
> CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)". How
> do I gain access to the expression (not the matches) at runtime?


you want to access the expression "HEART (CONDUCTION DEFECT)" or the
regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
the latter one though, for the previous one, you can use named capture,
like

^(?<expr>.+)(?= \s*).*\k<expr>

and access the variable "expr" at run time?

Xicheng


I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
your suggestion first off in the morning.

Thanks,

Jun 21 '06 #5
Must say I get burned in six different ways. Some groups I top post
and get scolded. On other groups others people top post and nobody
appears to have a problem. I'll top post here.

Given I've been asked for details I'll provide them, but typically
nobody wants to wade through them.

In the dark ages I had 24,000 lines of ICD9 index entries which got
appended with ICD9 codes and were processed one time per year into a
big paper report with a tree-like structure by an assembler program on
an OS390. An abbreviated example of the report is below for the
Ablation entry.

Ablation
Endometrial (Hysteroscopic) 68.23
Heart (Conduction Defect) 27.33/2
With Catheter 37.34/2
Inner Ear (Cryosurgery) (Ultrasound) 20.79/4
By Injection 20.72
Lesion Heart
By Peripherally Inserted Catheter 37.34

Across my institution in the past there have been multiple "master"
copies of ICD9 codes and index entries. The order came down that
long-term we will work towards a single copy of ICD9 codes with index
entries that will be accessed via webservices. The structure of the
data in our old database was as follows (no line breaks -- each entry
was one line):

ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
ABLATION HEART (CONDUCTION DEFECT) 37.33/2
ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
APPROACH 37.33
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
APPROACH 37.33
ABLATION PITUITARY 7.69
ABLATION PITUITARY BY COBALT-60 92.32
ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY
60.97
ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
(TUNA) 60.97
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.62
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.29
ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
ABLATION VESICLE NECK (ANAT = 60.02) 57.91
The new webservices still have this same index structure except now,
for example, "Ablation Vesicle Neck (ANAT = 60.02)" is just a property
of code 57.91. The surgical coders still want to view the index
entries in a tree structure on demand. Without getting into
mind-numbing details, I can jump through some hoops and get back a set
of index entries that look like above for ABLATION but they are not
formatted in the way the surgical coders desire. I believe I have a
recursive algorithm that will work to format these into a tree
structure but this algorithm is predicated on being able to find the
nodes.

If you look carefully, the root node for entire set of index entries
above is "ABLATION" (as that is what begins each entry and repeats
across all of them). Subsequently, Endometrial (Hysteroscopic) + code
is a child of ABLATION with no children of its own because it is not
repeated. Next, Heart (Conduction Defect) + code is a node with "With
Catheter + code" as a child of that node because "Heart (Conduction
Defect)" repeats across both those lines.

I have begged the group that now owns the webservice to allow me to
restructure the data but no go (they say that would be bastardizing the
concept of everything being 'code-centric'). I am stuck with this and
also with the demand by the coders that they get the formatted tree
structure to look at when they code.

In general, I think if I do the following I can figure out the nodes
and children:

1. Read index entries until the first word changes.
2. Get the substring that begins the string and is repeated elsewhere
in the string (this is the node).
3. Remove that node and keep processing until the base case is hit etc.

If anyone has any better ideas of how to deal with this I would be
thrilled to no end to hear them.
Thanks,

Mike
Kevin Spencer wrote:
I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
your suggestion first off in the morning.


First, "HEART (CONDUCTION DEFECT)" is not an expression. That is a
substring of the original string. The regular expression is the string
"^(.+)(?=\s*).* \1" that you are using to get your match. Assuming that
"HEART (CONDUCTION DEFECT)" is your match (which it is not), you could call
it a match for the regular expression (which may match more than once in a
string). But it is a substring of the original string. It may seem picky,
but in order to communicate effectively, one must use the right terms. As an
example, if I told you that I ate a car for breakfast, would you know that I
ate an apple?

Second, the string you posted contains 2 instances of the substring "HEART
(CONDUCTION DEFECT)". Do you want to get both of them? If so, what exactly
are your pattern-matching rules? A regular expression matches a pattern.
Obviously, not all of the strings you will be working with will be:

" HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 "

In fact, probably due to this being a newsgroup, and my using a newsreader,
I would doubt that the line breaks in the string are where they are, if they
are. And I have to wonder whether the string actually begins and ends with a
space.

In other words, you're going to be using a regular expression to isolate
substrings of various strings (most probably). A regular expression is
shorthand for a set of rules that defines a pattern you're looking for.
Whether the strings contain line breaks, for example, is important. Your
regular expression begins with the caret '^' character. This character can
indicate the beginning of a string, or the beginning of a line *or* a
string, depending upon what options you use. You didn't specify the
option(s) you're using, so we have no way to know.

In addition, your pattern is not likely to work in the way you expect. for
example, the following would match:

THIS IS NOT WHAT THIS IS SUPPOSED TO BE. (Matches the phrase "THIS IS ")

And in addition, if there are line breaks, like your example (as split by
the newsreader), the matching substring would be:

DEFECT) 37.33/2 HEART (CONDUCTION DEFECT)

So, can you explain what your rules are, and what you are trying to match
here? I'm just guessing that you're parsing medical transcriptions, but
beyond that, I'm stumped.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.
"Mike" <ms********@cha rter.net> wrote in message
news:11******** **************@ p79g2000cwp.goo glegroups.com.. .
Xicheng Jia wrote:
Mike wrote:
> I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
> matches. I would like to get what the actual regular expression is.
> In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART (CONDUCTION
> DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
> CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)". How
> do I gain access to the expression (not the matches) at runtime?

you want to access the expression "HEART (CONDUCTION DEFECT)" or the
regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
the latter one though, for the previous one, you can use named capture,
like

^(?<expr>.+)(?= \s*).*\k<expr>

and access the variable "expr" at run time?

Xicheng


I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
your suggestion first off in the morning.

Thanks,


Jun 21 '06 #6
Hi Mike,

As far as Top-Posting is concerned, AFAIK it's still a matter of debate, and
as we're talking about Netiquette, not ISO or W3C standards, my personal
feeling is that anyone who scolds one about top- or bottom-posting has poor
sense of priority. After all, the purpose of groups such as this is
communication. I find it far more difficult to deal with poor communication
than with the format of a post, but that's just me! ;-)

In your case, you have done a pretty darned good job of communication, and I
appreciate that, so I will certainly do all I can to help out! I did have to
do a little research into ICD9, but that wasn't hard with Google.

It took me a few minutes of study to figure out (for the most part) what
your requirements are. Let me see if I can repeat them back to you in my own
words, and ask a couple of questions:

1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.
2. In any case, this data consists multiple single-line entries of text.
3. The data is stored in such a way that the text represents a hierarchical
structure of nodes.
4. This is achieved by a top-level classification that is repeated in each
"record" (line) for every record that falls under it.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.
6. The node identifier text in the sub-nodes can be identified by comparing
it with other records that are under the top-level node. There is no other
way to distinguish this text from any other text in the record, other than
by comparing it with other records.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within the
set of records.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.

Question: Will these records be ordered in any way? IOW, for example, will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the data.

Question: Can you tell me what sort of format the end result is supposed to
be in? Is it simply a data structure in memory? Or what?

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.


"Mike" <ms********@cha rter.net> wrote in message
news:11******** *************@u 72g2000cwu.goog legroups.com...
Must say I get burned in six different ways. Some groups I top post
and get scolded. On other groups others people top post and nobody
appears to have a problem. I'll top post here.

Given I've been asked for details I'll provide them, but typically
nobody wants to wade through them.

In the dark ages I had 24,000 lines of ICD9 index entries which got
appended with ICD9 codes and were processed one time per year into a
big paper report with a tree-like structure by an assembler program on
an OS390. An abbreviated example of the report is below for the
Ablation entry.

Ablation
Endometrial (Hysteroscopic) 68.23
Heart (Conduction Defect) 27.33/2
With Catheter 37.34/2
Inner Ear (Cryosurgery) (Ultrasound) 20.79/4
By Injection 20.72
Lesion Heart
By Peripherally Inserted Catheter 37.34

Across my institution in the past there have been multiple "master"
copies of ICD9 codes and index entries. The order came down that
long-term we will work towards a single copy of ICD9 codes with index
entries that will be accessed via webservices. The structure of the
data in our old database was as follows (no line breaks -- each entry
was one line):

ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
ABLATION HEART (CONDUCTION DEFECT) 37.33/2
ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
APPROACH 37.33
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
APPROACH 37.33
ABLATION PITUITARY 7.69
ABLATION PITUITARY BY COBALT-60 92.32
ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY
60.97
ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
(TUNA) 60.97
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.62
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.29
ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
ABLATION VESICLE NECK (ANAT = 60.02) 57.91
The new webservices still have this same index structure except now,
for example, "Ablation Vesicle Neck (ANAT = 60.02)" is just a property
of code 57.91. The surgical coders still want to view the index
entries in a tree structure on demand. Without getting into
mind-numbing details, I can jump through some hoops and get back a set
of index entries that look like above for ABLATION but they are not
formatted in the way the surgical coders desire. I believe I have a
recursive algorithm that will work to format these into a tree
structure but this algorithm is predicated on being able to find the
nodes.

If you look carefully, the root node for entire set of index entries
above is "ABLATION" (as that is what begins each entry and repeats
across all of them). Subsequently, Endometrial (Hysteroscopic) + code
is a child of ABLATION with no children of its own because it is not
repeated. Next, Heart (Conduction Defect) + code is a node with "With
Catheter + code" as a child of that node because "Heart (Conduction
Defect)" repeats across both those lines.

I have begged the group that now owns the webservice to allow me to
restructure the data but no go (they say that would be bastardizing the
concept of everything being 'code-centric'). I am stuck with this and
also with the demand by the coders that they get the formatted tree
structure to look at when they code.

In general, I think if I do the following I can figure out the nodes
and children:

1. Read index entries until the first word changes.
2. Get the substring that begins the string and is repeated elsewhere
in the string (this is the node).
3. Remove that node and keep processing until the base case is hit etc.

If anyone has any better ideas of how to deal with this I would be
thrilled to no end to hear them.
Thanks,

Mike
Kevin Spencer wrote:
> I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
> your suggestion first off in the morning.


First, "HEART (CONDUCTION DEFECT)" is not an expression. That is a
substring of the original string. The regular expression is the string
"^(.+)(?=\s*).* \1" that you are using to get your match. Assuming that
"HEART (CONDUCTION DEFECT)" is your match (which it is not), you could
call
it a match for the regular expression (which may match more than once in
a
string). But it is a substring of the original string. It may seem picky,
but in order to communicate effectively, one must use the right terms. As
an
example, if I told you that I ate a car for breakfast, would you know
that I
ate an apple?

Second, the string you posted contains 2 instances of the substring
"HEART
(CONDUCTION DEFECT)". Do you want to get both of them? If so, what
exactly
are your pattern-matching rules? A regular expression matches a pattern.
Obviously, not all of the strings you will be working with will be:

" HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 "

In fact, probably due to this being a newsgroup, and my using a
newsreader,
I would doubt that the line breaks in the string are where they are, if
they
are. And I have to wonder whether the string actually begins and ends
with a
space.

In other words, you're going to be using a regular expression to isolate
substrings of various strings (most probably). A regular expression is
shorthand for a set of rules that defines a pattern you're looking for.
Whether the strings contain line breaks, for example, is important. Your
regular expression begins with the caret '^' character. This character
can
indicate the beginning of a string, or the beginning of a line *or* a
string, depending upon what options you use. You didn't specify the
option(s) you're using, so we have no way to know.

In addition, your pattern is not likely to work in the way you expect.
for
example, the following would match:

THIS IS NOT WHAT THIS IS SUPPOSED TO BE. (Matches the phrase "THIS IS ")

And in addition, if there are line breaks, like your example (as split by
the newsreader), the matching substring would be:

DEFECT) 37.33/2 HEART (CONDUCTION DEFECT)

So, can you explain what your rules are, and what you are trying to match
here? I'm just guessing that you're parsing medical transcriptions, but
beyond that, I'm stumped.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.
"Mike" <ms********@cha rter.net> wrote in message
news:11******** **************@ p79g2000cwp.goo glegroups.com.. .
> Xicheng Jia wrote:
>> Mike wrote:
>> > I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
>> > matches. I would like to get what the actual regular expression is.
>> > In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART
>> > (CONDUCTION
>> > DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
>> > CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)".
>> > How
>> > do I gain access to the expression (not the matches) at runtime?
>>
>> you want to access the expression "HEART (CONDUCTION DEFECT)" or the
>> regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
>> the latter one though, for the previous one, you can use named
>> capture,
>> like
>>
>> ^(?<expr>.+)(?= \s*).*\k<expr>
>>
>> and access the variable "expr" at run time?
>>
>> Xicheng
>
> I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
> your suggestion first off in the morning.
>
> Thanks,
>

Jun 21 '06 #7
Hi Kevin,

Since you appear to be rational about the actual objective of groups
(communication) , I'll try to respond effectively.
1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.
Yes, the test is now stored in a database with webservice front-end.
2. In any case, this data consists multiple single-line entries of text.
More-or-less, yes. I can get the data as such after jumping through
some hoops (user enters the first word of the index entry they want
which retrieves all the codes that have an index entry that has an
Index Entry property starting with that word. Then, before each code
can have multiple index entries -- not necessarily all ones they want
-- I have to discard the ones that do not start with 'Ablation' for
example).
3. The data is stored in such a way that the text represents a hierarchical
structure of nodes.
If I understand your comment, yes. Actually, sorted alphabetically,
ALL the index entries are in the order they would appear in a tree
except for "WITH, BY" and a couple other terms that can become nodes
themselves. The issues is that the surgical indexers are used to
viewing the data in a certain way and want to continue to view it
without all the duplicate text in surrounding index entries.
4. This is achieved by a top-level classification that is repeated in each
"record" (line) for every record that falls under it.
Yes, I believe so.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.
Yes. Sub nodes are just the text that repeats across lines after the
repeated substring in a larger set of lines has been removed.
6. The node identifier text in the sub-nodes can be identified by comparing
it with other records that are under the top-level node. There is no other
way to distinguish this text from any other text in the record, other than
by comparing it with other records.
Yes, because the actual text has no meta data to identify which is a
parent and which is a child.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within the
set of records.
Yes, I believe I can do this. On the old OS390 it was done kind of
like this with heavy parsing.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.

Yes, exactly, I believe I can setup a recursive algorithm that will
populate a tree view control that will represent the codes in a way the
surgical coders are used to seeing. They will, for example, be able to
explode "ABLATION" and see subnodes of "ENDOMETRIA L (HYSTEROSCOPIC)
68.23"
"Heart (Conduction Defect) 27.33/2"

Then, upon exploding those nodes any child nodes would be displayed,
etc.
Question: Will these records be ordered in any way? IOW, for example, will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the data.
In my pre-processing I can sort these alphabetically. As always, the
question is really how to eliminate the duplicated text from
surrounding lines and correctly place children/parents in relation to
each other.

Question: Can you tell me what sort of format the end result is supposed to
be in? Is it simply a data structure in memory? Or what?
Simply a data structure in memory as the end users want to be able to
pull this up on demand as they are coding surgical cases.
Kevin Spencer wrote: Hi Mike,

As far as Top-Posting is concerned, AFAIK it's still a matter of debate, and
as we're talking about Netiquette, not ISO or W3C standards, my personal
feeling is that anyone who scolds one about top- or bottom-posting has poor
sense of priority. After all, the purpose of groups such as this is
communication. I find it far more difficult to deal with poor communication
than with the format of a post, but that's just me! ;-)

In your case, you have done a pretty darned good job of communication, and I
appreciate that, so I will certainly do all I can to help out! I did have to
do a little research into ICD9, but that wasn't hard with Google.

It took me a few minutes of study to figure out (for the most part) what
your requirements are. Let me see if I can repeat them back to you in my own
words, and ask a couple of questions:

1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.
2. In any case, this data consists multiple single-line entries of text.
3. The data is stored in such a way that the text represents a hierarchical
structure of nodes.
4. This is achieved by a top-level classification that is repeated in each
"record" (line) for every record that falls under it.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.
6. The node identifier text in the sub-nodes can be identified by comparing
it with other records that are under the top-level node. There is no other
way to distinguish this text from any other text in the record, other than
by comparing it with other records.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within the
set of records.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.

Question: Will these records be ordered in any way? IOW, for example, will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the data.

Question: Can you tell me what sort of format the end result is supposed to
be in? Is it simply a data structure in memory? Or what?

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.


"Mike" <ms********@cha rter.net> wrote in message
news:11******** *************@u 72g2000cwu.goog legroups.com...
Must say I get burned in six different ways. Some groups I top post
and get scolded. On other groups others people top post and nobody
appears to have a problem. I'll top post here.

Given I've been asked for details I'll provide them, but typically
nobody wants to wade through them.

In the dark ages I had 24,000 lines of ICD9 index entries which got
appended with ICD9 codes and were processed one time per year into a
big paper report with a tree-like structure by an assembler program on
an OS390. An abbreviated example of the report is below for the
Ablation entry.

Ablation
Endometrial (Hysteroscopic) 68.23
Heart (Conduction Defect) 27.33/2
With Catheter 37.34/2
Inner Ear (Cryosurgery) (Ultrasound) 20.79/4
By Injection 20.72
Lesion Heart
By Peripherally Inserted Catheter 37.34

Across my institution in the past there have been multiple "master"
copies of ICD9 codes and index entries. The order came down that
long-term we will work towards a single copy of ICD9 codes with index
entries that will be accessed via webservices. The structure of the
data in our old database was as follows (no line breaks -- each entry
was one line):

ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
ABLATION HEART (CONDUCTION DEFECT) 37.33/2
ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
APPROACH 37.34
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
APPROACH 37.33
ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
APPROACH 37.33
ABLATION PITUITARY 7.69
ABLATION PITUITARY BY COBALT-60 92.32
ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY
60.97
ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
(TUNA) 60.97
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.62
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
ABLATION (RCSA) 60.29
ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
ABLATION VESICLE NECK (ANAT = 60.02) 57.91
The new webservices still have this same index structure except now,
for example, "Ablation Vesicle Neck (ANAT = 60.02)" is just a property
of code 57.91. The surgical coders still want to view the index
entries in a tree structure on demand. Without getting into
mind-numbing details, I can jump through some hoops and get back a set
of index entries that look like above for ABLATION but they are not
formatted in the way the surgical coders desire. I believe I have a
recursive algorithm that will work to format these into a tree
structure but this algorithm is predicated on being able to find the
nodes.

If you look carefully, the root node for entire set of index entries
above is "ABLATION" (as that is what begins each entry and repeats
across all of them). Subsequently, Endometrial (Hysteroscopic) + code
is a child of ABLATION with no children of its own because it is not
repeated. Next, Heart (Conduction Defect) + code is a node with "With
Catheter + code" as a child of that node because "Heart (Conduction
Defect)" repeats across both those lines.

I have begged the group that now owns the webservice to allow me to
restructure the data but no go (they say that would be bastardizing the
concept of everything being 'code-centric'). I am stuck with this and
also with the demand by the coders that they get the formatted tree
structure to look at when they code.

In general, I think if I do the following I can figure out the nodes
and children:

1. Read index entries until the first word changes.
2. Get the substring that begins the string and is repeated elsewhere
in the string (this is the node).
3. Remove that node and keep processing until the base case is hit etc.

If anyone has any better ideas of how to deal with this I would be
thrilled to no end to hear them.
Thanks,

Mike
Kevin Spencer wrote:
> I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
> your suggestion first off in the morning.

First, "HEART (CONDUCTION DEFECT)" is not an expression. That is a
substring of the original string. The regular expression is the string
"^(.+)(?=\s*).* \1" that you are using to get your match. Assuming that
"HEART (CONDUCTION DEFECT)" is your match (which it is not), you could
call
it a match for the regular expression (which may match more than once in
a
string). But it is a substring of the original string. It may seem picky,
but in order to communicate effectively, one must use the right terms. As
an
example, if I told you that I ate a car for breakfast, would you know
that I
ate an apple?

Second, the string you posted contains 2 instances of the substring
"HEART
(CONDUCTION DEFECT)". Do you want to get both of them? If so, what
exactly
are your pattern-matching rules? A regular expression matches a pattern.
Obviously, not all of the strings you will be working with will be:

" HEART (CONDUCTION
DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
CATHETER 37.34/2 "

In fact, probably due to this being a newsgroup, and my using a
newsreader,
I would doubt that the line breaks in the string are where they are, if
they
are. And I have to wonder whether the string actually begins and ends
with a
space.

In other words, you're going to be using a regular expression to isolate
substrings of various strings (most probably). A regular expression is
shorthand for a set of rules that defines a pattern you're looking for.
Whether the strings contain line breaks, for example, is important. Your
regular expression begins with the caret '^' character. This character
can
indicate the beginning of a string, or the beginning of a line *or* a
string, depending upon what options you use. You didn't specify the
option(s) you're using, so we have no way to know.

In addition, your pattern is not likely to work in the way you expect.
for
example, the following would match:

THIS IS NOT WHAT THIS IS SUPPOSED TO BE. (Matches the phrase "THIS IS ")

And in addition, if there are line breaks, like your example (as split by
the newsreader), the matching substring would be:

DEFECT) 37.33/2 HEART (CONDUCTION DEFECT)

So, can you explain what your rules are, and what you are trying to match
here? I'm just guessing that you're parsing medical transcriptions, but
beyond that, I'm stumped.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.
"Mike" <ms********@cha rter.net> wrote in message
news:11******** **************@ p79g2000cwp.goo glegroups.com.. .
> Xicheng Jia wrote:
>> Mike wrote:
>> > I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
>> > matches. I would like to get what the actual regular expression is.
>> > In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART
>> > (CONDUCTION
>> > DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
>> > CATHETER 37.34/2 " the expression is "HEART (CONDUCTION DEFECT)".
>> > How
>> > do I gain access to the expression (not the matches) at runtime?
>>
>> you want to access the expression "HEART (CONDUCTION DEFECT)" or the
>> regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get exactly
>> the latter one though, for the previous one, you can use named
>> capture,
>> like
>>
>> ^(?<expr>.+)(?= \s*).*\k<expr>
>>
>> and access the variable "expr" at run time?
>>
>> Xicheng
>
> I want to access the expression "HEART (CONDUCTION DEFECT)" I'll try
> your suggestion first off in the morning.
>
> Thanks,
>


Jun 21 '06 #8
Hi Mike,

I fiddled with this problem using regular expressions for entirely too long
last night, and finally came to the conclusion that regular expressions
aren't going to provide what you need in this case. As you discovered, your
regular expression solution, which was about as close as one could get to
something that works with regular expressions, can't identify an unknown
pattern and then match that, which is essentially what you tried valiantly
to do. I have to give you credit for creativity!

Of course, this doesn't bring you any closer to a solution, so I gave that
some thought as well. It seems to me that you're looking for some sort of
recursive nested looping function. Once the data is sorted alphabetically,
it's basically a matter of comparing each line with the line that follows.
If you can be sure that the pattern will break on a word break (space), the
task becomes easier. I'll try to sketch something out along the lines of
what I'm thinking, and you can see what you think and perhaps flesh it out:

This is the comparison method. It does a char-by-char comparison of 2
strings, returning the number of chars that match from the beginning of the
first string. If you can be sure that your nodes will break on spaces, you
could optimize this by using a word-by-word comparison.

string[] items;

// Compare each char of a string in an array with
// Each char of the next string in the array, and
// return the length of the matched string.
int Match(int index, int length)
{
int maxLength = (length < 0 ? items[i].Length : length;
if (index == items.Length - 1) return 0;
for (int i = 0; i < maxLength; i++)
{
if (items[index][i] != items[index + 1][i]) return i - 1;
if (i == items[index + 1].Length - 1) return i;
}
}

Now, what I would do with this is, since you want to create a hierarchical
tree, use the System.Xml Namespace, and an XmlDocument class to create your
in-memory structure. You could certainly create your own lightweight
hierarchical node type, but this way, if the need ever arises (and it
probably will) that you want to transform your data to another format, you
have the ability to use the XmlDocument class as an XML Document, and
transform it any way you like (including as pure XML text), one of the
beauties of XML.

Once you've created your root node, you loop through the "array", calling
the Match method for each item in the "array" until it returns 0. You
initialize it by passing -1 to it, which indicates that it compares the
entire length of the first string. After that, you pass the return value
from the first comparison, which gives you the length of the first child
node. At this point you have your first sub-grouping, and your first child
node, which is the substring of the first string having the length returned
by the first comparison.

If the number of iterations is less than the length of the "array" you start
again with the next item in the array, in the same manner as the first. Each
pass of this routine returns a "node" and the length of the node value.

You recursively repeat this process for each subset of each node, starting
with the length of the node value, and using the substring starting from
that point for each element in the subset. This adds a list of nodes to each
node, and recursively does the same for each child node of each node, and so
on. When you have reached the end of all the strings, you're done.

This is about as elegant a solution as I can come up with. I'll be
interested to hear about your final solution.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.

"Mike" <ms********@cha rter.net> wrote in message
news:11******** **************@ m73g2000cwd.goo glegroups.com.. .
Hi Kevin,

Since you appear to be rational about the actual objective of groups
(communication) , I'll try to respond effectively.
1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.


Yes, the test is now stored in a database with webservice front-end.
2. In any case, this data consists multiple single-line entries of text.


More-or-less, yes. I can get the data as such after jumping through
some hoops (user enters the first word of the index entry they want
which retrieves all the codes that have an index entry that has an
Index Entry property starting with that word. Then, before each code
can have multiple index entries -- not necessarily all ones they want
-- I have to discard the ones that do not start with 'Ablation' for
example).
3. The data is stored in such a way that the text represents a
hierarchical
structure of nodes.


If I understand your comment, yes. Actually, sorted alphabetically,
ALL the index entries are in the order they would appear in a tree
except for "WITH, BY" and a couple other terms that can become nodes
themselves. The issues is that the surgical indexers are used to
viewing the data in a certain way and want to continue to view it
without all the duplicate text in surrounding index entries.
4. This is achieved by a top-level classification that is repeated in
each
"record" (line) for every record that falls under it.


Yes, I believe so.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.


Yes. Sub nodes are just the text that repeats across lines after the
repeated substring in a larger set of lines has been removed.
6. The node identifier text in the sub-nodes can be identified by
comparing
it with other records that are under the top-level node. There is no
other
way to distinguish this text from any other text in the record, other
than
by comparing it with other records.


Yes, because the actual text has no meta data to identify which is a
parent and which is a child.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within
the
set of records.


Yes, I believe I can do this. On the old OS390 it was done kind of
like this with heavy parsing.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this
into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.


Yes, exactly, I believe I can setup a recursive algorithm that will
populate a tree view control that will represent the codes in a way the
surgical coders are used to seeing. They will, for example, be able to
explode "ABLATION" and see subnodes of "ENDOMETRIA L (HYSTEROSCOPIC)
68.23"
"Heart (Conduction Defect) 27.33/2"

Then, upon exploding those nodes any child nodes would be displayed,
etc.
Question: Will these records be ordered in any way? IOW, for example,
will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the
data.


In my pre-processing I can sort these alphabetically. As always, the
question is really how to eliminate the duplicated text from
surrounding lines and correctly place children/parents in relation to
each other.

Question: Can you tell me what sort of format the end result is supposed
to
be in? Is it simply a data structure in memory? Or what?


Simply a data structure in memory as the end users want to be able to
pull this up on demand as they are coding surgical cases.
Kevin Spencer wrote:
Hi Mike,

As far as Top-Posting is concerned, AFAIK it's still a matter of debate,
and
as we're talking about Netiquette, not ISO or W3C standards, my personal
feeling is that anyone who scolds one about top- or bottom-posting has
poor
sense of priority. After all, the purpose of groups such as this is
communication. I find it far more difficult to deal with poor
communication
than with the format of a post, but that's just me! ;-)

In your case, you have done a pretty darned good job of communication,
and I
appreciate that, so I will certainly do all I can to help out! I did have
to
do a little research into ICD9, but that wasn't hard with Google.

It took me a few minutes of study to figure out (for the most part) what
your requirements are. Let me see if I can repeat them back to you in my
own
words, and ask a couple of questions:

1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.
2. In any case, this data consists multiple single-line entries of text.
3. The data is stored in such a way that the text represents a
hierarchical
structure of nodes.
4. This is achieved by a top-level classification that is repeated in
each
"record" (line) for every record that falls under it.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.
6. The node identifier text in the sub-nodes can be identified by
comparing
it with other records that are under the top-level node. There is no
other
way to distinguish this text from any other text in the record, other
than
by comparing it with other records.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within
the
set of records.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this
into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.

Question: Will these records be ordered in any way? IOW, for example,
will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the
data.

Question: Can you tell me what sort of format the end result is supposed
to
be in? Is it simply a data structure in memory? Or what?

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.


"Mike" <ms********@cha rter.net> wrote in message
news:11******** *************@u 72g2000cwu.goog legroups.com...
> Must say I get burned in six different ways. Some groups I top post
> and get scolded. On other groups others people top post and nobody
> appears to have a problem. I'll top post here.
>
> Given I've been asked for details I'll provide them, but typically
> nobody wants to wade through them.
>
> In the dark ages I had 24,000 lines of ICD9 index entries which got
> appended with ICD9 codes and were processed one time per year into a
> big paper report with a tree-like structure by an assembler program on
> an OS390. An abbreviated example of the report is below for the
> Ablation entry.
>
> Ablation
> Endometrial (Hysteroscopic) 68.23
> Heart (Conduction Defect) 27.33/2
> With Catheter 37.34/2
> Inner Ear (Cryosurgery) (Ultrasound) 20.79/4
> By Injection 20.72
> Lesion Heart
> By Peripherally Inserted Catheter 37.34
>
> Across my institution in the past there have been multiple "master"
> copies of ICD9 codes and index entries. The order came down that
> long-term we will work towards a single copy of ICD9 codes with index
> entries that will be accessed via webservices. The structure of the
> data in our old database was as follows (no line breaks -- each entry
> was one line):
>
> ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
> ABLATION HEART (CONDUCTION DEFECT) 37.33/2
> ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
> ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
> ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
> ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
> ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
> ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
> APPROACH 37.34
> ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
> APPROACH 37.33
> ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
> APPROACH 37.33
> ABLATION PITUITARY 7.69
> ABLATION PITUITARY BY COBALT-60 92.32
> ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
> ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
> ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
> ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY
> 60.97
> ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
> (TUNA) 60.97
> ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
> ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
> ABLATION (RCSA) 60.62
> ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
> ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
> ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
> ABLATION (RCSA) 60.29
> ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
> ABLATION VESICLE NECK (ANAT = 60.02) 57.91
>
>
> The new webservices still have this same index structure except now,
> for example, "Ablation Vesicle Neck (ANAT = 60.02)" is just a property
> of code 57.91. The surgical coders still want to view the index
> entries in a tree structure on demand. Without getting into
> mind-numbing details, I can jump through some hoops and get back a set
> of index entries that look like above for ABLATION but they are not
> formatted in the way the surgical coders desire. I believe I have a
> recursive algorithm that will work to format these into a tree
> structure but this algorithm is predicated on being able to find the
> nodes.
>
> If you look carefully, the root node for entire set of index entries
> above is "ABLATION" (as that is what begins each entry and repeats
> across all of them). Subsequently, Endometrial (Hysteroscopic) + code
> is a child of ABLATION with no children of its own because it is not
> repeated. Next, Heart (Conduction Defect) + code is a node with "With
> Catheter + code" as a child of that node because "Heart (Conduction
> Defect)" repeats across both those lines.
>
> I have begged the group that now owns the webservice to allow me to
> restructure the data but no go (they say that would be bastardizing the
> concept of everything being 'code-centric'). I am stuck with this and
> also with the demand by the coders that they get the formatted tree
> structure to look at when they code.
>
> In general, I think if I do the following I can figure out the nodes
> and children:
>
> 1. Read index entries until the first word changes.
> 2. Get the substring that begins the string and is repeated elsewhere
> in the string (this is the node).
> 3. Remove that node and keep processing until the base case is hit etc.
>
> If anyone has any better ideas of how to deal with this I would be
> thrilled to no end to hear them.
>
>
> Thanks,
>
> Mike
>
>
> Kevin Spencer wrote:
>> > I want to access the expression "HEART (CONDUCTION DEFECT)" I'll
>> > try
>> > your suggestion first off in the morning.
>>
>> First, "HEART (CONDUCTION DEFECT)" is not an expression. That is a
>> substring of the original string. The regular expression is the string
>> "^(.+)(?=\s*).* \1" that you are using to get your match. Assuming that
>> "HEART (CONDUCTION DEFECT)" is your match (which it is not), you
>> could
>> call
>> it a match for the regular expression (which may match more than once
>> in
>> a
>> string). But it is a substring of the original string. It may seem
>> picky,
>> but in order to communicate effectively, one must use the right terms.
>> As
>> an
>> example, if I told you that I ate a car for breakfast, would you know
>> that I
>> ate an apple?
>>
>> Second, the string you posted contains 2 instances of the substring
>> "HEART
>> (CONDUCTION DEFECT)". Do you want to get both of them? If so, what
>> exactly
>> are your pattern-matching rules? A regular expression matches a
>> pattern.
>> Obviously, not all of the strings you will be working with will be:
>>
>> " HEART (CONDUCTION
>> DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
>> CATHETER 37.34/2 "
>>
>> In fact, probably due to this being a newsgroup, and my using a
>> newsreader,
>> I would doubt that the line breaks in the string are where they are,
>> if
>> they
>> are. And I have to wonder whether the string actually begins and ends
>> with a
>> space.
>>
>> In other words, you're going to be using a regular expression to
>> isolate
>> substrings of various strings (most probably). A regular expression is
>> shorthand for a set of rules that defines a pattern you're looking
>> for.
>> Whether the strings contain line breaks, for example, is important.
>> Your
>> regular expression begins with the caret '^' character. This character
>> can
>> indicate the beginning of a string, or the beginning of a line *or* a
>> string, depending upon what options you use. You didn't specify the
>> option(s) you're using, so we have no way to know.
>>
>> In addition, your pattern is not likely to work in the way you expect.
>> for
>> example, the following would match:
>>
>> THIS IS NOT WHAT THIS IS SUPPOSED TO BE. (Matches the phrase "THIS IS
>> ")
>>
>> And in addition, if there are line breaks, like your example (as split
>> by
>> the newsreader), the matching substring would be:
>>
>> DEFECT) 37.33/2 HEART (CONDUCTION DEFECT)
>>
>> So, can you explain what your rules are, and what you are trying to
>> match
>> here? I'm just guessing that you're parsing medical transcriptions,
>> but
>> beyond that, I'm stumped.
>>
>> --
>> HTH,
>>
>> Kevin Spencer
>> Microsoft MVP
>> Professional Chicken Salad Alchemist
>>
>> I recycle.
>> I send everything back to the planet it came from.
>>
>>
>> "Mike" <ms********@cha rter.net> wrote in message
>> news:11******** **************@ p79g2000cwp.goo glegroups.com.. .
>> > Xicheng Jia wrote:
>> >> Mike wrote:
>> >> > I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
>> >> > matches. I would like to get what the actual regular expression
>> >> > is.
>> >> > In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART
>> >> > (CONDUCTION
>> >> > DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
>> >> > CATHETER 37.34/2 " the expression is "HEART (CONDUCTION
>> >> > DEFECT)".
>> >> > How
>> >> > do I gain access to the expression (not the matches) at runtime?
>> >>
>> >> you want to access the expression "HEART (CONDUCTION DEFECT)" or
>> >> the
>> >> regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get
>> >> exactly
>> >> the latter one though, for the previous one, you can use named
>> >> capture,
>> >> like
>> >>
>> >> ^(?<expr>.+)(?= \s*).*\k<expr>
>> >>
>> >> and access the variable "expr" at run time?
>> >>
>> >> Xicheng
>> >
>> > I want to access the expression "HEART (CONDUCTION DEFECT)" I'll
>> > try
>> > your suggestion first off in the morning.
>> >
>> > Thanks,
>> >
>

Jun 22 '06 #9
Hi Kevin,

I have been working on this problem on and off for maybe 9 months. New
ICD9 codes are released in October so I have a few months to get a
solution in place. I am going to see if there is some other way to get
the owners of the webservices to offer something different.. I'd like
to get the data more structure (i.e do something similar to this:
http://www.developerfusion.co.uk/show/4633/) Any idea if retrieving
data structured like in the article would fit well with the C# tree
structure?

I will also look at your suggestion, but I had previously tried to
implement something similar with a two-dimensional array and the
problem was with performance. Very slow for those entries with several
hundred rows such as "Biopsy"...

Mike


Kevin Spencer wrote:
Hi Mike,

I fiddled with this problem using regular expressions for entirely too long
last night, and finally came to the conclusion that regular expressions
aren't going to provide what you need in this case. As you discovered, your
regular expression solution, which was about as close as one could get to
something that works with regular expressions, can't identify an unknown
pattern and then match that, which is essentially what you tried valiantly
to do. I have to give you credit for creativity!

Of course, this doesn't bring you any closer to a solution, so I gave that
some thought as well. It seems to me that you're looking for some sort of
recursive nested looping function. Once the data is sorted alphabetically,
it's basically a matter of comparing each line with the line that follows.
If you can be sure that the pattern will break on a word break (space), the
task becomes easier. I'll try to sketch something out along the lines of
what I'm thinking, and you can see what you think and perhaps flesh it out:

This is the comparison method. It does a char-by-char comparison of 2
strings, returning the number of chars that match from the beginning of the
first string. If you can be sure that your nodes will break on spaces, you
could optimize this by using a word-by-word comparison.

string[] items;

// Compare each char of a string in an array with
// Each char of the next string in the array, and
// return the length of the matched string.
int Match(int index, int length)
{
int maxLength = (length < 0 ? items[i].Length : length;
if (index == items.Length - 1) return 0;
for (int i = 0; i < maxLength; i++)
{
if (items[index][i] != items[index + 1][i]) return i - 1;
if (i == items[index + 1].Length - 1) return i;
}
}

Now, what I would do with this is, since you want to create a hierarchical
tree, use the System.Xml Namespace, and an XmlDocument class to create your
in-memory structure. You could certainly create your own lightweight
hierarchical node type, but this way, if the need ever arises (and it
probably will) that you want to transform your data to another format, you
have the ability to use the XmlDocument class as an XML Document, and
transform it any way you like (including as pure XML text), one of the
beauties of XML.

Once you've created your root node, you loop through the "array", calling
the Match method for each item in the "array" until it returns 0. You
initialize it by passing -1 to it, which indicates that it compares the
entire length of the first string. After that, you pass the return value
from the first comparison, which gives you the length of the first child
node. At this point you have your first sub-grouping, and your first child
node, which is the substring of the first string having the length returned
by the first comparison.

If the number of iterations is less than the length of the "array" you start
again with the next item in the array, in the same manner as the first. Each
pass of this routine returns a "node" and the length of the node value.

You recursively repeat this process for each subset of each node, starting
with the length of the node value, and using the substring starting from
that point for each element in the subset. This adds a list of nodes to each
node, and recursively does the same for each child node of each node, and so
on. When you have reached the end of all the strings, you're done.

This is about as elegant a solution as I can come up with. I'll be
interested to hear about your final solution.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.

"Mike" <ms********@cha rter.net> wrote in message
news:11******** **************@ m73g2000cwd.goo glegroups.com.. .
Hi Kevin,

Since you appear to be rational about the actual objective of groups
(communication) , I'll try to respond effectively.
1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.


Yes, the test is now stored in a database with webservice front-end.
2. In any case, this data consists multiple single-line entries of text.


More-or-less, yes. I can get the data as such after jumping through
some hoops (user enters the first word of the index entry they want
which retrieves all the codes that have an index entry that has an
Index Entry property starting with that word. Then, before each code
can have multiple index entries -- not necessarily all ones they want
-- I have to discard the ones that do not start with 'Ablation' for
example).
3. The data is stored in such a way that the text represents a
hierarchical
structure of nodes.


If I understand your comment, yes. Actually, sorted alphabetically,
ALL the index entries are in the order they would appear in a tree
except for "WITH, BY" and a couple other terms that can become nodes
themselves. The issues is that the surgical indexers are used to
viewing the data in a certain way and want to continue to view it
without all the duplicate text in surrounding index entries.
4. This is achieved by a top-level classification that is repeated in
each
"record" (line) for every record that falls under it.


Yes, I believe so.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.


Yes. Sub nodes are just the text that repeats across lines after the
repeated substring in a larger set of lines has been removed.
6. The node identifier text in the sub-nodes can be identified by
comparing
it with other records that are under the top-level node. There is no
other
way to distinguish this text from any other text in the record, other
than
by comparing it with other records.


Yes, because the actual text has no meta data to identify which is a
parent and which is a child.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within
the
set of records.


Yes, I believe I can do this. On the old OS390 it was done kind of
like this with heavy parsing.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this
into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.


Yes, exactly, I believe I can setup a recursive algorithm that will
populate a tree view control that will represent the codes in a way the
surgical coders are used to seeing. They will, for example, be able to
explode "ABLATION" and see subnodes of "ENDOMETRIA L (HYSTEROSCOPIC)
68.23"
"Heart (Conduction Defect) 27.33/2"

Then, upon exploding those nodes any child nodes would be displayed,
etc.
Question: Will these records be ordered in any way? IOW, for example,
will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the
data.


In my pre-processing I can sort these alphabetically. As always, the
question is really how to eliminate the duplicated text from
surrounding lines and correctly place children/parents in relation to
each other.

Question: Can you tell me what sort of format the end result is supposed
to
be in? Is it simply a data structure in memory? Or what?


Simply a data structure in memory as the end users want to be able to
pull this up on demand as they are coding surgical cases.
Kevin Spencer wrote:
Hi Mike,

As far as Top-Posting is concerned, AFAIK it's still a matter of debate,
and
as we're talking about Netiquette, not ISO or W3C standards, my personal
feeling is that anyone who scolds one about top- or bottom-posting has
poor
sense of priority. After all, the purpose of groups such as this is
communication. I find it far more difficult to deal with poor
communication
than with the format of a post, but that's just me! ;-)

In your case, you have done a pretty darned good job of communication,
and I
appreciate that, so I will certainly do all I can to help out! I did have
to
do a little research into ICD9, but that wasn't hard with Google.

It took me a few minutes of study to figure out (for the most part) what
your requirements are. Let me see if I can repeat them back to you in my
own
words, and ask a couple of questions:

1. You have a set of data that is pure text, and is either stored in an
actual database, or in the text equivalent of a database as a multi-line
text document. I can't be exactly sure.
2. In any case, this data consists multiple single-line entries of text.
3. The data is stored in such a way that the text represents a
hierarchical
structure of nodes.
4. This is achieved by a top-level classification that is repeated in
each
"record" (line) for every record that falls under it.
5. Sub-nodes are indicated in the same way by the first text that follows
the top-level node text.
6. The node identifier text in the sub-nodes can be identified by
comparing
it with other records that are under the top-level node. There is no
other
way to distinguish this text from any other text in the record, other
than
by comparing it with other records.
7. Therefore, the structure of the hierarchy can be inferred by using a
recursive procedure that identifies increasingly "deep" sub-nodes within
the
set of records.
8. (Now here's where I'm a bit fuzzy). Your task is to put all of this
into
some form of data structure that can be used as an index, probably a
hierarchical structure such as a tree.

Question: Will these records be ordered in any way? IOW, for example,
will
they be ordered alphabetically? If they are ordered alphabetically, the
structure is already present, by virtue of the rules as stated above.
Otherwise, it will be necessary to do some form of re-scanning of the
data.

Question: Can you tell me what sort of format the end result is supposed
to
be in? Is it simply a data structure in memory? Or what?

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.


"Mike" <ms********@cha rter.net> wrote in message
news:11******** *************@u 72g2000cwu.goog legroups.com...
> Must say I get burned in six different ways. Some groups I top post
> and get scolded. On other groups others people top post and nobody
> appears to have a problem. I'll top post here.
>
> Given I've been asked for details I'll provide them, but typically
> nobody wants to wade through them.
>
> In the dark ages I had 24,000 lines of ICD9 index entries which got
> appended with ICD9 codes and were processed one time per year into a
> big paper report with a tree-like structure by an assembler program on
> an OS390. An abbreviated example of the report is below for the
> Ablation entry.
>
> Ablation
> Endometrial (Hysteroscopic) 68.23
> Heart (Conduction Defect) 27.33/2
> With Catheter 37.34/2
> Inner Ear (Cryosurgery) (Ultrasound) 20.79/4
> By Injection 20.72
> Lesion Heart
> By Peripherally Inserted Catheter 37.34
>
> Across my institution in the past there have been multiple "master"
> copies of ICD9 codes and index entries. The order came down that
> long-term we will work towards a single copy of ICD9 codes with index
> entries that will be accessed via webservices. The structure of the
> data in our old database was as follows (no line breaks -- each entry
> was one line):
>
> ABLATION ENDOMETRIAL (HYSTEROSCOPIC) 68.23
> ABLATION HEART (CONDUCTION DEFECT) 37.33/2
> ABLATION HEART (CONDUCTION DEFECT) WITH CATHETER 37.34/2
> ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) 20.79/4
> ABLATION INNER EAR (CRYOSURGERY) (ULTRASOUND) BY INJECTION 20.72
> ABLATION LESION HEART BY PERIPHERALLY INSERTED CATHETER 37.34
> ABLATION LESION HEART ENDOVASCULAR APPROACH 37.34
> ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) ENDOVASCULAR
> APPROACH 37.34
> ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) OPEN (TRANS-THORACIC)
> APPROACH 37.33
> ABLATION LESION HEART MAZE PROCEDURE (COX-MAZE) TRANS-THORACIC
> APPROACH 37.33
> ABLATION PITUITARY 7.69
> ABLATION PITUITARY BY COBALT-60 92.32
> ABLATION PITUITARY BY IMPLANTATION (STRONTIUM-YTTRIUM) (Y) NEC 92.39
> ABLATION PITUITARY BY PROTON BEAM (BRAGG PEAK) 92.33
> ABLATION PROSTATE (ANAT = 59.02) BY LASER, TRANSURETHRAL 60.21
> ABLATION PROSTATE (ANAT = 59.02) BY RADIOFREQUENCY THERMOTHERAPY
> 60.97
> ABLATION PROSTATE (ANAT = 59.02) BY TRANSURETHRAL NEEDLE ABLATION
> (TUNA) 60.97
> ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY CRYOABLATION 60.62
> ABLATION PROSTATE (ANAT = 59.02) PERINEAL BY RADICAL CRYOSURGICAL
> ABLATION (RCSA) 60.62
> ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL BY LASER 60.21
> ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL CRYOABLATION 60.29
> ABLATION PROSTATE (ANAT = 59.02) TRANSURETHRAL RADICAL CRYOSURGICAL
> ABLATION (RCSA) 60.29
> ABLATION TISSUE HEART - SEE ABLATION, LESION, HEART 0
> ABLATION VESICLE NECK (ANAT = 60.02) 57.91
>
>
> The new webservices still have this same index structure except now,
> for example, "Ablation Vesicle Neck (ANAT = 60.02)" is just a property
> of code 57.91. The surgical coders still want to view the index
> entries in a tree structure on demand. Without getting into
> mind-numbing details, I can jump through some hoops and get back a set
> of index entries that look like above for ABLATION but they are not
> formatted in the way the surgical coders desire. I believe I have a
> recursive algorithm that will work to format these into a tree
> structure but this algorithm is predicated on being able to find the
> nodes.
>
> If you look carefully, the root node for entire set of index entries
> above is "ABLATION" (as that is what begins each entry and repeats
> across all of them). Subsequently, Endometrial (Hysteroscopic) + code
> is a child of ABLATION with no children of its own because it is not
> repeated. Next, Heart (Conduction Defect) + code is a node with "With
> Catheter + code" as a child of that node because "Heart (Conduction
> Defect)" repeats across both those lines.
>
> I have begged the group that now owns the webservice to allow me to
> restructure the data but no go (they say that would be bastardizing the
> concept of everything being 'code-centric'). I am stuck with this and
> also with the demand by the coders that they get the formatted tree
> structure to look at when they code.
>
> In general, I think if I do the following I can figure out the nodes
> and children:
>
> 1. Read index entries until the first word changes.
> 2. Get the substring that begins the string and is repeated elsewhere
> in the string (this is the node).
> 3. Remove that node and keep processing until the base case is hit etc.
>
> If anyone has any better ideas of how to deal with this I would be
> thrilled to no end to hear them.
>
>
> Thanks,
>
> Mike
>
>
> Kevin Spencer wrote:
>> > I want to access the expression "HEART (CONDUCTION DEFECT)" I'll
>> > try
>> > your suggestion first off in the morning.
>>
>> First, "HEART (CONDUCTION DEFECT)" is not an expression. That is a
>> substring of the original string. The regular expression is the string
>> "^(.+)(?=\s*).* \1" that you are using to get your match. Assuming that
>> "HEART (CONDUCTION DEFECT)" is your match (which it is not), you
>> could
>> call
>> it a match for the regular expression (which may match more than once
>> in
>> a
>> string). But it is a substring of the original string. It may seem
>> picky,
>> but in order to communicate effectively, one must use the right terms.
>> As
>> an
>> example, if I told you that I ate a car for breakfast, would you know
>> that I
>> ate an apple?
>>
>> Second, the string you posted contains 2 instances of the substring
>> "HEART
>> (CONDUCTION DEFECT)". Do you want to get both of them? If so, what
>> exactly
>> are your pattern-matching rules? A regular expression matches a
>> pattern.
>> Obviously, not all of the strings you will be working with will be:
>>
>> " HEART (CONDUCTION
>> DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
>> CATHETER 37.34/2 "
>>
>> In fact, probably due to this being a newsgroup, and my using a
>> newsreader,
>> I would doubt that the line breaks in the string are where they are,
>> if
>> they
>> are. And I have to wonder whether the string actually begins and ends
>> with a
>> space.
>>
>> In other words, you're going to be using a regular expression to
>> isolate
>> substrings of various strings (most probably). A regular expression is
>> shorthand for a set of rules that defines a pattern you're looking
>> for.
>> Whether the strings contain line breaks, for example, is important.
>> Your
>> regular expression begins with the caret '^' character. This character
>> can
>> indicate the beginning of a string, or the beginning of a line *or* a
>> string, depending upon what options you use. You didn't specify the
>> option(s) you're using, so we have no way to know.
>>
>> In addition, your pattern is not likely to work in the way you expect.
>> for
>> example, the following would match:
>>
>> THIS IS NOT WHAT THIS IS SUPPOSED TO BE. (Matches the phrase "THIS IS
>> ")
>>
>> And in addition, if there are line breaks, like your example (as split
>> by
>> the newsreader), the matching substring would be:
>>
>> DEFECT) 37.33/2 HEART (CONDUCTION DEFECT)
>>
>> So, can you explain what your rules are, and what you are trying to
>> match
>> here? I'm just guessing that you're parsing medical transcriptions,
>> but
>> beyond that, I'm stumped.
>>
>> --
>> HTH,
>>
>> Kevin Spencer
>> Microsoft MVP
>> Professional Chicken Salad Alchemist
>>
>> I recycle.
>> I send everything back to the planet it came from.
>>
>>
>> "Mike" <ms********@cha rter.net> wrote in message
>> news:11******** **************@ p79g2000cwp.goo glegroups.com.. .
>> > Xicheng Jia wrote:
>> >> Mike wrote:
>> >> > I have a regular expression (^(.+)(?=\s*).* \1 ) that results in
>> >> > matches. I would like to get what the actual regular expression
>> >> > is.
>> >> > In other words, when I apply ^(.+)(?=\s*).*\ 1 to " HEART
>> >> > (CONDUCTION
>> >> > DEFECT) 37.33/2 HEART (CONDUCTION DEFECT) WITH
>> >> > CATHETER 37.34/2 " the expression is "HEART (CONDUCTION
>> >> > DEFECT)".
>> >> > How
>> >> > do I gain access to the expression (not the matches) at runtime?
>> >>
>> >> you want to access the expression "HEART (CONDUCTION DEFECT)" or
>> >> the
>> >> regex "^(.+)(?=\s*).* \1" at run-time?? dont think you can get
>> >> exactly
>> >> the latter one though, for the previous one, you can use named
>> >> capture,
>> >> like
>> >>
>> >> ^(?<expr>.+)(?= \s*).*\k<expr>
>> >>
>> >> and access the variable "expr" at run time?
>> >>
>> >> Xicheng
>> >
>> > I want to access the expression "HEART (CONDUCTION DEFECT)" I'll
>> > try
>> > your suggestion first off in the morning.
>> >
>> > Thanks,
>> >
>


Jun 22 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
4173
by: Kenneth McDonald | last post by:
I'm working on the 0.8 release of my 'rex' module, and would appreciate feedback, suggestions, and criticism as I work towards finalizing the API and feature sets. rex is a module intended to make regular expressions easier to create and use (and in my experience as a regular expression user, it makes them MUCH easier to create and use.) I'm still working on formal documentation, and in any case, such documentation isn't necessarily the...
4
5153
by: Buddy | last post by:
Can someone please show me how to create a regular expression to do the following My text is set to MyColumn{1, 100} Test I want a regular expression that sets the text to the following testMyColumn{1, 100}Test Basically I want the regular expression to add the word test infront of the
4
3223
by: Neri | last post by:
Some document processing program I write has to deal with documents that have headers and footers that are unnecessary for the main processing part. Therefore, I'm using a regular expression to go over each document, find out if it contains a header and/or a footer and extract only the main content part. The headers and the footers have no specific format and I have to detect and remove them using a list of strings that may appear as...
11
5374
by: Dimitris Georgakopuolos | last post by:
Hello, I have a text file that I load up to a string. The text includes certain expression like {firstName} or {userName} that I want to match and then replace with a new expression. However, I want to use the text included within the brackets to do a lookup so that I can replace the expression with the new text. For example:
3
3213
by: James D. Marshall | last post by:
The issue at hand, I believe is my comprehension of using regular expression, specially to assist in replacing the expression with other text. using regular expression (\s*) my understanding is that this will one or more occurrences to replace all the white space between with a comma. This search ElseIf InStr(1, indivline, "$") Then insert a replace statement that uses the regular expression to find and replace all the white space...
7
3825
by: Billa | last post by:
Hi, I am replaceing a big string using different regular expressions (see some example at the end of the message). The problem is whenever I apply a "replace" it makes a new copy of string and I want to avoid that. My question here is if there is a way to pass either a memory stream or array of "find", "replace" expressions or any other way to avoid multiple copies of a string. Any help will be highly appreciated
9
3353
by: Pete Davis | last post by:
I'm using regular expressions to extract some data and some links from some web pages. I download the page and then I want to get a list of certain links. For building regular expressions, I use an app call The Regulator, which makes it pretty easy to build and test regular expressions. As a warning, I'm real weak with regular expressions. Let's say my regular expression is:
1
4380
by: Allan Ebdrup | last post by:
I have a dynamic list of regular expressions, the expressions don't change very often but they can change. And I have a single string that I want to match the regular expressions against and find the first regular expression that matches the string. I've gor the regular expressions ordered so that the highest priority is first (if two or more regular expressions match the string I want the first one returned) The code that does this has...
1
3400
by: NvrBst | last post by:
I want to use the .replace() method with the regular expression /^ %VAR % =,($|&)/. The following DOESN'T replace the "^default.aspx=,($|&)" regular expression with "": --------------------------------- myStringVar = myStringVar.replace("^" + iName + "=,($|&)", ""); --------------------------------- The following DOES replace it though: --------------------------------- var match = myStringVar.match("^" + iName + "=,($|&)");
0
8704
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9187
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9054
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8936
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8894
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7777
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5879
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4637
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3071
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.