I'm working on a data file and can't find any common delimiters in the
file to indicate the end of one row of data and the start of the next.
Rows are not on individual lines but run across multiple lines.
It would appear, though, that every distinct set of data starts with a
'code' that is always 25 characters long. The text of the code varies,
however.
Assuming I've read the contents of the file into the string myfile, how
do I split my file into an array, using this variable-text, fixed-length
(25 characters) delimiter?
Thank you!
Gary-
Hello,
>Assuming I've read the contents of the file into the string myfile, how do I split my file into an array, using this variable text, fixed 25 character long, delimiter?
You should probably be able to use Regex.Split(...), with a good regular
expression of course. I can give you help on writing that regular
expression, but I'll have to know a lot more about the delimiter string.
Oliver Sturm
-- http://www.sturmnet.org/blog
How do *you* know it's a delimiter and not data?
In other words, if *I* were to look at the file, knowing nothing about it,
how could I tell what was a delimiter and what was data? How would you
explain to me what to look for?
When you can answer that, you can start thinking about how to pass that
information to a machine.
HTH
Peter
Thank you for your replies. OK, I have had another look and I think the
task has just got harder. The length isn't always 25 characters. But I
have found a pattern; hopefully this will help.
I am using this 'code' as a delimiter because it always precedes the
name of an item, and this file is essentially a database of items.
Following the name of an item, a number of characteristics specific
to that item are listed. Eventually the item's characteristics are
completely listed and the next 'code' is encountered, which precedes the
next item in the database.
There do seem to be some identifiable traits of this code.
- It appears to be always at least 20 characters long.
- The code is continuous; there are no spaces present.
- It is always composed of letters A-Z or digits 0-9.
- The first two characters of this code are always letters in the range
A-Z.
- These two letters are repeated at least two other times within the
code.
e.g.
DODE86DODE86SZDO010144
So I guess what I am trying to do now is split the string every time a
string is encountered that is at least 20 characters long, is
alphanumeric, and has its first two letters repeated at least two
other times.
I think this is going to be tough?
Any ideas?
Thank you-
In case it wasn't obvious, I would also like to add that the code has at
least one space at the start of it and at the end of it.
Who created this file? Is there no documentation that describes its
format? Can you post a sample of the data that shows at least 2
complete "records" or items? Is there anything in the file, perhaps a
header of some sort, that can shed any light on the format?
Chris
Hello,
>DODE86DODE86SZDO010144
>So I guess what I am trying to do now is split the string every time a string is encountered that is at least 20 characters long, is alphanumeric, and has its first two letters repeated at least two other times.
Yes, well... you could try using a regular expression such as this:
[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]
(This could also be simplified a bit.)
This does evaluate the double repetition of the initial two characters,
but it doesn't check the minimum length of the string at the same time. If
you were just searching the text in question for occurrences of the
expression, you could easily write an additional check in code to find
out whether any given string has the correct minimum length. But if you're
using this expression in a Split() call, you couldn't do that...
Personally I would probably still use an expression such as this (to
search, that is) and do the splitting myself. If you can't do the
splitting fully automatically, you'll have to do it yourself in any case -
and using a regular expression to do the delimiter searching seems a
better option to me than coding up the search in C#.
Oliver Sturm
-- http://www.sturmnet.org/blog
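If the 20-character minimum should live in the expression after all, one option is a lookahead placed right after the leading space. The sketch below is not from the thread: the class name CodeFinder, the method FindCodes and the sample text are made up for illustration, and it assumes the codes are upper case and bounded by single spaces as Gary described.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CodeFinder
{
    // Oliver's pattern plus a lookahead, (?=[A-Z0-9]{20,}[ ]), that only lets
    // the match proceed when 20 or more alphanumeric characters are followed
    // by a space. The single capture group is needed for the \1 backreference.
    private static readonly Regex CodePattern = new Regex(
        @"[ ](?=[A-Z0-9]{20,}[ ])([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]");

    // Returns every code found in the text, without the surrounding spaces.
    public static List<string> FindCodes(string text)
    {
        var codes = new List<string>();
        foreach (Match m in CodePattern.Matches(text))
        {
            codes.Add(m.Value.Trim());
        }
        return codes;
    }
}
```

For the text "junk DODE86DODE86SZDO010144 Widget, red ABCDABCDABX too short ", FindCodes returns only DODE86DODE86SZDO010144. ABCDABCDABX repeats AB three times but is rejected because it is shorter than 20 characters.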
I agree, that's wonky-sounding data :) How about something like this?
It's not quite what you want, but might give you a start. Currently it
just finds the codes as you've described them and returns them, but
I've got to do some real work so...
I'm not great with regular expressions so I've only used one to check
that the first two characters occur three times. Oh, and it's very scrappy.
private void fooTest2()
{
    foreach (string s in foo2(",,,12tt12ttt12ttttttttt,,ab111ab11111111111ab,"))
    {
        Console.WriteLine(s);
    }
}

// Collects every 20-character alphanumeric run whose first two
// characters occur exactly three times within it.
private System.Collections.ArrayList foo2(string pFoo)
{
    int o = 0;            // start index of the current alphanumeric run
    int p = 0;            // length of the current run
    bool running = true;  // true while we are inside a run
    System.Collections.ArrayList a = new System.Collections.ArrayList();
    char[] c = pFoo.ToCharArray();
    for (int j = 0; j < c.Length; j++)
    {
        if (IsAN(c[j]))
        {
            if (running)
            {
                p++;
            }
            else
            {
                // A new run starts here.
                running = true;
                p = 1;
                o = j;
            }
        }
        else
        {
            // Reset the count so a failed candidate is not re-tested.
            running = false;
            p = 0;
        }
        if (20 == p)
        {
            // Use the first two characters of the run as the pattern.
            System.Text.RegularExpressions.Regex r =
                new System.Text.RegularExpressions.Regex(pFoo.Substring(o, 2));
            string s = pFoo.Substring(o, j - o + 1);
            if (3 == r.Matches(s).Count)
            {
                a.Add(s);
            }
            p = 0;
            running = false;
        }
    }
    return a;
}

// True for letters A-Z (either case) and digits 0-9.
private bool IsAN(char pC)
{
    char c = char.ToUpper(pC);
    if ('A' <= c && 'Z' >= c)
    {
        return true;
    }
    if ('0' <= pC && '9' >= pC)
    {
        return true;
    }
    return false;
}
Hi, it's used in a custom-written programme where I work, which is DOS
based.
The developers have long since disappeared.
I'd really like to know how to achieve this in code if possible,
Thanks,
Gary-
This group is amazing. Thank you both very much, I'm going to explore
them both now.
I am trying to use the regular expression that Oliver kindly provided
as a starting point.
filecontents is a string that contains my file contents, but I can't get
this to work. I added the @ as I was getting an error that it didn't
recognise the escape sequence, but it still isn't working. How can I
fix this? Thank you.
I'm getting an error at Regex.Split(...)
Regex r = new Regex(@"[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]");
Regex.Split(filecontents, r);
MessageBox.Show(filecontents.Length.ToString());
Thank you
Try
string[] matches = r.Split(filecontents);
assuming filecontents is the text we're searching.
Thank you DeveloperX, I'm not getting the desired result. Then I
realised I shouldn't be using @ as that will just negate the escape
characters.
The regex doesn't like \1; any suggestions what this should be changed
to?
Thanks,
Gary-
Bizarre, I pasted the regex into my little test app (with the @, of
course) and it fires:
foo3(",,,12tt12ttt12ttttttttt,, ABCCCABCCCCCCCCCABCC ,");
private void foo3(string pFoo)
{
    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(
        @"[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]");
    Console.WriteLine(r.Matches(pFoo).Count.ToString());
    string[] s = r.Split(pFoo);
}
The \1 refers to the first group, ([A-Z][A-Z]), so what this regex is
saying is: match a space, then XX, then any combination of X or N, then the
XX found earlier, then more X or N, then our original XX again, followed
by more X or N, then a space, iirc. X = A-Z, N = 0-9.
You might also wish to look at the
System.Text.RegularExpressions.RegexOptions enum, which sets things like
case sensitivity, multi-line support and so forth. As you can see above,
I didn't set anything and just took the defaults.
What is the actual error you got?
Hello,
>Thank you DeveloperX, I'm not getting the desired result. Then I realised I shouldn't be using @ as that will just negate the escape characters.
No, using the @ should be just fine, I usually do that myself.
>The regex doesn't like \1 any suggestions what this should be changed to?
That's if you don't use the @, right?
I'm not really sure what the problem might be - of course my expression is
working with a lot of assumptions that you and I have been making in this
discussion, so accordingly there may be a lot of reasons why you're not
"getting the desired results" :-)
I checked that my expression worked with the delimiter string you
previously posted, but nothing else of course. If you can post further
examples of the delimiter string, maybe that would help... otherwise, feel
free to send me a sample program or a sample data file by email (I think
attachments can't be posted to this group?) or something and I'll have a
look.
Oliver Sturm
-- http://www.sturmnet.org/blog
Thank you DeveloperX. I don't get an error with the @, only when I remove
the @.
I removed the @ because when I include it the result isn't what I
expected.
If I run this with the @ and then check the length of the arraylist, it's
3055.
Now, the sample file I'm running it on has three of these 'codes' and
three rows of data.
e.g.
HUa82ab8HU272ajHUeje <lots of other text here running over multiple lines>
UNa8723oansjaUNasUNa <more text here running over many lines>
IN8aatjresINiys9aINsa <more text here>
I thought I would get all the text between <...> into individual
arraylist elements by running this, but I'm not... what am I doing
wrong?
Thank you
Gary-
I was expecting each part of my arraylist,
e.g. [0], [1], ...
to contain everything between a set of codes.
Here's foo3 again with some extra code. The interesting bit is the for
loop at the end. If you print out what it's matching and the position
in the source data, we can see what's going on. Is it feasible to post
the test data?
On the @ thing, C# uses escape characters in strings, so \ followed by a
character has a special meaning: \t is tab (iirc). What the @ does
before the string is tell the compiler that everything in the quotes is
a literal string and it shouldn't get fancy and try to replace \1
with what it thinks \1 should mean (or complain when it doesn't know what
it is :)).
private void foo3(string pFoo)
{
    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(
        @"[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]");
    Console.WriteLine(r.Matches(pFoo).Count.ToString());
    string[] s = r.Split(pFoo);
    //Console.WriteLine(s[0]);
    //Console.WriteLine(s[1]);
    System.Text.RegularExpressions.MatchCollection c = r.Matches(pFoo);
    foreach (System.Text.RegularExpressions.Match m in c)
    {
        //Console.WriteLine(m.Index.ToString());
    }
    // Print the position and text of every match.
    System.Text.RegularExpressions.Match a;
    for (a = r.Match(pFoo); a.Success; a = a.NextMatch())
    {
        Console.WriteLine(a.Index.ToString() + " " + a.Value);
    }
}
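To make the @ point concrete, here is a small sketch (the class and member names are invented for illustration, not taken from anyone's code in this thread). A verbatim string and a regular string with a doubled backslash hold exactly the same characters, so either form hands the same pattern to the regex engine; writing "\1" in a regular string is what triggers the unrecognised escape sequence error.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class VerbatimDemo
{
    // Both constants hold the same characters: ([A-Z][A-Z])\1
    public const string Verbatim = @"([A-Z][A-Z])\1";
    public const string Escaped = "([A-Z][A-Z])\\1";

    // True when the input contains a pair of upper-case letters that is
    // immediately repeated, such as "ABAB".
    public static bool HasDoubledPair(string input)
    {
        return Regex.IsMatch(input, Verbatim);
    }
}
```

Here VerbatimDemo.Verbatim == VerbatimDemo.Escaped is true, and HasDoubledPair("xxABABxx") is true while HasDoubledPair("ABCD") is false.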
Hello,
>HUa82ab8HU272ajHUeje <lots of other text here running over multiple lines> UNa8723oansjaUNasUNa <more text here running over many lines> IN8aatjresINiys9aINsa <more text here>
>Now I thought I would get all the text between <...> into individual arraylist elements by running this but I'm not... what am I doing wrong?
An obvious thing could be to use RegexOptions.IgnoreCase in your call to
Split() - your original delimiter didn't have any lower case characters,
but those you're posting now do.
Apart from that - either describe in much more detail how your code works
now and what result you're actually getting, or post or mail something
that lets us reproduce the problem ourselves.
Oliver Sturm
-- http://www.sturmnet.org/blog
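A rough sketch of the IgnoreCase suggestion, under a few assumptions: the class and method names are made up, the option is passed to the Regex constructor (which is where it goes when the instance methods Matches or Split are used), and the sample codes are the mixed-case ones Gary posted. Note that with IgnoreCase the \1 backreference is also compared case-insensitively, so the pattern becomes more permissive overall.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CaseInsensitiveCodes
{
    // Counts delimiter codes in the text, optionally ignoring case.
    public static int CountCodes(string text, bool ignoreCase)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        var pattern = new Regex(
            @"[ ]([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]",
            options);
        return pattern.Matches(text).Count;
    }
}
```

For the text " HUa82ab8HU272ajHUeje first record\n UNa8723oansjaUNasUNa second record ", CountCodes finds 2 codes with ignoreCase set to true and 0 with it set to false.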
Emailed a sample, thanks very much.
>I'd really like to know how to achieve this in code if possible,
That's why I suggested posting a few "records" from this file so we
could see it and maybe help determine its format. Does the file start
with this data immediately or is there any header data in the beginning
of the file?
Chris
Hello,
>Emailed a sample, thanks very much.
Replied by email. Just a quick summary of what was wrong with your
previous code.
Looking at the regex I had posted previously:
[ ]([A-Z][A-Z])([A-Z0-9]+)(\1)([A-Z0-9]+)(\1)([A-Z0-9]+)[ ]
This regex has a number of groups that I had added for test purposes. It
can be stripped down to this without any changes:
[ ]([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]
It still has one capture group that is absolutely necessary to make the
back reference work. The Regex.Split method has the peculiar behaviour of
adding the result of the capture group itself to the string array it
returns, and there doesn't seem to be a way around that. So in the sample
program I sent you, I used the matching functionality of the Regex class
instead and picked out the pieces from the string "manually".
All this is probably not the most efficient algorithm in the world -
including the idea of reading the whole 14MB file into a string - but I
wouldn't expect any big performance problems on a modern system... if
performance is important, there are certainly lots of optimizations that
can be done.
Oliver Sturm
-- http://www.sturmnet.org/blog
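The program Oliver emailed is not part of this thread, so the following is only a guess at the general shape of the "match, then cut the pieces out manually" approach he describes. RecordSplitter and SplitRecords are invented names, and the sketch assumes each record is simply the text between the end of one code and the start of the next (or the end of the file for the last one).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RecordSplitter
{
    // The stripped-down pattern from the post above; the one capture
    // group is required by the \1 backreferences.
    private static readonly Regex CodePattern = new Regex(
        @"[ ]([A-Z][A-Z])[A-Z0-9]+\1[A-Z0-9]+\1[A-Z0-9]+[ ]");

    // Returns the text that follows each code, up to the next code.
    public static List<string> SplitRecords(string text)
    {
        var records = new List<string>();
        MatchCollection matches = CodePattern.Matches(text);
        for (int i = 0; i < matches.Count; i++)
        {
            Match match = matches[i];
            int start = match.Index + match.Length;
            // The last record runs to the end of the text.
            int end = (i == matches.Count - 1) ? text.Length : matches[i + 1].Index;
            records.Add(text.Substring(start, end - start));
        }
        return records;
    }
}
```

For " DODE86DODE86SZDO010144 Widget red UNA8UN723AUNAS1 Gadget blue" this yields two records, "Widget red" and "Gadget blue". Unlike Split(), the captured two-letter prefixes never end up in the result, because only the match positions are used.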
You are a gentleman and a scholar sir, I'm going to spend a good couple
of days reading over the code in your email when it arrives - until I
become confident with these techniques.
Regex is very new to me, I would have been completely lost without your
help.
Many, many, thanks again,
Gary-
Thanks again Oliver, I'm just working through that code today. I
understand (at least at a very basic level) what most of the code is
doing, with the exception of the following line:
string content = i == matches.Count - 1 ?
Could you explain that line for me please?
Thank you,
Gary-
Hello,
>Thanks again Oliver, I'm just working through that code today. I understand (at least at a very basic level) what most of the code is doing, with the exception of the following line:
>string content = i == matches.Count - 1 ?
>Could you explain that line for me please?
It actually continues to say
string content = i == matches.Count - 1 ?
    text.Substring(match.Index + match.Length) :
    text.Substring(match.Index + match.Length, matches[i + 1].Index - match.Index - match.Length);
Sorry I used this - it's not the most widely understood or liked
construct. The whole thing is called a ternary (conditional) expression
and it's a slightly shorter way of saying
string content;
if (i == matches.Count - 1)
    content = text.Substring(match.Index + match.Length);
else
    content = text.Substring(match.Index + match.Length, matches[i + 1].Index - match.Index - match.Length);
Oliver Sturm
-- http://www.sturmnet.org/blog