Text Processing Headache - Please Help.

Hi All,

I have been working on the following programme over the last day or so
and have made a good deal of progress. It is a very simple programme,
but is proving very useful as a learning aid, and will eventually be
useful to me in its own right.

Its function is to open a text file and remove HTTP addresses from the
file. The file is always in a certain format, and the HTTP address is
always preceded by a key phrase.

So far I have got as far as opening the file and removing all the junk
before the first site is listed. What I'm trying to do now is split the
remaining string into an array. I want to use a phrase as the
delimiter, and reading this forum I found that a previous poster had the
same problem and someone suggested using the Regex.Split function.

I've tried that but am getting a result I don't understand. When I
check the size of the array that Regex.Split has created for me, it is
far too small.

The code I'm talking about is here:

string[] sitearray = Regex.Split(shortenedstring, "num=");
MessageBox.Show(sitearray.Length.ToString());

Now when I open the file in Notepad and count the number of 'num='
occurrences, there are between 7 and 10 in each file I test. But I'm
getting arrays of sizes sometimes as low as 3.

The full code to my programme is here (it's quite a simple programme!)

http://rafb.net/paste/results/b1p5uU35.html

I look forward to your feedback,

Thank you,

Gary.

Nov 28 '06 #1
Help us answer your question by giving sample input that demonstrates
the problem!

Greg
--
What light is to the eyes -- what air is to the lungs -- what love is to
the heart, liberty is to the soul of man.
-- Robert Green Ingersoll
Nov 28 '06 #2
On 28 Nov 2006 01:59:10 -0800, ga********@myway.com wrote:
>string[] sitearray = Regex.Split(shortenedstring, "num=");
>MessageBox.Show(sitearray.Length.ToString());
>
>Now when I open the file in Notepad and count the number of 'num='
>occurrences, there are between 7 and 10 in each file I test. But I'm
>getting arrays of sizes sometimes as low as 3.

I put the guts of your program into a console test and it seemed to work
fine, except that Regex.Split returns an initial empty string (not null)
because the trimmed input begins with the delimiter.

Here is my version of your code:

using System;
using System.Text.RegularExpressions;

class ConsoleScratch {

    // Returns the start position of SubString, or -1 if it is not found.
    static int LocateStartOfSubString(string FullString, string SubString) {
        return FullString.IndexOf(SubString);
    }

    static void Main() {

        // Dummy file contents, easier for testing.
        string filebuffer = "xxxxx sites num=first, num=second, num=third, " +
                            "num=fourth, num=fifth, num=sixth, num=seventh, " +
                            "num=eighth, num=ninth, num=tenth";

        // Cut off everything before the "sites" phrase.
        string substring = "sites";
        int mainindexofinterest = LocateStartOfSubString(filebuffer, substring);
        string strippedstring = filebuffer.Remove(0, mainindexofinterest);

        // Remove the "sites" phrase itself.
        string shortenedstring = strippedstring.Remove(0, substring.Length);
        //MessageBox.Show(shortenedstring);
        Console.WriteLine("Shortened string: >{0}<\n", shortenedstring);

        //int spaceafteraddressindex = LocateStartOfSubString(shortenedstring, " ");
        //string firstwebaddress = shortenedstring.Remove(spaceafteraddressindex);

        // You can use Trim to remove leading and trailing spaces:
        shortenedstring = shortenedstring.Trim();
        Console.WriteLine("Trimmed string: >{0}<\n", shortenedstring);

        string[] sitearray = Regex.Split(shortenedstring, "num=");
        //MessageBox.Show(sitearray.Length.ToString());
        Console.WriteLine("sitearray.Length = {0}", sitearray.Length);
        foreach (string s in sitearray) {
            Console.WriteLine(" >{0}<", s);
        }

        Console.Write("Press [Enter] to continue... ");
        Console.ReadLine();
    } // end Main()
}

This gave the results:

Shortened string: > num=first, num=second, num=third, num=fourth,
num=fifth, num=sixth, num=seventh, num=eighth, num=ninth, num=tenth<

Trimmed string: >num=first, num=second, num=third, num=fourth,
num=fifth, num=sixth, num=seventh, num=eighth, num=ninth, num=tenth<

sitearray.Length = 11
 ><
 >first, <
 >second, <
 >third, <
 >fourth, <
 >fifth, <
 >sixth, <
 >seventh, <
 >eighth, <
 >ninth, <
 >tenth<

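The empty first element appears because the string starts with the delimiter. If that extra element is unwanted, one option is String.Split with StringSplitOptions.RemoveEmptyEntries, which also avoids regex syntax for a plain-text delimiter. This is only a rough sketch: SplitDemo and SplitSites are names made up for the illustration, and the input is a shortened version of the dummy data above.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitDemo {
    // Splits on a literal delimiter and discards empty pieces, such as the
    // one produced when the input starts with the delimiter.
    public static string[] SplitSites(string text, string delimiter) {
        return text.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main() {
        string input = "num=first, num=second, num=third";

        // Regex.Split keeps the empty string before the first "num=".
        string[] viaRegex = Regex.Split(input, "num=");
        Console.WriteLine("Regex.Split:  {0} elements", viaRegex.Length);

        // String.Split with RemoveEmptyEntries drops it.
        string[] viaSplit = SplitSites(input, "num=");
        Console.WriteLine("String.Split: {0} elements", viaSplit.Length);
    }
}
```

Note that RemoveEmptyEntries drops every empty piece, so two delimiters in a row would also be collapsed; whether that is acceptable depends on the file format.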
Since your code seems to work as expected, I would think that the
problem might lie somewhere in your input file. Try changing the
input file in various ways to see if that has any effect. For
example, do you have "num =" instead of "num=" anywhere?

While testing it is also worth showing the full contents of sitearray,
which your original code did not do, so you can see what your program
is actually doing. That could well help with diagnosing the problem.
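One way to check the input, sketched here under assumptions: compare a strict count of "num=" against a looser count that tolerates different case or spaces before the "=", then print every element of the split rather than just its length. DelimiterCheck, CountStrict, CountLoose and the sample string are all invented for the illustration; your real file contents would go in their place.

```csharp
using System;
using System.Text.RegularExpressions;

static class DelimiterCheck {
    // Counts exact, case-sensitive occurrences of "num=".
    public static int CountStrict(string text) {
        return Regex.Matches(text, "num=").Count;
    }

    // Counts "num=" while tolerating different case or whitespace before "=".
    public static int CountLoose(string text) {
        return Regex.Matches(text, @"num\s*=", RegexOptions.IgnoreCase).Count;
    }

    static void Main() {
        // Hypothetical sample containing two malformed delimiters.
        string sample = "num=a.example, num =b.example, NUM=c.example";

        Console.WriteLine("strict count: {0}", CountStrict(sample));
        Console.WriteLine("loose count:  {0}", CountLoose(sample));

        // Show every element with its index, not just the array length.
        string[] parts = Regex.Split(sample, "num=");
        for (int i = 0; i < parts.Length; i++) {
            Console.WriteLine("[{0}] >{1}<", i, parts[i]);
        }
    }
}
```

If the two counts differ on a real file, the delimiter is probably not written identically everywhere, which would explain an array that is smaller than expected.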

As an aside, full HTTP addresses always start with "http://", and HTTPS
addresses with "https://". The scheme is not case-sensitive, so
"HTTP://" and "HTTPS://" are equally valid, which may help you.
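If the addresses in your file do carry the scheme, something along these lines could pull them out directly without splitting at all. Treat it as a sketch rather than a finished parser: UrlFinder and FindUrls are hypothetical names, the sample input is invented, and the pattern simply assumes an address ends at the next whitespace, comma, quote or angle bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class UrlFinder {
    // Returns every substring that starts with http:// or https:// (any case)
    // and runs up to the next whitespace, comma, double quote or angle bracket.
    public static List<string> FindUrls(string text) {
        List<string> urls = new List<string>();
        MatchCollection matches =
            Regex.Matches(text, @"https?://[^\s,""<>]+", RegexOptions.IgnoreCase);
        foreach (Match m in matches) {
            urls.Add(m.Value);
        }
        return urls;
    }

    static void Main() {
        // Hypothetical input in roughly the "num=" format discussed above.
        string sample = "num=http://one.example/a, num=HTTPS://two.example/b?x=1 tail";
        foreach (string url in FindUrls(sample)) {
            Console.WriteLine(url);
        }
    }
}
```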

rossum

Nov 28 '06 #3
