Trying to find substring efficiently.

14 New Member
Hi,
I am not sure where I am going wrong with this code.
It seems to work fine for a small text file but when I use files larger than 100MB it does not give me an accurate count.
The program is pretty simple. It should return the number of occurrences of a substring within a text file. I look for a match on the first character of the substring and, if one is found, my code should test whether the whole substring follows.
I have run several tests, but for some reason I am not getting an accurate result. I used the windowsupdate.log file. My code returned 34596 hits, but when I used MS Word to count the occurrences it returned 50096. Obviously there is something wrong with my code, but I can't seem to figure it out.
Any help would be highly appreciated. Thanks a lot.
Rob

here is the code:
Imports System.IO
Imports System.Diagnostics
Imports System.Threading.Tasks
Imports System.Text

Module Module1

    Sub Main()
        Dim sw As New Stopwatch
        sw.Start()
        Dim fs As New FileStream("C:\mpi\windowsupdate.log", FileMode.Open, FileAccess.Read)
        Dim br As New StreamReader(fs)
        Dim searchterm As String = "WINDOWS"
        Dim bytesTerm As Byte() = Encoding.ASCII.GetBytes(searchterm)
        Dim WordLen As Integer = searchterm.Length - 1
        Dim matchcount As Integer = 0
        Dim matches As Boolean = False
        Dim c As Byte
        Console.WriteLine("Processing...")
        While br.Peek <> -1
            c = br.Read
            If c = bytesTerm(0) Then
                matches = True
                For i = 1 To WordLen
                    c = br.Read
                    If c = bytesTerm(i) Then
                        matches = True
                    Else
                        matches = False
                    End If
                Next
                If matches = True Then
                    matchcount += 1
                    matches = False
                End If
            End If
        End While
        sw.Stop()
        Console.WriteLine("Total matches:" & matchcount)
        Console.WriteLine("Total time:" & sw.Elapsed.Seconds)
        Console.ReadLine()
        br.Close()
        fs.Close()

    End Sub

End Module
Jan 5 '11 #1
Rabbit
12,516 Recognized Expert Moderator MVP
A couple of things. You need to exit out of that For loop once you hit a False. Otherwise, if the last letter matches, it's going to return a hit. Second, you're only matching on capital letters, what about lowercase?
Jan 5 '11 #2
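For readers following along, here is a minimal sketch of one way to get a reliable count. It is not the code Rob ended up with, which the thread does not show. Rather than patching the For loop, it reads the file line by line and lets String.IndexOf do the comparison, which sidesteps the missing early exit. It also avoids a second problem in the posted loop that is likely to matter: characters read during a failed partial match are consumed, so a match that begins inside them (for example the second "W" in "WWINDOWS") is skipped. The names CountInLine and CountInReader are hypothetical, and the sketch assumes the search term contains no line breaks.

```vbnet
Imports System
Imports System.IO

Module SubstringCount

    ' Counts non-overlapping, case-sensitive occurrences of term in one line.
    Function CountInLine(line As String, term As String) As Integer
        Dim count As Integer = 0
        Dim pos As Integer = line.IndexOf(term, StringComparison.Ordinal)
        While pos >= 0
            count += 1
            ' Resume the search just past the match that was found.
            pos = line.IndexOf(term, pos + term.Length, StringComparison.Ordinal)
        End While
        Return count
    End Function

    ' Streams the reader line by line, so the whole file is never held in memory.
    Function CountInReader(reader As TextReader, term As String) As Long
        If String.IsNullOrEmpty(term) Then
            Throw New ArgumentException("term must not be empty")
        End If
        Dim total As Long = 0
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            total += CountInLine(line, term)
            line = reader.ReadLine()
        End While
        Return total
    End Function

    Sub Main()
        ' Small in-memory check; pass New StreamReader(path) for a real file.
        Using reader As New StringReader("WWINDOWS update" & vbCrLf & "WINDOWS WINDOW WINDOWS")
            ' Prints "Total matches: 3"
            Console.WriteLine("Total matches: " & CountInReader(reader, "WINDOWS"))
        End Using
    End Sub

End Module
```

When timing a run like the original, note that sw.Elapsed.Seconds is only the seconds component of the elapsed time; sw.Elapsed.TotalSeconds gives the full duration.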
Rob S
14 New Member
Hey Rabbit,
Thanks a lot for your quick response. I made the changes you suggested and it works fine now. As you noticed, the search is case-sensitive and will only return an exact hit, which is what I was looking for anyway. I might modify the algorithm at some point to work with regular expressions as well.
By the way, have you ever worked with MPI or any kind of distributed applications?
Take care.
Rob
Jan 5 '11 #3
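On the regular-expression idea, a rough sketch of what that could look like with System.Text.RegularExpressions is below. It also shows the case-insensitive variant Rabbit asked about. The function name CountMatches and the sample string are made up for illustration; Regex.Matches, Regex.Escape and RegexOptions.IgnoreCase are the standard .NET members.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexCount

    ' Counts non-overlapping regex matches of pattern in source.
    Function CountMatches(source As String, pattern As String, ignoreCase As Boolean) As Integer
        Dim opts As RegexOptions = If(ignoreCase, RegexOptions.IgnoreCase, RegexOptions.None)
        Return Regex.Matches(source, pattern, opts).Count
    End Function

    Sub Main()
        Dim sample As String = "Windows WINDOWS windows"
        ' Regex.Escape makes the search term match as literal text, not as a pattern.
        Dim literal As String = Regex.Escape("WINDOWS")
        Console.WriteLine(CountMatches(sample, literal, False)) ' prints 1
        Console.WriteLine(CountMatches(sample, literal, True))  ' prints 3
    End Sub

End Module
```

For a large log file this would normally be applied per line rather than to the whole file at once, so memory use stays flat.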
Rabbit
12,516 Recognized Expert Moderator MVP
Sorry, can't say that I have.
Jan 5 '11 #4
