Strange Idea

I have been accumulating newsgroup messages from here for about 9 months now,
almost since I started getting into .NET. Since it's the new year, I decided
to do some organizing.

The idea occurred to me to write an application that stores the title, author
and contents in a SQL database, which can then be searched by author and by
subject matter in either the subject line or the body of the message. Then of
course, because my SQL Server is public, I would share the application with
my many loyal friends on this ng and we would all rejoice. (LOL)
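A minimal sketch of the store-and-search idea. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the app described here is .NET against SQL Server), and the table and column names are hypothetical. Searching "either in the subject line or the body" becomes a LIKE test on both columns:

```python
import sqlite3


def create_store(path=":memory:"):
    """Create a hypothetical 'messages' table holding title, author and body."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " author TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def add_message(conn, title, author, body):
    # Parameterized insert: message text containing quotes cannot break the SQL.
    conn.execute(
        "INSERT INTO messages (title, author, body) VALUES (?, ?, ?)",
        (title, author, body),
    )
    conn.commit()


def search(conn, author=None, text=None):
    """Filter by author (exact match) and/or by a term in the subject or body."""
    sql = "SELECT title, author FROM messages WHERE 1=1"
    params = []
    if author is not None:
        sql += " AND author = ?"
        params.append(author)
    if text is not None:
        sql += " AND (title LIKE ? OR body LIKE ?)"
        pattern = f"%{text}%"
        params.extend([pattern, pattern])
    return conn.execute(sql + " ORDER BY id", params).fetchall()


if __name__ == "__main__":
    conn = create_store()
    add_message(conn, "Strange Idea", "Peace", "Looping through .nws files...")
    add_message(conn, "Re: Strange Idea", "Tom", "Google has been doing this too.")
    print(search(conn, text=".nws"))   # [('Strange Idea', 'Peace')]
    print(search(conn, author="Tom"))  # [('Re: Strange Idea', 'Tom')]
```

For a large archive, a full-text index would scale better than LIKE with a leading wildcard, which has to scan every row.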

Well, maybe not :). Most of you guys who taught me are too far advanced to
need something like that. Still, I have found going back and searching old
messages to be extremely helpful. If anyone is interested, let me know. I
will be happy to share the app when it is complete.

If nothing else, it will be a good exercise for me.
The database, datasets and such are set up, and I have done practice runs
with single messages to see how it would work; it worked okay.
As it turned out, the easy part was writing the content to the database one
message at a time. The hard part was what you would think would be the
easiest: looping through the folder and reading the files.

The first thing I needed to do was write a routine that loops through all
the .nws files in the folder where I have stored these messages and gets
their contents. But with .nws files it is not behaving as expected. I know
how to do this with text files. What do I need to change for .nws files in
order to loop through the folder and read each file?

(By the way, I'm using Outlook Express as a newsreader.)
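A minimal sketch of the folder loop, in Python rather than VB.NET (the pattern carries over). It assumes the messages sit directly in one folder with a .nws extension and reads each file as raw bytes first, so an unexpected format does not crash the loop. The NUL-byte test is a rough, hypothetical heuristic for telling text files from binary ones, not a documented property of the .nws format:

```python
from pathlib import Path


def looks_binary(data: bytes) -> bool:
    """Rough heuristic: plain-text files almost never contain NUL bytes."""
    return b"\x00" in data[:4096]


def read_nws_folder(folder):
    """Yield (path, text) for each *.nws file in folder, skipping binary-looking ones."""
    for path in sorted(Path(folder).glob("*.nws")):
        data = path.read_bytes()
        if looks_binary(data):
            print(f"skipping {path.name}: does not look like plain text")
            continue
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            # latin-1 maps every byte to a character, so this decode cannot fail.
            text = data.decode("latin-1")
        yield path, text
```

Usage would be along the lines of `for path, text in read_nws_folder(r"C:\News\dotnet"): ...`, where the folder path is hypothetical. Any file it reports as skipped is a sign that it needs converting before it can be read as text.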

Nov 20 '05 #1
"Peace" <Its the end of the world as we know it@here.com> wrote...
> I have been accumulating newsgroup messages from here for about 9 months now
> almost since I started getting into .NET. Being the new year I decided to do
> some organization.

Hi,
Were you aware that Google has been doing this too (and for some years)?

> The first thing I needed to do is write a routine to loop through all the
> files in the folder I have stored these messages in that have an extension
> of .nws and get the contents of the files. But with .nws files it is not
> behaving as expected. I knew how to do this with text files. What do I need
> to change for .nws files in order to do a loop through the folder and read
> the file?


The .nws format is some sort of binary format, and apparently MS doesn't
publish it. You probably have a few choices. Outlook Express can save the
files as text files (it's an option), but the header information is lost
doing that. There is also a utility at
http://www.oehelp.com/DBXtract/Default.aspx that might help. It sounds like
the guy figured out the .nws format, and his program will convert the files
to text.
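If the messages end up as plain-text internet messages with their headers intact (what a converter such as the one above is meant to produce; whether a given file already has that shape is an assumption to check), Python's standard email package can pull out the author, subject and body. A minimal sketch:

```python
from email import policy
from email.parser import Parser


def parse_message(raw: str):
    """Return (author, subject, body) from an RFC 822-style message string."""
    msg = Parser(policy=policy.default).parsestr(raw)
    author = str(msg.get("From", ""))
    subject = str(msg.get("Subject", ""))
    # get_body() returns the best text/plain part; for a simple single-part
    # message that is the message itself. None means no plain-text part.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return author, subject, body
```

Because the headers carry the author and subject, this only helps with files saved in a way that keeps them; plain text saved without headers would leave both fields empty.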

Now, what would really be useful, rather than duplicating what Google is
doing (and doing well), is to distill the data into "information." There is
way too much noise and far too little signal. This happens when the same
questions are asked 300 times, when entire messages are quoted and when
lengthy arguments about meaningless topics take place. Also note that Google
(and others) store everything, which means every wrong answer is available
for search too... you have to believe that from time to time somebody finds
the wrong answer (but not the correction) and goes off trying it :-)
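One inexpensive first step toward more signal and less noise, sketched under the assumption that quoted text is marked with a leading ">" (as many newsreaders do): drop quoted lines before indexing, and normalize subjects so "Re:" replies group under the original question. This is only a heuristic; it will not catch the same question asked 300 times under different subjects.

```python
import re

# One or more reply/forward prefixes such as "Re:", "RE: Re:", "Fwd:".
_REPLY_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def strip_quotes(body: str) -> str:
    """Remove lines quoted from earlier messages (those starting with '>')."""
    kept = [line for line in body.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()


def normalize_subject(subject: str) -> str:
    """Drop reply prefixes, collapse whitespace and lower-case the subject."""
    without_prefix = _REPLY_PREFIX.sub("", subject)
    return " ".join(without_prefix.split()).lower()
```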

You should also add a "concept" search. For instance, if a message contained
a bunch of source code but the words "source code" didn't appear in it, it
would still be found when somebody typed "vb.net source code" as a search
query. You'll have a lot of work to do...
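A "concept" search can start as a hand-built map from concepts to indicators checked when a message is stored. The sketch below tags a message with the concept "source code" when several of its lines look like VB.NET or C#; the patterns and the threshold are hypothetical and would need tuning against real messages.

```python
import re

# Hypothetical indicators: lines that look like VB.NET or C# statements.
CODE_LINE = re.compile(
    r"^\s*(Dim\s|Imports\s|using\s|Public\s|Private\s|End\s(Sub|Function|Class)\b"
    r"|.*;\s*$|.*\)\s*\{\s*$)"
)


def concepts_for(body: str, min_code_lines: int = 2) -> set:
    """Return concept tags inferred from the content, not from literal words."""
    tags = set()
    code_lines = sum(1 for line in body.splitlines() if CODE_LINE.match(line))
    if code_lines >= min_code_lines:
        tags.add("source code")
    return tags


def matches(query: str, body: str) -> bool:
    """A query matches if the text contains it or it names an inferred concept."""
    q = query.lower()
    return q in body.lower() or any(tag in q for tag in concepts_for(body))
```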

Tom


Nov 20 '05 #2
Hi Peace,

I read Tom's notes and he's certainly correct about a lot of things, but,
hey, I've been doing what you've been doing for a bit more than a year now!
It would be nice to use your device, even if a similar mechanism exists on
Google; besides, it represents a great exercise for you.

Go for it!

Bernie Yaeger

Nov 20 '05 #3
I would be interested in seeing what you have done, or in helping out if you
need it (which you probably don't now). I would certainly be interested in
your database if nothing else.

Regards - OHM

--
Best Regards - OHM

O_H_M{at}BTInternet{dot}com
Nov 20 '05 #4
> Hi,
> Were you aware that Google has been doing this too (and for some years)?

Right, and I have not been all that happy with their engine, pretty much for
the reasons you give below.

> There is a utility (here) http://www.oehelp.com/DBXtract/Default.aspx that
> might be of help. It sounds like the guy figured out the .nws format and his
> program will convert them to text.

Perfect. Thank you for the link.

> Now what would really be useful is... rather than duplicate what Google is
> doing (and doing well) would be to distill the data into "information."
> There is way too much noise and far too little signal. This happens when
> the same questions are asked 300 times, when entire messages are quoted and
> when lengthy arguments about meaningless topics take place. Also note that
> Google (and others) store everything which means every wrong answer is
> available for search also... you have to believe that from time-to-time
> somebody finds the wrong answer (but not the correction) and goes off trying
> that :-)
>
> You should also add a "concept" search. So for instance if there was a
> bunch of source code in the message but the words "source code" didn't
> appear in the message it would still be found when somebody typed in "vb.net
> source code" as a search criteria. You'll have a lot of work to do...

Yes, I do have a lot of work. What got me started on this was Mr.
IAmIronMan's assault on the group; I realized his messages were being given
as much credit as legitimate topics.

Thank you for your thoughts, though. Developing the filter will be the
toughest part. I was thinking about using what I have seen Herfried quote,
which I believe are some sort of etiquette rules, as a "filter" for the
group; messages that violated them would be removed from the database. Of
course, anything marked OT would automatically be dumped.
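The "anything marked OT is dumped" rule is easy to state as code. This sketch assumes off-topic posts carry a marker such as "OT:", "[OT]" or "(OT)" at the start of the subject, possibly after "Re:" prefixes (a common convention, not a guarantee). Requiring the colon or brackets keeps subjects that merely begin with words like "Other" from being caught.

```python
import re

# Hypothetical convention: an OT marker at the start of the subject,
# optionally preceded by one or more "Re:" prefixes.
OT_MARKER = re.compile(r"^\s*(re\s*:\s*)*(\[ot\]|\(ot\)|ot\s*:)", re.IGNORECASE)


def is_off_topic(subject: str) -> bool:
    return OT_MARKER.match(subject) is not None


def keep_on_topic(messages):
    """Filter (subject, body) pairs, dropping those marked off-topic."""
    return [(s, b) for s, b in messages if not is_off_topic(s)]
```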
Nov 20 '05 #5
I'll let you know, One Handed... Any thoughts, be sure to send them my way.

Nov 20 '05 #6
Cor
Do you not have a HKW connector?

Posted this in the wrong thread.

:-))
Nov 20 '05 #7
"Peace" <Its the end of the world as we know it@here.com> wrote...
> Thank you for your thoughts though. Developing the filter will be the
> toughest part. I was thinking about using what I have seen Herfried quote, I
> believe it to be some sort of etiquette rules, as a "filter" for the group
> and messages that violated that would be removed from the database. Of
> course anything marked OT would automatically be dumped.


Google has an API which might be interesting for you to look at:
http://www.google.com/apis/

And, just a thought, but I wouldn't arbitrarily dismiss messages that don't
meet some etiquette rule. Not to belabor the point, but consider that two
answers can be posted to a question. One is somewhat rude and abrasive but
contains 25 lines of code illustrating the solution; the other is a link to
a web page which may or may not be available today. Few people would trade
the first answer for a dead link.

That's why I often distinguish between "information" and "data." There is a
lot of data in the world, and more is produced all the time, but information
(particularly useful information) is often hard to come by.

Good luck,
Tom
Nov 20 '05 #8
