datamining .txt-files, library?

i have a big collection of .txt files that i want to open and parse to
extract information.

is there a library for this or maybe even built in?
Jun 27 '08 #1
On May 15, 2:27 pm, globalrev <skanem...@yahoo.se> wrote:
> i have a big collection of .txt files that i want to open and parse to
> extract information.
>
> is there a library for this or maybe even built in?

Use os.open to open the files and iterate through them, and the built-in
string functions to parse them.
Jun 27 '08 #2
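A minimal sketch of the approach suggested in this reply, assuming each file holds simple "key: value" lines (a hypothetical layout; the thread never shows the real files). It uses the builtin open rather than os.open, and the names parse_file and parse_directory are made up for illustration.

```python
import glob
import os


def parse_file(path):
    """Return a dict of 'key: value' pairs found in one text file."""
    record = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Assumed layout: each interesting line looks like "key: value".
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip().lower()] = value.strip()
    return record


def parse_directory(directory):
    """Parse every .txt file in a directory, keyed by file name."""
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        results[os.path.basename(path)] = parse_file(path)
    return results
```

Because the files are read one line at a time, memory use should stay small even with many thousands of files; only the extracted fields are kept.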
On May 15, 8:27 am, globalrev <skanem...@yahoo.se> wrote:
> i have a big collection of .txt files that i want to open and parse to
> extract information.
>
> is there a library for this or maybe even built in?

This has a lot to do with how well-structured are your files and what
kind of information you hope to extract (shallow or deep). NLTK might
be a good starting point: http://nltk.sourceforge.net/

George
Jun 27 '08 #3
On 15 May, 14:40, George Sakkis <george.sak...@gmail.com> wrote:
> On May 15, 8:27 am, globalrev <skanem...@yahoo.se> wrote:
> > i have a big collection of .txt files that i want to open and parse to
> > extract information.
> > is there a library for this or maybe even built in?
>
> This has a lot to do with how well-structured are your files and what
> kind of information you hope to extract (shallow or deep). NLTK might
> be a good starting point: http://nltk.sourceforge.net/
>
> George

Super-structured, 10K+ files, all look the same.
Jun 27 '08 #4
Chris <cw****@gmail.com> writes:
> On May 15, 2:27 pm, globalrev <skanem...@yahoo.se> wrote:
> > i have a big collection of .txt files that i want to open and parse to
> > extract information.
> >
> > is there a library for this or maybe even built in?
>
> os.open to open the files and iterate through it and built in string
> functions to parse it.

Or, more probably, the regular expression library "re".
Jun 27 '08 #5
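For files that all share one layout, a compiled pattern with named groups is one way to use "re". The sketch below assumes a hypothetical line format of date, level and message; the pattern and the name extract_records are illustrative only and would need adapting to the real files.

```python
import re

# Assumed record format (not from the thread): "2008-05-15 ERROR disk full".
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def extract_records(text):
    """Return one dict per line that matches LINE_PATTERN."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            # groupdict() maps each named group to the text it captured.
            records.append(match.groupdict())
    return records
```

Compiling the pattern once and reusing it avoids re-parsing the expression for every line, which may matter across 10K+ files.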
Chris wrote:
> On May 15, 2:27 pm, globalrev <skanem...@yahoo.se> wrote:
> > i have a big collection of .txt files that i want to open and parse to
> > extract information.
> >
> > is there a library for this or maybe even built in?
>
> os.open to open the files

What's wrong with the (builtin) open?

Jun 27 '08 #6
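The difference behind this question can be sketched as follows: os.open is a low-level call that returns an integer file descriptor, while the builtin open returns a file object that can be iterated line by line. The temporary file here exists only to make the comparison self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the comparison is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

# os.open returns an integer file descriptor; os.read gives raw bytes.
fd = os.open(path, os.O_RDONLY)
raw = os.read(fd, 1024)
os.close(fd)

# The builtin open returns a file object that yields decoded text lines.
with open(path, encoding="utf-8") as handle:
    lines = [line.rstrip("\n") for line in handle]

os.remove(path)
```

For parsing text files, the builtin is usually the more convenient choice, since decoding and line splitting come for free.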
Look at the re module (regular expressions) or pyparser.

See http://nedbatchelder.com/text/python-parsers.html

Jun 27 '08 #7
On May 16, 3:22 am, martindesali...@gmail.com wrote:
> look at module re (regular expression) or pyparser
>
> see http://nedbatchelder.com/text/python-parsers.html

This ties in to 'call tree tool?' from yesterday. Do we have any
visualization modules? My two examples were 're' and 'call trees'.
Jun 27 '08 #8
