Hello,
Any help would be much appreciated. I have a directory of ~2000 tab-delimited .txt files. I would like to combine all of these files into one master file, appending each file's rows after the last row already written. The one catch is that I need the name of the file each row came from to be the first item in that row of the master file. For example:
I don't know how to insert tabs into this message, so I'll insert "|" where a tab would be.
File 1:
Name = "ABC.txt"
Row 1=dog|cat|mouse|
Row 2=mouse|cat|dog
File 2:
Name = "DEF.txt"
Row 1=frog|bug|grass
Row 2=grass|bug|frog
Ideally, I'd like the output (master file) to look like:
Row 1=ABC.txt|dog|cat|mouse|
Row 2=ABC.txt|mouse|cat|dog
Row 3=DEF.txt|frog|bug|grass
Row 4=DEF.txt|grass|bug|frog
The names of the 2000 text files will change from time to time, so I would like to be able to reference a directory in the code as opposed to each individual file.
I hope this is enough info. Thanks in advance for any help!
PD
AFAIK you could do something like this:
master = open(masterfile,'a') # or write, whatever you need
smallfile = open(smallerfile,'r')
for line in smallfile.readlines():
    master.write(smallfilename+"\t"+line)
Then repeat that for every smaller file.
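A minimal sketch of that per-file step, wrapped in a function so it can be repeated for each small file. The function name append_with_filename is made up for illustration, and the line-ending handling is an assumption about what the data needs: it keeps any trailing tab but makes sure every row ends in exactly one newline.

```python
import os


def append_with_filename(master, path):
    """Write every line of the file at `path` to the open file `master`,
    prefixed with the file's base name and a tab.

    `append_with_filename` is a hypothetical helper, not part of any library.
    """
    name = os.path.basename(path)
    with open(path, "r") as small:
        for line in small:
            # Strip only the line ending (tabs are kept), then add one
            # newline so a missing final newline cannot glue two rows together.
            master.write(name + "\t" + line.rstrip("\r\n") + "\n")
```

You would open the master file once, call this once per small file, and close the master file when the loop is done.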
Using the os.path module will give you a lot of flexibility when it comes to directories and file paths, including cross-platform functionality if that's what you fancy.
>>> import os
>>> os.getcwd()  ## This will return the current working directory
'C:\\Python24\\Lib\\site-packages\\wx-2.8-msw-unicode\\wx\\py'
>>> os.path.join('foo', 'somedir', 'bar')  ## This is convenient so you don't have to worry about using / on *nix and \\ on windows
'foo\\somedir\\bar'
>>> os.chdir(os.path.join('C:\\', 'Documents and Settings', 'Administrator', 'Desktop', 'pythtests'))  ## Changes the current working directory
>>> os.getcwd()
'C:\\Documents and Settings\\Administrator\\Desktop\\pythtests'
>>> os.listdir('.')  ## Returns list of names of files and dirs in cwd
['bckmch.py', 'cmdtest.py', 'cobyla.py', 'elseerr.py', 'fileio.py', 'functest.py', 'graphics', 'hscore.py', 'ldict.py', 'lid', 'mainbody', 'matrixprint.py', 'matrx_print.py', 'module1.py', 'module1.pyc', 'module2.py', 'poopies.txt', 'Question', 'test.py', 'test2.py', 'tkinttxtbx.py', 'tktst.py', 'topload', 'totalbottle', 'trivgame.py', 'walkncount.py', 'wxtemplate.py']
>>>
>>> ## So let's do something like this.......
>>> mydirectory = os.path.join('C:\\', 'Documents and Settings', 'Administrator', 'Desktop', 'pythtests')  ## You can set this to your desired path
>>> myfiles = os.listdir(mydirectory)
>>> for file in myfiles:  ## We'll iterate through each file in the list
...     if os.path.isfile(os.path.join(mydirectory, file)):  ## Check it is a file and not a directory
...         fh = open(os.path.join(mydirectory, file), 'r')
...         ## Do your stuff here
Another level of security you could write into the for loop would be to take the file name (the string in file) and check that file.split('.')[-1] == 'txt'. This ensures you've got a .txt file and not something else hiding in your directory.
Hope that helps.
P.S. As micmast said, a tab is represented by the escape sequence \t
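For the extension check, here is a small sketch that swaps in os.path.splitext instead of split('.'). The reason for the swap is an edge case: a file literally named "txt" (no dot) passes the split('.')[-1] == 'txt' test, while splitext reports an empty extension for it. The function name list_txt_files is made up for illustration, and treating .TXT the same as .txt is an assumption that suits Windows.

```python
import os


def list_txt_files(directory):
    """Return the sorted names of regular .txt files in `directory`.

    `list_txt_files` is a hypothetical helper, not part of any library.
    """
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # splitext splits off the final extension, e.g. ('ABC', '.txt')
        extension = os.path.splitext(name)[1]
        # Assumption: the extension is compared case-insensitively
        if os.path.isfile(full_path) and extension.lower() == ".txt":
            names.append(name)
    return sorted(names)
```

The isfile test also skips any directory that happens to have a name ending in .txt.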
Thanks a ton! This will hopefully make my life a lot easier. So if I understand correctly, I will tack on micmast's code to the bottom of jlm699's code and modify where necessary. I will try this out ASAP.
Thanks again!
Hi again,
I'm sorry, I don't want to ask anyone to hold my hand through the whole thing, but I've tried the suggested code and I can't get it to work. I must be doing something wrong.
If all the text files I'm combining are in the directory "C:\Documents and Settings\pd\Desktop\Pythontest", what exactly should the full code look like?
Thanks so much again!
What's wrong with what you've tried? What are the errors that you are getting?
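import glob
import sys

for filename in glob.glob("*.txt"):
    for line in open(filename):
        # Prefix each row with its file name and a tab; only the line ending is stripped
        sys.stdout.write(filename + "\t" + line.rstrip("\r\n") + "\n")

On the command line, run it from the directory that holds the .txt files and redirect the output to a file outside that directory (otherwise glob would match the output file as well):
c:\test> python script.py > ..\outfile.txt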
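If you would rather not rely on shell redirection, here is a sketch of a complete script that writes the master file itself. It is only one way to put the pieces from this thread together: the function name combine and the choice to overwrite the master file on every run are assumptions, so switch the mode to "a" if you really want to append to an existing master file.

```python
import glob
import os


def combine(directory, master_path):
    """Write every row of every .txt file in `directory` to `master_path`,
    prefixing each row with the source file name and a tab.

    `combine` is a hypothetical helper, not part of any library.
    """
    master_abs = os.path.normcase(os.path.abspath(master_path))
    # Assumption: the master file is rebuilt from scratch on every run ("w")
    with open(master_path, "w") as master:
        for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
            if os.path.normcase(os.path.abspath(path)) == master_abs:
                continue  # never read the master file into itself
            name = os.path.basename(path)
            with open(path, "r") as source:
                for line in source:
                    # Keep trailing tabs, normalise the line ending
                    master.write(name + "\t" + line.rstrip("\r\n") + "\n")
```

For the directory mentioned above, the call would look like combine(r"C:\Documents and Settings\pd\Desktop\Pythontest", r"C:\Documents and Settings\pd\Desktop\master.txt"), where the master file location is just an example. Since the directory is read fresh on each run, it does not matter if the file names change.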