How can I combine a directory of .txt files into one AND insert their filenames


Any help would be much appreciated. I have a directory of ~2000 tab delimited .txt files. I would like to combine all these files (append to the last blank row) into a master file. The one catch is, I need the filename from which the data is drawn to be the first item in every row of the master file. For example:

I don't know how to insert tabs into this message, so I'll insert "|" where a tab would be.

File 1:
Name = "ABC.txt"
Row 1=dog|cat|mouse|
Row 2=mouse|cat|dog

File 2:
Name = "DEF.txt"
Row 1=frog|bug|grass
Row 2=grass|bug|frog

Ideally, I'd like the output (master file) to look like:
Row 1=ABC.txt|dog|cat|mouse|
Row 2=ABC.txt|mouse|cat|dog
Row 3=DEF.txt|frog|bug|grass
Row 4=DEF.txt|grass|bug|frog

The names of the 2000 text files will change from time to time, so I would like to be able to reference a directory in the code as opposed to each individual file.

I hope this is enough info. Thanks in advance for any help!

Apr 4 '08 #1
AFAIK you could do something like this:

    masterfile = 'master.txt'   # placeholder name for the combined output file
    smallerfile = 'ABC.txt'     # placeholder name for one of the input files
    master = open(masterfile, 'a')  # or 'w', whatever you need
    smallfile = open(smallerfile, 'r')
    for line in smallfile:
        master.write(smallerfile + "\t" + line)
    smallfile.close()
    master.close()

Then repeat that for every smaller file.
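A rough sketch of how that per-file step could be wrapped in a function so it can be repeated for each file. The name `append_with_filename` is made up for illustration and is not from this thread, and the sketch assumes Python 3. It strips only the line ending (so a trailing tab in the data survives) and always writes a newline, so a file whose last row lacks one does not get glued to the next file's first row.

```python
import os

def append_with_filename(master, path):
    """Write each row of `path` to the open file `master`, prefixed by the file's name and a tab.

    `append_with_filename` is a hypothetical helper name used only for this sketch.
    """
    name = os.path.basename(path)
    with open(path, "r") as small:
        for line in small:
            # Remove only the line ending; tabs at the end of a row are kept.
            row = line.rstrip("\r\n")
            master.write(name + "\t" + row + "\n")
```

It would be called once per input file, with `master` opened a single time before the loop.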
Apr 4 '08 #2
Using the os.path module will give you a lot of flexibility when it comes to directories and file paths, including cross-platform functionality if that's what you fancy.
    >>> import os
    >>> os.getcwd()  ## This will return the current working directory
    'C:\\Python24\\Lib\\site-packages\\wx-2.8-msw-unicode\\wx\\py'
    >>> os.path.join('foo', 'somedir', 'bar')  ## This is convenient so you don't have to worry about using / on *nix and \\ on windows
    'foo\\somedir\\bar'
    >>> os.chdir(os.path.join('C:\\', 'Documents and Settings', 'Administrator', 'Desktop', 'pythtests'))  ## Changes the current working directory
    >>> os.getcwd()
    'C:\\Documents and Settings\\Administrator\\Desktop\\pythtests'
    >>> os.listdir('.')  ## Returns list of names of files and dirs in cwd
    ['bckmch.py', 'cmdtest.py', 'cobyla.py', 'elseerr.py', 'fileio.py', 'functest.py', 'graphics', 'hscore.py', 'ldict.py', 'lid', 'mainbody', 'matrixprint.py', 'matrx_print.py', 'module1.py', 'module1.pyc', 'module2.py', 'poopies.txt', 'Question', 'test.py', 'test2.py', 'tkinttxtbx.py', 'tktst.py', 'topload', 'totalbottle', 'trivgame.py', 'walkncount.py', 'wxtemplate.py']
    >>>
    >>> ## So let's do something like this.......
    >>> mydirectory = os.path.join('C:\\', 'Documents and Settings', 'Administrator', 'Desktop', 'pythtests')  ## You can set this to your desired path
    >>>
    >>> myfiles = os.listdir(mydirectory)
    >>> for file in myfiles:  ## We'll iterate through each file in the list
    ...     if os.path.isfile(os.path.join(mydirectory, file)):  ## Check it is a file and not a directory
    ...         fh = open(os.path.join(mydirectory, file), 'r')
    ...         ## Do your stuff here
Another safeguard you could add inside the for loop is to take the file name (the string held in file) and check file.split('.')[-1] == 'txt'. That makes sure you've got a .txt file and not something else hiding in your directory.
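For what it's worth, a slightly stricter version of that extension check could look like the sketch below. The function name `is_txt_file` is hypothetical, and matching the extension case-insensitively is an assumption (handy on Windows, where `DATA.TXT` is common). It uses `os.path.splitext` rather than `split('.')`, because a name with no dot at all, such as a folder literally called `txt`, would pass the `split('.')[-1] == 'txt'` test.

```python
import os

def is_txt_file(directory, name):
    """Return True if `name` is a regular file inside `directory` with a .txt extension.

    `is_txt_file` is a hypothetical helper name used only for this sketch.
    """
    full_path = os.path.join(directory, name)
    # splitext returns (root, ext); ext is '' when the name has no dot.
    extension = os.path.splitext(name)[1]
    # Lowercasing the extension is an assumption: it treats .TXT like .txt.
    return os.path.isfile(full_path) and extension.lower() == ".txt"
```

Inside the loop this would replace the separate `os.path.isfile` and extension tests with one call.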
Hope that helps.

P.S. As micmast said, a tab is represented by the escape sequence \t
Apr 4 '08 #3
Thanks a ton! This will hopefully make my life a lot easier. So if I understand correctly, I will tack on micmast's code to the bottom of jlm699's code and modify where necessary. I will try this out ASAP.

Thanks again!
Apr 4 '08 #4
Hi again,

I'm sorry- I don't want to ask anyone to have to hold my hand through the whole thing, but I've tried the suggested code, and I can't get it to work. I must be doing something wrong.

If all the text files I'm combining are in the directory "C:\Documents and Settings\pd\Desktop\Pythontest", what exactly should the full code look like?

Thanks so much again!
Apr 4 '08 #5
What's wrong with what you've tried? What are the errors that you are getting?
Apr 6 '08 #6
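    import glob
    for filename in glob.glob("*.txt"):
        for line in open(filename):
            # Join with a tab, and strip only the newline so trailing tabs are kept
            print("%s\t%s" % (filename, line.rstrip("\n")))

on the command line (send the output somewhere outside the directory, otherwise outfile.txt itself matches *.txt and may get picked up by the loop)

    c:\test> python script.py > ..\outfile.txt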
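If you'd rather not depend on shell redirection, the same idea can be written so the script opens the master file itself. This is only a sketch under a few assumptions: Python 3, a made-up function name `combine_txt_files`, and input files processed in sorted name order. It skips the master file if it happens to live in the same directory, so the output is never read back in.

```python
import glob
import os

def combine_txt_files(directory, master_path):
    """Combine every .txt file in `directory` into `master_path`.

    Each output row is the source file name, a tab, then the original row.
    Returns the number of rows written. `combine_txt_files` is a hypothetical
    name used only for this sketch.
    """
    rows = 0
    master_key = os.path.normcase(os.path.abspath(master_path))
    with open(master_path, "w") as master:
        for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
            if os.path.normcase(os.path.abspath(path)) == master_key:
                continue  # do not read the output file back in
            name = os.path.basename(path)
            with open(path, "r") as source:
                for line in source:
                    # Strip only the line ending, then write exactly one newline.
                    master.write(name + "\t" + line.rstrip("\r\n") + "\n")
                    rows += 1
    return rows
```

For the directory mentioned earlier in the thread, a call might look like `combine_txt_files(r"C:\Documents and Settings\pd\Desktop\Pythontest", r"C:\Documents and Settings\pd\Desktop\master.txt")`, where the `master.txt` location is just an example.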
Apr 7 '08 #7
