Bytes IT Community

How can I combine a directory of .txt files into one AND insert their filenames?

P: 3
Hello,

Any help would be much appreciated. I have a directory of ~2000 tab-delimited .txt files. I would like to combine all these files into a master file, with each file's rows appended after the last row already there. The one catch is that I need the name of the file the data came from to be the first item in every row of the master file. For example:

I don't know how to insert tabs into this message, so I'll insert "|" where a tab would be.

File 1:
Name = "ABC.txt"
Row 1=dog|cat|mouse|
Row 2=mouse|cat|dog

File 2:
Name = "DEF.txt"
Row 1=frog|bug|grass
Row 2=grass|bug|frog

Ideally, I'd like the output (master file) to look like:
Row 1=ABC.txt|dog|cat|mouse|
Row 2=ABC.txt|mouse|cat|dog
Row 3=DEF.txt|frog|bug|grass
Row 4=DEF.txt|grass|bug|frog

The names of the 2000 text files will change from time to time, so I would like to be able to reference a directory in the code as opposed to each individual file.

I hope this is enough info. Thanks in advance for any help!

PD
Apr 4 '08 #1
6 Replies

micmast
100+
P: 144
afaik you could do something like this:

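    # masterfile, smallerfile and smallfilename are placeholders for your own values
    master = open(masterfile, 'a')  # 'a' appends; use 'w' to overwrite instead
    smallfile = open(smallerfile, 'r')

    for line in smallfile:
        master.write(smallfilename + "\t" + line)  # filename, tab, then the row

    smallfile.close()
    master.close()

Then repeat that for every smaller file.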
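
The per-file step above could be wrapped in a small helper so it is easy to repeat. This is only a sketch: the function name `append_with_filename` is made up for illustration, and it assumes the master file has already been opened for writing or appending. It also adds a newline to a final row that lacks one, so the next file's first row does not end up glued to it.

```python
import os


def append_with_filename(master, path):
    """Copy every line of `path` into the open file `master`,
    prefixing each line with the file's base name and a tab."""
    name = os.path.basename(path)
    with open(path, "r") as small:
        for line in small:
            # Guarantee a trailing newline so the next file's first row
            # is not joined onto this file's last row.
            if not line.endswith("\n"):
                line += "\n"
            master.write(name + "\t" + line)
```

Calling it once per small file, with the same open master file each time, gives the "repeat for every file" behaviour.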
Apr 4 '08 #2

jlm699
100+
P: 314
Using the os and os.path modules will give you a lot of flexibility when it comes to directories and file paths, including cross-platform functionality if that's what you fancy.
    >>> import os
    >>> os.getcwd()  ## This will return the current working directory
    'C:\\Python24\\Lib\\site-packages\\wx-2.8-msw-unicode\\wx\\py'
    >>> os.path.join('foo', 'somedir', 'bar')  ## This is convenient so you don't have to worry about using / on *nix and \\ on windows
    'foo\\somedir\\bar'
    >>> os.chdir(os.path.join('C:\\', 'Documents and Settings', 'Administrator', 'Desktop', 'pythtests'))  ## Changes the current working directory
    >>> os.getcwd()
    'C:\\Documents and Settings\\Administrator\\Desktop\\pythtests'
    >>> os.listdir('.')  ## Returns list of names of files and dirs in cwd
    ['bckmch.py', 'cmdtest.py', 'cobyla.py', 'elseerr.py', 'fileio.py', 'functest.py', 'graphics', 'hscore.py', 'ldict.py', 'lid', 'mainbody', 'matrixprint.py', 'matrx_print.py', 'module1.py', 'module1.pyc', 'module2.py', 'poopies.txt', 'Question', 'test.py', 'test2.py', 'tkinttxtbx.py', 'tktst.py', 'topload', 'totalbottle', 'trivgame.py', 'walkncount.py', 'wxtemplate.py']
    >>>
    >>> ## So let's do something like this.......
    >>> mydirectory = os.path.join('C:\\', 'Documents and Settings', 'Administrator', 'Desktop', 'pythtests')  ## You can set this to your desired path
    >>>
    >>> myfiles = os.listdir(mydirectory)
    >>> for file in myfiles:  ## We'll iterate through each file in the list
    ...     if os.path.isfile(os.path.join(mydirectory, file)):  ## Check it is a file and not a directory
    ...         fh = open(os.path.join(mydirectory, file), 'r')
    ...         ## Do your stuff here

An extra safeguard you could write into the for loop would be to take the file name (the string in file) and check that file.split('.')[-1] == 'txt'. This will ensure you've got a .txt file and not something else hidden in your directory.
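
As a rough sketch of that filter, the directory listing and the extension check can be put together in one function. The name `list_txt_files` is hypothetical, and this version swaps in `os.path.splitext` for the `split('.')` test: a file named just `txt` with no dot would pass the `split` check, while `splitext` only looks at a real extension. Lowercasing the extension so that `.TXT` also counts is an assumption about what you want.

```python
import os


def list_txt_files(directory):
    """Return the sorted names of regular files in `directory` ending in .txt."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # splitext("ABC.txt") gives ("ABC", ".txt"); lower() also accepts ".TXT"
        extension = os.path.splitext(name)[1].lower()
        if os.path.isfile(full_path) and extension == ".txt":
            names.append(name)
    return sorted(names)
```

The returned names can be joined back onto the directory with `os.path.join` when opening each file.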
Hope that helps.

P.S. As micmast said, a tab is represented by the escape sequence \t.
Apr 4 '08 #3

P: 3
Thanks a ton! This will hopefully make my life a lot easier. So if I understand correctly, I will tack on micmast's code to the bottom of jlm699's code and modify where necessary. I will try this out ASAP.

Thanks again!
Apr 4 '08 #4

P: 3
Hi again,

I'm sorry, I don't want to ask anyone to hold my hand through the whole thing, but I've tried the suggested code and I can't get it to work. I must be doing something wrong.

If all the text files I'm combining are in the directory "C:\Documents and Settings\pd\Desktop\Pythontest", what exactly should the full code look like?

Thanks so much again!
Apr 4 '08 #5

jlm699
100+
P: 314
What's wrong with what you've tried? What are the errors that you are getting?
Apr 6 '08 #6

Expert 100+
P: 511
    import glob
    import sys

    for filename in glob.glob("*.txt"):
        for line in open(filename):
            # put a tab (not a space) between the filename and the row
            sys.stdout.write(filename + "\t" + line.rstrip("\n") + "\n")

Then, on the command line:

    c:\test> python script.py > outfile.txt

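
One thing to watch with the redirect: if outfile.txt is created in the same directory as the input files, the *.txt pattern can pick up the output file itself. A possible variant, sketched below, writes the master file directly and skips it while looping. The function name `combine` and its two parameters are made up for illustration; it assumes plain text files that Python can read in the default encoding.

```python
import glob
import os


def combine(directory, master_path):
    """Write every row of every .txt file in `directory` to `master_path`,
    with the source file name as the first tab-separated field."""
    master_abs = os.path.abspath(master_path)
    with open(master_path, "w") as master:
        for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
            # Skip the master file itself if it lives in the same directory.
            if os.path.abspath(path) == master_abs:
                continue
            name = os.path.basename(path)
            with open(path, "r") as source:
                for line in source:
                    # Normalise the line ending so a final row without a
                    # newline is not joined to the next file's first row.
                    master.write(name + "\t" + line.rstrip("\n") + "\n")
```

For the directory mentioned earlier in the thread, a call might look like combine(r"C:\Documents and Settings\pd\Desktop\Pythontest", "master.txt"), where master.txt is just an example output name.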
Apr 7 '08 #7
