
Splitting a file into multiple file based on some pattern

P: 7
the text file is like:
shfhsgfkshkjg
gkjsfkgkjfgkfg
model1
lkgjhllghfjgh
kjfgkjjfghg
endmodel
model2
jfhkjhcgbkcjg
xhbgkxfkgh
endmodel

I want the text between each model and endmodel to go into a new file, and the file names should be like model1, model2, and so on. There may be 100 or more models. Please help me.
Aug 9 '14 #1
13 Replies


sicarie
Expert Mod 2.5K+
P: 4,677
It looks like "model" is common to both the start and end markers. You could either write a regex or parse the file line by line for the matching text.
Aug 9 '14 #2

P: 7
But how do I write a new text file each time?
Aug 9 '14 #3

bvdet
Expert Mod 2.5K+
P: 2,851
The file could be written something like:
    fn = "model1"
    f = open(fn, "w")
    f.write("\n".join(content))
    f.close()
"content" would be a list of the text items between "modelX" and "endmodel".

A possible regex pattern could be:
    import re
    patt = re.compile(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",  re.MULTILINE)
Then the file names and content could be extracted to a list of (name, content) tuples, with "s" holding the full text of the input file:
    contents = patt.findall(s)
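Putting the pieces above together, a minimal sketch of the whole regex approach might look like the following. The function name `split_models`, the `out_dir` parameter, and the sample text are assumptions made for illustration. The pattern is the one given above, so it only matches bodies made of letters, digits, and newlines.

```python
import os
import re
import tempfile

# Pattern from the post above: group 1 is the model name, group 2 the body.
PATT = re.compile(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel")


def split_models(text, out_dir):
    """Write one file per model block and return the paths written.

    split_models and out_dir are hypothetical names used for this sketch.
    """
    written = []
    for name, body in PATT.findall(text):
        path = os.path.join(out_dir, name)
        with open(path, "w") as f:
            f.write(body)
        written.append(path)
    return written


sample = (
    "shfhsgfkshkjg\ngkjsfkgkjfgkfg\n"
    "model1\nlkgjhllghfjgh\nkjfgkjjfghg\nendmodel\n"
    "model2\njfhkjhcgbkcjg\nxhbgkxfkgh\nendmodel\n"
)
out_dir = tempfile.mkdtemp()  # throwaway directory for the demo
paths = split_models(sample, out_dir)
print([os.path.basename(p) for p in paths])
```

With 100 uniquely named models in the input, the loop would produce 100 files, one per match.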
Aug 9 '14 #4

P: 7
Does it give 100 files if there are 100 models?
Aug 10 '14 #5

bvdet
Expert Mod 2.5K+
P: 2,851
rokstar24,

Yes, if the models are uniquely identified.
Aug 10 '14 #6

P: 7
Can anyone give me complete code that will run directly? I am trying, but I am not getting it to work.
Aug 12 '14 #7

P: 7
    f=open("E:\pra.txt","r+")


    import re,sys
    k= re.findall(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",f.read())

    count = 1
    fwrite = open("filename%s" %(count), 'w')
    for line in f:
        if k in line:
            # close open file object, increment count, open new file object

            count += 1
            fwrite = open("filename%s" %(count), 'w')
            fwrite.write(k)
    fwrite.close()
    f.close()
I have written this. Can you tell me where I am going wrong?
Aug 12 '14 #8

P: 7
How do I write this k to a new file? It is showing that k is not a string.
Aug 12 '14 #9

bvdet
Expert Mod 2.5K+
P: 2,851
You are right - 'k' is not a string. It is a list of tuples, and you have to get the appropriate individual elements, which are strings, and write them to disk. 'k' may look something like this:
    [('model1', 'lkgjhllghfjgh\nkjfgkjjfghg\n'), ('model2', 'jfhkjhcgbkcjg\nxhbgkxfkgh\n')]
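Since each element of 'k' is a (name, text) pair, one way to get at the strings is to unpack the pairs, or to turn the whole result into a dictionary keyed by model name. A small sketch, using the literal value shown above rather than a real file (the variable name `models` is an assumption):

```python
# The findall() result from above, written out literally.
k = [('model1', 'lkgjhllghfjgh\nkjfgkjjfghg\n'),
     ('model2', 'jfhkjhcgbkcjg\nxhbgkxfkgh\n')]

# Each element unpacks into two strings: the file name and its content.
for name, text in k:
    print(name, len(text.splitlines()), "lines")

# dict() accepts a list of pairs, giving a name -> content mapping.
models = dict(k)
print(models["model2"])
```

Passing `text` rather than `k` to `write()` is what avoids the "not a string" error.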
Aug 12 '14 #10

Expert 100+
P: 619
    k= re.findall(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",f.read())

and

    if k in line:

are redundant. Forget the findall and just test whether "model" is in the line or, since the line starts with "model" and we don't want to match "endmodel", use

    if line.startswith("model"):

    test_data = """the text file is like:
    shfhsgfkshkjg
    gkjsfkgkjfgkfg
    model1
    lkgjhllghfjgh
    kjfgkjjfghg
    endmodel
    model2
    jfhkjhcgbkcjg
    xhbgkxfkgh
    endmodel"""

    file_input = test_data.split("\n")
    model_list = []
    ctr = 0
    for line in file_input:
        line = line.strip()
        if line.startswith("model"):
            if ctr:     ## first group is junk & not copied
                with open(model_list[0], "w") as fp_out:
                    for rec in model_list:
                        fp_out.write("%s \n" % (rec))
            ctr += 1
            model_list = []
        model_list.append(line)

    ## final list
    with open(model_list[0], "w") as fp_out:
        for rec in model_list:
            fp_out.write("%s \n" % (rec))
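If only the lines strictly between "modelX" and "endmodel" should end up in each file, a small variation on the loop above can track whether it is currently inside a block. This is a sketch under the assumption that blocks are not nested and every block is closed by "endmodel". `collect_models` is a hypothetical helper that returns the blocks as a dictionary instead of writing files, so it can be checked without touching the disk.

```python
def collect_models(lines):
    """Return {model_name: [body lines]} for each model...endmodel block."""
    blocks = {}
    current = None  # name of the block being read, or None when outside one
    for line in lines:
        line = line.strip()
        if line == "endmodel":
            current = None  # block finished; following lines are ignored
        elif current is None and line.startswith("model"):
            current = line  # the marker line becomes the file name
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return blocks


test_data = """shfhsgfkshkjg
model1
lkgjhllghfjgh
kjfgkjjfghg
endmodel
model2
jfhkjhcgbkcjg
xhbgkxfkgh
endmodel"""

blocks = collect_models(test_data.split("\n"))
for name, body in blocks.items():
    print(name, body)
```

Each entry could then be written out by opening a file named after the key and writing the joined body lines.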
Aug 12 '14 #11

P: 7
Can anyone tell me what condition I should use to match a line like
    model        1
that is, with exactly 8 spaces between model and 1? What exactly should I do to find it?
Aug 13 '14 #12

bvdet
Expert Mod 2.5K+
P: 2,851
Assuming you want to use the regex solution:
    patt = re.compile(r"(model *\d+)\n([0-9a-zA-Z\n]+?)endmodel",  re.MULTILINE)
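Note that " *" matches any number of spaces, including none. If exactly eight spaces must be required, a counted quantifier such as " {8}" is one option. The sketch below also collapses the spaces out of the captured name so that it can serve as a file name. The sample string and the names `patt8` and `names` are assumptions made for illustration.

```python
import re

# " {8}" requires exactly eight spaces between "model" and the number.
patt8 = re.compile(r"(model {8}\d+)\n([0-9a-zA-Z\n]+?)endmodel")

# First block has eight spaces in its name, the second only one.
s = ("model" + " " * 8 + "1\nlkgjhllghfjgh\nendmodel\n"
     "model 2\nxhbgkxfkgh\nendmodel\n")

matches = patt8.findall(s)

# Remove all whitespace from each captured name to get a clean file name.
names = ["".join(name.split()) for name, _ in matches]
print(names)
```

Only the first block matches here, because the second has a single space between "model" and the number.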
Aug 13 '14 #13

Expert 100+
P: 619
Or use split and join which is also a "standard" way.
    test_string = "model        1"
    print(test_string)
    print("becomes", "".join(test_string.split()))
Aug 13 '14 #14
