Splitting a file into multiple files based on a pattern

rokstar24
7 New Member
The text file looks like this:
shfhsgfkshkjg
gkjsfkgkjfgkfg
model1
lkgjhllghfjgh
kjfgkjjfghg
endmodel
model2
jfhkjhcgbkcjg
xhbgkxfkgh
endmodel

I want the text between each "model" line and the following "endmodel" line written to a new file, and the file names should be like model1, model2, and so on. There may be 100 or more models. Please help me.
Aug 9 '14 #1
sicarie
4,677 Recognized Expert Moderator Specialist
It looks like every block starts with a line beginning with "model" and ends with "endmodel". You could either write a regex or parse the file line by line, looking for those markers.
Aug 9 '14 #2
rokstar24
7 New Member
But how do I create a new text file each time?
Aug 9 '14 #3
bvdet
2,851 Recognized Expert Moderator Specialist
The file could be written something like:
    fn = "model1"
    content = ["lkgjhllghfjgh", "kjfgkjjfghg"]  # lines between "model1" and "endmodel"
    with open(fn, "w") as f:
        f.write("\n".join(content))
"content" would be a list of the text items between "modelX" and "endmodel".

A possible regex pattern could be:
    import re
    patt = re.compile(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel", re.MULTILINE)
Then the file names and content could be extracted to a list of (name, content) tuples, where "s" is the full text of the input file:
    contents = patt.findall(s)
Aug 9 '14 #4
rokstar24
7 New Member
Will it produce 100 files if there are 100 models?
Aug 10 '14 #5
bvdet
2,851 Recognized Expert Moderator Specialist
rokstar24,

Yes, if the models are uniquely identified.
Aug 10 '14 #6
rokstar24
7 New Member
Can anyone give me complete code that will run directly? I am trying, but I am not getting it to work.
Aug 12 '14 #7
rokstar24
7 New Member
    f=open("E:\pra.txt","r+")


    import re,sys
    k= re.findall(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",f.read())

    count = 1
    fwrite = open("filename%s" %(count), 'w')
    for line in f:
        if k in line:
            # close open file object, increment count, open new file object

            count += 1
            fwrite = open("filename%s" %(count), 'w')
            fwrite.write(k)
    fwrite.close()
    f.close()
I have written this. Can you tell me where I am going wrong?
Aug 12 '14 #8
rokstar24
7 New Member
How do I write this k to a new file? It reports that k is not a string.
Aug 12 '14 #9
bvdet
2,851 Recognized Expert Moderator Specialist
You are right - 'k' is not a string. It is a list of tuples, and you have to get the appropriate individual elements, which are strings, and write them to disk. 'k' may look something like this:
    [('model1', 'lkgjhllghfjgh\nkjfgkjjfghg\n'), ('model2', 'jfhkjhcgbkcjg\nxhbgkxfkgh\n')]
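A minimal sketch of unpacking each tuple and writing it to disk, assuming Python 3 and the regex suggested earlier in the thread. The names `write_models`, `out_dir`, and `sample` are made up for illustration and are not from the original posts:

```python
import re
from pathlib import Path

# Same pattern as suggested earlier in the thread.
PATT = re.compile(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel")

def write_models(text, out_dir="."):
    """Write one file per (name, body) tuple found in text; return the names."""
    out = Path(out_dir)
    names = []
    for name, body in PATT.findall(text):
        # name is e.g. "model1"; body is the text up to "endmodel"
        (out / name).write_text(body)
        names.append(name)
    return names

# Hypothetical input mirroring the sample data in the question.
sample = (
    "shfhsgfkshkjg\ngkjsfkgkjfgkfg\n"
    "model1\nlkgjhllghfjgh\nkjfgkjjfghg\nendmodel\n"
    "model2\njfhkjhcgbkcjg\nxhbgkxfkgh\nendmodel\n"
)
```

Calling `write_models(open("pra.txt").read())` would then create one file per model in the current directory. Note that the character class only allows letters, digits, and newlines, so a block containing spaces or punctuation would not match; `(.+?)` with `re.DOTALL` is one possible alternative in that case.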
Aug 12 '14 #10
dwblas
626 Recognized Expert Contributor
    k= re.findall(r"(model\d+)\n([0-9a-zA-Z\n]+?)endmodel",f.read())

and

    if k in line:

are redundant. Forget the findall and just use if "model" in line, or, since the line starts with "model" and we don't want "endmodel":

    if line.startswith("model"):

    test_data = """the text file is like:
     shfhsgfkshkjg
     gkjsfkgkjfgkfg
     model1
     lkgjhllghfjgh
     kjfgkjjfghg
     endmodel
     model2
     jfhkjhcgbkcjg
     xhbgkxfkgh
     endmodel"""

    file_input = test_data.split("\n")
    model_list = []
    ctr = 0
    for line in file_input:
        line = line.strip()
        if line.startswith("model"):
            if ctr:     ## first group is junk & not copied
                with open(model_list[0], "w") as fp_out:
                    for rec in model_list:
                        fp_out.write("%s \n" % (rec))
            ctr += 1
            model_list = []
        model_list.append(line)

    ## final list
    with open(model_list[0], "w") as fp_out:
        for rec in model_list:
            fp_out.write("%s \n" % (rec))
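The code above also copies the "modelX" and "endmodel" marker lines into each output file. If only the lines between the markers are wanted, as in the original question, a variation along these lines might work. This is a sketch, assuming Python 3 and that no junk line starts with "model"; the names `split_by_markers` and `out_dir` are made up for illustration:

```python
import os

def split_by_markers(lines, out_dir="."):
    """Write the lines between each "modelN" and "endmodel" to a file named modelN."""
    written = []
    fp_out = None
    for raw in lines:
        line = raw.strip()
        if fp_out is None:
            if line.startswith("model"):
                # A header line opens a new output file named after it.
                fp_out = open(os.path.join(out_dir, line), "w")
                written.append(line)
        elif line == "endmodel":
            # The closing marker ends the current file and is not copied.
            fp_out.close()
            fp_out = None
        else:
            fp_out.write(line + "\n")
    if fp_out is not None:  # input ended without a closing "endmodel"
        fp_out.close()
    return written
```

Because the function only iterates over its argument, an open file object can be passed directly, for example `split_by_markers(open("pra.txt"))`, so the whole input never has to be held in memory.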
Aug 12 '14 #11
rokstar24
7 New Member
Can anyone tell me how to write the if condition for a line like
    model        1
where there are exactly 8 spaces between "model" and "1"? What should the condition be to find it?
Aug 13 '14 #12
bvdet
2,851 Recognized Expert Moderator Specialist
Assuming you want to use the regex solution:
    patt = re.compile(r"(model *\d+)\n([0-9a-zA-Z\n]+?)endmodel",  re.MULTILINE)
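Note that " *" matches any number of spaces, including none. If the header must contain exactly eight spaces, a counted quantifier is one option. The sketch below assumes Python 3; `EXACT` and `model_name` are made-up names for illustration. It also rebuilds the name without the spaces, so it can double as a file name:

```python
import re

# " {8}" matches exactly eight spaces; " *" would match any number, including none.
EXACT = re.compile(r"^model {8}(\d+)$")

def model_name(line):
    """Return e.g. "model1" for "model        1", or None if the line does not match."""
    m = EXACT.match(line.strip())
    return "model" + m.group(1) if m else None
```

With this, `model_name(line) is not None` could serve as the if condition in a line-by-line loop.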
Aug 13 '14 #13
dwblas
626 Recognized Expert Contributor
Or use split and join which is also a "standard" way.
    test_string = "model        1"
    print(test_string)
    print("becomes", "".join(test_string.split()))
Aug 13 '14 #14
