Bytes IT Community

Extract Values between two strings in a text file using python

Let's say I have a text file (input_file.txt, about 10 GB in size).

I need to write Python code that reads the text file and copies the lines between a start marker and an end marker to another file.

I wrote the following code.

    import re

    with open(r'C:\Python27\log\master_input.txt', 'r') as infile, open(r'C:\Python27\log\output', 'w') as outfile:
       copy = False
       for line in infile:
          if re.match("Jun  6 17:58:16(.*)", line):
             copy = True
          elif re.match("Jun  6 17:58:31(.*)", line):
             copy = False
          elif copy:
             outfile.write(line)

However, I'm not getting the expected output.

Output of the code (output_of_my_code.txt):

Expected output (Expected_output.txt):

Please help me do this in the best way.
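
A possible reason the output comes up short: in the loop above, a line that matches the start or end pattern only flips the flag and is never written, so every "17:58:16" and "17:58:31" line is dropped. Below is a minimal sketch of an inclusive variant, assuming the boundary lines are wanted in the output and that the file is in chronological order. The function name `copy_between` and the constants are made up for illustration.

```python
import io

START = "Jun  6 17:58:16"   # first timestamp to keep (taken from the question)
END = "Jun  6 17:58:31"     # last timestamp to keep (taken from the question)

def copy_between(infile, outfile, start=START, end=END):
    """Copy from the first `start` line through the last consecutive `end` line."""
    copying = False
    seen_end = False
    for line in infile:
        if not copying:
            if line.startswith(start):
                copying = True       # the start line itself is written below
            else:
                continue             # still before the start marker
        if line.startswith(end):
            seen_end = True          # keep writing every line with the end stamp
        elif seen_end:
            break                    # first line past the end stamp: stop reading
        outfile.write(line)

# Small in-memory demonstration; a real run would pass two open file objects.
sample = io.StringIO(
    "Jun  6 17:58:15 a\n"
    "Jun  6 17:58:16 b\n"
    "Jun  6 17:58:16 c\n"
    "Jun  6 17:58:20 d\n"
    "Jun  6 17:58:31 e\n"
    "Jun  6 17:58:31 f\n"
    "Jun  6 17:58:32 g\n"
)
result = io.StringIO()
copy_between(sample, result)
print(result.getvalue())
```

Because the loop breaks as soon as it passes the end stamp, the rest of a very large file is never read.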
Attached Files
File Type: txt input_file.txt (1.5 KB, 228 views)
File Type: txt output_of_my_code.txt (215 Bytes, 229 views)
File Type: txt Expected_output.txt (1.1 KB, 213 views)
Jun 10 '15 #1
3 Replies


bvdet
To get the output you need, use re to extract the seconds field as an integer and compare it against the lower and upper boundaries. Here's an example:
    import re

    data = """Jun  6 17:58:13 other strings
    Jun  6 17:58:13 other strings
    Jun  6 17:58:14 other strings
    Jun  6 17:58:14 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:15 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:16 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:17 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:18 other strings
    Jun  6 17:58:19 other strings
    Jun  6 17:58:19 other strings
    Jun  6 17:58:20 other strings
    Jun  6 17:58:20 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:21 other strings
    Jun  6 17:58:22 other strings
    Jun  6 17:58:23 other strings
    Jun  6 17:58:24 other strings
    Jun  6 17:58:27 other strings
    Jun  6 17:58:28 other strings
    Jun  6 17:58:28 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:29 other strings
    Jun  6 17:58:30 other strings
    Jun  6 17:58:31 other strings
    Jun  6 17:58:31 other strings
    Jun  6 17:58:32 other strings
    Jun  6 17:58:33 other strings
    Jun  6 17:58:33 other strings
    Jun  6 17:58:33 other strings
    Jun  6 17:58:33 other strings"""

    # Capture the seconds field; a raw string keeps \d from being treated as an escape.
    patt = re.compile(r"Jun  6 17:58:(\d+?) (.*)")
    upper = 31
    lower = 16

    for line in data.split("\n"):
        m = patt.match(line)
        if m:
            i = int(m.group(1))
            # Keep the line when its seconds value is within the inclusive range.
            if lower <= i <= upper:
                print(line)
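
The example above works on an in-memory string. For a file of around 10 GB, the same check can be applied while iterating over the file object, so only one line is held in memory at a time. This is a rough sketch along those lines; `filter_seconds` is a hypothetical helper name, and the file names in the comment are placeholders.

```python
import io
import re

# Same idea as the pattern above: capture the seconds field of one fixed minute.
PATT = re.compile(r"Jun  6 17:58:(\d+) ")

def filter_seconds(lines, lower=16, upper=31):
    """Yield lines whose seconds value lies in the inclusive range [lower, upper]."""
    for line in lines:
        m = PATT.match(line)
        if m and lower <= int(m.group(1)) <= upper:
            yield line

# With real files (placeholder names), the generator streams line by line:
# with open("input_file.txt") as infile, open("output.txt", "w") as outfile:
#     outfile.writelines(filter_seconds(infile))

# Small in-memory demonstration.
sample = io.StringIO(
    "Jun  6 17:58:15 other strings\n"
    "Jun  6 17:58:16 other strings\n"
    "Jun  6 17:58:31 other strings\n"
    "Jun  6 17:58:32 other strings\n"
)
kept = list(filter_seconds(sample))
print("".join(kept))
```

Note that this pattern only covers timestamps inside the single minute 17:58, as in the original example.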
Jun 10 '15 #2

@bvdet: Thanks for the solution. In my case I do not know the upper and lower values in advance. How did you get those values?
Jun 11 '15 #3

bvdet
You knew the upper and lower values in your original post. How did you know them? If you are dealing with dates and times instead of strictly formatted data, look into using the time and datetime modules. Example of creating a datetime object from the date/time string:
    >>> import datetime
    >>> datetime.datetime.strptime("Jun  6 17:58:13", "%b  %d %H:%M:%S")
    datetime.datetime(1900, 6, 6, 17, 58, 13)
    >>>
From there you can create timedelta objects:
    >>> d1 = datetime.datetime.strptime("Jun  6 17:58:13", "%b  %d %H:%M:%S")
    >>> d2 = datetime.datetime.strptime("Jun  7 12:55:48", "%b  %d %H:%M:%S")
    >>> d1-d2
    datetime.timedelta(-1, 18145)
    >>> d2-d1
    datetime.timedelta(0, 68255)
    >>> dt1 = d1-d2
    >>> dt1.days
    -1
    >>> dt1.total_seconds()
    -68255.0
    >>> dt2 = d2-d1
    >>> dt2.days
    0
    >>> dt2.total_seconds()
    68255.0
    >>>
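
Building on the datetime objects above, one way to filter a log is to parse the leading timestamp of each line and compare it with a start and end datetime, so the boundaries can be given as plain timestamp strings and may cross minute or hour boundaries. This is only a sketch under some assumptions: every timestamp occupies the first 15 characters of the line, and the log lines carry no year, so a `year` argument (defaulting to 2015 here) is supplied by the caller. The names `parse_stamp` and `lines_between` are invented for the example.

```python
import datetime
import io

FMT = "%Y %b %d %H:%M:%S"   # the year is prepended because the log lines lack one
STAMP_LEN = 15              # length of a stamp such as "Jun  6 17:58:16"

def parse_stamp(text, year=2015):
    """Parse a 'Mon DD HH:MM:SS' stamp; the year is an assumption from the caller."""
    return datetime.datetime.strptime("%d %s" % (year, text), FMT)

def lines_between(lines, start, end, year=2015):
    """Yield lines whose leading timestamp lies in the inclusive range [start, end]."""
    lo, hi = parse_stamp(start, year), parse_stamp(end, year)
    for line in lines:
        try:
            stamp = parse_stamp(line[:STAMP_LEN], year)
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if lo <= stamp <= hi:
            yield line

# Small in-memory demonstration; a real run would pass an open file object.
sample = io.StringIO(
    "Jun  6 17:57:59 before\n"
    "Jun  6 17:58:16 first\n"
    "not a log line\n"
    "Jun  6 17:59:02 second\n"
    "Jun  6 18:00:00 after\n"
)
selected = list(lines_between(sample, "Jun  6 17:58:16", "Jun  6 17:59:30"))
print("".join(selected))
```

Parsing every line is slower than a plain string comparison, so for a very large file it may be worth timing this against a regex-based check.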
Jun 11 '15 #4
