Bytes | Software Development & Data Engineering Community

Delete paragraphs which do not contain specific word

Knut Ole
I have endless paragraphs of data, of which only about 10% are to be kept and the rest discarded. Each "entry", i.e. paragraph, has this format:
  <parameter>
  text over several lines sometimes
  containing key word
  </parameter>
I guess it should be possible to check each paragraph for the keyword and, where it is missing, find the previous and next <p...> tags and delete the paragraph between them? Is this possible in a bash script on Linux/Unix?

Hoping for helpful input! Thank you!
Oct 18 '11 #1
jabbah
It feels to me as if this would be tough in bash, but I guess it should be doable in Perl. Just as a rough concept:
Read the file line by line, store the current paragraph in a temporary variable, and check it for the keyword. Once the paragraph has ended, either discard it or print it.
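The concept above can be sketched from a shell script as well. This is only a rough sketch, and it uses awk rather than Perl. It assumes every paragraph opens with a line containing <parameter> and closes with a line containing </parameter>, as in the question. The function name keep_paragraphs is made up for the example, and the keyword is treated as a plain string, not a pattern.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print only those
# <parameter>...</parameter> blocks of file $2 that contain the string $1.
keep_paragraphs() {
  awk -v kw="$1" '
    /<parameter>/   { buf = ""; inside = 1 }   # a new paragraph starts: reset the buffer
    inside          { buf = buf $0 "\n" }      # store the current paragraph line by line
    /<\/parameter>/ {
      if (inside && index(buf, kw) > 0)        # paragraph ended: keep it only if the keyword was seen
        printf "%s", buf
      inside = 0
    }
  ' "$2"
}
```

Called as keep_paragraphs keyword data.txt > kept.txt, this writes the surviving paragraphs to a new file. Lines outside any paragraph are dropped, and a keyword that also occurs in the tag name itself would match every paragraph.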
Jan 13 '12 #2
You can use grep to check whether the word is in a file. What I would do is split each paragraph into its own file, and then find only the files you wish to keep.

This quick script will point out any line that does not start with a letter. Feel free to edit it as need be.

  #!/bin/sh

  file=test.txt

  # grep -n prefixes every match with "N:", so cut keeps only the line number
  keep=$(grep -inv '^[a-z]' "${file}" | cut -d: -f1)
  for line in ${keep}
  do
    echo "Line number ${line} can be ignored"
  done
The arguments passed to grep are i to ignore case, n to display the line number, and v to invert the match, so that only lines which do not match are selected. Combined with ^, which anchors the pattern to the start of the line, this selects every line that does not begin with a letter a through z (in either case) and reports its line number.

You should then be able to cat the file, use those line numbers to work out where each paragraph begins and ends, write each paragraph to its own file, and run grep for the keyword over those files. That leaves you with the files of the paragraphs that contain your chosen keyword.
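As a rough sketch of the split-then-grep idea: the version below splits on the <parameter> and </parameter> tags from the question instead of on line numbers, which is an assumption on my part. The function names and the para_NNNN.txt file naming are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: split file $1 into one file per paragraph inside
# directory $2, named para_0001.txt, para_0002.txt, and so on.
split_paragraphs() {
  awk -v dir="$2" '
    /<parameter>/   { out = sprintf("%s/para_%04d.txt", dir, ++n) }  # next paragraph, next file
    out != ""       { print > out }                                  # copy the line into that file
    /<\/parameter>/ { close(out); out = "" }                         # paragraph done, release the file
  ' "$1"
}

# Hypothetical helper: list the paragraph files in directory $2 that contain
# the plain string $1 (grep -l prints only the names of matching files).
files_with_keyword() {
  grep -l -F -- "$1" "$2"/para_*.txt
}
```

After split_paragraphs data.txt parts (with the parts directory already created), files_with_keyword keyword parts prints the files worth keeping. They could then be joined back together with cat.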
Mar 4 '12 #3
