
Replacing multiple strings in a file

P: 2
Hello,

I am very new to Perl and am having trouble figuring out how to replace multiple strings in a single file. The file has around 750k instances that need to be replaced, using 350 different values:

1-A needs to be replaced with 12-FOO-1-A
2-B needs to be replaced with 44-BAR-2-B
1-400 needs to be replaced with 123-D1-400
(and 347 more)

How would I set up a program to read in the large file and output it with the changes?

Thanks,
-J
Feb 10 '10 #1
4 Replies


numberwhun
Expert Mod 2.5K+
P: 3,503
It's great that you tell us what needs to be replaced with what, but without a sample of the data, helping you with your request will be a bit difficult.

Please post a sample of the data we are dealing with, showing the use of each piece you mentioned.

Regards,

Jeff
Feb 10 '10 #2

P: 2
Jeff,

Here are examples from the two text files I have. The first has the data that needs to be changed, formatted into columns. The first column is the value to be replaced:

1-A 50 3017.1N 4922.5E 301841 927850
1-A 51 3017.9N 4923.3E 301834 927530
1-A 52 3018.7N 4924.0E 301826 276011
1-A 53 3019.5N 4924.8E 301819 976092
1-A 54 3020.4N 4925.6E 301811 926173
1-A 55 3021.2N 4926.4E 301804 927254
1-A 56 3022.0N 4927.2E 301796 276334
1-A 57 3022.8N 4928.0E 301789 976415
1-A 58 3023.6N 4928.7E 301789 926496
1-A 59 3024.5N 4929.5E 301774 927577
1-A 60 3025.3N 4930.3E 301770 927658
1-A 61 3026.1N 4931.1E 301755 927678
2-B 308 26439.9S 4546.0W 269692 971394
2-B 309 26440.3S 4544.9W 269788 971404
2-B 310 26440.8S 4543.9W 269184 974092
2-B 311 26441.3S 4542.8W 267279 971142
2-B 312 26441.7S 4541.7W 269737 971492
2-B 313 26442.2S 4540.7W 269741 971422
2-B 314 26442.7S 4539.6W 269767 971429
2-B 315 26443.1S 4538.5W 269663 974342
2-B 316 26443.6S 4537.5W 267759 971392
2-B 317 26444.1S 4536.4W 269785 971442
2-B 318 26444.6S 4535.3W 269751 971442
2-B 319 26445.0S 4534.2W 269047 971454
4-100 67 4626.4N 1546.4W 284850 109716
4-100 68 4623.5N 1545.0W 284090 100682
4-100 69 4620.5N 1543.5W 284922 100528
4-100 70 4617.6N 1542.1W 284939 109634
4-100 71 4614.6N 1540.6W 284908 109590
4-100 72 4611.7N 1539.2W 284647 109564
4-100 73 4648.7N 1537.7W 289787 109352
4-100 74 4645.8N 1536.3W 284926 109557
4-100 75 4642.8N 1534.8W 285065 100463
4-100 76 4559.9N 1533.4W 280205 100469
(etc)

My second file contains, on separate lines, whitespace-separated pairs: the value to be replaced in the first text file, followed by its replacement:

1-A 12-FOO-1-A
2-B 44-BAR-2-B
1-400 123-D1-400
(etc)

If possible, I would like to be able to read in both files and have it run the replace function for each line of input in the second file.

Thanks for your time,
- J
Feb 10 '10 #3

Expert 100+
P: 785
@Polarism
Set it up so that it reads and processes the input line by line. Don't read all the input into a big array first and then start processing! Just read a single line from the input, make your replacements, write out the replaced line, then read the next line, and so on.
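
A minimal sketch of that read-one-line, write-one-line loop, assuming plain text files given on the command line. The sub name `stream_file` and the single hardcoded replacement are illustrative only, not something from this thread:

```perl
use strict;
use warnings;

# Copy $in_path to $out_path one line at a time, passing each line through
# $transform (a code reference). Only one line is held in memory at once.
sub stream_file {
    my ($in_path, $out_path, $transform) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        print {$out} $transform->($line);
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage (hypothetical script name): perl stream.pl input.txt output.txt
if (@ARGV == 2) {
    stream_file($ARGV[0], $ARGV[1], sub {
        my ($line) = @_;
        $line =~ s/^1-A\b/12-FOO-1-A/;   # one hardcoded replacement, for illustration
        return $line;
    });
}
```

Passing the per-line work in as a code reference keeps the file handling separate from the replacement logic, so the loop stays the same however the replacements are stored.
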
You can put all your replacements in a configuration file. I propose XML syntax (because of its compatibility and the plugins available for it), for example:
  <replace what="1-A" with="12-FOO-1-A" />
  <replace what="2-B" with="44-BAR-2-B" />
  ...
If it's a one-time job, you could also hardcode the replacements directly as an array in your program.

For each input line, go through your array and run all the replacements, one by one, on that line. You will probably run into the problem of replacing something that has already been replaced: replacing "aa" with "bb" and then "bb" with "cc" turns "aa" into "cc" in the end. To avoid that, wrap each replaced part in braces {} or any other characters that don't occur in your source file. Then use a regular expression with a lookahead for the next brace. If there is none, or it is an opening brace, do the replacement; otherwise skip it, because you are inside an already replaced part. Remove the braces once all replacements have run.
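
A different way to sidestep the double-replacement problem, sketched here as an alternative to the brace-marker idea above rather than an implementation of it: combine all keys into one alternation and substitute in a single pass, so text that has just been inserted is never scanned again. The `%replace` table and the sub name `replace_once` are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical replacement table; the "aa"/"bb" pair mirrors the example above.
my %replace = (
    'aa'  => 'bb',
    'bb'  => 'cc',
    '1-A' => '12-FOO-1-A',
);

# Build one alternation, longest keys first, so a longer key wins over a
# shorter key that is its prefix. quotemeta escapes regex metacharacters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %replace;
my $pattern = qr/($alternation)/;

sub replace_once {
    my ($line) = @_;
    # s///g resumes matching after each match in the original text,
    # so the inserted replacement is never matched a second time.
    $line =~ s/$pattern/$replace{$1}/g;
    return $line;
}

print replace_once("aa bb 1-A\n");   # prints "bb cc 12-FOO-1-A"
```

With 350 keys this is also a single regex match per line instead of 350 separate substitutions.
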
Feb 12 '10 #4

P: 1
I suggest populating the keys and values into a hash:
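  my %hash = ('1-A' => '12-FOO-1-A');    # old value => replacement (must be quoted)
  while (my ($key, $value) = each %hash) {
      $string =~ s/\Q$key\E/$value/g;    # \Q...\E matches the key literally
  }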
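
Since the sample data earlier in the thread always has the value to replace in the first column, the hash can also be used for a direct lookup on that column instead of trying every key on every line. A rough sketch, assuming the mapping file holds one whitespace-separated "old new" pair per line; the sub names `load_map` and `replace_first_field` are hypothetical, and in-memory filehandles stand in for the two real files:

```perl
use strict;
use warnings;

# Read "old new" pairs, one per line, into a hash reference.
sub load_map {
    my ($fh) = @_;
    my %map;
    while (my $line = <$fh>) {
        my ($old, $new) = split ' ', $line;
        next unless defined $new;    # skip blank or incomplete lines
        $map{$old} = $new;
    }
    return \%map;
}

# Replace the first whitespace-delimited field if it is a key in the map.
sub replace_first_field {
    my ($line, $map) = @_;
    $line =~ s/^(\S+)/exists $map->{$1} ? $map->{$1} : $1/e;
    return $line;
}

# Demo data copied from the samples posted above.
my $map_text = "1-A 12-FOO-1-A\n2-B 44-BAR-2-B\n";
my @lines = (
    "1-A 50 3017.1N 4922.5E 301841 927850\n",
    "2-B 308 26439.9S 4546.0W 269692 971394\n",
    "4-100 67 4626.4N 1546.4W 284850 109716\n",
);

open my $map_fh, '<', \$map_text or die "Cannot open mapping data: $!";
my $map = load_map($map_fh);
close $map_fh;

print replace_first_field($_, $map) for @lines;
```

In this demo the 4-100 line is printed unchanged because the demo mapping has no entry for it. Anchoring on the first field also means a key such as 1-A cannot accidentally match inside another column.
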
Feb 12 '10 #5
