I have two text files as below:
file1.txt (4 columns)
    test1 1000 2000 +
    test2 1000 2000 -
    test1 1000 2000 +
    test3 1000 2000 +
    test1 1000 2000 -
    test2 2000 3000 +
    test1 1000 2000 +
    test1 1000 3000 -
The first step in the processing is to collect all the data, using column 1 as the key and columns 2, 3, and 4 as the values. For each key such as test1, the other three columns should be stored after removing duplicates; if the two lines below appear, only one value should be kept:
    test1 1000 2000 -
    test1 1000 2000 -
Until now I have only used Perl hashes with two columns, taking the first column as the key and the second as the value; storing the second column as a hash key also conveniently removed its duplicates. I would like to know how that approach can be applied to the dataset above, with column 1 as the key and columns 2, 3, and 4 as the values (basically, to remove duplicates). Can I concatenate them into one string with some pattern such as "1000:2000:-"? Please let me know.
My second query is how to compare Perl hash1 (data from file1) and hash2 (data from file2). For example, for test1 (and every other key), I have to compare the values between the two hashes.
Please let me know, as I am not familiar with a Perl hash of hashes. Do I have to use that?
My basic motivation is to remove duplicates from the two files and then compare the two hashes to find how many of the column2:column3:column4 values are present in both files, as well as the ones that are unique to each dataset. Or is there any other way to handle the data?
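For the comparison, reusing read_file from the sketch above, something like this is what I imagine (again untested):

    my $hash1 = read_file('file1.txt');
    my $hash2 = read_file('file2.txt');

    # Walk over the union of the outer keys, so keys that exist in
    # only one file are still reported.
    my %all_keys = map { $_ => 1 } (keys %$hash1, keys %$hash2);
    for my $key (sort keys %all_keys) {
        my $set1 = $hash1->{$key} || {};
        my $set2 = $hash2->{$key} || {};
        my @common = grep {  exists $set2->{$_} } keys %$set1;
        my @only1  = grep { !exists $set2->{$_} } keys %$set1;
        my @only2  = grep { !exists $set1->{$_} } keys %$set2;
        printf "%s: %d common, %d only in file1, %d only in file2\n",
               $key, scalar @common, scalar @only1, scalar @only2;
    }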
It is all quite confusing to me, and a small example would make it easy for me to proceed.
Thanks in advance.