
# perl regex

P: 89

I have a data file whose 4th column looks like the examples below:

```
34899939-34899967
34899939-34899967:34905554-34905559
34899939-34899967:34905554-34905559:34905560-34905574
```

I have to extract the following.

For the first line:

```
$start = 34899939
$end = 34899967
$block_size = 1
```

For the 2nd line:

```
$start = 34899939
$end = 34905559
$block_size = 2
$n1=34899939 $n2=34899967 $n3=34905554 $n4=34905559
```

For the 3rd line:

```
$start = 34899939
$end = 34905574
$block_size = 3
$n1=34899939 $n2=34899967 $n3=34905554 $n4=34905559 $n5=34905560 $n6=34905574
```

I am able to tell 1 block from 2 blocks by the `:` character, and I have a solution for the 2nd line as below:

```perl
sub special {
    chomp $_;
    my @v = split(/\s+/, $_);
    if ($v[3] =~ /\:/) {
        $num1 = $`;
        $num2 = $';
        if ($num1 =~ /\-/) {
            $n1 = $`;
            $n2 = $';
        }
        if ($num2 =~ /\-/) {
            $n3 = $`;
            $n4 = $';
        }
    }
    $start = $n1;
    $end = $n4;
    print "$n1 \t $n2 \t $n3 \t $n4 \n";
}
```

But how do I generalise this to any number of `:` separators, so that it generates `$n(i)`? Thanks.

Feb 19 '09 #1
4 Replies

Expert 2.5K+ P: 4,059

Your data is confusing. What do you mean by "the 4th column looks like this"? You have posted three separate lines of data. Are they part of a larger line of data? Why does your sub special() first split on spaces when there are no spaces in the data you posted?

Feb 19 '09 #2

P: 89

Hi Kevin, sorry for the confusion. Let me try to explain.

My 4th column can contain data like the examples in my previous post. It can hold a single pair of numbers with no `:`, or two, three or more pairs separated by `:`. So the 4th column holds a different kind of data on each line. I parse every line to get the 4th column, which is why I split on whitespace first. I then split that column on the special character `:` and split each piece further.

If my 4th column is like the 1st example, it is easy to split the numbers and put them into `$n1` and `$n2`. If my 4th column is like the 2nd example, with one `:`, then my subroutine special() does the job and assigns `$n1`, `$n2`, `$n3` and `$n4`. But I want to write a generalised routine that can handle a line like the 3rd example, or one with even more `:` separators.

I hope I have explained clearly.

Regards

Feb 19 '09 #3
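The column format described here (one or more `start-end` pairs joined by `:`) can be checked with a single pattern before any splitting is done. This is only a minimal sketch, assuming the numbers are plain unsigned integers; `is_block_column` is a hypothetical name used for illustration and does not come from the posts.

```perl
use strict;
use warnings;

# Hypothetical check: true if the string is one or more "digits-digits"
# ranges separated by ":" and nothing else.
sub is_block_column {
    my ($col) = @_;
    return $col =~ /\A\d+-\d+(?::\d+-\d+)*\z/ ? 1 : 0;
}

for my $col ('34899939-34899967',
             '34899939-34899967:34905554-34905559',
             '34899939:34905554') {
    printf "%-40s %s\n", $col,
        is_block_column($col) ? 'ok' : 'not a block column';
}
```

Anchoring with `\A` and `\z` means a trailing `:` or stray characters make the check fail instead of being silently ignored.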

Expert 2.5K+ P: 4,059

That helped clear it up. I hope I am not doing your school work for you.

```perl
while (<DATA>) {
    special($_);
}

sub special {
    local ($_) = @_;
    chomp $_;
    my $col4 = (split(/\s+/))[3];     # 4th whitespace-separated column
    my @blocks = split(/:/, $col4);   # one "start-end" block per element
    my @temp;
    for (@blocks) {
        push @temp, split(/-/);       # flatten all numbers into one list
    }
    print "start = $temp[0]\n";
    print "end = $temp[-1]\n";
    print 'blocks = ', scalar @blocks, "\n";
    for (@temp) {
        print "\t$_\n";
    }
    print "\n";
}
__DATA__
dummy dummy dummy 34899939-34899967 dummy
dummy dummy dummy 34899939-34899967:34905554-34905559 dummy
dummy dummy dummy 34899939-34899967:34905554-34905559:34905560-34905574 dummy
```

Apply your own file I/O in place of DATA.

Feb 19 '09 #4

Expert 2.5K+ P: 4,059

The output is:

```
start = 34899939
end = 34899967
blocks = 1
    34899939
    34899967

start = 34899939
end = 34905559
blocks = 2
    34899939
    34899967
    34905554
    34905559

start = 34899939
end = 34905574
blocks = 3
    34899939
    34899967
    34905554
    34905559
    34905560
    34905574

```

Edit the output to display it however you need. I added "start", "end" and "blocks" just to make it easier to read.

Feb 19 '09 #5
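On the original question of generalising `$n1`, `$n2`, ... to any number of blocks, one option is to keep the values in an array rather than in numbered scalars. The sketch below assumes the same whitespace-separated layout with the ranges in the 4th column and no leading whitespace on the line; `parse_blocks` and the hash keys are hypothetical names, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one input line and returns a hash ref
# describing the ranges in its 4th column, or nothing if the line
# has fewer than four columns.
sub parse_blocks {
    my ($line) = @_;
    chomp $line;
    my $col4 = (split /\s+/, $line)[3];
    return unless defined $col4;

    my @blocks = split /:/, $col4;               # "a-b", "c-d", ...
    my @n      = map { split /-/, $_ } @blocks;  # flat list: a, b, c, d, ...

    return {
        start      => $n[0],
        end        => $n[-1],
        block_size => scalar @blocks,
        n          => \@n,   # n->[0] plays the role of $n1, n->[1] of $n2, ...
    };
}

my $r = parse_blocks(
    "dummy dummy dummy 34899939-34899967:34905554-34905559 dummy");
print "start=$r->{start} end=$r->{end} block_size=$r->{block_size}\n";
print "n", $_ + 1, "=$r->{n}[$_]\n" for 0 .. $#{ $r->{n} };
```

Returning a structure instead of printing lets the caller decide how to use the values, and the array index stands in for the `i` in `$n(i)` without creating variables dynamically.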