Hi All,
I have a string of 1500 or more characters. I want to go through this string and split it at the 160th character, but it should not split in the middle of a word. If the 160th character is not the end of a word it should go back to the next word and split from that. My code is below, but it's not working well:
sub get_horoscope {
    my($dbh,$thetype, $hname, $phone)=@_;
    my($val,$sth,$rec,$horoscope,$subset,$i);
    $sth=$dbh->prepare("SELECT ? FROM horoscope_table WHERE `horoscope_name` = '$hname'");
    my($more)=$dbh->prepare("INSERT INTO chat_outbox (`sender`, `phone`,`text`,`insertdate`) VALUES ('466',TRIM(?),TRIM(?),NOW())");
    $sth->execute($thetype);
    $horoscope="";
    my(@types);
    my($start, $end) = (0, 160);
    my $set = length($rec->{'$thetype'}) % 160;
    while($rec=$sth->fetchrow_hashref()) {
        if(length($rec->{'$thetype'}) > 160){
            for($i = 0; $i <= $set; $i++){
                $subset = substr($rec->{'$thetype'}, $start, $end);
                $start+=160;
                $end+=160;
                $more->execute($subset);
            }else{
                $horoscope = "$horoscope\n$rec->{'$thetype'}";
            }
        }
    }
    $sth->finish();
    syslog('info',__LINE__."::[get_horoscope] $horoscope");
    return ($horoscope);
}
Would you be able to provide a sample of the data with which you are working? That way we can have something to work with that you are also using.
Regards,
Jeff
my $i = 22;
my $String = "A Computer is an Electronic Device";
# Walk forward from offset $i to the next space, stopping at the end of the string.
until ($i >= length($String) or substr($String, $i, 1) eq " ") {
    $i++;
}
print substr($String, 0, $i);

# Here the character at offset 22 is the 'r' in the word "Electronic", but the output is the whole "A Computer is an Electronic". Just try it.
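If the pieces must never be longer than the limit, the search has to go backward from the limit instead of forward. Below is a minimal sketch of that idea using rindex. The name split_at_words is hypothetical and not from this thread, and it assumes that words are separated by plain spaces and that a word longer than the limit may be cut in the middle.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into chunks of at most $max characters,
# breaking at the last space inside the limit whenever there is one.
sub split_at_words {
    my ($text, $max) = @_;
    my @chunks;
    while (length($text) > $max) {
        # Last space at or before offset $max (-1 if there is none).
        my $cut = rindex($text, " ", $max);
        # No usable space: fall back to a hard cut inside the word.
        $cut = $max if $cut <= 0;
        push @chunks, substr($text, 0, $cut);
        $text = substr($text, $cut);
        $text =~ s/^ +//;    # drop the space(s) we split on
    }
    push @chunks, $text if length $text;
    return @chunks;
}

# Prints [A Computer is an] and [Electronic Device]
print "[$_]\n" for split_at_words("A Computer is an Electronic Device", 22);
```

For the original question the call would be split_at_words($string, 160), and each returned chunk could then be handled on its own.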
Thanks for that, I really appreciate your efforts.
You can do it with a regular expression.
I developed the following regular expression and have used it successfully with Java (I haven't tried it in Perl yet, but it should work there too). The regular expression got a bit complicated because I wanted it to work for any line length (not only 160) and for any text (including empty text, short text, text with words that are longer than a line, etc.), so I commented it well. Spaces are preserved, so that concatenating all parts gives back the original string exactly as it was. If possible, the split is done after a space.
If the string contains newline characters, split it at the newlines first and then apply the regular expression to each part.
With the help of the regular expression I insert a newline character everywhere the split should occur, so I avoid passing arrays around (and concatenating array parts later on when saving to the database, searching inside the text, etc.).
In the explanation below, "4" is an example value and should be replaced with the value of the constant MAX_LINE_LENGTH.
The following regular expression splits the string into parts with a maximum length of 4 characters, applying these rules in this order:
- don't split if the whole line (or the remainder) is at most 4 characters long. regEx="(?s).{1,4}$"
- if a word is longer than 4 characters, split inside the word. regEx="[^\s]{4}"
- split before the space if a space follows exactly after 4 characters. regEx="(?s).{4}(?=\s)"
- otherwise split after the last space within the first 4 characters. regEx="(?s).{0,3}\s"
Example: splitting "12345 67 8 9ab c de wxyz1 2 4 5 678" yields "1234", "5 67", " 8 ", "9ab ", "c de", " ", "wxyz", "1 2 ", "4 5 ", "678".
In Java:
// maxLineLength: usually you will get this value from your configuration file, e.g. 4
// oldText: the String you want to split, e.g. "12345 67 8 9ab c de wxyz1 2 4 5 678"
static String wrap(int maxLineLength, String oldText) throws Exception {
    // Verify parameters.
    // The text should already have been split at newline characters before.
    if (maxLineLength < 1) throw new Exception("ERROR: maxLineLength is " + maxLineLength + ", but must be greater than 0!");
    if (oldText == null) throw new Exception("ERROR: text must not be null!");
    if (oldText.indexOf("\n") != -1) throw new Exception("ERROR: text must not contain newline characters!");

    // Quick way out to increase performance.
    if (oldText.isEmpty()) return "";

    final String regularExpression = "(?s).{1," + maxLineLength + "}$|[^\\s]{" + maxLineLength + "}|(?s).{" + maxLineLength + "}(?=\\s)|(?s).{0," + (maxLineLength - 1) + "}\\s";

    // Insert newlines where we want to split the string into parts.
    // Note: Trailing empty strings are not included in the resulting array of the split() method. So the newline char at the end of newText will have no effect.
    String newText = oldText.replaceAll(regularExpression, "$0\n"); // append newline

    return newText;
}
-
Now you know the logic, so I leave it to you as an exercise to convert this program to Perl. Sorry, but I have no more time to do it myself today. If you have difficulties, come back to me tomorrow and I will help you with it.
it should go back to the next word
Do you mean: "go back to the previous word" or "go forward to the next word"?
Back to the previous word, because each substring should not be more than 160 characters.
Then, as toolic already suggested, Text::Wrap should do it.
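For reference, a minimal sketch of how Text::Wrap could be used for this. The helper name wrap_chunks is hypothetical. The sketch relies on the documented behaviour that Text::Wrap produces lines of at most $Text::Wrap::columns - 1 characters, so the column setting is one more than the wanted maximum.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper around the Text::Wrap module.
sub wrap_chunks {
    my ($text, $max) = @_;
    # Text::Wrap keeps every line at no more than $columns - 1 characters.
    local $Text::Wrap::columns  = $max + 1;
    # Keep spaces as spaces instead of turning runs of them back into tabs.
    local $Text::Wrap::unexpand = 0;
    # Empty strings: no indentation for the first or the following lines.
    return split /\n/, wrap('', '', $text);
}

# Prints [A Computer is an] and [Electronic Device]
print "[$_]\n" for wrap_chunks("A Computer is an Electronic Device", 22);
```

With the default setting of $Text::Wrap::huge ('wrap'), a word longer than the limit is broken up rather than left overlong. Unlike the regular expression approach, the space at which a line is broken is not kept in the output.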
The Text::Wrap module does essentially the same thing as my solution. So if this module is not installed on production environment or you are not allowed to install it, you can use my solution instead. My solution is also easier to modify in case you want its behaviour changed slightly.
Here it is in Perl:
#!/usr/local/perl-5.9-64/bin/perl

package test;

use warnings;
use strict;

# main program
my $newText = wrap(4, "12345 67 8 9ab c de wxyz1 2 4 5 678");
print "splitted String:\n";
my @stringParts = split(/\n/, $newText);
print "$_\n" for @stringParts;

# Insert a newline after every part matched by the regular expression.
sub wrap {
    my ($maxLineLength, $oldText) = @_;

    my $regularExpression = '.{1,' . $maxLineLength . '}$|[^\\s]{' . $maxLineLength . '}|.{' . $maxLineLength . '}(?=\\s)|.{0,' . ($maxLineLength - 1) . '}\\s';
    my $newText = $oldText;
    $newText =~ s/$regularExpression/$&\n/gs;

    return $newText;
}
And here is the output when you run it:
splitted String:
1234
5 67
 8 
9ab 
c de
 
wxyz
1 2 
4 5 
678
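If a list of parts is more convenient than one string with newlines, the same four rules can also be applied with a global match in list context. This is only a sketch: the name split_parts is hypothetical, and it follows the earlier advice of splitting at newline characters first, which means empty lines in the input produce no part at all.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the parts as a list instead of one string.
sub split_parts {
    my ($maxLineLength, $text) = @_;

    # Same four alternatives as in wrap(), built piece by piece.
    my $pattern = '.{1,' . $maxLineLength . '}$'           # remainder fits into one part
                . '|\S{' . $maxLineLength . '}'            # long word: cut inside it
                . '|.{' . $maxLineLength . '}(?=\s)'       # space follows right after the limit
                . '|.{0,' . ($maxLineLength - 1) . '}\s';  # cut after the last space in range
    my $re = qr/$pattern/;

    my @parts;
    # Split at newlines first, then collect every match of each line.
    for my $line (split /\n/, $text) {
        push @parts, $line =~ /$re/g;
    }
    return @parts;
}

my @parts = split_parts(4, "12345 67 8 9ab c de wxyz1 2 4 5 678");
print "[$_]\n" for @parts;
```

For the original question the call would be split_parts(160, $text). Each part is at most 160 characters long, and within one input line the parts joined together give back the line unchanged.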
So if this module is not installed on production environment
At least on Perl 5.10 or later, Text::Wrap is a standard module.