Bytes | Software Development & Data Engineering Community

Splitting paragraphs in a text.

I am splitting a text block into paragraphs so that I can attach images and the
like to a specific paragraph in a content management system (CMS).

Well, right now I'm splitting on two or more newlines, so this text block:

Hello, my nickname is Sandman and I am coding
some PHP

Call me

would be split into two parts, with "Call me" as the second one.

My problem is with a text block like the one below:

Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me

Given the rule I use now, the above would yield four parts:

---------------------------------------------
Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:
---------------------------------------------
<code>
print "Hello World!";
---------------------------------------------
print "Foo";
</code>
---------------------------------------------
Call me
---------------------------------------------
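For reference, the current rule can be reproduced with PHP's `preg_split()`. This is a minimal sketch, assuming the text uses plain `\n` line endings; it prints the same four parts as shown above:

```php
<?php
// The example block from above, assuming plain "\n" line endings.
$text = "Hello, my nickname is Sandman and I am coding\n"
      . "some PHP. Here is an example:\n"
      . "\n"
      . "<code>\n"
      . "print \"Hello World!\";\n"
      . "\n"
      . "print \"Foo\";\n"
      . "</code>\n"
      . "\n"
      . "Call me";

// Current rule: every run of two or more newlines starts a new part,
// even when the blank line sits inside <code> ... </code>.
$parts = preg_split("/\n{2,}/", $text);

foreach ($parts as $part) {
    print "---------------------------------------------\n";
    print "$part\n";
}
print "---------------------------------------------\n";
```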

But I want it to end up in three parts:

---------------------------------------------
Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:
---------------------------------------------
<code>
print "Hello World!";

print "Foo";
</code>
---------------------------------------------
Call me
---------------------------------------------

So, basically, what I want is to split the text block on the delimiter
"\n{2,}", but not where it falls inside an *unclosed* HTML tag. Some examples:

<div class='quote'>
Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me
</div>

should end up as a single part:

---------------------------------------------
<div class='quote'>
Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me
</div>
---------------------------------------------

And

<div class='quote'>
Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>
</div>

Call me

should end up as two parts:

---------------------------------------------
<div class='quote'>
Hello, my nickname is Sandman and I am coding
some PHP. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>
</div>
---------------------------------------------
Call me
---------------------------------------------

Hopefully you get the idea.

Any ideas on how to solve it?

--
Sandman[.net]
Jul 17 '05 #1
Sandman <mr@sandman.net> wrote in message news:<mr**********************@individual.net>...
The easiest way would be to use empty lines only where you want a
split. If you want empty lines inside tags for readability, you
could simply mask them, e.g. by putting in an empty comment. Of course,
this only works if you can control the output of your CMS to that
degree.
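A minimal sketch of the masking idea, assuming the blank line inside the `<code>` block is written as an empty HTML comment (`<!-- -->`), so the unchanged `\n{2,}` split keeps the code in one piece:

```php
<?php
// Assumption: the author typed "<!-- -->" instead of a blank line inside
// <code>, so only real paragraph breaks contain two newlines in a row.
$text = "some PHP. Here is an example:\n"
      . "\n"
      . "<code>\n"
      . "print \"Hello World!\";\n"
      . "<!-- -->\n"
      . "print \"Foo\";\n"
      . "</code>\n"
      . "\n"
      . "Call me";

// Same split rule as before; the masked line no longer triggers a break.
$parts = preg_split("/\n{2,}/", $text);

foreach ($parts as $part) {
    print "-----\n$part\n";
}
```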

micha
Jul 17 '05 #2
In article <78*************************@posting.google.com>,
ch*********@web.de (chotiwallah) wrote:

But I can't. Thousands of people are writing stuff into the CMS, and I can't
tell them to first replace newlines with character X when they cut'n'paste into
the CMS.
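Because the text is pasted in by many users, it may be worth normalizing line endings before splitting: text submitted from an HTML `<textarea>` typically arrives with `\r\n` line breaks, and `\n{2,}` does not match `\r\n\r\n`. A minimal sketch, assuming the input is a plain PHP string:

```php
<?php
// Text as a browser might submit it from a <textarea>: CRLF line breaks.
$text = "First paragraph\r\n\r\nSecond paragraph";

// "\n{2,}" never matches here, because each "\n" is followed by "\r".
$before = preg_split("/\n{2,}/", $text);

// Normalize Windows (\r\n) and old Mac (\r) line endings to "\n" first;
// "\r\n" is listed first so it becomes a single "\n", not two.
$normalized = str_replace(array("\r\n", "\r"), "\n", $text);
$after = preg_split("/\n{2,}/", $normalized);

print count($before) . " part(s) before, " . count($after) . " after\n";
```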

--
Sandman[.net]
Jul 17 '05 #3
In article <mr**********************@individual.net>, I wrote about splitting a
block of text into an array while being careful about nested tags.

I have now solved it myself, and thought I might share it (so that anyone
searching in the future won't find an empty thread on Google :)

Here is a working example:

#!/usr/bin/php
<?php
$text = <<<END
Hello, my nickname is Sandman, and I like PHP, some examples:

<code>
print "Hello World";

print "Foobar";
</code>

Here are nested tags:

<quote>
<quote>
He said he liked flowers
</quote>

Well, he doesn't, ok.

<quote>I like them</quote>

Good for you
</quote>

<div class="paragraph">
Nice paragraph
</div>

<img src="foo.jpg"> <- Nice pic!
END;

$inside = 0;          # nesting depth; we begin "outside"
$agg = "";            # text aggregated for the current paragraph
$paragraphs = array();

# Collapse runs of blank lines: exactly two newlines mark a new paragraph.
$text = preg_replace("/\n{2,}/", "\n\n", $text);

# These are the container tags we want to keep in one piece.
$cont = "(quote|div|ul|ol|code|pre)";

# split() no longer exists in PHP 7+; explode() splits on a literal "\n".
# The extra "\n" yields a final empty line, which flushes the last
# paragraph when no container is left open.
foreach (explode("\n", "$text\n") as $line) {
    if (preg_match("#<$cont(.*?)>#", $line, $m)) {
        if (!preg_match("#</$m[1]>#", $line)) {
            # We found the opening tag but not its closing tag on
            # this line, so we are definitely inside a container.
            $inside++;
        }
    }
    if (preg_match("#</$cont>#", $line, $m)) {
        if (!preg_match("#<$m[1](.*?)>#", $line)) {
            # We found a closing tag but not the tag that opened it,
            # which means we have stepped out one level.
            $inside--;
        }
    }
    if ($line == "" && $inside == 0 && $agg !== "") {
        # We reached a "\n\n" in the text block, we are not inside
        # any container, and we have aggregated some text.
        $paragraphs[] = $agg;
        $agg = "";   # reset the aggregation
        continue;    # don't process the empty line
    }
    $agg .= "$line\n";          # aggregate everything else
    print "$inside: $line\n";   # debugging!
}

foreach ($paragraphs as $p) {
    print "---------------\n";
    print $p;
}
?>

And the output is:

0: Hello, my nickname is Sandman, and I like PHP, some examples:
1: <code>
1: print "Hello World";
1:
1: print "Foobar";
0: </code>
0: Here are nested tags:
1: <quote>
2: <quote>
2: He said he liked flowers
1: </quote>
1:
1: Well, he doesn't, ok.
1:
1: <quote>I like them</quote>
1:
1: Good for you
0: </quote>
1: <div class="paragraph">
1: Nice paragraph
0: </div>
0: <img src="foo.jpg"> <- Nice pic!
---------------
Hello, my nickname is Sandman, and I like PHP, some examples:
---------------
<code>
print "Hello World";

print "Foobar";
</code>
---------------
Here are nested tags:
---------------
<quote>
<quote>
He said he liked flowers
</quote>

Well, he doesn't, ok.

<quote>I like them</quote>

Good for you
</quote>
---------------
<div class="paragraph">
Nice paragraph
</div>
---------------
<img src="foo.jpg"> <- Nice pic!
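The line-by-line scan above counts at most one opening and one closing tag per line, so a line such as `<quote><quote>` raises the depth by only one. A hedged alternative sketch, not the poster's code: split on blank lines first, then glue chunks back together while any container tag is still open, counting every tag in each chunk. The function name `split_paragraphs()` is hypothetical, and tag matching is still regex-based, so malformed or commented-out tags will confuse it:

```php
<?php
// Hypothetical helper, not the poster's code: split on blank lines,
// then re-join chunks while a container tag is still open.
function split_paragraphs(string $text): array
{
    $tags = "quote|div|ul|ol|code|pre"; // same container list as above
    $text = str_replace(array("\r\n", "\r"), "\n", $text);

    $paragraphs = array();
    $current = "";   // chunks merged so far for the open paragraph
    $depth = 0;      // number of container tags still open

    foreach (preg_split("/\n{2,}/", $text, -1, PREG_SPLIT_NO_EMPTY) as $chunk) {
        // Count every opening and closing container tag in the chunk,
        // so several tags on one line are all taken into account.
        $opens  = preg_match_all("#<(?:$tags)\b[^>]*>#i", $chunk);
        $closes = preg_match_all("#</(?:$tags)\s*>#i", $chunk);

        $current = ($current === "") ? $chunk : "$current\n\n$chunk";
        $depth = max(0, $depth + $opens - $closes);

        if ($depth === 0) {
            // Every container is closed again: emit one paragraph.
            $paragraphs[] = $current;
            $current = "";
        }
    }
    if ($current !== "") {
        // A tag was never closed: keep the remainder as a single part.
        $paragraphs[] = $current;
    }
    return $paragraphs;
}

// The second example from the original question.
$example = "<div class='quote'>\n"
         . "Hello, my nickname is Sandman and I am coding\n"
         . "some PHP. Here is an example:\n\n"
         . "<code>\nprint \"Hello World!\";\n\nprint \"Foo\";\n</code>\n"
         . "</div>\n\n"
         . "Call me";

foreach (split_paragraphs($example) as $p) {
    print "---------------\n$p\n";
}
```

Like the poster's version, runs of blank lines inside a container come back as a single blank line.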

--
Sandman[.net]
Jul 17 '05 #4
