"smart" quotes in PHP

Hello all,

I've been struggling for a few days with the question of how to convert
"smart" (curly) quotes into straight quotes. I tried playing with the
htmlentities() function, but all that is doing is changing the smart
quotes into nonsense characters. I also searched the web for quite a
while and was unsuccessful in finding a solution.

What puzzles me is that doing it the other way around is simple enough.
For example, this works fine in converting a straight quote into an
"open" smart quote:

if ($content[$k] == "\"")
    $content = substr($content, 0, $k) . "&#147;" . substr($content, $k+1, strlen($content)-$k+1);

But the other way around doesn't work. Any ideas?

Thanks,

Martin Goldman
My e-mail address's correct domain name is mgoldman.com.
Jul 17 '05 #1
Martin Goldman <ww*@nowhere.fo o> wrote:
I've been struggling for a few days with the question of how to convert
"smart" (curly) quotes into straight quotes.
Smart/curly quotes? straight quotes? What are these?
What puzzles me is that doing it the other way around is simple enough.
For example, this works fine in converting a straight quote into an
"open" smart quote:

if ($content[$k] == "\"")
    $content = substr($content, 0, $k) . "&#147;" . substr($content, $k+1, strlen($content)-$k+1);


Funny way to do a str_replace :)

What character is represented by #147? AFAIK it's not in any character
set I know (ASCII or ISO-8859-x). So your actual problem might be that
you are using a different encoding for the character you want to replace
than the one PHP is actually using!

BTW 3rd parameter in htmlentities specifies the character set.

--

Daniel Tryba

Jul 17 '05 #2
On Fri, 14 Nov 2003 17:42:08 GMT, Martin Goldman <ww*@nowhere.fo o> wrote:
I've been struggling for a few days with the question of how to convert
"smart" (curly) quotes into straight quotes. I tried playing with the
htmlentities() function, but all that is doing is changing the smart
quotes into nonsense characters. I also searched the web for quite a
while and was unsuccessful in finding a solution.
You've got to work out what character set the text is encoded in, for
starters, since 'smart quotes' exist in Microsoft's codepage 1252 but not in
the standard ISO 8859 character sets, e.g. iso-8859-15.

In codepage 1252:

hex  dec  Unicode  Unicode name
91   145  8216     LEFT SINGLE QUOTATION MARK
92   146  8217     RIGHT SINGLE QUOTATION MARK
93   147  8220     LEFT DOUBLE QUOTATION MARK
94   148  8221     RIGHT DOUBLE QUOTATION MARK

But in iso-8859-15, 145-148 aren't defined as printable characters; 128-159
are reserved for control characters.

So if you change it to &#147;, but output your page encoded in iso-8859-1,
you're just changing it to the code for a non-printable character. The same
entity will appear as a left double quotation mark if encoded in Windows-1252
though.
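
A minimal sketch of the table above in PHP, assuming the input really is encoded in codepage 1252. It swaps the four quote bytes for numeric character references built from their Unicode code points, which refer to the intended characters regardless of the page's own encoding. The function name cp1252_quotes_to_ncr is made up for illustration.

```php
<?php
// Hypothetical helper: replace the four cp1252 quote bytes with
// numeric character references based on their Unicode code points.
function cp1252_quotes_to_ncr(string $cp1252): string
{
    $map = [
        chr(145) => '&#8216;', // LEFT SINGLE QUOTATION MARK
        chr(146) => '&#8217;', // RIGHT SINGLE QUOTATION MARK
        chr(147) => '&#8220;', // LEFT DOUBLE QUOTATION MARK
        chr(148) => '&#8221;', // RIGHT DOUBLE QUOTATION MARK
    ];
    // strtr() with an array replaces every key with its value in one pass.
    return strtr($cp1252, $map);
}

// Assumed sample input: "Hi" wrapped in cp1252 curly double quotes.
echo cp1252_quotes_to_ncr(chr(147) . 'Hi' . chr(148)), "\n";
```

If the bytes are not cp1252 (for example UTF-8), none of the keys match and the string comes back unchanged.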
What puzzles me is that doing it the other way around is simple enough.
For example, this works fine in converting a straight quote into an
"open" smart quote:

if ($content[$k] == "\"")
    $content = substr($content, 0, $k) . "&#147;" . substr($content, $k+1, strlen($content)-$k+1);

But the other way around doesn't work. Any ideas?


In what way doesn't it work? What does str_replace(chr(147), '"', $content);
appear to do in your setup?

--
Andy Hassall (an**@andyh.co. uk) icq(5747695) (http://www.andyh.co.uk)
Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)
Jul 17 '05 #3
Martin Goldman wrote:
I've been struggling for a few days with the question of how to convert
"smart" (curly) quotes into straight quotes.
As D. Tryba hinted at, str_replace should work fine. After all,
you're replacing one character with another.

$string = str_replace($chr, '"', $string);

where $chr is the character you want to replace.
I tried playing with the htmlentities() function, but all that is doing
is changing the smart quotes into nonsense characters.
I'd be interested in seeing what you actually tried. Since so-called
smart quotes aren't in the Latin-1 repertoire, you'd have to specify
a charset other than the default ISO-8859-1. Say you typed smart
quotes on a bog standard Windows system by holding down Alt and
pressing 0, 1, 4, and 7 (or 8) on the numeric keypad, you'd use

$string = htmlentities($string, ENT_COMPAT, 'cp1252');

where $string is the string containing smart quotes. That converts
smart quotes to their respective entity references.
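
As a rough illustration of the call above: htmlentities() only gives useful output when the charset argument matches how the string is actually encoded. The two sample strings below are assumptions made up for this sketch, holding the same text in cp1252 and in UTF-8.

```php
<?php
// Assumed sample data: "Hi" in curly double quotes, in two encodings.
$cp1252 = "\x93Hi\x94";                 // cp1252 bytes 147 and 148
$utf8   = "\xE2\x80\x9CHi\xE2\x80\x9D"; // UTF-8 bytes for U+201C and U+201D

// The third argument has to name the encoding the bytes are really in.
$fromCp1252 = htmlentities($cp1252, ENT_COMPAT, 'cp1252');
$fromUtf8   = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

echo $fromCp1252, "\n";
echo $fromUtf8, "\n";
```

Both calls should print &ldquo;Hi&rdquo;. Passing the UTF-8 bytes together with 'cp1252' instead makes PHP treat each of the three bytes as a separate character, which is one plausible way to end up with "nonsense characters".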
What puzzles me is that doing it the other way around is simple enough.
Eek! I'd have thought that was *more* difficult...
if ($content[$k] == "\"")
    $content = substr($content, 0, $k) . "&#147;" . substr($content, $k+1, strlen($content)-$k+1);


How does your script know that the quotation mark was intended as an
opening quotation mark? ;-)

In HTML, the character reference &#147; is undefined. The LEFT DOUBLE
QUOTATION MARK can be represented using the character reference
&#8220; or the entity reference &ldquo;. The RIGHT DOUBLE QUOTATION
MARK can be represented using the character reference &#8221; or the
entity reference &rdquo;.

--
Jock
Jul 17 '05 #4
John Dunlop <jo*********@jo hndunlop.info> wrote in
news:MP******** *************** *@news.freeserv e.net:
I'd be interested in seeing what you actually tried. Since so-called
smart quotes aren't in the Latin-1 repertoire, you'd have to specify
a charset other than the default ISO-8859-1. Say you typed smart
quotes on a bog standard Windows system by holding down Alt and
pressing 0, 1, 4, and 7 (or 8) on the numeric keypad, you'd use

$string = htmlentities($string, ENT_COMPAT, 'cp1252');

where $string is the string containing smart quotes. That converts
smart quotes to their respective entity references.
This results in the smart quotes being replaced with nonsense characters.
The thing is, though, that I'm totally unfamiliar with character sets,
the differences between them, etc. I've never had any reason to care
about them. So I'm a little confused about what you guys are talking
about when it comes to them.
How does your script know that the quotation mark was intended as an
opening quotation mark? ;-)

Well, I didn't paste the whole thing. :) I wrote a loop that goes through
the string. It toggles a flag each time a quotation mark is found. If the
flag is set, it makes it an open quote; if it's not, it makes it a closed
quote. Hence the reason I'm not just using a str_replace for that. :)
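
The toggle loop described above might look roughly like this. It is a sketch, not Martin's actual code (which was never posted in full); the function name curl_quotes and the choice of &ldquo; and &rdquo; as output are assumptions.

```php
<?php
// Hypothetical reconstruction of the flag-toggling loop: every straight
// double quote alternates between an opening and a closing curly quote.
function curl_quotes(string $content): string
{
    $out = '';
    $open = true; // the first quote found is treated as an opening quote
    for ($k = 0; $k < strlen($content); $k++) {
        if ($content[$k] === '"') {
            $out .= $open ? '&ldquo;' : '&rdquo;';
            $open = !$open; // flip the flag for the next quote
        } else {
            $out .= $content[$k];
        }
    }
    return $out;
}

echo curl_quotes('He said "hello" and left.'), "\n";
```

Building a new string avoids the index shifting that happens when a one-character quote is replaced in place by a longer entity.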

Oh, and to answer Mr. Hassall's question -- str_replace(chr(147), "\"",
$content) doesn't do anything. The exact same string is returned.

-Martin
Jul 17 '05 #5
Martin Goldman <ww*@nowhere.fo o> wrote:
[confused about charsets]
Oh, and to answer Mr. Hassall's question -- str_replace(chr(147), "\"",
$content) doesn't do anything. The exact same string is returned.


That might mean that there is no chr(147) in the string, although you
_see_ a character that might be represented as the character you know as
147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
currency sign and the euro symbol is missing entirely. That's why, if you
want to display the euro symbol, you are encouraged to use the HTML entity
&euro;, which can be rendered in any font and any character set (with a
fallback to EUR).

So your job is to figure out how your quote is encoded (just step through
the string and print the ord() value for each character)...
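
Stepping through the string could be sketched like this. The helper name byte_values and the sample string are assumptions for illustration; ord() returns the numeric value of a single byte.

```php
<?php
// Hypothetical helper: list the decimal value of every byte in a string.
function byte_values(string $s): array
{
    $values = [];
    for ($i = 0; $i < strlen($s); $i++) {
        $values[] = ord($s[$i]); // numeric value of the byte at offset $i
    }
    return $values;
}

// Assumed sample input: a UTF-8 left double quotation mark followed by "Hi".
$sample = "\xE2\x80\x9CHi";
echo implode(' ', byte_values($sample)), "\n"; // prints: 226 128 156 72 105
```

A single value of 147 would point to cp1252, while a run of three values starting with 226 points to UTF-8.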

BTW Unicode kind of solves the problem by defining every known character
in one set; the problem is that not every program supports it yet. But
Unicode also introduces another problem: the way the characters are
encoded (e.g. UTF-7, UTF-8, UTF-16...). I don't know if PHP supports UTF-16+.

--

Daniel Tryba

Jul 17 '05 #6
Daniel Tryba <ne************ ****@canopus.nl > wrote in news:bp5nhq$d0e $1
@news.tue.nl:
That might mean that there is no chr(147) in the string, although you
_see_ a character that might be represented as the character you know as
147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
currency sign and the euro symbol is missing entirely. That's why, if you
want to display the euro symbol, you are encouraged to use the HTML entity
&euro;, which can be rendered in any font and any character set (with a
fallback to EUR).

So your job is to figure out how your quote is encoded (just step through
the string and print the ord() value for each character)...

Interesting you should suggest this, because I just did that. And indeed,
it's not coming out as 147. It's coming out as 226, followed by 128,
followed by 156. I suppose I could do a str_replace for these 3
characters and replace it with 147. Although, then I'd have to do that
for every character I want to support. What a drag.

Thanks,
Martin
Jul 17 '05 #7
On Sat, 15 Nov 2003 19:57:14 GMT, Martin Goldman <ww*@nowhere.fo o> wrote:
Daniel Tryba <ne************ ****@canopus.nl > wrote in news:bp5nhq$d0e $1
@news.tue.nl :
That might mean that there is no chr(147) in the string, although you
_see_ a character that might be represented as the character you know as
147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
currency sign and the euro symbol is missing entirely. That's why, if you
want to display the euro symbol, you are encouraged to use the HTML entity
&euro;, which can be rendered in any font and any character set (with a
fallback to EUR).

So your job is to figure out how your quote is encoded (just step through
the string and print the ord() value for each character)...


Interesting you should suggest this, because I just did that. And indeed,
it's not coming out as 147. It's coming out as 226, followed by 128,
followed by 156. I suppose I could do a str_replace for these 3
characters and replace it with 147. Although, then I'd have to do that
for every character I want to support. What a drag.


Your text is encoded in UTF-8. Going back to the characters again:

hex  dec  Unicode  Unicode name
91   145  8216     LEFT SINGLE QUOTATION MARK
92   146  8217     RIGHT SINGLE QUOTATION MARK
93   147  8220     LEFT DOUBLE QUOTATION MARK
94   148  8221     RIGHT DOUBLE QUOTATION MARK

226,128,156 in binary is:

11100010
10000000
10011100

'1110' in the first few bits of the first byte indicates it is a lead byte for
a three-byte character. The remaining two are trail bytes, as they start with
10. So separating out the data gets:

1110 0010
10 000000
10 011100

=> 0010000000011100 (binary)
= 8220 (decimal)

Which is LEFT DOUBLE QUOTATION MARK.
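
The same bit-masking can be written out in PHP. This is only a sketch for three-byte sequences like the one above; the function name decode_three_byte_utf8 is hypothetical, and it does not handle one-, two-, or four-byte sequences.

```php
<?php
// Hypothetical helper: decode one three-byte UTF-8 sequence to its code point.
function decode_three_byte_utf8(string $bytes): int
{
    if (strlen($bytes) !== 3) {
        throw new InvalidArgumentException('Expected exactly three bytes');
    }
    $b1 = ord($bytes[0]);
    $b2 = ord($bytes[1]);
    $b3 = ord($bytes[2]);
    // Lead byte must start with 1110, trail bytes with 10.
    if (($b1 & 0xF0) !== 0xE0 || ($b2 & 0xC0) !== 0x80 || ($b3 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('Not a three-byte UTF-8 sequence');
    }
    // Keep 4 data bits from the lead byte and 6 from each trail byte.
    return (($b1 & 0x0F) << 12) | (($b2 & 0x3F) << 6) | ($b3 & 0x3F);
}

echo decode_three_byte_utf8("\xE2\x80\x9C"), "\n"; // prints: 8220
```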

--
Andy Hassall (an**@andyh.co. uk) icq(5747695) (http://www.andyh.co.uk)
Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)
Jul 17 '05 #8
Andy Hassall <an**@andyh.co. uk> wrote:
So your job is to figure out how your quote is encoded (just step through
the string and print the ord() value for each character)...


Interesting you should suggest this, because I just did that. And indeed,
it's not coming out as 147. It's coming out as 226, followed by 128,
followed by 156. I suppose I could do a str_replace for these 3
characters and replace it with 147. Although, then I'd have to do that
for every character I want to support. What a drag.


Your text is encoded in UTF-8. Going back to the characters again:

[in depth UTF-8 decoding :)]

So Martin, you should take a look at iconv or, if your server lacks
support for it, utf8_decode(). The manual page for the latter also has a
user-contributed note on how to use str_replace on UTF-8 encoded strings.
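
For the str_replace route, a minimal sketch assuming the input is UTF-8, as the byte dump in this thread suggests. The function name straighten_quotes is made up; the byte sequences are the UTF-8 encodings of the four characters in Andy's table. Note that utf8_decode() converts to ISO-8859-1, which has no curly quotes, so on its own it cannot preserve them.

```php
<?php
// Hypothetical helper: replace UTF-8 curly quotes with straight ASCII quotes.
function straighten_quotes(string $utf8): string
{
    $map = [
        "\xE2\x80\x98" => "'", // U+2018 LEFT SINGLE QUOTATION MARK
        "\xE2\x80\x99" => "'", // U+2019 RIGHT SINGLE QUOTATION MARK
        "\xE2\x80\x9C" => '"', // U+201C LEFT DOUBLE QUOTATION MARK
        "\xE2\x80\x9D" => '"', // U+201D RIGHT DOUBLE QUOTATION MARK
    ];
    // str_replace() accepts parallel arrays of search and replacement strings.
    return str_replace(array_keys($map), array_values($map), $utf8);
}

// Assumed sample input: "Hello" wrapped in UTF-8 curly double quotes.
echo straighten_quotes("\xE2\x80\x9CHello\xE2\x80\x9D"), "\n"; // prints: "Hello"
```

Each multi-byte sequence is matched as a whole, so this stays one lookup table rather than one replacement per byte.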

--

Daniel Tryba

Jul 17 '05 #9
Daniel Tryba <ne************ ****@canopus.nl > wrote in
news:bp******** **@news.tue.nl:
So Martin, you should take a look at iconv or, if your server lacks
support for it, utf8_decode(). The manual page for the latter also has a
user-contributed note on how to use str_replace on UTF-8 encoded strings.


Great. Thanks to everyone who replied.

-Martin
my correct domain name is mgoldman.com
Jul 17 '05 #10
