Questions about character entities in XML and PCI security compliance

Hi all.

This is a rather long posting but I have some questions concerning the
usage of character entities in XML documents and PCI security
compliance.

The company I work for is using a third party ecommerce service for
hosting its online store. A few months ago this third party commerce
site began using PGP file encryption on XML files (e.g. web orders)
transferred to us as part of the ongoing PCI security compliance.
Basically we only need to add a PGP decryption process before we can
parse the incoming XML files so there should not have been any
technical issue.

However, we noticed that XML files they created since PGP encryption
was implemented contain some unusual character entities.

For example, if an XML file has elements containing characters such as
<, >, &, -, /, ' and so on, the XML file will use the following
character entities to represent them:

Character   Unusual Character Entity
<           &amp;lt;
>           &amp;gt;
&           &amp;amp;
-           &amp;#45;
/           &amp;#47;
'           &amp;#39;

No matter how you look at them, they are NOT the proper character
entities for the original characters shown.

The problem with these bad character entities is that when we use .Net
Framework components such as XmlReader to load the XML file, character
entities are not expanded back to the original characters they
represent.

Instead I would get the following result:

Unusual Character Entity   Expanded Result
&amp;lt;                   &lt;
&amp;gt;                   &gt;
&amp;#38;                  &#38;
&amp;#45;                  &#45;
&amp;#47;                  &#47;
&amp;#39;                  &#39;

If you take a close look at the expanded results, you would see that
they are the normal character entities you would expect to see.

It seems to me that the XML export process used by the ecommerce site
has applied character entity "encoding" twice.

For example, the proper character entity for / is &#47;.
However, if you treat &#47; as a data string and not as a character
entity and apply another "encoding", you would get &amp;#47;.
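
To illustrate the suspected double encoding, here is a minimal Python sketch. It is not the provider's export code, which is unknown; `to_numeric_reference` is a hypothetical helper standing in for the first encoding pass, and `xml.sax.saxutils.escape` stands in for the second.

```python
from xml.sax.saxutils import escape


def to_numeric_reference(ch: str) -> str:
    """Hypothetical first pass: turn one character into a numeric reference."""
    return f"&#{ord(ch)};"


# First pass: "/" becomes the ordinary numeric character reference.
once = to_numeric_reference("/")

# Second pass: the reference is treated as plain data, so its "&" is escaped.
twice = escape(once)

print(once)   # &#47;
print(twice)  # &amp;#47;
```

The second line of output matches what appears in the files, which is consistent with an escaping step being applied to text that was already escaped.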

This means that whenever an online customer enters characters such as &
or / in their name or shipping address, the XML file we parse will
not give us the correct text.

For example, if a customer entered "Christian & Cruz" in their shipping
address, the XML file we downloaded will show it as "Christian
&amp;#38; Cruz". And when the XML file is parsed, the resulting string
we get would be "Christian &#38; Cruz".

Another example: if a customer entered "c/o R. Fenton, M.D." in their
shipping address, the XML file will show this string as "c&amp;#47;o
R. Fenton, M.D.". And the resulting string we parsed would be
"c&#47;o R. Fenton, M.D.".
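
The parsing behaviour can be reproduced outside .NET. The sketch below uses Python's standard `xml.etree.ElementTree` rather than XmlReader, and the element names `shipTo`, `name` and `careOf` are made up for illustration; the real order schema is not shown here. Any conforming parser removes exactly one layer of escaping, so the inner reference survives as literal text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment containing the doubly escaped values from the examples.
doc = (
    "<shipTo>"
    "<name>Christian &amp;#38; Cruz</name>"
    "<careOf>c&amp;#47;o R. Fenton, M.D.</careOf>"
    "</shipTo>"
)

root = ET.fromstring(doc)

# The parser expands "&amp;" to "&" and stops there.
name = root.findtext("name")
care_of = root.findtext("careOf")

print(name)     # Christian &#38; Cruz
print(care_of)  # c&#47;o R. Fenton, M.D.
```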

When we reported this problem to the ecommerce hosting company, their
response was that these character entities were "encoded" per PCI
security policy and thus they have no plan to "fix" them.

Their reply sounds strange because these weird character entities they
use in XML files are NOT data encryption nor do they provide security
benefits.

Can anyone tell me if there is in fact some kind of special character
entities used in XML files for PCI security compliance?

Or is our ecommerce hosting company wrong?

Any information would be appreciated.
Thank you.
Aug 7 '08 #1


<te*****@ucla.edu> wrote in message
news:55********************************@4ax.com...
[snip]
Well we have similar files and I've never seen that happen. As you say they
seem to be escaping twice. In my opinion they're wrong but I'd need to know
their process etc.
Pragmatically you may need to un-escape once before treating the file as
XML.
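
A minimal Python sketch of un-escaping once before parsing, under the assumption that every `&amp;` followed by the remainder of a reference is an artefact of the double escaping. The function name `unescape_once`, the pattern, and the sample element names are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# "&amp;" counts as double-escaped only when the text right after it looks
# like the rest of a character or predefined entity reference.
DOUBLE_ESCAPED = re.compile(
    r"&amp;(?=(?:#\d+|#x[0-9A-Fa-f]+|lt|gt|amp|quot|apos);)"
)


def unescape_once(raw_xml: str) -> str:
    """Remove one layer of escaping from the raw file text."""
    return DOUBLE_ESCAPED.sub("&", raw_xml)


raw = (
    "<order>"
    "<name>Christian &amp;#38; Cruz</name>"
    "<careOf>c&amp;#47;o R. Fenton</careOf>"
    "<note>a &amp;lt; b</note>"
    "</order>"
)

root = ET.fromstring(unescape_once(raw))
print(root.findtext("name"))    # Christian & Cruz
print(root.findtext("careOf"))  # c/o R. Fenton
print(root.findtext("note"))    # a < b
```

The result is still well-formed because only the leading `&amp;` of each reference is rewritten. The pass cannot tell a double-escaped value from text where a customer really typed something like `&lt;`, so it is a workaround rather than a fix.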

--

Joe Fawcett (MVP - XML)
http://joe.fawcett.name

Aug 8 '08 #2


On Fri, 8 Aug 2008 07:55:19 +0100, "Joe Fawcett"
<jo********@newsgroup.nospam> wrote:
>Well we have similar files and I've never seen that happen. As you say they
>seem to be escaping twice. In my opinion they're wrong but I'd need to know
>their process etc.
>Pragmatically you may need to un-escape once before treating the file as
>XML.
I think I will just do what you suggested and write an extra process
to convert ("un-escape") bad character entities to proper entities
before parsing the XML files.

At least I am glad that someone agrees with me that the third party
ecommerce site is not exporting proper character entities in their XML
file. They refused to fix the problem and used PCI security policy as
their excuse.

I spent several hours on Google trying to find if there is any
relevancy at all between the use of XML character entities and PCI
security. And I found none.
Aug 8 '08 #4
Joe Fawcett wrote:
[snip]
>Well we have similar files and I've never seen that happen. As you say
>they seem to be escaping twice. In my opinion they're wrong but I'd need
>to know their process etc.
I would suspect they are not used to dealing with XML, and have been
told by some less-than-well-informed person that "you always have to do
this with those funny characters in web pages". But as Joe says, without
knowing their process it's hard to be sure.

What *is* sure is that they are wrong to do this. The file when
decrypted should be the file that was encrypted. They have corrupted it,
and they must stop doing that.
>Pragmatically you may need to un-escape once before treating the file as
>XML.
That may not be possible if parts of the document already use numeric
character references or the &amp;amp; escapement for other reasons (eg
in CDATA sections). But with luck you may just be able to reconvert it
until your hosting bods fix the bug.
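
One way to limit the damage from that ambiguity is to parse the file as delivered and un-escape only the fields known to be affected. The Python sketch below assumes the element names `name` and `address` (hypothetical, not taken from the real schema) and uses `html.unescape`, which also expands HTML-only named entities and is therefore broader than XML strictly needs.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical list of customer-entered fields that arrive double-escaped.
AFFECTED_FIELDS = {"name", "address"}


def parse_order(raw_xml: str) -> ET.Element:
    """Parse normally, then remove the extra escaping layer per field."""
    root = ET.fromstring(raw_xml)
    for elem in root.iter():
        if elem.tag in AFFECTED_FIELDS and elem.text:
            elem.text = html.unescape(elem.text)
    return root


raw = (
    "<order>"
    "<name>Christian &amp;#38; Cruz</name>"
    "<address>c&amp;#47;o R. Fenton, M.D.</address>"
    "<note><![CDATA[keep &amp; as typed]]></note>"
    "</order>"
)

root = parse_order(raw)
print(root.findtext("name"))     # Christian & Cruz
print(root.findtext("address"))  # c/o R. Fenton, M.D.
print(root.findtext("note"))     # keep &amp; as typed
```

Markup and CDATA content outside the listed fields are left exactly as the parser delivered them. Inside the listed fields the same ambiguity remains, so this only narrows the risk.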

///Peter

Aug 8 '08 #5
te*****@ucla.edu wrote:
>At least I am glad that someone agrees with me that the third party
>ecommerce site is not exporting proper character entities in their XML
>file. They refused to fix the problem and used PCI security policy as
>their excuse.
Then they are guilty of adding insolence to their ignorance.
I'd get out of using them as quickly as possible.
Can you please let us know who they are so that we can avoid them?
>I spent several hours on Google trying to find if there is any
>relevancy at all between the use of XML character entities and PCI
>security. And I found none.
There is none.

///Peter
Aug 9 '08 #6
On Sat, 09 Aug 2008 16:51:49 +0100, Peter Flynn
<pe********@m.silmaril.ie> wrote:
>Can you please let us know who they are so that we can avoid them?
If you want to know, the ecommerce service provider is MarketLive.
According to our management, they are one of the better ecommerce
providers out there, which is why our company uses them.

Since I have not been able to find similar problems on Google, I have
a feeling it's just bad luck that MarketLive is exporting improper XML
files to us (and probably only us) perhaps because of mistakes by
their programmers. And their technical support manager who is in
charge of handling our technical support issues insists that those
character entities are part of their PCI security policy.
Aug 11 '08 #7
On Fri, 08 Aug 2008 23:49:24 +0100, Peter Flynn
<pe********@m.silmaril.ie> wrote:
>I would suspect they are not used to dealing with XML, and have been
>told by some less-than-well-informed person that "you always have to do
>this with those funny characters in web pages". But as Joe says, without
>knowing their process it's hard to be sure.
The ecommerce provider is actually very knowledgeable as far as XML is
concerned. Compared to other providers we have dealt with in the
past, they use a very large and complicated set of XML schemas which
appear to be well thought out.
>What *is* sure is that they are wrong to do this. The file when
>decrypted should be the file that was encrypted. They have corrupted it,
>and they must stop doing that.
I agree but I am powerless to convince them that they are wrong.
Aug 11 '08 #8
