Mapforce: mapping to CSV without column header line inserts hex FF FE FF FE

Lukas

Hi Group,

In Mapforce 2005 R3, when mapping to CSV with the "First row contains
field names" option UN-checked on the CSV target component settings,
the characters (hex) FF FE FF FE are inserted at the beginning of the
first line when running Java code autogenerated by Mapforce.

In the output tab of the Mapforce application, this problem doesn't
occur. I've not checked whether it occurs when running C#, C++ or XSLT
autogenerated code.

I've encountered this problem when mapping XML to CSV and CSV to CSV.

Does anyone know whether this is a known bug? Is it fixed in a
later release?
Any known workarounds?

Not holding my breath,

Lukas

Dec 9 '05 #1

Lukas

Correction:

My editor was displaying those bytes incorrectly.
The bytes inserted are actually:

EF BB BF

Dec 12 '05 #2

Peter Flynn

Lukas wrote:

Hi Group,

In Mapforce 2005 R3, when mapping to CSV with the "First row contains
field names" option UN-checked on the CSV target component settings,
the characters (hex) FF FE FF FE are inserted at the beginning of the
first line when running Java code autogenerated by Mapforce.

In the output tab of the Mapforce application, this problem doesn't
occur. I've not checked whether it occurs when running C#, C++ or XSLT
autogenerated code.

I've encountered this problem when mapping XML to CSV and CSV to CSV.

Does anyone know whether this is a known bug? Is it fixed in a
later release?
Any known workarounds?

It's not a bug, it's part of XML. It's the Byte Order Mark (BOM) which
is designed to signal to a processor before processing starts which
16-bit character encoding is in use. It's being output because your
processor is emitting UCS-2 which is probably unnecessary unless you
are using a very wide range of character repertoire planes. Check the
Mapforce output settings and switch to UTF-8 instead.

///Peter
--
See FAQ: http://xml.silmaril.ie/appendix/glossary/#bom

Dec 12 '05 #3

Richard Tobin

In article <11**********************@g14g2000cwa.googlegroups .com>,
Lukas <lu*******@yahoo.com> wrote:

My editor was displaying those bytes incorrectly.
The bytes inserted are actually:

EF BB BF

I can't help you directly, but EF BB BF is the UTF-8 code for a
byte-order mark (or "BOM"). Maybe you can look that up in the manual
for your software.

-- Richard

Dec 13 '05 #4
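
As a possible workaround for the original question, the generated CSV could be post-processed to drop the three BOM bytes. The following is a minimal sketch, not anything Mapforce provides: the class name `BomStripper` and the method `stripUtf8Bom` are hypothetical, and it assumes the file is small enough to read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) from a byte array.
final class BomStripper {

    /** Returns the data without a leading UTF-8 BOM; data without one is returned unchanged. */
    static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    // Usage (assumed): java BomStripper output.csv
    // Rewrites the named file in place without the BOM.
    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);
        Files.write(path, stripUtf8Bom(Files.readAllBytes(path)));
    }
}
```

Working on raw bytes avoids any dependence on how a particular reader or editor decodes the file.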

Lukas

Sorry for the confusion. The sequence was actually EF BB BF (UTF-8 BOM,
as Richard notes).

What confuses me about the UTF-8 BOM issue:

A) In XML: Since I'm using UTF-8, which is a 7 bit encoding, and the
xml processing instruction says so explicitly, why would I want to have
nasty binary at the start of my document?

B)
* In Text (CSV): some articles claim that Windows Notepad handles the
BOM gracefully, but in our project the issue would've not even been
raised if our editors had not displayed spurious characters;
... "ï»¿" (if you view this in ISO 8859-1) in Notepad, a dot in
Ultraedit 8.2. When switching to hex in Ultraedit, completely wrong
values are being displayed through the length of the doc.

* The issue did not occur when (in Mapforce) the option "First row
contains field names" was checked for the output CSV, although we
viewed the output files with the same editors.

* Mapforce ITSELF doesn't handle the BOM gracefully. If the CSV output
with BOM from one Mapforce code-gen mapping is fed as input to another,
the BOM is visible in the first field and trips up functions operating
on that field.

Dec 14 '05 #5
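
The "BOM visible in the first field" symptom can be reproduced in plain Java, because the standard UTF-8 decoder keeps the BOM as the character U+FEFF instead of discarding it. The sketch below is an illustration under assumptions: `CsvFirstField`, `stripBom` and `firstRecord` are hypothetical names, and the comma split is naive (no quoted fields), which is enough to show the effect.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the first CSV record and drops a leading U+FEFF.
final class CsvFirstField {

    /** Removes a leading U+FEFF (the decoded BOM) from a line, if present. */
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    /** Decodes the bytes as UTF-8 and returns the fields of the first line, BOM removed. */
    static String[] firstRecord(byte[] csvBytes) {
        String text = new String(csvBytes, StandardCharsets.UTF_8);
        String firstLine = text.split("\r?\n", 2)[0];
        // Naive split: assumes no quoted fields containing commas.
        return stripBom(firstLine).split(",", -1);
    }
}
```

Without the `stripBom` call, the first field would compare unequal to the expected header or value even though it looks identical in most viewers.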

Lukas

Sorry, something doesn't display in my last post. It's meant to read:

...

[ASCII-art rendering of the three characters: i with diaeresis, double
greater-than, inverted question mark]

(if you view this in ISO 8859-1) in Notepad, a dot ...

Dec 14 '05 #6

Richard Tobin

In article <11**********************@g44g2000cwa.googlegroups .com>,
Lukas <lu*******@yahoo.com> wrote:

A) In XML: Since I'm using UTF-8, which is a 7 bit encoding, and the
xml processing instruction says so explicitly, why would I want to have
nasty binary at the start of my document?

UTF-8 is not a 7-bit encoding! It corresponds to ASCII for characters
up to 127, but uses bytes with the high bit set to encode the rest of
Unicode.

* In Text (CSV): some articles claim that Windows Notepad handles the
BOM gracefully, but in our project the issue would've not even been
raised if our editors had not displayed spurious characters;
.. "ï»¿" (if you view this in ISO 8859-1) in Notepad

I don't know anything about Notepad, but if you see those characters -
i with diaeresis, double greater-than, inverted question mark - it
means that the program is interpreting the document as 8859-1 rather
than UTF-8. Of course, the whole point of the UTF-8 BOM is to let it
know that it's in UTF-8!

-- Richard

Dec 14 '05 #7
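
Both points in this reply can be checked with a few lines of Java. The sketch below (class and method names are hypothetical) decodes the bytes EF BB BF once as ISO 8859-1 and once as UTF-8, and tests whether the UTF-8 encoding of a string uses bytes with the high bit set.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of how the same three bytes read under two encodings.
final class BomDecodingDemo {

    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** True if the UTF-8 encoding of the text contains any byte above 0x7F. */
    static boolean usesHighBit(String text) {
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ISO 8859-1 maps each byte to one character: U+00EF, U+00BB, U+00BF.
        System.out.println(decodeLatin1(UTF8_BOM).length());
        // UTF-8 decodes the same bytes to the single character U+FEFF.
        System.out.println(decodeUtf8(UTF8_BOM).length());
        // Pure ASCII stays within 7 bits; e-acute (U+00E9) does not.
        System.out.println(usesHighBit("abc"));
        System.out.println(usesHighBit("\u00E9"));
    }
}
```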

Peter Flynn

Lukas wrote:

Sorry for the confusion. The sequence was actually EF BB BF (UTF-8
BOM, as Richard notes).

What confuses me about the UTF-8 BOM issue:

A) In XML: Since I'm using UTF-8, which is a 7 bit encoding,

Whoah there. UTF-8 uses all 8 bits in the byte. Where did you get the
information that it's 7-bit? The only 7-bit encoding in widespread
use is US-ASCII.

and the
xml processing instruction says so explicitly, why would I want to
have nasty binary at the start of my document?

To identify that it is UTF-8 as opposed to UTF-16 or UTF-32.
If your XML software can't handle it, it's broken and should be
replaced.

B)
* In Text (CSV): some articles claim that Windows Notepad handles the
BOM gracefully, but in our project the issue would've not even been
raised if our editors had not displayed spurious characters;
.. "ï»¿" (if you view this in ISO 8859-1) in Notepad, a dot in
Ultraedit 8.2. When switching to hex in Ultraedit, completely wrong
values are being displayed through the length of the doc.

While most plaintext editors will display ASCII or ISO-8859-1
adequately, large numbers of them spit blood when faced with anything
else. Notepad is suitable for shopping lists and not much else.

* The issue did not occur when (in Mapforce) the option "First row
contains field names" was checked for the output CSV, although we
viewed the output files with the same editors.

* Mapforce ITSELF doesn't handle the BOM gracefully. If the CSV output
with BOM from one Mapforce code-gen mapping is fed as input to
another, the BOM is visible in the first field and trips up functions
operating on that field.

Sounds like Mapforce is broken and you should complain to the vendor.

///Peter
--
XML FAQ: http://xml.silmaril.ie/

Dec 14 '05 #8

Shmuel (Seymour J.) Metz

In <dn**********@pc-news.cogsci.ed.ac.uk>, on 12/14/2005
at 12:59 PM, ri*****@cogsci.ed.ac.uk (Richard Tobin) said:

I don't know anything about Notepad, but if you see those characters -
i with diaeresis, double greater-than, inverted question mark - it
means that the program is interpreting the document as 8859-1 rather
than UTF-8. Of course, the whole point of the UTF-8 BOM is to let it
know that it's in UTF-8!

Why would you need a BOM for UTF-8? It's only needed for characters
larger than an octet, e.g., UTF-16, raw UCS4.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to sp******@library.lspace.org

Dec 19 '05 #9

Richard Tobin

In article <43***************************@news.patriot.net> ,
Shmuel (Seymour J.) Metz <sp******@library.lspace.org.invalid> wrote:

Why would you need a BOM for UTF-8? It's only needed for characters
larger than an octet, e.g., UTF-16, raw UCS4.

It also serves to indicate the encoding, as well as which byte-order
variant.

-- Richard

Dec 19 '05 #10

Shmuel (Seymour J.) Metz

In <do***********@pc-news.cogsci.ed.ac.uk>, on 12/19/2005
at 08:58 PM, ri*****@cogsci.ed.ac.uk (Richard Tobin) said:

It also serves to indicate the encoding, as well as which byte-order
variant

What byte-order variant? UTF-8 uses a stream of 8-bit bytes (octets),
not a stream of 16-bit bytes; there is no byte ordering issue. The BOM
is needed for UTF-16 and raw Unicode, not for UTF-8.

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Jan 3 '06 #11

Richard Tobin

In article <43***************************@news.patriot.net> ,
Shmuel (Seymour J.) Metz <sp******@library.lspace.org.invalid> wrote:

It also serves to indicate the encoding, as well as which byte-order
variant

What byte-order variant? UTF-8 uses a stream of 8-bit bytes (octets),
not a stream of 16-bit bytes; there is no byte ordering issue.

The obvious use of a BOM - as the name implies - is to indicate which
byte order variant of an encoding is being used. It is *also* used to
indicate the encoding itself. Obviously for UTF-8 only this second
function is relevant.

-- Richard

Jan 4 '06 #12
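
The two roles of the BOM described here (identifying the encoding and, for multi-byte code units, the byte order) can be sketched as a small lookup over the first bytes of a file. This is an illustrative sketch only: `BomSniffer` and `encodingFromBom` are hypothetical names, and the byte patterns are the standard Unicode BOM signatures.

```java
// Hypothetical helper: guesses the encoding of a byte stream from its BOM.
final class BomSniffer {

    /** Returns the encoding name signalled by a leading BOM, or null if there is none. */
    static String encodingFromBom(byte[] head) {
        // The 4-byte UTF-32 marks are tested first because FF FE 00 00 also
        // begins with the UTF-16LE mark FF FE; this order prefers UTF-32LE.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For UTF-8 the result carries no byte-order information, only the encoding itself, which is the second function mentioned in the post above.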
