Bytes IT Community

How to fix PHP/HTML webpages that display Word resumes with funky characters

I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this? Also, how do I fix the display of the resumes
already entered this way?

Thanx
Phil

Mar 20 '06 #1
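A minimal sketch of the display side, assuming the resumes are stored as UTF-8 and the page is also served as UTF-8 (for example by sending a Content-Type: text/html; charset=UTF-8 header and calling mysqli_set_charset($link, 'utf8mb4') on the connection). Escaping with htmlspecialchars stops stray markup in a pasted resume from breaking the page, and wrapping it in <pre> keeps the textarea's line breaks. The function name render_resume is hypothetical.

```php
<?php
// Hypothetical helper: escape a stored resume for display inside an HTML page.
// Assumes the text is UTF-8 and the page itself is served as UTF-8.
function render_resume(string $resume): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE turns invalid UTF-8
    // into U+FFFD instead of making htmlspecialchars return an empty string.
    $safe = htmlspecialchars($resume, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<pre>' . $safe . '</pre>';
}

echo render_resume("Skills: C++ & <b>PHP</b>"), "\n";
// <pre>Skills: C++ &amp; &lt;b&gt;PHP&lt;/b&gt;</pre>
```

This only helps if the stored bytes are intact UTF-8; it cannot undo damage that happened before the text was saved.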


By "funky characters" I assume you mean the letter 'e' with the cute
little French decoration on top. Try this:

$new_string=str_replace('é','e',$old_string);

To replace the decorated e's with plain vanilla e's.

(The first e is the decorated one, in case it doesn't show up in your
newsreader. For Windows users, that's entered by holding down the Alt
key and typing 0233 on the num pad.)

You might also need to do it for the upper-case 'É' in case people
write "RESUMÉ" in caps.

--gary

Mar 20 '06 #2
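The one-liner above only covers the lower-case é. A slightly more general sketch, assuming the stored text is UTF-8, passes arrays to str_replace so both cases are handled in one call; the \u{...} escapes (PHP 7 and later) keep the accented letters intact no matter how the source file is transferred. The function name fold_accented_e is hypothetical.

```php
<?php
// Hypothetical helper: replace é/É with plain e/E, assuming UTF-8 text.
// The \u{...} escapes (PHP 7+) keep the accented letters intact in the source.
function fold_accented_e(string $s): string
{
    return str_replace(
        ["\u{00E9}", "\u{00C9}"], // é, É
        ['e', 'E'],
        $s
    );
}

echo fold_accented_e("R\u{00C9}SUM\u{00C9} / r\u{00E9}sum\u{00E9}"), "\n";
// RESUME / resume
```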

Yikes! That's scary! Fraid I have no suggestions for that situation,
unless you can identify the cases where it happens and actually do a
string replace on that long cryptic string. If it's invariant, that is,
which it probably isn't. :-( --gary

Mar 20 '06 #3

It is a huge task to decode Word from its proprietary format to plain
text or regular old HTML. All the online resume forms I've submitted
have required me to paste in either plain text or rich text format.
That would be the easiest approach. The user just has to use Save As...
from the Word File menu.
comp.lang.php wrote:


No, that's not the case. What is happening is when someone copies and
pastes a Word document "as-is", like with its own proprietary spacing,
fonts, etc., this is what you'll see on an HTML page:

University of North Carolina
?��?¯?¿?½?��?�?¢?�ï ¿½?�?¯?��?�?¿?��?� ?½?��?�?¯?��?�?¿?�ï ¿½?�?½
Charlotte

Phil


Mar 21 '06 #4
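The ï¿½ fragments in the sample above appear to be the bytes EF BF BD, the UTF-8 encoding of the Unicode replacement character U+FFFD, displayed as Latin-1 or Windows-1252. If that is what is stored, the original characters were already lost before the text reached the database, so no string replace can restore them. A hedged sketch for the already-stored resumes is to flag the affected rows for re-entry instead; looks_mangled is a hypothetical name, and the check assumes the damage shows up as raw or double-encoded U+FFFD.

```php
<?php
// Hypothetical helper: flag stored text that contains the Unicode replacement
// character U+FFFD, either as raw bytes EF BF BD or in its double-encoded form,
// which displays as "ï¿½".
function looks_mangled(string $text): bool
{
    $replacement = "\u{FFFD}";                 // bytes EF BF BD
    $doubled     = "\u{00EF}\u{00BF}\u{00BD}"; // "ï¿½" re-encoded as UTF-8
    return strpos($text, $replacement) !== false
        || strpos($text, $doubled) !== false;
}

var_dump(looks_mangled("North Carolina \u{00EF}\u{00BF}\u{00BD} Charlotte")); // bool(true)
var_dump(looks_mangled('University of North Carolina'));                      // bool(false)
```

Run over the existing rows, a check like this would give a list of applicants to ask for a clean, plain-text copy.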


Roger Dodger wrote:
It is a huge task to decode Word from its proprietary format to plain
text or regular old HTML. All the online resume forms I've submitted
have required me to paste in either plain text or rich text format.
That would be the easiest approach. The user just has to use Save As...
from the Word File menu.


Right. Going forward, I know I can give the person the option to
upload their Word doc or PDF (a security issue in and of itself -
yikes!) alongside cutting & pasting, but two questions remain
unanswered:

1) What about those already in the database as text field values? What
do I do about those?

2) What is to stop the "challenged among us" from cutting & pasting a
Word doc even though they have the option to upload, aside from telling
them to do so?

Phil

Mar 21 '06 #5

To prevent people from cutting and pasting Word docs, why not just scan
the contents of the textarea for some telltale characters or sequences
that would indicate a Word doc, and then treat that as an error and not
put it into the DB.

--gary

Mar 21 '06 #6
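One way to sketch the telltale scan, assuming the form is submitted as UTF-8: treat the paste as coming from Word if it contains Word's AutoFormat punctuation (curly quotes, en and em dashes, the ellipsis character) or if it is not valid UTF-8 at all, which is typical of raw Windows-1252 bytes. The character list and the function name has_word_telltales are illustrative assumptions, and a check like this cannot catch every kind of paste.

```php
<?php
// Hypothetical check: does a pasted resume look like it came straight from Word?
// Assumes the form is submitted as UTF-8. The character list is illustrative.
function has_word_telltales(string $text): bool
{
    // Curly single/double quotes, en dash, em dash and the ellipsis character.
    $pattern = '/[\x{2018}\x{2019}\x{201C}\x{201D}\x{2013}\x{2014}\x{2026}]/u';
    $result  = preg_match($pattern, $text);
    if ($result === false) {
        // With the /u modifier preg_match fails on invalid UTF-8,
        // e.g. raw Windows-1252 bytes such as 0x93/0x94 (curly quotes).
        return true;
    }
    return $result === 1;
}

var_dump(has_word_telltales("\u{201C}Team player\u{201D}")); // bool(true)
var_dump(has_word_telltales('"Team player"'));                // bool(false)
```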


fiziwig wrote:
To prevent people from cutting and pasting Word docs, why not just scan
the contents of the textarea for some telltale characters or sequences
that would indicate a Word doc, and then treat that as an error and not
put it into the DB.

--gary


If that were a single pattern, then I would do so, but I don't know
what that pattern is. Moreover, what prevents them from copying
and pasting literally anything else they can think of?

Phil

Mar 21 '06 #7

Perhaps limit them to pasting in text that contains only some set of
printable characters. a..z, A..Z, 0..9, and the standard punctuation
and math symbols ,.:;'"+-=><&^%... etc. If any character is found that
is not in the set then reject the text. Presumably those Word docs have
some binary info in them, or at least it looks like they contained a
lot of characters not in the standard set. Of course you'd have to take
other languages into account if you planned to allow posting in
languages other than English.

--gary

Mar 22 '06 #8
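The whitelist described above can be sketched with a single byte-wise regular expression: printable ASCII from space (0x20) to tilde (0x7E), plus tab, carriage return and line feed. As noted, this also rejects accented letters, so it only suits an English-only form. The function name is hypothetical.

```php
<?php
// Sketch of a whitelist check: only printable ASCII (0x20 space to 0x7E tilde)
// plus tab, CR and LF are allowed. Works byte by byte (no /u modifier), so any
// byte above 0x7F, including accented letters, makes the check fail.
function is_plain_ascii_text(string $text): bool
{
    return preg_match('/^[\x20-\x7E\t\r\n]*$/', $text) === 1;
}

var_dump(is_plain_ascii_text("Skills: PHP, MySQL (5 yrs)\n")); // bool(true)
var_dump(is_plain_ascii_text("R\u{00E9}sum\u{00E9}"));          // bool(false)
```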


fiziwig wrote:
Perhaps limit them to pasting in text that contains only some set of
printable characters. a..z, A..Z, 0..9, and the standard punctuation
and math symbols ,.:;'"+-=><&^%... etc. If any character is found that
is not in the set then reject the text. Presumably those Word docs have
some binary info in them, or at least it looks like they contained a
lot of characters not in the standard set. Of course you'd have to take
other languages into account if you planned to allow posting in
languages other than English.

--gary


Exactly, and that will make that kind of check nearly impossible to
perform. Besides, you have to remember that their resume already
exists as ASCII (just screwed-up ASCII) because it will display FROM
the database table text field query. That is, when you get the resume,
it's already in ASCII, so there are no non-ASCII characters to find
anymore, just a bunch of funky, screwed-up, yet 100% ASCII characters.
Those already-submitted resumes are the problem we're dealing with;
preventing others from doing the same is only half the battle.

Phil

Mar 22 '06 #9
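Whether a stored resume really is pure ASCII can be checked directly. This sketch lists the bytes above 0x7F in a value as hex, which shows what the "funky" characters actually are (for example, c3 a9 is é in UTF-8). The function name and the limit of ten bytes are arbitrary choices for illustration.

```php
<?php
// Sketch: list (as hex) the first few bytes above 0x7F in a stored value,
// to see what the "funky" characters really are. Pure ASCII returns [].
function non_ascii_bytes(string $text, int $limit = 10): array
{
    preg_match_all('/[\x80-\xFF]/', $text, $matches); // byte-wise: no /u modifier
    return array_map('bin2hex', array_slice($matches[0], 0, $limit));
}

echo implode(' ', non_ascii_bytes("R\u{00E9}sum\u{00E9}")), "\n"; // c3 a9 c3 a9
```

An empty result would confirm the value is plain ASCII; anything else points at the encoding involved.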

"comp.lang.php" <ph**************@gmail.com> wrote:
I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this? Also, how do I fix the display of the resumes
already entered this way?


You might consider using this as an intelligence test when evaluating the
resume...
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Mar 23 '06 #10

"comp.lang.php" <ph**************@gmail.com> wrote:
Exactly, and that will make that kind of check nearly impossible to
perform.


Do all your character stripping and such, then present the information
back to them and activate an "Are you sure?" button.

What's to prevent someone from typing in someone else's resume?
Do you collect information such as a phone number or email address?

If you collect an email address, send an automatic response to the
enduser which asks them to click upon an activation link.
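The activation-link step can be sketched with random_bytes (PHP 7 and later) to generate an unguessable token that would be stored with the unapproved record and emailed to the applicant. The base URL, the token parameter name and the function name are hypothetical placeholders.

```php
<?php
// Sketch: build an activation link for the confirmation email.
// The base URL and the "token" parameter name are hypothetical placeholders.
function make_activation_link(string $baseUrl): array
{
    $token = bin2hex(random_bytes(16)); // 32 hex chars from a CSPRNG (PHP 7+)
    return [$token, $baseUrl . '?token=' . urlencode($token)];
}

[$token, $link] = make_activation_link('https://example.com/activate.php');
echo $link, "\n";
```

When the link is clicked, the script behind it would look the token up and move the matching record out of the unapproved table.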

You can store the garbage in an unapproved table and once it's
been approved you can move it to the legitimate recordset.

Hope this helps.

Jim Carlock
Post replies to the group.
Mar 23 '06 #11
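For the character-stripping step, one hedged option is to map Word's smart punctuation to plain ASCII before showing the text back for confirmation, assuming the form posts UTF-8. strtr with an array replaces each key once and never rescans text it has already replaced; the map below is illustrative, not complete, and plain_punctuation is a hypothetical name.

```php
<?php
// Sketch: map common Word "smart" punctuation (as UTF-8) to plain ASCII before
// saving. Assumes the form is submitted as UTF-8; the map is illustrative.
function plain_punctuation(string $text): string
{
    return strtr($text, [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{2022}" => '*',   // bullet
        "\u{00A0}" => ' ',   // non-breaking space
    ]);
}

echo plain_punctuation("\u{201C}Team player\u{201D} \u{2013} 5 years\u{2026}"), "\n";
// "Team player" - 5 years...
```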

comp.lang.php wrote:
I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this? Also, how do I fix the display of the resumes
already entered this way?

Thanx
Phil


Phil,

I guess I look at the problem differently.

If I request a resume in Word format, I expect it in Word format. If someone
else sends in a plain text file, they aren't even considered for employment.
And vice versa.

I mean - they're trying to find a job. If you ask them for plain text and they
can't even follow that simple direction, could they follow more complicated
instructions? Would you want to hire them?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Mar 23 '06 #12


Jerry Stuckle wrote:
Phil,

I guess I look at the problem differently.

If I request a resume in Word format, I expect it in Word format. If someone
else sends in a plain text file, they aren't even considered for employment.
And vice versa.

I mean - they're trying to find a job. If you ask them for plain text and they
can't even follow that simple direction, could they follow more complicated
instructions? Would you want to hire them?


You do have a point. Unfortunately, it was never a requirement before
for them to upload a non-text resume. In fact, until recently the site
wasn't even set up so that they could, so they had no choice but
to copy and paste, even if we told them text-only.

However, I can see how if the instructions say "copy and paste a
text-based resume only" and you can only cut and paste, perhaps you
might think that a Word resume isn't text.

But future managers in college don't know this, at least the ones
we've encountered.

Be afraid.

Phil


Mar 24 '06 #13
