How to fix PHP/HTML webpages that display Word resumes with funky characters

I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this, also, how do I fix the displays of those already
entered this way?

Thanx
Phil

Mar 20 '06 #1
By "funky characters" I assume you mean the letter 'e' with the cute
little French decoration on top. Try this:

$new_string=str_replace('é','e',$old_string);

To replace the decorated e's with plain vanilla e's.

(The first e is the decorated one, in case it doesn't show up in your
newsreader. For Windows users that's entered by holding down the Alt
key and typing 0233 on the num pad.)

You might also need to do it for the upper case 'E' in case people
write "RESUMÉ" in caps.
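
A minimal sketch of the same idea that covers both cases in one pass. It assumes the text and the PHP source file are both UTF-8; the function name `strip_accents` and the two-entry map are illustrative only and would need extending for other accented letters.

```php
<?php
// Hypothetical helper: map accented letters to plain ASCII ones.
// Assumes $text and this source file are both UTF-8 encoded.
function strip_accents(string $text): string
{
    $map = [
        'é' => 'e',
        'É' => 'E',
    ];
    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}

echo strip_accents('RESUMÉ / résumé'), "\n";
```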

--gary

Mar 20 '06 #2
Yikes! That's scary! Fraid I have no suggestions for that situation,
unless you can identify the cases where it happens and actually do a
string replace on that long cryptic string. If it's invariant, that is,
which it probably isn't. :-( --gary

Mar 20 '06 #3
It is a huge task to decode Word from its proprietary format to plain
text or regular old HTML. All online resumes I've submitted in a form
have required me to paste in either a plain text or rich text format.
That would be the easiest approach. User just has to Save as... from
the Word File menu.
comp.lang.php wrote:
fiziwig wrote:
By "funky characters" I assume you mean the letter 'e' with the cute
little French decoration on top. Try this:

$new_string=str_replace('é','e',$old_string);

To replace the decorated e's with plain vanilla e's.

(The first e is the decorated one, in case it doesn't show up in your
newsreader. For Windows users that's entered by holding down the Alt
key and typing 0233 on the num pad.)

You might also need to do it for the upper case 'E' in case people
write "RESUMÉ" in caps.

--gary


No that's not the case. What is happening is when someone copies and
pastes a Word document "as-is", like with its own proprietary spacing,
fonts, etc., this is what you'll see on an HTML page:

University of North Carolina
����¯�¿�½������¢��ï ¿½���¯������¿�����à ?½������¯������¿��ï ¿½���½
Charlotte
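
Output like this is typical of text that has been converted to UTF-8 more than once somewhere between the form, the database, and the page, rather than of Word's file format itself. The sketch below illustrates that assumed cause and is not a diagnosis of this particular site. It takes an en dash (a character Word often substitutes for a typed hyphen) and re-encodes it twice as if each byte were an ISO-8859-1 character. The helper name `latin1_to_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: treat every byte as an ISO-8859-1 character and
// encode it as UTF-8. Applying this to text that is already UTF-8 is the
// classic "double encoding" mistake.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // Plain ASCII bytes are unchanged.
            $out .= $bytes[$i];
        } else {
            // Every other byte becomes a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$original = "Carolina \u{2013} Charlotte"; // en dash, 3 bytes in UTF-8
$once     = latin1_to_utf8($original);
$twice    = latin1_to_utf8($once);

// The dash grows from 3 bytes to 6 and then to 12.
echo strlen($original), ' ', strlen($once), ' ', strlen($twice), "\n";
echo bin2hex(substr($twice, 9, 12)), "\n";
```

Each extra pass doubles the number of bytes standing in for the one original character, which is why a single dash can end up as a long run of junk.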

Phil


Mar 21 '06 #4

Roger Dodger wrote:
It is a huge task to decode Word from its proprietary format to plain
text or regular old HTML. All online resumes I've submitted in a form
have required me to paste in either a plain text or rich text format.
That would be the easiest approach. User just has to Save as... from
the Word File menu.


Right. I know looking forward that I can give the person the option to
upload their Word doc or PDF (a security issue in and of itself -
yikes!) alongside cutting & pasting, but there are two unanswered
questions:

1) What about those already in the database as text field values? What
do I do about those?

2) What is to stop the "challenged among us" from cutting & pasting a
Word doc even though they have the option to upload, aside from telling
them to do so?
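
For question 1, if the stored rows turn out to be text that was encoded to UTF-8 more than once, the layers can sometimes be peeled back. A sketch under that assumption follows. `is_valid_utf8`, `undo_one_layer`, and `repair_mojibake` are hypothetical names, and the code only handles ISO-8859-1 layers. It cannot bring back characters that were already replaced by the Unicode replacement character, and text mangled through Windows-1252 would need a different mapping.

```php
<?php
// Return true when $s is well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Undo one "ISO-8859-1 bytes re-encoded as UTF-8" layer, or return null
// when the string cannot be the result of such a conversion.
function undo_one_layer(string $s): ?string
{
    if (!is_valid_utf8($s) || preg_match('/[^\x{00}-\x{FF}]/u', $s) === 1) {
        return null;
    }
    // Every multi-byte sequence is now C2 xx or C3 xx; fold each to one byte.
    return preg_replace_callback(
        '/[\xC2\xC3][\x80-\xBF]/',
        function (array $m): string {
            return chr(((ord($m[0][0]) & 0x03) << 6) | (ord($m[0][1]) & 0x3F));
        },
        $s
    );
}

// Peel layers while the result is still valid UTF-8 and still changing.
function repair_mojibake(string $s, int $maxLayers = 5): string
{
    for ($i = 0; $i < $maxLayers; $i++) {
        $next = undo_one_layer($s);
        if ($next === null || $next === $s || !is_valid_utf8($next)) {
            break;
        }
        $s = $next;
    }
    return $s;
}
```

In practice one would run this over each stored row, compare before and after, and write back only rows that a person has checked, since a legitimate string can occasionally look like a mis-encoded one.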

Phil

Mar 21 '06 #5
To prevent people from cutting and pasting Word docs, why not just scan
the contents of the textarea for some telltale characters or sequences
that would indicate a Word doc, and then treat that as an error and not
put it into the DB?
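
One hedged way to implement such a scan is to look for the typographic characters that Word's autocorrect typically inserts: curly quotes, en and em dashes, and the single-character ellipsis. The list below is an assumption and far from complete, the function name is hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical detector for typographic characters that word processors
// commonly substitute for plain ASCII punctuation. Assumes UTF-8 input.
function find_word_punctuation(string $text): array
{
    $telltales = [
        "\u{2018}" => 'left single quote',
        "\u{2019}" => 'right single quote',
        "\u{201C}" => 'left double quote',
        "\u{201D}" => 'right double quote',
        "\u{2013}" => 'en dash',
        "\u{2014}" => 'em dash',
        "\u{2026}" => 'ellipsis',
    ];
    $found = [];
    foreach ($telltales as $char => $name) {
        // strpos() compares bytes, which is safe for exact UTF-8 matches.
        if (strpos($text, $char) !== false) {
            $found[] = $name;
        }
    }
    return $found;
}

print_r(find_word_punctuation("It\u{2019}s a \u{201C}resume\u{201D}"));
```

These characters are legitimate text, so mapping them to ASCII equivalents may be friendlier than rejecting the submission outright.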

--gary

Mar 21 '06 #6

fiziwig wrote:
To prevent people from cutting and pasting Word docs, why not just scan
the contents of the textarea for some telltale characters or sequences
that would indicate a Word doc, and then treat that as an error and not
put it into the DB?

--gary


If that were a single pattern, then I would do so, but I don't know
what that pattern is. Moreover, what then prevents them from copying
and pasting literally anything else they can think of?

Phil

Mar 21 '06 #7
Perhaps limit them to pasting in text that contains only some set of
printable characters. a..z, A..Z, 0..9, and the standard punctuation
and math symbols ,.:;'"+-=><&^%... etc. If any character is found that
is not in the set then reject the text. Presumably those Word docs have
some binary info in them, or at least it looks like they contained a
lot of characters not in the standard set. Of course you'd have to take
other languages into account if you planned to allow posting in
other than English.
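
A minimal sketch of that whitelist check, assuming English-only input as described above. The function name `is_plain_ascii` is hypothetical, and the accepted set is printable ASCII plus tab, carriage return, and line feed.

```php
<?php
// Hypothetical validator: accept only printable ASCII plus tab, CR and LF.
function is_plain_ascii(string $text): bool
{
    // \x20-\x7E is the printable ASCII range; \A and \z anchor the match
    // to the true start and end of the string.
    return preg_match('/\A[\x20-\x7E\t\r\n]*\z/', $text) === 1;
}

var_dump(is_plain_ascii("Objective: PHP developer\r\n"));
var_dump(is_plain_ascii("RESUM\u{C9}"));
```

Anything outside the set, including accented letters, fails the check, so this only suits forms that really are limited to English.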

--gary

Mar 22 '06 #8

fiziwig wrote:
Perhaps limit them to pasting in text that contains only some set of
printable characters. a..z, A..Z, 0..9, and the standard punctuation
and math symbols ,.:;'"+-=><&^%... etc. If any character is found that
is not in the set then reject the text. Presumably those Word docs have
some binary info in them, or at least it looks like they contained a
lot of characters not in the standard set. Of course you'd have to take
other languages into account if you planned to allow posting in
other than English.

--gary


Exactly, and that will make that kind of check nearly impossible to
perform. Besides, you have to remember that their resume already
exists as ASCII (just screwed-up ASCII) because it will display FROM
the database table text field query. That is, when you get the resume,
it's already in ASCII, so there are no non-ASCII characters to find
anymore, just a bunch of funky, screwed-up, yet 100% ASCII characters.
Those already-submitted resumes are the problem we're dealing with,
preventing others from doing so is only half the battle.
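
Whether the stored text really is pure ASCII can be checked directly, since any byte of 0x80 or above is outside ASCII. A small diagnostic sketch (the function name `describe_non_ascii` is hypothetical) that lists each run of non-ASCII bytes with its offset and hex value:

```php
<?php
// Hypothetical diagnostic: report every run of bytes >= 0x80 in $text,
// with its byte offset and hex dump, so the stored data can be inspected.
function describe_non_ascii(string $text): array
{
    preg_match_all('/[\x80-\xFF]+/', $text, $matches, PREG_OFFSET_CAPTURE);
    $runs = [];
    foreach ($matches[0] as [$run, $offset]) {
        $runs[] = ['offset' => $offset, 'hex' => bin2hex($run)];
    }
    return $runs;
}

print_r(describe_non_ascii("caf\xC3\xA9"));
```

An empty result means the value is plain ASCII. Long runs dominated by bytes such as c3, c2, and ef bf bd would instead suggest text that was encoded more than once.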

Phil

Mar 22 '06 #9
"comp.lang.php" <ph**************@gmail.com> wrote:
I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this, also, how do I fix the displays of those already
entered this way?


You might consider using this as an intelligence test when evaluating the
resume...
--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Mar 23 '06 #10
"comp.lang.php" <ph**************@gmail.com> wrote:
Exactly, and that will make that kind of check nearly impossible to
perform.


Do all your character stripping and such, then present the information
back to them and activate an "Are you sure?" button.
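
A sketch of the preview step, assuming the page is served as UTF-8. The function name `render_preview` is hypothetical. `htmlspecialchars()` with `ENT_SUBSTITUTE` escapes markup and swaps invalid byte sequences for a replacement character instead of returning an empty string, so the user sees what would actually be stored.

```php
<?php
// Hypothetical preview renderer; assumes the page itself is sent as UTF-8,
// e.g. via header('Content-Type: text/html; charset=UTF-8').
function render_preview(string $text): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE keeps the rest
    // of the text even when some bytes are not valid UTF-8.
    $safe = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<pre>' . $safe . '</pre>';
}

echo render_preview('<b>R&D</b> resume'), "\n";
```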

What's to prevent someone from typing in someone else's resume?
Do you collect information such as a phone number or email address?

If you collect an email address, send an automatic response to the
end user which asks them to click on an activation link.

You can store the garbage in an unapproved table and once it's
been approved you can move it to the legitimate recordset.

Hope this helps.

Jim Carlock
Post replies to the group.
Mar 23 '06 #11
comp.lang.php wrote:
I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this, also, how do I fix the displays of those already
entered this way?

Thanx
Phil


Phil,

I guess I look at the problem differently.

If I request a resume in Word format, I expect it in Word format. If someone
else sends in a plain text file, they aren't even considered for employment.
And vice versa.

I mean - they're trying to find a job. If you ask them for plain text and they
can't even follow that simple direction, could they follow more complicated
instructions? Would you want to hire them?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Mar 23 '06 #12

Jerry Stuckle wrote:
comp.lang.php wrote:
I have a textarea where people can cut & paste their resume.
Unfortunately they often cut & paste their Word resume into the
textarea, funky characters and all.

This causes the display to be mangled from the HTML end when people
view pages with these resumes stored as a MySQL text field entry.

How do I fix this, also, how do I fix the displays of those already
entered this way?

Thanx
Phil

Phil,

I guess I look at the problem differently.

If I request a resume in Word format, I expect it in Word format. If someone
else sends in a plain text file, they aren't even considered for employment.
And vice versa.

I mean - they're trying to find a job. If you ask them for plain text and they
can't even follow that simple direction, could they follow more complicated
instructions? Would you want to hire them?


You do have a point. Unfortunately, uploading a non-text resume was
never a requirement before. In fact, the site wasn't set up to allow
uploads until recently, so they had no choice but to copy and paste,
even if we told them text-only.

However, I can see how if the instructions say "copy and paste a
text-based resume only" and you can only cut and paste, perhaps you
might think that a Word resume isn't text.

But future managers in college don't know this. At least those we've
encountered.

Be afraid.

Phil

Mar 24 '06 #13
