
Conversion of historical material to HTML

Hi all,

A number of questions:

1) I'm in the process of converting some historical type-written
newsletters to HTML and one of the things that came up while doing so
was the fact that some headings underline every other letter. I can
just go ahead and bold them instead, but that feels a bit 1984'ish, or
I can emulate this using <span> tags,
but this leads to horribly (too) long lines. Is there an easier
method?

2) Given the limitations of browsers, is there an easy way of getting
(as close as possible to) the original fontsizes. I guess using
appropriately set up span.xxx classes is the only way to influence
line-spacing?

3) I'm currently using CSS2 & <div>'s to get the paging right. This
works OK, but is there an easier way?

4) Does anyone have/know where to find some simple examples of pages
(probably using tables) using different font-sizes for the various
cells - I need to use them to emulate the small-size output of thermal
printers.

5) OT(ish) My scanner came with Paperport Deluxe V8 and ReadIris was
on the PC. Both are OK(ish) for this type of material, but choke on
more complicated stuff (and I have heaps of it). Can anyone recommend
some really good OCR software?

6) Would UltraEdit be a good choice to edit HTML/CSS or are there more
suitable alternatives? FWIW, I'm now using an old DOS editor that I
can use almost blindfolded, but that doesn't have any fancy features
(other than being tiny and fast) and has a line-length limit of 255
characters (see 1) above...

Reply here, but if you can cc: any reply to <pr***@onetel.net.uk> I
would appreciate it.

Thanks,

Robert
--
Robert AH Prins
pr***@onetel.net.uk
Jul 20 '05 #1
5 Replies


Robert AH Prins wrote:
A number of questions:


Not really answering any of the questions, but what are you trying to
achieve here? I think you're going to put in lots of work for an
unsatisfactory result. My advice would be:

Convert the letters to the plainest HTML you can, with no attempt at
presentation or replicating the original look. The goal here is to convert
the information.

Next, if appropriate, prepare high-resolution scans of the originals,
either as individual images or as multi-page PDFs. This is to convert the
visual appearance, if that is an important part of the history.

--
Mark.
http://tranchant.plus.com/
Jul 20 '05 #2

On 20 Apr 2004 07:59:20 -0700, Robert AH Prins <pr***@bigfoot.com> wrote:
Hi all,

A number of questions:

1) I'm in the process of converting some historical type-written
newsletters to HTML and one of the things that came up while doing so
was the fact that some headings underline every other letter. I can
just go ahead and bold them instead, but that feels a bit 1984'ish, or
I can emulate this using <span> tags,
but this leads to horribly (too) long lines. Is there an easier
method?
What's your goal? To make the web page look just like the document? That
can't really be done, though you can get close. Or to present the factual
information in the documents? Then mark up headings as headings.
2) Given the limitations of browsers, is there an easy way of getting
(as close as possible to) the original fontsizes. I guess using
appropriately set up span.xxx classes is the only way to influence
line-spacing?
Monitor resolution will be an issue here - you won't actually know the end
product, will you?
3) I'm currently using CSS2 & <div>'s to get the paging right. This
works OK, but is there an easier way?
The page layout? This is appropriate.
4) Does anyone have/know where to find some simple examples of pages
(probably using tables) using different font-sizes for the various
cells - I need to use them to emulate the small-size output of thermal
printers.


Even if you're adopting the look and feel of these old documents, remember
usability and accessibility. Don't make those fonts too small. You could
apply font-size to the specific element.
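
As a rough sketch of applying font-size to a specific element, the page below shrinks only the cells that stand in for thermal-printer output. The class name `thermal`, the 85% size, and the sample listing are assumptions made up for illustration, not taken from the newsletters.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Per-cell font size sketch</title>
<style>
  /* Ordinary cells inherit the reader's preferred size. */
  td { vertical-align: top; padding: 0.3em 1em; }
  /* Hypothetical class for cells that imitate thermal-printer output:
     monospaced and somewhat smaller, but still relative to the reader's size. */
  td.thermal { font-family: monospace; font-size: 85%; white-space: pre; }
</style>
</head>
<body>
<table>
  <tr>
    <td>Commentary in the normal body size.</td>
    <td class="thermal">000  76 LBL
001  11  A
002  91 R/S</td>
  </tr>
</table>
</body>
</html>
```

Because the size is a percentage, a reader who enlarges the base font still gets a proportionally larger listing.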

Bottom line - if you are looking to present historical documents, worry
about the accessibility of the content first, and the nuance of the
presentation later.
Jul 20 '05 #3

"Mark Tranchant" <ma**@tranchant.plus.com> wrote in message
news:40**************@tranchant.plus.com...
Robert AH Prins wrote:
A number of questions:
Not really answering any of the questions, but what are you trying
to achieve here? I think you're going to put in lots of work for an
unsatisfactory result. My advice would be:
My stepdaughter is doing most of the work to earn some extra
pocket money, and there are no deadlines. It's a labour of love to
keep this material available for posterity.
Convert the letters to the plainest HTML you can, with no attempt at
presentation or replicating the original look. The goal here is to convert the information.
I agree, but having an index and some cross-linking is extremely useful.
Next, if appropriate, prepare high-resolution scans of the originals, either as individual images or as multi-page PDFs. This is to convert the visual appearance, if that is an important part of the history.


There are already PDFs available, but creating multi-megabyte files (the
largest is in excess of 3 MB) for a mere 6 pages of A4 (text content about
24k) is utter madness, even more so because it is just a picture and
cannot be searched.

Robert
--
Robert AH Prins
pr***@onetel.net.uk
Jul 20 '05 #4

pr***@bigfoot.com (Robert AH Prins) wrote:
1) I'm in the process of converting some historical type-written
newsletters to HTML and one of the things that came up while doing so
was the fact that some headings underline every other letter.
I would suggest avoiding any attempt to reproduce that in HTML. Any
underlining, even broken underline, would easily be misunderstood as
denoting a link in HTML. So it's better to use some other styling for a
heading, if needed. I would simply use a suitable element, like h2, and
add as much CSS as reasonable to make the appearance resemble the
original - e.g. in fonts, bolding, etc., but not issues like underlining.

The look & feel might matter, so some styling, even detailed, might be
nice. But don't overdo it.

But should you wish to imitate underlining, then CSS code like
h2 { border-bottom: dashed black thin; }
might be a reasonable compromise. It's not really underline but bottom
border, and that's one reason why it wouldn't be taken as link underline
so easily. Other options include setting suitable background image
containing just a short vertical line at the level of underlining and
some transparent stuff on the right, and repeating in x direction.
I can
just go ahead and bold them instead, but that feels a bit 1984'ish, or
I can emulate this using <span> tags,
but this leads to horribly (too) long lines.
It gets rather awkward. But you could put every other character between
<u> and </u>. I wouldn't be too puristic here.

You could also use increased font size instead of bolding, e.g.
h2 { font-weight: normal; font-size: 115%; }
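
Putting the two suggestions above together, a minimal sketch might look like the page below. The heading text is made up for illustration. The line breaks are placed inside the tags (after the tag name), where they are not part of the text, so no line needs to be long even with a 255-character editor limit.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Alternate-letter underline sketch</title>
<style>
  /* Slightly larger instead of bold, as suggested above. */
  h2 { font-weight: normal; font-size: 115%; }
</style>
</head>
<body>
<!-- Every other letter is wrapped in <u>. Breaking the line inside a tag
     adds no whitespace to the rendered heading. -->
<h2><u>N</u>E<u
>W</u>S<u
>L</u>E<u
>T</u>T<u
>E</u>R</h2>
</body>
</html>
```

Breaking the line between tags instead would insert a space into the heading, which is why the breaks sit inside the tags.
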
2) Given the limitations of browsers, is there an easy way of getting
(as close as possible to) the original fontsizes.
Why would you do _that_? Don't fight against the strengths of the Web.
Just as the Web is a way to make the data technically accessible
worldwide over the network, not setting any font sizes (except relatively
e.g. for headings) is part of the way of making it humanly accessible to
people with different properties and preferences.
I guess using
appropriately set up span.xxx classes is the only way to influence
line-spacing?
You cannot influence line spacing in HTML - though you could create very
coarse simulations. Just use line-height in CSS.
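
A minimal sketch of the line-height approach, assuming the text stays inside pre blocks. The class names `normal` and `tight` and the numeric values are hypothetical starting points to tune against the originals.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Line spacing sketch</title>
<style>
  /* A unitless line-height scales with whatever font size the reader uses. */
  pre.normal { line-height: 1.5; }
  /* Hypothetical tighter, slightly smaller block. */
  pre.tight  { font-size: 85%; line-height: 1.2; }
</style>
</head>
<body>
<pre class="normal">First line at normal spacing
second line at normal spacing</pre>
<pre class="tight">First line at tighter spacing
second line at tighter spacing</pre>
</body>
</html>
```
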
3) I'm currently using CSS2 & <div>'s to get the paging right. This
works OK, but is there an easier way?


Paging, too, is an area where you should utilize the strengths of the Web
and not fight against them. Do you know what paper sizes people have, or
their printer settings?
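
If page breaks are wanted anyway, one hedged option is to request them only for print, so that the on-screen view stays continuous. The sketch assumes each original page sits in its own div with a hypothetical class `page`. `page-break-after` is a CSS2 paged-media property, and browsers treat it as a request rather than a guarantee.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Print page break sketch</title>
<style>
  @media print {
    /* Start a new sheet after each original page when printing. */
    div.page { page-break-after: always; }
    /* Avoid a trailing blank sheet after the final page. */
    div.page.last { page-break-after: auto; }
  }
</style>
</head>
<body>
<div class="page"><pre>Text of original page 1</pre></div>
<div class="page last"><pre>Text of original page 2</pre></div>
</body>
</html>
```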

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #5

"Jukka K. Korpela" <jk******@cs.tut.fi> wrote in message news:<Xn****************************@193.229.0.31> ...
pr***@bigfoot.com (Robert AH Prins) wrote:
1) I'm in the process of converting some historical type-written
newsletters to HTML and one of the things that came up while doing so
was the fact that some headings underline every other letter.
I would suggest avoiding any attempt to reproduce that in HTML. Any
underlining, even broken underline, would easily be misunderstood as
denoting a link in HTML.


That's something I didn't consider. I'll stick to bold.
So it's better to use some other styling for a
heading, if needed. I would simply use a suitable element, like h2, and
As I mentioned in my original post, these were type-written
newsletters; the only text decorations they use are the alternate
underline and ALL CAPS.
add as much CSS as reasonable to make the appearance resemble the
original - e.g. in fonts, bolding, etc., but not issues like underlining.

The look & feel might matter, so some styling, even detailed, might be
nice. But don't overdo it.
I have no plans to go all the way. My stepdaughter is
scanning/correcting the material; I'm just adding very basic HTML.
But should you wish to imitate underlining, then CSS code like
h2 { border-bottom: dashed black thin; }
Everything is enclosed in <pre> ... </pre> tags, I don't even use
<p>'s.

<snip>
I wouldn't be too puristic here.


For these newsletters, which you can find at
<http://www.rskey.org/tippc.htm> under the heading 52-Notes, I will
stick to basic text, for others I will have to add a bit more fancy
stuff (two columns and graphics)
2) Given the limitations of browsers, is there an easy way of getting
(as close as possible to) the original fontsizes.


Why would you do _that_? Don't fight against the strengths of the Web.

Just as the Web is a way to make the data technically accessible
worldwide over the network, not setting any font sizes (except relatively
e.g. for headings) is part of the way of making it humanly accessible to
people with different properties and preferences.


OK, try again: These things use two font sizes, one for the body,
another slightly smaller for the colophon on page 1. I'd like them to
look like the originals (assuming a medium font in the browser).
I guess using
appropriately set up span.xxx classes is the only way to influence
line-spacing?


You cannot influence line spacing in HTML - though you could create very
coarse simulations. Just use line-height in CSS.


Sorry, that's what I meant, and I'm using only full, 80 and 50%, which
I guess should be OK in most modern browsers.
3) I'm currently using CSS2 & <div>'s to get the paging right. This
works OK, but is there an easier way?


Paging, too, is an area where you should utilize the strengths of the Web
and not fight against them. Do you know what paper sizes people have, or
their printer settings?


The originals were on US 8.5x11", with enough margins to fit on A4.
Using medium font size in IE, they print OK on either format.

Robert
--
Robert AH Prins
pr***@onetel.net.uk
Jul 20 '05 #6
