(this is getting off-topic and probably belongs somewhere like
comp.arch.embedded...)
"Chris Torek" <no****@torek.net> wrote in message
news:c1*********@enews2.newsguy.com...
>> [snippage] ... If the compiler-writer makes
>> "volatile" do everything, it often does too much; if not, it often
>> does too little. "Too little", however, can be augmented, while
>> "too much" is hard to undo afterward. :-)
In article <news:F8*********************@bgtnsc04-news.ops.worldnet.att.net>
Carsten Hansen <ha******@worldnet.att.net> wrote:
> In the code
>     reg->cmd = RESET;
>     while (reg->csr & BUSY)
>         continue;
> how do you guarantee that the read from reg->csr happens after the
> write to reg->cmd at the physical level without some I/O
> synchronization? ...
> But if you are actually controlling hardware, the order can be
> essential, as I'm sure you are aware.
Indeed.
On the V9 SPARC, rather than a single EIEIO instruction (as on the
PowerPC), the machine offers a generalized MEMBAR ("memory barrier")
instruction, which takes operands. Memory barriers come in four "memory" flavors --
load/load, load/store, store/load, and store/store -- and several
additional forms, named "memissue" and "sync". For the above case,
one needs only a single "store/load" barrier between the write to
the command register and the read from the status register.
The effect of a "store/load" barrier is, roughly, "any loads
following the barrier may not be moved above any stores that preceded
the barrier." Stores before the barrier may be rearranged and
write-combined, however. Thus, *before* writing the command
register, we also need one *more* barrier if the device might refer
to memory (most likely a "memissue").
The one instruction that "always works" is "membar #sync", which
does a full CPU pipeline flush and empties the write aggregation
machinery entirely. This is, however, a hugely expensive instruction,
to be used only when absolutely necessary. Less-expensive barriers
("membar #StoreLoad|StoreStore", if I remember right offhand)
suffice for most cases. (They do not suffice for cases that change
the CPU mode, i.e., you need a #sync when fussing about with certain
internal CPU registers. Then again, much of this is either entirely
assembly-coded, or requires heavy use of inlined assembly, and one
can insert the membar directly there.) In so-called "total store
order" -- TSO -- the CPU implicitly does such a memory barrier for
you after each instruction. In "partial store order" and "relaxed
memory order" models -- which are selectable with those internal
CPU registers -- the CPU will (or is supposed to) run faster, and
most existing (barrierless) device drivers will run fine in TSO.
As one converts device drivers, one can allow them to run in PSO
or RMO provided they have the correct "membar"s inserted.
We never got around to doing this for BSD/OS, but the general plan
was to abstract away the details by having device drivers use macros
when "talking to" device registers. (The hardware also provides
special bits in the MMUs to mark "device register" pages as having
"abnormal" memory semantics, so that even less code would have to
change -- but as the V9 architecture appendices note, these only
help in certain situations; some devices will still run into
write-combiner problems.)
(Much of the above hair, with different kinds of synchronization
instructions, comes about today because gigahertz CPU clock speeds
produce sub-nanosecond instruction timings, while I/O devices,
including on-board registers, often have response times best measured
in milliseconds. Five years ago it might take several hundred CPU
clock cycles to talk to a floppy device register; today it may take
thousands. In this race between the tortoise and the hare, the
hare must constantly stop and wait for the tortoise to catch up.
Main memory is also something of a "tortoise", with 150 ns cycle
times being more than 150 instruction times -- as many as 1200 on
a 4 GHz CPU running at two instructions per cycle. Main memory
speeds are more important, even though device speeds are more
shocking, because main memory is used so often, relatively speaking.)
(Incidentally, mainframes dealt with similar problems in the 1960s
and 1970s. Given that Computing Science folks never seem to study
their own history, I expect these same ideas will all be reinvented
soon. :-) )
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.