Huge Z180 Arrays
The Z180's banking scheme is great for handling code; data is
a bit more complex. Here's example code.
Published in Embedded Systems Programming, August 1990
For hints, tricks and ideas about better ways to build embedded systems, subscribe to The Embedded Muse, a free biweekly e-newsletter. No hype, just down to earth embedded talk. Click here to subscribe.
Zilog's Z180 (aka 64180) is sometimes used in applications where
huge quantities of data are collected or analyzed. Their CMOS
low power design makes them perfect for battery powered data acquisition
equipment. Or, as part of an instrument, their architecture is
ideal for storing large amounts of data during analysis.
As I mentioned in a previous article, these processors include
a powerful Memory Management Unit (MMU) designed to extend the
address space to a full megabyte. With some understanding of the
intricacies of the MMU, it's not too hard to upgrade an old Z80
application to make use of the extra memory. Of course, the HD64180
still uses a 64k logical address (for compatibility with the Z80),
so a Z80 program doesn't suddenly get a nice 1 Mb linear addressing
range. The software must be somewhat restructured if huge programs
or data areas are needed.
(As a refresher, "logical" addresses are those issued
by the program; on the HD64180 these can never exceed 64k, the
limit imposed by using 16 bit addresses in load and jump instructions.
"Physical" addresses are those appearing on the CPU's
pins; the internal Memory Management Unit converts Logical to
Physical, using a translation algorithm controlled by the programmer.)
Suppose you're designing a remote data collection device that
gets one 16 bit A/D reading every second. This doesn't sound like
much data, but after only a day the system will have acquired
86,400 words (172,800 bytes) - far more than the logical address
space of 64k.
Managing a 64180's memory in this sort of application demands
careful analysis of both the logical and physical address space
needs. There is no way all 176k will fit within the 64k logical
space, so the program will have to remap the MMU, possibly often,
to get to the array.
Accessing Arrays
This problem requires "sequential" access to the data,
since the current data point is appended immediately after the
last. It is a special case of the more general array accessing
situation, where a program might need any arbitrary data point,
in no particular order. "Random" array access is analogous
to that employed by disk drives, while sequential is typical of
tapes.
Byte-oriented data (like strings) is easy to work with. Element
I is the "Ith" address in the string; in other words,
the address of DATA(8) is the start address of the array plus
8 (assuming DATA(0) is a valid element). This is called a vector
- it has only one dimension, that given by the subscript.
Of course, in real applications we often work with data that is
16 or more bits long. Integers are often 2 bytes; floating point
values typically are 4 or even 8 bytes. Array element I of a vector
is found from:
base + I * element size,
where "base" is the start address of the array.
Again, this assumes element 0 is valid. Element 0 is at the first
address of the array, followed sequentially by each other element.
Multidimensional arrays use an extension of this formula. In most
systems these arrays are stored in "row major" order:
given A(row,column), all column elements of row 0 are stored first,
then the column elements of row 1, etc. We can get to any element
A(I,J) using the formula:
address=base + I * (Jmax * Esize) + J * Esize
Jmax is the number of columns in the array and Esize is the number
of bytes per element of the array.
As you can imagine, this can be a computationally expensive way
to get to an array. Multiplications are slow. The HD64180's multiply
instruction only handles 8 bit operations, and so by itself it
is not adequate for indexing into an array. Generally you can
count on the quantity Jmax * Esize to be a constant; for speed
critical applications it may be wise to set this to a convenient
value (a power of two), even if some memory is wasted. Similarly,
aim for an element size that is 1, 2, 4, or 8 so shifts can be
substituted for multiplies.
Any number of dimensions can be accommodated by extending the
formula. Higher dimension arrays require even more math to access,
so try to limit them to one or two.
In real time applications it's often nice to support two forms
of data access. A perfectly general form is useful for offline
data reduction; your application program can request any array
element in any sequence. During data acquisition you might need
a shortcut to avoid the computational overhead of computing an
array index. If data is gathered in some sequential form, it's
easy to visualize the data as a one dimensional vector (instead
of a multidimensional array), and store each value sequentially.
We'll explore both methods.
The Memory Manager
Sure, it's easy to write code to compute an array index along
the lines presented. We'll run into trouble when the array gets
too big. Suppose your code uses most of a 27256 PROM, leaving
only 32k of logical address space for data. If all of this were
dedicated to a 2 dimensional array consisting of 4 byte-long elements,
then only 8192 elements fit in memory - ARRAY(100,100) exceeds
address 64k.
Here the MMU comes to the rescue, but not without a lot of help
from you the programmer. Obviously, we can simply reprogram the
MMU's control registers every so often and bank switch portions
of RAM into the processor's address space. The secret to success
is careful planning.
If you've never really blown a software project by immediately
jumping into coding, then you are probably sick of hearing about
software methodologies. In fact, as processors and programs get
more complicated careful design is far more important than it
once was. Oh for the days of 4k programs! We could crank out a
few thousand lines of code in a couple of weeks and be done. Now,
when 300k+ programs are then norm, a carelessly designed system
will be a disaster. Guaranteed.
This is certainly true when using any sort of memory manager.
One penalty of the MMU is a segmentation of your logical address
space that must be designed in up front, and that very likely
can NEVER be changed, without completely rethinking the entire
structure of the program.
Using huge arrays forces you into a three bank memory model on
the HD64180 (note that on other chips other options exist - e.g.,
the Z280's fantastically complex and powerful MMU will let you
have up to 16 banks in the logical address space). One bank points
to the ROMed program; in most cases this logical to physical mapping
will never change. Another bank accesses an area of RAM for program
variables and transients. In all but extreme cases this never
be remapped, because the stack will reside here. Finally, one
bank points to the huge array(s).
***************************** FFFF
* *
* huge array section *
* *
***************************** F000
* * EFFF
* *
* *
* *
* Program data and stack *
* *
* *
* *
***************************** 8000
* * 7FFF
* *
* *
* Program (ROM) *
* *
***************************** 0000
Figure 1: Organization of Logical Address Space
Figure 1 shows one possible configuration of the memory management
unit. This assumes 32k of ROM from logical 0 to 7FFF, 28k of RAM
from 8000 to EFFF, and 4k of huge array RAM at the end of memory.
Wait a minute - 4k of RAM for huge arrays? This doesn't seem like
much!
This 4k section of the system's address space is essentially a
"window" into the huge array. We have to provide a section
of logical space to allow access to the data; the 4k window is
this section. This is much like a disk buffer used in operating
systems - data is written to the buffer and then flushed to disk
when full. It's size is a result of the mapping resolution of
the MMU - 4k is the minimum amount we can allocate. You can always
make this larger, which will cut down on MMU remapping, but you'll
sacrifice either variable or program area.
The window is for storage of huge arrays. Never, never store the
stack or other variables here, since remapping will invalidate
the data.
The idea behind using this window is to compute the physical (20
bit) address of the proper array element, and then to position
the window to bring the data into view. Earlier we saw how to
compute an index into any array. Now, this windowing step is also
introduced. The algorithm is not complex, but a wise programmer
will insulate his code from the machinations of array element
lookup by hiding the details in subroutines.
The routine "PUT1D" stores a byte to a huge one dimensional
array using the map shown in figure 1. The code to get a byte
from the array is nearly identical and so is not shown. PUT1D
accepts an array index in HL that can range all the way up to
65k. Thus, the one 4k window in logical address space gives the
routine access to a 65k hunk of data. Note that it supports random
accesses - it stores the data in B into the array at the index
in HL. HL can be 3 on one invocation and 40000 on the next.
PUT1D locates the array in physical memory at address: 0F000 +
BASE_CBR*4096. Thus, if BASE_CBR is 1, then the array runs from
10000 to 1FFFF. This implies that the memory manager's CBAR register
must be set to Fx (where "x" defines the logical start
of the base area) so that logical addresses from F000 to FFFF
(our window) are translated into common area 1.
For the sake of simplicity the code stores a one byte value. A
word version would require the "index" to be multiplied
by 2 before it is used in the computations. Use a shift to do
the multiply, but be wary of overflows.
Multiple dimension arrays can be programmed just as easily, but
you must compute the address using the formula previously discussed.
Multiplies are slow and should be avoided; try to pick values
of jmax that are powers of two, and then use a series of shifts.
Fast access will require some thought to optimize the computations.
One easy speed trick is shown in "NEXT1D". Especially
when gathering data, we often just put one value in the next location
- random array accesses are not needed. This means we can skip
all of the multiplications needed to compute the address; we just
increment the last address. NEXT1D illustrates this approach with
the one dimensional array we've already looked at. PUT1D saves
the current CBR and logical addresses in SAVE_CBR and SAVE_LOG
for NEXT1D. After one call to PUT1D to set things up, all further
sequential accesses can use the faster NEXT1D. With the single
indexes shown, the speed difference is not important; with a 2
or 3 dimensional array a tremendous time saving will result.
What if your program uses a number of big arrays? You can't assign
more windows since the HD64180 limits mapping to three sections.
Probably the easiest solution is to write a "PUT" and
"GET" routine for each array, using a different BASE_CBR
for each. Avoid using sequential accesses unless you can be really
sure that accesses will be restricted to one array at a time.
;
; Put an element in a huge 1 dimensional array.
; Here we assume:
;
; The "window" is from F000 to FFFF.
; BASE_CBR is the first (lowest) CBR value
; Register B has the data to store
; Register HL has the array index
;
; cbr = base_cbr + (index AND ff000) / 4096
; logical address= f000 + (index AND 0fff)
;
put1d:
ld a,h ; get high order index value
and a,0f0h ; mask off upper 4 bits (cbr bits)
rra ; shift right 4 bits
rra ; (since we ignore lower 8 bits, this
rra ; a 12 bit shift)
rra ; a= (index AND ff000)/4096
add a,base_cbr
out0 (cbr),a ; set new cbr value
ld (save_cbr),a ; save cbr for next1d
ld a,h ; high part of index
or a,0f0h ; mask off high nibble and or in f0
ld h,a ; hl is now right logical addr in window
ld (hl),b ; save data
ld (save_log),hl ; save logical addr for next1d
ret
;
; put register B in the huge dimensional array at the next
; open location. This assumes that put1d was once called to
; index a specific value.
;
; On entry, B is the data to store
; save_cbr was set by put1d or next1d
; save_log was set by put1d or next1d
;
next1d:
ld hl,(save_log) ; last logical address used
ld a,(save_cbr) ; last cbr used
inc hl ; skip to next element
jr nz,ne1 ; jp if we have not exceeded the window
ld hl,0f000h ; re-init to the start of the window
inc a ; next cbr value
ld (save_cbr),a ; save current cbr
ne1: ld (save_log),hl ; save current logical address
out0 (cbr),a ; set cbr
ld (hl),b ; save data
ret
|