
Phase5 news - UMA



    phase 5 News

A statement on the advantages and disadvantages of the Unified Memory
Architecture, or in other words: "Why develop something new if you can
just buy it instead?"

Since the release of the very basic system specifications of our A\BOX
system and the still-under-development CAIPIRINHA chip, the public has
been discussing the sense or nonsense of this development. Most of
these discussions centered on why a Unified Memory Architecture (UMA)
should be used, and whether a standard design built from available
components wouldn't make more sense.
One objection often raised against the CAIPIRINHA concept and its UMA
design concerns the memory bandwidth consumed by certain system
functions, by processor access and by the video outputs. There have
been many heated discussions in which critics of the UMA design
offered extremely simple examples to point out its possible
disadvantages, such as: "1600x1200 pixels in 24 bit at 75 Hz = 432
MB/s of permanent usage, plus a second video output, 3D calculation
with tons of textures, multichannel audio and more, is all it takes to
slow down CPU access." With this reasoning some like to support the
concept of a separate bus and/or graphics on PCI or AGP. Other
arguments try to make people believe in the future-proofness of other,
cheaper modular solutions or bus extensions. In the following we will
comment on these points, though with a slight grin on our face.
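
The critics' 432 MB/s figure can be reproduced with a few lines of
Python. This is only an illustrative sketch: the function name
scanout_bandwidth is made up here, and 24 bit is assumed to mean 3
bytes per pixel with the whole picture stored flat in memory.

```python
def scanout_bandwidth(width, height, bytes_per_pixel, refresh_hz):
    """Bytes per second read to refresh a flat, fully stored framebuffer."""
    return width * height * bytes_per_pixel * refresh_hz


# The critics' case: 1600x1200, 24 bit (assumed 3 bytes per pixel), 75 Hz.
critics_case = scanout_bandwidth(1600, 1200, 3, 75)
print(critics_case / 1_000_000, "MB/s")  # 432.0 MB/s
```

The figure depends entirely on the assumption that every pixel of
every frame is fetched at full color depth.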

1.) Everyday architectures with separate main memory and graphics
memory, the latter on the PCI bus for example, have much lower
bandwidths (which of course do not add up). Dividing the memory
creates the need to transfer data from main memory into the video
board's RAM. Here are three examples.
- A PC processor calculates an animated 3D scene. To do so it reads
tens of thousands of coordinate values for every frame, performs heavy
calculations and writes the data back into main memory. After that the
data must be read out of main memory again to be sorted properly and
then sent over the PCI bus to the 3D graphics board. Since the scene
is rather complex and the graphics board unfortunately has only 2 or 4
MB of RAM, loads of new textures have to be transferred into the
texture memory, where the 3D chip needs them to render the polygons.
The alternative is to stick to very simple scenes with textures small
enough to fit permanently into 1 MB of reserved memory on the graphics
board... a real high-end solution.
- A video digitizer writes its real-time picture data into the main
memory of the PC, since that's where it is supposed to be edited. To
show it as an animated window, the data must be copied again into the
graphics board memory: about 1 MB of data 25 times a second equals 25
MB/s, or about half of the actual usable bandwidth of many PCI
systems. What a pity that the other half is already being used by the
video digitizer ...
- A 4000x4000 pixel x 24 bit (= 48 MB) screen is displayed on the
graphics board at a resolution of 1280x1024, and you would like to
scroll around on this screen (panning). Of course that's possible on a
standard PC architecture, disregarding the fact that the PCI bus is
totally "jammed" because the processor is busy transferring the screen
data from main memory. Anyway, the fact that the bus is "jammed"
doesn't really matter, because the CPU couldn't use any free bandwidth
for real calculations while it is busy transferring data.

In all of these examples - and the list could go on and on - the UMA
design has obvious advantages, as the transfer of huge masses of data
often becomes unnecessary: the data is already where all functional
units can access it, in a unified memory. With the UMA/DLRP
combination (see below), display data that may lie at any address in
memory can be displayed at any screen position without spending
bandwidth and CPU time on copying it into a "video memory". The same
goes for other data, for example 3D coordinates, textures, sound data
and much more. In the end we can only say this: a well-implemented UMA
design not only offers obviously more memory bandwidth than today's
(and future) standard solutions, but also strongly reduces the need
for memory bandwidth, and so offers more power and resources for
high-end applications.
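
The copy traffic in the digitizer and panning examples can be put into
numbers with a small Python sketch. It counts only the transfers, not
the display refresh itself. The helper copy_traffic and the "passes"
parameter are illustrative assumptions, as is the idea that a unified
memory needs exactly one write per frame.

```python
MB = 1_000_000  # decimal megabyte, as used in the figures above


def copy_traffic(frame_bytes, frames_per_second, passes):
    """Bytes per second moved when each frame crosses the bus `passes` times."""
    return frame_bytes * frames_per_second * passes


# Video digitizer: roughly 1 MB per frame, 25 frames per second.
# Split memory: digitizer -> main memory, then main memory -> graphics board.
split_memory = copy_traffic(1 * MB, 25, passes=2)
# Unified memory (assumption): written once and displayed in place.
unified_memory = copy_traffic(1 * MB, 25, passes=1)

# Panning example: size of the 4000x4000 virtual screen at 3 bytes per pixel.
virtual_screen_bytes = 4000 * 4000 * 3

print(split_memory // MB, unified_memory // MB, virtual_screen_bytes // MB)  # 50 25 48
```

Under these assumptions the split design moves 50 MB/s over the bus
for the digitizer window, while the unified design moves 25 MB/s.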

2.) The simple bandwidth calculation rests on the conventional design
of graphics boards, which need the picture data to lie in one piece in
a single block of memory. The amount of data and the color depth are
therefore always at the maximum, which is a totally senseless concept.
The advanced technology of the Display List RISC Processor (DLRP) of
the CAIPIRINHA chip offers a completely different concept, in which
the displayed screen need not be in memory in this form. Here the data
flow, with flexible color depth, is much lower. A single DLRP command
may, for example, instruct 100 pixels in a row to be displayed in a
certain color. In a system - as is possible with CAIPIRINHA - where
you may have 24-bit windows of any shape and size, the user may choose
whether he wants to spend memory and bandwidth on a 24-bit background
image; if he chooses a color-reduced, single-color or grid background,
he obviously saves resources. The display of a 1600-pixel line could,
roughly translated into human language, look like the following DLRP
sequence.
{
Show 312 Pixel RGBA 128,128,256,0
; This is Background
Show 10  Pixel with 1 Byte since Cache address $xxxxxxxx
; Here a line of the Scrollbar is being displayed from the cache
Show 700 Pixel RGBA with 4 Byte since address $yyyyyyyyy
; 700 Pixel of a 24 bit picture
Show 350 Pixel Palette with 1Byte since address $zzzzzzz
; Here is a window in front of the picture, e.g. a control panel
; displayed with 256 colors
Show 312 Pixel RGBA 128,128,256,0
; Here's the Background again to the right edge
}
In this example about 3,150 bytes per line, plus a few instructions,
are fetched from main memory, whereas a "traditional" 1600-pixel
24-bit line would have taken 6,400 bytes. Applied to the whole screen
display, this reduces the peak bandwidth needed from about 432 MB/s
to approximately 214 MB/s. As this example and other similar cases
(which make up the main part of everyday applications) show,
intelligent programming and configuration of the UMA/DLRP combination
achieves a better use of system resources. This is, we believe, a
preferable concept to that of other common, resource-wasting GUI-based
systems which demand new processor generations on a regular basis.
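
The byte count for the example line can be retraced with a short
Python sketch. The span table below is our own transcription of the
listing, under the assumption that solid-color spans and the cached
scrollbar cause no main memory traffic (the widths as printed add up
to 1,684 pixels rather than 1,600, which does not affect the byte
count, since those spans fetch nothing).

```python
# (pixels, bytes fetched from main memory per pixel) for each span of the
# example line. Solid-color spans and the cached scrollbar are assumed to
# cause no main memory traffic.
spans = [
    (312, 0),  # background, color held in the command itself
    (10, 0),   # scrollbar line, served from the cache
    (700, 4),  # 24-bit picture, stored as 4 bytes per pixel
    (350, 1),  # 256-color control panel window
    (312, 0),  # background again
]

dlrp_bytes = sum(pixels * bpp for pixels, bpp in spans)
flat_bytes = 1600 * 4  # traditional line, 4 bytes per pixel

ratio = dlrp_bytes / flat_bytes
print(dlrp_bytes, flat_bytes, round(432 * ratio, 1))  # 3150 6400 212.6
```

Scaling 432 MB/s by this ratio gives roughly 213 MB/s, in the same
range as the 214 MB/s quoted above; the small difference is presumably
the instruction fetches, which the sketch leaves out.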

3.) Many critics of the A\BOX concept or of the UMA design of our
CAIPIRINHA prefer to argue with high-end demands on system performance
and graphics display. Not knowing the CAIPIRINHA design concept, they
assume possible bottlenecks and limitations, and then compare it with
the performance of current affordable graphics boards built from
standard components. For this they like to take the example of a
complex 3D display at the highest resolution and refresh rate. Leaving
aside the praise of everyday standard concepts, various facts still
remain.
- Today's PCI graphics boards already can't keep up with the needs of
multimedia and 3D applications; the highly praised PCI bus is already
at the limit of its bandwidth. This doesn't matter, since the industry
already has the solution at hand with AGP and almost 400 MB/s for
really fast 3D applications. With this solution the demand for new
graphics boards is created. In addition you can also sell the users
new motherboards with an AGP port ... It's going to be really
interesting in one or two years, when the limits of AGP are reached by
decree of marketing strategists who promote a new and more efficient
software generation, which will then quite unexpectedly demand a new
hardware generation.
- Current PCI 3D graphics boards at affordable prices are rarely able
to display resolutions higher than 1280x1024 in 24 bit - even the new
EDO RAM based cards. For better results you have to buy a high-end
graphics board with VRAM or WRAM. These are the only cards that may at
least *slightly* be compared with the A\BOX system, and then only in
terms of resolution, not in terms of other features.
- Many graphics boards with chips from leading manufacturers already
offer fast 3D graphics - at low resolutions and reduced color depth.
In other words: many 3D engines do not use the highest resolution the
chips support, but mostly only 800x600 at 16 bit (some 3D engines
can't manage 3D in 24 bit at all). Such resolutions could easily be
displayed at a 150 Hz refresh rate on the CAIPIRINHA system while
using less than 15% of the bandwidth. Actually this has nothing to do
with REAL 3D graphics (the same goes for those neat-looking and fast
consoles); most current systems are not even equipped for that yet.
- For a more realistic view you have to keep the limitations of
current system designs in mind. The often-quoted theoretical peak
performance of standard systems is further from reality than the
CAIPIRINHA design is from its theoretical maximum performance.

Again we must point out that even the industry considers the PCI bus
outdated, and that in future developments, for example in PowerPC or
x86 based systems, it will be replaced by AGP, which will bring a
speed increase by a factor of 3. That will still not overcome the
limits, nor will it even come close to the performance of a fast UMA
design such as CAIPIRINHA.
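
The "less than 15%" claim for 800x600 at 16 bit and 150 Hz can be
checked with a minimal Python sketch. It assumes 2 bytes per pixel, a
flat framebuffer, and the 1,600 MB/s UMA estimate quoted in point 4 as
the reference; the constant name UMA_PEAK is chosen here for
illustration only.

```python
UMA_PEAK = 1_600_000_000  # bytes/s, the statement's estimate for the UMA memory

# 800x600 at 16 bit (2 bytes per pixel), refreshed at 150 Hz.
display_load = 800 * 600 * 2 * 150
share = display_load / UMA_PEAK

print(display_load // 1_000_000, "MB/s,", round(share * 100), "% of peak")
# 144 MB/s, 9 % of peak
```

Even without any saving from the DLRP, such a display would take about
9% of the estimated peak under these assumptions.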

4.) As an argument against UMA, some have said that the CPU might have
only limited memory access. A required bandwidth of 400 MB/s, from a
bus speed of 50 MHz x 8 bytes (64-bit bus), was estimated. On a
CAIPIRINHA system with a theoretical CPU bus speed of 100 MHz (as soon
as PowerPC processors are that fast on the bus, of which they are not
yet capable) the bandwidth needed may even be estimated at 800 MB/s.
Set against this, the currently estimated bandwidth of 1,600 MB/s of
the UMA memory was dismissed as a theoretical maximum that could never
be reached in practice, which is of course true. But this applies even
more to the theoretical bandwidth of 400 or 800 MB/s, since even the
fastest PowerPC processors on the market cannot handle such masses of
data in reasonable applications (and since the simple but
performance-eating job of data transfer is done by CAIPIRINHA, the CPU
may be used for more valuable work).

Furthermore, according to test results from various independent
magazines, current PC system controllers achieve, in contrast to the
UMA design of CAIPIRINHA, an actual main memory throughput of less
than 100 MB/s; this holds even for the fastest Pentium and Pentium Pro
systems. Even the standard MPC106 controller by Motorola (a combined
memory / cache / PCI bus controller for PowerPC machines) with 60 ns
RAM and a 64-bit data bus does not exceed a maximum of 133 MB/s (which
is about the performance of a zero-wait-state RAM controller at 16
MHz) and will be much slower in reality. Even if the PowerPC processor
were to receive data from CAIPIRINHA at only 200-300 MB/s due to
extremely heavy system activity, this would still outrun any current
standard design on the market by all means (even those which will be
available in 1997).
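
The bus figures in this point follow from clock rate times bus width.
A minimal Python sketch, assuming one full-width transfer per bus
clock (the function name peak_bandwidth is illustrative):

```python
def peak_bandwidth(bus_mhz, bus_bytes):
    """Theoretical peak in MB/s: one full-width transfer per bus clock."""
    return bus_mhz * bus_bytes


print(peak_bandwidth(50, 8))    # 400
print(peak_bandwidth(100, 8))   # 800

# Clock a zero-wait-state 64-bit controller would need for 133 MB/s.
print(133 / 8)                  # 16.625
```

The last line shows where the comparison with a controller at about
16 MHz comes from.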

5.) Another argument against the high integration required by the
CAIPIRINHA concept is expandability. People like to criticize that the
controller (including graphics and audio) is integrated on the
motherboard and not exchangeable via a standard bus system (which is
currently impossible anyway, as no standardized bus system is
available that can deliver the performance required). Besides, there
is still the fact that the CAIPIRINHA design (due to be finished in
1997) will use the available technologies to their limits, such as 100
MHz SRAMs, which have been available for 2 years but are only now
being put to use, and a 100 MHz CPU bus that no processor at present
can push to its maximum. Thanks to its unique and innovative design,
CAIPIRINHA will offer years of leading performance. This cannot be
expected of many current modular systems. Whoever buys a PCI graphics
card today, for example, invests in a quickly outdated technology,
whereas the next generation of boards with faster AGP graphics
requires a new generation of motherboards for PowerPC and x86 systems,
meaning that the user must replace the graphics card and the
motherboard including all controllers. But these next-generation
motherboards with AGP only widen the bottleneck from 132 MB/s to
approximately 400 MB/s and still face an expansion limit, which leaves
these systems with limited future-proofing. Other concepts, in which
the processor is used as a module together with memory and cache
(which, by the way, is similar to accelerator concepts such as the
CYBERSTORM, which use onboard memory because of the awfully slow
memory design of the A4000), cost the user a much larger amount of
money for upgrading, since an upgrade usually includes the purchase of
a new processor, cache and system controller plus new sockets for
memory and cache modules. The sense of these extra costs is more than
questionable, since if you want to upgrade to a more advanced cache
and memory design, such as SDRAM, it is very likely that you will have
to replace the cache and memory modules as well. As you see, it is not
yet proven how far these modular concepts will remain sufficient and
up to date in the future.