Phase5 news - UMA
phase 5 News
A statement on the advantages and disadvantages of the Unified Memory
Architecture, or in other words: "Why develop something new, if you can
just buy it instead?"
Since the release of the very basic system specifications of our A\BOX
system and the "still under development" CAIPIRINHA chip, the public has
been discussing the sense or nonsense of this development. Most of these
discussions asked why a Unified Memory Architecture should be used and
whether standard designs with available components wouldn't make more
sense.
One objection often raised against the CAIPIRINHA concept with its UMA
design is the memory bandwidth consumed by certain system functions, by
processor access and by the video outputs. There have been many heated
discussions in which critics of the UMA design have used extremely
simple examples to point out possible disadvantages, such as:
"1600x1200 pixels in 24 bit at 75 Hz = 432 MB/sec of permanent usage, in
addition to a second video output, 3D calculation with tons of textures,
multichannel audio plus more, is all it takes to slow the CPU access
down." With this reasoning some like to support the concept of a
separate bus and/or graphics on PCI or AGP. Other arguments try to make
us believe in the future-proofness of other, cheaper modular solutions
or bus extensions. In the following we will comment on these points,
though with a slight grin on our face.
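
The critics' figure can be reproduced with a short calculation. The
following is a minimal sketch in Python, not anything taken from the
A\BOX specification; the function name `scanout_bandwidth` is made up
for illustration, and 24 bit is assumed to mean 3 bytes per pixel,
which is what the quoted 432 MB/sec implies.

```python
def scanout_bandwidth(width, height, bytes_per_pixel, refresh_hz):
    """Bytes per second needed to refresh a frame buffer stored in one piece."""
    return width * height * bytes_per_pixel * refresh_hz


# The critics' example: 1600x1200, 24 bit (assumed 3 bytes), 75 Hz.
critics_example = scanout_bandwidth(1600, 1200, 3, 75)
print(critics_example / 1_000_000, "MB/sec")
```
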
1.) The everyday architecture with separate main memory and graphics,
on the PCI bus for example, has much lower bandwidths (which of course
do not add up). Dividing the memory creates the need to transfer data
from main memory into the video board's RAM. Here are three examples.
- A PC processor calculates an animated 3D scene. To do so it reads tens
of thousands of coordinate values for every frame, performs heavy
calculations and writes the data back into main memory. After that the
data must be read out of main memory again to be sorted properly and
then sent over the PCI bus to the 3D graphics board. Since the scene is
rather complex and the graphics board unfortunately has only 2 or 4 MB
of RAM, loads of new textures have to be transferred into the texture
memory, where the 3D chip needs them to render the polygons. The
alternative is to do only very simple scenes with textures small enough
to fit permanently into 1 MB of reserved memory on the graphics
board... a real high-end solution.
- A video digitizer writes its real-time picture data into the main
memory of the PC, since that is where it is supposed to be edited. To
show it as an animated window it must be copied again into the graphics
board memory: 25 times a second about 1 MB of data equals 25 MB/sec, or
about half of the actual usable bandwidth of many PCI systems. What a
pity that the other half is already being used by the video
digitizer ...
- A 4000x4000 pixel x 24 bit (= 48 MB) image is displayed on the
graphics board at a resolution of 1280x1024 and you would like to
scroll around on it (panning). Of course that is possible on a standard
PC architecture, disregarding the fact that the PCI bus is totally
"jammed" because the processor is busy transferring the screen data
from main memory. Anyway, the fact that the bus is "jammed" doesn't
really matter, because the CPU couldn't use any free bandwidth for real
calculations, since it is busy transferring data.
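
The arithmetic behind the second and third examples can be sketched as
follows. This is only an illustration in Python; the names are
hypothetical, the 1 MB frame size and 25 frames per second are the
rounded figures from the text, and 1 MB is assumed to be 1,000,000
bytes.

```python
MB = 1_000_000  # assumption: decimal megabytes, matching the rounded figures


def copy_traffic(frame_bytes, frames_per_second, bus_crossings):
    """Bus traffic when every frame crosses the bus `bus_crossings` times."""
    return frame_bytes * frames_per_second * bus_crossings


# Digitizer -> main memory, then main memory -> graphics board:
# every frame crosses the PCI bus twice.
digitizer_traffic = copy_traffic(1 * MB, 25, 2)

# Size of the 4000x4000 image at 24 bit (3 bytes per pixel).
big_image_bytes = 4000 * 4000 * 3

print(digitizer_traffic // MB, "MB/sec on the PCI bus")
print(big_image_bytes // MB, "MB image")
```
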
In all these examples (and the list could go on and on) the UMA design
has obvious advantages, as the transfer of huge masses of data often
becomes unnecessary: the data is already where all function units can
access it, in a unified memory. With the UMA/DLRP combination (see
below), display data that may lie at any address in memory can be
displayed at any screen position without using bandwidth and CPU to
copy it into a "video memory". The same goes for other data, for
example 3D coordinates, textures, sound data and much more. In the end
we can only say this: a well implemented UMA design not only offers
obviously more memory bandwidth than today's (and future) standard
solutions, but also strongly reduces the need for memory bandwidth, and
so offers more power and resources for high-end applications.
2.) The simple bandwidth calculation depends on the conventional design
of graphics boards, which need the picture data to lie in one piece in
a single block of memory. Therefore the amount of data and the color
depth are always at the maximum, which is a totally senseless concept.
The advanced technique of the Display List RISC Processor (DLRP) of the
CAIPIRINHA chip offers a completely different concept, in which the
displayed screen need not be in memory in this form. Here the data
flow, with flexible color depth, is much smaller. A single DLRP command
may, for example, instruct that 100 pixels in a row be displayed in a
certain color. In a system where, as is possible with CAIPIRINHA, you
may have 24-bit windows of whatever shape and size, the user may choose
whether he wants to spend memory and bandwidth resources on a 24-bit
background image; if he chooses a color-reduced or even a single-color
or grid background, he obviously saves resources. The display of a
1600-pixel line could, roughly translated into human language, look
like the following DLRP sequence.
{
Show 312 Pixel RGBA 128,128,256,0
; This is Background
Show 10 Pixel with 1 Byte from Cache address $xxxxxxxx
; Here a line of the Scrollbar is being displayed from the cache
Show 700 Pixel RGBA with 4 Byte from address $yyyyyyyyy
; 700 Pixel of a 24 bit picture
Show 350 Pixel Palette with 1 Byte from address $zzzzzzz
; Here a Window in front of the Picture, e.g. a Controlpanel with 256 colors, is displayed
Show 312 Pixel RGBA 128,128,256,0
; Here's the Background again to the right edge
}
In this example about 3150 bytes plus a few instructions are fetched
from main memory for one line, while a "traditional" 1600-pixel 24-bit
line would have taken 6,400 bytes. Applied to the whole screen display,
this reduces the peak bandwidth needed from about 432 MB/s to
approximately 214 MB/s. As this example shows, in this and similar
cases (which make up the main part of everyday applications)
intelligent programming and configuration of the combination of UMA and
DLRP achieve a better usage of system resources. This is, we believe, a
preferable concept to that of other common resource-wasting GUI-based
systems, which demand new processor generations on a regular basis.
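
The byte count for this line can be checked with a small model of the
sequence. The sketch below is Python written for this illustration
only; it is not DLRP code. The tuple layout, the name
`line_fetch_bytes` and the assumption that constant-color and cache-fed
spans cost no main-memory fetches are ours, read off the example
sequence. The few instruction bytes are left out, so the scaled result
lands slightly below the approximate figure quoted in the text.

```python
# Each span: (pixels, bytes fetched from main memory per pixel).
# Assumption: constant-color spans and the cache-fed scrollbar fetch
# nothing from main memory; instruction fetches are ignored.
line = [
    (312, 0),  # background, constant color
    (10, 0),   # scrollbar line, served from the cache
    (700, 4),  # 24-bit picture stored as 4 bytes per pixel
    (350, 1),  # 256-color control panel window
    (312, 0),  # background again
]


def line_fetch_bytes(spans):
    """Main-memory bytes fetched to display one line."""
    return sum(pixels * bytes_per_pixel for pixels, bytes_per_pixel in spans)


dlrp_bytes = line_fetch_bytes(line)  # 700*4 + 350*1
traditional_bytes = 1600 * 4         # the traditional line of the text
scaled_peak = 432 * dlrp_bytes / traditional_bytes  # in MB/s

print(dlrp_bytes, traditional_bytes, round(scaled_peak, 1))
```

Note that the 432 MB/s figure counts 3 bytes per pixel while the 6,400
byte line counts 4, so the scaling is only a rough estimate.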
3.) Many critics of the A\BOX concept or of the UMA design of our
CAIPIRINHA prefer to argue with high-end demands on system performance
and graphics display, where they, not knowing the CAIPIRINHA design
concept, assume possible bottlenecks and limitations and then compare
it with the performance of current affordable graphics boards with
standard components. For this they like to take the example of a
complex 3D display using the highest resolution and refresh rate.
Leaving aside the praise of everyday standard concepts, various facts
still remain.
- Today's PCI graphics boards already can't keep up with the needs of
multimedia and 3D applications; the highly praised PCI bus is already
at the limit of its bandwidth. This doesn't matter, since the industry
already has the solution at hand with AGP and almost 400 MB/s for
really fast 3D applications. With this solution the demand for new
graphics boards is assured. In addition you can also sell new
motherboards with an AGP port to the users ... It is going to be really
interesting in one or two years, when the limits of AGP are reached, as
defined by marketing strategies that promote a new and more efficient
software generation which will, unpredictably, demand a new hardware
generation.
- Current PCI 3D graphics boards at affordable prices are rarely able
to display resolutions higher than 1280x1024 in 24 bit, even the new
EDO RAM based cards. For better results you have to buy a high-end
graphics board with VRAM or WRAM. These are the only cards that may at
least *slightly* be compared with the A\BOX system, and that only in
terms of resolution, not in other features.
- Many graphics boards with chips from leading manufacturers already
offer fast 3D graphics, using low resolutions and reduced color depth.
In other words: many of the 3D engines do not use the highest possible
resolution that the chips support, but mostly only 800x600 at 16 bit
(some 3D engines can't manage 3D in 24 bit at all). Such resolutions
could easily be displayed at a 150 Hz refresh rate on the CAIPIRINHA
system while using less than 15% of the bandwidth. Actually this has
nothing to do with REAL 3D graphics (the same goes for those
neat-looking and fast consoles); for that, most current systems are not
even equipped yet.
- For a more realistic point of view you have to keep the limitations
of the current system designs in mind. The often mentioned theoretical
peak performances of standard systems are further away from reality
than the CAIPIRINHA design is from its theoretical maximum performance.
Again we must point out that even the industry finds the PCI bus
outdated, and that in future developments, for example in PowerPC or
x86 based systems, it will be replaced by AGP, which will bring a speed
increase by a factor of 3. That will still not overcome the limits, nor
will it even come close to the performance of the fast UMA design or
the CAIPIRINHA.
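
Two of the figures in this point are easy to check. A rough sketch in
Python, assuming 16 bit means 2 bytes per pixel and taking the UMA
estimate of 1600 MB/s from point 4 and the PCI figure of 132 MB/s from
point 5 as given; the variable names are made up for the example.

```python
MB = 1_000_000  # assumption: decimal megabytes

# 800x600 at 16 bit (2 bytes per pixel), refreshed 150 times per second.
display_bytes_per_second = 800 * 600 * 2 * 150
uma_estimate = 1600 * MB  # estimated UMA bandwidth, see point 4

share = display_bytes_per_second / uma_estimate
agp_factor = 400 / 132  # AGP (approx. 400 MB/s) over PCI (132 MB/s)

print(f"{share:.0%} of the UMA bandwidth")
print(f"AGP is about {agp_factor:.1f}x PCI")
```
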
4.) As an argument against UMA some have said that the CPU might have
limited memory access. A bandwidth of 400 MB/s, from a bus speed of
50 MHz x 8 bytes (64-bit bus), was estimated. On a CAIPIRINHA system
with a theoretical CPU bus speed of 100 MHz (as soon as the PowerPC
processors are that fast on the bus, of which they are not yet capable)
the needed bandwidth may even be estimated at 800 MB/s. The currently
estimated bandwidth of 1,600 MB/s of the UMA memory, by comparison, was
dismissed as a theoretical maximum that could never be reached in
practice, which is of course true. But this applies even more to the
theoretical bandwidth of 400 or 800 MB/s, since even the fastest
PowerPC processors on the market cannot handle such masses of data in
reasonable applications (and since the simple but performance-eating
job of data transfer is done by CAIPIRINHA, the CPU may be used for
more valuable work).
Furthermore, compared with the UMA design of CAIPIRINHA, current PC
system controllers achieve, according to test results of various
independent magazines, an actual main memory throughput of less than
100 MB/s, and this goes for the fastest Pentium and Pentium Pro
systems. But even the standard controller MPC106 by Motorola (a
combined memory / cache / PCI bus controller for PowerPC machines) with
60 ns RAM and a 64-bit data bus does not exceed a maximum of 133 MB/s
(which is about the performance of a zero-wait-state RAM controller at
16 MHz) and will actually be much slower in reality. Even if the
PowerPC processor received data from CAIPIRINHA at only 200-300 MB/s
due to extremely heavy system activity, this would still outrun any
current standard design on the market by all means (even those which
will be available in 1997).
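
The peak figures in this point all follow from bus clock times bus
width. A minimal Python sketch, with hypothetical helper names, that
reproduces them and also runs the calculation backwards for the MPC106
figure:

```python
def peak_bandwidth_mb(clock_mhz, bus_bytes):
    """Theoretical peak in MB/s: one full-width transfer per bus clock."""
    return clock_mhz * bus_bytes


def equivalent_clock_mhz(bandwidth_mb, bus_bytes):
    """Clock at which a zero-wait-state bus would reach the given bandwidth."""
    return bandwidth_mb / bus_bytes


print(peak_bandwidth_mb(50, 8))      # 64-bit bus at 50 MHz
print(peak_bandwidth_mb(100, 8))     # 64-bit bus at 100 MHz
print(equivalent_clock_mhz(133, 8))  # MPC106 maximum on a 64-bit bus
```
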
5.) Another argument against the high integration required by the
CAIPIRINHA concept is expandability. People like to criticize that the
controller (including graphics and audio) is integrated on the
motherboard and not exchangeable over a standard bus system (which
currently is not possible, because no standardized bus system is
available that can deliver the performance required). Besides that
there is still the fact that the CAIPIRINHA design (due to be finished
in 1997) will use the available technologies to their limit, such as
100 MHz SRAMs, which have been available for 2 years but are only now
being implemented, and a 100 MHz CPU bus that no processor at present
can push to its maximum. Due to its unique and innovative design
CAIPIRINHA will offer years of leading performance. This cannot be
expected of many current modular systems. Whoever buys a PCI graphics
card today, for example, invests in a quickly outdated technology,
whereas the next generation of boards with faster AGP graphics requires
a new generation of motherboards for PowerPC and x86 systems, meaning
that the user must change the graphics card and the motherboard
including all controllers. But these next-generation motherboards with
AGP only widen the bottleneck from 132 MB/s to approximately 400 MB/s
and still have to deal with a limit on expansion, which leaves these
systems only limited future-proofness. Other concepts, in which the
processor is used as a module together with memory and cache (which by
the way is similar to accelerator concepts such as the CYBERSTORM,
which use onboard memory because of the awfully slow memory design of
the A4000), cost the user a much larger amount of money for upgrading,
since that usually includes the purchase of a new processor, cache and
system controller plus new sockets for memory and cache modules. The
sense of these extra costs is more than questionable, since if you want
to upgrade to a more advanced cache and memory design, such as SDRAM,
it is very likely that you will have to replace the cache and memory
modules as well. As you see, it is not yet proven how far these modular
concepts will be sufficient and up to date in the future.