OT, But interesting

Sean picasso@madflower.com
Mon, 7 Aug 2000 00:14:27 -0400 (EDT)


On Mon, 7 Aug 2000, Tim Schmidt wrote:

> ----
> MHz != RL performance increase. More precisely, I haven't seen
> chip clocking give a straight ratio of MHz increase to RL performance; it's
> usually a curve and much lower than the percentage of overclocked speed.
> Then again, I don't necessarily trust benchmarks either; I have seen WAY too
> many slanted test results. And manufacturers are very likely to squash
> your site if you do post benchmarks.
> ----
> 
> No, but under Q3 with a GF2, the Duron 600 -> 950 ran ~40% faster.  
> Real-world stuff there.

I personally don't consider Q3 a benchmarking tool... but then again, I use
my money for work, not 3-D games. Others may vary.
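
For what it's worth, here is a quick sketch of how the quoted Q3 number
compares with the clock increase. The function name is made up for
illustration, and the only inputs are the figures quoted above (600 -> 950
MHz, ~40% faster).

```python
def scaling_efficiency(old_mhz, new_mhz, observed_speedup):
    """Fraction of the clock increase that showed up as real speedup.

    observed_speedup is a fraction, e.g. 0.40 for "~40% faster".
    """
    clock_gain = (new_mhz - old_mhz) / old_mhz
    return observed_speedup / clock_gain


# Figures from the quoted Q3 result: Duron 600 -> 950, ~40% faster.
clock_gain = (950 - 600) / 600
efficiency = scaling_efficiency(600, 950, 0.40)
print(f"clock gain: {clock_gain:.1%}, scaling efficiency: {efficiency:.2f}")
```

So a roughly 58% clock increase bought ~40% in that one game, which is
well short of a straight ratio.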
 
> ----
> I haven't really read up on DDR memory. Is DDR _kind of like_ RAID striping
> for memory? *needs to go visit Tom's* Or does it actually use 2x the bytes in
> the bus width? (Well, it wouldn't be _quite_ 2x, I wouldn't think, because you
> have 2 memory addresses.) I mean, your memory is only 100 MHz; you can't go
> faster than that, but you can read/write ~2x as fast if you split it and
> write to two simultaneously and 2x the bus width. But that still doesn't
> improve the seek time for the actual memory itself, and you still end up
> with a timing/processor latency problem.
> 
> Apple tried something similar to this around '95-96. I forget what they
> called it, but you put matched pairs of DIMMs in the corresponding
> A/B slots and you got about a 3-5% performance boost, but it didn't
> increase the width of the bus.
> ----
> 
> No, DDR pushed 2x the bytes through the bus, here's how it works:
> Normal SDRAM transmits on the up of the timing signal, DDR transmits on the 
> up and down, doubling the bandwidth.  It's a 200Mhz bus using a 100Mhz 
> clock.  In other words, it's really a true 200Mhz bus.

It's technically still a 100 MHz bus, but the memory is effectively running
at 200 MHz.

If your system bus, FSB, or whatever you call it is still running at 100 or
133 MHz, you're still running into the clock-cycling problems associated
with 11:1 clock ratios, as you mentioned previously.

DDR RAM performs maybe 3-5% faster than SDRAM, except in 3-D games, where it
gives about a 10% performance boost (according to what I just read). This
probably has more to do with the inefficiency of the processor handling the
AGP bandwidth than with the RAM itself (1/2 to 2/3 of the memory bandwidth
of the AGP slot is consumed by the processor, according to Tom's). That
explains the GeForce DDR results better than processor speeds do.

Minus a little bit of latency difference, running a 128-bit bus with SDRAM
would prove to be more effective than doubling the clock speed and
running at 64 bits.
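
A back-of-the-envelope sketch of the peak numbers being argued about here,
assuming the usual formula of bus width times clock times transfers per
clock. The function name is made up, and this ignores latency and
everything else that matters in practice.

```python
def peak_bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock


sdram_64 = peak_bandwidth_mb_s(64, 100, 1)    # plain SDRAM, 64-bit bus
ddr_64 = peak_bandwidth_mb_s(64, 100, 2)      # DDR: both clock edges
sdram_128 = peak_bandwidth_mb_s(128, 100, 1)  # SDRAM on a 128-bit bus

print(sdram_64, ddr_64, sdram_128)
```

On paper the 128-bit SDRAM bus and the 64-bit DDR bus come out the same, so
any real difference between them would have to come from latency and the
like, which this sketch does not model.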

> Some PC motherboards now offer memory interleaving to also increase the 
> bandwidth (but if you have a double-sided DIMM, you only need one).  
> Different implementations work better than others.  Some companies are still 
> working out bugs, so your mileage may vary.

Interleaving, that's what Apple called it. *doh* Its mileage and stability
varied considerably depending on the RAM in question.

 
> ----
> The whole Pentium line is based on a RISC-esque approach, basically the
> Pentium series was a risc core surrounded by a pre-processor that broke
> the cisc instructions down into risc instructions for processing. Which
> may explain the internal core running at 100mhz and the outer core of the
> processor running at a GHZ *ponders*
> ----
> 
> I meant RISC-esq in that the engineers did their best to decrease the 
> complexity/transistor count of the chip.  No, the core is what's running at 
> GHz levels, the I/O (system bus, FSB, what ever your favorite term is) runs 
> from 66-200Mhz depending.

I thought that's what they tried with the P-Pro and Merced.

> --
> But if you're doing super-scalar design, you would WANT to use RISC-esque
> instructions, wouldn't you?
> --
> 
> Not necessarily, super-scalar design only refers to the execution of 
> commands in parallel by different units within the processor.

Well, you want them all executed in the same clock cycle to keep the
timing. You want the results at the same time so you can move on to the
next instruction, which may depend on the results of both; you don't want
latency in the timing. The easiest way to implement this is to make all the
instructions take the same number of cycles.

Which to me means RISC-esque instructions.

I suppose you could do this with CISC instructions, but the timing gets all
screwed up and you lose the benefits of superscalar processing.

It's about the same scenario as multiprocessing.
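
A toy sketch of the timing argument, assuming a simple lockstep model where
a group of instructions is issued together and the next group cannot start
until the slowest one finishes. Real chips are more clever than this, and
the function name and latency values are made up for illustration.

```python
def idle_unit_cycles(latencies):
    """Unit-cycles wasted waiting for the slowest instruction in a group.

    latencies holds the cycle count of each instruction issued in parallel.
    """
    slowest = max(latencies)
    return sum(slowest - lat for lat in latencies)


uniform = [1, 1, 1, 1]  # every instruction takes one cycle
mixed = [1, 3, 1, 7]    # variable-length execution times

print(idle_unit_cycles(uniform))
print(idle_unit_cycles(mixed))
```

With uniform latencies no unit sits idle; with mixed latencies the fast
units spend most of the group waiting on the slow one.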

> 
> ---
> Everyone else in the industry uses RISC processing: Cray,
> Apple, Sun, IBM, HP, Digital, etc.
> --
> 
> Just as everyone is using CISC which has actually become a sort of RISC/CISC 
> hybrid, as you have said, CISC processors have not been truly CISC since 
> the 586 era (specifically, a company AMD purchased -- can't remember the 
> name -- pioneered the CISC->RISC translator technique which is why the K6 
> series of chips had such stellar performance (on everything but FP))
> 
The lines are definitely blurred, and honestly about all that is left is the
data pipelining and the instruction sizes, not the number of instructions
as was originally intended.

> --
> Pipe-lining is faster, since if you're trying to guess the next instruction
> you have fewer to pick from thus increasing the odds you are going to
> guess the correct one.
> --
> 
> Yes, pipelining is faster, to a certain extent.  Current gen processors 
> execute things out-of order (a part of super-scalar design) and because of 
> this, if they make a mistake, or guess the answer to a command wrong, the 
> entire pipeline has to be cleared, and re-started.  If you have a 2 stage 
> pipeline, no prob.  Even a 5 or 10 stage unit isn't that bad,

The G4s are in the 5-10 range; I know the 604e's had 5.

> but a 20 stage 
> unit getting cleared and re-started all the time has the potential to 
> severely cripple a chip, we better hope that Intel has engineered some damn 
> good branch prediction algorithms.
 
Heh, it didn't stop them from putting out MMX technology and claiming it was
the greatest, even though mixed instructions really screw up chip
performance.

I don't think anyone has ever claimed Intel has the greatest technology.
Cheaply mass-produced, yes. Best, no.
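
Going back to the pipeline-depth point, here is a rough sketch of why a deep
pipeline leans so hard on branch prediction. It uses the textbook
approximation that each mispredicted branch costs about one pipeline's worth
of cycles. The 20% branch mix and 10% mispredict rate are assumed numbers
for illustration, not measurements of any real chip.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once flushes are accounted for."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Assumed workload: 20% branches, 10% of them mispredicted.
for depth in (2, 5, 10, 20):
    cpi = effective_cpi(1.0, 0.20, 0.10, depth)
    print(f"{depth:2d}-stage pipeline: CPI ~ {cpi:.2f}")
```

Under those assumptions the flush cost grows linearly with depth, so the
20-stage design pays ten times the penalty of the 2-stage one for the same
predictor.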

 
> ---
> RISC instructions are equal in size, making it more efficient (i.e. they are
> all xx-bit instructions, not 8, 16 or 32 bit like CISC or x86
> instructions). That really doesn't come into play much in superscalar
> processing, but it does come into play when you're trying to get
> the maximum utilization of your available bandwidth. (8-bit hunks of data
> take the same bandwidth as 64-bit instructions unless you're stacking
> 8 8-bit chunks together down the bus. I don't think anyone does that,
> though.)
> ----
> 
> Yes, x86 chips can do that but it's still slightly slower. 

That's to be expected. You need to carry memory tags for 8 different sets
of data.

> --
> The most important aspect of borrowing from RISC for superscalar design is
> the instructions all execute in one clock cycle. You don't have the CISC
> approach of one instruction, taking multiple clock cycles. Even if you
> have variable length instructions, like 8, 16, 32 bit instructions with
> super-scalar processing you can line them up and execute them
> simultaneously. This is exactly what the G-4 does (borrowed slightly from
> Cray via SGI) with the altivec instruction set. it will process 4 32 bit
> instructions or 16 8-bit instructions simultaneously.
> --
> 
> Yeah, x86 chips were doing this years ago with MMx, 3dNow, and SSE.  The 
> Altivec unit does it better, but that's to be expected since it's a more 
> recent design.

Well, technically it's not that new of a design. It's been under development
for quite a few years, and is really geared towards embedded processors,
specifically digitizing voice and other communications devices. It's
Motorola's baby and that's what they do.
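
A small sketch of the 4-by-32 / 16-by-8 split mentioned above, assuming a
128-bit vector register. Strictly speaking the lanes hold data elements, and
one instruction operates on all of them at once. This is plain Python that
mimics the idea, not real AltiVec code, and the names are made up.

```python
VECTOR_BITS = 128  # assumed vector register width


def lanes(element_bits, vector_bits=VECTOR_BITS):
    """How many elements of a given size fit in one vector register."""
    return vector_bits // element_bits


def packed_add(a, b, element_bits):
    """Add two packed vectors lane by lane, wrapping on overflow."""
    assert len(a) == len(b) == lanes(element_bits)
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


print(lanes(32), lanes(8))
print(packed_add([1, 2, 3, 4], [10, 20, 30, 0xFFFFFFFF], 32))
```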

> 
> In short, I am not knocking RISC, --ALL-- recent processors of --ANY-- type 
> use RISC technologies heavily.  However, Intel's implementation in this 
> instance is not likely to prove anything but good marketing.  I will say 

Hey, don't knock their marketing, that's what made M$ rich. =)

> however, that CISC/RISC chips have consistently out-performed their Apple 
> cousins.  Mhz for Mhz, the Apple will win any day, but when Apple's selling 
> G4's @ 500Mhz, and AMD's selling Thunderbirds at 1.1Ghz. I'm betting on the 
> T-Bird.  I would guess that the 1Ghz T-Bird might beat a 500Mhz G4 by 
> something like 30-50%.  

I seriously doubt it's that much. Apple went with a wider data path rather
than higher bus speeds. The G4s sport a 128-bit bus, so they can run at half
the MHz with only a 5:1 processor-to-bus ratio, which doesn't leave the
processor starving for information either. You do end up with a slight
latency problem with the RAM access and the 100 MHz bus, which may equate to
a <5% performance difference.
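
To put rough numbers on the "starving for information" point, here is a
sketch of how many bytes the bus can deliver per core clock. The inputs are
the figures as described in this thread (128-bit bus at 100 MHz feeding a
500 MHz core, versus an assumed 64-bit bus at 100 MHz feeding a 1.1 GHz
core), not verified chip specs, and the function name is made up.

```python
def bytes_per_core_cycle(bus_bits, bus_mhz, core_mhz, transfers_per_clock=1):
    """Peak bytes the bus can supply for each core clock cycle."""
    return (bus_bits / 8) * bus_mhz * transfers_per_clock / core_mhz


wide_slow = bytes_per_core_cycle(128, 100, 500)     # 5:1 core-to-bus ratio
narrow_fast = bytes_per_core_cycle(64, 100, 1100)   # 11:1 core-to-bus ratio

print(f"128-bit bus, 500 MHz core: {wide_slow:.2f} bytes/cycle")
print(f" 64-bit bus, 1.1 GHz core: {narrow_fast:.2f} bytes/cycle")
```

Passing transfers_per_clock=2 for the second case shows how much a
double-pumped bus would close the gap under the same assumptions.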

> Also, RISC programs are significantly larger 

Not really that much larger, at least looking at the code for the same
applications going from the 68k to the PPC. It was around 10% larger, which
was substantial when you only had 1 meg to work with.

> and 
> somewhat harder to write than their CISC equivalents (at least in 
> assembler).  

Assembler sucks to write in anyway =)

> RISC has many advantages, but it also has many downfalls.  The 
> only really 'perfect' solution is a hybrid.  Also, Apple's G4 should no 
> longer be considered truly RISC since the Altivec added how many 
> instructions???  The G4 has nearly as many as an Athlon w/ MMX and 3DNow!!!

The G4 may not be pure RISC because of AltiVec, but the core of the G4 is
still very much RISC. The pipelining is still RISC, and the instructions are
still RISC.
  
> I am quite familiar with all current desktop class processors, including the 
> G4, and although on paper it appears to be better than the PIII or Athlon, 
> Real-world performance/price says otherwise.

Actually the price for the chip itself is high but competitive. You can
buy them in the 1000's for rather cheap. Even Apple's prices are fairly
competitive for what you get. 

I won't even throw in MP configs, even though the G4 was designed with the
intention of being an MP chip, but then we aren't comparing chips.

The G4 chip itself is used in vector-based computing in large mainframe
types of systems, in multiples. I wouldn't automatically jump to the
conclusion that it's necessarily a desktop-class processor.