OT, But interesting

Sean picasso@madflower.com
Sun, 6 Aug 2000 18:15:44 -0400 (EDT)


On Sun, 6 Aug 2000, Tim Schmidt wrote:

> >Does AMD use IBM's Copper technology? I know Intel licensed the Copper
> >Technology with the "coppermine" series,
> 
> Actually, Coppermine is a misnomer.  All Intel chips currently use aluminum 
> technology.  AMD is still using aluminum tech at its Texas foundry, where 
> all Durons and a few Thunderbirds are being produced; AMD is using copper 
> tech (dunno if it's licensed from IBM) at its Dresden plant in Germany, 
> where most of the Thunderbirds are being produced as well as all of the 
> upcoming Corvette mobile Athlons.

I was thinking someone licensed IBM's process in the x86 world. But now
you're making me chase down the articles I read a few months ago, because I
was thinking that plant in Germany had its own technique. It also could
be "secretly" licensed (there's a LOT of cloak-and-dagger stuff in the
computer industry) to help bolster AMD's image as a technology forerunner.
 
> >but it's a bit tricky to get it to
> >work right, especially with large chips. IBM is having a hard time getting
> >it to work correctly with the G-4 series chips, which have 2.5 times more
> >transistors than the G-3 series does.
> 
> AMD's Copper Thunderbirds (some ungodly amount of transistors, like 3x as 
> many as the G4) are working beautifully.

That really could be. The G-4 was kind of rushed out the door; it is
supposedly really two G-3s and the AltiVec unit on the same chip, with a
preprocessor using super-scalar techniques. That wasn't really the G-4
design, since it was supposed to be a true 64-bit design instead of two
32-bit cores with the AltiVec 128-bit SIMD processor.

> >
> >The copper technology rocks.  They took the G-3 300 design, shrank the die
> >from .20 to .18, lowered the voltage, and upped the clock to I believe
> >400MHZ, and basically ended up with a faster chip that used less energy. (I
> >think it was from 6 watts to 3.5 watts.) This is why Apple can ship the
> >_real_ G-3 chip in its portables and not some cut-down version like the
> >"mobile series". Also, it's why if you get a desktop box, you're more likely
> >to get a Motorola chip (which I believe still uses aluminum) and if you
> >get a portable you're more likely to get an IBM chip. It's also part of the
> >reason why they are shipping machines without cooling fans (iCube, iMac).
> 
> Yes, the lowering of voltage and wattage used is a product of the die shrink; 
> using copper instead of aluminum apparently won't make much of a difference 
> at .15, and will only start to show up when .13u chips hit shelves.  The 
> Thunderbirds use anywhere from 30 to 45W depending on speed.

So is using copper instead of aluminum. I don't think it necessarily shows
up speed-wise, but it sure does in heat/energy consumption.
35-45W seems like a lot of wasted heat/energy to me, but I think the G-4s
use like 12W, and the G-3s use around 5W.
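
Since the wattage keeps coming up, here's a minimal sketch of the usual
dynamic-power rule of thumb, P = C * V^2 * f. The capacitance and voltage
figures below are made up for illustration; only the 300 -> 400MHZ clock
bump comes from the thread above.

    # Rough dynamic-power estimate: P = C * V^2 * f.  A die shrink that lowers
    # capacitance and voltage can raise the clock and still cut the wattage.
    # The C and V numbers here are invented, not IBM's actual figures.

    def dynamic_power_watts(cap_farads, volts, freq_hz):
        return cap_farads * volts ** 2 * freq_hz

    # hypothetical .20 micron part vs. a shrunk, lower-voltage .18 micron part
    old = dynamic_power_watts(3.0e-9, 2.5, 300e6)   # ~5.6 W
    new = dynamic_power_watts(2.4e-9, 1.8, 400e6)   # ~3.1 W
    print(f"{old:.1f} W -> {new:.1f} W")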

> >
> >As far as stamping the speeds on the chips, IBM/Motorola looked at the
> >demand before stamping the chips. If they made a batch with a great yield
> >of 500MHZ chips, some of those might get stamped as 400MHZ just because
> >they were out of the 400MHZ chips. Which increases the chances of being
> >able to overclock them significantly. I don't know if Intel or AMD uses the
> >same technique or not.
> 
> Yes, all fabs do, including everything from graphics processors to memory 
> (assuming there's more than one speed grade available).

I kind of thought that was industry-wide. It makes sense =)
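
For what it's worth, the down-binning idea is easy to picture in code. This
is a toy sketch, not anyone's actual test flow; the grades, demand numbers,
and measured clocks are all invented.

    # Toy model of speed binning: each die is stamped at the highest grade it
    # passes, then down-binned to a slower grade when that's what demand
    # calls for.

    GRADES = [500, 400, 350]  # MHZ, fastest first

    def stamp(max_stable_mhz, demand):
        passing = [g for g in GRADES if max_stable_mhz >= g]
        for grade in passing:
            if demand.get(grade, 0) > 0:
                demand[grade] -= 1          # fill an order for this grade
                return grade
        return passing[0] if passing else None  # no open orders: keep top grade

    demand = {500: 1, 400: 2}
    print([stamp(mhz, demand) for mhz in (520, 510, 505, 430)])
    # -> [500, 400, 400, 400]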

> >
> >Clock chipping (overclocking just the chip) has traditionally been
> >unstable and never resulted in major _real_ performance gains. It's a
> >little different story than overclocking your entire board as far as the
> >benefits.
> 
> On PC's, OC'ing the chip can yield --massive-- benefits, like a 25-45% 
> performance increase (Duron 600Mhz -> 950Mhz).

MHZ != RL performance increase. More precisely, I haven't seen
chip clocking give a straight ratio of MHZ increase to RL performance; it's
usually a curve and much lower than the percentage of overclocked speed.
Then again, I don't necessarily trust benchmarks either; I have seen WAY too
many slanted test results. And manufacturers are very likely to squash
your site if you do post benchmarks.
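
To put a number on that curve, here's a back-of-the-envelope sketch. The
70% CPU-bound split is an assumption for illustration, not a measurement.

    # Amdahl-style estimate of a clock bump: only the CPU-bound share of the
    # workload scales with the core clock; the memory/bus-bound share doesn't.

    def real_speedup(clock_ratio, cpu_bound=0.7):
        return 1.0 / ((1.0 - cpu_bound) + cpu_bound / clock_ratio)

    # Duron 600 -> 950MHZ is a ~58% clock bump...
    print(f"{real_speedup(950 / 600):.2f}x")  # ...but only ~1.35x here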
 
> >
> >This throws out a couple of questions.
> >First MHZ to MHZ do the AMD and Intel chips stack up equally as far as RL
> >performance?
> 
> Yes and no.  Here's the breakdown:
> 
> Both AMD and Intel have a "flagship" and a "value" chip.  They both use the 
> same respective core, the only difference is (front side) bus speed and L2 
> Cache configuration.  AMD's Core is equal to Intel's in RW performance 
> except in FP where it's about 30% faster.
> 
> 
> Duron vs. Celeron
> 
> The Duron is approx 25% faster than the same speed Celeron, here's why:
> 
> -Duron: L1: 128k  L2: 64k  exclusive design (64bit bus, 16-way)
> -Celeron2 (Celermine): L1: 32k  L2: 128k  inclusive design (256bit bus, 
> 8-way)
> 
> As you can see, the Duron has 32k more cache, and more of its cache is 
> closer to the core.  The Duron, however, has another advantage, and that is 
> that info in the L1 does not have to be duplicated in L2, but the Celeron's 
> L1 has to be duplicated in L2...  Effectively giving the Celeron 96k of L2 
> and pushing the Duron to 64k more cache than the Celeron.
> 
> Thunderbird vs. Coppermine
> 
> The Thunderbird is significantly faster than Coppermine at lower speeds (ala 
> 600Mhz), but at 1Ghz, they score about equal with Coppermine coming out on 
> top in slightly more benchmarks.  This is due mainly to the Coppermine's 
> superior L2 Cache design.
> 
> -Thunderbird: L1: 128k  L2:256k exclusive design (64bit bus, 16-way)
> -Coppermine: L1: 32k  L2:256k inclusive design (256bit bus, 8-way)
> 
> As you can see, the Thunderbird has 128k more cache (counting the 
> inclusive/exclusive design) than the Coppermine.  The Thunderbird is even 
> more likely to find what it needs in its cache (16-way associative, which 
> divides the cache into 16 different caching sectors).  However, the 
> Coppermine has 4x the bandwidth to its L2 Cache, making it nearly as fast 
> as L1.
> 
So basically you're saying the difference in speed is not primarily because
of the MHZ of the chip itself, it's really due to the design of the
cache memory?
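
Spelling out the effective-capacity arithmetic from the breakdown above
(just the bookkeeping; associativity and bus width are ignored here):

    # Exclusive caches hold different lines in L1 and L2, so capacities add;
    # inclusive caches duplicate L1's contents in L2, so only L2's size counts.
    # Sizes are the ones quoted in the thread above.

    def effective_cache_kb(l1_kb, l2_kb, exclusive):
        return l1_kb + l2_kb if exclusive else l2_kb

    print(effective_cache_kb(128, 64, exclusive=True))    # Duron:       192k
    print(effective_cache_kb(32, 128, exclusive=False))   # Celeron2:    128k (64k less)
    print(effective_cache_kb(128, 256, exclusive=True))   # Thunderbird: 384k
    print(effective_cache_kb(32, 256, exclusive=False))   # Coppermine:  256k (128k less)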

> >If Intel is disabling some unstable code, could this be some of the code
> >specifically on the chip for the SPEC tests (Intel has a set of
> >instructions on the chip specifically written for the SPEC tests and
> >only used for that)?
> 
> Possible, you'd have to ask an engineer.

I just think it would be funny to run the SPEC tests with an off-the-shelf
chip with the SPEC stuff disabled.
 
> >
> >If the internal clock speed of the Pentium is running at 100MHZ what
> >is the benefit of clocking the rest of the chip to 1ghz? Your still
> >processing information internally at 100MHZ.
> 
> Coppermines run at a 133MHz FSB (front-side bus); that's the speed of the bus 
> it uses to communicate with memory and the PCI/AGP busses.  The Athlon/Duron family 
> runs at a 100MHz DDR bus (effectively 200MHz), soon to be increased to 266MHz 
> (133MHz x2); the upcoming P IV Willamette runs a 400MHz bus (really 100MHz 
> x4, like mega-DDR).

I haven't really read up on DDR memory. Is DDR _kind of like_ RAID striping
for memory? *needs to go visit Tom's* Or does it actually use 2x the bits in
the bus width? (Well, it wouldn't be _quite_ 2x, I wouldn't think, because you
have 2 memory addresses.) I mean, your memory is only 100MHZ, you can't go
faster than that, but you can read/write ~2x as fast if you split it and
write to two simultaneously with 2x the bus width, but that still doesn't
increase the seek time for the actual memory itself. And you still end up
with a timing/processor latency problem.
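
If it really is just moving data on both edges of the same 100MHZ clock,
the back-of-the-envelope arithmetic would look something like this. The
64-bit bus width is an assumption (typical SDRAM DIMM width), and none of
this says anything about latency, which stays tied to the real clock.

    # Peak bandwidth = clock * bus width * transfers per clock.

    def peak_mb_per_s(clock_mhz, bus_bits=64, transfers_per_clock=1):
        return clock_mhz * (bus_bits / 8) * transfers_per_clock

    print(peak_mb_per_s(100))                          # PC100 SDR:    800 MB/s
    print(peak_mb_per_s(100, transfers_per_clock=2))   # 100MHZ DDR:  1600 MB/s
    print(peak_mb_per_s(133, transfers_per_clock=2))   # 133MHZ DDR: ~2128 MB/s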

Apple tried something similar to this around '95-96. I forgot what they
called it, but you put matched pairs of DIMMs in the corresponding
A/B slots and you got about a 3-5% performance boost; it didn't
increase the width of the bus, though.

> The Willamette takes a RISC-esque approach that will be 
> its downfall 

The whole Pentium line is based on a RISC-esque approach; basically the
Pentium series was a RISC core surrounded by a pre-processor that broke
the CISC instructions down into RISC instructions for processing. Which
may explain the internal core running at 100MHZ and the outer part of the
processor running at a GHZ *ponders*
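
Just to picture that decode step, a toy sketch; the instruction names and
micro-ops here are invented for illustration, not Intel's actual encoding.

    # Toy decode step: a complex x86-style instruction gets broken into simple
    # RISC-like micro-ops that the core executes.

    DECODE_TABLE = {
        "ADD [mem], reg": ["LOAD tmp, [mem]", "ADD tmp, reg", "STORE [mem], tmp"],
        "INC reg":        ["ADD reg, 1"],
    }

    def decode(instruction):
        # simple instructions pass straight through as a single micro-op
        return DECODE_TABLE.get(instruction, [instruction])

    for insn in ("ADD [mem], reg", "INC reg"):
        print(insn, "->", decode(insn))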

> however, instead of adding transistors for parallel processing 
> (called super-scalar design) 

But if you're doing super-scalar design you would WANT to use RISC-esque
instructions, wouldn't you?

Everyone else in the industry uses RISC processing: Cray,
Apple, Sun, IBM, HP, Digital, etc.

Pipelining is faster, since if you're trying to guess the next instruction
you have fewer to pick from, thus increasing the odds you are going to
guess the correct one.

RISC instructions are equal in size, making them more efficient (i.e., they are
all xx-bit instructions, not 8-, 16-, or 32-bit like CISC or x86
instructions). Which really doesn't come into play much in superscalar
processing, but it does come into play when you're trying to get
the maximum utilization of your available bandwidth. (8-bit hunks of data
take the same bandwidth as 64-bit instructions unless you're stacking
8 8-bit chunks together down the bus... I don't think anyone does that,
though.)

The most important aspect of borrowing from RISC for superscalar design is
that the instructions all execute in one clock cycle. You don't have the CISC
approach of one instruction taking multiple clock cycles. Even if you
have variable-length instructions, like 8-, 16-, or 32-bit instructions, with
super-scalar processing you can line them up and execute them
simultaneously. This is exactly what the G-4 does (borrowed slightly from
Cray via SGI) with the AltiVec instruction set: it will process 4 32-bit
operations or 16 8-bit operations simultaneously.
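
A little sketch of what those 128-bit lanes look like, with plain Python
lists standing in for the vector register; the lane widths are the only
part taken from the thread, the rest is illustration.

    # One "vector instruction" operates on every lane of a 128-bit register at
    # once: 4 x 32-bit lanes or 16 x 8-bit lanes.

    def vector_add(a, b, lane_bits):
        assert len(a) == len(b) == 128 // lane_bits
        mask = (1 << lane_bits) - 1
        return [(x + y) & mask for x, y in zip(a, b)]  # each lane wraps on its own

    print(vector_add([1, 2, 3, 4], [10, 20, 30, 40], lane_bits=32))  # 4 lanes
    print(vector_add(list(range(16)), [1] * 16, lane_bits=8))        # 16 lanes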

> it drastically reduces the number of 
> transistors and lengthens the pipeline (to 20 stages, as compared to the 
> Athlon's 13 stages and the PIII's 11).

Even if the pipeline is longer, that doesn't equate to slower, especially since
my guess is that the pipelining is breaking up the old CISC code into new
RISC instructions. Which means Wintel is taking a hardware-based move to
RISC versus Apple's transition in software. (When Apple moved from the
Motorola 68k CISC processors to the PPC processors, everything
basically ran under software emulation on the PPC side unless the program
was specifically compiled for the PPC; they did have "fat" binaries which
basically contained both PPC and 68k CISC-based code.)
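
One way to see the trade-off: a longer pipeline doesn't slow the common
case, but every mispredicted branch flushes roughly the pipeline depth in
cycles. A rough sketch; the branch frequency and miss rate below are
assumptions, not figures for these chips.

    # Average cycles per instruction = ideal 1 plus the expected flush cost.

    def cpi(pipeline_stages, branch_freq=0.2, miss_rate=0.1):
        return 1.0 + branch_freq * miss_rate * pipeline_stages

    for name, stages in (("PIII", 11), ("Athlon", 13), ("Willamette", 20)):
        print(f"{name:10s} ~{cpi(stages):.2f} CPI at the same clock")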
 
>  This allows the chip to be run much 
> cooler and hence clocked much higher, but it also --drastically-- reduces 
> performance in the real world.  

That's not true. RISC processors are more _efficient_, which is why Apple
gets away with using RISC at slower clock speeds; they use fewer
transistors, less energy, etc., because of the equal instruction sizes/clock
cycles. Of course, Motorola and IBM aren't using a true RISC design anymore
either, since the instruction set has 220+ instructions, not counting
AltiVec, which I believe adds 228 more instructions; a far cry from the
original 16- or 32-instruction set.

> It appears as if Intel is trying for the 
> crowd that looks at the Mhz rating, in essence, they're pulling a similar 
> trick to Cyrix's PR ratings.

It's great marketing, though. I mean, I seriously doubt there are that
many people even on this list who judge performance by anything other than
MHZ, much less someone off the street. Especially "gamers" who buy 5k
machines so they can play Half-Life or whatever the new hot game is.

Also, the fewer the transistors, the cheaper it is to produce. RISC designs
are usually cheaper to design and manufacture. This means more money in
their pocket; you know they won't drop the price.

> >
> >This in my mind throws out the question: so what if the chip is running
> >at 1GHZ or 1.1GHZ? You are better off with a 1GHZ chip on a 200MHZ board
> >(or running your 1.1GHZ chip at 1GHZ). This is simply because of the wait
> >states involved. If you have a 200MHZ board with a 1GHZ chip and a
> >66MHZ PCI bus, you have an even 1:3 ratio between the PCI and the board
> >and an even ratio of 5:1 between the chip and the board. This translates
> >into NO wasted time for parts of a clock cycle. Your chip is running 15x
> >the PCI bus.
> 
> PCI bus in PCs runs at 33Mhz @ 32bit.  Some super-high end 
> Workstation/server boards have 66Mhz 64bit PCI. 

My bad; I thought 66MHZ was fairly standard now and they just dropped the
33MHZ onboard for backwards compatibility.

> The Intel chip @ 1133Mhz 
> runs an 8.5x multiplier (8.5*133 = 1133).  However, AMD's 1.1Ghz Athlon runs a 
> 5.5x multiplier (5.5*200 = 1100 (really an 11x multiplier: 11*100 = 1100, 
> but effectively 5.5))

Is this really "effectively"? (see DDR question =))
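
Spelling out the multiplier arithmetic from the paragraph above; the only
wrinkle is whether you quote the DDR bus at its real clock or its doubled
"effective" rate.

    # Core clock = front-side-bus clock * multiplier.

    def core_mhz(fsb_mhz, multiplier):
        return fsb_mhz * multiplier

    print(core_mhz(133.33, 8.5))  # Coppermine: ~1133
    print(core_mhz(100, 11))      # Athlon 1100 against the real 100MHZ clock
    print(core_mhz(200, 5.5))     # the same 1100 quoted against the 200MHZ DDR rate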

>  You are correct that this puts AMD's chip in a much 
> more favorable position.  PCs are generally not run asynchronously.  The 
> only exception is VIA's recent chipsets (and Intel's i815) which allow you 
> to run your SDRAM either 33Mhz higher or lower than the system bus speed.  
> Tests have shown that running it 33Mhz faster doesn't make much of a 
> difference (on the order of 1 or 2%), and slower slows it down about 5%.

IBM's flexible-speed bus should be out soon, if they aren't already
shipping it in their servers.

Running the memory at higher speeds doesn't solve the bus speed
problem, which is why you don't see dramatic differences in the numbers
you listed.

> >
> >If you're running at 1.1GHZ you have the 1:3 ratio between the PCI and the
> >board, and a 5.5:1 ratio between the chip and the bus. If the buses can
> >only exchange data when the cycles match, you are actually waiting for the
> >chip to go around to the 11:2 ratio; the extra wait time involved actually
> >slows your machine down unless you are performing 11 instructions on
> >every piece of data.
> 
> Incorrect.  Deep buffers are used to hold information until the other bus is 
> ready; however, this is only used in a few chipsets, as mentioned above.

So we are back to the cache/bus speed problem?
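
The deep-buffer idea is basically a FIFO sitting between the two clock
domains; a toy sketch, not any chipset's actual logic.

    # Writes from the fast side queue up and the slow side drains them on its
    # own clock, so the two buses don't have to line up cycle for cycle.

    from collections import deque

    class BusBuffer:
        def __init__(self, depth):
            self.depth = depth
            self.fifo = deque()

        def push(self, word):
            """Fast-side write; returns False (stall) only when the FIFO is full."""
            if len(self.fifo) >= self.depth:
                return False
            self.fifo.append(word)
            return True

        def drain(self):
            """Slow-side read on its own clock edge."""
            return self.fifo.popleft() if self.fifo else None

    buf = BusBuffer(depth=4)
    for word in range(3):
        buf.push(word)
    print([buf.drain() for _ in range(4)])  # [0, 1, 2, None]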

> >
> >Or am I completely offbase?
> >
> 
> Not completely, just slightly.  Put it this way, the engineers are about a 
> half step ahead of you ;)

I hope so. I don't even have a degree, much less ever studied EE. Heck, I'm
still having problems trying to remember how to use my multimeter...

Which I will say, for as much as I rip on RatShack, this impressed me:
the multimeter is ~10 years old with the original batteries, and it was stored
in the barn for the last 6 years. I pulled it out yesterday. Not only did
it work, but there was NO corrosion from the RatShack batteries.