Monday, November 30, 2020

KiloCore Pushes On-Chip Scale Limits with Killer Core

We have profiled a number of processor updates and novel architectures this week in the wake of the Hot Chips conference, many of which have focused on clever FPGA implementations, specialized ASICs, or additions to well-known architectures, including Power and ARM.

Among the presentations that offered yet another way around the Moore’s Law wall was KiloCore, a 1,000-core processor from UC Davis researchers, who noted during Hot Chips (and the press repeated) that it was the first chip to put 1,000 processors on a single die. Actually, Japanese startup Exascaler, Inc. beat them to it with the PEZY-SC, a 28nm MIMD processor with 1,024 cores that has placed on the Green500 list and won a few system deployments in Japan. That hiccup aside, the MIMD-based KiloCore approach is interesting, and it posts some noteworthy results compared to similar efforts.

KiloCore has proven successful on both the energy and performance fronts. As one of the project leads, Dr. Bevan Baas, showed at Hot Chips, the processors-per-die curve has remained relatively flat over the last several years, and KiloCore represents a huge leap on the chart he presented.

That kind of leap is nothing new for Dr. Bevan Baas, who spent decades immersed in low-power, high-performance processor design before becoming a professor at UC Davis. In the late 1980s he was one of the designers of a high-end minicomputer in HP’s Computer Systems Division, before joining Atheros Communications, where he helped develop the first IEEE 802.11a wireless LAN solution. He now focuses on algorithms, architectures, and circuits as part of the VLSI Computation Lab at UC Davis.

“KiloCore has been designed with the needs of computationally-intensive applications and kernels in mind. It is meant to act as a co-processor within a larger system and isn’t intended to run an operating system itself. There could be some cases in applications or systems where it could act as a sole processor, but they wouldn’t be general purpose systems,” Baas explains.

Each processor holds up to 128 instructions; larger programs are supported on processors adjacent to a shared memory block. Programs are loaded when the application is programmed and started at runtime. The idea is to program all of the processors at once, start them together, and let them go, Baas says. Applications can also optionally request that processors be reprogrammed at runtime based on signals from the processors, so a processor might reach a certain point in execution and send a packet to the administrator with a reprogramming request. Alternatively, groups of processors can do the same. “Most applications we have tested don’t use or need this feature, but it is possible,” Baas notes.
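
The program-loading model can be sketched in Python as follows. Only the 128-instruction limit comes from the article; the `Core` and `Administrator` classes, the `near_shared_memory` flag, and the placeholder program library are hypothetical. Because the article does not say how large programs on processors next to shared memory can be, the sketch simply leaves them uncapped.

```python
from dataclasses import dataclass, field

IMEM_WORDS = 128  # per-processor instruction capacity cited for KiloCore


@dataclass
class Core:
    core_id: int
    near_shared_memory: bool = False  # hypothetical flag: larger programs allowed here
    program: list[str] = field(default_factory=list)

    def load(self, program: list[str]) -> None:
        # Ordinary cores reject programs longer than their instruction memory.
        # The article gives no upper bound for cores next to shared memory,
        # so this sketch does not cap them.
        if len(program) > IMEM_WORDS and not self.near_shared_memory:
            raise ValueError(
                f"core {self.core_id}: {len(program)} words > {IMEM_WORDS}"
            )
        self.program = list(program)


class Administrator:
    """Hypothetical manager that programs the array and services reprogram requests."""

    def __init__(self, cores: list[Core], library: dict[str, list[str]]):
        self.cores = {c.core_id: c for c in cores}
        self.library = library  # program name -> instruction words
        self.history: list[tuple[int, str]] = []

    def program_all(self, assignment: dict[int, str]) -> None:
        # Load every core up front, before the array is started.
        for core_id, name in assignment.items():
            self.cores[core_id].load(self.library[name])

    def request_reprogram(self, core_id: int, name: str) -> None:
        # Called when a core signals that it has reached a point needing new code.
        self.cores[core_id].load(self.library[name])
        self.history.append((core_id, name))


# Placeholder programs: each "op" string stands in for one instruction word.
library = {"fir": ["op"] * 40, "fft": ["op"] * 120, "big": ["op"] * 300}
cores = [Core(0), Core(1), Core(2, near_shared_memory=True)]
admin = Administrator(cores, library)
admin.program_all({0: "fir", 1: "fft", 2: "big"})
admin.request_reprogram(0, "fft")  # core 0 asks to switch programs mid-run
```

Loading the 300-word program onto core 1 would raise `ValueError`, while core 2 accepts it because it is marked as sitting next to shared memory.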

Data is passed between processors via messages, which means it doesn’t need to hop through a processor’s memory. Messages move directly from the software on one processor to the software on another. At its simplest, this is a read/write with a synchronization step between the two processors, which is part of what makes it possible to scale to thousands, or even tens of thousands, of processors without the programmer needing to worry about synchronization routines and the like. The goal, according to Baas, is that “they sort themselves out.”
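
As a rough illustration of this message-passing model, the Python sketch below treats the link between two processors as a bounded, blocking FIFO (`queue.Queue`) and each processor as a thread. The FIFO depth, the squaring workload, and all function names are illustrative assumptions, not KiloCore details. The point is that a read blocks until data arrives and a write blocks when the link is full, so the two sides stay synchronized without any explicit locks in the program code.

```python
import queue
import threading

FIFO_DEPTH = 4  # hypothetical link buffer depth
END = None      # sentinel marking the end of the stream


def producer(link: queue.Queue, values) -> None:
    for v in values:
        link.put(v)  # blocks while the FIFO is full (back-pressure)
    link.put(END)


def consumer(link: queue.Queue, results: list) -> None:
    while True:
        v = link.get()  # blocks until the producer has written a word
        if v is END:
            break
        results.append(v * v)


def run_pipeline(values) -> list:
    link: queue.Queue = queue.Queue(maxsize=FIFO_DEPTH)
    results: list = []
    workers = [
        threading.Thread(target=producer, args=(link, values)),
        threading.Thread(target=consumer, args=(link, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(run_pipeline(range(5)))  # [0, 1, 4, 9, 16]
```

Because the only shared state is the FIFO itself, adding more producer/consumer pairs does not add any synchronization code, which is the property the scaling argument relies on.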

One final word about programming: the way applications are implemented fits the architecture well. Applications are broken down into mini-programs of 128 words or less through a series of steps: the application is first partitioned into coarse-grained tasks, the task code is then partitioned into serial code blocks, and parallelizable code blocks are replicated. Ultimately, this means a KiloCore array can run several different tasks at once, as seen in the example pictured. This sounds easy enough in a brief explanation, but one has to imagine there’s significant development overhead.
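
The decomposition steps can be sketched in Python as a simple pipeline of code blocks, where each serial block gets one core and a parallelizable block is replicated across several cores that each handle a contiguous slice of the data. The `Block` type, the `assign_cores` and `run` helpers, the replica count, and the example workload are all hypothetical. A real KiloCore mapping would also have to respect the 128-word program limit and the physical layout of the array, which this sketch ignores, and the replicas here run one after another as stand-ins for concurrent cores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Block:
    name: str
    fn: Callable[[list[int]], list[int]]
    replicas: int = 1  # > 1 only for blocks marked parallelizable


def assign_cores(blocks: list[Block]) -> dict[str, list[int]]:
    """Give each serial block one core and each replica of a parallel block its own core."""
    mapping: dict[str, list[int]] = {}
    next_core = 0
    for b in blocks:
        mapping[b.name] = list(range(next_core, next_core + b.replicas))
        next_core += b.replicas
    return mapping


def run(blocks: list[Block], data: list[int]) -> list[int]:
    """Push data through the blocks in order (replicas run sequentially in this sketch)."""
    for b in blocks:
        if b.replicas == 1:
            data = b.fn(data)
            continue
        # Split into contiguous slices, one per replica, then concatenate the results.
        size = -(-len(data) // b.replicas)  # ceiling division
        slices = [data[i * size:(i + 1) * size] for i in range(b.replicas)]
        data = [x for s in slices for x in b.fn(s)]
    return data


blocks = [
    Block("scale", lambda xs: [3 * x for x in xs]),                            # serial
    Block("keep_even", lambda xs: [x for x in xs if x % 2 == 0], replicas=4),  # replicated
    Block("total", lambda xs: [sum(xs)]),                                       # serial reduction
]
print(assign_cores(blocks))          # {'scale': [0], 'keep_even': [1, 2, 3, 4], 'total': [5]}
print(run(blocks, list(range(10))))  # [60]
```

Splitting into contiguous slices rather than round-robin keeps the merged output in order even when a replicated block drops elements, as the `keep_even` filter does here.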
