

# **Next Generation Multi-Purpose Microprocessor**

# Abstract for Presentation at MPSA, 4<sup>th</sup> of November 2009

Jiri Gaisler, Aeroflex Gaisler, Kungsgatan 12, SE-411 91, Göteborg, Sweden, Tel +46 31 7758650, jiri@gaisler.com

# **1 ABSTRACT**

The Next Generation Multi-Purpose Microprocessor (NGMP) will be a SPARC V8(E) based multi-core architecture that provides a significant performance increase compared to earlier generations of European space processors. Work on defining the NGMP is currently ongoing at Aeroflex Gaisler, Gothenburg, Sweden.

The presentation will describe the baseline architecture together with the refinements and additions that have been made during the first half of the activity's definition phase. The presentation will point out key choices that have been made and will also emphasise points that are still open. The software environments and operating systems that will be available for the NGMP, together with a general overview of the new LEON4 microprocessor, will also be presented.

It should be noted that the presentation will describe the current state of Aeroflex Gaisler's work on the NGMP. The choices presented have not passed a System Requirements Review.



## **1.1 Architectural Overview**

Illustration 1: NGMP Block Diagram

Illustration 1 above depicts an overview of the NGMP architecture. The system will consist of five AHB buses: one 128-bit processor bus, one 128-bit memory bus, two 32-bit I/O buses and one 32-bit bus for debugging. The processor bus will include four LEON4FT cores connected to a shared L2 cache. The memory bus is located between the L2 cache and the main external memory interface and will include a memory scrubber and possibly on-chip memory.

|  | Doc. No.: |            | NGMP-MPSA-ABS |        |
|--|-----------|------------|---------------|--------|
|  | Issue:    | 1          | Rev.:         | 1      |
|  | Date:     | 2009-10-20 | Page:         | 2 of 4 |

The I/O bus has been split into two separate buses. All slave interfaces have been placed on one bus (the slave I/O bus) and all master interfaces on the other (the master I/O bus). The master I/O bus connects to the processor bus via an I/O MMU that provides address translation and access restriction.

The two I/O buses host all peripheral units, such as the PCI, HSSL and SpaceWire cores. The dedicated 32-bit debug bus connects one debug support unit (DSU), one JTAG debug link, one SpaceWire RMAP target and one USB debug communication link. Because the debug bus is not placed behind an I/O MMU, it allows non-intrusive debugging through the DSU as well as direct access to the complete system.

The target frequency for the LEON4FT cores and the on-chip buses is 400 MHz, but the achievable frequency ultimately depends on the implementation technology.

The key components of the NGMP system are:

#### Processor core

- LEON4FT with 32 + 16 kByte cache, SRMMU, physical snooping, 32-bit MUL
- GRFPU floating-point unit, either shared between pairs of LEON4FT cores with a 4-word instruction FIFO, or dedicated to each processor core
- Internal timer unit with four timers and watchdog functionality
- Internal interrupt controller

#### Peripherals

- SpaceWire core, 200 Mbit/s, RMAP target, redundant link drivers, DMA
- 10/100/1000 Mbit/s Ethernet MAC with EDCL and DMA
- PCI core with 32-bit data, 66 MHz, master/target/host functionality
- ESA PCI arbiter with support for four agents
- High-Speed Serial Link, if available at time of implementation
- USB debug link
- 16-bit General purpose I/O port controller
- 8-bit UART interface
- JTAG debug link with AHB master interface
- SpaceWire RMAP target for dedicated SpaceWire debug link

#### System support cores

- AHB status register
- Debug support unit with performance monitoring and AHB/INST trace buffers
- Timer with five 32-bit timers, of which one is intended for use as a watchdog
- General purpose register for clock gating
- Multiprocessor interrupt controller
- Secondary interrupt controllers
- Memory scrubber

#### Memory interfaces

- 64-bit DDR2 interface, 400 MHz (DDR2-800), with 16 or 32 RS ECC bits
- 8/16-bit PROM/IO controller with BCH ECC


## **1.2 LEON4 Microprocessor and L2 Cache**

The LEON4 processor is the latest processor in the LEON series. LEON4 is a 32-bit processor core conforming to the IEEE-1754 (SPARC V8) architecture. It is designed for embedded applications, combining high performance with low complexity and low power consumption. LEON4 improvements over the LEON3 processor include:

- Branch prediction
- 64-bit pipeline with single cycle load/store
- 128-bit wide L1 cache
- 4-port register file

The LEON4FT processor connects to an AMBA AHB bus with 128-bit wide data vectors. Each bus transfer therefore carries four 32-bit words, which gives a 4x performance increase for cache line fills compared with a 32-bit AHB bus. Single-cycle load and store instructions increase performance and also take advantage of the wider AHB bus.

Static branch prediction has been shown to give an overall performance increase of 10%. The LEON4 also supports the SPARC V9 compare-and-swap (CASA) instruction, which improves lock handling and performance.

Key factors for high processor performance and good SMP scaling are high memory bandwidth and low latency. As previously described, a 128-bit AHB bus will be used to connect the LEON4 processors. To mask memory latency, the new Aeroflex Gaisler L2 cache will be used as a high-speed buffer between the external memory and the AHB bus. A read hit to the L2 cache typically requires 3 clocks, while a write takes 1 clock. A 32-byte cache line fetch will be performed as a burst of two 128-bit reads. The first read will have a delay of 3 clocks, and the second word will be delivered after one additional clock. A cache line will thus be fetched in 4 clocks (3 + 1). Error correction will add one clock of latency to all read accesses to allow time for checksum calculation.

## **1.3 Improved Multi-Processor Debugging Support**

The NGMP platform will include new and improved debug and profiling facilities compared to the LEON2FT. The features include:

- AHB bus trace buffer with filtering and counters for statistics
- Processor instruction trace buffers with filtering
- Performance counters in each processor core
- Dedicated debug communication links that allow non-intrusive accesses to the processors' debug support unit
- Hardware break- and watchpoints
- Monitoring of data areas

## **1.4 Software Support**

The GRMON debug monitor from Aeroflex Gaisler will be extended to support all new debugging and profiling functionality included in the NGMP. The hardware platform will provide full backwards compatibility with existing LEON3FT software, and any standard compiler that produces correct SPARC V8 code can be used.

Board support packages for the NGMP will be delivered for the following operating systems:

- RTEMS 4.8 and 4.10
- eCos
- VxWorks 6.7
- Linux 2.6


## **1.5 Conclusion**

The development of the Next Generation Multi-Purpose Microprocessor is currently in the definition phase. The NGMP will be a SPARC V8(E) based multi-core architecture that provides a significant performance increase compared to earlier generations of European space processors, with high-speed interfaces such as SpaceWire and Gigabit Ethernet on-chip. The platform will have improved support for profiling and debugging compared to previous generations of European space processors. It will also have a rich set of software immediately available, owing to backwards compatibility with existing SPARC V8 software and LEON3 board support packages.