Matching an application's performance, power, memory and interface requirements to a specific embedded processor can be a daunting task for designers since similar systems can vary significantly. Although ARM® processors are available in a dozen variations, system designers seldom find a "perfect fit".
In this article, various standard interfaces are highlighted along with suggestions on how they may differ among embedded chip vendors. Understanding the basic interfaces can help designers prioritize which ones should be on-chip. However, while standard interfaces serve a valuable purpose, there is also an additional need for on-chip interfaces to be customized to provide additional on-chip resources. The article describes two of these peripheral blocks.
The universal serial bus (USB) interface was initially developed to connect personal computers to peripherals. Over time it has become popular for industrial and infrastructure applications. Human interface devices (HIDs) such as keyboards and mice, along with instruments such as oscilloscopes, typically employ the USB interface, meaning it must be supported by the system's embedded processor. The most effective way to accomplish this is with an on-chip peripheral.
In addition to HIDs, two other device classes can be utilized in industrial and infrastructure applications. USB communication device class (CDC) was designed for modems and faxes but also supports simple networking by providing an interface for transmitting Ethernet packets. Similarly, USB mass storage device (MSD) targets hard disk drives and other storage media.
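These device classes are advertised to the host through standard USB descriptors. As a sketch, a USB 2.0 device descriptor is an 18-byte structure whose class field (reported at the device level in bDeviceClass, or per interface in bInterfaceClass) carries the class code: 0x02 for CDC, 0x03 for HID and 0x08 for mass storage.

```c
#include <stdint.h>

/* USB 2.0 standard device descriptor (18 bytes, little-endian fields).
 * Packed so the in-memory layout matches what goes on the wire. */
struct __attribute__((packed)) usb_device_descriptor {
    uint8_t  bLength;            /* size of this descriptor: 18 */
    uint8_t  bDescriptorType;    /* 0x01 = DEVICE */
    uint16_t bcdUSB;             /* 0x0200 for USB 2.0 */
    uint8_t  bDeviceClass;       /* 0x00 = class given per interface */
    uint8_t  bDeviceSubClass;
    uint8_t  bDeviceProtocol;
    uint8_t  bMaxPacketSize0;    /* EP0 max packet: 8, 16, 32 or 64 */
    uint16_t idVendor;
    uint16_t idProduct;
    uint16_t bcdDevice;
    uint8_t  iManufacturer;
    uint8_t  iProduct;
    uint8_t  iSerialNumber;
    uint8_t  bNumConfigurations;
};

/* Class codes reported in bDeviceClass or bInterfaceClass */
enum { USB_CLASS_CDC = 0x02, USB_CLASS_HID = 0x03, USB_CLASS_MSD = 0x08 };
```

A HID keyboard, for example, typically sets bDeviceClass to 0x00 and reports class 0x03 in its interface descriptor instead.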
The USB 2.0 specification requires the host to initiate all inbound and outbound transfers. The specification also defines three basic devices: host controllers, hubs, and peripherals.
USB 2.0's physical interconnect is a tiered-star topology with a hub at the center of each star. Each wire segment is a point-to-point connection between the host and a hub or function, or a hub connected to another hub or function.
The addressing scheme used for devices in a USB 2.0 system allows for up to 127 devices to be connected to a single host. These 127 devices can be any combination of hubs or peripherals. A compound or composite device will account for two or more of these 127 devices.
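The 127-device limit follows directly from USB's 7-bit device address field: 2^7 = 128 values, with address 0 reserved for devices that have not yet been assigned an address during enumeration. A quick sketch:

```c
#include <stdint.h>

#define USB_ADDR_BITS 7

/* Address 0 is reserved for the default (unenumerated) state,
 * leaving 2^7 - 1 = 127 assignable addresses per host. */
static inline unsigned usb_max_devices(void) {
    return (1u << USB_ADDR_BITS) - 1;
}
```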
Although USB 2.0 is likely the first choice in industrial and many infrastructure applications, USB On-the-Go (OTG) is deployed when peripheral devices need to communicate with each other without any involvement from the host. To accommodate peer-to-peer communication, USB OTG introduced a new class of devices containing limited host capabilities for two peripherals to share data.
The OTG supplement defines a new handshake called the host negotiation protocol (HNP). Using HNP, a device connected as a default peripheral can request that it become the host. This allows the existing USB 2.0 host-device paradigm to provide peer-to-peer communication. A session request protocol (SRP) is also defined.
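As a conceptual sketch of HNP — not the full OTG state machine, which also involves bus suspend, pull-up signaling and timeouts — the role swap can be modeled as follows; the type and function names are illustrative:

```c
#include <stdbool.h>

/* Minimal model of the two roles on an OTG link. The A-device is the
 * default host; the B-device is the default peripheral. */
typedef enum { ROLE_HOST, ROLE_PERIPHERAL } otg_role;

typedef struct {
    otg_role a_device;   /* default host (A-device)       */
    otg_role b_device;   /* default peripheral (B-device) */
} otg_bus;

static void otg_reset(otg_bus *bus) {
    bus->a_device = ROLE_HOST;
    bus->b_device = ROLE_PERIPHERAL;
}

/* HNP: the B-device requests the host role; the A-device grants it
 * and the two swap roles for the remainder of the session. */
static bool hnp_request(otg_bus *bus) {
    if (bus->b_device != ROLE_PERIPHERAL)
        return false;    /* already host, nothing to do */
    bus->a_device = ROLE_PERIPHERAL;
    bus->b_device = ROLE_HOST;
    return true;
}
```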
USB's popularity and status as a solid standard makes it possible for embedded processor vendors to offer software libraries that target specific USB functionality and therefore significantly trim development time. Instead of writing their own code to implement the interface, system designers simply make a function call.
The libraries should be certified as having passed USB device and embedded host compliance testing conducted by the USB Implementers Forum. Some vendors, such as Texas Instruments (TI), offer extensive USB libraries for their embedded processors.
In 2007, the USB 3.0 Promoter Group was formed to create a faster USB variant that is backward compatible with previous USB standards but delivers ten times the data rate of USB 2.0. USB 3.0 uses a new signaling scheme; backward compatibility is maintained by keeping the USB 2.0 two-wire interface. Although this faster version is in the early stages of deployment, USB 2.0 will likely remain the most popular USB variant for several years with its three speed options: low-speed (1.5 Mbps), full-speed (12 Mbps) and high-speed (480 Mbps).
Although an interface conforming to the IEEE 802.3 Ethernet standard is often incorrectly referred to as an Ethernet media access controller (EMAC), a complete EMAC subsystem interface actually consists of three modules, each of which may or may not be integrated on chip:
- The physical layer interface (PHY)
- The Ethernet MAC, which implements the EMAC layer of the protocol
- A custom interface typically referred to as the MAC control module
|Figure 1: EMAC subsystem.|
The EMAC control module controls device interrupts and incorporates an 8 kbyte internal random access memory (RAM) to hold EMAC buffer descriptors. The MDIO module implements the 802.3 serial management interface to interrogate and control up to 32 Ethernet PHYs connected to the device by using a shared two-wire bus.
Host software uses the MDIO module to configure the auto-negotiation parameters of each PHY attached to the EMAC, retrieve the negotiation results, and configure required parameters in the EMAC module for correct operation. The module is designed to allow almost transparent operation of the MDIO interface, with very little maintenance from the core processor.
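The management frames the MDIO module shifts out have a fixed 32-bit format defined in 802.3 Clause 22: a start field, an opcode, 5-bit PHY and register addresses, a turnaround field and 16 data bits. As a sketch, a write frame (the 32 bits following the preamble) can be assembled in software like this:

```c
#include <stdint.h>

/* Build an IEEE 802.3 Clause 22 MDIO write frame: ST=01 (bits 31:30),
 * OP=01 for write (bits 29:28), 5-bit PHY address (27:23), 5-bit
 * register address (22:18), TA=10 (17:16), then 16 data bits. */
static uint32_t mdio_write_frame(uint8_t phy, uint8_t reg, uint16_t data) {
    return (0x1u << 30)                       /* ST = 01 */
         | (0x1u << 28)                       /* OP = 01 (write) */
         | ((uint32_t)(phy & 0x1F) << 23)     /* PHY address */
         | ((uint32_t)(reg & 0x1F) << 18)     /* register address */
         | (0x2u << 16)                       /* TA = 10 */
         | data;
}
```

The 5-bit PHY address field is what limits a single MDIO bus to the 32 PHYs mentioned above.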
EMAC modules provide an efficient interface between the processor and the network. EMAC modules usually offer 10Base-T (10 Mbits/sec) and 100BaseTX (100 Mbits/sec), half-duplex and full-duplex mode, and hardware flow control and quality-of-service (QoS) support. In addition, some processors now support gigabit EMAC capability supporting data rates of 1000 Mbits/sec.
Since Ethernet is so widely used, embedded processors typically integrate one or more EMAC interfaces on chip. There is some variation in the way different vendors implement the complete EMAC subsystem described above. The quality and extent of software support and libraries for implementing Ethernet interfaces is another decision point in choosing an embedded processor vendor.
At times, applications such as routers or switches will require more than one EMAC. By using multiple EMACs, these applications are able to communicate to numerous devices at once creating a synchronized process of communication.
The serial ATA (SATA) bus connects host bus adapters to mass storage devices such as hard disk drives and optical drives. It has nearly replaced its predecessor, parallel ATA (PATA), which required a 40- or 80-wire parallel cable that could not exceed 18 inches. PATA's maximum data transfer rate was 133 Mbytes/s, while SATA's serial data format uses two differential pairs to support interfaces to data storage devices at line speeds of 1.5 Gbits/s (SATA Revision 1), 3.0 Gbits/s (SATA Revision 2) and 6.0 Gbits/s (SATA Revision 3). SATA 1 and SATA 2 capability are available today, with SATA 3 support coming in the near future.
Also, SATA uses a thinner cable that can be as long as three feet. The thinner cable's flexibility permits both easier routing and better air ventilation inside the mass storage enclosure.
The serial link attains its high performance in part by implementing an advanced system memory structure to accommodate high-speed serial data. The advanced host controller interface (AHCI) memory structure contains a generic area for control, status and a command list data table. Each entry in the command list table contains information for programming a SATA device, as well as a pointer to a descriptor table for transferring data between system memory and the device.
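Each command-list entry described above is a fixed 32-byte AHCI command header. A sketch of its layout, using plain 32-bit words and masks rather than bitfields (whose ordering is compiler-dependent):

```c
#include <stdint.h>

/* AHCI 1.x command header: one 32-byte entry per command slot.
 * DW0 packs the flags and PRD table length; CTBA/CTBAU point to the
 * command table holding the FIS and the PRD data descriptors. */
struct ahci_cmd_header {
    uint32_t dw0;        /* CFL[4:0], A, W, P, R, B, C, PMP[15:12], PRDTL[31:16] */
    uint32_t prdbc;      /* PRD byte count: bytes actually transferred */
    uint32_t ctba;       /* command table base address, low 32 bits  */
    uint32_t ctbau;      /* command table base address, upper 32 bits */
    uint32_t reserved[4];
};

#define AHCI_DW0_WRITE    (1u << 6)               /* W: host-to-device  */
#define AHCI_DW0_PRDTL(n) ((uint32_t)(n) << 16)   /* PRD entry count    */
```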
Most SATA controllers support hot swapping and the use of a port multiplier to increase the number of devices that can be attached to the single HBA port. The SATA standard includes a long list of features, but few SATA controllers support all of them. Popular features include:
- support for the AHCI controller spec 1.1
- integrated SERDES PHY
- integrated Rx and Tx data buffers
- support for SATA power management features
- internal DMA engine per port
- hardware-assisted native command queuing (NCQ) for up to 32 entries
- 32-bit addressing
- support for a port multiplier
- activity LED support
- mechanical presence switch
DDR2 is the successor to the double data rate (DDR) SDRAM specification and the two standards are not compatible. By transferring data on the rising and falling edges of the bus clock signal and by operating at a higher bus speed, DDR2 achieves a total of four data transfers per internal clock cycle.
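The "four data transfers per internal clock cycle" works out as follows: DDR2 runs its I/O bus at twice the internal memory-array clock and transfers data on both bus-clock edges. DDR2-800, for example, pairs a 200 MHz internal clock with a 400 MHz bus clock to deliver 800 megatransfers per second. A quick check:

```c
/* Effective transfer rate (in megatransfers/s) for a DDR2 part,
 * starting from the internal memory-array clock in MHz. */
static unsigned ddr2_transfers_m(unsigned internal_clk_mhz) {
    unsigned bus_clk_mhz = internal_clk_mhz * 2;  /* bus runs at 2x */
    return bus_clk_mhz * 2;                       /* both clock edges */
}

/* Peak bandwidth in Mbytes/s for a given bus width in bits. */
static unsigned ddr2_peak_mb_s(unsigned internal_clk_mhz, unsigned bus_bits) {
    return ddr2_transfers_m(internal_clk_mhz) * (bus_bits / 8);
}
```

For a 64-bit module this yields the familiar 6400 Mbytes/s of PC2-6400.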
A simplified DDR2 controller interface includes the following design blocks:
- memory control
- read interface
- write interface
- IO block
|Figure 2: Simplified DDR2 controller implementation.|
The memory control block arbitrates accesses between the application-specific core logic and external memory. The read interface block handles the external signal timing needed to capture data during read cycles, and the write interface block manages the issuance of clock and data with the appropriate external signal timing.
A byte-wide, bidirectional data strobe (DQS) is transmitted externally along with data (DQ) for capture. DQS is transmitted by memory during reads and by the controller during writes. On-chip delay-lock loops (DLLs) are used to clock out DQS and corresponding DQs. This assures that they can track each other during changes in voltage and temperature.
DDR2 SDRAMs have differential clock inputs to reduce the effects of duty cycle variations on clock inputs. DDR2 SDRAMs also support data mask signals to mask data bits during write cycles.
Mobile DDR (MDDR) is also called Low Power Double Data Rate memory (LPDDR) because it operates at 1.8 volts as opposed to the more traditional 2.5 or 3.3 volts and is commonly used in portable electronics. Mobile DDR memory also supports low-power states that are not available on traditional DDR2 memory. As with all DDR memory, the double data rate is achieved by transferring data on both clock edges of the device.
With the number of on-chip peripherals limited either by cost or other constraints, system designers often tend to find novel ways of moving data on and off chip. One tactic is to tap the resources of an unused video port, essentially tricking it to send and receive non-video data at high speeds. One of the downsides of this approach is that the data has to be formatted into video frames, which requires some processor MIPS during operation and valuable programming time during the design cycle.
Other methods present similar difficulties and most of the standard on-chip data interfaces are serial ports that are not capable of handling high-speed transfers.
As a result, many system designers see great value in a flexible, high-speed peripheral primarily for data transfer that does not conform to a particular interface standard but can be configured in a number of ways. This is particularly true if the system processor has to interface with high speed DACs, ADCs, DSPs, and even FPGAs capable of high speed data transfers of the order of 250 MB/s.
The basic architecture of such a peripheral is easy to describe. It would have multiple channels with separate, parallel data buses that could be configured to accommodate more than one word length. It would also have an internal DMA block so that its operations could proceed without draining the core's MIPS budget. Single or double data rates and multiple data packing formats are also desirable.
The universal parallel port (uPP) is available on a variety of TI embedded processors including the Sitara™ ARM9 AM1808 and AM1806 microprocessors (MPUs) and OMAP-L138 processor, which includes a TMS320C674x core and an ARM9 core.
Unlike serial peripherals such as SPI and UARTs, uPP offers designers the advantages of a parallel data bus with a data width of 8 to 16 bits per channel.
When running at its maximum clock speed of 75 MHz, uPP transfers data much faster than the serial port peripherals. For example, a single 16-bit uPP channel operating at 75 MHz is as much as 24 times faster than a SPI peripheral operating at 50 MHz. A simplified block diagram is shown in Figure 3.
|Figure 3: uPP simplified block diagram.|
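The 24x figure can be reproduced with simple arithmetic: a 16-bit bus at 75 MHz moves 1200 Mbits/s of raw data, versus 50 Mbits/s for a single-bit SPI link clocked at 50 MHz. A sketch of the comparison:

```c
/* Peak raw throughput in Mbits/s for a parallel or serial port:
 * clock rate in MHz times bits transferred per clock. */
static unsigned peak_mbits(unsigned clk_mhz, unsigned bits_per_clock) {
    return clk_mhz * bits_per_clock;
}
```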
The most important features of the uPP include:
- Two independent channels with separate data buses
- Channels can operate in same or opposing directions simultaneously
- I/O speeds up to 75 MHz with 8-16 bit data width per channel
- Internal DMA — leaves the CPU and system EDMA free
- Simple protocol with few control pins (configurable: 2-4 per channel)
- Single and double data rates (use one or both edges of clock signal)
- Double data rate imposes a maximum clock speed of 37.5 MHz
- Multiple data packing formats for 9-15 bit data widths
- Data interleave mode (single channel only)
uPP is largely used in applications that pass data to off-chip real-time processing devices such as FPGAs or DSPs, and it is especially beneficial in markets, such as medical instrumentation, where data must be acted on immediately. By utilizing the uPP, decision-making processors can draw conclusions from up-to-date information.
The programmable real-time unit (PRU) is a small, 32-bit processing engine that provides additional resources for real-time processing on chip. Used exclusively in TI's embedded processors in the AM1x MPUs and OMAP-L138 solutions, PRU offers system designers an extra measure of flexibility, typically reducing component costs.
The PRU's four-bus architecture allows instructions to be fetched and executed concurrently with data transfers. In addition, an input register is provided to allow external status information to be reflected in the internal processor status register.
An important goal in the PRU's design was to create as much flexibility as possible to perform a wide range of functions. The flexibility of the PRU allows developers to incorporate additional interfaces into their end product — whether it's a touch screen, integrated displays or storage capabilities — to further extend their capabilities or the capabilities of their own proprietary interfaces. This goal was in large part accomplished by giving the PRU full system visibility including all system memory, I/Os and interrupts.
Although its access to system resources is comprehensive, the PRU's internal resources are relatively modest. It has 4 Kbytes of instruction memory and 512 bytes of data memory. The PRU also has its own GPIOs with latencies measured in nanoseconds.
|Figure 4: Using the PRU to extend the capabilities of the existing device peripherals.|
The PRU can be programmed with simple assembly code to implement custom logic. The instruction set is divided into four major categories:
- move data in or out of the processor's internal registers
- perform arithmetic operations
- perform logical operations
- control program flow
Besides serving as an I/O replacement, the PRU can be programmed to execute a variety of control, monitoring or other functions that are not otherwise available on chip. This flexibility is particularly helpful in applications with control requirements that do not match any standard processor configuration.
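As a hedged illustration of the kind of custom logic a PRU typically absorbs — written here in C against a simulated GPIO rather than in PRU assembly against real pins — consider a software shift register that clocks a byte out LSB-first, the core of many bit-banged serial interfaces:

```c
#include <stdint.h>
#include <stddef.h>

/* Simulated GPIO: each call records one data-line sample, standing in
 * for a write to a PRU-controlled output pin. */
static void gpio_write(uint8_t *trace, size_t *pos, uint8_t level) {
    trace[(*pos)++] = level;
}

/* Bit-bang one byte LSB-first onto the simulated data line, the way a
 * PRU loop would drive a pin once per bit period. */
static void shift_out_byte(uint8_t value, uint8_t *trace, size_t *pos) {
    for (int bit = 0; bit < 8; bit++)
        gpio_write(trace, pos, (value >> bit) & 1u);
}
```

On real hardware, the PRU's nanosecond-latency GPIO accesses are what make this kind of tight, deterministic loop practical without burdening the ARM core.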
ARM Subsystem and Peripheral Integration
When evaluating peripheral interfaces in an ARM-based processor, it is important to understand how the peripherals and the ARM Subsystem integrate.
The ARM processor is suitable for complex, multi-tasking, and general purpose control tasks. It has a large program memory space and it has good context switching capability. It's suitable for running Real-Time Operating Systems (RTOS) and sophisticated High Level Operating Systems. The ARM is responsible for system configuration and control, which includes peripheral configuration and control, clock control, memory initialization, interrupt handling, power management, etc. The ARM Subsystem includes the ARM processor and other components necessary for the ARM processor to act as master of the overall processor system.
A typical ARM Subsystem consists of combinations of the following components:
- ARM Core (for example: ARM926EJ-S or ARM Cortex-A8)
- Write Buffer
- Instruction CACHE
- Data CACHE
- Java accelerator
- Neon single instruction, multiple data (SIMD) Engine
- Vector floating point coprocessor (VFP)
- ARM Internal Memories
- ROM (ARM boot loader)
- Bus Arbiters
- Bus arbiters for accessing internal memories
- Bus arbiters for accessing system and peripheral control registers
- Bus arbiters for accessing external memories
- Debug, trace, and emulation modules
- Embedded Trace Macrocell (ETM)
- System Control Peripherals
- ARM Interrupt Control Module
- PLL (Phase-Locked Loop) and Clock Control Module
- Power Management Module
- System Control Module
|Figure 5: ARM Subsystem block diagram.|
Although standard interfaces play a critical role in designing systems that are interoperable, low-cost and require less time to design, their utility is still limited for a design team that needs to differentiate its product. Designers should also look to their chip vendors for a wide variety of standard interfaces in multiple combinations. High quality software libraries that help implement the interfaces efficiently are other differentiating factors for chip vendors. Offering an additional level of flexibility is also helpful and can be accomplished by configurable interfaces such as TI's PRU and uPP. With options like these in their tool kit, system designers can be creative while simultaneously keeping component costs low.