
How to Efficiently Decode and Play Audio Files in Embedded Systems

By Jacob Beningo

Contributed By Digi-Key's North American Editors

Audio interfaces are becoming an expected feature of embedded designs. At the same time, users of embedded systems are expecting ever higher audio quality. For developers, this presents a challenge: how to decode and play MP3 or other audio files on a microcontroller-based system. These systems are not only resource constrained, but also lack the easy-to-use audio interfaces that developers can leverage on a Linux-based system. This makes it more difficult to decode audio files and efficiently convert the contents to analog sound.

Developers must also choose carefully between a hardware and a software solution, and decide which components to use, because cost, space, and development time are all important considerations.

This article provides an introduction to several hardware and software solutions from AKM Semiconductor, Adafruit, STMicroelectronics, and Cirrus Logic Inc. that developers can use to efficiently and effectively add audio files to their embedded devices. It also includes some “tips and tricks” to help ensure a successful implementation.

Selecting an embedded audio format

Before diving into the details of how to integrate audio functionality into an embedded device, it’s useful to think through why the MP3 audio format is usually preferred. For an embedded system, there are actually three potential audio formats that could be used by developers: pulse code modulation (PCM), WAV, and MP3.

PCM is an uncompressed, lossless audio format that is often used by audio codecs to convert the digital representation of the audio stream into the analog sound that the user hears. It’s a well-supported standard format that dates back to CDs. PCM could be used in an embedded system, but the problem is that PCM files are typically much larger than MP3 files. In a resource-constrained device where every nickel counts, a product may require a larger external memory device or a microcontroller with more memory in order to support it. For this reason, PCM is usually avoided unless the product is low volume, has only a single audio file, or is not cost constrained.

WAV files are also typically uncompressed and lossless, which makes them very similar to PCM. While WAV files tend to be more popular than PCM files for embedded applications, they can also take up a considerable amount of space. They might be well suited if the embedded system already has an SD card or other large memory device.

For most systems, MP3 files are the preferred audio format. MP3 files are lossy, so some audio fidelity can be lost when the audio is encoded. However, they are significantly smaller than PCM or WAV files, so they take less time to transfer onto the device and need less memory to store.
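To put rough numbers on the size difference, the following is a minimal sketch of the storage arithmetic. The function names are hypothetical, and the example figures (CD-style 44.1 kHz, 16-bit stereo PCM and a 128 kbit/s constant-bit-rate MP3) are assumptions chosen for illustration, not values from a specific product.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store uncompressed PCM audio of the given length. */
uint64_t pcm_size_bytes(uint32_t sample_rate_hz, uint32_t channels,
                        uint32_t bits_per_sample, uint32_t seconds)
{
    assert(bits_per_sample % 8u == 0u);  /* whole bytes per sample only */
    return (uint64_t)sample_rate_hz * channels * (bits_per_sample / 8u) * seconds;
}

/* Approximate bytes for a constant-bit-rate MP3 (ignores tags and metadata). */
uint64_t mp3_cbr_size_bytes(uint32_t bit_rate_bps, uint32_t seconds)
{
    return (uint64_t)bit_rate_bps * seconds / 8u;
}
```

Under these assumptions, one minute of audio needs `pcm_size_bytes(44100, 2, 16, 60)` = 10,584,000 bytes as PCM, versus `mp3_cbr_size_bytes(128000, 60)` = 960,000 bytes as MP3, roughly an 11:1 difference.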

Once it’s been decided that MP3 is the way to go, developers can choose to decode it in hardware or in software.

Hardware-based MP3 decoding

Often the quickest and easiest solution is to use a hardware MP3 decoder such as Adafruit’s 1681 VS1053B (Figure 1). The VS1053B can directly accept an MP3, WAV, OGG, or MIDI file format over a serial stream and decode it with little to no effort required from a developer. After decoding the stream, the VS1053B converts it to audio using an 18-bit digital-to-analog converter (DAC).

Figure 1: The VS1053B from Adafruit is a hardware-based MP3 decoder chip that takes an audio stream and decodes it into the representative analog audio signal. This solution requires minimal software and doesn’t require a developer to understand how to decode or convert an MP3 file. (Image source: Adafruit)

What’s really interesting about the VS1053B is that it can also be debugged and controlled using a simple UART, versus many other decoders that use I2C. It also has eight general purpose input/output pins that can be used for application features such as reading bits or setting switches or status LEDs.

Developers looking to try out a hardware-based solution don’t necessarily have to create their own breakout board for the VS1053B. Adafruit provides the 1381 VS1053B codec + MicroSD breakout board. Along with the VS1053B, the board has a MicroSD card slot that can be used to store audio files for decoding (Figure 2). The breakout board is designed to be connected to a microcontroller that would connect to the SD card via an SPI or SDIO port to read out the audio file. The audio file stream is then sent to the VS1053B for decoding. The output of the VS1053B can then be directed as needed, such as to a headphone jack or a speaker.

Figure 2: The 1381 VS1053B Codec + MicroSD breakout board from Adafruit contains the necessary hardware to easily connect a microcontroller to play audio. The breakout board has an on-board MicroSD card slot that the microcontroller can read over SPI and then transfer that file to the VS1053B for decoding. (Image source: Adafruit)
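On the microcontroller side, the work largely reduces to reading the file and pacing the byte stream to the decoder. The following is a minimal sketch, assuming the usual VS1053B flow control in which the chip raises its data request (DREQ) pin when it can accept at least 32 more bytes; this should be checked against the datasheet. The `vs1053_port_t` hooks and the stub are hypothetical placeholders for board-specific GPIO and SPI code, not part of any Adafruit or vendor library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VS1053_SDI_CHUNK 32u  /* bytes assumed safe to send while DREQ is high */

/* Hypothetical board hooks: supply these for the target hardware. */
typedef struct {
    bool (*dreq_is_high)(void *ctx);                           /* read DREQ pin */
    void (*sdi_write)(void *ctx, const uint8_t *d, size_t n);  /* SPI data write */
    void *ctx;
} vs1053_port_t;

/* Send as much of buf as the decoder will take right now. Returns the number
 * of bytes consumed so the caller can do other work and retry later. */
size_t vs1053_feed(const vs1053_port_t *port, const uint8_t *buf, size_t len)
{
    size_t sent = 0;
    assert(port != NULL && buf != NULL);
    while (sent < len && port->dreq_is_high(port->ctx)) {
        size_t n = len - sent;
        if (n > VS1053_SDI_CHUNK) {
            n = VS1053_SDI_CHUNK;
        }
        port->sdi_write(port->ctx, buf + sent, n);
        sent += n;
    }
    return sent;
}

/* Host-side stub for desk-checking: DREQ stays high for a fixed chunk count. */
typedef struct { unsigned chunks_left; size_t bytes_seen; } vs1053_stub_t;

static bool stub_dreq(void *ctx)
{
    return ((vs1053_stub_t *)ctx)->chunks_left > 0u;
}

static void stub_write(void *ctx, const uint8_t *d, size_t n)
{
    vs1053_stub_t *s = ctx;
    (void)d;               /* a real port would clock these bytes out over SPI */
    s->chunks_left--;
    s->bytes_seen += n;
}
```

In a real design, `vs1053_feed()` would typically be called from the main loop or a DREQ interrupt with data read from the SD card, and the stub would be replaced by actual pin and SPI accesses.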

Software-based MP3 decoding

A slightly more complex solution, but one that is often less costly from a bill of materials (BOM) standpoint, is to decode the MP3 file on the microcontroller and then stream the decoded file to an audio codec that generates the audio. In order to implement an efficient, software-based solution, a developer will need to implement several critical components such as:

  • An MP3 decoder library
  • A memory storage driver
  • A file system stack
  • A direct memory access (DMA) driver
  • An I2S driver
  • An I2C driver
  • An audio codec driver
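One way the components listed above might fit together is a "ping-pong" buffer: the DMA and I2S drivers stream one half of a buffer to the codec while the decoder library refills the other half from the file system. The sketch below is an illustration under stated assumptions, not the structure of any vendor library. All names are hypothetical, the half size assumes 1152 samples per channel per frame (MPEG-1 Layer III), and a real implementation would need to protect `free_mask` against the interrupt, for example with a short critical section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLES_PER_FRAME 1152u  /* per channel, MPEG-1 Layer III (assumed) */
#define CHANNELS          2u
#define HALF_LEN          (SAMPLES_PER_FRAME * CHANNELS)

typedef struct {
    int16_t buf[2u * HALF_LEN];  /* DMA streams this circularly to I2S */
    volatile uint8_t free_mask;  /* bit 0: first half free, bit 1: second half */
} audio_ring_t;

/* Call from the DMA half-transfer (half = 0) and transfer-complete
 * (half = 1) interrupts: that half has been played and can be refilled. */
void audio_ring_on_dma_event(audio_ring_t *r, unsigned half)
{
    assert(half < 2u);
    r->free_mask |= (uint8_t)(1u << half);
}

/* Main loop: get a half that is ready for new decoded samples, or NULL. */
int16_t *audio_ring_claim(audio_ring_t *r)
{
    for (unsigned half = 0; half < 2u; half++) {
        if (r->free_mask & (1u << half)) {
            return &r->buf[half * HALF_LEN];
        }
    }
    return NULL;
}

/* Main loop: mark a claimed half as filled after decoding into it. */
void audio_ring_commit(audio_ring_t *r, const int16_t *half_ptr)
{
    unsigned half = (half_ptr == r->buf) ? 0u : 1u;
    r->free_mask &= (uint8_t)~(1u << half);
}
```

The main loop would call `audio_ring_claim()`, decode one frame into the returned pointer, and then call `audio_ring_commit()`, so the CPU only does work when a half buffer has actually been consumed.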

At first glance, this seems like a lot of work for the software developer and a lot of potentially challenging software components to integrate to get the MP3 decoded and turned into audio. The best way to go about implementing an MP3 decoding solution is to leverage a microcontroller platform that supports audio encoding, decoding, and general processing.

While there are a lot of open source solutions that can be found on the internet, a professional, tried-and-true solution that developers can leverage is the STM32 toolchain. The STM32 microcontroller family has a development tool called STM32CubeMX, which is integrated with STM32CubeIDE and includes audio examples and development libraries. The examples and tools are part of an STM32CubeMX add-on plugin called X-CUBE-AUDIO. The plugin provides the audio libraries for MP3 decoding for any STM32 processor that is in the Arm Cortex-M4 class of microcontrollers.

Specifically, there are code project examples for creating an MP3 player that will run on an STM32F469IGH6TR microcontroller. The STM32F469IGH6TR is a very capable microcontroller that comes with 1 megabyte (Mbyte) of flash, 384 kilobytes (Kbytes) of RAM, and runs at 180 megahertz (MHz). The microcontroller comes in a 176-pin UBGA package that provides plenty of GPIO and peripheral features to accommodate nearly any application.

Figure 3: The STM32F469IGH6TR is a 180 MHz Arm Cortex-M4 processor with 1 Mbyte of flash and 384 Kbytes of RAM. The 176-pin UBGA package provides plenty of GPIO for nearly any embedded application. (Image source: STMicroelectronics)

The MP3 player code example runs on the STM32F469I-DISCO Discovery kit (Figure 4). The STM32F469I-DISCO contains everything needed to decode and play MP3s. The board has a 4 inch, 800 x 480 pixel LCD that is used to update a developer on the MP3 demonstration state, along with player controls such as play, stop, next, and previous. The Discovery board also contains a headphone jack where the resulting audio is played in stereo. The only caveat with the example code is that it does require the MP3 files to be provided by an external source—specifically, a USB drive mass storage device that is connected through a micro USB connector.

Figure 4: The STM32F469I-DISCO Discovery kit has a 4 inch LCD that is used to operate the MP3 player demonstration. The audio files are provided by an external USB mass storage device through the onboard micro USB connector. It provides a working example on how to decode an MP3 file. (Image source: STMicroelectronics)

The MP3 decoding libraries do require an Arm Cortex-M4 or better processor. Running the demonstration code on the development board is a great way not just to see and experiment with a working example, but also to verify application performance. Using the serial wire debug (SWD) interface and the instrumentation trace macrocell (ITM) capabilities of the Arm core, it’s possible to perform a statistical analysis on the program counter to determine approximately how much processing power is used to decode and play the MP3 files. It turns out that nearly 50% of the CPU time can be spent updating the LCD, while 10% or less is spent on MP3 decoding. The STMicroelectronics audio libraries are very efficient, and they use DMA to push the decoded frames over I2S to an audio codec.
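The statistical analysis itself is simple once the sampled program counter values have been captured. The following is a minimal host-side sketch, assuming the samples have already been exported from the debug tool as an array of addresses and that the address range of each code region is taken from the linker map file. The type and function names are hypothetical and do not correspond to any particular debugger's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t start;  /* first address of the region (inclusive) */
    uint32_t end;    /* one past the last address (exclusive) */
    uint32_t hits;   /* number of PC samples that landed in the region */
} pc_region_t;

/* Attribute each sampled PC to the first matching region.
 * Returns the number of samples that matched no region. */
size_t pc_profile(const uint32_t *samples, size_t n,
                  pc_region_t *regions, size_t n_regions)
{
    size_t unmatched = 0;
    assert(samples != NULL && regions != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t r;
        for (r = 0; r < n_regions; r++) {
            if (samples[i] >= regions[r].start && samples[i] < regions[r].end) {
                regions[r].hits++;
                break;
            }
        }
        if (r == n_regions) {
            unmatched++;
        }
    }
    return unmatched;
}

/* Approximate share of CPU time for a region, in whole percent (rounded down). */
uint32_t pc_share_percent(const pc_region_t *region, size_t total_samples)
{
    if (total_samples == 0u) {
        return 0u;
    }
    return (uint32_t)((uint64_t)region->hits * 100u / total_samples);
}
```

Because the program counter is sampled periodically, the share of samples in a region approximates the share of CPU time spent there; more samples give a tighter estimate.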

In the case of an application that does not require an LCD, but instead just needs to play audio based on other system events, a processor with fewer features can be used. For example, a developer might look at the STM32F469VGT6. The STM32F469VGT6 is still quite capable, with 1 Mbyte of flash and 384 Kbytes of RAM, all in a 100-pin LQFP. This part avoids a BGA footprint, which can sometimes be intimidating for both developers and manufacturers.

Figure 5: The STM32F469VGT6 is a 180 MHz processor with 1 Mbyte of flash and 384 Kbytes of RAM. The part is based on the Arm Cortex-M4 family, which is supported by the STMicroelectronics audio libraries. As shown, it comes in a 100-pin LQFP, making it less imposing for both developers and manufacturers. (Image source: STMicroelectronics)

Once a developer has selected and experimented with the solution that they believe will best fit their application, they need to decide how they will convert their decoded MP3 file from digital waveforms to analog sound.

Converting the audio stream to sound with a codec

Most hardware-based decoding solutions will also include a digital-to-analog converter (DAC) that can be used to convert the received digital file format into analog sound. However, these chips will often include an I2S output port that allows a developer to add their own audio codec. Software-based solutions will definitely need a codec to convert the decoded digital stream into audio. There are two ways that this can be done.

First, it is possible to take the digital audio and the microcontroller’s on-board DAC peripheral and generate the audio output. In general, this is not the best way to generate audio because it requires additional discrete components and careful analog circuit design and layout in order to achieve a quality output. It also requires a bit more setup on the microcontroller to get the DAC up and running, and then extra processor power is generally required to ensure the DAC is fed properly.
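Part of the extra processor work comes from reformatting the samples. As a minimal sketch, assuming a 12-bit, right-aligned, unsigned on-chip DAC (the resolution varies by microcontroller) and signed 16-bit decoded PCM, each sample must be offset and scaled before it is written to the DAC. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a signed 16-bit PCM sample (-32768..32767) to an unsigned 12-bit
 * DAC code (0..4095): shift to unsigned, then drop the 4 low bits. */
uint16_t pcm16_to_dac12(int16_t sample)
{
    return (uint16_t)(((int32_t)sample + 32768) >> 4);
}

/* Convert a block of samples, for example before handing it to a DMA channel. */
void pcm16_block_to_dac12(const int16_t *in, uint16_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = pcm16_to_dac12(in[i]);
    }
}
```

With this mapping, silence (a PCM value of 0) lands at mid-scale (2048), which is why the analog stage after the DAC usually needs AC coupling or a mid-rail reference.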

The second method, and the one that is generally recommended, is to use an integrated audio codec. An audio codec is an integrated circuit that contains all the circuitry needed to generate the analog output, such as a DAC and Class D amplifiers. An audio codec has an advantage over a discrete solution in that it takes up very little board space and can also have digital circuits built in for controlling the audio output stream.

For example, the CS43L22-CNZ DAC from Cirrus Logic provides developers with a wide range of features such as:

  • DAC control through the I2C bus
  • Multiple outputs such as headphone and speaker
  • No external output filtering required
  • A digital signal processor engine for volume, bass, and treble control
  • Pop and click suppression

The CS43L22-CNZ receives a PCM encoded data stream over an I2S interface from the microcontroller, which it then converts using its internal DAC (Figure 6). The CS43L22-CNZ DAC can drive multiple outputs such as a speaker or a headphone. If a single, mono channel is used, the CS43L22-CNZ can output 2 watts of power to a speaker, or if stereo channels are used, up to 1 watt per channel.

Figure 6: The CS43L22-CNZ DAC is an audio DAC that can output up to 2 watts through a mono output or 1 watt per channel for stereo audio. The DAC has a digital signal processing engine that provides easy control of volume, bass, and treble. (Image source: Cirrus Logic)
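When configuring the I2S link that feeds a DAC like this, it helps to work out the clock and data rates up front. The following is a minimal sketch of that arithmetic with hypothetical function names; the example figures (44.1 kHz, 16-bit stereo, 1152-sample frames) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* I2S bit clock: one bit per clock, for every slot of every channel. */
uint32_t i2s_bclk_hz(uint32_t sample_rate_hz, uint32_t bits_per_slot,
                     uint32_t channels)
{
    return sample_rate_hz * bits_per_slot * channels;
}

/* Sustained data rate the DMA channel must deliver to the I2S peripheral. */
uint32_t i2s_bytes_per_second(uint32_t sample_rate_hz, uint32_t bits_per_sample,
                              uint32_t channels)
{
    return sample_rate_hz * channels * (bits_per_sample / 8u);
}

/* Playback time of one decoded frame, in microseconds (rounded down). */
uint32_t frame_period_us(uint32_t samples_per_frame, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0u);
    return (uint32_t)((uint64_t)samples_per_frame * 1000000u / sample_rate_hz);
}
```

For 16-bit stereo at 44.1 kHz, this gives a 1.4112 MHz bit clock and 176,400 bytes per second. A 1152-sample frame lasts about 26 ms, which is the time budget for decoding the next frame.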

Some developers may not need all the features of the CS43L22-CNZ and may be able to save some BOM cost by going more minimalistic.

This, of course, depends on the application’s requirements, but one good example of such an approach is the AK4637EN audio codec from AKM (Figure 7). This is a 24-bit mono channel codec that has an output DAC for a speaker only. This codec also includes a microphone amplifier, so it can be used to also record audio if the application calls for it.

Figure 7: The AK4637EN is an audio DAC in a small 20-pin QFN package that outputs a single mono audio channel at up to 1 watt. The codec can be controlled digitally through the I2C bus to manage the output volume and automatic output control. (Image source: AKM Semiconductor)

Just like most audio codecs, the AK4637EN also has an I2S interface to receive the digital audio signal from the microcontroller. The chip also contains an I2C interface that is used to control onboard digital features such as volume control.

As with any product features, developers need to take the time to carefully review the requirements they have for their system and balance the codec features and costs with their target BOM costs.

Tips and tricks for implementing an MP3 solution

Here are a number of “tips and tricks” that developers can use when selecting the appropriate solution for their application:

  • Perform a BOM cost analysis at expected volumes, comparing an external MP3 decoder with a more capable microcontroller that can run an MP3 decoder by itself. Make sure to use pessimistic, attainable, and optimistic volume figures to establish a range for better decision making.
  • Use an audio codec that accepts I2S to generate the output audio. Discrete solutions can be more time consuming to tune and the component costs can be equivalent.
  • Perform a performance analysis on MP3 software libraries using a development board to understand the minimum microcontroller characteristics necessary to run the solution.
  • Leverage DMA channels to transfer decoded MP3 frames to the audio codec over an I2S interface. This will allow a less expensive processor to be used.
  • Carefully review any MP3 software library licenses to ensure that they can be used with a commercial product. Some open source libraries place conditions on commercial use or require a paid license, unless they are provided by the chip vendor.
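The cost comparison in the first tip can be sketched as a small calculation. The model below is an assumption for illustration: each option has a per-unit BOM cost plus a one-time up-front cost (such as development effort), and all names are hypothetical. Real figures would come from supplier quotes and project estimates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t unit_cost_cents;     /* added BOM cost per product built */
    uint32_t upfront_cost_cents;  /* one-time cost, e.g. development effort */
} audio_option_t;

/* Total cost of an option at a given production volume. */
uint64_t total_cost_cents(const audio_option_t *opt, uint32_t volume)
{
    assert(opt != NULL);
    return (uint64_t)opt->upfront_cost_cents +
           (uint64_t)opt->unit_cost_cents * volume;
}

/* Volume at which option b (lower unit cost, higher up-front cost) catches
 * up with option a. Returns 0 if b never catches up under this model. */
uint64_t break_even_volume(const audio_option_t *a, const audio_option_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->unit_cost_cents <= b->unit_cost_cents ||
        b->upfront_cost_cents <= a->upfront_cost_cents) {
        return 0u;
    }
    uint64_t extra_upfront = (uint64_t)b->upfront_cost_cents - a->upfront_cost_cents;
    uint64_t unit_saving = (uint64_t)a->unit_cost_cents - b->unit_cost_cents;
    return (extra_upfront + unit_saving - 1u) / unit_saving;  /* round up */
}
```

Evaluating `total_cost_cents()` for both options at the pessimistic, attainable, and optimistic volumes, and comparing those volumes with `break_even_volume()`, shows whether the decision changes across the expected range.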

Following these tips will help ensure that developers select the right audio solution for their embedded application.

Conclusion

Adding audio to an embedded system may once have been a complex endeavor, but as shown, developers today have a wide range of solutions to choose from. These range from dedicated external codecs to integrated software libraries. Still, developers need to carefully evaluate their application needs and determine which solution path makes the most sense.

Factors to consider include BOM cost, solution complexity, development and integration time and cost, and solution scalability. Once these factors are weighed against product volumes, target cost, and development schedule, the solution that fits best will become clear.

About this author

Jacob Beningo

Jacob Beningo is an embedded software consultant. He has published more than 200 articles on embedded software development techniques, is a sought-after speaker and technical trainer, and holds three degrees, including a Masters of Engineering from the University of Michigan.
