Use a Low-Cost Module and MicroPython to Quickly Build AI-Based Vision and Hearing Devices

By Stephen Evanczuk

Contributed By Digi-Key's North American Editors

To meet the growing demand for smart connected products, developers are increasingly turning to artificial intelligence (AI) methods such as machine learning (ML). However, developers often face a difficult hardware choice: they can base their designs on general purpose processors that are cost-effective but lack sufficient performance for complex algorithms, or they can use specialized solutions that deliver high performance but increase design cost and complexity.

Those are no longer the only options. This article describes a simpler, cost-effective alternative from Seeed Technology that lets developers deploy high-performance AI-based solutions using the familiar MicroPython programming language.

Machine learning approaches

The success of ML algorithms has attracted attention from developers looking for more effective approaches for object detection and speech recognition in a wide range of applications. Among these algorithms, convolutional neural networks (CNNs) have demonstrated the kind of highly accurate recognition required in machine vision and hearing applications. As a result, CNNs and similar deep neural networks (DNNs) have found growing application in personal electronics, wearables, and Internet of Things (IoT) designs.

For applications with modest CNN inference requirements, developers can implement successful solutions using neural network software libraries running on general purpose processors with single instruction, multiple data (SIMD) architectures and digital signal processing (DSP) extensions (see, “Build a Machine Learning Application with a Raspberry Pi”).

For more demanding requirements, developers can build more powerful CNN-based designs using field programmable gate arrays (FPGAs) that embed high-performance DSP blocks capable of accelerating ML algorithms (see, “Use FPGAs to Build High-Performance Embedded Vision Applications with Machine Learning”).

Processor-based ML designs are typically simpler to implement than FPGA-based ML designs, but the mathematical complexity of CNN models typically limits the inference performance of processor-based solutions. FPGA-based solutions can implement key processing steps in hardware to speed inference, but their development requirements can make it difficult to deliver optimized solutions rapidly.

With the availability of the Seeed Technology Sipeed MAIX-I 114991684 module, developers have an alternative solution able to speed deployment of high-performance CNN inference solutions in smart products and edge computing devices.

High-performance CNN processing module

The MAIX-I module combines a high-performance dual-core processor, an Espressif Systems ESP8285 Wi-Fi microcontroller, a Winbond W25Q128FW 128 megabit (Mbit) serial flash memory, a voltage regulator, and an IPEX antenna connector. A version without wireless connectivity, the Sipeed MAIX-I 114991695 module, omits the ESP8285 microcontroller. Designed to accelerate a wide range of application workloads, the dual-core processor integrates a pair of 64-bit RISC-V processors with floating-point units (FPUs) and accelerators for CNN models, audio processing, cryptography, and fast Fourier transform (FFT) calculation (Figure 1).

Figure 1: The Seeed Technology MAIX-I module combines a wireless microcontroller, flash memory, a DC voltage regulator, and an IPEX antenna connector with a high-performance dual-core processor and accelerators for convolutional neural network (CNN) processing and other functions. (Image source: Seeed Technology)

Along with 8 megabytes (Mbytes) of static random access memory (SRAM) and 128 kilobits (Kbits) of one-time programmable (OTP) memory, the dual-core processor integrates a comprehensive set of interfaces including a liquid crystal display (LCD) port and a digital video port (DVP) for a video camera. Developers can use the processor’s field programmable IO array (FPIOA) multiplexer to map 255 internal functions to the available 48 general purpose IO (GPIO) ports.

The processor’s accelerators and integrated functionality support a range of requirements typical of smart product designs. For example, the audio processor unit (APU) supports up to eight microphones and includes its own dedicated 512-point FFT accelerator. Using these APU capabilities alone, developers can efficiently use microphone arrays to implement the directional audio pickup (beamforming) used in speech interfaces for smart products. For speech interface capabilities such as key phrase wakeup, developers can use pre-processed audio output from the APU to drive the processor’s integrated CNN accelerator.
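The APU performs this beamforming in hardware; as a purely conceptual illustration (not the APU API), the following plain Python sketch shows the delay-and-sum principle behind directional pickup. All names and parameters here are invented for the example:

import math

def delay_and_sum(mics, positions, angle_deg, fs, c=343.0):
    # mics: list of per-microphone sample lists (all the same length)
    # positions: microphone x-coordinates in meters along a linear array
    # angle_deg: steering angle; fs: sample rate in Hz; c: speed of sound (m/s)
    angle = math.radians(angle_deg)
    # per-microphone arrival delay, in whole samples, for a plane wave
    delays = [int(round(p * math.sin(angle) / c * fs)) for p in positions]
    base = min(delays)
    delays = [d - base for d in delays]      # shift so all delays are >= 0
    n = len(mics[0]) - max(delays)           # usable output length
    out = [0.0] * n
    for sig, d in zip(mics, delays):
        for i in range(n):
            out[i] += sig[i + d]             # align each channel, then accumulate
    return [s / len(mics) for s in out]      # average the aligned signals

Signals arriving from the steered direction add coherently, while off-axis sounds are attenuated, which is what makes a microphone array directional.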

CNN accelerator

For all its capabilities, the most distinguishing feature of the Sipeed MAIX-I module lies in the CNN accelerator integrated in the module’s dual-core processor. Designed to speed processing of the individual kernel functions underlying CNNs, the neural network processor, here referred to as the “KPU” (kernel processing unit), provides hardware implementations of convolution, batch normalization, activation, and pooling kernel functions that comprise the individual layers of CNN models (see, “Get Started with Machine Learning Using Readily Available Hardware and Software”).

With these capabilities, developers can implement low-power designs that use CNNs to recognize speech activation phrases in audio interfaces or detect and classify objects in vision-based applications. In fact, the KPU can use the processor’s integrated SRAM to perform real-time inference using fixed-point CNN inference models as large as 5.9 Mbytes, or pre-quantized floating-point models as large as 11.8 Mbytes. In machine vision applications, for example, the KPU performs inference at better than 30 frames per second using the type of relatively small image frames used for face or object detection in smart products. For non-real-time applications, developers can store models in external flash, with model size limited only by flash capacity.

Internally, the KPU executes inference models using a first-in, first-out (FIFO) buffer to sequentially process each layer of a typical CNN model (Figure 2, top). For each layer, the KPU reads model parameters and data from its on-chip SRAM or external flash and executes that layer using the associated accelerated kernel function (Figure 2, bottom). Built into this layer processing pipeline, a callback mechanism lets developers execute their own routines as the KPU hardware completes each processing sequence.

Figure 2: When performing inference, the full KPU task (top) comprises multiple layers, each involving execution of the appropriate kernel functions (bottom). (Image source: Seeed Technology)
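In pseudocode form, the pipeline in Figure 2 behaves roughly like the following sketch. This is a conceptual illustration only; the method names are invented, not the KPU API:

def run_model(layers, input_data, on_layer_done=None):
    # Conceptual model of the KPU pipeline: process each CNN layer in
    # sequence, optionally invoking a callback after each one completes.
    data = input_data
    for index, layer in enumerate(layers):
        params = layer.load_parameters()   # fetch weights from SRAM or flash
        data = layer.kernel(data, params)  # conv, batch norm, activation, or pooling
        if on_layer_done:
            on_layer_done(index, data)     # developer-supplied per-layer callback
    return data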

Development platforms

The KPU abstracts the complexity of CNN algorithm execution behind its dedicated hardware. For developers, Seeed further simplifies CNN-based development with a combination of hardware offerings and software packages. Along with the MAIX-I module itself, developers can quickly evaluate and develop MAIX-I-based designs using Seeed board-level products that offer increasing levels of functionality.

At the base level, the Seeed 110991188 development kit combines a MAIX-I module mounted on a baseboard with a 2.4 inch LCD and a Seeed 114991881 OV2640 fisheye camera. The Seeed 110991189 kit provides the same features with the non-Wi-Fi version of the MAIX-I module.

For prototype development, the Seeed 102991150 Bit evaluation board mounts a MAIX-I module on a board designed specifically for breadboarding. The Seeed Technology 110991190 MAIX-I Bit kit combines the Bit evaluation board, a 2.4 inch display, an OV2640 camera, and a pair of headers for connecting the Bit board to a breadboard.

For development of more complex applications, the Seeed 110991191 Sipeed MAIX Go board kit combines the MAIX-I module with an STMicroelectronics STM32F103C8 microcontroller, a camera, an I2S microphone, a speaker, lithium battery management, a MicroSD slot, and multiple interface connectors (Figure 3). By attaching the included 2.8 inch LCD to the back of the board, developers can effectively use the kit as the platform for an AI-driven digital video system.

Figure 3: One of a series of Seeed Technology MAIX boards, the Sipeed MAIX Go board combines the MAIX-I module with an STMicroelectronics STM32F103C8 microcontroller, a camera, a display, and multiple interfaces to provide a standalone imaging system for object recognition. (Image source: Seeed Technology)

The boards provide a standalone solution for many smart product requirements, and their support for MicroPython makes them easy to use. Using the combination of the Seeed Sipeed boards and MicroPython, developers can take advantage of a simpler approach for developing AI-based smart products.

Rapid development with MicroPython

MicroPython was created to provide an optimized subset of the Python programming language for resource-constrained microcontrollers. With its direct support for hardware access, MicroPython brings the relative simplicity of Python-based development to embedded software.

Instead of C libraries, developers use the familiar Python import mechanism to load required libraries. For example, developers simply import the MicroPython machine module to gain access to a microcontroller’s I2C interface, timers, and more. For designs using image sensors, developers can capture an image by importing the sensor module and calling sensor.snapshot(), which returns a frame from the image sensor.
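For example, the following minimal sketch opens an I2C bus and grabs a camera frame using MaixPy-style modules; the pin assignments are illustrative assumptions, so check your board’s schematic before use:

from machine import I2C
import sensor

i2c = I2C(I2C.I2C0, freq=100000, scl=28, sda=29)  # pin numbers are board-specific
print(i2c.scan())                 # list the addresses of attached I2C devices

sensor.reset()                    # initialize the camera
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)                     # start image capture
img = sensor.snapshot()           # returns one frame from the image sensor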

Seeed’s MaixPy project extends MicroPython with support for the dual-core K210 processor at the heart of the MAIX-I module and its associated development boards. Running on the module’s K210 processor, the MaixPy MicroPython interpreter supports standard MicroPython features along with specialized MaixPy modules, such as the MaixPy KPU module, which encapsulates the processor’s KPU functionality.
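For instance, MaixPy exposes the K210’s FPIOA through an fpioa_manager module, so a few lines suffice to route an internal function to a physical pin and drive it. The pad number below is an assumption for illustration:

from fpioa_manager import fm
from Maix import GPIO

fm.register(12, fm.fpioa.GPIO0)    # route internal function GPIO0 to pad 12
led = GPIO(GPIO.GPIO0, GPIO.OUT)   # configure the mapped pin as an output
led.value(0)                       # drive it low (board LEDs are often active-low)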

Developers can use MaixPy and the KPU module to easily deploy a CNN inference model. In fact, the Seeed MaixHub model library provides a number of pretrained CNN models to help developers get started with the MAIX-I module. To download these models, developers need to provide a machine ID, obtained by running an ID generator utility on the MAIX board.

For example, using the Seeed Sipeed MAIX Go kit with the LCD attached, developers can load a pretrained model for face detection. Performing inference with the model requires only a few lines of Python code (Listing 1).

import sensor
import image
import lcd
import KPU as kpu

lcd.init()                           # initialize the attached LCD
sensor.reset()                       # initialize the camera
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)    # 320 x 240 frames
sensor.run(1)                        # start image capture
task = kpu.load(0x300000)            # load the model (face.kfpkg) flashed at address 0x300000
# task = kpu.load("/sd/face.kmodel") # alternative: load the model from a MicroSD card
anchor = (1.889, 2.5245, 2.9465, 3.94056, 3.99987, 5.3658, 5.155437, 6.92275, 6.718375, 9.01025)
kpu.init_yolo2(task, 0.5, 0.3, 5, anchor)  # probability threshold 0.5, NMS threshold 0.3, 5 anchors
while True:
    img = sensor.snapshot()          # grab a frame from the camera
    code = kpu.run_yolo2(task, img)  # run inference; returns a list of detections
    if code:
        for i in code:
            print(i)
            img.draw_rectangle(i.rect())  # draw a bounding box around each face
    lcd.display(img)                 # show the annotated frame on the LCD
kpu.deinit(task)                     # release KPU resources (not reached while the loop runs)

Listing 1: Developers need only a few lines of MicroPython to implement inference using a flash resident neural network model. (Code source: Seeed Technology)

The pretrained model implements a type of CNN called a Yolo (you only look once) model, which speeds inference by operating on the entire image during training and inference rather than on the series of sliding windows used by earlier CNN algorithms. Further optimizations of Yolo are embodied in the “Tiny Yolo2” model provided in the MaixHub model library. The result is a high-performance model that enables real-time face detection on the MAIX Go (Figure 4).

Figure 4: Using the Sipeed MAIX Go board, developers can quickly explore real-time face detection built with a pretrained CNN inference model. (Image source: Seeed Technology)

Of course, inference is only the deployment stage of the complex DNN model development process, and the apparent simplicity of this example can mask the challenges involved in implementing an effective model.

To develop a custom model, developers must first acquire a sufficiently large set of samples for model training. They then use a deep-learning framework such as TensorFlow to configure a model and train it on that data, as sketched below.
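As a minimal sketch of that training step, the following Keras snippet defines and trains a small CNN; the layer sizes, input shape, two-class output, and dataset names are placeholder assumptions, not a prescription from Seeed:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g., face / no face
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)    # supply your own dataset
model.save("custom_model.h5")    # starting point for conversion to kmodel format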

Although those steps can be demanding, the MAIX-I ecosystem makes the final inference deployment relatively straightforward.

Seeed provides converters that let developers convert models developed in TensorFlow, Keras, or Darknet to the KPU’s special kmodel format. As with pretrained models downloaded from the MaixHub model library, developers can upload their custom models to the MAIX-I module and evaluate their performance with MicroPython as noted above.
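For example, a converted classification model stored on a MicroSD card could be evaluated with a few more lines of MicroPython. In this sketch the model path and output interpretation are assumptions for illustration, and a real model typically expects a specific input resolution:

import sensor
import KPU as kpu

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.run(1)

task = kpu.load("/sd/custom.kmodel")   # custom model converted to kmodel format
img = sensor.snapshot()
fmap = kpu.forward(task, img)          # run a single inference pass
plist = fmap[:]                        # copy the output feature map into a list
best = plist.index(max(plist))         # index of the highest-scoring class
print("class", best, "score", max(plist))
kpu.deinit(task)                       # release KPU resources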

Seeed also provides software development kits (SDKs) for creating custom software applications in the C programming language. Separate SDKs support standalone C applications and C applications built on a real-time operating system (RTOS).

Conclusion

The rapid acceptance of image and speech-based interfaces for smart products continues to drive interest in using machine learning algorithms in the resource-constrained designs underlying those products. In the past, developers found few effective options for solutions that are simple to implement and powerful enough to provide real-time machine learning functionality.

As shown, using the MAIX-I module and associated boards from Seeed Technology, developers can rapidly deploy inference models on a hardware platform capable of delivering real-time recognition of speech or objects from audio or video streaming data.

Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of Digi-Key Electronics or official policies of Digi-Key Electronics.

About this author

Stephen Evanczuk

Stephen Evanczuk has more than 20 years of experience writing for and about the electronics industry on a wide range of topics including hardware, software, systems, and applications including the IoT. He received his Ph.D. in neuroscience on neuronal networks and worked in the aerospace industry on massively distributed secure systems and algorithm acceleration methods. Currently, when he's not writing articles on technology and engineering, he's working on applications of deep learning to recognition and recommendation systems.

About this publisher

Digi-Key's North American Editors