The Move From Cloud To Real Time Embedded Machine Learning


Cloud To Local Embedded System Processing

Poor connectivity, large data volumes, real time decision making requirements and security concerns mean that, for many customers, cloud computing is not an option. However, moving expensive, power hungry computing hardware to the location of the application is not an option either.

A typical requirement is an embedded system similar in size to a standard smartphone, offering low power consumption, a reasonable level of performance and near real time processing.

Our Lessons

In the move from Cloud Machine Learning to Embedded Solutions we have encountered four key elements which have proved critical to our products:

  1. The need for focused applications with narrow, clear objectives
  2. The ability to update algorithms and parameters, allowing a continuously evolving, improving system
  3. Automatic embedded code generation to speed up the development process
  4. Well designed, customized SOC/board designs for real time processing

Focused Application

Generally the requirements for Machine Learning applications are quite focused, with narrow, application specific aims. They vary from heart problem detection to the prediction of hardware failures. The key is to clearly define your requirements and produce an algorithm which performs well for those requirements.

This alone goes a long way towards making such applications feasible on embedded systems. Even with this reduction in complexity, they often remain beyond most standard SOC designs.

Common solutions are:-

  1. GPU Accelerated Systems
  2. Multi-core DSP Solutions With Instruction Customization
  3. Custom FPGA Designs
  4. Custom LSI

Training And Application

Training of Machine Learning algorithms remains an offline task. Embedded systems simply cannot match the processing power of common cloud servers with GPU or FPGA co-processors. As a result, the embedded system simply runs inference with the trained network.
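As a sketch of this split, the device might run nothing more than a forward pass over parameters frozen at training time. The tiny network and its weight values below are purely illustrative, not taken from any real model:

```python
# Minimal sketch: the embedded side only runs inference with frozen,
# pre-trained parameters. Pure Python stands in for the firmware code.

def relu(x):
    # Rectified linear unit applied element-wise
    return [v if v > 0.0 else 0.0 for v in x]

def dense(x, weights, bias):
    # weights: one row per output unit, one column per input feature
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Parameters produced by offline (cloud) training, then baked into firmware.
W1 = [[0.5, -0.2],
      [0.1,  0.4]]
B1 = [0.0, 0.1]

def infer(x):
    # The device only ever executes this fixed forward pass.
    return relu(dense(x, W1, B1))

print(infer([1.0, 2.0]))
```

The point of the sketch is that no training machinery (gradients, optimizers, datasets) needs to exist on the device at all.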

System Updates

Updating the system can take two forms:-

  1. Replacing the entire firmware (entire algorithm)
  2. Updating the Machine Learning Algorithm Parameters

The ability to update the system as training data grows in quantity and quality is vital. Better tuned parameters, or even entirely different algorithms, become possible over time.
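A sketch of the second update path: swapping only the parameter blob, guarded by a version number and a checksum so that a corrupt or stale download never replaces a working parameter set. The packaging format below is illustrative, not a real update protocol:

```python
# Illustrative parameter-update packaging: JSON payload followed by a
# SHA-256 digest. The field names and format are made up for this sketch.
import hashlib
import json

def make_update(version, params):
    # Build an update blob: serialized payload + 32-byte integrity digest.
    payload = json.dumps({"version": version, "params": params}).encode()
    return payload + hashlib.sha256(payload).digest()

def apply_update(blob, current_version):
    # Verify integrity and freshness before replacing live parameters.
    payload, digest = blob[:-32], blob[-32:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("checksum mismatch - keeping old parameters")
    update = json.loads(payload)
    if update["version"] <= current_version:
        raise ValueError("stale update - keeping old parameters")
    return update["version"], update["params"]

blob = make_update(2, {"W1": [[0.5, -0.2]], "B1": [0.0]})
version, params = apply_update(blob, current_version=1)
```

Replacing the entire firmware (the first update path) would follow the same verify-then-commit discipline, just with a larger image.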

With such possibilities, care must be taken when designing custom SOCs. There is a risk that the SOC is so highly customized that future upgrades can no longer take advantage of the custom hardware.

Automatic Embedded Source Code Generation

The starting point for many Machine Learning algorithm development projects is Open Source frameworks such as Chainer, TensorFlow and Caffe. These frameworks are written in Python, which makes them highly flexible and relatively easy to use. The underlying libraries are written in C or C++ to ensure good performance.

A significant number of companies have now developed software which automatically converts models from these frameworks to RTL, C or whichever language is appropriate for their specific platform.
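As an illustration of the idea (not any specific vendor's tool), the simplest job such a converter does is emit trained parameters as C source, so the target build has no Python dependency. The function and array names below are made up:

```python
# Sketch of weight-to-C code generation: turn a trained weight matrix
# into a C array definition for the embedded build. Names are illustrative.

def weights_to_c(name, matrix):
    # Emit one brace-wrapped row per output unit, six decimal places.
    rows = ",\n    ".join(
        "{" + ", ".join(f"{v:.6f}f" for v in row) + "}" for row in matrix
    )
    return (
        f"static const float {name}[{len(matrix)}][{len(matrix[0])}] = {{\n"
        f"    {rows}\n"
        f"}};\n"
    )

src = weights_to_c("dense1_w", [[0.5, -0.2], [0.1, 0.4]])
print(src)
```

Real converters go much further, generating the layer execution code and operator kernels as well, but the weight-export step captures the core idea.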

However, on standard embedded systems the generated code can still perform poorly and consume excessive power.

Customized Embedded Board/SOC Designs

There are currently four main solutions:-

  1. GPU Accelerated SOCs
  2. Multi-core DSP Solutions With Instruction Customization
  3. Custom FPGA Designs
  4. Custom LSI

We are currently focused on the first three solutions.

GPU Solution

GPU solutions are known for their high performance, but at the expense of high power consumption. Companies such as NVIDIA are making significant progress in this respect, producing embedded SOCs with significant capabilities for Deep Learning applications.

For systems which have a reliable power supply, or large batteries with regular recharging (such as a mobile robot), and where decision making on the order of a fraction of a second is acceptable, they are potentially practical.

Multi-Core DSP Solutions With Instruction Customization

A common solution is the use of SOCs with multiple DSP cores. We currently focus on the use of Tensilica cores, because they are highly configurable, with the potential for high performance and low power. Examples of configurability include:-

  1. Configurable bus widths
  2. Configurable VLIW instructions (32 to 128 bit), enabling the execution of multiple operations in a single cycle
  3. Custom instructions, including SIMD and MIMD
  4. High speed buses allowing direct connectivity between cores and memory
  5. DSP core options
  6. Special DMA technologies such as SuperGather

The use of multicore solutions allows us to achieve good performance, and when combined with a general purpose processor core such as an ARM9, application development can be relatively straightforward.
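To make the SIMD motivation concrete, here is a minimal sketch (written in Python for readability) of the int8 multiply-accumulate loop that such custom DSP instructions typically accelerate. The quantization scales are illustrative values, not from a real deployment:

```python
# Sketch of int8 quantized dot product - the inner loop a SIMD-capable
# DSP core would execute as packed multiply-accumulate instructions.

def quantize(xs, scale):
    # Map floats to int8 range [-128, 127] with a fixed scale.
    return [max(-128, min(127, round(x / scale))) for x in xs]

def int8_dot(a, b):
    # On a SIMD DSP this whole loop collapses to a few wide MAC ops.
    acc = 0  # wide accumulator (e.g. 32-bit) so products cannot overflow
    for ai, bi in zip(a, b):
        acc += ai * bi
    return acc

a = quantize([0.5, -0.25, 1.0], scale=0.01)
b = quantize([1.0, 1.0, 0.5], scale=0.01)
acc = int8_dot(a, b)
result = acc * 0.01 * 0.01  # rescale back to the float domain
```

Keeping data in 8-bit form is what lets a fixed-width bus carry many operands per cycle, which is exactly where configurable SIMD instruction widths pay off.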

Current applications are focused on Machine Learning algorithms and RNN systems.

FPGA Solutions

Another solution is the FPGA. FPGAs are relatively low powered, and it is possible to get good performance with respect to power. Building an FPGA design from scratch can be problematic, but many companies now use tools to autogenerate C or RTL from the various Python toolsets, which has greatly simplified the process.

Many practical applications require preprocessing of input data, for example filters, FFTs or wavelet transforms, and on an FPGA integrating the various IP blocks for these can be relatively simple.
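As an illustration of that kind of preprocessing (shown in plain Python for clarity; on an FPGA this would be dedicated FFT IP), here is a DFT magnitude computed over a short sample frame before the spectrum is handed to the network:

```python
# Sketch of FFT-style preprocessing: compute the magnitude spectrum of a
# short frame of samples. A naive O(n^2) DFT keeps the example
# dependency-free; real systems use an FFT core or library.
import cmath
import math

def dft_magnitude(samples):
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# 8 samples of a sine wave at bin 1: energy lands in bins 1 and 7.
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft_magnitude(signal)
```

The network then consumes the spectrum (or a further-reduced feature vector) instead of raw samples, which is typically far cheaper than learning the transform.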

Dr Paul Anthony Creaser

Multi-domain Researcher, Engineer etc… From research and development to practical solutions, covering AI, algorithms, software, embedded firmware, cloud, devops, mlops, to hardware and sensors …

