How to use multithreading and multicore to design digital audio systems
If your MCU application needs to process digital audio, consider a multithreaded approach: it lets designers reuse parts of their designs in a straightforward manner.
Multicore and multithreading are effective methods for designing real-time systems. With these techniques, a system is designed as a collection of tasks that run independently and communicate with one another when needed. Decomposing a system from a large monolithic block of code into more manageable tasks can greatly simplify the design and speed up product development. The real-time properties of the whole system also become easier to reason about: the designer need only worry about the fidelity of each task's implementation, asking questions such as "Is the network protocol implemented correctly?"
In this article we discuss how to use multithreaded and multicore design methods to build real-time systems that operate on streams of data, such as digital audio systems. We illustrate the method with several digital audio systems, including asynchronous USB Audio 2.0, Ethernet AVB, and a digital dock for an MP3 player. Before showing how to use multicore and multithreading effectively to design the required buffering and clocking schemes, we briefly review the concepts of digital audio, multicore, and multithreading.
Digital Audio
In many consumer markets, digital audio has replaced analog audio for two reasons. First, most audio sources are digital. Whether delivered in a lossy compressed format (MP3) or an uncompressed format (CD), digital standards have replaced traditional analog media such as cassette tape and vinyl. Second, digital audio is easier to work with than analog audio. Data can be transported losslessly over existing standards such as IP or USB, and the hardware design requires no "magic" to keep background noise down. As far as the digital path is concerned, the noise floor is constant and unaffected by, for example, the TDMA noise a mobile phone can induce.
A digital audio system operates on a stream of samples. Each sample represents the amplitude of one or more audio channels at a point in time, and the time between samples is governed by the sample rate. The CD standard carries two channels (left and right) at a 44.1 kHz sample rate. Common audio standards use 2, 6 (5.1), or 8 (7.1) channels at 44.1 kHz, 48 kHz, or multiples of those rates. We use 48 kHz as the running example, but it is by no means the only standard.
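To make these numbers concrete: at 48 kHz one sample period is 1/48000 s, just under 21 µs, and a stereo stream carries 96,000 samples per second. This is plain arithmetic rather than anything from a standard; a minimal sketch in Go:

```go
package main

import "fmt"

func main() {
	const (
		rateHz   = 48000 // the sample rate used as the running example
		channels = 2     // stereo
	)
	periodUs := 1e6 / float64(rateHz)
	fmt.Printf("sample period: %.2f us\n", periodUs)   // 20.83 us
	fmt.Printf("samples/sec:   %d\n", rateHz*channels) // 96000 for stereo
}
```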
Multicore and multithreading
In the multithreaded design method, a system is expressed as a collection of concurrent tasks. Using concurrent tasks instead of a single program has several advantages:
Multitasking is a good way to support separation of concerns, one of the most important principles of software engineering. Separation of concerns means that different parts of the design can be designed, implemented, tested, and verified separately. Once the interactions between tasks are specified, each team or individual can complete their own tasks independently.
Concurrent tasks provide a simple framework for specifying what a system should do. For example, a digital audio system plays audio samples received over a network interface. In other words, the system performs two tasks at the same time: receiving data from the network interface and playing samples on its audio interface. Expressing these two activities as a single sequential task is confusing.
A system expressed as a set of concurrent tasks can be implemented as a set of threads on one or more multithreaded cores (see Figure 1). We assume that threads are scheduled at the instruction level, as on the XMOS XCore processor, because this lets concurrent tasks run in real time. Note that this differs from multithreading on, say, Linux, where threads are scheduled onto a single processor with context switching. That may make the threads appear concurrent to a human, but not to a set of real-time devices.
The concurrent tasks are logically designed to communicate by message passing. When two tasks are implemented as two threads, they communicate by sending data and control tokens over channels. Within a core, channel communication is performed by the core itself; when the threads are on different cores, it passes through the switch (see Figure 2).
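On XMOS hardware these tasks would be written in XC, with channels provided by the architecture. Since that environment is not assumed here, the sketch below uses Go, whose goroutines and channels model the same structure: two concurrent tasks exchanging samples over a typed channel. The task names and sample values are illustrative only.

```go
package main

import "fmt"

// produce stands in for a receiver task (network, USB, or S/PDIF),
// emitting one sample per iteration into the channel.
func produce(out chan<- int32) {
	for i := int32(0); i < 8; i++ {
		out <- i * 1000
	}
	close(out)
}

// consume stands in for a delivery task, playing each sample as it arrives.
func consume(in <-chan int32) {
	for s := range in {
		fmt.Println("deliver sample:", s)
	}
}

func main() {
	ch := make(chan int32) // unbuffered: sender and receiver rendezvous
	go produce(ch)         // a concurrent task, like a thread on a core
	consume(ch)
}
```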
Multithreaded design has been used by embedded system designers for decades. To implement embedded systems, designers have traditionally used multiple microcontrollers: inside a music player, for example, you might find three microcontrollers controlling the flash memory, the DAC, and the MP3 decoder chip.
Figure 1: Threads, channels, cores, switches, and links. Concurrent threads communicate through channels within a core, between cores on a chip, or between cores on different chips.
We believe a modern multithreaded environment can replace this design strategy. A single multithreaded chip can replace multiple MCUs and provide an integrated communication model between tasks. Instead of implementing custom communication between tasks on separate MCUs, the system is implemented as a group of threads communicating over channels.
Using a multithreaded design approach lets designers reuse parts of their designs in a straightforward manner. In traditional software engineering, functions and modules are composed to perform complex tasks. This method does not carry over directly to real-time environments, because executing two functions in turn may break the real-time requirements of one or both.
In an ideal multithreaded environment, composing real-time tasks is trivial: each new real-time task simply gets its own thread (or core). In practice, designers limit the number of cores (for economic reasons, for example), so they must decide which tasks get their own threads and which are folded together into a single thread.
Multithreaded digital audio
A digital audio system splits naturally into multiple threads: a network protocol stack thread, a clock recovery thread, an audio delivery thread, and optional threads for DSP, device upgrade, and driver authentication. The network protocol stack may be as complex as an Ethernet/IP stack containing multiple concurrent tasks, or as simple as an S/PDIF receiver.
Figure 2: Physical realization of a three-core system with 24 concurrent threads. The top device has two cores and the bottom device has one core.
We assume the threads in the system communicate by sending data samples over channels. With this design method it does not matter whether the threads execute on a single core or on a multicore system; multicore merely adds scalability. We also assume that each thread's computational requirements can be established statically and do not depend on the data, which is usually the case for uncompressed audio.
We focus on two parts of the design: the buffering between threads (and its impact on performance) and clock recovery. Once these design decisions are made, the internal implementation of each thread follows normal software engineering principles and is as hard or as easy as expected. Buffering and clock recovery are interesting because both have a qualitative impact on the user experience (delivering stable, low-latency audio) and both are easy to reason about in a multithreaded programming environment.
Buffering
In a digital system, samples do not necessarily arrive at the exact moment they must be delivered, so digital audio requires buffering. For example, consider a USB 2.0 speaker running at a 48 kHz sample rate. The USB layer transmits a burst of six samples in every 125 µs window, but there is no guarantee of where within that window the six samples arrive. A buffer of at least 12 samples is therefore required to guarantee that samples can stream to the speaker in real time.
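The 12-sample figure is a double-buffering argument: the consumer must always have one full burst in hand while the next burst can land anywhere in its window. A minimal sketch of such a FIFO, assuming the article's sample counts (the ring-buffer layout and stereo representation are my own):

```go
package main

import "fmt"

const (
	samplesPerBurst = 6                   // samples per 125 µs USB window at 48 kHz
	bufferSize      = 2 * samplesPerBurst // double buffer: 12 stereo samples
)

// ring is a fixed-size FIFO of stereo samples.
type ring struct {
	data  [bufferSize][2]int32
	head  int
	count int
}

// push appends one stereo sample; false signals overflow.
func (r *ring) push(s [2]int32) bool {
	if r.count == bufferSize {
		return false // a burst arrived before enough space was drained
	}
	r.data[(r.head+r.count)%bufferSize] = s
	r.count++
	return true
}

// pop removes the oldest sample; false signals underflow (speaker starved).
func (r *ring) pop() ([2]int32, bool) {
	if r.count == 0 {
		return [2]int32{}, false
	}
	s := r.data[r.head]
	r.head = (r.head + 1) % bufferSize
	r.count--
	return s, true
}

func main() {
	var r ring
	// A burst of six stereo samples arrives at once from the USB layer...
	for i := 0; i < samplesPerBurst; i++ {
		r.push([2]int32{int32(i), int32(i)})
	}
	// ...while the codec drains one sample per 48 kHz tick.
	if s, ok := r.pop(); ok {
		fmt.Println("to codec:", s)
	}
}
```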
The design challenge is to provide the right amount of buffering. In analog systems buffering is not an issue: the signal is simply delivered on time. In digital systems built on non-real-time operating systems, programmers typically resort to large buffers (250 or 1000 samples) to absorb the uncertainty of the scheduling strategy. But large buffers are costly in memory, increase latency, and are hard to prove large enough to guarantee click-free delivery.
Multithreaded design provides a good framework for reasoning about buffers, informally and formally, and for avoiding needlessly large buffers. For example, suppose the USB speaker above gains an ambient noise correction system. The system would comprise the following threads:
A thread that receives the samples over USB.
A series of 10 or more threads that filter the sample stream, each with a different set of coefficients.
A thread that transmits the filtered output samples to the stereo codec over I²S.
A thread that reads samples from a codec attached to the microphone that samples the ambient noise.
A thread that resamples the ambient noise down to an 8 kHz sample rate.
A thread that establishes the spectral characteristics of the ambient noise.
A thread that adjusts the filter coefficients based on the computed spectral characteristics.
All threads run at a multiple of the 48 kHz base cycle. Each filter thread, for example, filters one sample every 48 kHz cycle, and the delivery thread delivers one sample per cycle. Each thread also has a defined window over which it operates and a defined way of advancing that window. For example, if our filter threads are implemented as biquads, each runs over a window of three input samples that advances by one sample per cycle. The spectrum thread might run over a 256-sample window that advances by 64 samples every 64 cycles, executing an FFT (Fast Fourier Transform) each time.
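As a concrete illustration of a three-sample window advancing one sample per cycle, here is a biquad filter task in the same Go channel style as before; the coefficients are placeholders, not a designed filter.

```go
package main

import "fmt"

// biquad filters one channel: it holds a three-sample input window
// (x, x1, x2) plus two previous outputs, and advances one sample per cycle.
func biquad(in <-chan float64, out chan<- float64) {
	const b0, b1, b2 = 0.2, 0.4, 0.2 // placeholder coefficients
	const a1, a2 = -0.5, 0.1
	var x1, x2, y1, y2 float64
	for x := range in {
		y := b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2
		x2, x1 = x1, x // slide the input window forward one sample
		y2, y1 = y1, y
		out <- y
	}
	close(out)
}

func main() {
	in := make(chan float64, 4)
	out := make(chan float64, 4)
	go biquad(in, out)
	for _, x := range []float64{1, 0, 0, 0} { // an impulse
		in <- x
	}
	close(in)
	for y := range out {
		fmt.Printf("%.3f\n", y) // the start of the impulse response
	}
}
```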
It is now possible to identify the parts of the system that run on the same cycle and connect them together as synchronous sections. No buffers are needed within a synchronous section, except that a single buffer per stage is required if its threads run as a pipeline. Buffers are required between the synchronous sections. In our example we end up with three sections:
The section that receives samples from USB, filters them, and transmits them, running at 48 kHz.
The section that samples the ambient noise at 48 kHz and passes it on at 8 kHz.
The section that establishes the spectral characteristics and updates the filter settings, running at 125 Hz.
The three sections are shown in Figure 3. The first section, which receives samples from USB, needs a buffer of 12 stereo samples.
Figure 3: Threads grouped together by frequency.
The delivery stage needs to buffer one stereo sample, and running the 10 filter threads as a pipeline requires 11 buffers. The total delay from receiver to codec therefore comes to 24 sample periods, i.e. 500 µs, and one extra sample can be added to absorb jitter in the clock recovery algorithm. This section operates at 48 kHz.
The second section, which samples the ambient noise, needs to store one sample at its input and six samples in the subsampler. That gives a delay of seven samples at 48 kHz, or roughly 146 µs.
The third section, which establishes the spectral characteristics, needs to store 256 samples at the 8 kHz sample rate; no additional buffers are required. The delay between the ambient noise changing and the filter being corrected is therefore 256 samples at 8 kHz plus the 146 µs subsampling delay, or just over 32 ms. Note that these are the minimum buffer sizes for the algorithms we chose to use; if the latency is unacceptable, different algorithms must be selected.
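These latency figures are plain arithmetic over buffer counts and sample periods. A short sketch that reproduces them, useful for re-checking the budget if rates or pipeline depths change (the buffer counts are the article's; the helper is mine):

```go
package main

import "fmt"

// latency returns the delay contributed by n samples at the given rate.
func latency(n int, rateHz float64) float64 { return float64(n) / rateHz }

func main() {
	// Section 1: 12 (USB) + 11 (filter pipeline) + 1 (delivery) = 24 samples at 48 kHz.
	fmt.Printf("USB to codec: %.0f us\n", latency(24, 48000)*1e6) // 500 us
	// Section 2: 1 (input) + 6 (subsampler) = 7 samples at 48 kHz.
	fmt.Printf("noise path:   %.0f us\n", latency(7, 48000)*1e6) // ~146 us
	// Section 3: 256 samples at 8 kHz, plus the noise path above.
	fmt.Printf("spectrum:     %.1f ms\n", (latency(256, 8000)+latency(7, 48000))*1e3) // just over 32 ms
}
```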
Threads can instead be designed to operate on blocks of data rather than individual samples, but this increases the overall latency, the memory requirements, and the complexity. It should be considered only when there is a clear benefit, such as increased throughput.
Timed Digital Audio
A big difference between digital audio and analog audio is that digital audio is built on a base sample rate, so a clock signal has to be distributed to all parts of the system, something analog audio never required. Although components may run at different sample rates (some parts of the system may use 48 kHz while others use 96 kHz, with a sample rate converter in between), all components must agree on the length of one second, and hence on the base frequency against which the rates are measured.
An interesting property of digital audio is that most threads in the system are independent of the actual base clock frequency. Suppose there is a gold-standard base frequency: it does not matter whether the cores in the system run from different crystals, as long as they keep up with the sample stream. At the edges of the system, however, the true clock frequency does matter, as does the latency the samples pick up along the way.
In a multithreaded environment, a thread can be set aside to measure the real clock frequency explicitly, implement the clock recovery algorithm, compare the local clock with the global clock, and agree a clock offset with the master clock.
The clock may be measured implicitly from the underlying bit rate of an interconnect such as S/PDIF or ADAT: counting the bits arriving per second on either of those links yields a measurement of the master clock. Alternatively, it can be measured explicitly using a protocol designed for the purpose, such as PTP over Ethernet.
In the clock recovery thread, a control loop estimates the clock frequency and adjusts it according to the observed error. In its simplest form the error itself drives the frequency adjustment, but a filter can be added to reduce jitter. This thread implements in software the function traditionally performed by a PLL, and can therefore be adapted to the environment cheaply.
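In its simplest proportional form, such a loop is only a few lines. The sketch below assumes the error is available directly in hertz and uses a placeholder gain with no filtering, so it shows the shape of the loop rather than a production design.

```go
package main

import "fmt"

// step is one iteration of a proportional clock-recovery loop: the local
// frequency estimate is nudged against the observed error. kp is a
// placeholder gain; a real design would filter the error to cut jitter.
func step(freqHz, errHz, kp float64) float64 {
	return freqHz - kp*errHz
}

func main() {
	const master = 48000.0 // the gold-standard rate, as seen via e.g. PTP
	f := 48100.0           // local oscillator starts 100 Hz fast
	for i := 0; i < 50; i++ {
		err := f - master // in practice derived from bit counts or timestamps
		f = step(f, err, 0.1)
	}
	fmt.Printf("recovered rate: %.2f Hz\n", f) // converges toward 48000
}
```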
Conclusion
The multithreaded development method enables digital audio systems to be developed by divide and conquer: the problem is divided into a group of concurrent tasks, and each task executes in a separate thread on a multithreaded core.
Like many real-time systems, digital audio lends itself to the multithreaded design method, because a digital audio system plainly consists of a group of data processing tasks that must execute at the same time.