Embedded Linux boot time optimization for a custom application

Embedded systems have been a popular topic for a long time, so we decided to share some insight into one of our applications. According to Wikipedia, an embedded system is a computer system – a combination of a computer processor, computer memory, and input/output peripheral devices – that has a dedicated function within a larger mechanical or electrical system.

This blog post describes the process of developing a simple Python application to control the hardware, along with optimizing the boot time. For all of this, we used the Yocto build system. It allows building a fully functional system, including the bootloader, the DTB (device tree blob), the Linux kernel, the root file system (with the required apps) and a custom app running as a service.

Here, boot time is understood as the time from plugging in the power connector to observing a blinking LED. In short, the boot process includes:

  • hardware delay: voltage regulators, capacitors, reset supervisory circuit
  • boot ROM code (inside the processor)
  • loading and initialization of the bootloader (U-Boot)
  • loading the Linux kernel image into memory
  • Linux kernel boot
  • systemd startup (the system and service manager)
  • custom application loading time (especially for interpreted languages like Python)

Some of these times can be read from the system logs, but to get the full picture, one needs an oscilloscope or a logic analyzer connected to the relevant pins (UART TX, 3V3, LED).
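The stages before the kernel (ROM code, U-Boot) print no timestamps of their own. A cheap complement to a logic analyzer is to timestamp each serial line on the host as it arrives and note when known markers appear. The sketch below illustrates that idea under our own assumptions; it is not the setup we used for the measurements in this post. The function name is hypothetical, and the marker strings are taken from the boot log shown later.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def first_marker_times(
    lines: Iterable[str],
    markers: List[str],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, float]:
    """Return, for each marker, the seconds from the first received line to its first occurrence.

    `lines` is any iterable of decoded serial lines; `clock` is injectable so
    the function can be tested without hardware. Markers never seen are absent.
    """
    start: Optional[float] = None
    found: Dict[str, float] = {}
    for line in lines:
        now = clock()
        if start is None:
            start = now  # t = 0 is the arrival of the first line (e.g. the U-Boot banner)
        for marker in markers:
            if marker not in found and marker in line:
                found[marker] = now - start
        if len(found) == len(markers):
            break  # every marker seen; stop reading
    return found
```

On the host, `lines` could come from pyserial, for instance a generator yielding `port.readline().decode(errors="replace")` for a `serial.Serial("/dev/ttyUSB0", 115200)` port (the device name is an assumption). Host-side timestamps include USB-to-serial buffering latency and start at the first printed line rather than at power-on, so they are coarser than logic-analyzer edges.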

The hardware that we used:

Toradex provides a Yocto build process with all the BSP (Board Support Package) layers and a lot of its own tuning and support, so it was our starting point. On top of that, we added a custom layer. We also placed our demo application there because it is really small, but ideally (in the Yocto spirit) the application should be fetched from a git repository during the build.

The Toradex build provides a lot of functionality and many utilities. This is really nice, as it is a ready-to-go system for most applications and for development. But if you have a custom application with specific requirements, you can strip a lot of functionality at almost every stage. Also, some applications (like ours, which uses the CAN bus) require modifying the DTS (Device Tree Source).

The application we developed controls a device via the CAN bus interface, but for simplicity, this blog post describes only GPIO control. The process is quite similar, but GPIO control is much more approachable for a single article. We point out the differences between GPIO and CAN control where relevant.

The default Console distribution (version 3.0b3 at the time of development) uses systemd as the system and service manager. It is one of the most popular managers and provides great flexibility, but unfortunately, it is also slow. Nevertheless, we kept it, as it was one of our requirements. For the application layer, we chose Python.

Our optimizations and the application involve changes to multiple components of the build, described below.

  • Linux kernel modules selection

One of the most popular tools for configuring the kernel is menuconfig. To run it in the BitBake environment, use:

bitbake linux-toradex -c menuconfig -f

It opens a new terminal where one can remove (or include) modules.

However, the changes made with menuconfig are applied only in the build directory. One can either prepare a patch from the changes or take the whole .config file. The first approach is cleaner, but for simplicity we chose the latter and copied the file into our layer, at layers/meta-canboard/recipes-kernel/linux/linux-toradex/colibri-imx7-emmc/defconfig.
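If one prefers the cleaner route and keeps only what was changed in menuconfig, the difference between the original and the modified .config has to be extracted first. Below is a minimal sketch of that, assuming the standard Kconfig .config syntax (`CONFIG_FOO=y` and `# CONFIG_FOO is not set`); the function names are hypothetical. The kernel tree's own scripts/diffconfig does a similar job.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# "# CONFIG_FOO is not set" marks an explicitly disabled option.
_NOT_SET = re.compile(r"^# (CONFIG_\w+) is not set$")
# "CONFIG_FOO=value", where value may be y, m, a number or a quoted string.
_ASSIGN = re.compile(r"^(CONFIG_\w+)=(.*)$")

Change = Tuple[Optional[str], Optional[str]]


def parse_config(lines: Iterable[str]) -> Dict[str, str]:
    """Map each CONFIG_ symbol to its value; explicitly disabled options map to 'n'."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        m = _NOT_SET.match(line)
        if m:
            options[m.group(1)] = "n"
            continue
        m = _ASSIGN.match(line)
        if m:
            options[m.group(1)] = m.group(2)
    return options


def diff_configs(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Change]:
    """Return {symbol: (old_value, new_value)} for every symbol whose value changed.

    None means the symbol is absent from that file (e.g. hidden by a dependency).
    """
    changes: Dict[str, Change] = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def render_changes(changes: Dict[str, Change]) -> str:
    """Write the new values back in .config syntax (absent or 'n' -> 'is not set')."""
    out = []
    for key, (_, after) in changes.items():
        if after is None or after == "n":
            out.append(f"# {key} is not set")
        else:
            out.append(f"{key}={after}")
    return "\n".join(out)
```

Passing the lines of the original and of the menuconfig-modified .config through `parse_config`, then `diff_configs` and `render_changes`, lists only the options that were touched, which is a much smaller thing to review or keep in a layer than the whole file.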

  • Device Tree (DTS) modification

The device tree describes the hardware to the kernel. It is written in a human-readable form (a DTS file) and then compiled into a binary file (DTB) for a specific kernel version.
Toradex provides a DTB file for its evaluation board, with a lot of additional hardware and a standard pin configuration (according to the Toradex standard). Fortunately, DTS files are built by composition, so base files can be included. For example, the processor manufacturer (here NXP) may provide a basic DTS for the chip, Toradex can add the next DTS layer for its COM, and one can include all of this and add only the peripherals required by the custom board.

DTB files are created during the Linux build, according to the Makefile in arch/arm/boot/dts. So to add a custom DTS in Yocto, one can patch the kernel; here, the patch just adds the new file. To achieve this, one needs to create a linux-toradex_%.bbappend file as well as the DTS file itself. The DTS for our GPIO demo is shown below:

/dts-v1/;

#include "imx7d-colibri-emmc.dtsi"

/ {
    model = "Toradex Colibri iMX7D on Colibri with CAN";
    compatible = "toradex,colibri-imx7d-emmc-eval-v3",
                 "toradex,colibri-imx7d-emmc", "fsl,imx7d";

    chosen {
        stdout-path = "serial0:115200n8";
    };
};

&iomuxc {
    pinctrl-names = "default";
    pinctrl-0 = <&pinctrl_gpio1 &pinctrl_gpio2 &pinctrl_gpio3 &pinctrl_gpio7>;
    /* &pinctrl_gpio4 same pins as pinctrl_flexcan2. pinctrl_gpio7 same as pinctrl_flexcan1. Therefore removed above */
};

&uart1 {
    status = "okay";
};

For the CAN bus application, we additionally needed to enable two CAN interfaces (and disable the corresponding GPIOs).
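After booting the new image, a quick way to confirm that the kernel really uses the custom DTB rather than the stock evaluation-board one is to read the `model` property, which Linux exposes as a NUL-terminated string in /proc/device-tree/model. Below is a minimal sketch; the expected string is the `model` from the DTS above, and the function names are hypothetical. It avoids f-strings and passes plain paths so it should also run on older Python 3 versions on the target.

```python
from pathlib import Path

# The model string from our DTS; adjust it if your DTS uses a different one.
EXPECTED_MODEL = "Toradex Colibri iMX7D on Colibri with CAN"


def read_dt_model(path="/proc/device-tree/model"):
    """Return the device tree 'model' property without its trailing NUL byte."""
    raw = Path(path).read_bytes()
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")


def custom_dtb_loaded(path="/proc/device-tree/model"):
    """True if the running kernel was booted with the custom device tree."""
    return read_dt_model(path) == EXPECTED_MODEL
```

On the board, `custom_dtb_loaded()` with the default path performs the check; the `path` parameter exists only so the logic can be tried against an ordinary file.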

  • Removing unnecessary DISTRO_FEATURES, MACHINE_FEATURES and IMAGE_FEATURES

The default Toradex image includes many modules that our application does not need. We definitely need neither a Bluetooth stack nor an X11 server. See the example snippet below:

MACHINE_FEATURES_remove = "bluetooth"
MACHINE_FEATURES_remove = "touchscreen"
DISTRO_FEATURES_remove = "irda"
DISTRO_FEATURES_remove = "opengl"

One can also start from one of the default Poky images, e.g. core-image-minimal.
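The `_remove` override (old-style syntax, as used by the release in this post) does not edit the string textually: BitBake splits the value into whitespace-separated items and drops every item equal to one of the removed values, wherever it occurs. The sketch below mimics that behaviour purely to illustrate the semantics; it is not BitBake code, the function name is hypothetical, and the feature values are examples.

```python
from typing import Iterable


def apply_remove(value: str, removals: Iterable[str]) -> str:
    """Mimic BitBake's _remove: drop every whitespace-separated item listed in removals.

    Simplified illustration only: BitBake preserves the original spacing,
    whereas this joins the remaining items with single spaces.
    """
    to_drop = set()
    for r in removals:
        to_drop.update(r.split())  # one _remove value may itself list several items
    return " ".join(item for item in value.split() if item not in to_drop)
```

Because matching is per item, removing `bluetooth` drops every occurrence of that exact word, while a partial string such as `gl` leaves `opengl` untouched.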

  • Removing unnecessary applications and utilities, and installing a custom application

Toradex also provides many utilities and tools, but they can be stripped from the final product to save rootfs space. To do so, one can remove these applications from IMAGE_INSTALL.

We also need to add some new dependencies: Python 3, the required Python 3 libraries, and our custom application:

IMAGE_INSTALL_append = " \
    python3 \
    python3-gpio \
    gpio-demo \
"

  • U-Boot optimization

The most visible optimization here is simply setting CONFIG_BOOTDELAY to 0. This is how long U-Boot waits, giving the user a chance to interrupt autoboot and enter the U-Boot command prompt, before running the default boot command. Since some version, U-Boot still checks for a single key press even when the delay is set to 0; in the past, one had to set CONFIG_ZERO_BOOTDELAY_CHECK to keep the ability to interrupt the boot.
One can also save some time on text written to the serial terminal, either by reducing the amount of log output (for example, CONFIG_UBI_SILENCE_MSG silences the UBI messages) or by making the output asynchronous.
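Why serial output costs boot time is plain arithmetic. With the 115200n8 console configured in the device tree (8 data bits, no parity, 1 stop bit, plus the start bit every character needs), each character takes 10 bit times on the wire, so the UART can send at most 11 520 characters per second. If console output is blocking, as it typically is in a bootloader, that time is spent waiting. A small helper to estimate it; the log sizes in the loop are illustrative, not measurements from our board.

```python
def uart_seconds(n_bytes, baud=115200, data_bits=8, parity_bits=0, stop_bits=1):
    """Lower bound on the time needed to shift n_bytes through a UART.

    Every character is framed by one start bit, so at 115200n8 a byte takes
    10 bit times and at most 11520 bytes/s can be sent.
    """
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return n_bytes * bits_per_char / baud


# Illustrative log sizes, not measurements:
for size in (2048, 20480, 102400):
    print("{:>7} bytes -> {:.2f} s".format(size, uart_seconds(size)))
```

A few kilobytes of boot messages therefore already cost a noticeable fraction of a second, which is why trimming or silencing console output is worth considering at every stage.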

By default, U-Boot supports many interfaces to boot the system from, and some time can also be saved by removing some of them (Ethernet, USB, …). We did not experiment with this, because our COM (Toradex Colibri iMX7) can run U-Boot only from eMMC (due to eFuses set by Toradex), so one needs to be careful when tuning U-Boot. In case of a misconfiguration, one would need to enter recovery mode (this involves soldering a bridge onto the board and connecting it to a PC via USB). On most boards, one can run the system (including U-Boot) from an SD card, but this is considered less reliable and is not recommended for production. It is most probably a business decision to prevent people from using SD cards, as they may lead to a worse perception of Toradex products, but of course, this is only my guess…
The U-Boot configuration change was applied in Yocto as a patch to U-Boot, colibri_imx7_emmc_defconfig.patch:

From d578b7452451d15bce87f006c0ddea5833119231 Mon Sep 17 00:00:00 2001
From: Przemyslaw 
Date: Tue, 9 Jun 2020 11:45:05 +0200
Subject: [PATCH] some fixes
 configs/colibri_imx7_emmc_defconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/configs/colibri_imx7_emmc_defconfig b/configs/colibri_imx7_emmc_defconfig
index 75732925ab..255c7bbed8 100644
--- a/configs/colibri_imx7_emmc_defconfig
+++ b/configs/colibri_imx7_emmc_defconfig
@@ -11,7 +11,7 @@ CONFIG_IMX_BOOTAUX=y
 # CONFIG_CONSOLE_MUX is not set
@@ -72,3 +72,4 @@ CONFIG_CI_UDC=y
\ No newline at end of file

  • Custom Python application

For the purpose of this blog post, we created a gpio_demo.py application to control the LED. The application uses the Python gpio library. It is not a standard package, so we need to create a BitBake recipe (.bb file) for it. Fortunately, there is a tool, pipoe, that creates one automatically for a Python package. To create the recipe, issue:

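pipoe -p cantools --python python3

(This is the command for the cantools package used in our CAN application; other PyPI packages are handled the same way.)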
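The listing of gpio_demo.py is not part of this post. As a stand-in, the sketch below blinks an LED through the kernel's legacy sysfs GPIO interface (/sys/class/gpio) instead of the Python gpio package we used, so it needs only the standard library. The GPIO number, blink count and period are hypothetical, and the sysfs root is a parameter so the logic can be tried against an ordinary directory. It avoids f-strings and passes `str` paths to `open()` so it should also run on older Python 3 versions on the target.

```python
import time
from pathlib import Path

# Hypothetical value: the Linux GPIO number wired to the LED on your board.
LED_GPIO = 42
SYSFS_GPIO = Path("/sys/class/gpio")


def _write(path, text):
    with open(str(path), "w") as f:
        f.write(text)


def export_output(gpio, root=SYSFS_GPIO):
    """Export a GPIO via sysfs (if needed), configure it as an output, return its value file."""
    pin_dir = root / "gpio{}".format(gpio)
    if not pin_dir.exists():
        _write(root / "export", str(gpio))  # the kernel creates gpioN/ in response
    _write(pin_dir / "direction", "out")
    return pin_dir / "value"


def blink(value_file, times=3, period_s=0.5):
    """Switch the LED on and off `times` times, spending half a period in each state."""
    for _ in range(times):
        _write(value_file, "1")
        time.sleep(period_s / 2)
        _write(value_file, "0")
        time.sleep(period_s / 2)
```

On the target, `blink(export_output(LED_GPIO))` would blink the LED three times. The sysfs GPIO interface is deprecated in newer kernels in favour of the GPIO character device, which is one reason to prefer a dedicated library in a real application.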

We also want our application to start automatically at boot (as a service). Since we want it to start as early as possible without optimizing systemd itself, we can set the WantedBy and DefaultDependencies fields, as below:

[Unit]
Description=gpio-demo service
DefaultDependencies=no

[Service]
ExecStart=/usr/bin/python3 /usr/bin/gpio_demo.py


We also need to install our application file (e.g. in /usr/bin) and enable the service.

  • Flashing the image to the eMMC

Finally, we need to flash the image to the built-in eMMC memory, as it is most probably the fastest available storage interface. One can manually flash U-Boot with its environment, the kernel, the device tree and the rootfs to the eMMC, but Toradex provides a great tool to do it automatically: the Toradex Easy Installer. It requires all these components to be packaged in a specific format, which our Yocto build already produces. We then flash the eMMC over Ethernet, connecting to the Toradex Easy Installer with a VNC client (see the screenshot below).

After flashing and restarting, our custom Linux distribution starts and works well.
The time from plugging in the power to the second LED blink is ~3.4 seconds. After disabling the kernel output to the serial port, it goes down to 2.9 s, including all the boot steps described earlier in the article.
The time was measured with a logic analyzer, from the rising edge on channel 0 to the second rising edge on channel 2.
It is worth noting that the first LED toggle takes place 1.7 s after powering the board. It comes from the bash script that wraps the Python script.

channel 0 – power 3.3V
channel 1 – serial TX
channel 2 – LED GPIO
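Reading this interval off the analyzer can be automated once the capture is exported. The sketch below assumes a hypothetical CSV export with one row per transition (`time_s,channel,level`); real analyzer software has its own export formats, so the parsing would need adjusting. It computes the interval described above: from the first rising edge on channel 0 (3.3 V) to the second rising edge on channel 2 (LED). All function names are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple

Transition = Tuple[float, int, int]  # (time in seconds, channel, new level 0/1)


def load_transitions(csv_text: str) -> List[Transition]:
    """Parse the assumed 'time_s,channel,level' export, sorted by time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(float(r["time_s"]), int(r["channel"]), int(r["level"])) for r in reader]
    return sorted(rows)


def rising_edges(transitions: Iterable[Transition], channel: int) -> List[float]:
    """Times at which `channel` goes from low to high (assumes it starts low)."""
    edges, level = [], 0
    for t, ch, new_level in transitions:
        if ch != channel:
            continue
        if level == 0 and new_level == 1:
            edges.append(t)
        level = new_level
    return edges


def boot_time(transitions: List[Transition], power_ch: int = 0,
              led_ch: int = 2, led_edge: int = 2) -> float:
    """Seconds from the first power rising edge to the n-th LED rising edge after it."""
    power_on = rising_edges(transitions, power_ch)[0]
    led = [t for t in rising_edges(transitions, led_ch) if t >= power_on]
    return led[led_edge - 1] - power_on
```

With `led_edge=1`, the same function gives the time to the first LED toggle instead, so both figures quoted above can be read from a single capture.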

For the version with serial output enabled, the log shows that the kernel itself boots in about 1 second.

U-Boot 2019.07-3.0.3+g26d926eda0 (Jun 09 2020 - 09:45:47 +0000)

CPU:   Freescale i.MX7D rev1.3 1000 MHz (running at 792 MHz)
CPU:   Extended Commercial temperature grade (-20C to 105C) at 36C
Reset cause: POR
DRAM:  1 GiB
PMIC:  RN5T567 LSIVER=0x01 OTPVER=0x0d
Loading Environment from MMC... OK
In:    serial
Out:   serial
Err:   serial
Model: Toradex Colibri iMX7 Dual 1GB (eMMC) V1.1A, Serial# 06597040
SEC0: RNG instantiated
Net:   FEC0
Hit any key to stop autoboot:  0 
Booting from internal eMMC chip...
42387 bytes read in 9 ms (4.5 MiB/s)
3571456 bytes read in 74 ms (46 MiB/s)
Kernel image @ 0x81000000 [ 0x000000 - 0x367f00 ]
## Flattened Device Tree blob at 82000000
   Booting using the fdt blob at 0x82000000
   Using Device Tree in place at 82000000, end 8200d592

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.959055] VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
[    1.117189] systemd[1]: Detected architecture arm.
Welcome to TDX WizzDev 2.6-snapshot-20200608 (thud)!
[    1.156436] systemd[1]: Set hostname to <colibri-imx7-emmc>.
[  OK  ] Reached target Swap.
         Mounting Temporary Directory (/tmp)...
[  OK  ] Started gpio-demo service.
         Starting udev Coldplug all Devices...
[  OK  ] Started Journal Service.
[    2.125964] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
TDX WizzDev 2.6-snapshot colibri-imx7-emmc ttymxc0
Colibri-iMX7-eMMC_Console-Image 3.0b3 20200609
colibri-imx7-emmc login:
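The bracketed numbers are the kernel's printk timestamps, in seconds counted from roughly the start of the kernel, so they do not include the ROM code and U-Boot time. Intervals between kernel and systemd milestones can be pulled from a captured log with a few lines of Python. In the sketch below, the milestone strings are taken from the log above and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Matches lines such as "[    0.959055] VFS: Mounted root (ext4 filesystem) ..."
_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def kernel_stamps(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (seconds, message) for every timestamped line; other lines are skipped."""
    out = []
    for line in lines:
        m = _STAMP.match(line.strip())
        if m:
            out.append((float(m.group(1)), m.group(2)))
    return out


def milestone_times(stamps: List[Tuple[float, str]],
                    milestones: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Map each milestone label to the first timestamp whose message contains its pattern."""
    result: Dict[str, Optional[float]] = {}
    for label, pattern in milestones.items():
        result[label] = next((t for t, msg in stamps if pattern in msg), None)
    return result
```

Lines such as `[  OK  ] Started gpio-demo service.` carry no numeric timestamp and are skipped. For the log above, the root filesystem is mounted at 0.959 s and systemd reports the architecture at 1.117 s, about 0.16 s later.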

The speed-up was enough for our application, but of course, it can be improved even more. Further optimization would definitely include stripping more U-Boot functionality and removing unused modules from the kernel. If boot time is more important than flexibility, one can also try replacing systemd with System V init or even BusyBox init running a single script. Although our service is started almost at the beginning, there is still systemd overhead.
Using Python is also really slow. A C application would definitely be faster, but less flexible in terms of development. As the timing graph shows, loading the Python application takes 1.2 s of the total 2.9 s.
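To see where the Python start-up time goes on the target, one can time the imports directly. Below is a minimal sketch using only the standard library; the module names passed in are up to the reader, and the function names are hypothetical. On Python 3.7 and newer, `python3 -X importtime` gives a per-module breakdown out of the box; the Python 3 shipped with the Yocto release used here may be older, in which case timing imports by hand like this still works. It does not cover the interpreter's own start-up, which can be timed separately, e.g. with `time python3 -c pass`.

```python
import importlib
import sys
import time


def time_imports(module_names):
    """Import each module in turn and return a list of (name, seconds, was_cached)."""
    results = []
    for name in module_names:
        cached = name in sys.modules  # an already loaded module costs almost nothing
        start = time.perf_counter()
        importlib.import_module(name)
        results.append((name, time.perf_counter() - start, cached))
    return results


def report(results):
    """Print the imports from slowest to fastest, in milliseconds."""
    for name, seconds, cached in sorted(results, key=lambda r: r[1], reverse=True):
        print("{:<20} {:8.1f} ms{}".format(name, seconds * 1000, " (cached)" if cached else ""))
```

Modules pulled in as dependencies of an earlier entry show up as cached, so the order of the list matters. The slowest entries are the first candidates for lazy imports or for rewriting in C.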