Bosch IoT Insights

Pipelines: Embedding a Linux Binary

With the following method, you can use your own existing Docker image, or create a new one, to run a custom step. In principle, you can use any executable that runs on Linux. We have described the process, along with some examples, in the following package:

pipeline-example-embedded-linux-binary.zip


Below, you will find general instructions for embedding your own Docker image, based on a Python example. The same instructions can also be found in the top-level README.md file of the package above.

We have also implemented some examples that use other executables. They are located in the other_examples folder of the archive above, and each of them comes with its own README.md file in its subfolder.

Extracting and compressing a file system from a docker image

Build an example Docker image with a newer Python version from a Dockerfile (located in ./resources). The image may install build-essential and must install fakechroot, which decouples the image's file system from the host's file system in the pipeline processing container.

Use our provided Dockerfile for testing, or try your own Docker image as the base image in the FROM instruction:

# Uses Python (based on Debian 10 buster) or try to use your own Docker image (based on glibc)
FROM docker.io/python:slim-buster
 
# add build-essential (useful for compiling Python modules during pip install)
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
&& apt-get clean && rm -rf /var/lib/apt/lists/* # Clean up to keep the image size as small as possible
 
# add fakechroot to allow decoupling from host operating system without root permission
RUN apt-get update && apt-get install -y --no-install-recommends fakechroot \
&& apt-get clean && rm -rf /var/lib/apt/lists/* # Clean up to keep the image size as small as possible
 
CMD ["/bin/bash"]


Open a terminal (e.g. Bash or Cmd) and, for this example, change to the directory that contains this README.md.

Build a Docker image and give it a name (here: python_image):

docker build -t python_image -f ./resources/Dockerfile .

Create a Docker container from the image without starting it (this builds the file system of the image so that it can be exported):


Linux:

CONTAINER_ID=$(docker create python_image) && echo $CONTAINER_ID

Windows Cmd:

docker create python_image > CONTAINER_ID
set /P CONTAINER_ID=<CONTAINER_ID
del CONTAINER_ID
echo %CONTAINER_ID%


Export the file system of the newly created container into a compressed file (we use xz to keep the file size minimal):


Linux:

docker export $CONTAINER_ID | xz > ./resources/distro_flat.xz

Windows Cmd: For compression, the xz tool must be available in your %PATH% environment variable.

docker export %CONTAINER_ID% | xz > .\resources\distro_flat.xz

If you have Git Bash (MinGW64) installed, you can use its xz.exe with the following commands:

rem // Store path to git.exe in variable GIT_EXE_PATH
where git > GIT_EXE_PATH
set /P GIT_EXE_PATH=<GIT_EXE_PATH
del GIT_EXE_PATH
rem // Store current directory and switch to git\mingw64\bin directory
pushd "%GIT_EXE_PATH%\..\..\mingw64\bin"
rem // Save path of current directory %CD% (where xz.exe is located)
set GIT_TOOLS_DIR=%CD%
rem // Restore original directory
popd
docker export %CONTAINER_ID% | "%GIT_TOOLS_DIR%\xz.exe" > .\resources\distro_flat.xz
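Before embedding the archive, it can be worth checking that the export actually contains the ELF loader and the fakechroot library. The following Python sketch does this with the standard tarfile module. The helper name check_distro_archive is hypothetical and not part of the example package; the file name patterns mirror the `find -name` patterns used to locate the loader and libfakechroot.so in the image.

```python
import fnmatch
import tarfile

# Hypothetical helper (not part of the example package): checks that an
# archive created with `docker export ... | xz` contains the ELF loader and
# libfakechroot.so, i.e. the files that the embedding technique relies on.
def check_distro_archive(archive_path):
    """Return a dict mapping each required file to the matching member names."""
    required = {
        'elf_loader': 'ld-*.so',          # same pattern as `find -name ld-*.so`
        'fakechroot': 'libfakechroot.so',
    }
    found = {key: [] for key in required}
    # Mode 'r' (the default) detects the xz compression transparently
    with tarfile.open(archive_path) as archive:
        for name in archive.getnames():
            basename = name.rsplit('/', 1)[-1]
            for key, pattern in required.items():
                if fnmatch.fnmatchcase(basename, pattern):
                    found[key].append(name)
    missing = [key for key, names in found.items() if not names]
    if missing:
        raise ValueError('archive is missing: ' + ', '.join(missing))
    return found
```

Reading the member list decompresses the whole archive once, so expect this to take a moment for large file systems.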

The resulting distro_flat.xz can then be embedded into a custom step package for use within an Insights processing pipeline. To use the compressed file system of the desired distribution, some additional information is needed, which must be extracted from the image.

The following command uses the script create_environment.sh to generate the content of our constant.py. The script must be executed inside your image, so the command runs the Docker image with the local scripts directory mounted and executes the script to collect the paths used in the example.


Linux:

docker run --rm -it -v "/$(pwd)/scripts:/scripts" python_image bash -c "./scripts/create_environment.sh"

Windows Cmd:

docker run --rm -it -v "%cd%/scripts:/scripts" python_image bash -c "./scripts/create_environment.sh"


If the script does not work with your Linux distribution, or if you prefer to do it yourself, the following commands can help you find out what is wrong or what is required to start an executable of your choice. The script executes these commands one after another and builds the variables used in constant.py. You can also use this script to retrieve the necessary environment paths yourself, adapt them, and use them in your own custom step if you do not use the example files step.py, constant.py and embedded_linux.py.

Find the paths of the ELF loader and of the fakechroot library in the image. These paths must be used in your step.py to call the executable of your choice from the image inside the Insights processing pipeline container.

docker run --rm -it python_image bash -c "find / -name 'ld-*.so' -or -name libfakechroot.so | sed -u 's/^/\$\(pwd\)/'"
 
 
# Expected output
$(pwd)/lib/x86_64-linux-gnu/ld-2.28.so
$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so


These values are used for the environment variables FAKECHROOT_ELFLOADER and LD_PRELOAD. In our example, these paths are stored in constant.py as the variables OWN_ELF_LOADER and OWN_PRELOAD and are used by helper methods in embedded_linux.py, which are called from step.py.
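If the file system is already extracted locally (for example to ./distro, as in the custom step), the same paths can be looked up without Docker. The following is a minimal Python sketch, assuming the extracted root directory is passed in; find_loader_and_preload is a hypothetical helper that returns the paths in the same $(pwd)/... form as the expected output above.

```python
from pathlib import Path

# Hypothetical helper: Python counterpart of
#   find / -name 'ld-*.so' -or -name libfakechroot.so | sed 's/^/$(pwd)/'
# run against an already extracted root directory instead of the image.
def find_loader_and_preload(root):
    """Return (elf_loader, preload) as '$(pwd)/...' paths relative to root."""
    root = Path(root)

    def first_match(pattern):
        # Sorting makes the choice deterministic if several files match
        matches = sorted(root.rglob(pattern))
        if not matches:
            raise FileNotFoundError(pattern + ' not found below ' + str(root))
        return '$(pwd)/' + matches[0].relative_to(root).as_posix()

    return first_match('ld-*.so'), first_match('libfakechroot.so')
```

If several files match (for example, several glibc versions), the sketch simply takes the first one in sorted order, so check the result against your image.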

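Check whether the following command works. It prints the library directories configured in /etc/ld.so.conf and /etc/ld.so.conf.d/* in the image, skipping include directives, comments, empty lines, and the fakechroot directory. You can also run the command directly inside your own Docker container, where it should list that container's library directories.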

docker run --rm -it python_image bash -c "ls /etc/ld.so.conf /etc/ld.so.conf.d/* | xargs cat | grep -v -E -e '^\s*(include|#|$)|fakechroot'"
 
 
# Expected output
/usr/local/lib
/usr/local/lib/x86_64-linux-gnu
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu


These values, joined with colons (:) and each prefixed with $(pwd), are stored in the example's constant.py as a variable named OWN_LIBRARY_PATHS and are used for the environment variable LD_LIBRARY_PATH. This environment variable is required in your step.py and must be set before calling the executable of your choice.

The following command builds the complete LD_LIBRARY_PATH variable for the image (without the fakechroot library path):

docker run --rm -it python_image bash -c "ls /etc/ld.so.conf /etc/ld.so.conf.d/* | xargs cat | grep -v -E -e '^\s*(include|#|$)|fakechroot' | sed -u 's/^/\$\(pwd\)/' | tr '\n' ':' | rev | cut -c 2- | rev | xargs -n1 printf \"LD_LIBRARY_PATH=%s\""
 
 
# Expected output
LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu
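The shell pipeline above can also be expressed in Python, which may be easier to adapt if you generate constant.py yourself. This sketch assumes the file system has already been extracted to a local directory and applies the same filter as the grep expression; build_ld_library_path is a hypothetical helper name.

```python
import re
from pathlib import Path

# Lines to skip, mirroring the grep filter: include directives, comments,
# empty lines, and any line that mentions fakechroot.
SKIP_LINE = re.compile(r'^\s*(include|#|$)|fakechroot')

def build_ld_library_path(root):
    """Return 'LD_LIBRARY_PATH=$(pwd)/dir1:$(pwd)/dir2:...' for an extracted root."""
    etc = Path(root) / 'etc'
    # Same order as `ls /etc/ld.so.conf /etc/ld.so.conf.d/*`
    conf_files = [etc / 'ld.so.conf'] + sorted((etc / 'ld.so.conf.d').glob('*'))
    dirs = []
    for conf in conf_files:
        if not conf.is_file():
            continue
        for line in conf.read_text().splitlines():
            if SKIP_LINE.search(line):
                continue
            dirs.append('$(pwd)' + line.strip())
    return 'LD_LIBRARY_PATH=' + ':'.join(dirs)
```

Like the shell version, the sketch does not follow include directives; it relies on the conventional layout where ld.so.conf only includes ld.so.conf.d/*.conf.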


In the example embedded_linux.py, we join the environment variable assignments with spaces. Later on, they are used in the helper method run, which executes the resulting command string with Python's subprocess.run.

env_variables = ' '.join([
'FAKECHROOT_ELFLOADER=' + constant.OWN_ELF_LOADER,
'FAKECHROOT_BASE=$(pwd)',
'LD_PRELOAD=' + constant.OWN_PRELOAD,
'LD_LIBRARY_PATH=' + constant.OWN_LIBRARY_PATHS
])


The resulting command that is executed in the processing pipeline should look as follows (it is a single command line; the backslashes only continue it for readability):

# Environment variables for the loader that executes your executable.
# The executable is the argument of the loader (ld-2.28.so), which executes it.
FAKECHROOT_ELFLOADER=$(pwd)/lib/x86_64-linux-gnu/ld-2.28.so \
FAKECHROOT_BASE=$(pwd) \
LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so \
LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu \
$(pwd)/lib/x86_64-linux-gnu/ld-2.28.so <executable_of_your_choice>

In Python, we use the subprocess module to execute the command in a separate process.
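As an alternative to prefixing a shell command with VAR=value assignments, subprocess.run can also receive an explicit argument list and environment. The following sketch shows this variant. It is not how the example embedded_linux.py works, and build_invocation is a hypothetical helper that replaces the $(pwd) placeholders (as in the values above) with the absolute path of the extracted root directory.

```python
import os

# Hypothetical helper: builds an argument list and an environment for
# subprocess.run(argv, env=env, cwd=root) instead of a shell command string.
# '$(pwd)' placeholders are replaced by the absolute path of the root directory.
def build_invocation(root, elf_loader, preload, library_paths, executable, args=()):
    root = os.path.abspath(root)

    def resolve(value):
        return value.replace('$(pwd)', root)

    env = dict(os.environ)  # keep the step's own environment (PATH, HOME, ...)
    env.update({
        'FAKECHROOT_ELFLOADER': resolve(elf_loader),
        'FAKECHROOT_BASE': root,
        'LD_PRELOAD': resolve(preload),
        'LD_LIBRARY_PATH': resolve(library_paths),
    })
    # The executable is not started directly: it is the first argument of the loader
    argv = [resolve(elf_loader), resolve(executable), *args]
    return argv, env
```

You would then call subprocess.run(argv, env=env, cwd=root, stdout=subprocess.PIPE). Avoiding shell=True removes one level of quoting, at the cost of resolving $(pwd) yourself.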

Now we need to package our own file system and the environment variables prepared above into a custom step. The custom step must then extract the file system, fix the symbolic links, and finally call our own executable.

Using a compressed file system in a custom step inside the pipeline

First, extract your compressed file system inside the runtime of your custom step. In your custom step, use code similar to the following snippet to extract the compressed file system into a newly created directory (here named distro):

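import os
import tarfile

def unpack_xz():
    # Create the target directory for the extracted file system
    os.mkdir('./distro')
    os.chdir('./distro')
    # Mode 'r' (the default) detects the xz compression of the tar archive
    with tarfile.open('../resources/distro_flat.xz') as f:
        f.extractall('.')
    # Return to the original working directory
    os.chdir('./..')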
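unpack_xz changes the working directory of the whole step process, which can affect other code that uses relative paths. The following sketch is a variant that passes the target directory to extractall instead; the function name unpack_xz_to and its default paths are assumptions based on the example layout.

```python
import os
import tarfile

# Sketch of a variant of unpack_xz() that does not change the working directory
# of the step process. The default paths are assumptions based on the example
# layout (archive in ./resources, file system extracted to ./distro).
def unpack_xz_to(archive='./resources/distro_flat.xz', target='./distro'):
    # Fails if the target already exists, like os.mkdir() in unpack_xz()
    os.makedirs(target)
    # Mode 'r' (the default) detects the xz compression automatically
    with tarfile.open(archive) as f:
        f.extractall(target)
    return os.path.abspath(target)
```

If you test with Python 3.14 or later, note that tarfile then applies the 'data' extraction filter by default, which rejects absolute symlinks such as those contained in a docker export; for an archive you created yourself, you would have to pass filter='fully_trusted' explicitly there.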

Next, all symbolic links inside the extracted file system must be rewritten to point to absolute paths inside your extracted file system. This is necessary because symlinks are resolved by the Linux kernel, which is not affected by the libc wrapper that fakechroot injects. Any symlink that points to an absolute path outside your file system must be redirected to the corresponding absolute path inside your extracted root directory. The exceptions are /proc and /dev, which must point to the outer file system.

Capture the stdout of your subprocess (the example below forwards it to stderr); otherwise, it may interfere with the pipeline processing.

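import subprocess
import sys

def fix_symbolic_links():
    # Shell script: rewrite every absolute symlink target to point inside ./distro,
    # then replace proc and dev with links to the host's /proc and /dev
    cmd_fix_symlinks = '''
find $(pwd) -xdev -type l | while read linkname;
do
  target=`readlink "$linkname"`;
  case "$target" in
    $(pwd)*) ;; # do nothing
    /*) ln -vsf "$(pwd)$target" "$linkname" ;;
  esac;
done;
rm -rf proc dev; ln -vsf /proc && ln -vsf /dev
'''
    # Capture stdout and forward it to stderr so that it does not interfere with the pipeline
    result = subprocess.run(cmd_fix_symlinks, shell=True, stdout=subprocess.PIPE, cwd='./distro')
    print(result.stdout.decode('utf-8'), file=sys.stderr)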
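The shell script can also be replaced by plain Python, which avoids a subshell and makes the logic easier to test. The following sketch mirrors the script's rules; relink_absolute_symlinks is a hypothetical name, and unlike `find -xdev` it does not stop at file system boundaries.

```python
import os
import shutil

# Sketch of a pure-Python variant of the shell script in fix_symbolic_links():
# absolute symlink targets outside `root` are prefixed with `root`, relative
# targets and targets already inside `root` are left alone, and proc/dev are
# replaced by links to the host's /proc and /dev.
def relink_absolute_symlinks(root='./distro'):
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories show up in dirnames; os.walk does not follow them
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            if target.startswith(root) or not os.path.isabs(target):
                continue  # same as the `$(pwd)*) ;;` branch, plus relative links
            os.remove(link)
            os.symlink(root + target, link)
    for special in ('proc', 'dev'):
        path = os.path.join(root, special)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        elif os.path.lexists(path):
            os.remove(path)
        os.symlink('/' + special, path)
```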

Next, call the executable of your choice (provided by the image) with some special Unix techniques:

  • With FAKECHROOT_BASE, configure the base directory of the included fakechroot library. It becomes the root directory for your executable.

  • With FAKECHROOT_ELFLOADER, configure the ELF loader that is used inside the fakechroot to load libraries from the image's root directory.

  • With the environment variable LD_PRELOAD, configure the library that wraps libc calls (libfakechroot.so).

  • With LD_LIBRARY_PATH, configure the paths to the used libraries inside the extracted file system as absolute paths.

The next code snippet shows an example call of python --version, using the absolute path $(pwd)/usr/local/bin/python in the outer file system ($(pwd) prints the path name of the working directory). The executable is not called directly. Instead, it is passed as the first argument to the loader $(pwd)/lib/x86_64-linux-gnu/ld-2.28.so and executed by this loader from the image. The loader must be configured to use the image's directory (./distro) as its root directory (/). Therefore, we prepend the environment variables FAKECHROOT_ELFLOADER, FAKECHROOT_BASE, LD_PRELOAD and LD_LIBRARY_PATH, which are required for the call. The Python subprocess is executed with ./distro as its current working directory (cwd="./distro"), so that $(pwd) expands to absolute paths inside the extracted file system within the processing pipeline.

import subprocess

# Environment variables and the loader call are joined into one shell command line
cmd_python_version = ' '.join([
    'FAKECHROOT_ELFLOADER=$(pwd)/lib/x86_64-linux-gnu/ld-2.28.so',
    'FAKECHROOT_BASE=$(pwd)',
    'LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so',
    'LD_LIBRARY_PATH=$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu/',
    '$(pwd)/lib/x86_64-linux-gnu/ld-2.28.so $(pwd)/usr/local/bin/python --version'
])
# cwd="./distro" makes $(pwd) expand to the root of the extracted file system
sp = subprocess.run(cmd_python_version, shell=True, stdout=subprocess.PIPE, cwd="./distro").stdout.decode('utf-8')
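The snippet above ignores the exit code of the command, so a failing call simply yields empty output. The following is a sketch of a hypothetical helper run_in_distro (not part of the example package) that checks the exit code and forwards the command's stderr to the step's stderr, where it can be inspected as described under Debugging.

```python
import subprocess
import sys

# Hypothetical helper (not part of the example package): runs a shell command
# inside the extracted root, forwards the command's stderr to the step's stderr
# and raises subprocess.CalledProcessError on a non-zero exit code.
def run_in_distro(command, root='./distro'):
    result = subprocess.run(
        command,
        shell=True,
        cwd=root,                 # $(pwd) expands to the extracted root
        stdout=subprocess.PIPE,   # capture stdout so it does not interfere with the pipeline
        stderr=subprocess.PIPE,
    )
    if result.stderr:
        print(result.stderr.decode('utf-8', errors='replace'), file=sys.stderr)
    result.check_returncode()
    return result.stdout.decode('utf-8')
```

With cmd_python_version from above, the call would be version = run_in_distro(cmd_python_version).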

Bundle code and compressed file system in a custom step zip file

Create a zip file for your custom Python step that contains all source code and resources (i.e. the compressed file system).

The following files and folders should be located at the top level of the zip file (no parent folder):

File                         Usage
------------------------     -------------------------------------
executable-manifest.yaml     Mandatory
resources/distro_flat.xz     Mandatory
src/step.py                  Mandatory
src/insights_protocol.py     Mandatory if used like in the example
src/constant.py              Mandatory if used like in the example
requirements.txt             Optional
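The zip file can be created with any archiver. If you prefer to script it, the following Python sketch uses the standard zipfile module, checks the mandatory entries from the table and stores all paths relative to the project directory, so that no parent folder ends up in the archive. build_step_zip and MANDATORY are hypothetical names.

```python
import os
import zipfile

# Mandatory entries from the table above (extend the list if your step
# uses insights_protocol.py / constant.py like the example does).
MANDATORY = ['executable-manifest.yaml', 'resources/distro_flat.xz', 'src/step.py']

def build_step_zip(project_dir, zip_path):
    """Zip project_dir so that its content sits at the top level of the archive.

    zip_path should be located outside project_dir.
    """
    missing = [p for p in MANDATORY if not os.path.isfile(os.path.join(project_dir, p))]
    if missing:
        raise FileNotFoundError('missing mandatory files: ' + ', '.join(missing))
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _, filenames in os.walk(project_dir):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Archive names relative to project_dir: no parent folder in the zip
                arcname = os.path.relpath(full_path, project_dir).replace(os.sep, '/')
                zf.write(full_path, arcname)
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())
```

The sketch adds every file below project_dir, so remove build artifacts such as __pycache__ directories first if you do not want them in the upload.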

See also Configuring a pipeline.

Debugging

In the invoke method of your step, you can check whether the file system was extracted as expected.

# append current working directory just for information
document['metaData.debug']['current_working_dir'] = os.getcwd()
# append current directory list of the extracted file system
document['metaData.debug']['own_file_system'] = os.listdir('./distro')

Alternatively, you could print debug information to stderr and check the output of your pipeline in the App Console of Insights.

# print directory list of the extracted file system
print("Extracted file system root: %s\n" % os.listdir(constant.OWN_FILE_SYSTEM_DIR), file=sys.stderr)

Problems and Restrictions

Modify PATH environment

Sometimes you need to modify the PATH environment variable because your executable expects other executables to be available via $PATH. You can change the PATH environment variable with export PATH=$(pwd)/your-executable/bin:$PATH.
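If you start the executable via subprocess.run with an explicit environment, the equivalent of this export is to prepend the directory to PATH in that environment. A minimal sketch; prepend_path is a hypothetical helper, and the directory you pass in should already be an absolute path inside the extracted file system (your-executable/bin is the placeholder from the text above).

```python
import os

# Hypothetical helper: the subprocess counterpart of
#   export PATH=$(pwd)/your-executable/bin:$PATH
# It returns a copy of `env` with `directory` prepended to PATH.
def prepend_path(env, directory):
    new_env = dict(env)
    current = new_env.get('PATH', '')
    new_env['PATH'] = directory + (os.pathsep + current if current else '')
    return new_env
```

For example, env = prepend_path(os.environ, os.path.abspath('./distro/your-executable/bin')) leaves os.environ itself untouched.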

Missing reference to /proc/self/exe

In other_examples/ghci, you will find an example that explains problems caused by /proc/self/exe and how they can be fixed with a specific symlink or by using patchelf to run the command directly.

Invalid ELF header

If you get errors such as "libc.so is not an ELF file - it has the wrong magic bytes at the start." or "libc.so: invalid ELF header", have a look at the example in other_examples/ghci.

Restrictions

The mechanism does not work with programs that are not dynamically linked against libc, e.g. because they are built with static linking. Such programs are fairly rare, since many distributions discourage static linking. Go, for example, is a programming language that normally builds fully self-contained executables by packing all dependencies into them. Busybox, a well-known all-in-one solution for small shell environments, is another example. Finally, the Alpine Linux distribution is based on musl libc, an implementation of libc other than the one fakechroot uses (glibc). Such programs are not expected to work when LD_PRELOAD contains libfakechroot.so, and neither are file systems exported from Docker containers based on Alpine images (which are often used to reduce size).
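To check up front whether an executable is affected by the static-linking restriction, you can look for a PT_INTERP program header: dynamically linked ELF executables name their loader there, statically linked ones do not. The following Python sketch reads the ELF header with the struct module; is_dynamically_linked is a hypothetical helper. A missing PT_INTERP entry is a strong hint, not a proof, that the LD_PRELOAD approach will fail, and musl-based binaries are not detected because they are dynamically linked too. Files without the ELF magic bytes are rejected.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader (ELF specification)

def is_dynamically_linked(path):
    """Return True if the ELF file at `path` contains a PT_INTERP program header."""
    with open(path, 'rb') as f:
        header = f.read(64)
        if header[:4] != b'\x7fELF':
            raise ValueError(f'{path}: not an ELF file (wrong magic bytes)')
        endian = '<' if header[5] == 1 else '>'  # EI_DATA: 1 = little endian
        if header[4] == 2:  # EI_CLASS: 2 = 64-bit
            (phoff,) = struct.unpack_from(endian + 'Q', header, 32)
            phentsize, phnum = struct.unpack_from(endian + 'HH', header, 54)
        else:  # EI_CLASS: 1 = 32-bit
            (phoff,) = struct.unpack_from(endian + 'I', header, 28)
            phentsize, phnum = struct.unpack_from(endian + 'HH', header, 42)
        # Walk the program header table; p_type is the first 4-byte field of each entry
        for index in range(phnum):
            f.seek(phoff + index * phentsize)
            (p_type,) = struct.unpack(endian + 'I', f.read(4))
            if p_type == PT_INTERP:
                return True
    return False
```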

Testing commands and your own executable locally

For a faster development cycle, you can also test the executable of your choice, and the commands needed to run it, locally. To do so, run a Docker container on your developer machine that simulates our processing-pipeline container. Since the Insights container is based on Ubuntu (Jammy), the test container should be as well. You can then use a terminal to directly execute the same steps that your step.py would otherwise execute.

The following Dockerfile (located in the directory local-testing) creates such a container, which is very similar to the one used in our production environment within IoT Insights.

FROM docker.io/eclipse-temurin:17-jdk-jammy
 
# Use /tmp as the build directory for the Python sources
WORKDIR /tmp
 
ENV PYTHON_VERSION=3.9.15
 
# Make python3.9 version available (https://wiki.ubuntuusers.de/Python/manuelle_Installation/)
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
build-essential \
libssl-dev \
zlib1g-dev \
libncurses5-dev \
libncursesw5-dev \
libreadline-dev \
libsqlite3-dev \
libgdbm-dev \
libdb5.3-dev \
libbz2-dev \
libexpat1-dev \
liblzma-dev \
tk-dev \
libffi-dev \
uuid-dev \
&& curl -k -L https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz -o Python-${PYTHON_VERSION}.tgz \
&& tar -xf Python-${PYTHON_VERSION}.tgz \
&& cd Python-${PYTHON_VERSION} && ./configure && make && make install && cd .. \
&& rm -rf Python-${PYTHON_VERSION} Python-${PYTHON_VERSION}.tgz \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
 
WORKDIR /
 
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
curl \
gcc \
git \
inotify-tools \
net-tools \
&& apt-get clean && rm -rf /var/lib/apt/lists/*
 
RUN useradd -ms /bin/bash vcap \
&& usermod -d /home/vcap vcap
 
USER vcap:vcap
WORKDIR /home/vcap
 
ENV USER_DIR=/home/vcap
 
CMD ["/bin/bash"]

Execute the following commands inside the directory local-testing.

First, build the Docker image for the simulated parent container and give it a name (here: pipeline_test_image):

docker build -t pipeline_test_image .

Next, start a container from the pipeline_test_image image with a mount point to your local developer machine. The mount point gives the running container access to the distro_flat.xz created above. To summarize: distro_flat.xz contains a file system extracted from a Linux distribution in which your executable normally runs. It is a flat file system and not a container instance: it contains only the physical bits and bytes (which are normally stored on a hard drive), including all libraries and tools that are necessary to run this executable.

Mounting the XZ archive into the container shortens the test cycle, because the large file does not need to be copied into the container.

docker run --rm -it -v "/$(pwd)/../resources:/resources" pipeline_test_image bash

You are now logged in to a running Docker container. Inside this container, which closely resembles our processing-pipeline environment, you can test the commands that your step.py should execute. You need to execute at least the following three steps:

  • Extract your distro_flat.xz, which is provided via a Docker mount point. The file is located on your developer machine in the resources directory and is mapped into the Docker container under the absolute path /resources. The following command creates a directory, extracts distro_flat.xz into it, and prints the content of this directory:

    mkdir distro && cd distro && tar -xf /resources/distro_flat.xz -C ~/distro && ls -la
  • Fix all symlinks inside the extracted file system. To make the extracted file system usable, fix all of its internal symlinks just as step.py does, by executing the following command:

    find $(pwd) -xdev -type l | while read linkname;
    do
    target=`readlink "$linkname"`;
    case "$target" in
    $(pwd)*) ;; # do nothing
    /*) ln -vsf "$(pwd)$target" "$linkname" ;;
    esac;
    done;
    rm -rf proc dev; ln -vsf /proc && ln -vsf /dev
  • Finally, try to call your executable using the advanced Linux techniques described in this article. You must set the environment variables for the ELF loader, the fakechroot base directory, the preloaded library, and the library paths. Any errors, such as missing libraries, are printed directly to the console of the running Docker container. This gives you much faster feedback and makes it easier to analyze why something is missing. Once you can successfully run the executable of your choice this way, translate all commands that you executed in this container into Python code and repeat them in the step.py of your custom step. The history command displays all commands that have been executed in your running container. The following command runs the newer Python version and shows what the commands must look like:

    FAKECHROOT_ELFLOADER=$(pwd)/lib/x86_64-linux-gnu/ld-2.28.so FAKECHROOT_BASE=$(pwd) \
    LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so \
    LD_LIBRARY_PATH=$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu/ \
    $(pwd)/lib/x86_64-linux-gnu/ld-2.28.so $(pwd)/usr/local/bin/python --version

Additionally, if you want to test your Python step.py code, you can also mount the whole custom step zip file that you normally upload to IoT Insights. Inside the running container, unpack this zip file. The unzipped directory should contain resources/distro_flat.xz, and the src directory should contain your Python code.

With this setup, you can test your Python code and verify that the extraction of distro_flat.xz works as expected.