
openwrt-tests CI execution flow

How a CI run works end-to-end: from a GitHub Actions trigger to a firmware test on a physical DUT in a remote lab.

Companion to openwrt-tests onboarding (infrastructure setup) and lab architecture (VLAN and coordinator design).


1. Two independent planes

The flow combines two separate communication channels that should not be confused:

| Plane | Protocol | Purpose |
|---|---|---|
| Control | WebSocket (port 20408) | Coordinator manages places, reservations, locks. labgrid-client and the Labgrid pytest plugin use this. |
| Hardware access | SSH over WireGuard | Runner reaches the lab host to access physical resources (serial, power, DUT SSH). The coordinator is not involved here. |

2. Components per location

Datacenter VM (global-coordinator)

| Component | Role |
|---|---|
| labgrid-coordinator (port 20408) | WebSocket server. Registers places from places.yaml, tracks locks and reservations. |
| places.yaml | Generated by Ansible from labnet.yaml. Lists every place (DUT) across all labs. |
| GitHub Actions self-hosted runners | Processes that poll GitHub via HTTPS and execute workflow jobs. Labeled global-coordinator. |
| WireGuard peers | One per lab host. Gives each lab a private IP reachable from the VM. |

Each lab host (e.g. labgrid-fcefyn, labgrid-aparcar)

| Component | Role |
|---|---|
| labgrid-exporter | Registers local DUT resources (serial, power, network) with the coordinator over WebSocket. |
| exporter.yaml | Declares the physical resources of each place: USB serial path, PDUDaemon port, DUT IP + VLAN interface. |
| pdudaemon | Controls DUT power via relay or PDU. Exposes an HTTP API on localhost:16421. |
| ser2net | Exposes USB serial ports as TCP sockets. Used by the Labgrid SerialDriver. |
| dnsmasq | DHCP + TFTP server per VLAN. DUTs boot an initramfs via TFTP. |
| labgrid-bound-connect | SSH ProxyCommand (run via sudo). Bridges a TCP connection to a DUT IP bound to a specific VLAN interface using socat. |
| WireGuard peer | Tunnel to the VM. |

flowchart TD
    subgraph vm ["Datacenter VM (public IP)"]
        RUNNERS["GitHub runners"]
        COORD["labgrid-coordinator :20408"]
    end

    subgraph lab ["Lab host (e.g. labgrid-fcefyn)"]
        EXP["labgrid-exporter"]
        BC["labgrid-bound-connect"]
        PDU["pdudaemon :16421"]
        SER["ser2net"]
        DNS["dnsmasq / TFTP"]
        DUTs["DUTs (192.168.1.1%vlanXXX)"]
    end

    RUNNERS -- "1" --> COORD
    EXP -- "2" --> COORD
    RUNNERS -- "3" --> BC
    BC -- "4" --> DUTs
    PDU -- "5" --> DUTs
    SER -- "5" --> DUTs
    DNS -- "5" --> DUTs
| # | Connection | Detail |
|---|---|---|
| 1 | Runners → Coordinator | WebSocket localhost:20408 (reserve / lock / unlock) |
| 2 | Exporter → Coordinator | WebSocket via WireGuard (register resources) |
| 3 | Runners → bound-connect | SSH via WireGuard (LG_PROXY) |
| 4 | bound-connect → DUTs | socat TCP bound to the correct VLAN interface |
| 5 | Local services → DUTs | pdudaemon (power), ser2net (serial), dnsmasq (DHCP/TFTP) |

All connections between the VM and the lab host traverse a WireGuard tunnel. places.yaml on the VM is generated by Ansible from labnet.yaml.


3. Matrix strategy: one job per device

The generate-matrix job reads labnet.yaml and produces a JSON list of all devices across all labs. GitHub Actions expands it into parallel jobs, one per device.
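As a rough model of this step, the sketch below flattens a simplified labnet.yaml structure into one matrix entry per device. The key names (labs, devices, device, proxy) are assumptions for illustration; the real labnet.yaml schema and the exact matrix fields may differ.

```python
# Sketch of the generate-matrix job: flatten labs into one job entry
# per device. The dict below stands in for a parsed labnet.yaml.
import json

labnet = {
    "labs": {
        "labgrid-fcefyn": {"devices": ["openwrt_one", "bananapi_bpi-r4"]},
        "labgrid-hauke": {"devices": ["linksys_e8450"]},
    }
}

def build_matrix(labnet: dict) -> list[dict]:
    """One matrix entry per device, tagged with its lab (proxy)."""
    matrix = []
    for lab, cfg in labnet["labs"].items():
        for device in cfg["devices"]:
            matrix.append({"device": device, "proxy": lab})
    return matrix

# GitHub Actions consumes this JSON and expands it into parallel jobs.
matrix_json = json.dumps({"include": build_matrix(labnet)})
```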

flowchart LR
    LN["labnet.yaml\n(devices + labs)"]
    GM["generate-matrix job\n(ubuntu-latest)"]
    J1["Job: openwrt_one\n(labgrid-fcefyn)\nruns-on: global-coordinator"]
    J2["Job: bananapi_bpi-r4\n(labgrid-fcefyn)\nruns-on: global-coordinator"]
    J3["Job: linksys_e8450\n(labgrid-hauke)\nruns-on: global-coordinator"]

    LN --> GM
    GM -- "matrix JSON" --> J1
    GM -- "matrix JSON" --> J2
    GM -- "matrix JSON" --> J3

Each job receives its own matrix.device, matrix.proxy, matrix.target, matrix.firmware values.


4. Environment variables and $GITHUB_ENV

Variables are passed between steps via $GITHUB_ENV: a temporary file the runner creates per job. Each echo "VAR=val" >> $GITHUB_ENV writes a line; the runner reads the file after each step and injects the variables into the next step's process environment.

sequenceDiagram
    participant GH as GitHub Actions
    participant Step1 as Step: Set environment
    participant Step2 as Step: Wait for free device
    participant Step3 as Step: Run test
    participant Step4 as Step: Poweroff and unlock

    GH->>Step1: matrix.device, matrix.proxy, matrix.firmware, matrix.version_url
    Step1->>Step1: wget firmware from OpenWrt mirrors
    Step1-->>GH: LG_IMAGE=/path/to/firmware<br/>LG_PROXY=labgrid-fcefyn<br/>(via $GITHUB_ENV)

    GH->>Step2: (reads LG_PROXY from env)
    Step2->>Step2: labgrid-client reserve --wait --shell device=X
    Note right of Step2: eval sets LG_TOKEN in current shell
    Step2-->>GH: LG_TOKEN=xxx<br/>LG_PLACE=+<br/>LG_ENV=targets/device.yaml<br/>(via $GITHUB_ENV)
    Step2->>Step2: labgrid-client -p +$LG_TOKEN lock

    GH->>Step3: (reads all LG_* from env)
    Step3->>Step3: uv run pytest tests/

    GH->>Step4: (reads LG_TOKEN from env)
    Step4->>Step4: labgrid-client power off
    Step4->>Step4: labgrid-client -p +$LG_TOKEN unlock
| Variable | Set by | Used by | Value |
|---|---|---|---|
| LG_IMAGE | Step "Set environment" | Labgrid plugin (!template $LG_IMAGE in target YAML) | Local path to the firmware file |
| LG_PROXY | Step "Set environment" | labgrid-client, Labgrid plugin | Lab proxy name (e.g. labgrid-fcefyn) |
| LG_TOKEN | Step "Wait for free device" via eval | labgrid-client lock/unlock | Reservation token from the coordinator |
| LG_PLACE | Step "Wait for free device" | Labgrid plugin (!template "$LG_PLACE") | + (active reservation) |
| LG_ENV | Step "Wait for free device" | Labgrid plugin | targets/<device>.yaml |
| LG_COORDINATOR | Not set (uses default) | labgrid-client, Labgrid plugin | localhost:20408 (runner and coordinator share the VM) |

!template in target YAML

Target files (targets/<device>.yaml) use !template to expand environment variables at Labgrid load time:

resources:
  RemotePlace:
    name: !template "$LG_PLACE"   # expands to "+"

images:
  root: !template $LG_IMAGE       # expands to /path/to/firmware
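As a simplified model (not Labgrid's actual loader code), the `!template` expansion behaves like Python `string.Template` substitution against the process environment:

```python
# Simplified model of !template resolution: substitute environment
# variables into the YAML value at load time.
import os
from string import Template

os.environ["LG_PLACE"] = "+"                   # set by the workflow step
os.environ["LG_IMAGE"] = "/path/to/firmware"   # set by the workflow step

def expand(value: str) -> str:
    """Expand $VAR references the way a !template value resolves."""
    return Template(value).substitute(os.environ)

place = expand("$LG_PLACE")    # the "+" placeholder: active reservation
image = expand("$LG_IMAGE")    # local firmware path downloaded earlier
```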

5. Full CI sequence

sequenceDiagram
    participant GH as GitHub (HTTPS)
    participant R as Runner (VM)
    participant LC as labgrid-client
    participant COORD as labgrid-coordinator (VM :20408)
    participant PP as pytest + Labgrid plugin
    participant LAB as Lab host
    participant DUT as DUT

    GH-->>R: trigger job (HTTPS poll)
    R->>R: checkout repo, install uv

    Note over R: Step "Set environment"
    R->>R: wget firmware from mirrors
    R->>R: write LG_IMAGE, LG_PROXY to $GITHUB_ENV

    Note over R: Step "Wait for free device"
    R->>LC: labgrid-client reserve --wait --shell device=X
    LC->>COORD: WebSocket: reserve(device=X)
    COORD-->>LC: LG_TOKEN=xxx (place allocated)
    LC->>COORD: WebSocket: lock(+$LG_TOKEN)
    R->>R: write LG_TOKEN, LG_PLACE, LG_ENV to $GITHUB_ENV

    Note over R: Step "Run test"
    R->>PP: uv run pytest tests/
    PP->>COORD: WebSocket: getResources(place=+)
    COORD-->>PP: serial=USBSerial@lab, PDU=localhost:16421@lab, IP=192.168.1.1%vlanXXX
    PP->>LAB: SSH (LG_PROXY=labgrid-fcefyn via WireGuard)
    LAB->>LAB: labgrid-bound-connect (socat TCP → 192.168.1.1:22 bound to vlanXXX)
    LAB->>DUT: TCP connection to DUT
    PP-->>DUT: flash firmware via TFTP + U-Boot serial
    PP-->>DUT: SSH to DUT (via bound-connect)
    DUT-->>PP: test results

    Note over R: Step "Poweroff and unlock"
    R->>LC: labgrid-client power off
    LC->>COORD: WebSocket: call PDUDaemonDriver.off
    COORD->>LAB: forward to exporter → pdudaemon → relay → DUT power off
    R->>LC: labgrid-client unlock +$LG_TOKEN
    LC->>COORD: WebSocket: unlock(+$LG_TOKEN)

6. Role of labgrid-bound-connect

labgrid-bound-connect is a Python script installed on each lab host (not on the VM). It is invoked as an SSH ProxyCommand when the runner opens a connection to a DUT.

Problem it solves: all DUTs share the same IP (192.168.1.1), each on a different VLAN interface (vlan100, vlan101, ...). A normal TCP connect from the lab host would use the default route and miss the right VLAN. The script uses socat with so-bindtodevice=vlanXXX to force the connection out through the correct interface.

Runner (VM)
  └── SSH → lab host (via WireGuard)
              └── labgrid-bound-connect vlan101 192.168.1.1 22
                    └── socat STDIO TCP4:192.168.1.1:22,so-bindtodevice=vlan101
                          └── DUT on VLAN 101

The script runs under sudo (passwordless via /etc/sudoers). It is deployed by the Ansible playbook to /usr/local/sbin/labgrid-bound-connect on each lab host.
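The essential trick is just argument construction for socat. The sketch below builds the invocation shown in the diagram above; it models only the happy path, while the real script also handles errors and runs under sudo.

```python
# Hedged sketch of the socat command labgrid-bound-connect executes:
# bind the outgoing TCP connection to one specific VLAN interface so
# the shared DUT IP (192.168.1.1) resolves to the right device.
def socat_argv(interface: str, host: str, port: int) -> list[str]:
    """Build a socat call pinned to a VLAN interface via so-bindtodevice."""
    return [
        "socat",
        "STDIO",
        f"TCP4:{host}:{port},so-bindtodevice={interface}",
    ]

argv = socat_argv("vlan101", "192.168.1.1", 22)
# → ['socat', 'STDIO', 'TCP4:192.168.1.1:22,so-bindtodevice=vlan101']
```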


7. Summary: who calls what

| Action | Caller | Target | Protocol |
|---|---|---|---|
| Reserve place | labgrid-client (runner) | coordinator | WebSocket |
| Lock place | labgrid-client (runner) | coordinator | WebSocket |
| Get resources for place | Labgrid pytest plugin | coordinator | WebSocket |
| SSH to DUT | Labgrid pytest plugin | lab host → DUT | SSH over WireGuard + bound-connect |
| Serial access | Labgrid pytest plugin | lab host ser2net | TCP over WireGuard |
| Power control | labgrid-client / plugin | coordinator → exporter → pdudaemon | WebSocket → HTTP |
| Unlock place | labgrid-client (runner) | coordinator | WebSocket |
| Register resources | labgrid-exporter (lab) | coordinator | WebSocket (outbound from lab) |