lime-packages CI: hardware test stage
How fcefyn-testbed/lime-packages consumes the firmware artifacts
produced by build-image and exercises them on the self-hosted lab
runner (testbed-fcefyn) and on QEMU. Single source of truth:
build-firmware.yml.
Build pipeline overview: lime-packages CI: firmware build.
0. Two-repo model: workflow vs. tests
The test infrastructure is deliberately split across two repositories:
| Repo | What it owns |
|---|---|
| fcefyn-testbed/lime-packages | The CI workflow (.github/workflows/build-firmware.yml), the build scripts, and the matrix config. This is the fork of libremesh/lime-packages. |
| fcefyn-testbed/libremesh-tests | The pytest test suite (tests/test_libremesh.py, test_mesh.py, etc.), labgrid environment files (targets/<device>.yaml), and the uv project that pins test dependencies. |
The workflow checks out libremesh-tests@staging during each test
job and calls uv run pytest from there. No test code lives inside
lime-packages itself.
Why the split?
Keeping the tests in libremesh-tests has three benefits:
- It can be used independently, for local runs against any firmware (pre-built or downloaded) without going through the CI workflow.
- Other forks of lime-packages (or the upstream libremesh/lime-packages) can reuse it later with zero changes to the test code.
- It is versioned separately: test improvements land in libremesh-tests without touching lime-packages.
Repo ownership requirements
For the CI workflow in lime-packages to access libremesh-tests,
the self-hosted runner must be able to check out both repos. Two
layouts work:
Option A: same GitHub organisation (current setup)
Both repos live in fcefyn-testbed. The runner is registered at the
organisation level (Settings > Actions > Runners), so it can
service workflows from any repo in the org.
```
fcefyn-testbed/lime-packages     <- workflow here
fcefyn-testbed/libremesh-tests   <- checked out by the workflow
```
The actions/checkout step in the workflow uses
repository: fcefyn-testbed/libremesh-tests - this is a public
repo, so no token: override is needed.
Option B: different organisations (future: upstream contribution)
The upstream libremesh/lime-packages workflow can still check out
fcefyn-testbed/libremesh-tests (public repo, no auth needed):
```yaml
- uses: actions/checkout@v6
  with:
    repository: fcefyn-testbed/libremesh-tests
    ref: staging
    path: libremesh-tests
```
The runner, however, must be registered in the libremesh org (or in
the specific libremesh/lime-packages repo) for the workflow to be
eligible for self-hosted execution. The lab runner registration is the
only thing that needs to change when contributing upstream - the
libremesh-tests checkout line stays the same as long as the test
repo remains public.
When libremesh eventually merges this CI approach, the expected
final state is:
- Runner registered in the libremesh org.
- Workflow in libremesh/lime-packages checks out libremesh/libremesh-tests (a fork or equivalent living in the same upstream org).
- fcefyn-testbed/lime-packages goes back to tracking upstream, with only testbed-specific device entries in targets.yml.
Pinned branch (staging)
The workflow always checks out libremesh-tests@staging. This is the
integration branch where reviewed test improvements land before going
to main. Using a named branch (not a SHA) means test fixes propagate
automatically to the next CI run without a lime-packages PR.
1. Trigger matrix
| Trigger | test-firmware | test-mesh | test-mesh-pairs | test-firmware-qemu | test-mesh-qemu |
|---|---|---|---|---|---|
| pull_request | every place | forced N=3 | skipped | run | run |
| workflow_dispatch (physical_single=true) | every place | per physical_mesh_count (0/2/3) | skipped | run | run |
| schedule (cron 06:00 UTC) | every place | skipped | 3 walking pairs | run | run |
Notes:
- QEMU tests run automatically on every PR without approval.
- Physical tests (test-firmware, test-mesh) require approval from a member of the lab-reviewers team via the physical-lab GitHub Actions environment before they execute on the self-hosted runner. See CI governance.
- The workflow concurrency group physical-lab-shared ensures that no two lab-bound triggers run at once.
- The summary job is a required status check for merging; it fails if any upstream job failed or was cancelled.
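To reason about which jobs a given trigger will start, the matrix can be restated as a function. The Python below is a hypothetical sketch (job_plan and its return strings are illustrative names, not code from the workflow or prepare_matrix.sh). It models only the physical_single=true dispatch row and assumes that physical_mesh_count=0 means no physical mesh job.

```python
from typing import Dict, Optional

QEMU_JOBS = ("test-firmware-qemu", "test-mesh-qemu")


def job_plan(event: str, physical_mesh_count: Optional[int] = None) -> Dict[str, str]:
    """Restate one row of the trigger matrix as {job: what it does} (hypothetical)."""
    plan = {job: "run" for job in QEMU_JOBS}  # QEMU jobs run for every trigger
    plan["test-firmware"] = "every place"      # single-node runs on every place
    if event == "pull_request":
        # pull_request cannot pass workflow inputs, so the mesh size is forced to 3
        plan["test-mesh"] = "N=3"
        plan["test-mesh-pairs"] = "skipped"
    elif event == "workflow_dispatch":
        if physical_mesh_count not in (0, 2, 3):
            raise ValueError("physical_mesh_count must be 0, 2 or 3")
        # assumption: 0 means no physical mesh job is started
        plan["test-mesh"] = f"N={physical_mesh_count}" if physical_mesh_count else "skipped"
        plan["test-mesh-pairs"] = "skipped"
    elif event == "schedule":
        plan["test-mesh"] = "skipped"
        plan["test-mesh-pairs"] = "3 walking pairs"
    else:
        raise ValueError(f"event not covered by the matrix: {event}")
    return plan
```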
2. End-to-end flow
```mermaid
flowchart LR
    A[firmware-* artifacts] --> B[download-artifact on testbed-fcefyn]
    B --> C[Stage under /srv/tftp/firmwares/ci/RUN_ID/]
    C --> D[labgrid lock]
    D --> E[pytest libremesh-tests]
    E --> F[upload test-results-*]
```
| Step | What happens |
|---|---|
| Artifacts | build-image uploads firmware-<device>-<release> per matrix. |
| Checkout | libremesh-tests@staging, aparcar/openwrt-tests@main. |
| Staging | Firmware copied to /srv/tftp/firmwares/ci/<run_id>/<place>/<release>/ (single-node), .../mesh/<release>/, or .../mesh-pairs/<pair>/<release>/. Per-job staging dirs avoid races. |
| Single | Per place: lock labgrid-fcefyn-<place>, set LG_IMAGE, run pytest tests/test_libremesh.py. |
| Mesh | test-mesh: stage every device the mesh shape needs, set LG_MESH_PLACES + LG_IMAGE_MAP, run pytest tests/test_mesh.py. |
| Pairs | test-mesh-pairs (cron only): three sequential 2-node pairs, max-parallel: 1. Covers every active lab device at least once per run. |
The lab-side steps are implemented in tools/ci/lab_stage_firmware.sh and tools/ci/lab_stage_mesh.sh; the workflow steps themselves are 2-3 lines plus env vars.
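The staging layout from the table can be expressed as a small path function. This Python sketch only illustrates the directory scheme; the real staging is done by the shell scripts named above, and staging_dir is a hypothetical name. Because the run ID is the first component under ci/, two concurrent runs never share a directory, and within a run each single-node job gets its own <place> subtree.

```python
from pathlib import PurePosixPath
from typing import Optional

CI_ROOT = PurePosixPath("/srv/tftp/firmwares/ci")


def staging_dir(run_id: str, kind: str, release: str,
                place: Optional[str] = None, pair: Optional[str] = None) -> PurePosixPath:
    """Per-job staging directory following the documented layout (hypothetical helper)."""
    base = CI_ROOT / run_id
    if kind == "single":
        if not place:
            raise ValueError("single-node staging needs a place")
        return base / place / release
    if kind == "mesh":
        return base / "mesh" / release
    if kind == "mesh-pairs":
        if not pair:
            raise ValueError("mesh-pairs staging needs a pair id")
        return base / "mesh-pairs" / pair / release
    raise ValueError(f"unknown job kind: {kind}")
```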
3. Mesh-after-firmware serialisation
test-mesh and test-mesh-pairs declare needs: [..., test-firmware]
so they cannot start while a test-firmware job is holding a labgrid
lock on the same place. Without this, both jobs race for the same lock
and one fails with "You have already acquired this place."
The QEMU jobs run in parallel with the lab jobs since they do not share lab resources.
4. PR strategy
```mermaid
flowchart LR
    PR[Pull request] --> TF["test-firmware (every place)"]
    PR --> TM["test-mesh (N=3)"]
    PR --> TQ[test-firmware-qemu + test-mesh-qemu]
    TF --> S[summary]
    TM --> S
    TQ --> S
```
- Every physical place runs single-node test-firmware (after lab-reviewers approval).
- test-mesh is forced to physical_mesh_count=3 on PRs, because pull_request events cannot pass workflow inputs and N=3 is the most representative shape (3 different SoC families).
- The summary job is a required status check; merge is blocked until every job has succeeded (or been skipped).
- Only members of the maintainers team can merge the PR (see CI governance).
5. Walking-chain mesh (cron only)
Three sequential pairs run in test-mesh-pairs with max-parallel: 1:
| Pair | A | B |
|---|---|---|
| 1 | belkin_rt3200_2 | openwrt_one |
| 2 | openwrt_one | bananapi_bpi-r4 |
| 3 | bananapi_bpi-r4 | belkin_rt3200_3 |
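Every active device takes part in at least one pair per run. The interior devices (openwrt_one and bananapi_bpi-r4) are exercised twice, each time with a different mesh peer; the chain endpoints (belkin_rt3200_2 and belkin_rt3200_3) are exercised once. belkin_rt3200_1 is excluded (in repair); re-include it by
adding it back to mesh_pairs: in prepare_matrix.sh.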
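A quick way to check how often each device is exercised by the chain, for example after editing mesh_pairs: in prepare_matrix.sh, is to count appearances per device. The Python sketch below hard-codes the pair table from this section; it does not parse prepare_matrix.sh, whose format is not shown here, and pair_coverage/peers are hypothetical helper names.

```python
from collections import Counter
from typing import List, Set, Tuple

# Pair table from this section; belkin_rt3200_1 is intentionally absent (in repair).
MESH_PAIRS: List[Tuple[str, str]] = [
    ("belkin_rt3200_2", "openwrt_one"),
    ("openwrt_one", "bananapi_bpi-r4"),
    ("bananapi_bpi-r4", "belkin_rt3200_3"),
]


def pair_coverage(pairs: List[Tuple[str, str]]) -> Counter:
    """Count how many pairs each device appears in during one cron run."""
    return Counter(device for pair in pairs for device in pair)


def peers(pairs: List[Tuple[str, str]], device: str) -> Set[str]:
    """Return the set of mesh peers a device is tested against."""
    return {b if a == device else a for a, b in pairs if device in (a, b)}
```

For the current table this gives two pairs each for openwrt_one and bananapi_bpi-r4, and one pair each for the two chain endpoints.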
6. QEMU coverage
| Job | Purpose |
|---|---|
| test-firmware-qemu | Single-node qemu_x86_64 boot: test_libremesh.py, test_base.py, test_lan.py. |
| test-mesh-qemu | Multi-node mesh on QEMU using vwifi (kmod-mac80211-hwsim with USR1 broadcast). |
Both run on GitHub-hosted runners with KVM. The
tools/ci/enable_kvm.sh step installs a udev rule that grants the
runner user read/write access to /dev/kvm (default permissions deny non-root
access). It then runs udevadm trigger --name-match=kvm so the rule applies
to the existing device node, not just to future hot-plugs.
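A cheap sanity check after the enable_kvm.sh step is to confirm that the runner user can actually open the KVM device read/write. The Python helper below is a hypothetical addition, not part of tools/ci/; it only inspects permissions and does not create or modify the udev rule.

```python
import os


def kvm_usable(path: str = "/dev/kvm") -> bool:
    """True if the current user may open the KVM device for read and write."""
    if not os.path.exists(path):
        return False  # no KVM device on this runner at all
    # os.access checks against the real uid/gid, i.e. the runner user
    return os.access(path, os.R_OK | os.W_OK)
```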
7. Labgrid reservation contract
Single-node
- Lock: uv run labgrid-client -v -p labgrid-fcefyn-<place> lock.
- Unlock + power-off: in an if: always() step, run labgrid-client -p $LG_PLACE power off, then ... unlock. The -p flag is required: without it, labgrid falls back to its empty default and refuses to act.
- Teardown: remove /srv/tftp/firmwares/ci/<run_id>/<place>/<release>/.
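In Python terms, the lock contract is a try/finally around the test run, with the if: always() step playing the role of finally. The sketch below is hypothetical (reserved_place is not a helper in tools/ci/); it assumes every labgrid-client call goes through uv run and that unlock takes the same -p form as lock. The runner is injectable so the command sequence can be checked without a lab, and staging-directory teardown is left out.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def _run(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


@contextmanager
def reserved_place(place: str, run: Runner = _run) -> Iterator[str]:
    """Lock a labgrid place; always power it off and unlock it on exit."""
    # -p is passed explicitly on every call, as the contract requires
    run(["uv", "run", "labgrid-client", "-v", "-p", place, "lock"])
    try:
        yield place
    finally:
        try:
            run(["uv", "run", "labgrid-client", "-p", place, "power", "off"])
        finally:
            # unlock even if power-off failed, so the place is not left locked
            run(["uv", "run", "labgrid-client", "-p", place, "unlock"])
```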
The three Belkin RT3200 units (belkin_rt3200_1/_2/_3) all run
the linksys_e8450 artefact under per-place TFTP staging and per-place
labgrid locks - the lock keys on the place name, not the device.
Environment for pytest: LG_PROXY=labgrid-fcefyn,
LG_PLACE=labgrid-fcefyn-<place>, LG_ENV=targets/<device>.yaml,
OPENWRT_TESTS_DIR=<aparcar/openwrt-tests checkout>.
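The single-node environment can be assembled as a plain dict. This Python sketch is hypothetical (pytest_env is an illustrative name); besides the four variables above it also sets LG_IMAGE, which the Single step in the end-to-end table sets. The place, device and path values used with it are made up.

```python
import os
from typing import Dict, Optional


def pytest_env(place: str, device: str, openwrt_tests_dir: str,
               image: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Environment for a single-node pytest run, per the documented contract."""
    env = dict(os.environ if base is None else base)
    env.update({
        "LG_PROXY": "labgrid-fcefyn",
        "LG_PLACE": f"labgrid-fcefyn-{place}",  # lock is keyed on the place name
        "LG_ENV": f"targets/{device}.yaml",     # relative to the libremesh-tests checkout
        "OPENWRT_TESTS_DIR": openwrt_tests_dir,
        "LG_IMAGE": image,                      # staged firmware for this place
    })
    return env
```

The resulting dict can be passed as env= to subprocess.run when invoking uv run pytest tests/test_libremesh.py from the libremesh-tests checkout.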
Mesh
Mesh fixtures (tests/conftest_mesh.py in libremesh-tests) require:
- LG_MESH_PLACES: comma-separated place names.
- LG_IMAGE_MAP: place1=/abs/path1,place2=/abs/path2.
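Before starting a long mesh run, it can help to validate that the two variables agree. The Python below is a minimal sketch of such a check, not the parser in tests/conftest_mesh.py (which is not shown here); parse_mesh_env is a hypothetical name, and it assumes every listed place must have an absolute image path.

```python
from typing import Dict, List


def parse_mesh_env(places_var: str, image_map_var: str) -> Dict[str, str]:
    """Parse LG_MESH_PLACES and LG_IMAGE_MAP into {place: image}, in place order."""
    places: List[str] = [p.strip() for p in places_var.split(",") if p.strip()]
    images: Dict[str, str] = {}
    for entry in image_map_var.split(","):
        if not entry.strip():
            continue
        place, sep, path = entry.partition("=")
        place, path = place.strip(), path.strip()
        if not sep or not place or not path.startswith("/"):
            raise ValueError(f"bad LG_IMAGE_MAP entry: {entry!r} (want place=/abs/path)")
        images[place] = path
    missing = [p for p in places if p not in images]
    if missing:
        raise ValueError(f"no image mapped for: {', '.join(missing)}")
    return {p: images[p] for p in places}
```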
VLAN 200 / switch configuration is handled by conftest_vlan (lab host
SSH); set VLAN_SWITCH_DISABLED=1 to skip it.
8. Automatic issues on failure (healthcheck)
On schedule runs (daily cron), test-firmware automatically
manages GitHub issues for failing devices:
| Outcome | Action |
|---|---|
| Test fails, no issue exists | Creates an issue titled CI healthcheck: <place> (<release>) with label healthcheck. |
| Test fails, issue already open | Adds a comment with the latest failure details. |
| Test fails, issue closed | Reopens the issue and updates its body. |
| Test passes, issue open | Comments "passed" and closes the issue. |
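The table is a small state machine over (test outcome, existing issue state). The Python below restates it as a hypothetical function (healthcheck_action and its return strings are illustrative). The table does not say what happens when a test passes and there is no open issue; the sketch assumes nothing is done in that case.

```python
from typing import Optional


def healthcheck_action(passed: bool, issue_state: Optional[str]) -> str:
    """Map (outcome, issue state) to an action; issue_state is None, "open" or "closed"."""
    if passed:
        # assumption: a pass with no issue, or a closed one, needs no action
        return "comment-passed-and-close" if issue_state == "open" else "none"
    if issue_state is None:
        return "create"
    if issue_state == "open":
        return "comment-failure"
    if issue_state == "closed":
        return "reopen-and-update-body"
    raise ValueError(f"unexpected issue state: {issue_state!r}")
```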
Each issue body contains a metadata table (place, device, release, run
link, date) and the full output of lime-report.sh -m (markdown
mode, see PR #1242)
inside a collapsible <details> block.
lime-report is collected from the DUT before poweroff/unlock via
labgrid-client ssh -- lime-report.sh -m. If the device is
unreachable or the command does not exist yet, a fallback message is
stored instead.
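The issue body described above can be assembled with a few lines of string handling. This Python sketch assumes the lime-report output has already been captured (or is None when collection failed); issue_body, the metadata table layout, the summary text and the FALLBACK wording are all hypothetical, since the workflow's actual text is not documented here. The blank lines around the report let GitHub render markdown inside the details block.

```python
from typing import Dict, Optional

# Hypothetical placeholder; the fallback text the workflow really stores is not documented here.
FALLBACK = "lime-report unavailable (device unreachable or lime-report.sh missing)"


def issue_body(meta: Dict[str, str], report: Optional[str]) -> str:
    """Render the metadata table plus a collapsible lime-report block."""
    rows = ["| Field | Value |", "|---|---|"]
    for key in ("place", "device", "release", "run", "date"):
        rows.append(f"| {key} | {meta.get(key, '')} |")
    details = report if report else FALLBACK
    return "\n".join(rows + [
        "",
        "<details><summary>lime-report.sh -m</summary>",
        "",
        details.rstrip(),
        "",
        "</details>",
    ])
```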
This only triggers on schedule so that PRs and manual dispatches do
not create noise in the issue tracker.
9. Debugging a failed run
- Open the run on GitHub and find the failed test-firmware* or test-mesh* job.
- Download test-results-<device> / test-results-mesh-*. Each bundle contains the --lg-log console output and report.xml (JUnit).
- For scheduled runs, check the auto-created healthcheck issue for the device; it contains the lime-report output.
- On the lab host, check the coordinator/exporter, TFTP permissions under /srv/tftp/firmwares/ci/, and stale locks via labgrid-client who.
- For QEMU jobs, the qemu-*-logs artifact contains the QEMU console plus pytest's --lg-log.
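When a bundle contains many tests, listing the failures from report.xml is quicker than scrolling the console log. The Python helper below is a hypothetical convenience (failed_tests is not part of libremesh-tests) built on the standard-library XML parser; it treats both failure and error elements as failed testcases.

```python
import xml.etree.ElementTree as ET
from typing import List


def failed_tests(junit_xml: str) -> List[str]:
    """Return 'classname::name' for every failed or errored testcase in a JUnit report."""
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):  # works whether the root is testsuites or testsuite
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f"{case.get('classname', '')}::{case.get('name', '')}")
    return failed
```

Typical use is failed_tests(Path("report.xml").read_text()) on the file extracted from the downloaded artifact.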
10. Runner prerequisites
The testbed-fcefyn runner must already be set up for libremesh-tests
workflows: uv on PATH (labgrid-client is invoked via uv run), write
access to /srv/tftp/firmwares/, and network reachability of LG_PROXY.
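A local preflight can catch the two easiest prerequisites to break. The Python below is a hypothetical sketch (runner_preflight is not an existing tool): it checks that uv is on PATH and that the TFTP root is writable. It deliberately does not test LG_PROXY reachability or run labgrid-client, both of which need the lab network.

```python
import os
import shutil
from typing import List


def runner_preflight(tftp_root: str = "/srv/tftp/firmwares") -> List[str]:
    """Return a list of problems; an empty list means these basic checks pass."""
    problems = []
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    if not os.path.isdir(tftp_root):
        problems.append(f"{tftp_root} does not exist")
    elif not os.access(tftp_root, os.W_OK | os.X_OK):
        # creating per-run subdirectories needs both write and search permission
        problems.append(f"no write access to {tftp_root}")
    return problems
```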
See CI runner and
Running tests.
For a brand-new device that has not been onboarded yet, follow Adding a device.
11. CI governance: teams and merge policy
Access control for fcefyn-testbed/lime-packages is split into two
GitHub Teams with distinct responsibilities:
| Team | Members | Responsibility |
|---|---|---|
| lab-reviewers | francoriba, ccasanueva7 | Approve physical lab test runs (environment deployment) |
| maintainers | francoriba | Review PRs and merge to master |
Environment protection (physical-lab)
The physical-lab environment has a required reviewer rule set to
the lab-reviewers team. When a PR or workflow_dispatch triggers a
job that uses environment: physical-lab, GitHub Actions pauses the
job until a team member clicks Approve and deploy in the Actions
UI. can_admins_bypass is disabled so the gate is uniform for everyone.
The schedule trigger skips the gate (empty environment name) so the
daily cron runs unattended.
Branch ruleset (merge blocking)
A repository ruleset on master enforces:
| Rule | Effect |
|---|---|
| required_status_checks (summary) | PR cannot merge until summary passes |
| pull_request (1 approval, dismiss stale) | PR needs at least one approving review |
| deletion + non_fast_forward | Prevent branch deletion and force-pushes |
Bypass actors: only the maintainers team. They can push directly
to master for CI infrastructure fixes.
Summary job as CI gate
The summary job needs all other jobs and runs with if: always(), so it executes even when an upstream job fails. The
build_summary.sh script checks every upstream job result:
- success or skipped (job condition not met) -> pass
- failure or cancelled -> fail and exit non-zero
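build_summary.sh is not reproduced here. As a minimal sketch of the same pass/fail rule, the Python function below evaluates the JSON form of the GitHub Actions needs context (as produced by toJSON(needs)), which maps each job ID to an object with a result field. failing_jobs is a hypothetical name, and wiring the gate this way is an assumption, not a description of how the script works.

```python
import json
from typing import Dict, List

PASSING = {"success", "skipped"}  # skipped = job condition not met


def failing_jobs(needs_json: str) -> List[str]:
    """Return 'job: result' for each upstream job that should fail the gate."""
    needs: Dict[str, dict] = json.loads(needs_json)
    # failure and cancelled (or any other non-passing result) fail the gate
    return [f"{job}: {info.get('result')}" for job, info in sorted(needs.items())
            if info.get("result") not in PASSING]
```

A wrapper would print the returned list and exit with status 1 when it is non-empty, which is what makes summary fail as a required check.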
Since summary is the required status check, and it waits for all
jobs (including test-firmware which is pending environment approval),
the PR stays unmergeable until every test completes successfully.
Complete PR lifecycle
```mermaid
sequenceDiagram
    participant C as Contributor
    participant GH as GitHub
    participant LR as lab-reviewer
    participant M as maintainer
    C->>GH: Open PR
    GH->>GH: Builds + QEMU tests (automatic)
    GH-->>LR: Request environment approval
    LR->>GH: Approve and deploy
    GH->>GH: Physical tests run on self-hosted runner
    GH->>GH: summary job evaluates all results
    M->>GH: Code review (approve PR)
    M->>GH: Merge
```
Portability to upstream
The workflow only references environment: physical-lab by name. All
governance (teams, rulesets, environment reviewers) lives in GitHub
repository/org settings, not in the YAML. Any organisation adopting
this workflow creates its own physical-lab environment and teams
without modifying the workflow file.