CLI Reference¶
The coreai-onnx command provides four subcommands.
inspect¶
Analyse an ONNX model and report op coverage.
coreai-onnx inspect <model.onnx>
Sample output (single-Relu model):
Op histogram: model.onnx
┏━━━━━━┳━━━━━━━┳━━━━━━━━━━━┓
┃ Op ┃ Count ┃ Status ┃
┡━━━━━━╇━━━━━━━╇━━━━━━━━━━━┩
│ Relu │ 1 │ supported │
└──────┴───────┴───────────┘
Total nodes: 1 | convertible: yes
Exit codes: see Exit codes (stable, all commands). Exit 1 means the model contains unsupported ops (listed in the output).
convert¶
Convert an ONNX model to a Core AI .aimodel asset. By default convert runs a
three-step pipeline:
Validate — confirm the input model loads and runs on ONNX Runtime, and stop with a clear error if it does not. Skipped, with a note, when
onnxruntimeis not installed.Convert — lower the model and write the optimised
.aimodel.Precision check — run the saved
.aimodeland print a precision comparison against ONNX Runtime. Skipped, with a note, off macOS or whenonnxruntimeis not installed.
ONNX Runtime is executed only once: the reference outputs from the validation step are reused for the precision check. The reference session runs with ONNX Runtime’s graph optimizations and Arm KleidiAI kernels disabled so it reflects ONNX spec semantics — both have produced wrong references on Apple Silicon (ORT 1.24–1.26) — so reference outputs may differ slightly from a default ORT session.
coreai-onnx convert <model.onnx> -o <output.aimodel> [options]
Options:
-o,--output— output.aimodelpath (required)--no-optimize— skip theoptimize()step after conversion (optimisation runs by default)--name— entrypoint function name (default:main)--no-validate— skip the pre-conversion ONNX Runtime validation gate--no-verify— skip the post-conversion precision check--repair— apply known-safe ONNX rewrites for documented Core AI runtime limitations (e.g. promote float16 programs to float32) before conversion, then re-verify parity; applied fixes appear inresult.repairs(see repair)--rtol— precision-check relative tolerance (default:1e-3, or1e-2for float16 outputs)--atol— precision-check absolute tolerance (default:1e-4, or1e-3for float16 outputs)--min-psnr— PSNR (dB) floor below which artol/atolfailure is not accepted as benign accumulation noise (default:50; see The default precision verdict and Benign precision-check failures)--compute-unit— pin the precision check to a unit (cpu_only,cpu,gpu,ane). Default: the verdict measures fp32 fidelity (the check runs on the runtime’s choice and falls back tocpu_onlyonly on a miss). GPU and ANE execute in float16;cpu_onlyis the fp32 path that proves conversion fidelity. An explicit value is the verdict, with no fallback--seed— RNG seed for input generation (default:0)
Sample output:
Converted successfully
Nodes: 1
Output: model.aimodel
Precision: ONNX Runtime vs .aimodel
┏━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ Output ┃ Max abs error ┃ Max rel error ┃ PSNR ┃ Pass ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ y │ 5.9605e-08 │ 1.1217e-07 │ 145.59 │ PASS │
└────────┴───────────────┴───────────────┴────────┴──────┘
Precision check passed
Exit codes: see Exit codes (stable, all commands). Exit 3 means the .aimodel was written but the precision check did not pass (exceeded tolerance, or could not run); exit 1 means conversion failed and no .aimodel was written.
Validation inputs: the validate and precision steps feed seeded random tensors. For models with index-style inputs (e.g. gather indices) those values can fall out of range and trip a false validation failure — pass
--no-validate(and--no-verify) for such models, or use the Python API (validate_onnxruntime/verify) with explicitinputs.
verify¶
Run the converted model on random inputs and compare outputs against ONNX Runtime.
coreai-onnx verify <model.onnx> <model.aimodel> [--rtol RTOL] [--atol ATOL] [--min-psnr DB] [--compute-unit UNIT] [--seed 0] [--name ENTRYPOINT]
Options:
--rtol— relative tolerance (default:1e-3, or1e-2for float16 outputs)--atol— absolute tolerance (default:1e-4, or1e-3for float16 outputs)--min-psnr— PSNR (dB) floor below which artol/atolfailure is not accepted as benign accumulation noise (default:50)--compute-unit— pin the check to a unit (cpu_only,cpu,gpu,ane). Default: the verdict measures fp32 fidelity (runs on the runtime’s choice, falls back tocpu_onlyonly on a miss; an explicit value has no fallback)--seed— RNG seed for input generation (default:0)--name— entrypoint function name used at convert time (default:main)
Requires onnxruntime and the Core AI runtime (macOS 27+).
The default precision verdict¶
The precision check answers one question — did the conversion preserve the
model? — which is a property of the fp32 computation, not of any one
hardware unit. So when you do not pass --compute-unit, the verdict measures
conversion fidelity, and two correct-but-divergent cases that used to fail are
demoted to warnings:
float16 hardware divergence. The runtime executes GPU/ANE in float16, so a model whose intermediates exceed float16 range (or whose program the GPU backend cannot execute at all) can drift, NaN, or abort there while the conversion is exact. When the default-unit run fails or the runtime aborts, the check re-runs on the deterministic fp32
cpu_onlypath; if that passes, the verdict is pass with aprecision_hardware_divergencewarning.Benign accumulation noise. An output that fails the elementwise tolerance but reaches a high PSNR (≥ 50 dB by default; see below) and matches the reference’s non-finite pattern is accepted with a
precision_benign_noisewarning.
For speed, the check runs first on the runtime’s default unit (the fast path);
the fp32 cpu_only confirmation runs only when that first run does not pass,
so a passing conversion pays no extra cost. precision.compute_unit in the
envelope is the effective unit behind the verdict: null when the default
unit was authoritative, "cpu_only" when the fp32 confirmation was, or the unit
you pinned. An explicit --compute-unit is honored exactly, with no fallback —
pin gpu/ane to make the float16 hardware path the verdict, or cpu_only to
force the fp32 path.
Benign precision-check failures¶
The strict per-element criterion is numpy.allclose. Two classes of correct
conversions fail it elementwise; by default
(The default precision verdict) both pass with
a warning rather than failing, and the report tells them apart:
Large-magnitude accumulation noise. When an output’s values span many orders of magnitude (some vision encoders reach
1e12on random probe inputs), even fp32-vs-fp32 summation-order differences produce absolute errors no sensiblertol/atolaccepts elementwise — while PSNR stays excellent (> 100 dB). The default--min-psnr 50floor accepts these as benign (with aprecision_benign_noisewarning); raise the floor, or set--rtol/--atol, to treat them as failures. As reference points, 60 dB is a faithful conversion; 100+ dB is numerically near-perfect.float16 hardware. The runtime executes GPU/ANE in float16. A model whose intermediate values exceed float16 range (or whose divisions flush denormal denominators to zero) can drift or produce NaN there even though the conversion is exact. The default verdict already isolates this: it falls back to the fp32
cpu_onlypath and passes with aprecision_hardware_divergencewarning. If the model needs fp32 at runtime, ship it withSpecializationOptions.cpu_only()(or fix the numerics in the source model, e.g. add an epsilon to unguarded divisions); pin--compute-unit gpu/aneto make the float16 path the verdict.
A third case is the model itself producing NaN/Inf on the random probe input
(e.g. an unguarded 0/0). The reference and the converted model then both
emit non-finite values; the check compares those positions by NaN/Inf mask,
reports per-output counts as expected_nonfinite, and emits the
reference_nonfinite warning so the degenerate reference is visible.
Sample output:
Precision: ONNX Runtime vs .aimodel
┏━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━┓
┃ Output ┃ Max abs error ┃ Max rel error ┃ PSNR ┃ Pass ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━┩
│ y │ 0.0000e+00 │ 0.0000e+00 │ inf │ PASS │
└────────┴───────────────┴───────────────┴──────┴──────┘
Verification passed
Exit codes: see Exit codes (stable, all commands). Exit 2 means the Core AI runtime is unavailable (requires macOS 27+).
schema¶
Print the tool’s machine-readable capability description.
coreai-onnx schema [--json]
In human mode, prints a summary table of commands with a count of supported ops, error codes, and exit codes.
With --json, emits the full capability contract — commands with all flags,
error codes, warning codes, exit codes, the supported-op list, and runtime
requirements. The code tables in the dump are the same source the CLI emits
from, so they cannot drift from behavior.
Exit codes: always 0 (the command needs no model, no optional dependencies,
and works on any platform).
--json example (trimmed — commands and supported_ops bodies omitted):
{
"schema_version": 1,
"command": "schema",
"status": "ok",
"result": {
"tool": {
"name": "coreai-onnx",
"version": "1.1.2",
"schema_version": 1,
"description": "ONNX conversion tooling for Apple platform runtimes, with validation, precision checks, float16 repair, and command/API surfaces.",
"global_options": [
{
"flag": "--json",
"type": "flag",
"default": false,
"required": false,
"help": "Emit one machine-readable JSON document on stdout instead of human-formatted output"
}
]
},
"commands": [ ... ],
"error_codes": [ ... ],
"warning_codes": [ ... ],
"exit_codes": [ ... ],
"supported_ops": [ ... ],
"runtime": {
"convert_platforms": "any",
"verify_platforms": "macOS 27+ with the Core AI runtime",
"onnxruntime_required_for": ["validate", "verify"]
}
},
"warnings": [],
"error": null
}
Machine-readable output (--json)¶
Every command accepts --json (before or after the subcommand). Stdout then
carries exactly one JSON document — nothing else — in this envelope:
{
"schema_version": 1,
"command": "convert",
"status": "ok",
"result": { },
"warnings": [{"code": "onnxruntime_missing", "message": "..."}],
"error": null
}
On failure, status is "error" and error is an object with keys
code, message, details, and hint. result may still be present when
partial output exists (for example, the .aimodel was written but the
precision check failed). Non-finite precision metrics serialize as strings:
PSNR of infinity as "inf", negative infinity (zero reference signal,
nonzero noise) as "-inf", and NaN/Inf error maxima as "nan"/"inf".
rtol/atol/min_psnr/compute_unit in verify results echo the values
passed on the command line; null means the default was chosen inside
verify().
Command results¶
inspect—{model_path, total_nodes, convertible, ops: [{op, count, supported}], unsupported: [...]}(ops sorted by name). Exit code1withstatus: "ok"means the analysis succeeded but the model is not convertible.convert—{output_path, total_nodes, optimized, validated, repairs, precision};repairslists any auto-repairs applied with--repair(empty otherwise; each is{name, summary, details}— see repair);precisionisnullwhen the check was skipped — either explicitly (--no-verify) or for a reason reported inwarnings.verify—{passed, rtol, atol, min_psnr, compute_unit, seed, outputs: [{name, max_abs_error, max_rel_error, psnr, passed, expected_nonfinite}]}(outputs in model order;expected_nonfinitecounts NaN/Inf values in the ONNX Runtime reference for that output).
Error codes (stable, append-only)¶
Code |
Meaning |
|---|---|
|
Ops with no lowering; |
|
The input model failed to load or run on ONNX Runtime. |
|
A lowering failed; |
|
The Core AI compiler failed to optimize or save the program. |
|
Outputs exceeded tolerance (asset still written for |
|
The precision check could not run; |
|
The file is not a valid ONNX model. |
|
The file could not be read or written. |
|
|
Warning codes: onnxruntime_missing (onnxruntime not installed; validation
and precision check skipped), platform_no_runtime (not on macOS; precision
check skipped but .aimodel was written), reference_nonfinite (the ONNX
Runtime reference itself contains NaN/Inf on the probe input; parity at those
positions is checked by mask — see
Benign precision-check failures),
precision_benign_noise (an output failed the elementwise tolerance but was
accepted as benign accumulation noise at high PSNR — the conversion is
faithful), precision_hardware_divergence (the model diverged on the runtime’s
default float16 GPU/ANE unit but is faithful on the fp32 CPU path; the verdict
reflects fp32 — see The default precision verdict).
Exit codes (stable, all commands)¶
Code |
Meaning |
|---|---|
|
Success. |
|
Failure: bad model, unsupported ops, conversion failure, or verification failure. |
|
Platform error ( |
|
The |
Stability policy¶
schema_version is 1 and bumps only on breaking changes; additive fields
do not bump it. Error codes, warning codes, and exit codes are append-only.