Embedl Device Cloud
Compile locally and profile TFLite models on the Embedl device cloud.
This guide walks you through compiling a TFLite model locally and profiling it on the Embedl device cloud — a managed cloud backed by AWS Device Farm. This is the simplest way to profile TFLite models: it requires no additional cloud accounts beyond your Embedl Hub account.
Since compilation runs locally using onnx2tf, turnaround is faster than
with fully cloud-based providers: the local compile takes 10–25 seconds and
the cloud profiling step typically completes in under a minute. Note
that the local compiler only supports FP16 conversion — if you need INT8
quantization or other device-specific optimizations, use Qualcomm AI Hub for compilation. You can still
profile the compiled model on the Embedl device cloud.
You will learn how to:
- Compile an ONNX model to TFLite locally
- Profile the compiled model on a cloud device
The Embedl device cloud is a profiling-only provider. Compilation is done locally, or you can compile on Qualcomm AI Hub and profile the resulting model here.
Prerequisites
Make sure you have completed the setup guide to:
- Create an Embedl Hub account
- Install the embedl-hub Python library
- Configure an API key
No additional setup is needed — the Embedl device cloud is included with your Embedl Hub account.
Creating a project
```shell
embedl-hub init \
  --project "My Project" \
  --artifact-dir ~/my-artifacts
```

This sets the default project and artifact directory for subsequent commands. The artifact directory is where compiled models, profiling results, and other outputs are stored on disk. Later commands, such as profiling a model from a previous compile step, look here for previously produced artifacts. If omitted, a platform-specific default location is used.
You can view your current settings at any time:
```shell
embedl-hub show
```

Selecting a target device
The Embedl device cloud provides access to a range of real devices. You need to select a target device — the specific hardware the model will be profiled on.
```shell
embedl-hub list-devices
```

You can also browse the full list on the Supported devices page.
Preparing a model
The compile step expects an ONNX file. You can save
your existing PyTorch model in ONNX format using torch.onnx.export:
```python
import torch
from torchvision.models import mobilenet_v2

model = mobilenet_v2(weights="IMAGENET1K_V2")
example_input = torch.rand(1, 3, 224, 224)
torch.onnx.export(
    model,
    example_input,
    "mobilenet_v2.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=18,
    external_data=False,
    dynamo=False,
)
```

Compiling a model locally
Since the Embedl device cloud is a profiling-only provider, we compile the model locally using onnx2tf. This applies FP16 conversion and typically completes in 10–25 seconds depending on model size; no device or cloud account is needed for this step:
```shell
embedl-hub compile tflite local \
  --model /path/to/mobilenet_v2.onnx
```

The compiled model is saved as mobilenet_v2.tflite in the artifact directory configured by embedl-hub init --artifact-dir.
If you need INT8 quantization for better on-device performance, consider using Qualcomm AI Hub instead, which compiles with device-specific INT8 quantization.
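As a rough illustration of why FP16 conversion is usually safe accuracy-wise, you can measure the round-trip error it introduces per weight. This is a plain-NumPy sketch, independent of any Embedl Hub or onnx2tf API:

```python
import numpy as np

# Sample weights roughly like a trained layer's (standard normal).
rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000).astype(np.float32)

# FP16 conversion round-trips every weight through half precision.
fp16_weights = weights.astype(np.float16).astype(np.float32)

# float16 keeps ~3 significant decimal digits, so the worst-case
# absolute error on unit-scale weights stays in the 1e-3 range.
max_err = float(np.max(np.abs(weights - fp16_weights)))
print(f"max abs error: {max_err:.2e}")
```

INT8, by contrast, keeps only 256 levels per tensor and typically needs calibration data, which is why it is left to Qualcomm AI Hub's device-specific toolchain.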
Profiling a model
Profile the compiled model on the Embedl device cloud:
```shell
embedl-hub profile tflite aws \
  --model /path/to/mobilenet_v2.tflite \
  --device "Samsung Galaxy S24"
```

You can also profile a model from a previous compile run:
```shell
embedl-hub profile tflite aws \
  --from-run latest \
  --device "Samsung Galaxy S24"
```

Use embedl-hub log to view your runs.
Profiling gives you the model’s latency on the target hardware, which layers are slowest, the number of layers executed on each compute unit type, and more. You can use this information to iterate on the model’s design and answer questions like:
- Can we optimize the slowest layer?
- Why aren’t certain layers running on the expected compute unit?
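For example, with per-layer timings shaped like the hypothetical records below (the field names are illustrative, not the actual Embedl Hub result format), finding the slowest layer and the time share per compute unit is a few lines of Python:

```python
# Hypothetical per-layer profiling records; field names are illustrative
# and do not reflect the actual Embedl Hub output schema.
layers = [
    {"name": "Conv_0", "unit": "NPU", "time_us": 120},
    {"name": "Conv_1", "unit": "NPU", "time_us": 340},
    {"name": "Softmax_0", "unit": "CPU", "time_us": 95},
]

# The slowest layer is the first optimization candidate.
slowest = max(layers, key=lambda layer: layer["time_us"])
print(slowest["name"])  # Conv_1

# Share of total time spent on each compute unit.
total = sum(layer["time_us"] for layer in layers)
by_unit: dict[str, int] = {}
for layer in layers:
    by_unit[layer["unit"]] = by_unit.get(layer["unit"], 0) + layer["time_us"]
for unit, t in sorted(by_unit.items()):
    print(f"{unit}: {100 * t / total:.1f}% of total time")
```

A layer landing on the CPU instead of the NPU often points to an unsupported op or an FP16-incompatible layout, which is the kind of question the per-layer breakdown helps you answer.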
Next steps
- Learn how to view, name, and tag your runs, and how to interpret profiling results in the exploring results guide.
- See the providers guide for the full reference of supported provider and toolchain combinations.