
Embedl Device Cloud

Compile locally and profile TFLite models on the Embedl Device Cloud.

This guide walks you through compiling a TFLite model locally and profiling it on the Embedl Device Cloud, a managed cloud backed by AWS Device Farm. This is the simplest way to profile TFLite models: it requires no additional cloud accounts beyond your Embedl Hub account.

Since compilation runs locally using onnx2tf, turnaround is faster than fully cloud-based providers: the local compile takes 10–25 seconds and the cloud profiling step typically completes in under a minute. Note that the local compiler only supports FP16 conversion — if you need INT8 quantization or other device-specific optimizations, use Qualcomm AI Hub for compilation. You can still profile the compiled model on the Embedl Device Cloud.

You will learn how to:

  • Compile an ONNX model to TFLite locally
  • Profile the compiled model on a cloud device

Prerequisites

Make sure you have completed the setup guide to:

  • Create an Embedl Hub account
  • Install the embedl-hub Python library
  • Configure an API key

No additional setup is needed; the Embedl Device Cloud is included with your Embedl Hub account.

Creating a project

First, initialize a project and choose where artifacts are stored:

embedl-hub init \
    --project "My Project" \
    --artifact-dir ~/my-artifacts

This sets the default project and artifact directory for subsequent commands. The artifact directory is where compiled models, profiling results, and other outputs are stored on disk. Later commands — such as profiling a model from a previous compile step — look here for previously produced artifacts. If omitted, a platform-specific default location is used.
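The platform-specific default is typically a per-user data directory. As an illustration only (the exact path embedl-hub uses may differ), such a default is commonly derived like this:

```python
# Illustrative sketch of how a per-user default directory is usually
# chosen per platform. The actual default path embedl-hub uses may differ.
import os
import sys
from pathlib import Path

def default_artifact_dir(app: str = "embedl-hub") -> Path:
    if sys.platform == "win32":
        base = Path(os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        # Linux and friends follow the XDG base directory convention
        base = Path(os.environ.get("XDG_DATA_HOME", str(Path.home() / ".local" / "share")))
    return base / app

print(default_artifact_dir())
```

Passing an explicit --artifact-dir, as above, sidesteps the platform differences entirely.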

You can view your current settings at any time:

embedl-hub show

Selecting a target device

The Embedl Device Cloud provides access to a range of real devices. You need to select a target device: the specific hardware the model will be profiled on. List the available devices with:

embedl-hub list-devices

You can also browse the full list on the Supported devices page.

Preparing a model

The compile step expects an ONNX file. You can save your existing PyTorch model in ONNX format using torch.onnx.export:

import torch
from torchvision.models import mobilenet_v2

# Pretrained MobileNetV2 and an example input matching its expected shape
model = mobilenet_v2(weights="IMAGENET1K_V2")
example_input = torch.rand(1, 3, 224, 224)

torch.onnx.export(
    model,
    example_input,
    "mobilenet_v2.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=18,
    external_data=False,  # keep the weights inside the .onnx file
    dynamo=False,         # use the TorchScript-based exporter
)

Compiling a model locally

Since the Embedl Device Cloud is a profiling-only provider, we compile the model locally using onnx2tf. This applies FP16 conversion and typically completes in 10–25 seconds depending on model size; no device or cloud account is needed for this step:

embedl-hub compile tflite local \
    --model /path/to/mobilenet_v2.onnx

The compiled model is saved as mobilenet_v2.tflite in the artifact directory configured by embedl-hub init --artifact-dir.

Profiling a model

Profile the compiled model on the Embedl Device Cloud:

embedl-hub profile tflite aws \
    --model /path/to/mobilenet_v2.tflite \
    --device "Samsung Galaxy S24"

You can also profile a model from a previous compile run:

embedl-hub profile tflite aws \
    --from-run latest \
    --device "Samsung Galaxy S24"

Use embedl-hub log to view your runs.

Profiling gives you the model’s latency on the target hardware, which layers are slowest, the number of layers executed on each compute unit type, and more. You can use this information to iterate on the model’s design and answer questions like:

  • Can we optimize the slowest layer?
  • Why aren’t certain layers running on the expected compute unit?
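Once you have per-layer numbers, finding the slowest layer or counting layers per compute unit is simple post-processing. A hypothetical sketch, assuming the results are exported as JSON with per-layer latencies (the actual report schema embedl-hub produces may differ):

```python
import json
from collections import Counter

# Hypothetical per-layer profiling report; the real schema produced by
# embedl-hub may differ. This only illustrates the kind of analysis
# the profiling data enables.
report = json.loads("""
{
  "layers": [
    {"name": "Conv_0",          "latency_us": 420,  "unit": "NPU"},
    {"name": "DepthwiseConv_1", "latency_us": 1310, "unit": "CPU"},
    {"name": "Conv_2",          "latency_us": 250,  "unit": "NPU"}
  ]
}
""")

layers = report["layers"]
slowest = max(layers, key=lambda l: l["latency_us"])   # first optimization candidate
per_unit = Counter(l["unit"] for l in layers)          # layers per compute unit

print("slowest layer:", slowest["name"])               # → slowest layer: DepthwiseConv_1
print("layers per compute unit:", dict(per_unit))      # → {'NPU': 2, 'CPU': 1}
```

A layer that lands on the CPU while its neighbors run on the NPU, as in this toy data, is exactly the kind of result worth investigating.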

Next steps