ML Engineer

Engineering | Bangalore, India (Onsite) | Full-time | ₹15L - ₹25L

About the role

As an ML Engineer, you will own the intelligence at the center of the product: the system that decides, for each request, which model serves it best on quality, cost, and latency. That covers the classifiers that read a request, the scoring and selection logic that picks a model, and the evaluation systems that let us show the routing is actually good.

The work sits between applied machine learning and production engineering. You will take models from an idea to a latency-sensitive serving path, and stay rigorous about measurement so that each improvement is real and can be defended. When a model's performance on a task has not been measured, we treat it as unknown rather than inferring it from a related model.

What you will do

Build and improve the routing engine, including task and complexity classification, candidate scoring, and model selection.
Design and run evaluation systems that report honestly on how good the routing decisions are.
Work with large language models and transformer models in production, across fine-tuning, inference, and prompt design.
Turn routing quality into measurable improvements, and keep the rule that a model is only credited on a task once it has been evaluated on it.
Work with backend engineers to ship models into the serving path without compromising latency or reliability.
Investigate routing failures and edge cases, and feed what you learn back into the models and the evaluation suite.

You may be a good fit if you have

A strong machine learning or natural language processing (NLP) background, with hands-on experience building, training, and evaluating models.
Proficiency in Python and the modern machine learning stack, such as PyTorch and the transformers ecosystem.
Experience running large language models in production, not only in research or notebooks.
Rigor about evaluation and measurement, and discomfort with numbers that are not backed by evidence.
The ability to reason about latency, cost, and reliability alongside model quality.
Experience with routing, ranking, optimization, or recommendation systems is useful but not required.
Model distillation or small-model fine-tuning, for example DeBERTa-class classifiers, is a plus.

We encourage you to apply even if you do not meet every point above. Strong candidates rarely match a list exactly, and people from underrepresented groups are more likely to rule themselves out early. If the work interests you, we would rather read your application than have you hold back.

What we offer

Meaningful equity and real ownership of what you build.
A direct hand in a product at the core of AI infrastructure.
A small, senior team that moves quickly.
Substantial engineering and go-to-market problems, not narrow ones.
A team that works together in person, onsite in Bangalore.

How we hire

01 An introductory conversation.
02 A deeper discussion of the role with the founders.
03 A practical assignment or working session.
04 A final conversation with the founders.
05 An offer, usually within two to three weeks of the first call.

About Inferbase

Inferbase is building the routing layer for AI. Smart routing sends each request to the model that fits best on quality, cost, and latency, and every model is reachable through a single OpenAI-compatible API. The team is small and senior, and the problem sits at the layer most AI applications depend on.
