Engineering for Local Inference is a SaaS Strategic Necessity

Architecture Review, Old School

Otherwise the long-term economics make no sense at all.

There has been a spate of stories in my feeds lately about how adding AI to SaaS products changes their cost and margin structure. The narrative goes like this:

In the early days of software we were in an expanding market. Software was sold as a one-time transaction. There were essentially no costs associated with adding customers or growth in customer usage. Margins were driven by control over development costs. Adding customers expanded profit margins.

Later we got cloud and made everything a subscription service. There were some costs associated with users, but these did not scale linearly and could be controlled with good architecture. Adding customers and their usage still expanded profit margins.

Now we have AI. There are direct costs associated with each AI usage. Since AI is embedded in features, these AI costs scale linearly with customer count and engagement. Adding customers has little to no impact on profit margins, which are now very thin as the direct costs have significantly grown.

You see the issue: SaaS offerings used to have costs which did not scale linearly with customer acquisition and use. Add AI, and now they do. So the margin which used to expand with growth may now remain static, or even shrink as new customers come aboard.
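To make the shift concrete, here is a toy margin model. Every number in it (revenue, hosting, request volume, inference price) is an illustrative assumption, not data from any real product:

    # Toy gross-margin model. All numbers below are illustrative assumptions.
    def gross_margin(customers, revenue_per_customer, fixed_costs,
                     hosting_per_customer, requests_per_customer,
                     cost_per_request):
        revenue = customers * revenue_per_customer
        costs = (fixed_costs
                 + customers * hosting_per_customer
                 + customers * requests_per_customer * cost_per_request)
        return (revenue - costs) / revenue

    for n in (1_000, 10_000, 100_000):
        classic = gross_margin(n, 30.0, 50_000, 0.50, 0, 0.0)       # no AI
        with_ai = gross_margin(n, 30.0, 50_000, 0.50, 2_000, 0.01)  # metered inference
        print(f"{n:>7} customers: classic {classic:.0%}, with AI {with_ai:.0%}")

In this sketch the classic margin climbs toward roughly 98% as customers are added, while the AI-laden margin converges on about 32% no matter how large the customer base grows, because $20 of every $30 subscription flows straight to the inference provider.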

The impact of this is fundamental: SaaS economics used to result in tremendous revenue capture by SaaS companies. Now SaaS + AI economics result in tremendous revenue transfer from SaaS companies to inference providers. Once you add the AI dependency, SaaS is no longer a profit center: it is a customer-driven hamster wheel powering the inference provider.

There are several ways this can go from here, and these scenarios are definitely not mutually exclusive:

Inference Providers Bleed SaaS Dry

Big AI is spending enormous amounts on data centers to host the largest world models, which are expanding in size and capability at significant velocity. One objective is clearly consolidation of SaaS customer interaction into a small number of choke points. These choke points enable incremental monetization of each customer interaction, and allow the inference providers to tax the SaaS providers, whose business depends upon access to inference to survive.

People used to say that no startup should be built on someone else's platform: the moment you become successful they can pull the rug out from under you. The annals of startup history are filled with products which failed as their core value proposition became a mere feature of the host platform they relied on.

By definition, relying on a third-party inference provider to power SaaS features is building your startup on someone else's platform. A choice to pay the tax now is simply a choice to wait for the other shoe to drop.

SaaS Atomizes into a Billion Bespoke Implementations

There are lots of stories now about companies, fed up with high SaaS bills, asking Claude to implement a functional replacement for the offerings whose bills have become troubling. They no longer respect the legacy concern that it is too costly to develop and maintain a bespoke implementation for just one company: today the savings are often immediate and obvious.

In a previous generation we called this impulse for people to do it themselves, to forge a direct relationship with the entity that actually provided the value, disintermediation. Because that is what it is. If much SaaS today is just a wrapper for AI inference provided by a large AI provider, then those SaaS offerings are enabling – intermediating – inference usage for customers who cannot organize that inference usage themselves.

Accepting a reasonable cost of intermediation which enables things that are not core to the business is best practice. Accepting intermediation enabling things that ARE core to the business is fiduciary malpractice.

We are in a phase right now where intermediation in inference is seen by many enterprise leaders as a convenient enabler of experimentation and capability building. AI is strategic and velocity is necessary. If a SaaS tool or provider can deliver an important value increment that is beyond the organization's capability today, the organization may well bite.

But once the necessary organizational capacity is built and the value delivery equation is well understood, intermediation and the relationships required to maintain it will be rightly seen as overheads, to be ruthlessly eliminated. As expertise continues to grow, that impulse for efficiency will drive down into more tactical tools and platforms, until there is no more significant optimization value to be gained.

Inference on the Device Transfers Linear Costs to the End User

Moore's Law has driven the software industry for decades. Software projects that demanded more than the consumer hardware of the day could deliver drove generations of PC upgrades. If you have nostalgia for himem.sys settings, or ever upgraded your system for the latest Call of Duty, you have experienced this phenomenon.

In a fast-moving environment where there is uncertainty about what a 'good enough' capability might be, it can be difficult for the fearful to see how the cost of inference could be transferred to the consumer or business user. Physical architecture matters. If adding that local inference capability requires expensive add-ons or bulky sidecars, it is less likely to happen, leaving a very small addressable market with little potential.

But if a good enough capability can be implemented incrementally, there is a good argument to bet on it. The expansion of memory from 64 KB -> 640 KB -> 2 MB -> 2 GB and beyond is a great example of how this has happened before. The progression of video resolution from CGA -> EGA -> VGA -> SVGA -> what we have today is another.

It is my impression (I could be wrong) that there is at least one provider which has intentionally focused on making good-enough local inference capability available at a silicon level in every device they sell. They do not have a world model of their own, for which they are often criticized, but even an underpowered laptop or cell phone from this device provider can run local inference.
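As a sketch of what this looks like from a SaaS feature's point of view: several local runtimes (Ollama is one example) expose an OpenAI-compatible HTTP endpoint on the device, so the feature code barely differs from a hosted integration. The port and model name below are assumptions about what happens to be installed locally:

    import json
    import urllib.request

    # Assumes an OpenAI-compatible server running on the device, such as
    # the one Ollama exposes; endpoint and model name vary by installation.
    LOCAL_ENDPOINT = "http://localhost:11434/v1/chat/completions"

    def local_chat(prompt, model="llama3.2"):
        body = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8")
        req = urllib.request.Request(
            LOCAL_ENDPOINT, data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    print(local_chat("Summarize this support ticket in one sentence: ..."))

Every call served this way costs the SaaS provider nothing at the margin; the customer already paid for the silicon.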

The strategic implication here is obvious. In a world where there are lots of models competing for supremacy, on benchmarks and on inference capability, models by definition are a commodity. The value is in enabling access to inference provided by models, not necessarily building one yourself.
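If models are a commodity, the engineering consequence is to write feature code against a narrow interface rather than against any single vendor, so inference can be sourced from whichever backend is cheapest at the moment, ideally the user's own device. A hypothetical sketch (the class names and the good_enough check are inventions for illustration):

    from dataclasses import dataclass
    from typing import Callable, Protocol

    class InferenceBackend(Protocol):
        # Any source of completions; the model behind it is treated as
        # an interchangeable commodity.
        def complete(self, prompt: str) -> str: ...

    @dataclass
    class LocalFirstRouter:
        local: InferenceBackend    # zero marginal cost, on the user's device
        hosted: InferenceBackend   # metered, per-request fallback
        good_enough: Callable[[str], bool]  # task-specific quality check

        def complete(self, prompt: str) -> str:
            try:
                answer = self.local.complete(prompt)
                if self.good_enough(answer):
                    return answer  # no inference tax paid on this request
            except OSError:
                pass               # local runtime missing or unreachable
            return self.hosted.complete(prompt)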

Big AI has focused on building massive data centers to power model building and provide access to inference. But what if the most powerful and expensive hardware inference engine today becomes a minuscule component tomorrow? Moore's Law has already accomplished this with generic memory and compute. There is no reason to think it will not happen over time with inference engines as well.

Strategically, as a SaaS provider, this offers an option to restore your previous scalable economics, albeit by exchanging a very large global market for a smaller device-provider-specific market. That tradeoff might not seem very attractive today, but it feels a lot like the tradeoff of writing for Windows versus DOS in 1992. It's a much smaller market now, sure, but it has the economics you want, and eventually everyone will be there with you.

Conclusion

For SaaS to survive with scalable economics it must do several things, which include:

  • find a way to avoid or remove the inference tax which scales linearly with customer acquisition and engagement growth, destroying the attractive margins that have fueled SaaS investment and growth, and
  • thread the needle as a necessary and important capability companies or individuals will purchase, but will not simply decide to replicate on their own

If you cannot figure out how to do the first thing in that list, there is no real point in doing the second: the economics of the business will be too unattractive to matter.

For these reasons I do believe engineering for local inference whenever possible is the most important SaaS strategic necessity when it comes to implementing AI in features and value propositions.