Azalio

[In preview] Generally Available: User and group quota reports in Azure NetApp Files

Azalio tdshpsk — Thu, 16 Apr 2026 18:59:55 +0000

For organizations leveraging individual user and group
quotas in Azure NetApp Files to manage capacity on NFS, SMB, and dual-protocol
volumes, the user and group quota reporting feature offers clear
visibility into key metrics such as quota limits, used c

The post [In preview] Generally Available: User and group quota reports in Azure NetApp Files first appeared on Azalio.

Introducing Anthropic’s Claude Opus 4.7 model in Amazon Bedrock

Azalio tdshpsk — Thu, 16 Apr 2026 14:59:48 +0000

Today, we’re announcing Claude Opus 4.7 in Amazon Bedrock, Anthropic’s most intelligent Opus model for advancing performance across coding, long-running agents, and professional work.

Claude Opus 4.7 is powered by Amazon Bedrock’s next generation inference engine, delivering enterprise-grade infrastructure for production workloads. Bedrock’s new inference engine has brand-new scheduling and scaling logic which dynamically allocates capacity to requests, improving availability particularly for steady-state workloads while making room for rapidly scaling services. It provides zero operator access—meaning customer prompts and responses are never visible to Anthropic or AWS operators—keeping sensitive data private.

According to Anthropic, the Claude Opus 4.7 model provides improvements across the workflows that teams run in production, such as agentic coding, knowledge work, visual understanding, and long-running tasks. Opus 4.7 works better through ambiguity, is more thorough in its problem solving, and follows instructions more precisely.

Agentic coding: The model extends Opus 4.6’s lead in agentic coding, with stronger performance on long-horizon autonomy, systems engineering, and complex code reasoning tasks. According to Anthropic, the model records high-performance scores with 64.3% on SWE-bench Pro, 87.6% on SWE-bench Verified, and 69.4% on Terminal-Bench 2.0.
Knowledge work: The model advances professional knowledge work, with stronger performance on document creation, financial analysis, and multi-step research workflows. The model reasons through underspecified requests, making sensible assumptions and stating them clearly, and self-verifies its output to improve quality on the first step. According to Anthropic, the model reaches 64.4% on Finance Agent v1.1.
Long-running tasks: The model stays on track over longer horizons, with stronger performance over its full 1M token context window as it reasons through ambiguity and self-verifies its output.
Vision: The model adds high-resolution image support, improving accuracy on charts, dense documents, and screen UIs where fine detail matters.

The model is an upgrade from Opus 4.6 but may require prompting changes and harness tweaks to get the most out of it. To learn more, visit Anthropic’s prompting guide.

Claude Opus 4.7 model in action
You can get started with the Claude Opus 4.7 model in the Amazon Bedrock console. Choose Playground under the Test menu, then choose Claude Opus 4.7 when you select a model. Now you can test a complex coding prompt with the model.

I ran the following example prompt about a technical architecture decision:
Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions.

You can also access the model programmatically, using the Anthropic Messages API to call bedrock-runtime through the Anthropic SDK or the bedrock-mantle endpoints, or keep using the Invoke and Converse APIs on bedrock-runtime through the AWS Command Line Interface (AWS CLI) and AWS SDK.

To make your first API call to Amazon Bedrock in minutes, choose Quickstart in the left navigation pane of the console. After choosing your use case, you can generate a short-term API key to authenticate your requests for testing purposes.

When you choose an API method, such as the OpenAI-compatible Responses API, you get sample code that runs your prompt as an inference request against the model.

To invoke the model through the Anthropic Claude Messages API, you can proceed as follows using the anthropic[bedrock] SDK package for a streamlined experience:

from anthropic import AnthropicBedrockMantle

# AWS Region to call; us-east-1 is one of the launch Regions listed in this post
REGION = "us-east-1"

# Initialize the Bedrock Mantle client (uses SigV4 auth automatically)
mantle_client = AnthropicBedrockMantle(aws_region=REGION)

# Create a message using the Messages API
message = mantle_client.messages.create(
    model="anthropic.claude-opus-4-7",
    max_tokens=2048,
    messages=[
        {"role": "user", "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions"}
    ],
)

# Print the text of the first content block in the response
print(message.content[0].text)

You can also run the following command to invoke the model directly to bedrock-runtime endpoint using the AWS CLI and the Invoke API:

aws bedrock-runtime invoke-model \
 --model-id anthropic.claude-opus-4-7 \
 --region us-east-1 \
 --body '{"messages": [{"role": "user", "content": "Design a distributed architecture on AWS in Python that should support 100k requests per second across multiple geographic regions."}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}' \
 --cli-binary-format raw-in-base64-out \
 invoke-model-output.txt

For more intelligent reasoning capability, you can use Adaptive thinking with Claude Opus 4.7, which lets Claude dynamically allocate thinking token budgets based on the complexity of each request.

To learn more, visit the Anthropic Claude Messages API and check out code examples for multiple use cases and a variety of programming languages.

Things to know
Let me share some important technical details that I think you’ll find useful.

Choosing APIs: You can choose from a variety of Bedrock APIs for model inference, as well as the Anthropic Messages API. The Bedrock-native Converse API supports multi-turn conversations and Guardrails integration. The Invoke API provides direct model invocation and lowest-level control.
Scaling and capacity: Bedrock’s new inference engine is designed to rapidly provision and serve capacity across many different models. When accepting requests, we prioritize keeping steady-state workloads running, and ramp usage and capacity rapidly in response to changes in demand. During periods of high demand, requests are queued rather than rejected. Up to 10,000 requests per minute (RPM) per account per Region are available immediately, with more available upon request.
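
As a complement to the Messages API and Invoke examples earlier, here is a minimal sketch of the Converse API mentioned under "Choosing APIs", using the AWS SDK for Python (boto3). The helper names build_converse_request and converse are illustrative and not part of any SDK, and the default Region is an assumption taken from the launch list. The request builder is kept separate so it can be inspected without AWS credentials.

```python
from typing import Any


def build_converse_request(
    prompt: str,
    model_id: str = "anthropic.claude-opus-4-7",
    max_tokens: int = 512,
) -> dict[str, Any]:
    """Build keyword arguments for the bedrock-runtime Converse operation."""
    return {
        "modelId": model_id,
        # Converse messages carry a list of content blocks per turn
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }


def converse(prompt: str, region: str = "us-east-1") -> str:
    """Send one user turn through the Converse API and return the reply text."""
    import boto3  # imported lazily so the builder above works without the AWS SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**build_converse_request(prompt))
    # Assumes the first content block of the reply is a text block
    return response["output"]["message"]["content"][0]["text"]
```

Calling converse("Summarize the trade-offs of multi-Region deployments") requires valid AWS credentials and model access in the chosen Region.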

Now available
Anthropic’s Claude Opus 4.7 model is available today in the US East (N. Virginia), Asia Pacific (Tokyo), Europe (Ireland), and Europe (Stockholm) Regions; check the full list of Regions for future updates. To learn more, visit the Claude by Anthropic in Amazon Bedrock page and the Amazon Bedrock pricing page.

Give Anthropic’s Claude Opus 4.7 a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

— Channy

The post Introducing Anthropic’s Claude Opus 4.7 model in Amazon Bedrock first appeared on Azalio.

The two-pass compiler is back – this time, it’s fixing AI code generation

Azalio tdshpsk — Thu, 16 Apr 2026 09:59:25 +0000

If you came up building software in the 1990s or early 2000s, you remember the visceral satisfaction of determinism. You wrote code. The compiler analyzed it, optimized it, and emitted precisely the machine instructions you expected. Same input, same output. Every single time. There was an engineering rigor to it that shaped how an entire generation of developers thought about building systems.

Then large language models (LLMs) arrived and, almost overnight, code generation became a stochastic process. Prompt an AI model twice with identical inputs and you’ll get structurally different outputs—sometimes brilliant, sometimes subtly broken, occasionally hallucinated beyond repair. For quick prototyping that’s fine. For enterprise-grade software—the kind where a misplaced null check costs you a production outage at 2am—it’s a non-starter.

We stared at this problem for a while. And then something clicked. It felt familiar, like a pattern we’d encountered before, buried somewhere in our CS fundamentals. Then it hit us: the two-pass compiler.

A quick refresher

Early compilers were single-pass: read source, emit machine code, hope for the best. They were fast but brittle—limited optimization, poor error handling, fragile output. The industry’s answer was the multi-pass compiler, and it fundamentally changed how we build languages. The first pass analyzes, parses, and produces an intermediate representation (IR). The second pass optimizes and generates the final target code. This separation of concerns is what gave us C, C++, Java—and frankly, modern software engineering as we know it.
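
To make the refresher concrete, here is a toy two-pass compiler for arithmetic expressions, sketched in Python. It is an illustration only: the tuple-based IR, the stack-machine instruction names, and the use of Python's own ast module as the parser are all assumptions made for brevity. Pass 1 analyzes the source and produces an IR; pass 2 optimizes that IR (constant folding) and emits target code.

```python
import ast

# Pass 1: analyze the source and lower it to a small intermediate representation.
# IR nodes are tuples: ("num", 3), ("var", "x"), or (op, left, right).
OPS = {ast.Add: "add", ast.Mult: "mul"}


def to_ir(source: str):
    def lower(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return ("num", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return (OPS[type(node.op)], lower(node.left), lower(node.right))
        raise SyntaxError(f"unsupported construct: {ast.dump(node)}")

    return lower(ast.parse(source, mode="eval").body)


# Pass 2a: optimize the IR by folding operations on two constants.
def fold(ir):
    if ir[0] in ("num", "var"):
        return ir
    op, left, right = ir[0], fold(ir[1]), fold(ir[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)


# Pass 2b: emit instructions for a simple stack machine.
def emit(ir):
    if ir[0] == "num":
        return [f"PUSH {ir[1]}"]
    if ir[0] == "var":
        return [f"LOAD {ir[1]}"]
    return emit(ir[1]) + emit(ir[2]) + [ir[0].upper()]


def compile_expr(source: str):
    return emit(fold(to_ir(source)))


print(compile_expr("x * (2 + 3)"))  # ['LOAD x', 'PUSH 5', 'MUL']
```

Because the optimizer works on the IR rather than on source text, it never has to re-parse anything, which is the separation of concerns the paragraph above describes.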

Figure: The structural parallel between classical two-pass compilation and AI-driven code generation. (Image credit: WaveMaker)

The analogy to AI code generation is almost eerily direct. Today’s LLM-based tools are, architecturally, single-pass compilers. You feed in a prompt, the model generates code, and you get whatever comes out the other end. The quality ceiling is the model itself. There’s no intermediate analysis, no optimization pass, no structural validation. It’s 1970s compiler design with 2020s marketing.

Applying the two-pass model to AI code generation

Here’s where it gets interesting. What if, instead of asking an LLM to go from prompt to production code in one shot, you split the process into two architecturally distinct passes—just like the compilers that built our industry?

Pass 1 is where the LLM does what LLMs are genuinely good at: understanding intent, decomposing design, and reasoning about structure. The model analyzes the design spec, identifies components, maps APIs, resolves layout semantics—and emits an intermediate representation, an IR. Not HTML. Not Angular or React. A well-defined meta-language markup that captures what needs to be built without committing to how.

This is critical. By constraining the LLM’s output to a structured meta-language rather than raw framework code, you eliminate entire categories of failure. The model can’t inject malformed tags if it’s not emitting HTML. It can’t hallucinate nonexistent React hooks if it’s outputting component descriptors. You’ve reduced the stochastic surface area dramatically.

Pass 2 is entirely deterministic. A platform-level code generator—no LLM involved—takes that validated intermediate markup and emits production-grade Angular, React, or React Native code. This is the pass that plugs in battle-tested libraries, enforces security patterns, and applies framework-specific optimizations. Same IR in, same code out. Every time.

First pass gives you speed. Second pass gives you reliability. The separation of concerns is what makes it work.
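
A minimal sketch of what a deterministic Pass 2 could look like, assuming a hypothetical IR in which each node is a dictionary with "type", "props", and "children". The component names, the tag mapping, and the markup it emits are invented for illustration and are not the meta-language of any specific product. The point is only that the generator contains no model call, sorts its inputs, and escapes every value, so the same IR always produces the same output.

```python
import html

# Hypothetical mapping from IR component types to output tags.
# The platform owns this table; the LLM never emits tags directly.
TAGS = {"Page": "main", "Form": "form", "TextInput": "input", "Button": "button"}
VOID_TAGS = {"input"}


def generate(node: dict, indent: int = 0) -> str:
    """Deterministically render one IR node and its children as markup."""
    tag = TAGS[node["type"]]  # an unknown component type fails loudly (KeyError)
    pad = "  " * indent
    # Sort props so identical IR yields byte-identical output; escape every value
    attrs = "".join(
        f' {name}="{html.escape(str(value), quote=True)}"'
        for name, value in sorted(node.get("props", {}).items())
    )
    if tag in VOID_TAGS:
        return f"{pad}<{tag}{attrs} />"
    children = [generate(child, indent + 1) for child in node.get("children", [])]
    return "\n".join([f"{pad}<{tag}{attrs}>", *children, f"{pad}</{tag}>"])


ir = {
    "type": "Form",
    "props": {"id": "login"},
    "children": [
        {"type": "TextInput", "props": {"placeholder": "<script>", "name": "user"}}
    ],
}
print(generate(ir))
```

This sketch escapes attribute values but trusts attribute names, so it assumes prop names have already been checked against a schema before generation.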

Why this matters now

The advantages of this architecture compound in exactly the ways that matter for enterprise development. The meta-language IR becomes your durable context for iterative development—you’re not re-prompting the LLM from scratch every time you refine a component. Security concerns like script injection and SQL injection are structurally eliminated, not patched after the fact. Hallucinated properties and tokens get caught and stripped at the IR boundary before they ever reach generated code. And because Pass 2 is deterministic, you get reproducible, auditable, deployable output.
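
The idea of catching hallucinated properties at the IR boundary can be sketched as a schema check that runs before any code generation. The schema, component names, and the validate helper below are hypothetical; a real meta-language would define its own.

```python
# Hypothetical schema: the props each component type is allowed to carry
SCHEMA = {
    "Button": {"label", "variant"},
    "TextInput": {"name", "placeholder"},
}


def validate(node: dict) -> tuple[dict, list[str]]:
    """Return a cleaned copy of an IR node plus a list of what was stripped."""
    kind = node.get("type")
    if kind not in SCHEMA:
        # A component the platform does not know about is rejected outright
        raise ValueError(f"unknown component type: {kind!r}")
    props = node.get("props", {})
    kept = {k: v for k, v in props.items() if k in SCHEMA[kind]}
    dropped = [f"{kind}.{k}" for k in props if k not in SCHEMA[kind]]
    return {"type": kind, "props": kept}, dropped


clean, dropped = validate(
    {"type": "Button", "props": {"label": "Save", "onClick": "alert(1)", "glow": True}}
)
print(clean)    # {'type': 'Button', 'props': {'label': 'Save'}}
print(dropped)  # ['Button.onClick', 'Button.glow']
```

Returning the list of stripped items alongside the cleaned node keeps the step auditable: a reviewer can see what the model proposed and what the platform refused to pass on.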

Pass 1: LLM-powered

• Translates design/spec to structured components and design tokens
• Enables iterative dev with meta-markup as persistent context
• Eliminates script/SQL injection by design

Pass 2: Deterministic

• Generates optimized, secure, performant framework code
• Validates and strips hallucinated markup and tokens
• Plugs in battle-tested libraries for reliability

If you’ve spent your career building systems where correctness isn’t optional, this should resonate. The industry spent decades learning that single-pass compilation couldn’t produce reliable software at scale. The two-pass architecture wasn’t just an optimization; it was an engineering philosophy: separate understanding from generation, validate before you emit, and never let a single phase carry the entire burden of correctness.

We’re at the same inflection point with AI code generation right now. The models are powerful. The architecture around them has been naive. The fix isn’t to wait for a smarter model. It’s to apply the engineering discipline we’ve always known, and build systems where stochastic brilliance and deterministic reliability each do what they do best—in the right pass, at the right time.

Deterministic software engineering is cool again. Turns out it never really left.

—

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

The post The two-pass compiler is back – this time, it’s fixing AI code generation first appeared on Azalio.

Ease into Azure Kubernetes Application Network

Azalio tdshpsk — Thu, 16 Apr 2026 09:59:23 +0000

If you’re using Kubernetes, especially a managed version like Azure Kubernetes Service (AKS), you don’t need to think about the underlying hardware. All you need to do is build your application and it should run, its containers managed by the service’s orchestrator.

At least that’s the theory. However, implementing a platform that abstracts your code from the servers and network that support it brings its own problems, and a whole new discipline. Platform engineers fill the gap between software and hardware, supporting security and networking, as well as managing storage and other key services.

Kubernetes is part of an ecosystem of cloud-native services that provide the supporting framework for running and managing scalable distributed systems, including the tools needed to package and deploy applications, as well as components that extend the functionality of Kubernetes’ own nodes and pods.

Key components of this growing ecosystem are the various service meshes. These offer a way to manage connectivity between nodes and between your applications and the outside network, with tools for handling basic network security. Often implemented as “sidecar” containers, running alongside Kubernetes pods, these network proxies can consume added resources as your applications scale. That means more configuration and management, ensuring that configurations are kept up-to-date and that secrets are secure.

Istio goes ambient

One of the key service mesh implementations, Istio, has developed an alternate way of operating, what the project calls “ambient mode”. Here, instead of having individual sidecars for each pod, your service mesh is implemented as per-node proxies or as a single proxy that supports an entire Kubernetes namespace. It’s an approach that allows you to start implementing a service mesh without increasing the complexity of your platform, making it easy to go from a basic development Kubernetes implementation to a production environment without having to change your application pods.

It’s called ambient mode because there’s no need to add new service mesh elements as your application scales. Instead, the service mesh is always there, and your pods simply join it and take advantage of the existing configuration. The resulting implementation is both easier to use and easier to understand.

Microsoft has used Istio as part of Azure Kubernetes Service for many years. Istio is one of a suite of open-source tools that provide the backbone of Azure’s cloud-native computing platform.

Introducing Azure Kubernetes Application Network

So, it’s not surprising to learn that Microsoft is using Istio’s ambient mesh as the basis of Azure Kubernetes Application Network. The new service (available in preview) allows application developers to add managed network services to their applications without needing the support of a platform engineering team to implement a service mesh. It will even help you migrate away from the now-deprecated ingress-nginx: it provides access to the recommended Kubernetes Gateway API without needing more sidecars, and it lets you keep using your existing ingress-nginx configurations while you complete your migration.

Microsoft describes the preview of Azure Kubernetes Application Network as “a fully managed, ambient-based service network solution for Azure Kubernetes Service (AKS).” The underlying data and control planes are managed by AKS, so all you need to do is connect your AKS clusters to an Application Network and AKS will then manage the service mesh for you, without any changes to your applications.

Like other implementations of Istio’s ambient mesh, there are two levels to Application Network: a core set of node-level application proxies that handle connectivity and security for application services, and an optional set of higher-level (Layer 7) proxies that support routing and apply network policies, acting as a software-defined network inside your Kubernetes environment.

This approach lets you build and test a Kubernetes application on your local development hardware without using Application Network features, then deploy it to AKS along with the required network configuration — simplifying both development and deployment. It also reduces development overheads, both in compute and developer resources.

Using Azure Kubernetes Application Network

Once deployed, Application Network connects the services in your application securely, managing encrypted connections and the required certificates automatically. It can also support unencrypted connections, for when you aren’t sending confidential data and don’t need the associated overhead. As the service is managed by AKS, new pods are automatically added to the mesh as they are deployed, with the ambient mesh supporting both scale-up and scale-down operations.

The architecture of Application Network is much like that of an Istio ambient mesh. The main difference is that the service’s management and control planes are managed by Azure, with application owners limited to working with the service’s data plane, configuring operations and setting policies for their application workloads. Azure’s control of the management plane automates certificate management, ensuring that connections stay secure and there is little risk of certificate expiration, using the tools built into Azure Key Vault.

The Application Network data plane holds proxies and gateways used by the service mesh, and these are deployed when the service is launched, along with the required Kubernetes configurations. The key to operation is ztunnel, a proxy that intercepts inter-service requests, secures the connection, and routes requests to another ztunnel running with the destination service. A gateway oversees connections between ztunnels running in remote clusters, allowing your service mesh to scale out with demand.

Building your first ambient service mesh in AKS

Getting started with Azure Kubernetes Application Network requires the Azure CLI. If you’re working with an existing AKS cluster, then you will need to enable integration with Microsoft Entra and enable OpenID Connect.

As the Application Network service is in preview, start by registering it in your account. This can take some time, but once it’s registered you can install the AppNet CLI extension that’s used to manage and control Application Network for your AKS clusters. You can now start to set up the ambient service mesh, either creating new clusters to use it, or adding the service mesh to existing AKS deployments.

Starting from scratch is the easiest way, as it ensures that you’re running in the same tenant. AKS clusters and Application Network can be in the same resource group if you want, but it’s not necessary. You’re free to use separate resource groups for management.

The appnet command makes it easy to create an Application Network from the command line; all you need is a name for the network, a resource group, a location, and an identity type. Once you’ve run the command to create your ambient mesh, wait for the mesh to be provisioned before joining a cluster to your network. This again simply needs a resource group, a name for the member cluster, and its resource group and cluster name. At the same time, you define how the network will be managed, i.e. whether you manage upgrades yourself or leave Azure to manage them for you. Additional clusters can be added to the network the same way.

With an Application Network and member clusters in place, the next step is to use Kubernetes’ own tooling to add support for the ambient mesh to your applications. Microsoft provides a useful example that shows how to use Application Network with the Kubernetes Gateway API to manage ingress. You need to use kubectl and istioctl commands to enable gateways and verify their operation, adding services and ensuring that they are visible to each other through their respective ztunnels.

Securing applications with policies

Policies can be used to control access from the application ingress to specific services as well as between services, reducing the risk of breaches and ensuring that you control how traffic is routed in your application. These policies can be locked down so that only specific methods can be used, for example allowing only HTTP GET operations on a read-only service, and POST where data needs to be delivered. Other options can be used to enforce OpenID Connect authorization at a mesh level.
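
As an illustration of a method-level policy, the sketch below builds an Istio AuthorizationPolicy manifest that allows only the listed HTTP methods on a workload. The resource kind and fields are Istio's; the policy, namespace, and app names are hypothetical. Attaching the policy with a workload selector is an assumption made for brevity, since an ambient mesh may require a different targeting mechanism for Layer 7 rules, so check the Application Network documentation before applying anything like this.

```python
import json


def method_allow_policy(name: str, namespace: str, app: str, methods: list[str]) -> dict:
    """Build an Istio AuthorizationPolicy that allows only the given HTTP methods."""
    return {
        "apiVersion": "security.istio.io/v1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the target workload is selected by its "app" label
            "selector": {"matchLabels": {"app": app}},
            "action": "ALLOW",
            "rules": [{"to": [{"operation": {"methods": methods}}]}],
        },
    }


# Hypothetical read-only service: only GET requests are allowed through
policy = method_allow_policy("catalog-read-only", "shop", "catalog", ["GET"])
print(json.dumps(policy, indent=2))  # kubectl apply -f accepts JSON as well as YAML
```

Generating the manifest from code is one way to keep policies in the same build artifacts as the application, in line with the Helm-based approach described later in the article.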

Not all Azure Kubernetes clusters are supported in the preview, which is only available in Azure’s largest regions. For now, Application Network won’t work with private clusters or with Windows node pools. Once it’s running, you can’t switch upgrade modes, and as it’s based on Istio, you can’t enable Istio service meshes in your cluster. These restrictions aren’t showstoppers for a service that’s still in preview, and you should be able to start experimenting with it.

AKS Application Network is a powerful tool that helps simplify and secure the process of building and running inter-cluster networks in an AKS application. As it is an ambient service, it can scale as necessary and can help provide secure bridges between clusters. By working at a Kubernetes level, it’s possible to use Application Network to provide policy-driven production network rules, allowing developers to build and test code in unrestricted environments before moving to test and production clusters.

As Application Network uses familiar Kubernetes and Istio constructions, it’s possible to build configurations into Helm charts and other deployment tools, ensuring configurations are part of your build artifacts and that network configurations and policies are delivered with your code every time you push a new build – without needing platform engineering support.

The post Ease into Azure Kubernetes Application Network first appeared on Azalio.

The agent tier: Rethinking runtime architecture for context-driven enterprise workflows

Azalio tdshpsk — Thu, 16 Apr 2026 09:59:22 +0000

Most large enterprises run on deterministic software foundations. Business rules are embedded within workflows, state transitions are modeled explicitly and escalation paths are defined in advance. System behavior is specified in advance, making outcomes predictable. Meaningful scenarios are encoded as conditional branches and validated before release. For decades, this approach has delivered the reliability and control required for mission-critical operations.

This model assumes most situations can be anticipated and expressed in logic. It works well when variation is limited and conditions remain manageable. If new requirements can be added as workflow branches, the structure holds. It begins to strain when processes must respond to context — not just thresholds, but the broader circumstances of a case.

In my experience, customer onboarding in banking makes this tension visible. Onboarding sits at the intersection of digital channels, fraud detection, regulatory obligations and revenue goals. It must satisfy Know Your Customer (KYC) and Anti-Money Laundering (AML) requirements while minimizing abandonment and resisting synthetic identity attacks.

During my involvement in digital account opening initiatives at a major North American bank, cross-functional design sessions repeatedly surfaced the same trade-off. Product teams pushed to reduce friction and improve conversion while fraud teams responded to bot-driven account creation and mule schemes with additional safeguards. Compliance insisted regulatory standards be met without exception and engineering absorbed each new requirement into the orchestration framework. Individually, these decisions were rational. Collectively, they made the workflow more complex.

The underlying challenge was not a shortage of rules but expressing contextual judgment within a static branching structure. Differentiation occurred only at predefined checkpoints and information was often collected in bulk rather than adapting to known facts. Collect too little and the institution risks regulatory exposure or fraud; collect too much and abandonment rises. Attempt to encode every variation as additional branches and the workflow becomes increasingly fragile.

Adaptive scoring and contextual models can complement deterministic logic. Rather than enumerating every scenario in advance, they help determine whether additional verification is warranted or whether progression can continue with existing evidence. Deterministic workflows still enforce regulatory requirements and final state transitions; the adaptive layer informs how the system navigates toward those outcomes.

Although onboarding illustrates the issue clearly, the same pattern appears in credit adjudication, claims processing and dispute management. As adaptive signals enter these workflows, the architectural question shifts from adding branches to deciding where contextual judgment should reside. In my view, what is missing is not another conditional path but a different runtime model — one that interprets context and determines the next appropriate action within defined limits. This architectural layer, which I refer to as the Agent Tier, separates contextual reasoning from deterministic execution.

Introducing the agent tier: Separating execution from contextual judgment

In many enterprises, orchestration logic does not reside in a formal workflow platform. It is embedded in single-page applications (SPAs), implemented in APIs, supported by rule engines and coordinated through service calls across systems. User journeys are assembled through API calls in predefined sequences, with eligibility or routing conditions evaluated at specific checkpoints.

This approach works well for repeatable, well-understood paths. When inputs are complete, risk signals are low and no exception handling is required, the clean path can be executed deterministically. State transitions are known in advance. Service calls follow predictable patterns. Human tasks are invoked at predefined points.

The difficulty arises when the workflow encounters ambiguity. Inputs may be incomplete. Signals may require interpretation rather than simple threshold comparison. Multiple systems may need to be coordinated in a sequence not explicitly modeled. Attempting to encode every such situation into SPA logic or orchestration APIs leads to increasingly complex condition trees and harder-to-maintain code. Instead of expanding hard-coded branching indefinitely, the runtime separates into two complementary lanes: repeatable execution and contextual reasoning.

Conceptually, the enterprise runtime evolves into a two-lane structure, illustrated below.

Figure: The two-lane enterprise runtime structure. (Image credit: Nitesh Varma)

The deterministic lane retains control over authoritative state changes and rule enforcement. It manages eligibility checks, applies regulatory criteria, invokes known service sequences and finalizes cases in core systems. It continues to handle most predictable scenarios.

The runtime invokes the Agent Tier when contextual judgment is required. This may occur when additional evidence must be gathered before a rule can be evaluated, when multiple signals must be interpreted together rather than independently or when coordination across systems cannot be expressed through a fixed sequence. It evaluates available actions and returns a bounded recommendation that allows deterministic execution to resume.

The movement between lanes is explicit. The deterministic workflow hands off when it reaches a point where static branching is insufficient. The Agent Tier performs synthesis or dynamic coordination. Once the Agent Tier produces a structured result, such as a completed evidence bundle, a validated set of inputs or a recommended next step, control returns to the deterministic lane for controlled progression and final state transition.

This separation allows incremental adoption. Existing SPA logic and orchestration APIs remain intact; ambiguity points can be redirected to the Agent Tier without destabilizing deterministic execution.

What happens inside the agent tier

The Agent Tier is not a single “AI decision.” It is a structured reasoning cycle that combines interpretation with controlled action.

When the deterministic workflow hands off a case, the Agent Tier interprets the current situation by assembling available context — user inputs, existing customer relationships, fraud signals, journey state and relevant policy constraints. Based on that composite view, it selects the next action from an approved set of enterprise capabilities. That action might involve retrieving additional information, invoking a verification service, requesting clarification from the user or coordinating multiple systems in sequence. Once the action completes, the result is evaluated and the cycle continues until deterministic execution can resume.

This alternating pattern of reasoning and action is common in agentic system design. In technical literature, it is often referred to as the ReAct (Reason and Act) pattern, which interleaves reasoning steps with structured action selection. Rather than attempting to reach a final answer in a single pass, the system gathers evidence, reassesses its position and proceeds incrementally. In enterprise settings, this pattern becomes a disciplined way to manage contextual interpretation.
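
The reason-act cycle described above can be sketched in a few lines of Python. Everything here is a stand-in: the two actions are hypothetical placeholders for governed enterprise services, and the reason function uses fixed rules where a real Agent Tier would consult a model. The sketch only shows the shape of the loop: reason about the context, take one approved action, fold the observation back in, and repeat.

```python
from typing import Callable, Optional


# Approved actions (hypothetical stand-ins for governed enterprise services)
def check_existing_relationship(ctx: dict) -> dict:
    return {"existing_customer": ctx["applicant_id"] in {"C-100"}}


def verify_identity_document(ctx: dict) -> dict:
    return {"id_verified": True}


ACTIONS: dict[str, Callable[[dict], dict]] = {
    "check_existing_relationship": check_existing_relationship,
    "verify_identity_document": verify_identity_document,
}


def reason(ctx: dict) -> Optional[str]:
    """Pick the next action from the context, or None when nothing more is needed."""
    if "existing_customer" not in ctx:
        return "check_existing_relationship"
    if not ctx["existing_customer"] and "id_verified" not in ctx:
        return "verify_identity_document"
    return None


def run_agent_cycle(ctx: dict, max_steps: int = 5) -> dict:
    ctx = dict(ctx, trace=[])
    for _ in range(max_steps):  # hard step limit keeps the cycle bounded
        action = reason(ctx)                # reason
        if action is None:
            break
        observation = ACTIONS[action](ctx)  # act, using only approved actions
        ctx.update(observation)             # observe
        ctx["trace"].append(action)
    return ctx


print(run_agent_cycle({"applicant_id": "C-999"})["trace"])
# ['check_existing_relationship', 'verify_identity_document']
```

An existing customer (applicant "C-100" in this toy data) exits after the first action, which mirrors the point about not collecting evidence that the known facts make unnecessary.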

Reasoning in the Agent Tier does not involve free-form system access. It proceeds through approved operations exposed via governed interfaces. In practice, these tools are enterprise primitives such as:

APIs that retrieve or update enterprise data
event triggers that initiate downstream processing
workflow actions that advance a case
controlled service calls into core or third-party systems

Each operation is defined by explicit input/output contracts and permission boundaries and carries metadata describing its purpose and constraints. The runtime selects from this governed catalog — a mechanism commonly referred to as tool calling. Some frameworks further group related tools into higher-level capabilities known as skills, reusable functions for objectives such as identity verification or KYC evidence assembly.
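
A governed tool catalog of this kind can be sketched as a small registry in which each entry carries its purpose, input contract, and required permission, and a single entry point refuses any call that falls outside them. The Tool fields, the lookup_customer tool, and the permission strings below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    purpose: str                     # metadata describing what the tool is for
    required_inputs: frozenset       # input contract: fields the caller must supply
    permission: str                  # permission the caller must hold
    handler: Callable[[dict], dict]  # the governed operation itself


CATALOG = {
    "lookup_customer": Tool(
        name="lookup_customer",
        purpose="Retrieve an existing customer profile",
        required_inputs=frozenset({"customer_id"}),
        permission="customer:read",
        handler=lambda args: {"customer_id": args["customer_id"], "status": "active"},
    ),
}


def call_tool(name: str, args: dict, granted: set) -> dict:
    """Invoke a tool only if it is cataloged, its inputs are present, and access is granted."""
    tool = CATALOG.get(name)
    if tool is None:
        raise LookupError(f"{name!r} is not in the governed catalog")
    missing = tool.required_inputs - set(args)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    if tool.permission not in granted:
        raise PermissionError(f"{name!r} requires {tool.permission!r}")
    return tool.handler(args)


print(call_tool("lookup_customer", {"customer_id": "C-1"}, {"customer:read"}))
```

Routing every call through one checked entry point is what keeps reasoning from turning into free-form system access: the model can only name a tool, never reach past the catalog.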

Before control returns to the deterministic lane, the agentic runtime can also perform a structured self-check. It can verify that required conditions are satisfied, confirm alignment with policy constraints and ensure that any necessary approvals have been identified. In technical discussions, this is often described as reflection.
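
A reflection step can be as simple as a function that inspects the proposed result and lists anything that would block a clean handoff. The field names, the set of allowed next steps, and the 0.7 approval threshold in this sketch are assumptions for illustration, not values from any particular framework.

```python
ALLOWED_NEXT_STEPS = {"proceed", "verify", "manual_review"}


def reflect(result: dict, required_evidence: set, approval_threshold: float = 0.7) -> dict:
    """Check a proposed result before control returns to the deterministic lane."""
    issues = []

    # 1. Required conditions: is every piece of required evidence present?
    missing = required_evidence - set(result.get("evidence", {}))
    if missing:
        issues.append(f"missing evidence: {sorted(missing)}")

    # 2. Policy alignment: is the recommendation one the workflow can accept?
    if result.get("next_step") not in ALLOWED_NEXT_STEPS:
        issues.append("next_step is outside the allowed set")

    # 3. Approvals: high-risk cases must name an approver (missing score counts as high)
    if result.get("risk_score", 1.0) >= approval_threshold and not result.get("approver"):
        issues.append("high-risk case has no approver identified")

    return {"ok": not issues, "issues": issues}


good = reflect(
    {"evidence": {"id_doc": "passport", "address": "utility bill"},
     "next_step": "proceed", "risk_score": 0.2},
    required_evidence={"id_doc", "address"},
)
bad = reflect(
    {"evidence": {"id_doc": "passport"}, "next_step": "open_account", "risk_score": 0.9},
    required_evidence={"id_doc", "address"},
)
print(good["ok"], bad["issues"])
```

Returning the list of issues, rather than a bare pass or fail, gives the agent something concrete to act on in its next cycle.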

Taken together, these patterns do not introduce unchecked autonomy. They provide a structured way to manage contextual synthesis and dynamic coordination without allowing adaptive logic to diffuse across SPA code and orchestration services. Deterministic systems continue to enforce authoritative state transitions. The Agent Tier prepares the conditions under which those transitions occur.

In many implementations, the Agent Tier does not directly control the workflow. Instead, it recommends the next step based on the available context. The deterministic tier remains responsible for execution. After each step is completed — retrieving evidence, invoking a verification service or preparing a review case — the updated context is returned to the Agent Tier, which evaluates the new state and recommends the next action. In this model, contextual reasoning informs progression while deterministic systems continue to enforce authoritative state transitions.

Returning to the onboarding example, the Agent Tier changes how the journey adapts to each applicant. The deterministic tier still executes core steps such as creating the customer profile, enforcing regulatory checks and committing account state in core systems. The Agent Tier evaluates the evolving context — customer relationships, fraud signals, identity verification results and available documentation — and recommends whether the workflow can proceed along the clean path, trigger additional verification or escalate to manual review. The result is not a new onboarding process but a workflow that adapts its progression dynamically while preserving the deterministic controls required for regulated operations.

Conceptually, the interaction between contextual reasoning and deterministic execution can be understood as a simple runtime loop, as illustrated below.

Image credit: Nitesh Varma

The workflow progresses through a continuous loop in which contextual reasoning recommends the next step, deterministic systems execute it and the resulting context feeds back into the next recommendation.
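The loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: `recommend` stands in for the Agent Tier, `execute` for the deterministic tier, and the `"resume"` signal and `maxSteps` bound are hypothetical conventions chosen for the example.

```javascript
// Hypothetical recommend-execute loop. The caller supplies both tiers:
// recommend(context) returns an action name, or "resume" to hand control back;
// execute(action, context) performs the step and returns new context fields.
function runLoop(initialContext, recommend, execute, maxSteps = 10) {
  let context = initialContext;
  const trace = []; // record of every step, kept for later audit
  for (let step = 0; step < maxSteps; step++) {
    const action = recommend(context);
    if (action === "resume") {
      return { context, trace, resumed: true };
    }
    const result = execute(action, context);
    trace.push({ step, action, result });
    context = { ...context, ...result }; // updated context feeds the next recommendation
  }
  // Bounded iteration: give up rather than loop forever.
  return { context, trace, resumed: false };
}

// Example: one verification step, then deterministic execution resumes.
const outcome = runLoop(
  { identityVerified: false },
  (ctx) => (ctx.identityVerified ? "resume" : "verifyIdentity"),
  (action) => (action === "verifyIdentity" ? { identityVerified: true } : {})
);
```

The `trace` array is one simple way to keep the sequence of recommendations reconstructable after the fact.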

Governing adaptive systems without losing control

Separating contextual reasoning from deterministic execution clarifies responsibility but does not eliminate risk. In regulated environments, adaptive sequencing must operate within explicit governance boundaries.

The trust and operations overlay represents cross-cutting controls across the runtime: Audit logging, approval gates, observability, security enforcement and lifecycle management. Within this structure, authoritative state transitions remain deterministic. Core systems continue to create client profiles, enforce limits, record disclosures and apply regulatory thresholds. The Agent Tier may influence progression, but final state changes occur only through controlled interfaces.

This containment boundary preserves explainability. When progression changes — for example, when additional verification is triggered or escalation occurs — institutions must be able to reconstruct why. Which signals were assembled? Which tools were invoked? What reasoning produced the recommendation? Concentrating contextual evaluation within a defined runtime layer makes that traceability possible.

Operational experience reinforces the need for these guardrails. Engineering discussions of production agent systems emphasize constrained tool access, explicit action catalogs, bounded iteration and strong observability. In enterprise environments, contextual reasoning must likewise operate through governed tools and visible control points.

Approval gates remain part of this structure. High-risk actions such as credit issuance, account restrictions, large payments or regulatory filings may still require human authorization regardless of how the progression was determined. Reflection inside the Agent Tier can validate readiness, but authorization remains explicit.
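A minimal sketch of such a gate follows, assuming a hypothetical list of high-risk action names and an approval record shaped as `{ caseId, action, approver }`. Neither comes from the article; they are placeholders for whatever an institution's own policy defines.

```javascript
// Hypothetical approval gate: high-risk actions need an explicit human
// approval record, regardless of what the Agent Tier recommended.
const HIGH_RISK_ACTIONS = new Set([
  "issueCredit",
  "restrictAccount",
  "largePayment",
  "regulatoryFiling",
]);

function authorize(recommendation, approvals) {
  if (!HIGH_RISK_ACTIONS.has(recommendation.action)) {
    return { allowed: true, reason: "low-risk action" };
  }
  // Look for an approval that matches both the case and the action.
  const approval = approvals.find(
    (a) => a.caseId === recommendation.caseId && a.action === recommendation.action
  );
  return approval
    ? { allowed: true, reason: `approved by ${approval.approver}` }
    : { allowed: false, reason: "awaiting human authorization" };
}
```

For instance, `authorize({ caseId: "c1", action: "issueCredit" }, [])` is refused until a matching approval record exists, however confident the recommendation was.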

Lifecycle discipline is equally important. Changes to models, identity providers, tool contracts or orchestration logic can alter workflow behavior. The Agent Tier should therefore operate as a governed platform capability with versioned reasoning logic, controlled tool catalogs and defined testing and rollback mechanisms.

The objective is not to eliminate probabilistic reasoning but to contain it within observable workflows and governed boundaries. As adaptive capabilities expand, the architectural question is not whether contextual reasoning will exist, but whether it is diffused across the stack or concentrated within a controlled runtime layer.

Architectural leadership in an adaptive era

Introducing an Agent Tier adds a new runtime component, but enterprise complexity is not new; it is already dispersed across channel code, orchestration services, rule engines and proliferating conditional branches. The architectural question is not whether complexity exists, but where it resides. As fraud models evolve, verification technologies improve and regulatory expectations shift, adaptive capabilities will continue to expand.

I believe architecture must evolve from enumerating state transitions to defining containment boundaries. Deterministic systems enforce regulatory and operational requirements and remain responsible for authoritative state changes. Adaptive reasoning operates within explicit policy constraints and informs how workflows progress toward those outcomes. Instead of encoding every possible path in advance, enterprises can move toward context-driven workflows in which deterministic execution handles authoritative actions while the Agent Tier determines the next appropriate step based on evolving context.

This evolution does not require wholesale reinvention. It can begin with a single high-impact workflow where contextual variability is already evident. By introducing a disciplined runtime layer that mediates uncertainty while preserving deterministic control, organizations can modernize incrementally. In that sense, the Agent Tier is not simply a new feature; it is a structural response to a changing runtime reality, one that allows adaptive systems to operate within clear architectural and governance boundaries.

This article is published as part of the Foundry Expert Contributor Network.

The post The agent tier: Rethinking runtime architecture for context-driven enterprise workflows first appeared on Azalio.

MuleSoft Agent Fabric adds new ways to keep AI agents in line

Azalio tdshpsk — Wed, 15 Apr 2026 18:59:17 +0000

Salesforce first sought to tackle AI agent sprawl last year with Agent Fabric, a suite of capabilities and tools inside its MuleSoft AnyPoint Platform. Now, it’s seeking to further rein in unruly AI agents on its platform and those of other vendors too, with new governance tools and deterministic controls.

When enterprises adopt multiple agentic AI products, they can end up with redundant or siloed workflows scattered across teams and platforms, undermining operational efficiency and complicating governance as they try to scale AI safely and responsibly.

Agent Fabric, introduced in September 2025, started out as a place for enterprises to register, view, interconnect and govern agents. In January it added a deterministic scripting tool and the ability to scan for new agents and add them to the registry.

But enterprises still need more help to bring their AI agents under control, so Salesforce is adding more features.

First up is an expansion of the deterministic controls in the form of Agent Script for Agent Broker, an intelligent routing service inside Agent Fabric that is designed to connect agents across domains, dynamically matching user tasks with the best-fit agent. Salesforce said the controls will help developers codify workflows in multi-agent systems in order to ensure consistent and reliable outputs.

Rather than leave probabilistic agents to make all the decisions about how to resolve a problem, introducing an element of unpredictability, Agent Script for Agent Broker enables enterprises to steer some of the decision-making according to predetermined rules that require fewer computing resources than running a large language model.
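The general idea of putting cheap, predetermined rules ahead of a model can be sketched as follows. This is not Agent Script syntax, which the article does not show; the rules, agent names, and the `askModel` callback are all hypothetical.

```javascript
// Hypothetical rule-first router (not Agent Script syntax).
// Deterministic rules are tried first; the model is consulted only
// when no rule matches.
const rules = [
  { matches: (task) => /refund/i.test(task), agent: "billing-agent" },
  { matches: (task) => /password|login/i.test(task), agent: "identity-agent" },
];

function routeTask(task, askModel) {
  const rule = rules.find((r) => r.matches(task));
  if (rule) {
    return { agent: rule.agent, decidedBy: "rule" };
  }
  // Fallback: let the (more expensive, probabilistic) model choose.
  return { agent: askModel(task), decidedBy: "model" };
}
```

Recording `decidedBy` alongside the chosen agent makes it easy to see afterwards how often the model was actually needed.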

That’s welcome news for Robert Kramer, managing partner at KramerERP.

“Pure autonomous agents don’t necessarily work in production as enterprises need to ensure predictable outcomes. The deterministic controls should facilitate a secure handoff of control and rules while still allowing the model to engage in reasoning when it’s appropriate,” he said. “It’s a balance between control and flexibility, which is the norm for most real deployments.”

For Rebecca Wettemann, principal analyst at Valoir, providing both deterministic and probabilistic options within Agent Fabric enables developers and agent builders to take the lower-cost route to more accurate and predictable results from agentic systems.

Enterprises will have to wait to put this deterministic orchestration feature into production, though: Still in beta testing, it won’t be generally available until June 2026.

Centralized LLM governance tackles cost

Beyond orchestration, Salesforce has added a new LLM Governance capability in AI Gateway, the control layer within Agent Fabric that provides centralized visibility of token usage, costs, and data flows for third-party models.

Enterprises will be able to use LLM Governance, now generally available, to help them keep their AI operations on budget, Salesforce said.

This is becoming increasingly important as CIOs seek to bring disparate AI systems under centralized control and justify spiralling AI costs.

Info-Tech Research Group advisory fellow Scott Bickley warned that without centralized governance like this, different teams around a company may choose different models, negotiate their own API contracts, and manage token budgets locally.

“This results in sprawling costs, inconsistent security postures, and no enterprise-wide policy enforcement,” he said. “By positioning AI Gateway as the choke point through which all LLM traffic flows, enterprises gain visibility into AI usage patterns, the models in use, purpose of the usage, and cost data.”

MCP additions simplify integration

Salesforce is also adding new Model Context Protocol (MCP) features that it says will simplify how agents interact with enterprise data and APIs. These include MCP Bridge, which makes it easier to access legacy APIs, and Informatica-hosted MCPs.

These could save developers time and simplify the building of cross-environment, multi-agent systems.

Bickley said MCP Bridge will help enterprises with thousands of legacy APIs (REST, SOAP, GraphQL) built long before MCP existed.

“Agents speaking MCP cannot call those APIs natively so they require wrappers around the API endpoint; this would be a massive engineering lift. MCP Bridge allows these APIs to be exposed as MCP-compatible tools without modifying the underlying code,” he said.
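To make the wrapper idea concrete, here is a hand-rolled sketch of what exposing one legacy REST endpoint as a tool might involve. It does not show how MCP Bridge works internally, which the article does not describe. The descriptor uses the `name`, `description`, and `inputSchema` fields that MCP tool definitions carry; `wrapRestEndpoint`, the URL template, and the injectable `fetchImpl` parameter are assumptions made for this example.

```javascript
// Hypothetical wrapper that describes a legacy REST endpoint as a tool
// and forwards calls to it. Not MCP Bridge itself, just the general shape.
function wrapRestEndpoint({ name, description, urlTemplate, params }, fetchImpl = fetch) {
  return {
    name,
    description,
    // JSON Schema describing the arguments an agent must supply.
    inputSchema: {
      type: "object",
      properties: Object.fromEntries(params.map((p) => [p, { type: "string" }])),
      required: params,
    },
    async call(args) {
      // Fill {placeholders} in the URL template with encoded argument values.
      const url = urlTemplate.replace(/\{(\w+)\}/g, (_, key) =>
        encodeURIComponent(args[key])
      );
      const response = await fetchImpl(url);
      if (!response.ok) {
        throw new Error(`Legacy API returned ${response.status}`);
      }
      return response.json();
    },
  };
}
```

Writing and maintaining something like this by hand for thousands of endpoints is the engineering lift the quote refers to.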

And Wettemann said Informatica-hosted MCPs will further reduce development overhead by bringing built-in data quality and governance capabilities into agent workflows, which is particularly critical for enterprises in regulated industries and those with heightened risk concerns.

But Bickley added a note of caution. “APIs can behave oddly and have their own nuanced behavior,” he said. “Enterprises should test how MCP Bridge handles edge cases.”

Informatica-hosted MCPs will not be a miracle solution either, he warned: “Even if the Informatica data quality and governance capabilities are cleanly integrated in the Agent Fabric registry, these are not instantaneous operations. Checking data fields for accuracy, deduplication, and cross-system matching take time and carry latency measured in milliseconds or even multiple seconds, and that is pre-integration.”

A pivot for MuleSoft?

Bickley sees the updates as a broader strategy for Salesforce to reposition MuleSoft, which it acquired in 2018 for $5.7 billion, from a traditional API integration platform to an infrastructure layer for enterprise AI agents.

By layering orchestration, governance, and connectivity into Agent Fabric, Salesforce appears to be trying to position MuleSoft as the system of record for how agents are discovered, routed, and governed across the enterprise, deepening its role beyond API management into core AI infrastructure, he said.

Not all CIOs will welcome that move.

“If your agent control plane runs on Agent Fabric, switching costs rise materially, and the more agents you register, the more orchestration rules and governance policies defined, the more difficult it becomes to move to an alternative solution,” the analyst said.

As with any critical infrastructure dependency, “CIOs need to ask: What is the exit path? What components of Agent Fabric are portable and what is locked in? What’s the pricing model? What is the integration depth with non-Salesforce agents and data sources?” he said.

For now, though, enterprises have plenty of AI agent orchestration options to choose from.

The post MuleSoft Agent Fabric adds new ways to keep AI agents in line first appeared on Azalio.

Retirement: End of lift reminder of HBv2/HC-Series/NP-Series Azure Virtual Machine in Azure Batch Pool

Azalio tdshpsk — Wed, 15 Apr 2026 18:00:29 +0000

Microsoft Azure will retire support for HBv2-series, HC-series, and NP-series VMs in Azure Batch pools on May 31, 2027, including:
HBv2-series: 120 AMD EPYC 7V12 vCPUs, 480 GB RAM, 200 Gb/s HDR InfiniBand
HC-series: 44 Intel Xeon Platinum 8168

The post Retirement: End of lift reminder of HBv2/HC-Series/NP-Series Azure Virtual Machine in Azure Batch Pool first appeared on Azalio.

[Launched] Generally Available: Encrypt Premium SSD v2 and Ultra Disks with Cross Tenant Customer Managed Keys

Azalio tdshpsk — Wed, 15 Apr 2026 18:00:11 +0000

Cross-tenant customer-managed keys (CMK) for Premium SSD v2 and Ultra Disks are now generally available. This capability allows managed disks to be encrypted using a customer-managed key stored in an Azure Key Vault located in a different Microsoft Entra

The post [Launched] Generally Available: Encrypt Premium SSD v2 and Ultra Disks with Cross Tenant Customer Managed Keys first appeared on Azalio.

Salesforce launches Headless 360 to support agent‑first enterprise workflows

Azalio tdshpsk — Wed, 15 Apr 2026 12:59:49 +0000

Salesforce is packaging its developer and AI tooling, including its vibe coding environment Agentforce Vibes, into a new platform named Headless 360, designed to help enterprise teams build agent-first workflows.

The CRM software provider defines agent-first workflows as enterprise processes in which software agents, rather than human users, carry out tasks by directly invoking APIs, tools, and predefined business logic.

To support this approach, Headless 360 exposes Salesforce’s underlying data, workflows, and governance controls as APIs, MCP tools, and CLI commands, via its existing offerings, such as Data 360, Customer 360, and Agentforce, Joe Inzerillo, president of AI technology at Salesforce, said during a press briefing.

This allows agents to operate directly on the platform’s existing business logic and datasets, rather than relying on separate integrations or user interfaces, Inzerillo added.

Push to become a control layer for enterprise AI agents

Analysts, however, see Headless 360 as an effort by Salesforce to position itself as a central layer for managing agent-driven operations across different business functions in enterprises, moving from a system of record to being the system of execution.

“Salesforce knows the center of gravity is moving toward coding agents, conversational interfaces, agent harnesses, and external runtimes, so it is trying to keep Salesforce relevant as the system underneath,” said Dion Hinchcliffe, VP of the CIO practice at The Futurum Group.

With Headless 360, Hinchcliffe added, Salesforce is trying to move its positioning beyond “AI agents inside Salesforce” to framing “Salesforce as a programmable platform for agents operating across external tools, interfaces, and environments.”

Risks around lock-in, operational gaps

Analysts warn that CIOs should exercise caution before adopting Headless 360.

“Salesforce’s ‘System of’ framework pitch with Headless 360 is the ultimate vendor lock-in architecture,” said Scott Bickley, advisory fellow at Info-Tech Research Group.

“Context (Data 360), Work (Customer 360), Agency (Agentforce), Engagement (Slack) are all required, according to Salesforce, and only they can provide them in an integrated manner. This is the strategy pitch awaiting CIOs, and frankly, it’s not true,” Bickley noted.

Bickley further pointed out that modern data stacks can replicate much of Headless 360’s functionality with more flexibility and less vendor concentration.

There are other issues that Bickley thinks should worry CIOs: “There is no mention of cost or the underlying licensing model for this ‘headless’ experience. Are all tools included at no cost?”

“Salesforce’s MO seems to be to announce new capabilities that require SKUs. CIOs should be asking about pricing now, before building in architectural dependencies on features that might land in a premium cost tier,” Bickley cautioned.

Also, the analyst pointed out that Salesforce’s announcement is silent on SLAs for operations such as MCP tool calls, which matter materially for real-time agent workflows.

Incremental gains for developers despite broader concerns

Despite these concerns, Bickley sees some of the new Headless 360 features, although undifferentiated from the competition, as offering practical benefits for developers in their daily tasks.

The analyst was referring to newer updates, such as new MCP tools that give external coding agents full access to Salesforce’s platform, the DevOps Center MCP, the Agentforce Experience Layer, and newer governance features.

Enabling full access for external coding agents, such as Claude Code and Codex, in particular, Bickley said, helps Salesforce meet developers where they are and lets them continue using the tools of their choice.

“Historically, developers were forced into Salesforce’s proprietary toolchain that included clunky VS Code extensions, painful metadata APIs, and quirky development pipelines that required Salesforce-specific expertise. Expanding the dev environment helps alleviate this pain,” Bickley pointed out.

The other updates, according to Hinchcliffe, should help curtail developer friction by helping avoid frequent switching between development tools, expanding real-time awareness of organization data, reducing the need for custom plumbing to expose business logic, and decreasing the effort needed to move from prototype to deployment.

Focusing specifically on the new DevOps Center MCP, which is a set of AI-powered tools that enable the use of natural language across the entire DevOps lifecycle, Bickley said that it will help developers alleviate pains around CI/CD processes.

“Salesforce development pipelines are notoriously fragile with metadata dependencies, org-specific configurations, artificial limits on work items, and UI response issues, among others,” Bickley added.

The governance tools, specifically the updates to the Testing Center, Custom Scoring Evals, Session Tracing, and A/B Testing API, also address real gaps that enterprise development teams face, especially when moving agentic workflows or applications into production, according to Hinchcliffe.

“Salesforce is correctly identifying that enterprise agent adoption will stall unless buyers can properly measure, govern, debug, and tune agent behavior over time,” the analyst said.

Concerns around the maturity of governance capabilities

However, Bickley questioned the efficacy of these tools, as most of them are in the very early stages of release. In fact, the analyst suggested that enterprises should expect to supplement them with their own evaluation frameworks for the next 12-18 months.

The analyst also flagged additional concerns around newer components such as the Agentforce Experience Layer, which is a new UI service that allows developers to decouple what an agent does from how it surfaces across various services and applications.

“Ironically, this adds yet another layer to contend with in the development process for what is already considered a painful development experience. Salesforce has a pattern of shipping v1 tools that work great in demos but fall in real-world scenarios,” Bickley said.

“Development teams intending to avail themselves of these new feature sets should insist that Salesforce provide them an extended pilot and sandbox free of charge to validate the maturity level and ease of use of these new features,” Bickley added.

All the updates to Headless 360, Salesforce said, are expected to be released in phases. Generally available features include Agentforce Vibes 2.0, the DevOps Center MCP, Session Tracing, and the Agentforce Experience Layer. Features that are in early access include Custom Scoring Evals. Other features, such as the Testing Center and the Salesforce Catalog, are scheduled for rollout in May and June, respectively.

The post Salesforce launches Headless 360 to support agent‑first enterprise workflows first appeared on Azalio.

Tap into the AI APIs of Google Chrome and Microsoft Edge

Azalio tdshpsk — Wed, 15 Apr 2026 09:59:39 +0000

With every passing year, local AI models get smaller, more efficient, and more comparable in power with their higher-end, cloud-hosted counterparts. You can run many of the same inference jobs on your own hardware, without needing an internet connection or even a particularly powerful GPU.

The hard part has been standing up the infrastructure to do it. Applications like ComfyUI and LM Studio offer ways to run models locally, but they’re big third-party apps that still require their own setup and maintenance. Wouldn’t it be great to run local AI models right in the browser?

Google Chrome and Microsoft Edge now offer that as a feature, by way of an experimental API set. With Chrome and Edge, you can perform a slew of AI-powered tasks, like summarizing a document, translating text between languages, or generating text from a prompt. All of these are accomplished with models downloaded and run locally on demand.

In this article I’ll show a simple example of Chrome and Edge’s experimental local AI APIs in action. While both browsers are in theory based on the same set of experimental APIs, they do support different varieties of functionality, and use different models. For Chrome, it’s Gemini Nano; for Edge, it’s the Phi-4-mini models.

The following demo of the Summarizer API works on both browsers, although the performance may differ between them. In my experience, Summarizer ran significantly slower on Edge.

The available AI APIs in Chrome and Edge

Chrome and Edge share a common codebase — the Chromium project — and the AI APIs available to both stem from what that project supports. As of April 2026, the available AI APIs in Chrome are:

Translator API: Translate text from one language to another, assuming a model is available for that language pair.
Language Detector API: Determine the language for a given input text.
Summarizer API: Condense text into headlines, summaries, and bullet-point rundowns.

All three of these APIs are available immediately to Chrome users. Edge users get the Translator and Summarizer APIs now; support for the Language Detector API is planned for a future release.
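As a rough sketch of how the Translator and Language Detector APIs fit together, the function below detects the language of a text and then translates it. The helper name `detectAndTranslate` and the `scope` parameter (which defaults to the global object and makes the function easier to test) are my own; the sketch assumes the browser exposes `LanguageDetector` and `Translator` globals and simply reports failure where it does not, as would currently be the case for language detection in Edge.

```javascript
// Sketch: detect the language of `text`, then translate it to `targetLanguage`.
// `scope` is a hypothetical injection point; in a browser it is the global object.
async function detectAndTranslate(text, targetLanguage, scope = globalThis) {
  if (!("LanguageDetector" in scope) || !("Translator" in scope)) {
    return { ok: false, reason: "APIs not available in this browser" };
  }
  const detector = await scope.LanguageDetector.create();
  // detect() resolves to a ranked list of candidates, most likely first.
  const [best] = await detector.detect(text);
  if (!best) {
    return { ok: false, reason: "No language detected" };
  }
  const pair = { sourceLanguage: best.detectedLanguage, targetLanguage };
  // A model must exist for this specific language pair.
  if ((await scope.Translator.availability(pair)) === "unavailable") {
    return { ok: false, reason: `No model for ${pair.sourceLanguage} to ${targetLanguage}` };
  }
  const translator = await scope.Translator.create(pair);
  return {
    ok: true,
    sourceLanguage: pair.sourceLanguage,
    text: await translator.translate(text),
  };
}
```

On a page, this could be called as `detectAndTranslate("Bonjour tout le monde", "en")` from a click handler, checking the `ok` field before using the result.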

Several other APIs, which are in a more experimental state, are available in both browsers on an opt-in basis:

Writer API: Generate text from a given prompt.
Rewriter API: Rewrite an existing text based on instructions from a prompt.
Prompt API: Make natural language requests directly to the model (e.g., “Search the web for up-to-date information about visiting Italy”).
Proofreader API: Examine a text for spelling and grammatical errors and suggest corrections.

The long-term ambition is to have these APIs accepted as general web standards, but for now they’re specific to Chrome and Edge.

Using the Summarizer API

We’ll use the Summarizer API as an example for how to use these APIs generally. The Summarizer API is available on both Chrome and Edge, and the way it’s used serves as a good model for how the other APIs also work.

First, create a web page which you’ll access through some kind of local web server. If you have Python installed, you can create an index.html file in a directory, open that directory in the terminal, and use py -m http.server to serve the contents on port 8000, the default. (On Linux or macOS, the command is python3 -m http.server.) You shouldn’t try to open the web page as a local file, as that may cause content-restriction rules to kick in and break things.

Here’s the source code of the page to create:

<div style="display: flex;">
    <textarea style="width:50%; height:24em" id="input" placeholder="Type text to be summarized"></textarea><br>
    <textarea style="width:50%; height:24em" id="output" placeholder="Summarization results"></textarea><br>
</div>
<textarea style="width:100%; height:4em" id="context" placeholder="Additional context"></textarea>
<label for="type">Type of summarization:</label>
<select id="type" name="type">
    <option value="teaser">Teaser</option>
    <option value="tldr">tl;dr</option>
    <option value="headline">Headline</option>
    <option value="key-points">Key points</option>
</select>

<label for="length">Length:</label>
<select id="length" name="length">
    <option value="short">Short</option>
    <option value="medium">Medium</option>
    <option value="long">Long</option>
</select>

<button type="button" onclick="go();">Start</button>
<div style="background-color:beige" id="log"></div>
<script>
    // Cache references to the page elements used below
    const $log = document.getElementById("log")
    const $input = document.getElementById("input")
    const $output = document.getElementById("output")
    const $context = document.getElementById("context")
    const $type = document.getElementById("type")
    const $length = document.getElementById("length")

    // Append one line of status text to the log area
    function log(text) {
        $log.innerHTML += text + "<br>";
    }
    async function summarize() {
        $log.innerHTML = "";

        // Feature-detect the API. The parentheses matter: ! binds more
        // tightly than in, so !'Summarizer' in self is always false.
        if (!('Summarizer' in self)) {
            log("Summarizer not available")
            return false
        }

        // Report whether the model is already on the device
        const availability = await Summarizer.availability();
        log(`Summarizer status: ${availability}`);

        // Create the summarizer, logging download progress if a download is needed
        const summarizer = await Summarizer.create({
            sharedContext: $context.value,
            type: $type.value,
            length: $length.value,
            format: 'markdown',
            monitor(m) {
                m.addEventListener('downloadprogress', (e) => {
                    // e.loaded is a fraction between 0 and 1
                    log(`Downloaded ${Math.round(e.loaded * 100)}%`);
                });
            }
        });

        log("Summarizer created, starting summarization");

        $output.value = "";

        // Stream the summary into the output field chunk by chunk
        const stream = summarizer.summarizeStreaming($input.value)
        for await (const chunk of stream) {
            $output.value += chunk;
        }

        log("Finished.")
    }
    function go() {
        summarize();
    }
</script>

Most of what we want to pay attention to is in the summarize() function. Let’s walk through the steps.

Step 1: Verify the API is available

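The check 'Summarizer' in self determines whether the Summarizer API is available in the browser at all. The negation has to wrap the whole expression, as in if (!('Summarizer' in self)), because ! binds more tightly than in; without the parentheses the test is always false and the early return never runs. The follow-up, const availability = await Summarizer.availability(); returns the status of the model required for the API:

downloadable: The model needs to be downloaded, so you’ll want to provide some kind of progress feedback for the download. (The above code has an example of how this could be implemented, via the monitor() function passed to the Summarizer.create() method.)
available: The model is on the device and can be used right away.
downloading: A download of the model is already in progress.
unavailable: The browser or device cannot run the model with the requested options.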
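One way to turn that status into a decision is sketched below. The helper name `checkSummarizer`, the `scope` parameter, and the `"unsupported"` label for browsers without the API are my own inventions; the sketch also assumes, to the best of my understanding, that `availability()` can report `unavailable` and `downloading` in addition to the two states listed above.

```javascript
// Sketch: summarize what the page should do based on API presence and model status.
// `scope` is a hypothetical injection point; in a browser it is the global object.
async function checkSummarizer(scope = globalThis) {
  // Parenthesized so the negation applies to the whole `in` expression.
  if (!("Summarizer" in scope)) {
    return { usable: false, needsDownload: false, status: "unsupported" };
  }
  const status = await scope.Summarizer.availability();
  return {
    usable: status !== "unavailable",
    // Show progress UI when the model is not yet fully on the device.
    needsDownload: status === "downloadable" || status === "downloading",
    status,
  };
}
```

A page could call this on load to decide whether to show the summarization controls at all, and whether to warn the user about a pending download.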

Step 2: Create the Summarizer object

The next step is to create the Summarizer object, which can take several parameters:

sharedContext: A text which gives the summarizer additional context for how to do its work (e.g. “Format the output as a bullet list of questions”).
type: One of four values that describes the format for the summary. teaser tries to create interest in the text’s contents without revealing full details; tldr provides a quick and concise summary, no more than a sentence or two; headline generates a suitable headline for the text; and key-points produces a bullet list of takeaways.
length: One of short, medium, or long; this parameter controls how long the output should be.
format: The format of the output summary. markdown is the default; another allowed value is plain-text. The input should be plain text, so if you are using HTML as your source, you may want to use .innerText to derive a text-only version of the input.
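When streaming isn't needed, the same parameters can be combined with the non-streaming `summarize()` method, which resolves to the whole summary at once. The sketch below is illustrative: `summarizeOnce`, its default option values, and the `scope` parameter are assumptions of mine, and the call to `destroy()` reflects my understanding that summarizer objects can be released explicitly when no longer needed.

```javascript
// Sketch: create a summarizer with some defaults, summarize once, then release it.
// `scope` is a hypothetical injection point; in a browser it is the global object.
async function summarizeOnce(text, options = {}, scope = globalThis) {
  const summarizer = await scope.Summarizer.create({
    type: "key-points",
    length: "short",
    format: "plain-text",
    ...options, // caller-supplied values override the defaults above
  });
  try {
    // summarize() resolves to the complete summary as a single string.
    return await summarizer.summarize(text);
  } finally {
    // Free the resources held by this summarizer instance.
    summarizer.destroy();
  }
}
```

For example, `summarizeOnce(document.body.innerText, { type: "headline" })` would ask for a short plain-text headline for the current page.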

Step 3: Stream and iterate over the output

Most of the time, we want to see the output streamed a token at a time, so we have some sense that the model is working. To do this, we use const stream = summarizer.summarizeStreaming($input.value) to create an object we can iterate over ($input.value is the text to summarize). We then use for await (const chunk of stream){} to iterate over each chunk and add it to the $output field.

Here’s an example of some input and output:

Example output for built-in text summarizer AI model in Chrome and Edge. The model runs entirely on the device hosting the browser and does not call out to an external service to deliver its results.

Image credit: Foundry

Caveats for using Summarizer (and other local AI APIs)

The first thing to keep in mind is that the model will take some time to download on first use. The sizes of the models vary, but you can expect them to be in the gigabyte range. That’s why it’s a good idea to provide some kind of UI feedback for the download process. Ideally, you’d want to provide some way to run the model download process and then ping the user when it’s ready for use.
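A small helper along these lines could run the download ahead of time and report progress as it goes. This is a sketch under assumptions: `prepareSummarizer`, the `onProgress` callback, and the `scope` parameter are hypothetical names, and in practice the browser may require that the call happen in response to a user gesture such as a click when a download is needed.

```javascript
// Sketch: create a summarizer ahead of time, reporting download progress
// as whole-number percentages through a caller-supplied callback.
// `scope` is a hypothetical injection point; in a browser it is the global object.
async function prepareSummarizer(onProgress, scope = globalThis) {
  const summarizer = await scope.Summarizer.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        // e.loaded is a fraction between 0 and 1.
        onProgress(Math.round(e.loaded * 100));
      });
    },
  });
  // Resolves once the model is ready; keep the object around for reuse.
  return summarizer;
}
```

The returned promise resolving is the natural point at which to notify the user that summarization is ready to use.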

Once models are downloaded, there’s no programmatic interface to how they’re managed — at least, not yet. On Google Chrome there’s a local URL, chrome://on-device-internals/, that shows which models have been loaded and provides statistics about them. You can use this page to remove models manually or inspect their stats for the sake of debugging, but the JavaScript APIs don’t expose any such functionality.

When you start the inference process, there may be a noticeable delay between the time the summarization starts and the appearance of the first token. Right now there’s no way for the API to give us feedback about what’s happening during that time, so you’ll want to at least let the user know the process has started.
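One way to cover that gap is to post a status message before streaming begins and update it when the first chunk arrives. The sketch below assumes an already-created summarizer; `streamWithStatus` and its two callbacks are hypothetical names, and the status strings are placeholders.

```javascript
// Sketch: stream a summary while keeping the user informed during the
// silent period before the first chunk arrives.
async function streamWithStatus(summarizer, text, onStatus, onChunk) {
  onStatus("Summarization started, waiting for the first token");
  let first = true;
  let full = "";
  for await (const chunk of summarizer.summarizeStreaming(text)) {
    if (first) {
      onStatus("Receiving summary");
      first = false;
    }
    full += chunk; // each chunk is appended, as in the demo page
    onChunk(chunk);
  }
  onStatus("Finished");
  return full;
}
```

In the demo page, `onStatus` could be the existing `log()` function and `onChunk` could append to the output field.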

Finally, while Chrome and Edge support a small number of local AI APIs now, how the future of browser-based local AI will play out is still open-ended. For instance, we might see a more generic standard emerge for how local models work, rather than the task-specific versions shown here. But you can still get going right now.

The post Tap into the AI APIs of Google Chrome and Microsoft Edge first appeared on Azalio.