<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Azalio</title>
	<atom:link href="https://www.azalio.io/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.azalio.io</link>
	<description>Your technology partner</description>
	<lastBuildDate>Tue, 16 Jun 2026 13:59:34 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.6.5</generator>

<image>
	<url>https://www.azalio.io/wp-content/uploads/2021/12/cropped-logo@3x-32x32.png</url>
	<title>Azalio</title>
	<link>https://www.azalio.io</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Databricks pitches LTAP as a new foundation for agentic applications</title>
		<link>https://www.azalio.io/databricks-pitches-ltap-as-a-new-foundation-for-agentic-applications/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Tue, 16 Jun 2026 13:59:34 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/databricks-pitches-ltap-as-a-new-foundation-for-agentic-applications/</guid>

					<description><![CDATA[<p>As enterprises rush to build AI agents that can reason over business data and take action, Databricks argues that the long-standing practice of separating operational and analytical data systems is turning into a liability. That separation, the cloud-based data warehouse provider says, is becoming increasingly strained as AI agents require simultaneous access to live operational [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/databricks-pitches-ltap-as-a-new-foundation-for-agentic-applications/">Databricks pitches LTAP as a new foundation for agentic applications</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<p>As enterprises rush to build AI agents that can reason over business data and take action, Databricks argues that the long-standing practice of separating operational and analytical data systems is turning into a liability.</p>
<p>That separation, the cloud-based data warehouse provider says, is becoming increasingly strained as AI agents require simultaneous access to live operational data and historical context to make decisions and take actions in real time, unlike humans, who traditionally can work with data that is minutes or hours old.</p>
<p>At its annual Data + AI Summit, the data warehouse provider introduced Lake Transactional and Analytical Processing (LTAP), a new architecture designed to unify transactional and analytical data on a single storage layer.</p>
<p>The new approach, according to Databricks, differs from traditional <a href="https://www.infoworld.com/article/2334535/what-is-oltp-the-backbone-of-ecommerce.html">online transaction processing (OLTP)</a> and <a href="https://www.infoworld.com/article/2334471/what-is-olap-analytical-databases.html">online analytical processing (OLAP)</a> architectures, which typically store operational and analytical data in separate systems.</p>
<p>Traditionally, OLTP databases are optimized for running day-to-day business operations such as order processing, payments, and inventory updates, while OLAP systems are designed for large-scale analytical queries and reporting.</p>
<p>As a result, enterprises often need to rely on <a href="https://www.infoworld.com/article/3487711/the-definitive-guide-to-data-pipelines.html">ETL pipelines</a>, data replication, and separate infrastructure to move information between the two environments.</p>
<p>LTAP, Databricks said, seeks to eliminate the reliance on ETL pipelines, replicated databases or separate data copies by storing data once in a shared <a href="https://www.infoworld.com/article/2334907/review-databricks-lakehouse-platform.html">lakehouse</a> layer while allowing dedicated compute engines to handle transactional and analytical workloads independently.</p>
<p>This approach, the company argued, provides AI-driven agents and applications access to both live operational data and historical analytical context without requiring data movement or duplicate copies.</p>
<h2 class="wp-block-heading" id="developer-simplicity-in-the-agentic-era">Developer simplicity in the agentic era</h2>
<p>Analysts, too, agree with Databricks’ contention that AI agents place new demands on enterprise data architectures.</p>
<p>“Agents don’t behave like people, or even like the apps we built for people. They read for context, loop, try things, then write something back, thousands of times over in ways you can’t fully predict. At that volume, the constant bouncing between production and analytics systems starts becoming the bottleneck. The pressure to collapse that gap is real, and LTAP is one way to approach it,” said <a href="https://moorinsightsstrategy.com/team/mike-leone/" target="_blank" rel="noreferrer noopener">Michael Leone</a>, principal analyst at Moor Insights and Strategy.</p>
<p><a href="https://www.linkedin.com/in/bhupendrachopra/" target="_blank" rel="noreferrer noopener">Bhupendra Chopra</a>, cofounder and CRO at IT consulting firm Kanerika, pointed out that an autonomous agent’s data access pattern makes the traditional architectures brittle: “We’re seeing this directly with clients deploying multi-agent systems, the pipeline layer becomes the ceiling almost immediately as an agent runs hundreds of times per task.”</p>
<p>The analysts also pointed out that the ability to collapse the gap between OLAP and OLTP is likely to help developers design more robust versions of the AI agents and applications that enterprises are currently looking to deploy.</p>
<p>“The most interesting workflow or application patterns are real-time, context-aware applications that combine transactions, analytics, and AI in one flow,” said <a href="https://www.linkedin.com/in/slwalter/" target="_blank" rel="noreferrer noopener">Stephanie Walter</a>, practice leader of AI stack at HyperFRAME Research.</p>
<p>“Examples include AI agents that update customer workflows while seeing historical account context and fraud systems that act on live transactions and long-term behavioral patterns,” Walter added.</p>
<p>Designing such applications today, however, according to Leone, would require developers to pull together data from transactional systems, data warehouses, vector databases, and other sources through custom integrations, creating significant engineering complexity and maintenance overhead.</p>
<h2 class="wp-block-heading" id="operational-simplicity-and-governance-gains-for-cios">Operational simplicity and governance gains for CIOs</h2>
<p>For CIOs, LTAP’s ability to reduce that engineering complexity, according to <a href="https://www.hfsresearch.com/team/ashish-chaturvedi/" target="_blank" rel="noreferrer noopener">Ashish Chaturvedi</a>, leader of executive research at HFS Research, will result in operational simplicity as well as cost savings.</p>
<p>“Most prominent advantage would be fewer data pipelines and everything that cascades from eliminating them. Most enterprises don’t realize how much of their data engineering budget is pure plumbing maintenance,” Chaturvedi said.</p>
<p>Kanerika’s Chopra pointed out that a substantial portion of data engineering capacity in mid-to-large enterprises today is consumed by maintaining synchronization between transactional and analytical systems.</p>
<p>The implications, however, Chaturvedi noted, are not limited to developer productivity, architectural simplicity, or cost savings: “The strategic prize is simplified governance. When you have one copy of data under one governance model instead of the same data scattered across operational stores, replicas, warehouses, and vector databases, you’ve solved the governance fragmentation problem.”</p>
<p>That simplification, according to Chopra, will matter operationally for enterprises deploying multiple AI agents, as these workflows can amplify governance gaps at a speed and scale that no human workflow ever did.</p>
<h2 class="wp-block-heading" id="ltap-versus-htap">LTAP versus HTAP</h2>
<p>Despite all its benefits, though, LTAP isn’t the first effort to unify operational and analytical workloads under a single architecture.</p>
<p>For years, the industry has pursued a similar goal through <a href="https://www.infoworld.com/article/2260125/how-in-memory-computing-drives-digital-transformation-with-htap.html">Hybrid Transactional and Analytical Processing (HTAP)</a> architecture, which sought to combine operational and analytical workloads on tightly coupled infrastructure to serve both workload types from the same system.</p>
<p>LTAP, in contrast, separates storage from compute, allowing different engines to access a common data layer while remaining independently scalable, Databricks said.</p>
<p>That separation of storage from compute, with a dedicated engine for each workload, is why analysts think that LTAP might be a better bet than HTAP.</p>
<p>“HTAP never took off because asking one tightly bound system to be great at transactions and great at analytics usually left it mediocre at both, so customers ended up paying a premium for that compromise,” Leone said.</p>
<p>“I think separating storage from compute is the right instinct, and it’s the same move that made the modern cloud data world work in the first place. It matters because the thing that sank HTAP was one workload starving the other, and giving each side its own dedicated engine is exactly how you keep that from happening,” Leone added.</p>
<p>Another reason for HTAP’s failure, according to <a href="https://isg-one.com/about-us/people/david-menninger">David Menninger</a>, executive director of software research at ISG, was its requirement for enterprises to replace existing data platforms with a new architecture.</p>
<p>LTAP, by contrast, builds on the now-common practice of separating compute and storage, making the addition of an operational layer less of an architectural transformation and potentially lowering the barrier to adoption, Menninger added.</p>
<h2 class="wp-block-heading" id="not-yet-the-default-architecture-for-ai-agents">Not yet the default architecture for AI agents</h2>
<p>However, despite the enthusiasm around LTAP, analysts warned CIOs against viewing this as the inevitable successor to existing data architectures.</p>
<p>“CIOs will still need to choose their data architecture based on latency, reliability, ecosystem fit, cost, compliance, and developer experience,” Walter said.</p>
<p>Echoing Walter, Chaturvedi pointed out that for LTAP to become the de facto standard for the industry, Databricks will need more than architectural elegance: “The architecture looks sound on paper. The proof will be in the commit-to-query latency numbers under real load.”</p>
<p>Databricks said LTAP is expected to be released soon as part of <a href="https://www.infoworld.com/article/4007541/databricks-data-ai-summit-2025-five-takeaways-for-data-professionals-developers.html">Lakebase</a>, but did not provide a specific timeline.</p>
</div>
</div>
</div>
</div>
</div>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Nvidia PCs don’t need cloud for AI</title>
		<link>https://www.azalio.io/nvidia-pcs-dont-need-cloud-for-ai/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Tue, 16 Jun 2026 09:59:45 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/nvidia-pcs-dont-need-cloud-for-ai/</guid>

					<description><![CDATA[<p>Nvidia’s new RTX Spark is one of the most interesting personal computing announcements in years. That’s because it’s not just another PC platform, but tries to redefine the role of the personal computer in the age of AI. Announced at Computex 2026, RTX Spark is Nvidia’s new platform for slim Windows laptops and compact desktops, [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/nvidia-pcs-dont-need-cloud-for-ai/">Nvidia PCs don’t need cloud for AI</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<p>Nvidia’s new RTX Spark is one of the <a href="https://www.nvidia.com/en-us/products/rtx-spark/">most interesting personal computing announcements</a> in years. That’s because it is not just another PC platform; it tries to redefine the role of the personal computer in the age of <a href="https://www.infoworld.com/article/4061121/a-brief-history-of-ai.html">AI</a>. Announced at Computex 2026, RTX Spark is Nvidia’s new platform for slim Windows laptops and compact desktops, designed to combine an Arm-based CPU, Blackwell-based RTX graphics, and a large, unified memory architecture into a single AI-first computing system.</p>
<p>We have all grown accustomed to a cloud-centric AI model over the past few years. We open an application, send a request over the network, and a hosted service in a distant data center provides the intelligence. ChatGPT, Grok, Gemini, and similar systems have trained the market to think of AI as something that lives elsewhere. RTX Spark proposes a different model. It asks a simple yet disruptive question: What if the model, the agent, the data, and the application could all live on your own machine? Nvidia is not just selling a faster PC. It is selling a new architectural premise.</p>
<h2 class="wp-block-heading" id="features-functions-and-prices">Features, functions, and prices</h2>
<p>On paper, RTX Spark is designed to be a highly capable local AI system. Nvidia has described the platform as combining AI acceleration and RTX graphics on a single chip for thin laptops and small desktops. Public specifications for the platform indicate configurations with up to 6,144 Blackwell GPU cores, up to a 20-core CPU, up to 1 petaflop of FP4 AI performance, and up to 128GB of unified memory. These are not ordinary PC numbers. They are clearly intended to support serious local AI workloads.</p>
<p>The unified memory approach is especially important. In traditional PC architecture, the CPU and GPU often use separate memory pools, which can become a bottleneck when running large models. By contrast, RTX Spark’s design is intended to make it easier for the system to host and run AI models locally. This enables Nvidia to position the machine as capable of hosting persistent <a href="https://www.infoworld.com/article/3611465/how-ai-agents-will-transform-the-future-of-work.html">AI agents</a>, supporting local inference, and even allowing users to customize or fine-tune certain classes of language models.</p>
<p>Nvidia is also careful not to frame the system as only an AI box. In a smart move, the company is marketing RTX Spark for gaming, creative applications, AI development, and agentic workflows. The system is designed not as a one-trick pony, but as a capable computer first and an AI workstation second. Otherwise, it would remain a niche developer experiment.</p>
<p>Pricing remains uncertain because Nvidia hasn’t published a universal price for every RTX Spark laptop or desktop. The platform will appear in products from different manufacturers, which means prices will vary. The best indicator comes from the related DGX Spark desktop, listed at about $4,699, though early estimates placed it between $2,999 and $3,999.</p>
<p>That probably gives us the right way to think about pricing for this broader category. These are unlikely to be inexpensive mainstream PCs, at least not at launch. They are more likely to arrive as premium systems aimed at developers, technical professionals, creators, and early adopters willing to pay for high-end AI capabilities on the device. Over time, that may broaden. For now, however, this looks like a new high-value, high-cost category rather than a commodity PC refresh.</p>
<h2 class="wp-block-heading" id="what-is-its-real-purpose">What is its real purpose?</h2>
<p>The most important thing about RTX Spark is not the chip. It is the purpose behind the chip. This machine is ultimately built to run AI agents locally, and that is a bigger deal than it may seem at first glance. An AI agent is more than a chatbot. It persists state, accesses tools, works across applications, remembers context, automates tasks, and increasingly acts as a software-based worker. Nvidia is explicitly positioning Spark systems to run personal AI agents directly on the local machine, potentially around the clock. That creates a very different computing model from what most of us use today.</p>
<p>There is another important layer to this story. These systems are also being positioned as platforms on which users can build and run smaller, more limited, locally tuned versions of large language model systems. Put plainly, you may be able to create your own model-based assistant that runs directly on the RTX Spark. It will not be as broadly capable as a frontier model operated by OpenAI or a hyperscaler; it is likely to be narrower in its expertise and more constrained by local hardware limits. But it will be yours, it will be local, and it will respond without relying on a remote <a href="https://www.infoworld.com/article/2269032/what-is-an-api-application-programming-interfaces-explained.html">API</a> call to a hosted AI service hundreds or thousands of miles away.</p>
<p>Such a shift is conceptually significant. For years, the AI industry has conditioned us to believe that serious intelligence must be centralized. RTX Spark suggests a future in which at least some intelligence becomes personal, portable, and self-contained.</p>
<h2 class="wp-block-heading" id="welcome-to-the-revolution">Welcome to the revolution</h2>
<p>The breakthrough here is not that local models will instantly outperform remote models. They will not. But the architecture of AI use may begin to diversify. Today, the default assumption is centralization. We assume the model, the knowledge base, and the application stack will all live in the cloud, and the user is simply a client. With systems like RTX Spark, that assumption starts to weaken. The model can run on the local machine. The agent can run on the local machine. Sensitive data can remain on the local machine. The application logic can be executed on the local machine. This changes latency, privacy, resiliency, and cost models. It also changes who controls the AI.</p>
<p>That does not mean the cloud goes away. Far from it. Enterprise use cases that benefit from centralized models and data will continue to exist. Businesses want the same knowledge base, business rules, database consistency, and governance model available to everyone. Centralization remains powerful because it reduces fragmentation and keeps systems aligned. Yes, a single-tenant, RTX Spark-based AI environment can be useful for certain projects, but it can also create islands of intelligence that do not easily share knowledge across teams and systems.</p>
<h2 class="wp-block-heading" id="possible-use-cases">Possible use cases</h2>
<p>I see the strongest potential use cases in disconnected or semi-disconnected environments. Think about physicians doing diagnostics support in privacy-sensitive contexts, field engineers collecting and interpreting data in remote areas, military and public sector users operating at the edge, or professionals who need highly private, self-contained AI assistance without relying on constant connectivity. In those scenarios, the value proposition is very strong. Having the model, data, application, and agent all on one portable system is not a limitation. It is the point.</p>
<p>The bigger question is whether mainstream enterprise AI will migrate in that direction. I remain skeptical that most organizations want hundreds or thousands of individually tuned, locally hosted models to replace centralized AI services. I predict that this category will complement the cloud rather than displace it. The more likely future is hybrid: centralized AI where shared knowledge and governance matter, and local AI where privacy, portability, latency, or disconnected operations matter more.</p>
</div>
</div>
</div>
</div>
</div>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Develop smarter AI agents with data fabrics</title>
		<link>https://www.azalio.io/develop-smarter-ai-agents-with-data-fabrics/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Tue, 16 Jun 2026 09:59:45 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/develop-smarter-ai-agents-with-data-fabrics/</guid>

					<description><![CDATA[<p>Every organization has data scattered across data warehouses, data lakes, SaaS platforms, cloud drives, and data centers. Data fabrics enable organizations to centralize and control data access, making it easier for users, such as data scientists and citizen data analysts, to find and use trusted and governed data sources.  Data fabrics, data meshes, and distributed [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/develop-smarter-ai-agents-with-data-fabrics/">Develop smarter AI agents with data fabrics</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<p>Every organization has data scattered across data warehouses, data lakes, SaaS platforms, cloud drives, and data centers. Data fabrics enable organizations to centralize and control data access, making it easier for users, such as data scientists and <a href="https://drive.starcio.com/2026/03/citizen-analytics-ai-era-cios/">citizen data analysts</a>, to find and use trusted and governed data sources. </p>
<p><a href="https://www.infoworld.com/article/2338426/how-to-explain-data-meshes-fabrics-and-clouds.html">Data fabrics, data meshes, and distributed data clouds</a> are all platforms to help IT and data teams put some order to the chaos around the myriad of data sources they support. <a href="https://www.infoworld.com/article/3497094/does-your-organization-need-a-data-fabric.html">Large companies need data fabrics</a> due to the volume and variety of their data sources.</p>
<p>“A data fabric can be thought of as the connective tissue that ensures consistent accessibility, availability, and understanding of data across an organization,” says Dominic Wellington, data and AI expert at <a href="https://www.snaplogic.com/">SnapLogic</a>. “Individual siloed platforms may have their own internal data transfer systems, and particular teams or departments may adopt interchanges that work for that domain, but a data fabric operates at a higher level, ensuring that unified data policies are applied end-to-end across the entire enterprise.”</p>
<h2 class="wp-block-heading" id="types-of-data-fabrics">Types of data fabrics</h2>
<p>When reviewing data fabrics, it’s important to consider their primary use cases, supported data types, data processing capabilities, data management structures, and governance functions. Below are some considerations when reviewing data fabrics as features, platforms, and stand-alone products.</p>
<ul class="wp-block-list">
<li>Some data fabrics are optimized for analytics and machine learning use cases and may have limited support for unstructured data.</li>
<li>Other data fabrics extend the functionality of data governance platforms beyond data cataloging and metadata management and now include persistent data management, data quality, and dataops capabilities.</li>
<li>Many data integration and API connectivity platforms go beyond proxying, pipelining, and transforming data to include search, governance, and other capabilities that come with data centralization.</li>
<li>Some SaaS platforms are extending their connectivity and data integration capabilities, enabling multicloud portability and persistent data.</li>
<li>The more advanced data fabrics support features needed for AI agents and AI model training. These platforms create a semantic context layer for structured and unstructured data sources, support <a href="https://www.infoworld.com/article/4029634/what-is-model-context-protocol-how-mcp-bridges-ai-and-external-services.html">Model Context Protocol</a> (MCP) integrations, have real-time query capabilities, centralize policy-driven governance, and track data lineage.    </li>
</ul>
<h2 class="wp-block-heading" id="why-data-fabrics-are-needed-for-ai">Why data fabrics are needed for AI</h2>
<p>Data fabrics are not just for large enterprises. Today, even smaller companies need them as part of their <a href="https://www.cio.com/article/4136302/how-to-get-ai-democratization-right.html">AI democratization programs</a>. Here are a few reasons why:</p>
<ul class="wp-block-list">
<li><a href="https://drive.starcio.com/2025/10/ai-agents-definitive-guide-saas-security-titans/">AI agents in enterprise SaaS</a> solutions need access to broader data sets than those core to their workflows. Platforms such as Adobe, Appian, Oracle, Salesforce, ServiceNow, SAP, and Workday offer data fabric capabilities to bring data outside of the business processes they manage into scope for their AI agents.</li>
<li><a href="https://www.infoworld.com/article/4160979/addressing-the-challenges-of-unstructured-data-governance-for-ai.html">Unstructured data</a> is important for setting the context for AI agents, and data fabrics are now used to provide access to documents, emails, transcripts, and other media formats.</li>
<li>Data fabrics provide data access standards for the devops teams experimenting with <a href="https://www.infoworld.com/article/4032989/a-developers-guide-to-code-generation.html">AI code generators</a>, <a href="https://www.infoworld.com/article/4058076/vibe-coding-and-the-future-of-software-development.html">vibe coding</a> tools, and spec-driven development approaches to develop applications and AI agents. </li>
<li>As companies use <a href="https://www.infoworld.com/article/4124612/5-requirements-for-using-mcp-servers-to-connect-ai-agents.html">MCP servers</a> to connect AI agents, data fabrics provide a standardized way for the agents to access governed, trusted data sources.</li>
</ul>
<p>“As AI agents move from generating insights to taking action, the data fabric becomes foundational in the agentic era,” says Irfan Khan, president and chief product officer of <a href="https://www.sap.com/index.html">SAP Data &amp; Analytics</a>. “Most enterprises operate across scattered data sources and diverse data landscapes, and what’s needed is shared business context, governed access, and clear accountability for how data is used in decision-making. Without that context, agents can’t fully understand or coordinate across the enterprise to deliver meaningful value.”</p>
<p>Sanjay Koppikar, chief product officer and cofounder of <a href="https://evoluteiq.com/">EvoluteIQ</a>, adds, “Multi-agent architectures become untrustworthy when a unifying data fabric architecture is missing, since agents will often work against each other in the service of their own objectives.”</p>
<h2 class="wp-block-heading" id="delivering-context-to-ai-agents">Delivering context to AI agents</h2>
<p>AI agents need a combination of real-time data, user information, problem details, and historical context to guide their decision-making. Vishal Sood, president of research and development at <a href="https://www.typeface.ai/">Typeface</a>, says, “MCP and data fabrics give agents access, but the harder problem is contextualizing data across multiple sources and ensuring the underlying content, media, and unstructured data are trustworthy.”</p>
<p>Data fabrics are the foundational elements for providing current information and long-term memory to AI agents. They simplify the many-to-many problem of connecting multiple AI models, AI agents, and MCP server integrations to multiple structured and unstructured data sources.</p>
<p>“The data fabric does a beautiful job of encompassing three concepts needed to create applications and processes: the data catalog, the data model, and data access,” says Sanat Joshi, executive vice president of product and innovations at <a href="https://www.appian.com/">Appian</a>. “But now add business rules, process models, APIs, security groups, the organizational model, and their interrelationships into one unified view of the enterprise, and that becomes your context layer.” </p>
<h2 class="wp-block-heading" id="integrations-with-data-fabrics">Integrations with data fabrics</h2>
<p>Devops teams just getting started on an AI agent proof of concept may want to connect directly to the optimal data sources and APIs. Michel Tricot, CEO and cofounder at <a href="https://airbyte.com/">Airbyte</a>, says connecting agents to live APIs is a great start, but it creates two big problems: APIs only return data that an agent already knows to ask for, and every query is an expensive API call chain that, with overhead, can overwhelm infrastructure at production volumes.</p>
<p>Tricot says the data fabric for AI use cases must be dynamic, leveraging discovery of available information from replicated data, fetching live contextual information, and writing the data back to business applications to update records.</p>
<p>Moving data in and out of the data fabric requires an integration strategy. <a href="https://www.datacamp.com/blog/what-is-zero-etl">Zero-ETL</a> (extract, transform, load) is one low-cost, efficient approach for connecting to structured data sources without replicating information. Once information is accessed centrally, it also enables streamlined security and governance.</p>
<p>“The promise of AI agents breaks down when they’re stuck waiting on brittle ETL, dealing with poor data quality, and lacking the right context to perform analysis,” says Preston Wood, chief security and strategy officer at <a href="https://databahn.ai/">Databahn</a>. “Generating AI-ready data within a data fabric gives agents real-time access to operational data without the latency and drift that undermine decision quality. A well-architected data fabric provides the governance and lineage controls that let you deploy agents confidently, knowing exactly what data they’re touching and why.”</p>
<h2 class="wp-block-heading" id="centralizing-ai-ready-data">Centralizing AI-ready data</h2>
<p>Data fabrics centralize <a href="https://www.infoworld.com/article/4091422/how-to-ensure-your-enterprise-data-is-ai-ready.html">AI-ready data</a> and help data governance teams address <a href="https://www.infoworld.com/article/3667314/3-data-quality-metrics-dataops-should-prioritize.html">data quality</a> issues, <a href="https://www.nature.com/articles/s41597-022-01705-8">biased data</a> concerns, <a href="https://drive.starcio.com/2026/02/data-privacy-week-leadership-accountability/">privacy compliance</a>, and other <a href="https://drive.starcio.com/2024/10/6-important-ai-and-data-governance-non-negotiables/">data governance non-negotiables</a>. Data fabrics also help address integration issues, monitor for <a href="https://www.infoworld.com/article/3487711/the-definitive-guide-to-data-pipelines.html">data pipeline errors</a>, and report on performance latencies. The result is that AI agents, models, and other analytics capabilities can then connect to trusted data sources with consistency.</p>
<p>“As AI agents and MCP architectures increasingly rely on data fabrics as their golden source of truth, data quality stops being a hygiene problem and becomes a trust problem, as we all know that trust is foundational to autonomous decision-making,” says Kellyn Gorman, database and AI advocate and engineer at <a href="https://www.red-gate.com/">Redgate Software</a>. “Organizations that invest now in semantic consistency, lineage tracking, and observable data contracts across data fabrics will be the ones whose AI agents can be trusted to act without constant human correction.”</p>
<p>Data fabrics that support zero-ETL and other bidirectional integrations with sources thus become an organizational knowledge base, the data source for training AI models, and a foundation for producing data metrics.</p>
<p>“AI agents are only as reliable as the data they’re built on, and most organizations underestimate how much implicit tribal knowledge lives in their transformation logic rather than their source systems,” says Tobias Ostwald, director of analytics at <a href="https://www.nmi.com/">NMI</a>. “If you’re exposing a data fabric to agents or MCP integrations, you need lineage, testing, and metric definitions baked into the layer itself, not just documented somewhere, because the agent can’t call a colleague to gut-check a number.”</p>
<h2 class="wp-block-heading" id="streamlining-security-and-governance">Streamlining security and governance</h2>
<p>With a data fabric in place, governance, security, and other risk management leaders have a central location to manage data security, centralize access controls, and fulfill other governance responsibilities. Miles Ward, CTO of AI in Solution Lines at <a href="https://www.insight.com/">Insight</a>, says, “We have to move past security by isolation to a governance model where the fabric itself enforces the pavement and walls of compliance.”</p>
<p>The data fabric also governs entitlements for AI agents and their users. Centralizing these business rules can help organizations avoid creating AI debt, a risk if controls are implemented directly in data sources or consumers.</p>
<p>“The convergence of AI-generated code sprawl and autonomous MCP connectivity creates a ‘perfect storm’ of architectural drift and toxic permission combinations,” says Karen Cohen, vice president of product at <a href="https://apiiro.com/">Apiiro</a>. “Effective governance requires a security data fabric that monitors these autonomous connections in real time to enforce intent-based policies and strictly limit agent scope to its specific purpose. By integrating guardrails that align AI-assisted development with secure architecture principles, enterprises can proactively secure their expanding attack surface without sacrificing developer velocity.”</p>
<h2 class="wp-block-heading" id="future-considerations-for-data-fabrics">Future considerations for data fabrics</h2>
<p>Expect vendors to expand the scope of their data fabrics beyond text and documents. Some will include <a href="https://www.infoworld.com/article/3833936/improving-intelligent-document-processing-with-generative-ai.html">specialized document processing</a> for common formats such as invoices, contracts, and product documentation. There will be skills and tools to support industry-specific documents such as health records and construction documents. Others will support multimedia file types and provide metadata extraction and search capabilities. </p>
<p>“Enterprises are asking agents to reason across contracts, images, PDFs, and video, and this is where most data fabrics break,” says Dave Shuman, chief data officer at <a href="https://www.precisely.com/">Precisely</a>. “Multimodal data must be chunked, embedded, and governed with the same rigor as structured data, including lineage and access controls.”</p>
<p>Other emerging capabilities include:</p>
<ul class="wp-block-list">
<li>Extended support for AI agent interfaces to aid data discovery, with greater contextual controls over where and when AI agents can access sensitive data</li>
<li>Business ontologies, semantic layers, and knowledge graph capabilities, with management tools or integrations with third-party platforms</li>
<li>Support for data contracts, service-level agreements, centralized data observability, auditing, and other functions that will enhance explainable AI capabilities</li>
<li>FinOps functions to track costs for data owners and consumers</li>
</ul>
<p>As more companies depend on AI agents in their operations, expect top data fabric platforms to release capabilities that expand scope, scale, use cases, and governance.</p>
</div>
</div>
</div>
</div>
</div><p>The post <a href="https://www.azalio.io/develop-smarter-ai-agents-with-data-fabrics/">Develop smarter AI agents with data fabrics</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Shipping enterprise-quality code with AI agents</title>
		<link>https://www.azalio.io/shipping-enterprise-quality-code-with-ai-agents/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Tue, 16 Jun 2026 09:59:45 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/shipping-enterprise-quality-code-with-ai-agents/</guid>

					<description><![CDATA[<p>Developers are caught between the joy — or pressure — of using agents to ship 10x faster today and the dread of how they will maintain that code tomorrow. The gap between “vibe” code and code that can be deployed to millions of users is vast and easy to underestimate. Closing the gap requires care, [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/shipping-enterprise-quality-code-with-ai-agents/">Shipping enterprise-quality code with AI agents</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<section class="wp-block-bigbite-multi-title">
<div class="container"></div>
</section>
<p>Developers are caught between the joy — or pressure — of using agents to ship 10x faster today and the dread of how they will maintain that code tomorrow. The gap between <a href="https://www.infoworld.com/article/4078884/what-is-vibe-coding-ai-writes-the-code-so-developers-can-think-big.html" data-type="link" data-id="https://www.infoworld.com/article/4078884/what-is-vibe-coding-ai-writes-the-code-so-developers-can-think-big.html">“vibe” code</a> and code that can be deployed to millions of users is vast and easy to underestimate. Closing the gap requires care, expertise, and effort, with the payoff coming later. Agents are able to complete increasingly complex programming tasks but without the quality we need. What’s missing, and how can we fill the gap?</p>
<div class="extendedBlock-wrapper block-coreImage undefined">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" src="https://b2b-contenthub.com/wp-content/uploads/2026/06/agentic-coding-quality-gap-sonar.png?w=1024" alt="agentic coding quality gap - sonar" class="wp-image-4182520" width="1024" height="539" sizes="auto, (max-width: 1024px) 100vw, 1024px"></figure>
<p class="imageCredit">Sonar</p>
</div>
<h2 class="wp-block-heading">Why agent-generated code degrades: the bloat problem</h2>
<p>Enterprise code has to clear three bars: it must be maintainable, reliable, and secure. Out-of-the-box AI agents can miss all three. Let’s focus on the biggest and most visible maintainability issue, which is bloat: redundant validation, defensive checks that cannot fire, near-duplicate functions, dead code that nothing removes. A <code>None</code> check on a parameter typed as <code>dict</code>. A <code>try</code>/<code>except</code> around a call that never throws. Two functions, identical except for the negation in their return statement.</p>
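<p>The three patterns above are easy to picture in Python. The following is a hypothetical sketch (the function names and the order data are invented for illustration, not taken from any benchmark): a bloated version with the unreachable <code>None</code> check, the <code>try</code>/<code>except</code> around a call that cannot raise, and the near-duplicate pair, followed by a lean equivalent.</p>

```python
# Hypothetical example; names and data are invented for illustration.

def total_bloated(order: dict) -> float:
    # Redundant: the parameter is typed as dict, so callers that respect
    # the annotation never pass None.
    if order is None:
        return 0.0
    try:
        # dict.get with a default does not raise for a missing key,
        # so this except branch is dead code.
        items = order.get("items", [])
    except KeyError:
        items = []
    return sum(item["price"] * item["qty"] for item in items)


def is_empty_bloated(order: dict) -> bool:
    return len(order.get("items", [])) == 0


def is_not_empty_bloated(order: dict) -> bool:
    # Near-duplicate: identical except for the negation.
    return not len(order.get("items", [])) == 0


def total(order: dict) -> float:
    """Lean equivalent: same result for any dict input."""
    return sum(item["price"] * item["qty"] for item in order.get("items", []))


def is_empty(order: dict) -> bool:
    """One predicate; callers write `not is_empty(order)` when needed."""
    return not order.get("items")
```

<p>Each guard in the bloated version looks defensible on its own. The lean version does the same job for any <code>dict</code> input with roughly a third of the lines.</p>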
<p>Bloat varies dramatically by model. Sonar’s <a href="https://www.sonarsource.com/the-coding-personalities-of-leading-llms/leaderboard/">LLM Leaderboard</a> runs every frontier model through 4,400+ Java tasks and analyses the code generated. To complete the benchmark, GPT-5.4 High generated 1,159,000 lines of code at an 81.05% pass rate, while Claude Opus 4.7 Thinking generated only 336,000 lines of code and achieved a better pass rate of 82.52%. Different models generate dramatically different code to achieve similar outcomes.</p>
<p>Bloat is not just messy. <a href="https://arxiv.org/abs/2511.04427">Carnegie Mellon researchers studied</a> 807 open-source projects that had adopted Cursor, matched against 1,380 controls, measured by SonarQube. A short-term velocity gain disappeared by month three, while static analysis warnings rose 30% and code complexity rose 41% — both persistent. The harder it became to change the codebase and the more bugs it contained, the more the velocity was dragged down. Any experienced developer will know how this goes: quality problems compound until the code feels impossible to change and the only option is the dreaded rewrite.</p>
<p>Three forces produce bloat once a model is in use:</p>
<ol class="wp-block-list">
<li><strong>Agents do not feel the maintenance burden.</strong> Armin Ronacher, the creator of Flask, made the point on the <a href="https://newsletter.pragmaticengineer.com/p/building-pi-and-what-makes-self-modifying">Pragmatic Engineer podcast</a> in late April. Humans feel the cost of bad code over time, and as Ronacher put it, “if the pain gets too big, you as a human are incentivized to fix the cause of your pain” — so we refactor. Agents do not. They obliviously extend bad structure indefinitely. A senior engineer’s job is to say no to unnecessary abstraction. The agent has no equivalent reflex.</li>
<li><strong>Training rewards apparent completeness.</strong> Pretraining corpora are full of explanatory material — Stack Overflow answers, tutorials, README snippets — deliberately self-contained and verbose. Post-training compounds the effect: human raters prefer outputs that look thorough, so models learn that “comprehensive” reads as better. When uncertain which edge case matters, the safe move is to handle all of them. Each guard is locally defensible. The aggregate is bloat.</li>
<li><strong>Iterative generation has no deletion pressure.</strong> Agents add but rarely delete. Removing dead code does not make any test go green, so superseded functions accumulate alongside their replacements. <a href="https://arxiv.org/html/2603.24755v1">SlopCodeBench</a>, a March 2026 benchmark across 11 coding models, found rising structural complexity in 80% of trajectories and rising verbosity in 89.8%. Agents continue to patch bad code, treating every task as if it’s their last.</li>
</ol>
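<p>One way to create some deletion pressure is a check that fails when superseded functions linger. The sketch below is a deliberately naive heuristic built on Python's standard <code>ast</code> module: it flags top-level functions whose names are never referenced in the same file. The function name <code>unused_functions</code> and the sample source are hypothetical, and a name-based check like this will miss dynamic dispatch, exported APIs, and methods, so treat it as an illustration and not a replacement for a real static analyzer.</p>

```python
import ast


def unused_functions(source: str) -> list:
    """Return top-level function names never referenced in the same file."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    # Every bare name (calls, references) and attribute access in the file.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    used |= {n.attr for n in ast.walk(tree) if isinstance(n, ast.Attribute)}
    return sorted(defined - used)


# Hypothetical file in which the agent added parse() but left old_parse().
sample = (
    "def old_parse(s):\n"
    "    return s\n"
    "\n"
    "def parse(s):\n"
    "    return s.strip()\n"
    "\n"
    "print(parse(' x '))\n"
)

leftovers = unused_functions(sample)  # ['old_parse']
```

<p>Run as part of the agent's own feedback loop, a non-empty result gives the agent a failing signal that only deletion can clear.</p>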
<h2 class="wp-block-heading">AC/DC: the loop that compensates</h2>
<p>What closes the gap is a loop around each iteration of agent work. The agent does what it is good at — generating code — and our job is to wrap that with three steps the agent cannot reliably do on its own. At Sonar we call this the Agent Centric Development Cycle, or AC/DC: guide, verify, solve.</p>
<h3 class="wp-block-heading">Guide</h3>
<p>Many teams overcorrect on context. They paste the style guide, three years of architectural decisions, and the entire onboarding doc into the agent’s instructions and expect output to improve. <a href="https://arxiv.org/abs/2602.11988">ETH Zurich researchers tested</a> this and found the opposite: large context files often reduced task success compared with no context at all, and added 20% or more to inference cost.</p>
<p>Keep agent-facing context short — under 200 lines is a useful heuristic — and restrict it to fundamentals that can’t easily be inferred from the code: naming conventions, architectural invariants, what has been tried and failed. However, this will only get you so far, so make sure to provide specific context for each task. If you have architectural expectations, don’t expect the agent to guess them. Software architecture tools can be used to provide additional context in the guide phase.</p>
<p>Task shape matters too. Break the work into steps and agree on a plan; ask the agent to provide three solutions and evaluate the impact on quality of each. There is no perfect software architecture, and you understand the trade-offs in your codebase best, so think critically about the changes before they happen. Without this, the agent will confidently pick an option, seemingly at random, and the further it goes the harder it is to “unpick.” If you want to test this, ask three instances of your preferred agent to complete a task that involves some polymorphism and watch each one confidently suggest a different solution.</p>
<h3 class="wp-block-heading">Verify</h3>
<p>The most expensive verification mistake is doing it last. Reviewing 200-line pull requests (PRs) after the agent is done is the dynamic behind the <a href="https://addyo.substack.com/p/the-80-problem-in-agentic-coding">Faros/DORA figures Addy Osmani highlighted</a>: 98% more PRs merged in high-adoption teams, review times up 91%. Verification inside the loop is different. Unit test runs, static analysis, and security scanners produce output the agent can act on. This is where AI-native tooling belongs: purpose-built for the agent to invoke, not just for humans to consult through a UI.</p>
<p>Human reviewers cannot keep up. When agents merge twice as many PRs per week and each one takes nearly twice as long to review, doubling the review staff still leaves you behind. Automated verification is the only response that scales. Fast feedback has always been a fundamental tenet of good software engineering. Feeding it directly back to the agent protects the developer from simple mistakes and leaves them headroom to work on the harder ones.</p>
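<p>The arithmetic behind that claim can be checked in a few lines of Python. This sketch plugs in the Faros/DORA figures quoted above (98% more PRs merged, review times up 91%). It assumes, as a simplification, that review load scales as PR count times time per review and that review capacity scales linearly with headcount.</p>

```python
# Simplifying assumption: review load = number of PRs x time per review.
pr_multiplier = 1.98           # 98% more PRs merged
review_time_multiplier = 1.91  # review times up 91%

load_multiplier = pr_multiplier * review_time_multiplier


def capacity_gap(staff_multiplier: float) -> float:
    """Fraction of the new review load left uncovered after scaling staff."""
    return 1 - staff_multiplier / load_multiplier


print(f"Review load grows {load_multiplier:.2f}x")
print(f"Uncovered after doubling reviewers: {capacity_gap(2.0):.0%}")
```

<p>Under these assumptions the review load grows about 3.8 times, so doubling the team still leaves nearly half of it uncovered.</p>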
<h3 class="wp-block-heading">Solve</h3>
<p>If verification happens within the agentic loop, the agent can fix any issues whilst the code is being generated without expensive remediation steps. Static analysis tools can guide the agent on how to resolve the issue quickly. Some cases need human judgment — a <code>None</code> check at a system boundary may document a real precondition. But most of the work is mechanical. Automate the obvious fixes and let engineers spend their attention on the cases that are not.</p>
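<p>A minimal sketch of verification and fixing inside the loop, in Python. Everything here is hypothetical: <code>generate</code> stands in for the agent, each entry in <code>checks</code> stands in for a tool such as a unit test run or a static analyzer, and the toy agent and check exist only so the example runs. It illustrates the shape of the loop and is not Sonar's implementation.</p>

```python
from typing import Callable, List, Optional

# Hypothetical types: a generator stands in for the agent, and each check
# stands in for a tool (unit tests, static analysis, a security scanner).
Generate = Callable[[str, List[str]], str]  # (task, feedback) -> code
Check = Callable[[str], List[str]]          # code -> list of findings


def run_loop(task: str, generate: Generate, checks: List[Check],
             max_rounds: int = 3) -> Optional[str]:
    """Regenerate until every check is clean or the round budget runs out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        code = generate(task, feedback)
        # Verify inside the loop: collect findings from every check.
        feedback = [finding for check in checks for finding in check(code)]
        if not feedback:
            return code
    # Budget exhausted: leave the remaining findings for a human.
    return None


# Toy stand-ins so the sketch runs end to end.
def toy_agent(task: str, feedback: List[str]) -> str:
    # Drops the dead branch only once a check has complained about it.
    if feedback:
        return "def f(x):\n    return x + 1\n"
    return "def f(x):\n    if False:\n        pass\n    return x + 1\n"


def dead_branch_check(code: str) -> List[str]:
    return ["dead branch: 'if False'"] if "if False" in code else []


result = run_loop("add one", toy_agent, [dead_branch_check])
```

<p>The round budget is the design choice that matters: mechanical findings are cleared automatically, and anything still failing after a few rounds is escalated instead of patched indefinitely.</p>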
<h2 class="wp-block-heading">The investment that compounds</h2>
<p>Better models will keep arriving. They may not change the mechanism of bloat or the dynamics of compounding decay. The loop is what does — bounded tasks, sharp context, in-loop verification, and a deliberate “solve” step to clear bloat before it accumulates.</p>
<p>The same logic governs how autonomy should expand. Reduce human interventions only when the agent’s guide, verify, and solve cycle is making them redundant. Our biases can sting us here: an agent’s ability to write code can lead us to agree with it more than we should. Don’t trust blindly; wait for the evidence.</p>
<p>The teams that will be shipping enterprise-quality code with AI agents in 18 months are not the ones running the “best model.” They are the ones treating workflow as the engineering investment, with the seriousness once given to build systems and CI. The model is the tool; the workflow is the discipline. That is where the durable advantage compounds.</p>
<p><em>—</em></p>
<p><a href="https://www.infoworld.com/blogs/new-tech-forum"><strong><em>New Tech Forum</em></strong></a><em><strong> provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all </strong></em><em><strong>inquiries to </strong></em><a href="mailto:doug_dineley@foundryco.com"><strong><em>doug_dineley@foundryco.com</em></strong></a><em><strong>.</strong></em></p>
</div>
</div>
</div>
</div>
</div><p>The post <a href="https://www.azalio.io/shipping-enterprise-quality-code-with-ai-agents/">Shipping enterprise-quality code with AI agents</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>DocLang aims to make documents readable by AI, not humans</title>
		<link>https://www.azalio.io/doclang-aims-to-make-documents-readable-by-ai-not-humans/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Tue, 16 Jun 2026 08:00:10 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/doclang-aims-to-make-documents-readable-by-ai-not-humans/</guid>

					<description><![CDATA[<p>AIs struggle to understand documents designed for humans; the DocLang working group seeks to flip that imbalance with its specification for machine-readable business documents “built from the ground up for LLM tokenizers.” The working group, founded by IBM, Nvidia, and Red Hat and hosted by the Linux Foundation’s LF AI &#38; Data project, aims to [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/doclang-aims-to-make-documents-readable-by-ai-not-humans/">DocLang aims to make documents readable by AI, not humans</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<section class="wp-block-bigbite-multi-title">
<div class="container"></div>
</section>
<p>AIs struggle to understand documents designed for humans; the DocLang working group seeks to flip that imbalance with its specification for machine-readable business documents “built from the ground up for LLM tokenizers.”</p>
<p>The working group, founded by IBM, Nvidia, and Red Hat and hosted by the Linux Foundation’s LF AI &amp; Data project, aims to create an open, universal, AI-native document format designed to improve how enterprises prepare, exchange, and govern document data for AI systems. ABBYY and Human Signal will also be involved in its development, and other contributors are welcome.</p>
<p>“Enterprises today work across a fragmented landscape of document formats, including PDFs, JPEGs, and other file types built primarily for human consumption rather than AI interpretation,” the group said in its launch <a href="https://www.linuxfoundation.org/press/lf-ai-data-foundation-launches-doclang-specification-working-group-to-advance-an-open-standard-for-ai-native-documents" target="_blank" rel="noreferrer noopener">announcement</a>.</p>
<p>“This disconnect can introduce complexity, raise costs, and reduce reliability when extracting meaning from business documents,” as organizations increasingly rely on generative AI and agentic systems, it said.</p>
<p><a href="https://www.linkedin.com/in/markcollier/" target="_blank" rel="noreferrer noopener">Mark Collier</a>, executive director of LF AI &amp; Data, said the goal of the <a href="https://doclang.ai/">DocLang Specification</a> Working Group is to “develop a vendor-neutral, interoperable standard that helps organizations prepare document data for AI more reliably, transparently, and at scale.”</p>
<p>DocLang defines a structured, machine-readable format for documents of any type, like JSON for data, that any tool can implement and any pipeline can consume. It builds on <a href="https://www.infoworld.com/article/3997240/docling-an-open-source-tool-kit-for-advanced-document-processing.html">DocLing</a>, a document processing toolkit hosted by LF AI &amp; Data that can transform human-readable PDFs, word processor documents or spreadsheets into structured data.</p>
<h2 class="wp-block-heading" id="standards-must-evolve-for-ai">Standards must evolve for AI</h2>
<p>Something like DocLang is needed, said independent technology analyst <a href="https://www.linkedin.com/in/carmi/" target="_blank" rel="noreferrer noopener">Carmi Levy</a>. “Existing document standards have done an admirable job allowing global stakeholders to confidently collaborate for decades, but it’s becoming increasingly clear that they are in desperate need of an update as AI reshapes the rules around how work gets done,” he explained.</p>
<p>Largely static document types, he said, “can be somewhat limiting when AI is redefining the very word, ‘document.’ In many ways, AI-age documents are far more iterative and dynamic than what they once were, and the definitions need to evolve with the times. The documents we currently live with simply weren’t designed for the AI age.”</p>
<p>Within that context, Levy said, “DocLang represents an early, best hope of achieving some kind of foundational baseline for document standards, one that will hopefully allow more intelligent, more efficient, lower-risk workflows than is currently the case.”</p>
<p>Taking an open-source, vendor-agnostic approach to the process ensures the collective will take precedence over the needs of specific vendors, he said, adding, “earlier standards-setting efforts around networking, documentation, the web, and the cloud powered the free-flowing digital landscape that defines modern life.”</p>
<p>An AI-centric documentation standard will carry that reality into the next generation of technology, said Levy.</p>
<h2 class="wp-block-heading" id="a-question-of-governance">A question of governance</h2>
<p>The entire concept of LLMs, <a href="https://moorinsightsstrategy.com/team/jason-andersen/" target="_blank" rel="noreferrer noopener">Jason Andersen</a>, principal analyst at Moor Insights &amp; Strategy, said, “involves using natural human languages. The computer is supposed to understand us without us changing our syntax or language. Forcing a syntax on users is exactly what we have today with SEO and more advanced programming languages.”</p>
<p>With something like DocLang, where the standard can be applied to content ingestion, he said, “I would be OK with that being automated, which seems to be the intent. The use case I envision is that when I upload a document to an agent, a skill can be run to preprocess the document into the DocLang standard format, saving tokens.”</p>
<p>That makes sense, he said, adding that he thinks it’s good “if it can help generate outputs, like a visualization, that can be shared outside an AI tool. On that front, that is also why I am liking Web MCP, since you are just adding some code to the page, like CSS or JavaScript, and the consumer, in this case, an AI browser or skill, is better equipped to handle the site.”</p>
<p>The point, he said, is, “these standards need to preserve the fact that humans can still do what they want, and do not need to know any coding to be proficient. In terms of governance, I am not sure if it matters.”</p>
<p>But one analyst did foresee governance problems arising from DocLang’s use.</p>
<p><a href="https://www.infotech.com/profiles/yaz-palanichamy" target="_blank" rel="noreferrer noopener">Yaz Palanichamy</a>, senior research analyst at Info-Tech Research Group, said DocLang adoption will require organizations to implement and review controls in order to scale its use accountably and securely.</p>
<p><em>This article first appeared on <a href="https://www.cio.com/article/4183187/doclang-aims-to-make-documents-readable-by-ai-not-humans.html">CIO</a> on June 10, 2026.</em></p>
</div>
</div>
</div>
</div>
</div><p>The post <a href="https://www.azalio.io/doclang-aims-to-make-documents-readable-by-ai-not-humans/">DocLang aims to make documents readable by AI, not humans</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AWS WAF adds AI traffic monetization capability to help content owners charge AI bots for content access</title>
		<link>https://www.azalio.io/aws-waf-adds-ai-traffic-monetization-capability-to-help-content-owners-charge-ai-bots-for-content-access/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Mon, 15 Jun 2026 20:59:34 +0000</pubDate>
				<category><![CDATA[AWS]]></category>
		<guid isPermaLink="false">http://13.127.31.42/aws-waf-adds-ai-traffic-monetization-capability-to-help-content-owners-charge-ai-bots-for-content-access/</guid>

					<description><![CDATA[<p>AWS WAF now includes AI traffic monetization capability that gives digital content owners and publishers a way to charge AI bots and agents for access to protected web content directly at the network edge. The capability helps content owners and publishers set per-request pricing by content path, bot category, or verification tier without modifying their [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/aws-waf-adds-ai-traffic-monetization-capability-to-help-content-owners-charge-ai-bots-for-content-access/">AWS WAF adds AI traffic monetization capability to help content owners charge AI bots for content access</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<p>AWS WAF now includes an AI traffic monetization capability that gives digital content owners and publishers a way to charge AI bots and agents for access to protected web content directly at the network edge. The capability helps content owners and publishers set per-request pricing by content path, bot category, or verification tier without modifying their origin infrastructure or writing application code. Content owners can define granular access policies per agent type, collect payments in stablecoins to their preferred wallet, and monitor revenue and bot activity from a single dashboard.</p>
<p>AI bot traffic now accounts for more than 50% of web traffic for many content providers, with AI-specific crawlers growing more than 300% year-over-year. Unlike traditional search engine crawlers, which index content and return measurable referral traffic back to publisher websites, AI bots consume the same content to generate summaries and responses in AI interfaces, with little to no traffic sent back to the original source. Publishers bear the infrastructure costs of serving that traffic without the page views, ad impressions, or subscription conversions that typically offset those costs. <a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control.html">AWS WAF Bot Control</a> already gives customers visibility into bot activity and the ability to block or rate-limit traffic, but setting pricing and collecting payment from AI agents has not been possible until now. AI traffic monetization is a new Bot Control capability that closes that gap, giving content owners and publishers a way to configure pricing rules directly through the AWS WAF console and collect payments from AI agents through third-party payment integrations, without building custom payment infrastructure or negotiating individual licensing agreements. Payment settlement and verification flows are provided by Coinbase’s x402 Facilitator. Integration with Stripe for direct account payments and Machine Payments Protocol (MPP) support are coming soon.</p>
<p><span style="text-decoration: underline"><strong>Getting Started with AI Traffic Monetization<br /></strong></span>Before configuring monetization, confirm that AWS WAF Bot Control is enabled at the Common or Targeted level on the web ACL associated with your CloudFront distribution. Bot Control provides the agent classification that monetization rules depend on. If you have not set this up yet, see the <a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control-rg-using.html">Adding the AWS WAF Bot Control managed rule group to your web ACL</a> documentation. In the AWS Management Console, go to <strong>WAF &amp; Shield</strong> and choose <strong>Protection packs (web ACLs)</strong> in the left navigation pane to get started.</p>
<p>A protection pack is the core configuration unit for AI traffic monetization. It defines which content paths are monetized, what each agent verification tier is charged, which payment methods you accept, and what license terms apply. To create one, choose <strong>Create protection pack (web ACL)</strong>.</p>
<p><img decoding="async" class="alignnone wp-image-104464 size-full" src="http://13.127.31.42/wp-content/uploads/2026/06/1213987938308918-0e.png" alt="" width="1924" height="2626"></p>
<p>In <strong>Tell us about your app</strong>, select one or more app categories that describe your content (for example, Content &amp; publishing systems, E-commerce &amp; transaction platforms, or Enterprise &amp; business applications), and choose an <strong>App focus</strong>. AWS WAF uses these selections to recommend suitable security protections for your configuration.</p>
<p>In <strong>Select resources to protect</strong>, choose <strong>Add resources</strong> to associate regional or global resources such as CloudFront distributions with this protection pack. You can skip this step and add resources later.</p>
<p>In <strong>Choose initial protections</strong>, select from AWS WAF managed rule packages based on your app category and resource selections. You can also choose individual rules instead of packages.</p>
<p>In <strong>Name and describe</strong>, provide a name and optional description for the protection pack.</p>
<p>Optionally, expand <strong>Customize protection pack (web ACL)</strong> to configure additional settings including pricing tiers, payment methods, content scope, and license terms.</p>
<p>When finished, choose <strong>Create protection pack (web ACL)</strong>.</p>
<p>Once your protection pack is in place, review the AI traffic analysis dashboard to understand the impact of AI bot traffic on your content before setting your pricing strategy. In the WAF &amp; Shield console, go to <strong>AI traffic analysis</strong> in the left navigation pane. Select your protection pack (web ACL) from the dropdown to populate the dashboard.</p>
<p><img decoding="async" loading="lazy" class="alignnone wp-image-104466 size-full" src="http://13.127.31.42/wp-content/uploads/2026/06/1213987938308918-1a.png" alt="" width="1928" height="1787"></p>
<p>The AI traffic analysis dashboard breaks down traffic into four categories visible in the bot traffic overview panel: <strong>All bot requests</strong>, <strong>AI bot requests</strong>, <strong>Verified AI bot traffic</strong>, and <strong>Unverified AI bot traffic</strong>. The dashboard surfaces infrastructure impact metrics including bandwidth consumed, estimated monthly cost, and peak request rates. A per-path heatmap shows which content paths receive the most AI bot activity by hour, giving you the data you need to make informed pricing decisions.</p>
<p>AWS WAF Bot Control classifies over 650 distinct AI bot and agent types including GPTBot, Claude-Web, and Perplexity-Bot, and assigns each a verification tier:</p>
<ul>
<li><strong>Verified</strong> — Agent identity confirmed through Web Bot Auth (WBA) Ed25519 cryptographic signature, or sourced from a documented IP range with a known set of user-agents and domain names.</li>
<li><strong>Unverified</strong> — Agent recognized through user-agent matching, behavioral fingerprinting, and IP reputation, but identity not cryptographically confirmed.</li>
</ul>
<p>Once you have reviewed your traffic patterns, return to <strong>Protection packs (web ACLs)</strong>, select your protection pack from the list, and choose <strong>Configure AI monetization</strong> from the right panel to set pricing and access policies. Each protection pack defines the pricing, agent policies, accepted payment methods, and license terms that apply to a defined set of content paths. You can create multiple protection packs and apply different pricing to different content zones within the same distribution. Once created, associate the protection pack with your web ACL by opening the web ACL and choosing <strong>Add protection pack</strong>.</p>
<p>For each agent verification tier within the pack, you can assign one of six actions: <strong>Monetize</strong> (return a 402 with pricing), <strong>Allow</strong> (grant free access), <strong>Block</strong> (deny access entirely), <strong>Count</strong> (log without charging), <strong>CAPTCHA</strong> (present a puzzle to verify a human sender), or <strong>Challenge</strong> (run a silent check to verify the client is a browser, not a bot).</p>
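<p>As a rough mental model, the tier-to-action assignment can be pictured as a lookup table. The sketch below is illustrative only: the six action names and two tiers come from the description above, while the <code>TIER_POLICY</code> mapping, the <code>action_for</code> helper, and the fallback to Count are hypothetical and are not part of any AWS API.</p>

```python
from enum import Enum


class Action(Enum):
    """The six actions named in the post."""
    MONETIZE = "Monetize"    # return a 402 with pricing
    ALLOW = "Allow"          # grant free access
    BLOCK = "Block"          # deny access entirely
    COUNT = "Count"          # log without charging
    CAPTCHA = "CAPTCHA"      # present a puzzle
    CHALLENGE = "Challenge"  # run a silent browser check


# Hypothetical policy for illustration; the real configuration lives in the console.
TIER_POLICY = {
    "verified": Action.MONETIZE,
    "unverified": Action.BLOCK,
}


def action_for(tier: str) -> Action:
    """Look up the action for a tier; unknown tiers fall back to Count (an assumption)."""
    return TIER_POLICY.get(tier.lower(), Action.COUNT)


if __name__ == "__main__":
    for tier in ("Verified", "Unverified", "Something else"):
        print(tier, "->", action_for(tier).value)
```

<p>Falling back to Count for anything unrecognized is only one possible design choice; it keeps unknown traffic visible in logs without charging or blocking it.</p>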
<p><img decoding="async" loading="lazy" class="alignnone wp-image-104528 size-full" src="http://13.127.31.42/wp-content/uploads/2026/06/1213987938308918-3a.png" alt="" width="1279" height="1531"></p>
<p>In the <strong>Edit monetization configuration</strong> page, configure the following:</p>
<p>Under <strong>Payment settlement</strong>, select one or more blockchain networks for stablecoin payments. Any wallet address on the supported networks is accepted, whether self-managed or hosted by a wallet provider such as Coinbase. For each network, provide your wallet address and set a <strong>Base price per page</strong> in USDC. You can add multiple networks using <strong>Add network</strong>. AWS does not process payments or take a fee on content revenue; disbursement is self-managed or managed by your wallet provider.</p>
<p>When a <strong>Monetize</strong> rule matches an incoming request, AWS WAF returns an HTTP 402 Payment Required response. The response body contains a machine-readable price manifest in JSON format using the x402 open protocol for machine-to-machine payments. The manifest includes the content price in USDC, accepted blockchain networks such as Base and Solana, the destination wallet address, the maximum payment timeout, and the payment scheme.</p>
<p>Any x402-compatible agent runtime can complete this flow autonomously. The client submits a signed payment authorization on their payment network of choice. AWS WAF verifies it, fetches the content, integrates with third-party facilitator services for settling the payment on-chain, and serves the response.</p>
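<p>To make the client side of this flow more concrete, here is a minimal sketch of how an agent might pick a payment option from a 402 response. The field names (<code>accepts</code>, <code>network</code>, <code>maxAmountRequired</code>, <code>payTo</code>, <code>maxTimeoutSeconds</code>) are modeled on the x402 protocol and are assumptions here; the manifest AWS WAF actually returns may differ. The wallet addresses are placeholders, <code>choose_offer</code> is a hypothetical helper, and signing and submitting the payment are out of scope.</p>

```python
import json
from decimal import Decimal

# Sample 402 body. Field names are assumptions modeled on x402, not confirmed
# AWS WAF output; wallet addresses are placeholders.
SAMPLE_MANIFEST = json.dumps({
    "x402Version": 1,
    "accepts": [
        {"scheme": "exact", "network": "base", "maxAmountRequired": "2000",
         "payTo": "0xPublisherWalletPlaceholder", "maxTimeoutSeconds": 60},
        {"scheme": "exact", "network": "solana", "maxAmountRequired": "2500",
         "payTo": "PublisherWalletPlaceholder", "maxTimeoutSeconds": 60},
    ],
})

USDC_DECIMALS = 6  # USDC amounts are expressed in units of 10^-6 USDC


def choose_offer(status, body, networks, budget_usdc):
    """Return the cheapest offer on a supported network within budget, or None."""
    if status != 402:
        return None  # no payment was requested
    candidates = []
    for offer in json.loads(body).get("accepts", []):
        price = Decimal(offer["maxAmountRequired"]) / (10 ** USDC_DECIMALS)
        if offer["network"] in networks and budget_usdc >= price:
            candidates.append((price, offer))
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    offer = choose_offer(402, SAMPLE_MANIFEST, {"base", "solana"}, Decimal("0.01"))
    print(offer["network"], offer["payTo"])
```

<p>A real agent runtime would then sign a payment authorization for the chosen offer and retry the request; that step depends on the wallet library in use and is omitted from this sketch.</p>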
<p>Note that the <strong>Monetize</strong> action is supported exclusively for web ACLs associated with Amazon CloudFront distributions. Adding a <strong>Monetize</strong> rule to a regional web ACL is not supported.</p>
<p>The <strong>Currency mode</strong> toggle sits directly on the monetization configuration page, so you can switch between <strong>Real</strong> and <strong>Test</strong> mode at any time. Before going live, use test mode on non-production traffic to validate pricing, wallet configuration, and x402 payment flows. Note that test mode still enforces x402 payments, but those payments can be made on testnets such as Base Sepolia or Solana Devnet using test funds obtained from faucets such as faucet.circle.com. To activate test mode, toggle <strong>Currency mode</strong> to <strong>Test</strong> in your protection pack configuration. AWS WAF returns real price manifests and runs the full payment flow on the configured test chain exactly as it would in production. All events are logged with <code>CurrencyMode: TEST</code>. When you are satisfied with the configuration, toggle <strong>Currency mode</strong> back to <strong>Real</strong> to begin processing real payments.</p>
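<p>While validating in test mode, one way to confirm that no real-money events slipped through is to tally log events by currency mode. This is a minimal sketch under stated assumptions: only the <code>CurrencyMode: TEST</code> marker comes from the text above; the JSON-lines layout, the other field names, and the <code>REAL</code> value are hypothetical.</p>

```python
import json
from collections import Counter

# Hypothetical log lines. Only the CurrencyMode field with the value TEST is
# taken from the post; every other field and value is illustrative.
SAMPLE_LOG_LINES = [
    '{"uri": "/articles/a", "action": "Monetize", "CurrencyMode": "TEST"}',
    '{"uri": "/articles/b", "action": "Monetize", "CurrencyMode": "TEST"}',
    '{"uri": "/articles/c", "action": "Monetize", "CurrencyMode": "REAL"}',
]


def count_by_currency_mode(lines):
    """Tally events per CurrencyMode value; events without the field count as UNKNOWN."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        counts[event.get("CurrencyMode", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    for mode, total in count_by_currency_mode(SAMPLE_LOG_LINES).items():
        print(mode, total)
```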
<p>Once you have switched <strong>Currency mode</strong> to <strong>Real</strong>, navigate to <strong>AI access monetization</strong> in the left navigation pane to track monetization outcomes in real time. Note that the <strong>AI access monetization</strong> dashboard only reflects activity from real currency mode and does not display test transactions.</p>
<p><img decoding="async" loading="lazy" class="alignnone wp-image-104468 size-full" src="http://13.127.31.42/wp-content/uploads/2026/06/1213987938308918-2b.png" alt="" width="1924" height="1906"></p>
<p>The Revenue dashboard shows <strong>Total revenue</strong>, revenue broken down by <strong>Verified bots</strong> and <strong>Unverified bots</strong>, and <strong>Avg. per request</strong>. The <strong>Top revenue sources</strong> panel groups earnings by bot category, and the AI access patterns panel ranks content paths by revenue generated. Use the <strong>Settlements</strong> tab to reconcile payments by provider and review payment method distribution and failed payment attempts.</p>
<p><span style="text-decoration: underline"><strong>Now Available</strong></span><br /> AI traffic monetization is available now for Amazon CloudFront customers at no additional charge beyond standard AWS WAF pricing. The capability is available in all edge locations where AWS WAF web ACLs are associated with Amazon CloudFront distributions.</p>
<p>To learn more about AI traffic monetization, see the <a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-ai-traffic-monetization.html">AWS WAF Developer Guide</a>.</p>
<p><a href="https://www.linkedin.com/in/esrakayabali/">— Esra</a></div><p>The post <a href="https://www.azalio.io/aws-waf-adds-ai-traffic-monetization-capability-to-help-content-owners-charge-ai-bots-for-content-access/">AWS WAF adds AI traffic monetization capability to help content owners charge AI bots for content access</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AWS Weekly Roundup: AWS FinOps Agent in preview, Gemma 4 on Bedrock, Kiro Pro Max, and more (June 15, 2026)</title>
		<link>https://www.azalio.io/aws-weekly-roundup-aws-finops-agent-in-preview-gemma-4-on-bedrock-kiro-pro-max-and-more-june-15-2026/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Mon, 15 Jun 2026 11:59:50 +0000</pubDate>
				<category><![CDATA[AWS]]></category>
		<guid isPermaLink="false">https://www.azalio.io/aws-weekly-roundup-aws-finops-agent-in-preview-gemma-4-on-bedrock-kiro-pro-max-and-more-june-15-2026/</guid>

					<description><![CDATA[<p>This week, New York City is hosting AWS Summit, bringing together builders, customers, and AWS teams for a full day of announcements, demos, and technical sessions at the Javits Center. I wrote blog posts for some of the Summit launches, so I am excited to see them go live this week. I just won’t be [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/aws-weekly-roundup-aws-finops-agent-in-preview-gemma-4-on-bedrock-kiro-pro-max-and-more-june-15-2026/">AWS Weekly Roundup: AWS FinOps Agent in preview, Gemma 4 on Bedrock, Kiro Pro Max, and more (June 15, 2026)</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<p>This week, New York City is hosting <a href="https://aws.amazon.com/events/summits/new-york/">AWS Summit</a>, bringing together builders, customers, and AWS teams for a full day of announcements, demos, and technical sessions at the Javits Center. I wrote blog posts for some of the Summit launches, so I am excited to see them go live this week. I just won’t be watching from the Javits Center. I’ll be at a four-day music festival, following the launches on my phone while trying to figure out how to put up a tent. If, like me, you can’t attend in person, the keynote <a href="https://pages.awscloud.com/aws-summit-nyc-livestream-2026-registration.html">livestream</a> is available on June 17, with Dr. Swami Sivasubramanian, VP of Agentic AI, and Chet Kapoor, VP of Security Services and Observability, covering new capabilities across developer tools, AI infrastructure, and security.</p>
<p><img decoding="async" loading="lazy" class="alignnone wp-image-104579 size-full" src="http://www.azalio.io/wp-content/uploads/2026/06/WIR-Why1.5d9838d88ff23b99b4fe14be6598b68ef4493215.png" alt="" width="800" height="533"></p>
<p>Here’s what happened this week.</p>
<p><span style="text-decoration: underline"><strong>Headlines</strong></span><br /><a href="https://aws.amazon.com/blogs/machine-learning/how-frontier-teams-are-reinventing-ai-native-development/">How frontier teams are reinventing AI-native development</a> — Swami published a detailed post this week drawing on data from experiments across hundreds of Amazon engineering teams. The findings are worth reading carefully if you are thinking about how to structure AI adoption on your own team.</p>
<p>A six-engineer team rebuilt the Amazon Bedrock inference engine in 76 days, a project originally scoped for 30 developers over 12 to 18 months. The median productivity gain across structured pilots with Amazon Stores teams was 4.5x in normalized deployment velocity, with some teams exceeding 10x. Perfect Order Experience went from a two-week feature cycle to shipping in an afternoon. WW Grocery cut design document creation from five days to a few hours.</p>
<p>The post distills these results into five practices for becoming a frontier team. First, invest in agent context: build steering files, coding standards, and structured repositories before writing production code. Second, expect an initial slowdown while workflows are restructured, and push through it. Third, maintain a steady backlog of well-scoped tasks so agents can run in parallel without constant supervision. Fourth, make intent explicit through structured specifications before code generation begins. Fifth, shift testing left so agents can self-correct before code reaches the pipeline.</p>
<p>The post closes with a note that commit velocity is only part of the picture, and that a follow-up will cover release management, operations, security operations, and EOL upgrades.</p>
<p><a href="https://aws.amazon.com/about-aws/whats-new/2026/06/aws-finops-agent-preview/">AWS FinOps Agent is now available in preview</a> — AWS FinOps Agent is a new agent for FinOps practitioners and engineering teams that answers cost questions, surfaces optimization opportunities, investigates cost anomalies, and runs recurring FinOps workflows on a defined schedule. You can use it to query your AWS costs, generate cost reports for finance and engineering teams, and surface rightsizing, idle resource, and Savings Plans recommendations from AWS Cost Optimization Hub and AWS Compute Optimizer. The agent can open Jira tickets on your behalf based on those recommendations. When a cost anomaly is detected, FinOps Agent can automatically investigate the root cause and post findings to a Slack channel.</p>
<p><span style="text-decoration: underline"><strong>Last week’s launches</strong></span><br /> I’ll start with one I wrote this week, then cover the other launches that caught my attention:</p>
<ul>
<li><a href="https://aws.amazon.com/blogs/aws/now-available-amazon-ec2-m9g-and-m9gd-instances-powered-by-new-aws-graviton5-processors/">Amazon EC2 M9g and M9gd instances are now generally available</a> — Powered by AWS Graviton5 processors and built on the sixth-generation AWS Nitro System, M9g instances deliver up to 25% better compute performance compared to Graviton4-based instances, with up to 35% faster performance for web applications, up to 35% for machine learning inference, and up to 30% for databases. Graviton5 is the first processor in the AWS fleet to support PCIe Gen6 and DDR5-8800 memory, and includes a 5x larger L3 cache compared to the previous generation. M9g and M9gd instances offer up to 15% higher network bandwidth and 20% higher Amazon EBS bandwidth on average across sizes compared to M8g. This release also introduces the Nitro Isolation Engine, an enhancement to the Nitro System that uses formal verification to provide mathematically proven isolation between virtual machines — establishing Nitro as the first formally verified cloud hypervisor. M9gd instances add up to 11.4 TB of NVMe SSD local storage with 30% higher IOPS compared to M8gd. Both instance types support Instance Bandwidth Configuration (IBC) for adjusting bandwidth allocation between EBS and VPC networking by up to 25%.</li>
<li><a href="https://aws.amazon.com/blogs/aws/anthropic-claude-fable-5-on-aws-mythos-class-capabilities-with-built-in-safeguards-now-available/">Anthropic Claude Fable 5 on Amazon Bedrock</a> — Claude Fable 5 launched on Amazon Bedrock on June 9, bringing extended asynchronous task execution, advanced vision capabilities across diagrams, charts, and PDFs, and proactive self-verification. Access requires opting into data sharing via the Data Retention API before invoking the model; Anthropic requires 30-day retention of inputs and outputs for Mythos-class models. <strong>Important note on availability:</strong> On June 12, Anthropic asked AWS to revoke access to Claude Fable 5 and Claude Mythos 5 for all users to support compliance with a US Government export control directive. All other models, including Opus 4.8, are unaffected. Read the <a href="https://www.anthropic.com/news/fable-mythos-access">Anthropic statement</a> for details. AWS will share further updates as they become available.</li>
<li><a href="https://aws.amazon.com/about-aws/whats-new/2026/06/gemma-4-amazon-bedrock/">Gemma 4 models are now available on Amazon Bedrock</a> — The Gemma 4 family from Google DeepMind is now available on Amazon Bedrock across three variants: Gemma 4 31B (dense, 256K-token context window, suited for reasoning and coding workloads), Gemma 4 26B-A4B (mixture-of-experts architecture, targeting cost- and latency-sensitive workloads), and Gemma 4 E2B (smallest variant, designed for low-latency interactive use cases). All three support native function calling, structured output, reasoning, response streaming, multimodal input across text, image, video, and audio, and more than 35 languages.</li>
<li><a href="https://aws.amazon.com/about-aws/whats-new/2026/06/opensearch-agentic-observability-mcp-app/">Amazon OpenSearch Service launches MCP Apps for agentic observability</a> — Amazon OpenSearch Service now supports MCP Apps, enabling observability workflows inside compatible agentic IDEs including Claude Desktop and VS Code. An AI agent in your local environment can investigate incidents using logs, traces, metrics, and alerts stored in OpenSearch domains, collections, and Amazon Managed Service for Prometheus. Each MCP App tool call returns a dual response: a text summary for the agent to reason over and an interactive visualization rendered in the same conversation thread. Available MCP App tools cover log, metrics, and trace investigation; service performance; topology; dynamic visualizations; agent health; cluster health; and instrumentation scoring.</li>
</ul>
<p><span style="text-decoration: underline"><strong>Other AWS news</strong></span><br /> Here are some additional posts and updates you may find useful:</p>
<ul>
<li><a href="https://aws.amazon.com/blogs/developer/aws-cli-v1-maintenance-mode-announcing-changes-to-dependency-updates/">AWS CLI v1 enters maintenance mode</a> — When CLI v1 enters maintenance mode, the botocore and s3transfer dependencies will be vendored directly into the CLI v1 codebase rather than installed as separate packages. This means upgrading CLI v1 will no longer update the standalone botocore or s3transfer packages, and installing those packages independently will have no effect on the versions used by CLI v1. Environments with both CLI v1 and boto3 installed will contain separate copies of these libraries. New CLI v1 releases will be limited to critical bug fixes and security issues. The recommended path is to migrate to AWS CLI v2.</li>
<li><a href="https://aws.amazon.com/about-aws/whats-new/2026/06/aws-workload-credentials-provider/">AWS Workload Credentials Provider is now available</a> — AWS has launched a new Workload Credentials Provider that enables workloads to obtain short-term AWS credentials without requiring long-term access keys. This supports credential management for applications running outside of AWS, giving teams a way to follow least-privilege access patterns for workloads in third-party or on-premises environments.</li>
<li><a href="https://kiro.dev/blog/kiro-pro-max/">Kiro Pro Max is now available</a> — Kiro has introduced a new Pro Max tier, adding higher usage limits, access to the latest frontier models, and additional agentic capabilities for development teams. Kiro Pro Max is designed for professional developers who need sustained, high-volume use across coding, specification generation, and agent-driven tasks.</li>
</ul>
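<p>For the CLI v1 change described in the list above, it can help to see which standalone packages an environment actually has installed. The following is a minimal sketch using the Python standard library; <code>installed_versions</code> is a hypothetical helper name. Keep in mind that once botocore and s3transfer are vendored into CLI v1, the standalone versions reported here would no longer tell you what CLI v1 itself uses.</p>

```python
from importlib import metadata


def installed_versions(packages=("awscli", "botocore", "s3transfer", "boto3")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # distribution is not installed here
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```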
<p><span style="text-decoration: underline"><strong>Upcoming AWS events</strong></span><br /> Check your calendar and sign up for upcoming AWS events:</p>
<ul>
<li><a href="https://aws.amazon.com/events/summits/">AWS Summits</a> — AWS Summits are free in-person events covering cloud and AI. Coming up: <a href="https://aws.amazon.com/events/summits/new-york/">New York City</a> (June 17), <a href="https://aws.amazon.com/events/summits/hongkong/">Hong Kong</a> (June 17), <a href="https://aws.amazon.com/events/summits/shanghai/">Shanghai</a> (June 23-24), <a href="https://aws.amazon.com/jp/events/summits/japan/">Japan</a> (June 25), <a href="https://aws.amazon.com/events/summits/washington-dc/">Washington, D.C.</a> (June 30 – July 1), <a href="https://aws.amazon.com/tw/events/summits/taipei/">Taipei</a> (July 15), and <a href="https://aws.amazon.com/es/events/summits/bogota/">Bogotá</a> (July 30).</li>
<li><a href="https://aws.amazon.com/events/community-day/">AWS Community Days</a> — Community-led conferences planned and delivered by community leaders. Upcoming events include <a href="https://awscommunitydayeast.ca/">Montreal, Canada</a> (June 20), <a href="https://www.midwestcommunityday.com/">Indianapolis, USA</a> (June 24), <a href="https://2026-summer.awscommunityday.cn/">Hangzhou, China</a> (June 28), <a href="https://acd.awsugblr.in/">Bengaluru, India</a> (July 11), and <a href="https://communityday.awscmr.com/en">Yaoundé, Cameroon</a> (July 25).</li>
</ul>
<p>Visit the <a href="https://builder.aws.com/?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">AWS Builder Center</a> to meet other builders, contribute solutions, and find resources that help you keep building. You can also browse upcoming <a href="https://aws.amazon.com/events/explore-aws-events/?refid=e61dee65-4ce8-4738-84db-75305c9cd4fe">AWS-led in-person and virtual events</a>, plus <a href="https://builder.aws.com/connect/events?trk=e61dee65-4ce8-4738-84db-75305c9cd4fe&amp;sc_channel=el">developer-focused sessions</a>.</p>
<p><a href="https://www.linkedin.com/in/esrakayabali/">— Esra</a> </p>
<p><em>This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!</em></p>
</div><p>The post <a href="https://www.azalio.io/aws-weekly-roundup-aws-finops-agent-in-preview-gemma-4-on-bedrock-kiro-pro-max-and-more-june-15-2026/">AWS Weekly Roundup: AWS FinOps Agent in preview, Gemma 4 on Bedrock, Kiro Pro Max, and more (June 15, 2026)</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI needs young developers – and old developers</title>
		<link>https://www.azalio.io/ai-needs-young-developers-and-old-developers/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Mon, 15 Jun 2026 09:59:42 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/ai-needs-young-developers-and-old-developers/</guid>

					<description><![CDATA[<p>Enterprises are increasingly investing copious amounts of cash in AI without a lot to show for it. This could be, in part, because the wrong people are leading the change. As I’ve argued before⁠, AI isn’t likely to eliminate developers so much as change what we need from them. For example, we keep asking whether [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/ai-needs-young-developers-and-old-developers/">AI needs young developers – and old developers</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<section class="wp-block-bigbite-multi-title">
<div class="container"></div>
</section>
<p>Enterprises are increasingly investing copious amounts of cash in AI without a lot to show for it. This could be, in part, because the wrong people are leading the change.</p>
<p>As I’ve <a href="https://www.infoworld.com/article/3955073/ai-will-require-more-software-developers-not-fewer.html">argued before</a>, AI isn’t likely to eliminate developers so much as change what we need from them. For example, we keep asking whether junior developers are needed in a world where <a href="https://www.infoworld.com/article/2335213/large-language-models-the-foundations-of-generative-ai.html">large language models</a> can write code faster and cheaper. What this overlooks is the reality that these younger developers and their relative inexperience may be exactly what we need to rewrite the rules of software development.</p>
<p>This thought hit me while reading <a href="https://www.linkedin.com/posts/jamesgovernor_years-ago-i-walked-out-of-a-tech-conference-activity-7470099784259936256-6Ism?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAAAFGQBusM9dhqroHv1eNSAFO6rYEe_n1M">James Governor’s riff</a> on something <a href="https://www.linkedin.com/feed/update/urn:li:activity:7469848292647075840/">Ben Griffiths wrote</a> about our industry’s habit of confusing age with authority. Griffiths remembered sitting through a conference talk in which a speaker tried to shame a young audience for not recognizing some of the older men who had shaped computing. The irony, Griffiths noted, was that many of those “old men” had done their world-changing work when they were younger than the people being lectured. Bill Joy wrote vi when he was 22, John Carmack created Doom at 23, and Linus Torvalds launched Linux at 21, to name a few. Many of our industry’s titans made their biggest contributions before they had decades of experience.</p>
<p>The point isn’t that young people are smarter. They’re not. The point isn’t that the key to AI success is to ignore more experienced developers. That’s dumb. Rather, it’s a suggestion that Griffiths’ larger point is right: At the beginning of big shifts, experience can be a mixed blessing. It can help you see risk, but it can also make you overconfident in old ways. The most successful enterprises will find ways to balance youthful innovation with experienced guardrails.</p>
<h2 class="wp-block-heading"><a></a>The factory doesn’t redesign itself</h2>
<p>Zara Zhang recently pointed to Paul David’s classic 1990 paper, <a href="https://www.almendron.com/tribuna/wp-content/uploads/2018/03/the-dynamo-and-the-computer-an-historical-perspective-on-the-modern-productivity-paradox.pdf">“The Dynamo and the Computer,”</a> as a way to understand why so many companies have “adopted” AI without much to show for it. David’s argument, simplified, is that electricity didn’t immediately transform factories. For a long time, factories simply swapped out the central steam engine for an electric motor while keeping the same layout, the same workflows, and the same assumptions.</p>
<p>Electricity was new, but we largely stifled its potential by force-fitting it into old factory systems.</p>
<p>The big productivity gains came later, when factories stopped treating electricity as a cleaner steam engine and started redesigning work around smaller motors distributed throughout the factory. Once each machine could have its own motor, the factory no longer had to organize itself around a single driveshaft. Work could instead be reorganized around the flow of production.</p>
<p>That’s a decent description of where many enterprises are with AI. Enterprises today are buying copilot licenses by the thousands, wiring agents into existing applications, etc., and then wondering why the <a href="https://www.infoworld.com/article/4151572/the-starkly-uneven-reality-of-enterprise-ai-adoption.html">results are so uneven</a>, as I’ve written. This is the equivalent of swapping the steam engine for an electric one and declaring that the AI modernization work is done. It’s not. Not even close.</p>
<p>The real payoff won’t come from asking AI to write the same tickets a bit faster. It will instead come from changing how teams define work and how (and what) developers build. The “factory” has to change.</p>
<p>So here’s the uncomfortable question: Who is most likely to build the new factory?</p>
<h2 class="wp-block-heading"><a></a>Experience cuts both ways</h2>
<p>There’s an obvious danger in romanticizing youth. Plenty of bad software has been written by people with unlimited confidence and limited context. Enterprises need software that works, yes, but “works” also means it complies, scales, respects security boundaries, and more.</p>
<p>This is where experienced developers matter. A lot.</p>
<p><a href="https://www.infoworld.com/article/4176534/ai-coding-agents-need-good-software-engineers.html">As I pointed out recently</a>, the agent era makes engineering judgment more important than ever. After all, AI makes it easier to generate code, but easier code generation can become easier technical debt generation. Hence, the limiting factor becomes less of “Can we create something?” and more of “Can we create the right thing, in the right place, with the right constraints?” Taste is required, in other words.</p>
<p>Senior engineers are often better at seeing those constraints because their experience gives them “taste.” They know why the weird validation rule exists, and they remember the customer who depended on the undocumented behavior. They understand why a simple schema change can turn into a multi-week migration.</p>
<p>But experience also has a shadow side, because it can make the current process feel inevitable. A senior engineer may see an AI assistant as a faster autocomplete because that’s the easiest way to fit AI into their existing mental model. A junior developer, less invested in the old workflow, may ask the more interesting questions: Why are we doing this ticket at all? Why isn’t the spec executable? Why can’t the agent generate the test harness first?</p>
<p>It’s not that the more experienced developers don’t know these questions. Rather, they may simply not have the energy to rage against the machine, as it were.</p>
<h2 class="wp-block-heading"><a></a>The value of inexperience</h2>
<p>The worst way to use junior developers in the AI era is to treat them as cheaper versions of senior developers. That was always a bad idea, but AI makes it worse. If the job is “take this ticket, generate some code, and send it to a senior person for review,” the junior developer becomes a human wrapper around a coding assistant. That helps no one. The junior doesn’t learn much, the senior gets buried in review, and the enterprise ends up with more code, which, <a href="https://www.infoworld.com/article/4181971/making-sense-of-too-much-code.html">as I’ve said</a>, is hardly a good thing.</p>
<p>Instead, junior developers should be given room to explore new workflows, with just enough oversight from experienced colleagues. That might mean giving these newer developers interesting questions to answer, such as:</p>
<ul class="wp-block-list">
<li>How would we redesign onboarding if every internal API had an AI-readable contract and examples that actually worked?</li>
<li>How would we change code review if the agent produced a change summary, test evidence, dependency risk, and rollback plan with every pull request?</li>
<li>How would we build features if product requirements were written as executable acceptance tests rather than vague prose?</li>
<li>How would we reduce toil if agents could safely perform routine migrations, dependency updates, or incident triage within clearly defined boundaries?</li>
</ul>
<p>These are not toy problems. They’re not “junior work.” They’re exactly the sort of process redesign that enterprises need but generally avoid because everyone is too busy running on the existing hamster wheel.</p>
<h2 class="wp-block-heading"><a></a>Finding the balance</h2>
<p>So what should engineering leaders do? First, stop treating AI adoption as an individual productivity contest. We seem to be moving quickly away from the idea that “lots of tokens” equals “great engineer,” but the fact that we even flirted with it is damning. I love how <a href="https://x.com/svpino/status/2064326898034118785?s=20">Santiago Valdarrama eviscerates this vanity metric</a>: “Measuring AI productivity in number of lines written is a stupid mistake. One day, everyone will have always been against this.” Instead we should be asking questions like, “What part of our software delivery process no longer makes sense?” AI’s biggest gains will come when we change how we specify, test, review, and ship software.</p>
<p>Second, mix up your AI workflow teams. No, not committees or PowerPoint-producing centers of excellence. I’m talking about combining two or three newer developers who are already fluent in AI-native tools with two or three senior engineers who understand production, security, architecture, and organizational constraints. Then give them a real workflow to redesign, such as dependency upgrades or test creation.</p>
<p>Third, make the senior engineer’s job less about saying no and more about defining the guardrails within which others can say yes. <a href="https://www.infoworld.com/article/4118288/ai-coding-requires-developers-to-become-better-managers.html">I’ve argued that golden paths are key</a> to using AI effectively. Good senior engineers should define the paved roads: approved patterns, test requirements, observability standards, etc. Then let junior developers and <a href="https://www.infoworld.com/article/3812583/what-you-need-to-know-about-developing-ai-agents.html">agents</a> move quickly inside those boundaries.</p>
<p>Fourth, reward deletion. This may be the most important point. Going back to the factory electricity metaphor, we’ll fail with AI modernization if we simply add AI without removing outdated processes.</p>
<h2 class="wp-block-heading"><a></a>Bring everyone to the table</h2>
<p>The future of software development won’t belong to the young. It won’t belong to the old, either. It will belong to teams that combine the talents of both.</p>
<p>Newer developers often bring impatience. They’re less likely to accept the existing workflow as sacred. They’re more likely to try weird tools, compose them in unexpected ways, and wonder why enterprise software development feels like a ritualized exercise in waiting for permission.</p>
<p>Experienced developers bring judgment. They know that software has users, auditors, attackers, budgets, latency, history, and consequences. They know that the right answer is often boring, and boring is good.</p>
<p>Enterprises need both. They need the developer who asks why the factory is still organized around the old driveshaft, and they need the developer who knows which machines will kill someone if moved casually. In sum, every development team needs people who know why the old system exists… as well as those who don’t.</p>
</div>
</div>
</div>
</div>
</div><p>The post <a href="https://www.azalio.io/ai-needs-young-developers-and-old-developers/">AI needs young developers – and old developers</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>33 LLM metrics to watch closely</title>
		<link>https://www.azalio.io/33-llm-metrics-to-watch-closely/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Mon, 15 Jun 2026 09:59:42 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/33-llm-metrics-to-watch-closely/</guid>

					<description><![CDATA[<p>We’ve all heard the mantra from the quants in the business community: you can’t manage what you can’t measure. And if that’s true for human intelligence, it should be true for the artificial kind too. How do we measure agents and large language models (LLMs)? We’re just beginning to come up with statistical metrics. Here are [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/33-llm-metrics-to-watch-closely/">33 LLM metrics to watch closely</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<section class="wp-block-bigbite-multi-title">
<div class="container"></div>
</section>
<p>We’ve all heard the mantra from the quants in the business community: you can’t manage what you can’t measure. And if that’s true for human intelligence, it should be true for the <a href="https://www.infoworld.com/article/4061121/a-brief-history-of-ai.html" data-type="link" data-id="https://www.infoworld.com/article/4061121/a-brief-history-of-ai.html">artificial kind</a> too.</p>
<p>How do we measure <a href="https://www.infoworld.com/article/3812583/what-you-need-to-know-about-developing-ai-agents.html" data-type="link" data-id="https://www.infoworld.com/article/3812583/what-you-need-to-know-about-developing-ai-agents.html">agents</a> and <a href="https://www.infoworld.com/article/2335213/large-language-models-the-foundations-of-generative-ai.html" data-type="link" data-id="https://www.infoworld.com/article/2335213/large-language-models-the-foundations-of-generative-ai.html">large language models</a> (LLMs)? We’re just beginning to come up with statistical metrics. Here are several of the most common metrics that designers and users toss about when they’re evaluating a model.</p>
<h6 class="wp-block-heading">[ See also: <a href="https://www.infoworld.com/article/4152738/27-questions-to-ask-before-choosing-an-llm.html" data-type="link" data-id="https://www.infoworld.com/article/4152738/27-questions-to-ask-before-choosing-an-llm.html">27 questions to ask before choosing an LLM</a> ]</h6>
<h2 class="wp-block-heading" id="time-to-first-token">Time to first token</h2>
<p>How long does it take to generate the first token? For real-time applications with time constraints, faster responses can be essential. It’s well known that people hate waiting, even for a few milliseconds. The teams that develop user interfaces learned decades ago that it’s important for the software to respond quickly when a human is waiting for an answer. Even a few seconds of delay can mean that the human will wander off to another window to check some email or place some bet on a prediction market. Time to first token is a good measure for models that will be working directly with fickle human intelligences and their latent attention deficit disorder.</p>
<h2 class="wp-block-heading" id="time-per-output-token">Time per output token</h2>
<p>Take the total time it takes to respond and divide by the total number of tokens. The time to first token measures how long it takes to start a response, while this measures the average speed as the model works through all of the tokens. In basic LLMs, this value is generally fairly constant. Once the prefill is done and the LLM enters the decode phase, the output tokens usually appear in a steady stream. When the output is long enough, the startup time to first token is amortized away. In some of the more complicated architectures with loops for planning or gathering data from various tools, the average speed can vary as the model shifts in and out of making agentic decisions.</p>
<h2 class="wp-block-heading" id="tokens-per-second">Tokens per second</h2>
<p>This is just the reciprocal of the average time per token. Sometimes it is reported separately for different stages in the pipeline.</p>
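<p>As a rough illustration, the three timing metrics above can be computed from a list of token arrival timestamps. This is a minimal Python sketch; the function name <code>latency_metrics</code>, its inputs, and the sample timestamps are hypothetical and not tied to any particular serving stack.</p>

```python
def latency_metrics(request_time, token_times):
    """Compute timing metrics from timestamps measured in seconds.

    request_time: when the prompt was sent (hypothetical input).
    token_times: arrival time of each output token, in order.
    """
    if not token_times:
        raise ValueError("need at least one output token")
    total = token_times[-1] - request_time
    count = len(token_times)
    return {
        "time_to_first_token": token_times[0] - request_time,
        # Total response time divided by the number of tokens.
        "time_per_output_token": total / count,
        # The reciprocal of the value above.
        "tokens_per_second": count / total if total > 0 else float("inf"),
    }


# Made-up example: four tokens arriving half a second apart.
metrics = latency_metrics(0.0, [0.5, 1.0, 1.5, 2.0])
print(metrics)
```

<p>Because the average includes the startup delay, longer outputs pull the per-token figure closer to the steady decode speed.</p>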
<h2 class="wp-block-heading" id="throughput-requests-per-minute">Throughput (requests per minute)</h2>
<p>If a system supports more than a single user, tracking the number of different requests makes sense. These throughput numbers can be quite useful for measuring the power of some of the newer pipelines that are more efficient when they’re answering multiple prompts at the same time.</p>
<h2 class="wp-block-heading" id="error-rate">Error rate</h2>
<p>Not every request gets an answer. The error rate tracks how often rate limits, timeouts, or model “refusals” get in the way. Better accounting tracks each independently because the number of failures in each category can be very different.</p>
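<p>Tracking each failure category independently might look like the following Python sketch. The outcome labels (<code>"ok"</code>, <code>"rate_limit"</code>, <code>"timeout"</code>, <code>"refusal"</code>) are assumptions chosen for illustration; a real log would use whatever labels the provider returns.</p>

```python
from collections import Counter


def error_rates(outcomes):
    """Return the share of requests that failed, split by failure category."""
    if not outcomes:
        raise ValueError("need at least one request outcome")
    counts = Counter(outcomes)
    total = len(outcomes)
    # Everything that is not labeled "ok" counts as its own error category.
    return {kind: n / total for kind, n in counts.items() if kind != "ok"}


# Hypothetical log of ten requests.
log = ["ok"] * 6 + ["rate_limit"] * 2 + ["timeout", "refusal"]
rates = error_rates(log)
print(rates)
```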
<h2 class="wp-block-heading" id="token-efficiency">Token efficiency</h2>
<p>Not all of the tokens a model works through are visible, and not all of them are part of the final outcome. Token efficiency measures how much work is done to produce the final result. As models become more complex or agentic and the pipelines become more sophisticated, the efficiency tends to drop. Agentic reasoning and strategic planning typically require more tokens that don’t appear in the final answer. This is generally a measure of how expensive a model might be to run.</p>
<h2 class="wp-block-heading" id="tail-latency">Tail latency</h2>
<p>It’s all well and good to measure the average time to answer, but in some cases a few very slow responses can really color people’s judgement. Some applications require good performance all of the time. Would you want to ride in an autonomous car that gets steering instructions very quickly “on average” instead of always? What if that’s only 99% of the time? Tail latency uses a mixture of queuing theory and detailed measurements to track the worst moments in the long tail of the latency graph. It’s useful when even occasional delays are problematic. </p>
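<p>One common way to report the tail is with high percentiles such as p99. The Python sketch below uses the nearest-rank definition of a percentile on made-up latency numbers; it shows how a healthy-looking average can hide a very slow worst case.</p>

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in seconds: mostly fast, with two slower outliers.
latencies = [0.2] * 98 + [0.3, 5.0]
mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
worst = percentile(latencies, 100)
print(round(mean, 3), p50, p99, worst)
```

<p>In this sample, even p99 misses the single five-second response, which is why applications that can’t tolerate occasional delays often watch the maximum as well.</p>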
<h2 class="wp-block-heading" id="total-cost-of-ownership">Total cost of ownership</h2>
<p>Projects that use an API or buy output from providers just look at the cost per 1M tokens. They’re effectively renters. The groups that are buying GPUs and paying for electricity, though, will add up these costs and other indirect costs like depreciation and maintenance to come up with a number that estimates how much the tokens really cost to produce. This value will depend upon demand and utilization rates—that is, on how many users are sending in prompts and how efficiently the model fits in a particular GPU and its RAM.</p>
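<p>A back-of-the-envelope version of this calculation can be sketched in Python. Every number and parameter name below is a hypothetical placeholder, and the sketch covers only hardware depreciation and electricity; maintenance, staffing, and other indirect costs would need to be added.</p>

```python
def cost_per_million_tokens(gpu_price, lifetime_hours, power_kw,
                            price_per_kwh, tokens_per_second, utilization):
    """Estimate the cost of producing one million tokens on owned hardware.

    utilization is the fraction of time (0 to 1) the GPU spends serving prompts.
    """
    if utilization == 0 or tokens_per_second == 0:
        raise ValueError("an idle system produces no tokens")
    # Straight-line depreciation plus electricity, per hour of ownership.
    hourly_cost = gpu_price / lifetime_hours + power_kw * price_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


# Made-up figures: a 30,000 dollar GPU written off over 30,000 hours.
busy = cost_per_million_tokens(30_000, 30_000, 1.0, 0.10, 1000, 0.5)
quiet = cost_per_million_tokens(30_000, 30_000, 1.0, 0.10, 1000, 0.25)
print(round(busy, 2), round(quiet, 2))
```

<p>Halving the utilization doubles the cost per token, which is why demand matters as much as the hardware price.</p>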
<h2 class="wp-block-heading" id="parameters">Parameters</h2>
<p>Many models have numbers in their name followed by a B. This is meant to roughly capture the number of parameters, or the number of variables the model uses to generate outputs from inputs. The number “70B” means that there are about 70 billion parameters in the model. This is a good estimate for the complexity of the model and the size of the training set that has been stuffed into it. Generally bigger numbers mean a larger amount of information is hiding inside the model. It often means that it will take a bigger GPU with more RAM to generate an answer with it. It’s not a very precise number, though, because there are many other areas of the architecture that can influence whether the model can generate the answer you want inside your budget. There continue to be advances and it’s not uncommon for someone to claim that a new model with X parameters is better than an old model with 2X or 3X parameters.</p>
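<p>The link between parameter count and GPU memory can be approximated with simple arithmetic: multiply the number of parameters by the bytes used to store each one. The Python sketch below is a lower bound that covers the weights only; it ignores the key-value cache, activations, and runtime overhead, and the function name is hypothetical.</p>

```python
def weight_memory_gb(parameters, bits_per_parameter):
    """Approximate memory needed just to hold the model weights, in gigabytes."""
    return parameters * bits_per_parameter / 8 / 1e9


seventy_b = 70_000_000_000
full = weight_memory_gb(seventy_b, 16)      # 16-bit weights
quantized = weight_memory_gb(seventy_b, 4)  # 4-bit quantized weights
print(full, quantized)
```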
<h2 class="wp-block-heading" id="hallucination-rate">Hallucination rate</h2>
<p>While everyone wants LLMs to generate accurate output, measuring it can be difficult because deciding what’s accurate is sometimes complicated. One approach is to ask the LLM to summarize a document. Then another model evaluates how well the summary matches the original. While this may not catch all subtle slips, it will capture enough of the worst departures from reality. Some researchers have built complex test sets with curated answers. The LLMs that deliver the expected answers get the highest scores. Some common benchmarks are <a href="https://github.com/sylinrl/TruthfulQA">TruthfulQA</a>, <a href="https://arxiv.org/abs/2305.11747">HaluEval</a>, <a href="https://github.com/salesforce/QAFactEval">QAFactEval</a>, and Vectara’s <a href="https://github.com/vectara/hallucination-leaderboard">Hallucination Evaluation Model</a> (HHEM).</p>
<h2 class="wp-block-heading" id="toxicity-and-bias-scores">Toxicity and bias scores</h2>
<p>If measuring accuracy is difficult, building a metric to detect toxic or biased output is even more challenging because the definitions can be so protean. Still, some teams have built LLMs that key on particular concepts or word choices. They can detect some of the most obvious red flags that could generate political trouble. Some well-known versions include <a href="https://www.granica.ai/blog/granica-launches-ai-data-safety-solution-granica-screen-on-aws-marketplace">Granica Screen</a> and <a href="https://perspectiveapi.com/">Perspective API</a>.</p>
<h2 class="wp-block-heading" id="pii-leakage">PII leakage</h2>
<p>One of the biggest fears is that LLMs will somehow absorb information that may be considered personal and private. Some of the simplest measures are just regular expressions that look for the sixteen-digit numbers used for credit card transactions. Many of the model builders work on eliminating personally identifiable information (PII) from the training set before training begins.</p>
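<p>A regular expression of this kind might look like the Python sketch below. The pattern and function names are illustrative assumptions. The Luhn checksum is an addition not described above; it is a standard check on card numbers that filters out sixteen-digit strings that only look like card numbers.</p>

```python
import re

# Sixteen digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(digits):
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return sixteen-digit sequences in text that pass the Luhn check."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test card number, not a real account.
leaked = find_card_numbers("Card 4111 1111 1111 1111 on file")
clean = find_card_numbers("Order 1234 5678 9012 3456 shipped")
print(leaked, clean)
```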
<h2 class="wp-block-heading" id="tool-calling-accuracy">Tool-calling accuracy</h2>
<p>As models grow more complex and agentic, they often gain access to various tools or <a href="https://www.infoworld.com/article/4029634/what-is-model-context-protocol-how-mcp-bridges-ai-and-external-services.html" data-type="link" data-id="https://www.infoworld.com/article/4029634/what-is-model-context-protocol-how-mcp-bridges-ai-and-external-services.html">Model Context Protocol</a> (MCP) gateways that can help them find the best answers. Not all models take advantage of this help. The tool-calling accuracy scores count how often the models choose the best tool for the job. One particular example of this measurement is <a href="https://gorilla.cs.berkeley.edu/leaderboard.html">BFCL</a> (Berkeley Function Calling Leaderboard).</p>
<h2 class="wp-block-heading" id="prompt-sensitivity">Prompt sensitivity</h2>
<p>This value captures how much small changes in the language of the prompt induce the model to produce different results. It’s like a derivative from calculus class, although it’s generally computed experimentally using some collection of test prompts. There are a number of different approaches that depend upon different types of changes. Some test sets are built with small rephrasings of the request that are semantically the same. Others mix together different ways of specifying the problem, some with examples, say, and some without. Some specific examples include <a href="https://arxiv.org/html/2509.13680">PromptSE</a> and <a href="https://arxiv.org/abs/2410.12405">ProSA</a>.</p>
<h2 class="wp-block-heading" id="semantic-similarity-and-conciseness">Semantic similarity and conciseness</h2>
<p>Some metrics evaluate the answers by comparing them to a set of gold standard answers. This often involves feeding them to a <a href="https://www.infoworld.com/article/2335281/vector-databases-in-llms-and-search.html" data-type="link" data-id="https://www.infoworld.com/article/2335281/vector-databases-in-llms-and-search.html">vector embedding</a> model and searching a <a href="https://www.infoworld.com/article/2335814/what-is-retrieval-augmented-generation-more-accurate-and-reliable-llms.html" data-type="link" data-id="https://www.infoworld.com/article/2335814/what-is-retrieval-augmented-generation-more-accurate-and-reliable-llms.html">retrieval-augmented generation</a> (RAG) database for similar answers. This can track how concise or fluffy the answers might be as well as looking for how much variability might be introduced through changing parameters like the temperature. One common example is the <a href="https://bertscore.com/">BERTScore</a>.</p>
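<p>Once answers have been turned into embedding vectors, comparing them usually comes down to a measure such as cosine similarity. The Python sketch below uses tiny hand-written vectors as stand-ins; in practice the vectors would come from an embedding model, which is not shown here.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy two-dimensional "embeddings" for a gold answer and a candidate answer.
gold = [3.0, 4.0]
candidate = [4.0, 3.0]
score = cosine_similarity(gold, candidate)
print(round(score, 2))
```

<p>A score near 1 means the candidate points in nearly the same direction as the gold standard answer, while a score near 0 means the two are unrelated.</p>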
<h2 class="wp-block-heading" id="grounding-score">Grounding score</h2>
<p>Many systems that combine an LLM with a vector search tool for RAG measure the effectiveness of the combination with a benchmark like the grounding score. The LLM is presented with extra data from the vector search and the benchmark measures how closely it follows this extra information. That is, how much of the answer comes from the provided source documents and how much is synthesized using the data in its training set. Some examples include <a href="https://aclanthology.org/2024.eacl-demo.16/">RAGAS</a>, <a href="https://www.trulens.org/">TruLens</a>, <a href="https://ares-ai.vercel.app/">ARES</a> (Automated RAG Evaluation System), <a href="https://github.com/chen700564/RGB">RGB</a> (Retrieval-Augmented Generation Benchmark), <a href="https://arxiv.org/abs/2305.11747">HaluEval</a>, and <a href="https://halluhard.com/">HalluHard</a>. A similar concept is called “context adherence,” “context precision,” “context recall,” or “faithfulness.”</p>
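<p>To make the idea concrete, here is a deliberately crude Python proxy: the share of words in the answer that also appear in the provided source text. This is only an illustration of the concept. It is not how RAGAS, TruLens, or the other benchmarks named above compute their scores, and the function name is hypothetical.</p>

```python
import re


def grounding_score(answer, context):
    """Fraction of answer words that also occur in the source context."""
    answer_words = re.findall(r"[a-z0-9]+", answer.lower())
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    if not answer_words:
        return 0.0
    supported = sum(1 for word in answer_words if word in context_words)
    return supported / len(answer_words)


context = "The invoice was paid on Friday by the finance team."
faithful = grounding_score("The invoice was paid on Friday.", context)
drifted = grounding_score("The invoice was paid on Monday.", context)
print(faithful, round(drifted, 3))
```

<p>The second answer loses points because “Monday” has no support in the source document.</p>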
<h2 class="wp-block-heading" id="model-variability">Model variability</h2>
<p>Most LLMs fold in a certain amount of random entropy, and this amount is often controlled by a parameter called the “temperature.” The model variability is a measure of how much the answers will change between runs. Some applications like chatbots require a certain amount of variability because the randomness adds a bit of “life” to the answers. In other applications, such as those in mission-critical areas like law or medicine, answers that vary from run to run will undermine confidence.</p>
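<p>One simple way to put a number on this is to send the same prompt several times and count how often the most common answer appears. The Python sketch below assumes short answers that can be compared after trimming whitespace and lowercasing; longer answers would need a semantic comparison instead. The function name and sample answers are hypothetical.</p>

```python
from collections import Counter


def agreement_rate(answers):
    """Share of runs that returned the single most common answer."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answer.strip().lower() for answer in answers)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(answers)


# Hypothetical answers from four runs of the same prompt.
runs = ["Paris", "paris ", "Lyon", "Paris"]
rate = agreement_rate(runs)
print(rate)
```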
<h2 class="wp-block-heading" id="format-compliance-rate">Format compliance rate</h2>
<p>In some roles, LLMs are asked to produce data in strict formats like JSON or CSV. This is often important if the data will be fed into some pipeline for further processing or storage. The format compliance rate tests a number of common formats and measures how often the LLM returns syntactically valid data. Agentic systems that glue together multiple LLMs and other tools rely heavily on LLMs with good scores on this benchmark.</p>
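<p>For JSON, a basic compliance check can be as simple as trying to parse each response. The Python sketch below uses the standard library’s <code>json</code> module and counts how many of a set of hypothetical model outputs parse cleanly. It checks only that the text is valid JSON, not that it matches any expected schema.</p>

```python
import json


def json_compliance_rate(outputs):
    """Share of outputs that parse as JSON without any cleanup."""
    if not outputs:
        raise ValueError("need at least one output")
    valid = 0
    for text in outputs:
        try:
            json.loads(text)
            valid += 1
        except json.JSONDecodeError:
            pass  # chatty prefixes, single quotes, and the like all fail
    return valid / len(outputs)


# Hypothetical model responses to a "reply in JSON" prompt.
samples = ['{"a": 1}', "[1, 2]", "{'a': 1}", 'Sure! {"a": 1}']
rate = json_compliance_rate(samples)
print(rate)
```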
<h2 class="wp-block-heading" id="instruction-following">Instruction following</h2>
<p>Some prompts include very specific instructions and the adherence can be measured empirically. For example, some prompts will ask the LLM to produce exactly 300 words or a poem in rhyming couplets. These tests use a collection of sample prompts that ask for answers that can be easily measured. Some specific examples include <a href="https://arxiv.org/abs/2311.07911">IFEval</a>, <a href="https://github.com/YJiangcm/FollowBench">FollowBench</a>, and the <a href="https://gorilla.cs.berkeley.edu/leaderboard.html">BFCL</a> (Berkeley Function Calling Leaderboard), which is mentioned above in the section on tool-calling accuracy.</p>
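<p>An instruction such as “produce exactly N words” is easy to verify in code. The Python sketch below counts whitespace-separated words, which is one reasonable definition among several; the function names and sample responses are hypothetical, and real benchmarks check many more kinds of instructions.</p>

```python
def follows_word_count(text, target_words):
    """True when the response has exactly the requested number of words."""
    return len(text.split()) == target_words


def compliance_rate(responses, target_words):
    """Share of responses that meet the word-count instruction."""
    if not responses:
        raise ValueError("need at least one response")
    passed = sum(1 for text in responses if follows_word_count(text, target_words))
    return passed / len(responses)


# Hypothetical responses to a prompt asking for exactly three words.
responses = ["one two three", "one two", "alpha beta gamma", "one two three four"]
rate = compliance_rate(responses, 3)
print(rate)
```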
<h2 class="wp-block-heading" id="subgoal-success-rate">Subgoal success rate</h2>
<p>As agentic models become more common, it’s helpful to track how well the model performs on each of the various parts of the agent’s strategic plan. All of the metrics here can be broken down and tracked for each of the subgoals.</p>
<h2 class="wp-block-heading" id="plan-stability">Plan stability</h2>
<p>Agentic models start with a plan. Some of them are smart enough to abandon the plan or at least adjust it as the work evolves. Plan stability measures how often the plans are adjusted. A high rate of adjustment could mean that the agent is a bad planner or just flexible or maybe both.</p>
<h2 class="wp-block-heading" id="self-correction-score">Self-correction score</h2>
<p>Some agents are able to dive deeper and recognize their mistakes. The self-correction score measures how often the model will make a mistake and then recognize it, either on its own or after being prompted with the question, “Are you really sure?”</p>
<h2 class="wp-block-heading" id="jailbreak-resistance">Jailbreak resistance</h2>
<p>Some users try to find clever ways to lure the LLM into tossing aside any restrictions on topics or answers. In the past, some LLMs could be fooled by being told the answer was part of a play or a work of fiction. So discussing forbidden subjects wasn’t a problem because it was all pretend. Newer models have more elaborate defenses. Measures of the ability to resist deception include <a href="https://jailbreakbench.github.io/">JailbreakBench</a>, <a href="https://arxiv.org/abs/2410.09024">AgentHarm</a>, and <a href="https://arxiv.org/pdf/2512.05485">Tele-AI-Safety</a>. </p>
<h2 class="wp-block-heading" id="prompt-injection-vulnerability">Prompt injection vulnerability</h2>
<p>Sometimes untrusted data from extra sources or skills may include malicious instructions that can exploit the LLM. Benchmarks such as <a href="https://arxiv.org/abs/2602.20156">Skill-Inject</a> and <a href="https://spikee.ai/">SPIKEE</a> (Simple Prompt Injection Kit for Evaluation and Exploitation) work with known attack vectors and measure how susceptible a model is to targeted prompt injection attacks. </p>
<h2 class="wp-block-heading" id="copyright-infringement-score">Copyright infringement score </h2>
<p>Some LLMs can regurgitate the data in their training corpus in a way that seems like plagiarism or copyright infringement. This can be an issue when the training material wasn’t carefully licensed. The copyright infringement score measures how often the LLM may parrot the training material a bit too closely. Tools for defending against this include <a href="https://www.patronus.ai/blog/introducing-copyright-catcher">CopyrightCatcher</a> and <a href="https://arxiv.org/abs/2402.09910">DE-COP</a>. </p>
<h2 class="wp-block-heading" id="ruler">RULER</h2>
<p>How well can a model extract information from the entire context? <a href="https://github.com/gkamradt/needle-in-a-haystack" data-type="link" data-id="https://github.com/gkamradt/needle-in-a-haystack">NIAH</a> (needle-in-a-haystack) <a href="https://arxiv.org/pdf/2504.04713" data-type="link" data-id="https://arxiv.org/pdf/2504.04713">benchmarks</a> measure how well a model can retrieve small, crucial bits of information from long contexts. <a href="https://github.com/NVIDIA/RULER">RULER</a> takes NIAH tests further with the ability to vary the types and quantities of needles, the size of the haystack, and the complexity of the task.</p>
<h2 class="wp-block-heading" id="gsm8k">GSM8K </h2>
<p>The developers of <a href="https://arxiv.org/abs/2110.14168" data-type="link" data-id="https://arxiv.org/abs/2110.14168">GSM8K</a> (Grade School Math 8K) set out to benchmark an LLM’s ability to tackle multistep mathematical problems, so they gathered <a href="https://huggingface.co/datasets/openai/gsm8k">8,500 problems</a> that are common in grade school math classes. While the focus is explicitly on solving math homework problems, the benchmark also measures the ability to construct reasoning chains.</p>
<h2 class="wp-block-heading" id="gpqa">GPQA</h2>
<p>The <a href="https://arxiv.org/pdf/2311.12022">Graduate-Level Google-Proof Q&amp;A</a> is composed of hundreds of hard questions that might normally be answered by humans in graduate school, generally in science. To make the benchmark harder, the researchers focused on questions that non-experts often get wrong. The term “Google-proof” means that the benchmark includes questions that can’t be easily answered by asking a search engine.</p>
<h2 class="wp-block-heading" id="mmlu-pro">MMLU-Pro</h2>
<p>The <a href="https://github.com/TIGER-AI-Lab/MMLU-Pro" data-type="link" data-id="https://github.com/TIGER-AI-Lab/MMLU-Pro">MMLU-Pro</a> benchmark builds on the Massive Multitask Language Understanding dataset to test a model’s understanding of a broad set of scientific knowledge. It includes more than 12,000 questions about general scientific fields like biology, chemistry, economics, and law. </p>
<h2 class="wp-block-heading" id="mbpp">MBPP</h2>
<p>Google created <a href="https://github.com/google-research/google-research/tree/master/mbpp">MBPP</a> (Mostly Basic Python Problems) to evaluate how well a model solves coding questions. Each problem comes with a statement, a gold standard solution, and several similar test cases. The number of accurate answers to these questions is a good measure of how well the model will solve many of the simpler Python coding problems presented by users.</p>
<h2 class="wp-block-heading" id="swe-bench">SWE-bench</h2>
<p>This <a href="https://github.com/SWE-bench/SWE-bench">collection</a> of several thousand software engineering challenges evaluates how well a model solves programming problems. The developers created it by selecting a number of issues and corresponding pull requests from a dozen or so Python projects. After some limitations appeared, the creators expanded the set by creating <a href="https://arxiv.org/abs/2410.06992" data-type="link" data-id="https://arxiv.org/abs/2410.06992">SWE-Bench+</a>, <a href="https://openai.com/index/introducing-swe-bench-verified/">SWE-bench Verified</a>, and <a href="https://arxiv.org/abs/2509.16941" data-type="link" data-id="https://arxiv.org/abs/2509.16941">SWE-Bench Pro</a>.</p>
<h2 class="wp-block-heading" id="lmsys-chatbot-arena">LMSYS Chatbot Arena</h2>
<p>Instead of creating a fixed set of test prompts, the Large Model Systems Organization’s <a href="https://www.lmsys.org/" data-type="link" data-id="https://www.lmsys.org/">Chatbot Arena</a> is a dynamic system that feeds the same prompt to different models and then asks humans to pick the best results. These head-to-head contests produce an <a href="https://en.wikipedia.org/wiki/Elo_rating_system">Elo</a>-like rating that is similar to the one used to score chess players.</p>
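<p>For readers unfamiliar with chess ratings, the classic Elo update can be sketched in a few lines of Python. This shows the textbook formula with a commonly used K-factor of 32. It is an assumption-laden illustration of an “Elo-like” rating, not necessarily the exact computation the arena performs.</p>

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a, rating_b, a_won, k=32):
    """Return the new ratings for A and B after one head-to-head contest."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    change = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + change, rating_b - change


# Two hypothetical models start level; a human voter prefers model A.
new_a, new_b = update_ratings(1000, 1000, a_won=True)
print(new_a, new_b)
```

<p>An upset win against a much higher-rated model moves the ratings more than a win against an equal, because the expected score for the underdog is lower.</p>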
<h2 class="wp-block-heading" id="price">Price</h2>
<p>The rest of these metrics are useful, but as the real estate agents say, the three most important numbers on a property listing are price, price, and price. The cost is a bit less important for measuring AIs, but only a bit. Price can make a huge difference between a project being profitable and a money sink. When the cost for each inference is a tad too high, it’s impossible to make it up with volume.</p>
<p>The key caveat is that a cheaper model isn’t a good idea if it generates answers that are filled with hallucinations or worse. The quality of the answers can differ greatly, and saving a few pennies can be a mistake. To make matters more complicated, there’s an explosion in different styles and approaches. Sometimes it makes sense to pay a bit more for a model that delivers answers with the right vibe.</p>
</div>
</div>
</div>
</div>
</div><p>The post <a href="https://www.azalio.io/33-llm-metrics-to-watch-closely/">33 LLM metrics to watch closely</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing</title>
		<link>https://www.azalio.io/google-unveils-diffusiongemma-an-ai-model-that-breaks-free-of-left-to-right-processing/</link>
		
		<dc:creator><![CDATA[Azalio tdshpsk]]></dc:creator>
		<pubDate>Fri, 12 Jun 2026 21:59:25 +0000</pubDate>
				<category><![CDATA[Cloud]]></category>
		<guid isPermaLink="false">https://www.azalio.io/google-unveils-diffusiongemma-an-ai-model-that-breaks-free-of-left-to-right-processing/</guid>

					<description><![CDATA[<p>Extremely powerful large language models (LLMs) still operate as though they’re typing on a keyboard, processing workloads in a simple left-to-right fashion. But in locally-run, single-user scenarios, this sequential processing can leave graphics processing units (GPUs) and tensor processing units (TPUs) underutilized. Google is betting that DiffusionGemma can get around this bottleneck. The new experimental [&#8230;]</p>
<p>The post <a href="https://www.azalio.io/google-unveils-diffusiongemma-an-ai-model-that-breaks-free-of-left-to-right-processing/">Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></description>
										<content:encoded><![CDATA[<div>
<div id="remove_no_follow">
<div class="grid grid--cols-10@md grid--cols-8@lg article-column">
<div class="col-12 col-10@md col-6@lg col-start-3@lg">
<div class="article-column__content">
<section class="wp-block-bigbite-multi-title">
<div class="container"></div>
</section>
<p>Extremely powerful <a href="https://www.infoworld.com/article/2335213/large-language-models-the-foundations-of-generative-ai.html" target="_blank" rel="noopener">large language models</a> (LLMs) still operate as though they’re typing on a keyboard, processing workloads in a simple left-to-right fashion. But in locally-run, single-user scenarios, this sequential processing can leave graphics processing units (GPUs) and <a href="https://www.networkworld.com/article/4093957/what-are-tpus-your-guide-to-tensor-processing-units-and-ai-acceleration.html" target="_blank" rel="noopener">tensor processing units</a> (TPUs) underutilized.</p>
<p>Google is betting that <a href="https://deepmind.google/models/gemma/diffusiongemma/" target="_blank" rel="noreferrer noopener">DiffusionGemma</a> can get around this bottleneck. The new experimental open model generates text “exceptionally fast,” creating entire blocks of text simultaneously through diffusion techniques rather than through token-by-token processing. The company says this technique results in 4x faster inference compared to auto-regressive models that rely on sequential processing.</p>
<p>It can also save users money. Technology analyst <a href="https://ca.linkedin.com/in/carmi" target="_blank" rel="noreferrer noopener">Carmi Levy</a> noted that existing pay-per-token monetization models “penalize the use of less than optimally efficient AI solutions.”</p>
<p>But DiffusionGemma “could herald a new generation of task-defined, efficient solutions that can enable expanded compute capacity without draining the operations budget,” he said.</p>
<h2 class="wp-block-heading" id="a-contrast-to-left-to-right-processing">A contrast to left-to-right processing</h2>
<p>Built on Google’s Gemma 4 family and its <a href="https://deepmind.google/models/gemini-diffusion/" target="_blank" rel="noreferrer noopener">Gemini Diffusion</a> research, DiffusionGemma is a 26B mixture-of-experts (MoE) model designed to maximize text output generation.</p>
<p>It essentially shifts <a href="https://www.infoworld.com/article/4169605/21-llms-tuned-for-special-domains.html" target="_blank" rel="noopener">how models use hardware</a>, giving processors a larger hunk of work each cycle so the model can draft full 256-token paragraphs in sequence. This allows the model to generate text up to 4x faster on GPUs, Google claims. It activates only 3.8B parameters during inference, and, when quantized, can fit within 18GB of VRAM on high-end consumer GPUs like the Nvidia RTX 5090.</p>
<p>“It upgrades your model inference from a single, sequential typewriter to a massive printing press that stamps the entire block of text simultaneously,” Google research scientists Brendan O’Donoghue and Sebastian Flennerhag wrote in a <a href="https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/" target="_blank" rel="noreferrer noopener">blog post</a>.</p>
<p>AI image generators begin with pure, random ‘visual noise’ and iteratively refine that into a finalized picture (what’s known as ‘diffusion’); DiffusionGemma applies this same process to text. It does not generate tokens in order, but begins with a “canvas of random placeholder tokens” that it processes in multiple passes, identifying the context tokens it feels are most relevant and using those to refine the rest.</p>
<p>The model has the ability to self-correct, using confidence scoring to re-evaluate tokens in the next pass. “The model iteratively refines its own output, allowing it to evaluate the entire text block at once to fix mistakes in real-time,” O’Donoghue and Flennerhag explained.</p>
<p>DiffusionGemma also has bidirectional attention, they wrote. “Generating 256 tokens in parallel with each forward pass allows every token to attend to all others.” This can be particularly helpful in domains that are non-linear in nature, such as mathematical graphs, code infilling, and in-line editing, they said.</p>
<p>DiffusionGemma is optimized across Nvidia’s hardware stack, making it compatible with consumer setups as well as with high-performance enterprise systems like Hopper and Blackwell.</p>
<p>Because it is released under the Apache 2.0 license, developers can freely use, modify, distribute, and commercialize the software using their preferred tools. It can be run on GPUs or in the cloud through <a href="https://console.cloud.google.com/agent-platform/publishers/google/model-garden/diffusiongemma" target="_blank" rel="noreferrer noopener">Google Cloud Model Garden</a> or <a href="https://catalog.ngc.nvidia.com/orgs/nim/teams/google/containers/diffusiongemma-26b-a4b-it?version=latest" target="_blank" rel="noreferrer noopener">Nvidia NIM</a>, and is available on <a href="https://huggingface.co/collections/mlx-community/diffusiongemma" target="_blank" rel="noreferrer noopener">Hugging Face</a>, <a href="https://github.com/google-gemma" target="_blank" rel="noreferrer noopener">GitHub</a>, and <a href="https://vllm-project.github.io/2026/06/10/diffusion-gemma" target="_blank" rel="noreferrer noopener">vLLM</a>, with support for the open-source library <a href="https://github.com/ggml-org/llama.cpp" target="_blank" rel="noreferrer noopener">llama.cpp</a> coming soon.</p>
<h2 class="wp-block-heading" id="key-use-cases">Key use cases</h2>
<p>The model is particularly useful in local workflows that are “speed critical,” such as generation of non-linear text structures, and unlocks what Google calls “new patterns of model behavior” like multimodal understanding and generating and rendering code in near real-time.</p>
<p>Levy explained, “DiffusionGemma is particularly well suited for interactive coding and editing where its efficiency allows rapid processing and iterations,” noting that its ability to fit within 18GB of VRAM and its deployability on commonly available local GPUs can potentially benefit customer service-related workloads that lean heavily on real-time interaction and local processing.</p>
<p>“DiffusionGemma also incorporates a thinking mode that is especially adept at problem solving,” he said. For instance, the model was fine-tuned to play Sudoku, a typically challenging task for autoregressive models because each token depends on future tokens. This “rather handily” illustrates the model’s capability to solve more complex problems, Levy noted.</p>
<h2 class="wp-block-heading" id="limitations">Limitations</h2>
<p>Google freely admits that DiffusionGemma is geared to specific workflows, and there are “key trade-offs.”</p>
<p>The model is engineered for small batch size inferencing and low-latency, high-speed generation at low-to-medium batch sizes on a “single capable accelerator.”</p>
<p>In high-QPS cloud serving environments (where infrastructure is designed to handle tens or hundreds of thousands of requests per second with ultra-low latency), DiffusionGemma’s parallel decoding “offers diminishing returns,” and can even result in higher serving costs, Google conceded. In addition, its overall output quality is lower than that of standard Gemma 4, which is built for apps demanding maximum quality.</p>
<p>However, Levy noted that while DiffusionGemma “can be less precise than other models in certain workloads,” subsequent refinement cycles could overcome this limitation.</p>
<p>While Google isn’t sharing runtime costs, it’s clear that this is an efficiency play, he added. “When deployed across the kinds of workloads that would optimally benefit from its architecture, DiffusionGemma seems to have the potential to reduce processing overhead and related costs,” he said.</p>
</div>
</div>
</div>
</div>
</div><p>The post <a href="https://www.azalio.io/google-unveils-diffusiongemma-an-ai-model-that-breaks-free-of-left-to-right-processing/">Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing</a> first appeared on <a href="https://www.azalio.io">Azalio</a>.</p>]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
