Agentic Analytics & AI-Native Data Engineering Go Mainstream

Summary

The modern data landscape is shifting from a passive, dashboard-reading paradigm to an active, autonomous architecture. Integrating AI agents into data engineering and analytics has become a major focus across the leading data platforms. At Build 2026, Microsoft launched its Agentic Analytics Stack and Rayfin, a code-first Backend-as-a-Service (BaaS) platform. Databricks introduced Genie Code and Genie Spaces for Agentic BI, while Snowflake revamped its Cortex AI suite with the CoWork and CoCo agents. However, Anthropic’s research highlights a major production hurdle: unless the “skill files” that give agents their context are continuously maintained, accuracy degrades in dynamic data environments. In Anthropic’s internal system it fell from 95% to 65% within a month.

What happened

Over the last few days, major tech vendors and open-source communities have launched key agentic data solutions:

Microsoft Fabric: Announced Rayfin, an open-source SDK and BaaS that lets developers deploy backend systems that automatically sync to OneLake. Microsoft also released Fabric Data Agents to general availability and launched agentic Power BI report builders.
Databricks: Introduced Genie Code, an autonomous AI coding assistant for data workflows (like Lakeflow) and BI dashboards, complementing Genie Spaces for business users.
Snowflake: Rebranded its agent offerings into CoCo (Cortex Copilot for pipeline and code generation) and CoWork (a collaborative workspace agent), powered by a new shared metadata layer called Cortex Sense.
Open Source: Projects like Datus-agent (a context-engineered CLI SQL client) and Altimate AI’s Altimate Code (an agentic data engineering harness providing 100+ deterministic tools) are bringing agentic capabilities directly to developer environments.
MCP Ecosystem: The Model Context Protocol (MCP) has gained massive traction, offering standardized servers to bridge LLMs with SQL databases and modern data stacks.

Why it matters

Historically, business intelligence and data engineering suffered from severe operational bottlenecks: business units had to request reports, and data engineers had to build ETL pipelines by hand. AI agents promise to democratize access by planning, coding, and deploying analytical workloads on their own. However, unlike traditional deterministic software, data agents suffer a quiet degradation in quality, known as context drift, as database structures and business definitions inevitably change.

Evidence

Microsoft Rayfin: By automatically replicating app database schemas into OneLake, Rayfin ensures that AI-generated prototypes inherit Fabric’s native governance, storage, and analytics from day one.
Snowflake Cortex Sense: Snowflake’s shared semantic and context layer boosted Cortex agent query accuracy on complex enterprise schemas from 47% to 83% in benchmark tests.
Anthropic Case Study: Anthropic revealed that its internal agentic analytics system initially reached 95% accuracy using curated documentation (“skill files”). However, as database schemas evolved and the documentation was not updated, accuracy fell to 65% within a single month.
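
The drift in the Anthropic case comes from documentation that still names things the database no longer has. A minimal sketch of how such stale references might be surfaced is shown here. It assumes skill files mention columns as backticked `table.column` names and that the live schema is available as a plain dictionary; the file contents, schema, and function names are hypothetical and are not taken from Anthropic’s system.

```python
import re


def documented_columns(skill_markdown: str) -> set[str]:
    """Collect backticked `table.column` references from a skill file."""
    return set(re.findall(r"`([A-Za-z_]\w*\.[A-Za-z_]\w*)`", skill_markdown))


def stale_references(skill_markdown: str, live_schema: dict[str, set[str]]) -> list[str]:
    """Return documented table.column names that are absent from the live schema."""
    stale = []
    for ref in sorted(documented_columns(skill_markdown)):
        table, column = ref.split(".")
        if column not in live_schema.get(table, set()):
            stale.append(ref)
    return stale


# Hypothetical skill file for an orders table.
SKILL_FILE = """
# orders skill
Revenue is the sum of `orders.amount_usd` for rows where `orders.status` is 'paid'.
Join to `customers.id` through `orders.customer_id`.
"""

# Hypothetical live schema in which amount_usd has been renamed to total_usd.
LIVE_SCHEMA = {
    "orders": {"id", "customer_id", "total_usd", "status"},
    "customers": {"id", "name"},
}

print(stale_references(SKILL_FILE, LIVE_SCHEMA))  # ['orders.amount_usd']
```

A check like this only catches references that disappear from the schema; a column whose business meaning changes while its name stays the same would still need human review.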

Analysis

The drop from 95% to 65% accuracy reported by Anthropic points to the biggest operational challenge in Agentic BI: Context Drift. AI agents rely on metadata, descriptions, and semantic definitions (often stored as markdown files) to understand complex databases. In a real-world enterprise, tables are deprecated, columns are added, and definitions change all the time. Rather than upgrading the underlying LLM, Anthropic treated this as an engineering maintenance problem: it co-located the agent’s skill files inside the transformation repositories and set up automated code-review checks that block a pull request if it changes a data schema without updating the corresponding skill files.
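
A pull-request gate of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not Anthropic’s implementation: it assumes a hypothetical repository layout in which each model at models/<name>.sql has a skill file at skills/<name>.md, and it treats any change to a model file as a potential schema change.

```python
from pathlib import PurePosixPath

# Hypothetical layout: SQL models under models/, skill files under skills/.
MODEL_DIR = "models"
SKILL_DIR = "skills"


def skill_path_for(model_path: str) -> str:
    """Map models/<name>.sql to its assumed skill file, skills/<name>.md."""
    return f"{SKILL_DIR}/{PurePosixPath(model_path).stem}.md"


def missing_skill_updates(changed_files: list[str]) -> list[str]:
    """Return skill files that should have changed alongside a model but did not."""
    changed = set(changed_files)
    missing = []
    for path in sorted(changed):
        p = PurePosixPath(path)
        if p.parts[:1] == (MODEL_DIR,) and p.suffix == ".sql":
            expected = skill_path_for(path)
            if expected not in changed:
                missing.append(expected)
    return missing


if __name__ == "__main__":
    # In CI the list would come from the diff, for example the output of
    # `git diff --name-only origin/main...HEAD`. Here it is hard-coded.
    example_diff = ["models/orders.sql", "models/customers.sql", "skills/customers.md"]
    problems = missing_skill_updates(example_diff)
    for skill in problems:
        print(f"model changed without skill update: {skill}")
    # A real CI step would exit non-zero at this point to block the pull request.
    print("BLOCK" if problems else "OK")
```

Matching on file paths is deliberately coarse. A stricter gate could parse the model diff and fire only when column names or types change, at the cost of more moving parts.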

Practical Takeaways

To deploy reliable agentic data platforms, organizations should adopt these practices:

Co-Locate Skill Files: Keep the agent’s context and documentation files in the same repository as the data transformations (e.g., dbt projects or SQL scripts).
Automated CI/CD Sync: Add CI/CD checks that reject any pull request that modifies data structures without updating the matching agent context files.
Leverage Semantics: Use tools like Snowflake’s Cortex Sense or centralized semantic layers to provide agents with a unified, governed definition of data.
Use Data Harnesses: Instead of exposing raw LLMs directly to databases, wrap them in specialized harnesses (like Altimate Code or Rayfin) that provide column-level lineage and manifest awareness.
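
To make the harness idea concrete, the following sketch shows one narrow piece of manifest awareness: refusing to run generated SQL that references a table the manifest does not know. It does not reflect how Altimate Code or Rayfin work internally. The manifest contents, the `guarded_execute` wrapper, and the regex-based table extraction are all hypothetical simplifications, and a production harness would use a real SQL parser.

```python
import re


class UnknownTableError(ValueError):
    """Raised when generated SQL references a table missing from the manifest."""


# Hypothetical manifest: table name -> known columns.
MANIFEST = {
    "orders": ["id", "customer_id", "total_usd"],
    "customers": ["id", "name"],
}

# Naive pattern: the identifier that follows FROM or JOIN. This misreads
# constructs such as EXTRACT(year FROM col); it is only meant as a sketch.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> set[str]:
    """Return lower-cased table names that appear after FROM or JOIN."""
    return {name.lower() for name in TABLE_REF.findall(sql)}


def guarded_execute(sql: str, execute, manifest=MANIFEST):
    """Run `execute(sql)` only if every referenced table is in the manifest."""
    unknown = referenced_tables(sql) - set(manifest)
    if unknown:
        raise UnknownTableError(f"tables not in manifest: {sorted(unknown)}")
    return execute(sql)


# Stand-in for a database call; a real harness would pass a driver function.
def fake_execute(sql: str) -> str:
    return "ran"


print(guarded_execute("SELECT id FROM orders", fake_execute))
try:
    guarded_execute("SELECT * FROM legacy_orders", fake_execute)
except UnknownTableError as err:
    print(f"rejected: {err}")
```

The point of the design is that the rejection is deterministic: the agent gets a precise error naming the unknown table before anything reaches the database, so it can correct the query.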

Open Questions

To what extent will enterprise security teams trust autonomous AI agents with write permissions on production databases and data pipelines?
Will open standards like MCP prevent vendor lock-in, or will platforms build proprietary semantic layers that restrict agent portability?