Across U.S. local government, AI adoption is accelerating — but confidence in its value is not. City managers report chatbots, document automation, and analytics pilots going live, yet when council members ask a harder question — "How do we know this is working?" — many organizations struggle to answer.

Recent research makes this tension clear: adoption without defensible success measurement creates financial risk, opens governance gaps, and erodes public trust. This post synthesizes emerging scholarly and policy research to help public-sector leaders move past binary adoption metrics toward practical, multi-dimensional ways of evaluating AI success.

Research Context: What the Evidence Is Actually Studying

Over the past three years, researchers across public administration, information systems, and governance have examined why AI adoption succeeds in some public organizations and stalls, or backfires, in others. Much of this work builds on two foundational theories: the Technology-Organization-Environment (TOE) framework and the Unified Theory of Acceptance and Use of Technology (UTAUT).

More recent studies extend these theories by asking a tougher question: what does "success" look like after AI is deployed? Comparative case studies, large-scale surveys, and governance reviews consistently show that adoption alone is a weak indicator of value. The findings are converging on a clear message: AI success is multi-dimensional, stage-dependent, and inseparable from governance and human capacity.

Key Findings from the Research

Most governments measure adoption — not performance

Several studies using TOE-based frameworks find that public organizations focus heavily on whether AI is adopted, rather than how well it performs or integrates. Agencies reported "successful" adoption once tools were procured — even when usage was inconsistent and outcomes unclear (Madan, 2023; Neumann et al., 2024).

Deployment milestones answer procurement questions ("Did we buy it?") but not governance questions ("Should we keep it?"). Without performance-oriented criteria — accuracy, reliability, equity, workload impact — leaders lack an evidence base for continuation or scaling decisions.

Organizational readiness predicts outcomes more than technology choice

Across multiple jurisdictions, organizational readiness consistently explains variance in AI outcomes better than technical sophistication. Readiness includes leadership alignment, data governance maturity, staff skills, and cross-department coordination. Readiness is also not static: it evolves across the stages of adoption, and success metrics must evolve with it.

Human capital alignment determines whether AI delivers public value

UTAUT-based and organizational culture studies repeatedly show that staff acceptance, trust, and role clarity determine whether AI improves service delivery or sits unused. Several surveys found a paradox: employees may support AI in principle but disengage when governance, training, or accountability is unclear. This matters especially in public services, where discretion, ethics, and transparency are core to legitimacy.

Governance is not a compliance layer — it is a success enabler

Governance-focused research challenges the assumption that ethics and accountability slow innovation. Studies on responsible AI governance argue the opposite: clear structures, documentation, and oversight enable sustainable AI use by reducing uncertainty and political risk. Organizations with explicit governance practices are better positioned to evaluate AI impacts over time — not just at launch.

Public value is the missing — but measurable — dimension

Public administration scholars increasingly argue that AI success in government must be assessed through public value, not efficiency alone. This includes fairness, transparency, service quality, and trust outcomes (Madan, 2023; Neumann et al., 2024). While harder to quantify, research shows these dimensions can be evaluated through structured indicators and qualitative review.

Bridging to Practice

Taken together, this research suggests a reframing that many public organizations have not yet made: AI success is not a single scorecard — it is a structured conversation supported by evidence.

For time-constrained executives, this does not mean building complex dashboards or hiring data scientists. It means:

At Bridge Public Advisors, this is why our work emphasizes judgment-first frameworks over tools. Research consistently shows that when leaders define success criteria before scaling — and revisit them over time — they retain control of AI decisions even as technology evolves.

Reflection Questions for Government Leaders

Sources
