AI Assessment: Why We Need a New Strategy

Artificial Intelligence (AI) systems are not static: they learn, adapt, and evolve over time. This impermanence makes traditional assessment methods, such as certifications and audits, increasingly unreliable. Our research highlights why current approaches fail and lays the foundation for a new AI assessment paradigm.

The Core Challenge

AI systems change after deployment due to:

  • Self-learning and retraining
  • External factors (e.g., market shifts, cultural changes)
  • Opaque decision-making processes

These dynamics undermine the validity of assessments that assume systems remain unchanged.

Key Findings

Through 25 expert interviews across industry and academia, we identified eight impermanence-related implications that threaten assessment reliability:

  1. Difficulty distinguishing desired from undesired changes (illustrated in the sketch after this list).
  2. Inability to anticipate external changes that affect AI behavior.
  3. Limited explainability, which hampers reassessment.
  4. Rare cases that can invalidate prior guarantees.
  5. Risk of human bias being reintroduced over time.
  6. Limited transferability of assessment results across contexts.
  7. Dependence of assessment validity on evolving environments, not just system changes.
  8. The need for continuous reassessment of self-learning systems.
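
To make the first finding concrete: even a simple version comparison can detect that behavior has changed, but it cannot say whether the change is desired. The sketch below is a minimal illustration with hypothetical model objects (anything exposing a predict method) and an invented 5% tolerance; it flags disagreement between an assessed model and its retrained successor, while deciding whether that disagreement is an improvement or a regression still requires human judgment.

    # Minimal sketch: flag behavioral change between two model versions
    # using a fixed "canary" evaluation set. The model objects and the
    # 5% tolerance are hypothetical, not part of the cited study.

    def disagreement_rate(model_a, model_b, canary_inputs):
        """Fraction of canary inputs on which the two versions disagree."""
        diffs = sum(
            1 for x in canary_inputs if model_a.predict(x) != model_b.predict(x)
        )
        return diffs / len(canary_inputs)

    def behavioral_change_flagged(model_old, model_new, canary_inputs,
                                  tolerance=0.05):
        """True when the retrained model's behavior has shifted beyond
        the tolerance; whether the shift is desired or undesired is a
        judgment the comparison itself cannot make."""
        return disagreement_rate(model_old, model_new, canary_inputs) > tolerance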

Why It Matters

Regulatory frameworks (e.g., the EU AI Act) increasingly mandate AI audits and certifications. Yet without methods adapted to AI's impermanence, these assessments risk becoming meaningless, eroding trust and slowing innovation.

Toward a Generic AI Assessment Strategy

A viable strategy must:

  • Shift from static to dynamic assessment models
    Continuous or trigger-based reassessment is essential (illustrated in the first sketch after this list).
  • Integrate lifecycle monitoring
    Cover system changes, input data evolution, and environmental factors.
  • Leverage technical tools and automation
    MLOps platforms, anomaly detection, and explainability techniques can support scalable reassessment.
  • Define context-specific validity periods
    Avoid fixed timelines; base validity on system behavior and risk profiles (illustrated in the second sketch after this list).
  • Balance cost and risk
    Prioritize critical applications for intensive monitoring while allowing flexibility for low-risk use cases.
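
As a concrete illustration of the first three points, the minimal sketch below uses the Population Stability Index (PSI), a common drift statistic in production ML monitoring, to trigger reassessment when live input data drifts away from the data seen at assessment time. The 0.2 trigger value is a widely used rule of thumb, and all names here are illustrative assumptions rather than prescriptions from the cited paper.

    # Minimal sketch of trigger-based reassessment driven by input drift.
    # PSI is one common drift statistic; the 0.2 trigger is a rule of
    # thumb and all names here are illustrative.

    import numpy as np

    def psi(expected, observed, bins=10, eps=1e-6):
        """Population Stability Index between a reference sample
        (seen at assessment time) and a live production sample."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        p, _ = np.histogram(expected, bins=edges)
        q, _ = np.histogram(observed, bins=edges)
        p = p / p.sum() + eps  # smooth to avoid log(0) and division by zero
        q = q / q.sum() + eps
        return float(np.sum((p - q) * np.log(p / q)))

    def reassessment_triggered(reference_sample, live_sample, trigger=0.2):
        """True when input drift exceeds the tolerance agreed at
        assessment time, signalling that the prior assessment result
        may no longer hold."""
        return psi(reference_sample, live_sample) > trigger

In an MLOps pipeline, such a check could run on a schedule against each monitored feature, with a triggered result opening a reassessment rather than silently revoking a certificate.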

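A second toy sketch illustrates context-specific validity periods: instead of a fixed certificate lifetime, the reassessment interval shrinks with the system's risk tier and its observed rate of change. The tier caps, the drift budget, and the formula are invented purely for illustration.

    # Toy illustration of a context-specific validity period. The tier
    # caps and the drift-budget formula are invented for illustration.

    RISK_TIER_CAP_DAYS = {"high": 30, "medium": 90, "low": 365}

    def validity_period_days(risk_tier, monthly_drift_psi, drift_budget=0.2):
        """Days until reassessment: the time for accumulated drift to
        consume the agreed budget, capped by the risk tier's maximum."""
        cap = RISK_TIER_CAP_DAYS[risk_tier]
        if monthly_drift_psi <= 0:
            return cap
        days_to_budget = 30 * drift_budget / monthly_drift_psi
        return min(cap, int(days_to_budget))

    # A high-risk system drifting at PSI 0.1 per month: min(30, 60) = 30 days.
    print(validity_period_days("high", 0.1))
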
References

K. Brecker, S. Lins, N. Bena, C. A. Ardagna, M. Anisetti, and A. Sunyaev, “AI Impermanence: Achilles’ Heel for AI Assessment?,” IEEE Access, vol. 13, pp. 194435–194455, 2025.

https://ieeexplore.ieee.org/document/11237087