AI Assessment: Why We Need a New Strategy
Artificial intelligence (AI) systems are not static; they learn, adapt, and evolve over time. This impermanence makes traditional assessment methods (such as certifications or audits) increasingly unreliable: a result that held at assessment time may no longer describe the deployed system. Our research highlights why current approaches fail and proposes a foundation for a new paradigm in AI assessment.
The Core Challenge
AI systems change after deployment due to:
- Self-learning and retraining
- External factors (e.g., market shifts, cultural changes)
- Opaque decision-making processes
These dynamics undermine the validity of assessments that assume systems remain unchanged; the short sketch below illustrates how an audit result can go stale after retraining.
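To make the staleness problem concrete, here is a minimal, hypothetical Python sketch (using NumPy and scikit-learn; the data, shift size, and model choice are invented for illustration, not taken from the cited study): a model passes an audit on a frozen test set, the environment shifts, the system is retrained, and the original audit score no longer describes its behavior.

```python
# Illustrative only: a "certified" model drifts away from its audit result
# after retraining on shifted data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

def make_data(shift: float, n: int = 2000):
    """Synthetic binary task whose decision boundary moves with `shift`."""
    X = rng.normal(loc=shift, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y

X_train, y_train = make_data(shift=0.0)   # world at certification time
X_audit, y_audit = make_data(shift=0.0)   # audit set frozen at t0

model = LogisticRegression().fit(X_train, y_train)
print("audit-time accuracy:", accuracy_score(y_audit, model.predict(X_audit)))

# Later: the environment drifts and the deployed system retrains itself.
X_new, y_new = make_data(shift=1.5)
model.fit(X_new, y_new)                   # same system, new behavior
print("same audit, post-retraining:", accuracy_score(y_audit, model.predict(X_audit)))
```

The frozen audit set still "certifies" the system on paper, yet the second score shows the certificate no longer reflects actual behavior.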
Key Findings
Through 25 expert interviews across industry and academia, we identified eight impermanence-related implications that threaten assessment reliability:
- Difficulty distinguishing desired vs. undesired changes.
- Inability to anticipate external changes impacting AI behavior.
- Limited explainability hampers reassessment.
- Rare cases can invalidate guarantees.
- Risk of reintroduced human bias over time.
- Limited transferability of assessment results across contexts.
- Assessment validity depends on evolving environments, not just system changes.
- Continuous reassessment is required for self-learning systems (see the drift-trigger sketch after this list).
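As a sketch of what such a reassessment trigger could look like, assuming production inputs are logged and a reference sample was kept from assessment time, a two-sample Kolmogorov-Smirnov test can flag when the input distribution has drifted enough to warrant reassessment. The function name, significance level, and data below are illustrative assumptions, not part of the cited paper.

```python
# Hypothetical trigger-based reassessment check: compare recent production
# inputs against the reference sample the assessment was based on.
import numpy as np
from scipy.stats import ks_2samp

def reassessment_needed(reference: np.ndarray, recent: np.ndarray,
                        alpha: float = 0.01) -> bool:
    """Return True when the recent input distribution differs
    significantly from the one seen at assessment time."""
    _, p_value = ks_2samp(reference, recent)
    return p_value < alpha  # significant drift -> the assessment may be stale

# Simulated covariate shift after deployment
rng = np.random.default_rng(0)
at_assessment = rng.normal(loc=0.0, scale=1.0, size=5000)
in_production = rng.normal(loc=0.4, scale=1.0, size=5000)  # drifted inputs
print(reassessment_needed(at_assessment, in_production))   # True
```

In practice such a check would run per feature (or on model outputs) inside an MLOps pipeline; the point is that reassessment is triggered by observed behavior rather than by a calendar.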
Why It Matters
Regulatory frameworks (e.g., the EU AI Act) increasingly mandate AI audits and certifications. Yet without methods adapted to systems that change after deployment, these assessments risk becoming meaningless, eroding trust and slowing innovation.
Toward a Generic AI Assessment Strategy
A viable strategy must:
- Shift from static to dynamic assessment models: continuous or trigger-based reassessment is essential.
- Integrate lifecycle monitoring: cover system changes, input data evolution, and environmental factors.
- Leverage technical tools and automation: MLOps platforms, anomaly detection, and explainability techniques can support scalable reassessment.
- Define context-specific validity periods: avoid fixed timelines; base validity on system behavior and risk profiles.
- Balance cost and risk: prioritize critical applications for intensive monitoring while allowing flexibility for low-risk use cases (a risk-tiered policy sketch follows this list).
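One way to operationalize the last two points is a risk-tiered reassessment policy, sketched below. The tier names, significance levels, and validity bounds are illustrative assumptions, not values proposed in the paper; the idea is simply that higher-risk applications get more sensitive drift triggers and shorter maximum validity.

```python
# Hypothetical risk-tiered reassessment policies: monitoring intensity and
# validity bounds scale with application criticality. All values illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class ReassessmentPolicy:
    drift_alpha: float          # drift-test significance level (larger = more sensitive trigger)
    max_days_valid: int         # hard upper bound, complementing behavior-based triggers
    continuous_monitoring: bool

POLICIES = {
    "high_risk":   ReassessmentPolicy(drift_alpha=0.05,  max_days_valid=30,  continuous_monitoring=True),
    "medium_risk": ReassessmentPolicy(drift_alpha=0.01,  max_days_valid=180, continuous_monitoring=True),
    "low_risk":    ReassessmentPolicy(drift_alpha=0.001, max_days_valid=365, continuous_monitoring=False),
}

def policy_for(tier: str) -> ReassessmentPolicy:
    """Look up the reassessment policy for a use-case risk tier."""
    return POLICIES[tier]

print(policy_for("high_risk"))
```

This keeps intensive monitoring focused on critical applications, while low-risk use cases rely mostly on the validity bound.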
References
K. Brecker, S. Lins, N. Bena, C. A. Ardagna, M. Anisetti, and A. Sunyaev, “AI Impermanence: Achilles’ Heel for AI Assessment?,” IEEE Access, vol. 13, pp. 194435-194455, 2025.

