"

A Framework for AI Technology Evaluation

The evaluation of artificial intelligence tools for educational purposes requires a structured, multidimensional approach that acknowledges both the technological capabilities and pedagogical applications of these emerging technologies. Our subcommittee on AI Tools for Teaching and Learning developed a comprehensive evaluation framework to assess diverse AI tools and uses, ranging from general-purpose conversational models to specialized educational applications. This framework was designed to provide consistent assessment criteria while remaining flexible enough to accommodate the varying architectures and functionalities of different AI systems. The following overview describes our evaluation process, the instrument’s key dimensions, and preliminary insights from our small but diverse sample of eight tool evaluations.

To support this situated evaluation process, this section introduces the AI Tool Evaluation Rubric for Teaching and Learning developed by the subcommittee. The rubric is designed to guide systematic, comparative evaluation of AI tools across seven dimensions, considering diverse contexts and purposes: (1) functionality and pedagogical usefulness, (2) accessibility and inclusivity, (3) ease of use and faculty adoption, (4) ethical considerations and data privacy, (5) cost and sustainability, (6) AI transparency and explainability, and (7) institutional support and integration. Each category includes guiding questions and a 5-point rating scale, along with opportunities for narrative feedback.

Evaluators used the rubric to score tools individually, and then combined the findings to inform shared recommendations. This approach recognizes that AI tools vary significantly in their design, with some offering direct conversational interfaces while others function as specialized applications with embedded AI capabilities. The instrument therefore provides parallel assessment pathways tailored to each tool type. Several illustrative cases are shared to demonstrate its application.
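To make the individual-scoring and aggregation step concrete, the brief Python sketch below shows one way combined ratings could be computed from independent evaluations. It is purely illustrative: the shorthand dimension labels, the combine_scores function, and the sample ratings are our own assumptions, not part of the rubric or of any evaluator's actual data.

    from statistics import mean

    # The seven rubric dimensions, abbreviated here as illustrative labels.
    DIMENSIONS = [
        "functionality_pedagogical_usefulness",
        "accessibility_inclusivity",
        "ease_of_use_faculty_adoption",
        "ethics_data_privacy",
        "cost_sustainability",
        "transparency_explainability",
        "institutional_support_integration",
    ]

    def combine_scores(evaluations):
        """Average each dimension's 1-5 ratings across independent evaluators.

        Each evaluation is a dict mapping dimension -> rating; dimensions an
        evaluator left blank are simply omitted from that dimension's mean.
        """
        combined = {}
        for dim in DIMENSIONS:
            ratings = [e[dim] for e in evaluations if dim in e]
            combined[dim] = round(mean(ratings), 2) if ratings else None
        return combined

    # Hypothetical example: two evaluators rating the same tool.
    reviews = [
        {"functionality_pedagogical_usefulness": 4, "ethics_data_privacy": 3},
        {"functionality_pedagogical_usefulness": 5, "ethics_data_privacy": 2},
    ]
    print(combine_scores(reviews))

A simple mean is only one aggregation choice; evaluators could just as reasonably report medians or ranges, and the narrative feedback the rubric invites cannot be reduced to numbers at all.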

Subcommittee volunteers representing diverse academic disciplines, including education, computer science, arts and humanities, and the social sciences, evaluated eight distinct AI tools using the rubric: Brisk, ChatGPT, Claude, Copilot, Gamma, Gemini, MagicSchool, and NotebookLM. This small but intentionally diverse sample allowed us to explore variation across tool types and potential applications. Evaluations were conducted independently, with evaluators documenting their findings in the survey instrument and in summary reports that included external references supporting their assessments. These concise summaries captured key insights about each tool’s effectiveness for teaching and learning while acknowledging the rapidly evolving nature of these technologies.

General Findings

The evaluation results revealed both common patterns and tool-specific insights across our sample. Regarding potential pedagogical usefulness, most tools received ratings of “strong educational value” (4/5) or higher, with evaluators noting their potential to enhance specific teaching and learning activities. However, these positive ratings were often qualified with important caveats about use-case dependency and the need for faculty guidance; without adequate, contextual, and practice-supported guidance, the tools’ educational potential would not be realized, and their use could instead foster dependency and harm learning.

  • Ease of use was generally favorable across tools, though NotebookLM was noted to have a steeper learning curve than more intuitive interfaces such as Claude, ChatGPT, DeepSeek, and Gemini.
  • Accessibility and inclusivity ratings showed greater variation, indicating that faculty and institutions should exercise caution when adopting AI tools, especially in light of the new federal ADA updates and the many concerns about bias, exclusion, and prejudice that research and discourse on AI have surfaced in recent years.
  • Ethical considerations and data privacy emerged as the most critical dimensions requiring careful attention. Several evaluators expressed concerns about data handling practices, particularly for tools that process user-generated content.
  • Cost and sustainability assessments varied widely, with some tools freely accessible under institutional licenses while others required subscription models that might present barriers to widespread adoption.
  • AI transparency and explainability (the ability to understand how tools generate their outputs) remain an evolving challenge across most platforms, although new approaches are emerging that enable clearer reasoning explanations and step-by-step problem-solving capabilities.

While these preliminary findings offer valuable insights, they are meant as illustrations of how faculty and administrators might conduct their own evaluations and reach their own judgments, rather than as ready-made guidelines or decisions for all contexts and purposes. The illustrations should further be viewed as exploratory rather than definitive, given the limited sample size and the rapidly evolving nature of AI technologies. Individual evaluations reflect the specific use cases and disciplinary perspectives of the evaluators and may not generalize across all educational contexts. Nevertheless, this structured evaluation process provides a foundation for more informed decision-making about AI implementation in teaching and learning environments. The framework itself provides a model that others may adapt and extend as they assess AI tools for their own educational contexts.

License


AI in Action: A SUNY FACT2 Guide to Optimizing AI in Higher Education Copyright © 2025 by SUNY FACT2 Task Group on AI in Action; Kati Ahern; Nicola Marae Allain; Abigail Bechtel; Angie Chung; Billie Franchini; Meghanne Freivald; Ken Fujiuchi; Dana Gavin; Jack Harris; Keith Landa; Alla Myzelev; Victoria Pilato; Ahmad Pratama; Russell V. Rittenhouse; Carrie Solomon; Angela C. Thering; and Shyam Sharma is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.