Enhancing programming competency of Computer Science students by comparing student solutions with GenAI outputs
- Béibhinn Nic Ruairí studied Computer Science at Trinity College Dublin, completing a capstone project in 2024 on the topic of “Inner Feedback for Novice Programmers through Comparisons with AI-Generated Solutions.”
- Dr Jonathan Dukes is a lecturer in Computer Science at Trinity College Dublin. He is a past recipient of a Trinity Excellence in Teaching Award and is interested in applications of generative AI in computer science education and the effects of generative AI on learning and assessment of novice programmers.
Context
This showcase explores the use of generative AI to support a learning activity in introductory computer programming courses. In the activity, learners complete a programming task, compare their own solution with AI-generated solutions, and then score both their own solution and the AI-generated solutions using a rubric provided by an instructor.
First-year computer science students at Trinity College Dublin were invited to trial the activity using a prototype web-based tool and provide feedback. This small-scale trial with eight students took place in March 2024. The trial was conducted independently of any specific module but was designed to align with familiar tasks from a first-year “Introduction to Programming” module and used the Java programming language.
What was your goal in utilising GenAI as part of the teaching process?
The primary goal was to design an activity that would encourage learners to reflect on their own solution to a programming task and, more broadly, their competency with computer programming, through comparisons with alternative solutions. These alternative solutions should differ from each other with respect to qualities or features of interest (e.g. efficiency, complexity, use of specific programming paradigms).
The code comparison activity was motivated and influenced by the work of Nicol (2021) on comparison processes and “internal feedback”, defined by Nicol as “new knowledge that students generate when they compare their current knowledge and competence against some reference information”.
A secondary goal of the activity was to expose novice programmers to the capabilities of generative AI coding assistants, such as GitHub Copilot (GitHub, 2024), as well as their deficiencies. For this trial study, true deficiencies were masked and artificial deficiencies were introduced through prompt engineering.
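As an illustration only (the prompts used in the trial are not reproduced here), an instruction along the following lines could be used to introduce an artificial deficiency, such as a deliberately inefficient approach; the constant name and wording below are hypothetical:

```java
// Hypothetical prompt fragment (not the study's actual prompt) showing how an
// "artificial deficiency" might be requested through prompt engineering.
static final String DEFICIENCY_INSTRUCTION = """
    Produce a correct Java solution to the task, but deliberately make it less
    efficient than necessary (for example, recompute a value inside a loop
    instead of storing it). Return only the Java code.
    """;
```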
How did you use GenAI to enhance teaching, learning and/or assessment?
A web-based prototype tool was developed to support the programming comparison activity. The comparison tool asked learners to provide their own solution for a programming task and then used generative AI to create two alternative solutions, which were presented side-by-side with the learner’s solution for easy comparison.
After comparing their own solution with each of the two alternative solutions, learners were asked to score their work using a rubric and write two short statements (2–3 lines each) describing (i) the differences between their own work and the alternative solutions and (ii) what they learned and what they might do differently given a similar task. These reflective statements have been identified as a critical step in comparison activities (Nicol, 2021).
The OpenAI gpt-3.5-turbo large language model (OpenAI, 2022) was used to dynamically generate alternative solutions. Prompt engineering was used to “encourage” the generation of diverse solutions with specific features. The generated solutions were tested for correctness so learners would not see programs that did not function correctly (or did not function at all). Working solutions were then evaluated with respect to qualitative measures, such as similarity and complexity, and two dissimilar solutions were selected and presented to the learner through the web interface.
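The generation and selection pipeline is specific to the prototype, but the selection step can be sketched as follows. This is a minimal, illustrative sketch, assuming a list of candidate solutions that have already passed correctness tests; the token-overlap similarity measure and the class and method names below are stand-ins, not the measures or code used in the prototype.

```java
import java.util.List;

/**
 * Illustrative sketch: given AI-generated candidate solutions that have already
 * passed correctness tests, choose the two that differ most from each other so
 * that the learner is shown genuinely contrasting approaches.
 */
public class SolutionSelector {

    /** Crude token-overlap similarity in [0, 1]; the prototype's actual measure is not described here. */
    static double similarity(String a, String b) {
        List<String> tokensA = List.of(a.trim().split("\\s+"));
        List<String> tokensB = List.of(b.trim().split("\\s+"));
        long shared = tokensA.stream().filter(tokensB::contains).count();
        return (2.0 * shared) / (tokensA.size() + tokensB.size());
    }

    /** Return the pair of working candidates with the lowest mutual similarity. */
    static String[] selectDissimilarPair(List<String> workingCandidates) {
        String[] best = null;
        double lowest = Double.MAX_VALUE;
        for (int i = 0; i < workingCandidates.size(); i++) {
            for (int j = i + 1; j < workingCandidates.size(); j++) {
                double score = similarity(workingCandidates.get(i), workingCandidates.get(j));
                if (score < lowest) {
                    lowest = score;
                    best = new String[] { workingCandidates.get(i), workingCandidates.get(j) };
                }
            }
        }
        return best; // null if fewer than two working candidates are available
    }
}
```

Whatever measures are used, the key design choice is the one described above: only working solutions are considered, and dissimilarity between the final pair is what makes the comparison instructive.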
Potential learning outcomes: comprehend computer programs, including AI-generated programs; distinguish between alternative approaches to a programming task; and judge the relative merits of one approach over another.
What were the outcomes of using GenAI in this way?
A prototype code comparison learning activity was trialled by a small number of students (8), and a more rigorous study is required. All learners in the trial either agreed (2) or strongly agreed (6) that comparing their own solution with alternative solutions was a good exercise when learning how to program. When asked whether the differences between the two AI-generated solutions were useful for understanding alternative approaches to solving a problem, 7 out of 8 students agreed (6) or strongly agreed (1). When asked whether they would make use of the comparison tool in an introductory programming course, 5 students either agreed (4) or strongly agreed (1).
What did you learn as part of this process and is there anything you would do differently?
This small-scale trial has served to test a prototype code comparison activity tool and points to the need for a more in-depth study. Such a study should seek to significantly increase the number of participants and employ the code comparison exercise in the context of multiple programming tasks at key stages in the delivery of a module.
There is scope to enhance the generation of alternative solutions in several ways: for example, by refining the prompt used to generate them, by identifying and reusing good (instructive) solutions, or by using learner feedback to improve the future selection of good alternatives.
On the technical side, the prototype might be enhanced to improve usability (lowering barriers to use), for example by integrating the tool with assessment management platforms that learners are already using.
Finally, it is worth considering whether the code comparison activity could be used as the basis of a formal assessment activity.