Research group

Machine Learning Methods in Software Engineering

Coding Assistant

Project supervisor: Timofey Bryksin
Status: Active

The project's goal is to analyze how students behave while solving diverse programming tasks and to build an assistant system based on solutions to previous tasks. The idea is to collapse all partial solutions to each problem into a single graph, called the solution space, and to find a suitable path through this graph in order to suggest next steps to subsequent students.
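As a rough illustration of the solution-space idea, the sketch below merges several students' snapshot sequences into one directed graph and picks the first step on a shortest path to a finished solution. It is a minimal sketch under stated assumptions, not the project's actual algorithm: the function names `build_solution_space` and `next_step`, the string state labels, and the use of breadth-first search as the path-selection criterion are all hypothetical.

```python
from collections import deque


def build_solution_space(traces):
    """Merge per-student snapshot sequences into one directed graph.

    Each trace is a list of code-state identifiers (assumption: any hashable
    canonical form of a snapshot). Identical states reached by different
    students collapse into a single node.
    """
    graph = {}
    for trace in traces:
        for state in trace:
            graph.setdefault(state, set())
        for current, following in zip(trace, trace[1:]):
            if current != following:
                graph[current].add(following)
    return graph


def next_step(graph, state, goals):
    """Return the first move on a shortest path from `state` to any goal.

    Returns None if `state` is already a goal, is unknown to the graph,
    or cannot reach a goal.
    """
    if state in goals or state not in graph:
        return None
    # Breadth-first search that remembers the first move taken from `state`.
    queue = deque((neighbour, neighbour) for neighbour in sorted(graph[state]))
    visited = {state}
    while queue:
        node, first_move = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        if node in goals:
            return first_move
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                queue.append((neighbour, first_move))
    return None


# Hypothetical traces: each label stands for one canonical code state.
traces = [
    ["empty", "read_input", "loop", "done"],
    ["empty", "read_input", "done"],
    ["empty", "print", "loop", "done"],
]
space = build_solution_space(traces)
hint = next_step(space, "empty", {"done"})
```

A shortest path is only one possible notion of a "proper" path. A real hint generator would likely also weigh how many students took each edge and how close a state is to the current student's code.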

To achieve this goal, we have developed a set of tools for collecting and processing students' activity during problem-solving. The first tool is a plugin for IntelliJ-based IDEs that captures code snapshots and IDE interaction events while code is being written, which allows us to analyze the programming process; the plugin currently supports Python, Java, Kotlin, and C++. The second tool post-processes the data collected by the plugin and is aimed at analysis and visualization.

To validate and showcase the toolkit, we have already gathered a small dataset. It describes in detail how 148 participants of different ages and levels of programming experience solved programming tasks in different languages. To publish the dataset, we need to anonymize it in accordance with our privacy policy, and we have developed a dedicated tool for this purpose.

We are currently working on a PyCharm plugin that unifies Python code by applying various transformations to the PSI (Program Structure Interface) tree, such as anonymizing variables and removing dead code. It will help us determine as accurately as possible whether syntactically different code fragments actually have the same semantics. We need this tool for an algorithm that generates hints (previously we implemented a prototype in Python, which can be found here), but it could be used elsewhere as well, for instance for semantic clone detection.
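To make the unification idea concrete, here is a minimal sketch of two such transformations, variable anonymization and removal of statements after a `return`. It uses Python's standard `ast` module as a stand-in for PSI, so it is not the plugin's implementation; the names `Unifier` and `canonical_form` are hypothetical, and the dead-code rule is deliberately simplistic (it only truncates a function body after a top-level `return` or `raise`).

```python
import ast
import builtins


class Unifier(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... and drop unreachable statements."""

    def __init__(self):
        self.names = {}

    def _anonymous(self, name):
        # Assumption: builtins such as `print` or `len` carry meaning, keep them.
        if hasattr(builtins, name):
            return name
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        # Truncate first, so names used only in dead code never get a number.
        for index, statement in enumerate(node.body):
            if isinstance(statement, (ast.Return, ast.Raise)):
                node.body = node.body[: index + 1]
                break
        node.name = self._anonymous(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._anonymous(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._anonymous(node.id)
        return node


def canonical_form(source):
    """Return the unified source text (ast.unparse needs Python 3.9+)."""
    tree = Unifier().visit(ast.parse(source))
    return ast.unparse(tree)


first = """
def total(items):
    result = 0
    for item in items:
        result += item
    return result
    print("never reached")
"""

second = """
def sum_all(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc
"""

third = """
def count(xs):
    return len(xs)
"""

same = canonical_form(first) == canonical_form(second)
different = canonical_form(first) != canonical_form(third)
```

Equal canonical forms only show that two fragments match after these particular transformations. Establishing true semantic equivalence would require a much richer set of rewrites than this sketch covers.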