User2Code2vec: Embeddings for profiling students based on distributional representations of source code

David Azcona, Ihan Hsiao, Piyush Arora, Alan Smeaton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations

Abstract

In this work, we propose a new methodology to profile individual students of computer science based on their programming design using a technique called embeddings. We investigate different approaches to analyze user source code submissions in the Python language. We compare the performances of different source code vectorization techniques to predict the correctness of a code submission. In addition, we propose a new mechanism to represent students based on their code submissions for a given set of laboratory tasks on a particular course. This way, we can make deeper recommendations for programming solutions and pathways to support student learning and progression in computer programming modules effectively at a Higher Education Institution. Recent work using Deep Learning tends to work better when more and more data is provided. However, in Learning Analytics, the number of students in a course is an unavoidable limit. Thus we cannot simply generate more data as is done in other domains such as FinTech or Social Network Analysis. Our findings indicate there is a need to learn and develop better mechanisms to extract and learn effective data features from students so as to analyze the students' progression and performance effectively.
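The abstract mentions two steps: turning each code submission into a vector, and representing a student by their submissions across a set of laboratory tasks. The sketch below illustrates that general idea only. It is not the authors' method: it substitutes plain token counts for learned embeddings, and every name in it (`token_counts`, `vectorize`, `student_profile`, the task names, and the sample submissions) is hypothetical.

```python
import io
import tokenize
from collections import Counter


def token_counts(source):
    """Count the name and operator tokens in a piece of Python source."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords and identifiers are NAME tokens; punctuation is OP.
        if tok.type in (tokenize.NAME, tokenize.OP):
            counts[tok.string] += 1
    return counts


def vectorize(source, vocab):
    """Map one submission to a fixed-length count vector over `vocab`."""
    counts = token_counts(source)
    return [float(counts[token]) for token in vocab]


def student_profile(submissions, tasks, vocab):
    """Concatenate per-task vectors in a fixed task order.

    A task the student did not submit contributes a block of zeros,
    so every profile has the same length.
    """
    profile = []
    for task in tasks:
        source = submissions.get(task)
        if source is None:
            profile.extend([0.0] * len(vocab))
        else:
            profile.extend(vectorize(source, vocab))
    return profile


# Hypothetical submissions, keyed by hypothetical task names.
tasks = ["task1", "task2"]
alice = {
    "task1": "print(1 + 2)\n",
    "task2": "for i in range(3):\n    print(i)\n",
}
bob = {"task1": "x = 1 + 2\nprint(x)\n"}

# Build the vocabulary from every token seen in any submission.
all_sources = list(alice.values()) + list(bob.values())
vocab = sorted(set().union(*(token_counts(s) for s in all_sources)))

alice_profile = student_profile(alice, tasks, vocab)
bob_profile = student_profile(bob, tasks, vocab)
```

Both profiles have length `len(tasks) * len(vocab)`, so they can be compared directly or passed to a classifier. A learned embedding would replace `vectorize` while leaving the per-student aggregation unchanged.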

Original language: English (US)
Title of host publication: Proceedings of the 9th International Conference on Learning Analytics and Knowledge
Subtitle of host publication: Learning Analytics to Promote Inclusion and Success, LAK 2019
Publisher: Association for Computing Machinery
Pages: 86-95
Number of pages: 10
ISBN (Electronic): 9781450362566
State: Published - Mar 4 2019
Event: 9th International Conference on Learning Analytics and Knowledge, LAK 2019 - Tempe, United States
Duration: Mar 4 2019 → Mar 8 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 9th International Conference on Learning Analytics and Knowledge, LAK 2019
Country/Territory: United States
City: Tempe
Period: 3/4/19 → 3/8/19

Keywords

  • Code embeddings
  • Code2vec
  • Computer science education
  • Distributed representations
  • Machine learning
  • Representation learning for source code
  • User2code2vec

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
