TY - GEN
T1 - User2Code2vec
T2 - 9th International Conference on Learning Analytics and Knowledge, LAK 2019
AU - Azcona, David
AU - Hsiao, Ihan
AU - Arora, Piyush
AU - Smeaton, Alan
N1 - Funding Information:
This research was supported by the Irish Research Council in association with the National Forum for the Enhancement of Teaching and Learning in Ireland under project number GOIPG/2015/3497, by Science Foundation Ireland under grant numbers 12/RC/2289 and 13/RC/2106, and by Fulbright Ireland. The authors are indebted to Dr. Stephen Blott who developed the programming grading platform.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/3/4
Y1 - 2019/3/4
AB - In this work, we propose a new methodology to profile individual students of computer science based on their programming design using a technique called embeddings. We investigate different approaches to analyze user source code submissions in the Python language. We compare the performances of different source code vectorization techniques to predict the correctness of a code submission. In addition, we propose a new mechanism to represent students based on their code submissions for a given set of laboratory tasks on a particular course. This way, we can make deeper recommendations for programming solutions and pathways to support student learning and progression in computer programming modules effectively at a Higher Education Institution. Recent work using Deep Learning tends to work better when more and more data is provided. However, in Learning Analytics, the number of students in a course is an unavoidable limit. Thus we cannot simply generate more data as is done in other domains such as FinTech or Social Network Analysis. Our findings indicate there is a need to learn and develop better mechanisms to extract and learn effective data features from students so as to analyze the students' progression and performance effectively.
KW - Code embeddings
KW - Code2vec
KW - Computer science education
KW - Distributed representations
KW - Machine learning
KW - Representation learning for source code
KW - User2code2vec
UR - http://www.scopus.com/inward/record.url?scp=85062775545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062775545&partnerID=8YFLogxK
U2 - 10.1145/3303772.3303813
DO - 10.1145/3303772.3303813
M3 - Conference contribution
AN - SCOPUS:85062775545
T3 - ACM International Conference Proceeding Series
SP - 86
EP - 95
BT - Proceedings of the 9th International Conference on Learning Analytics and Knowledge
PB - Association for Computing Machinery
Y2 - 4 March 2019 through 8 March 2019
ER -