This directory contains detailed documentation on various tools that can be used to manipulate the source files available in the Project CodeNet dataset.
It also contains working documents on proposals and ideas for tools, formats, and applications.
HSQLDB is a simple but complete database system that can use CSV files as persistent storage. This document describes how to convert the Project CodeNet metadata for use with HSQLDB.
srcML is a tool for the analysis of programming language source code. This document describes typical use cases.
It is possible to obtain a token (class) stream from a source code file such that the result is still syntactically correct. This document shows how.
Some thoughts and a proposal to normalize the token classes for various programming languages are presented here.
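
As a rough illustration of the token (class) stream idea mentioned above, the sketch below replaces every identifier, number, and string literal in a Python snippet with a fixed representative of its token class and then checks that the result still parses. This is not the tool described in the linked document; it is a minimal sketch that assumes Python's standard `tokenize` and `ast` modules, and the names `token_classes` and `is_parsable` as well as the placeholders `ID`, `0`, and `""` are hypothetical choices made for this example.

```python
import ast
import io
import keyword
import tokenize


def token_classes(source: str) -> str:
    """Replace identifiers, numbers and strings by a class placeholder.

    Keywords, operators and layout tokens are kept as they are, so the
    overall shape of the program is preserved.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append((tok.type, "ID"))      # hypothetical identifier placeholder
        elif tok.type == tokenize.NUMBER:
            out.append((tok.type, "0"))       # hypothetical number placeholder
        elif tok.type == tokenize.STRING:
            out.append((tok.type, '""'))      # hypothetical string placeholder
        else:
            out.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize rebuilds source text with its
    # own spacing; the layout differs from the input but the tokens match.
    return tokenize.untokenize(out)


def is_parsable(source: str) -> bool:
    """Return True if the text parses under the Python grammar."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


sample = (
    "total = 0\n"
    "for item in range(10):\n"
    "    total += item * 2\n"
    'print("sum", total)\n'
)
normalized = token_classes(sample)
print(normalized)
print(is_parsable(normalized))
```

Note that this only checks that the output parses under the grammar. Collapsing all identifiers into one placeholder can still trigger later checks, for example duplicate parameter names in a function definition, so a real tool may need a more careful choice of representatives.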