Data Format
Each Karel task consists of a program P and a set of input-output (I-O) examples. Different combinations of the programs and examples can be used for different experiments. For example, for program synthesis, the I-O examples can be used to predict the program P, whereas for program induction, the program can be ignored and a subset of the examples can be used to predict the outputs for the inputs of the remaining examples.
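As a rough illustration of these two uses, the sketch below turns a single task record into a synthesis instance or an induction instance. The record structure follows the field names described in the next paragraph; the helper functions and the specification size of 5 are assumptions for illustration only, not part of the dataset definition.

```python
# Sketch only: `task` is assumed to be one parsed task record with the
# "program_tokens" and "examples" fields described below.

def synthesis_instance(task):
    # Program synthesis: all I-O examples are the input; the program is the target.
    return task["examples"], task["program_tokens"]

def induction_instance(task, num_spec=5):
    # Program induction: the program is ignored; a subset of the examples serves
    # as the specification, and outputs must be predicted for the held-out inputs.
    # The split size of 5 is an arbitrary choice for this sketch.
    spec = task["examples"][:num_spec]
    held_out = task["examples"][num_spec:]
    return spec, held_out
```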
The JSON representation of a Karel task includes two representations of the program P: 1) a linearized form of the Python program (“program_tokens”), and 2) a JSON representation of the program's AST (“program_json”). The “examples” field is a list of input-output examples, where each example consists of a JSON representation of the input and output grids (“inpgrid_json” and “outgrid_json”), their tensor representations (“inpgrid_tensor” and “outgrid_tensor”), and an “action” field that records the trace of the program P on that particular example.
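For concreteness, here is a minimal sketch of reading one task record and accessing the fields above. It assumes the tasks are stored as one JSON object per line; the file name is hypothetical, and the exact storage layout should be checked against the released files.

```python
import json

# Hypothetical file name; the actual layout of the released files may differ.
with open("train.json") as f:
    task = json.loads(f.readline())  # assuming one JSON task record per line

program_tokens = task["program_tokens"]  # linearized form of the Python program
program_ast = task["program_json"]       # JSON representation of the program's AST

for example in task["examples"]:
    inp_grid = example["inpgrid_json"]      # input grid, JSON representation
    out_grid = example["outgrid_json"]      # output grid, JSON representation
    inp_tensor = example["inpgrid_tensor"]  # tensor representation of the input grid
    out_tensor = example["outgrid_tensor"]  # tensor representation of the output grid
    trace = example["action"]               # trace of program P on this example
```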
The dataset consists of synthetically generated programs (train, test, and validation splits) as well as a challenge dataset derived from real-world Karel exercises for students.
References
More details about the network architecture and the Karel DSL can be found in the papers below:
Jacob Devlin, Rudy Bunel, Rishabh Singh, Matthew Hausknecht, and Pushmeet Kohli. Neural Program Meta-Induction. NIPS 2017.
Rudy Bunel, Matthew Hausknecht, Rishabh Singh, Jacob Devlin, Chris Piech, Rasool Fakoor, and Pushmeet Kohli. Program, not Prose: Leveraging Grammar for Neural Program Synthesis.