Towards programmatic reinforcement learning: the case of deterministic gridworlds
Starting from a programmatic representation of a Markov decision process (MDP) in the PRISM syntax, we examine the task of synthesizing a policy for the MDP in the form of a program. The PRISM syntax allows us to specify MDPs concisely by partitioning the state space into regions with similar actions and transitions. In this work, we restrict ourselves to a subclass of two-dimensional deterministic gridworlds partitioned into regions along linear predicates. We present an algorithm to synthesize programmatic policies that achieve reachability objectives and exploit the symmetries present in the specification of the MDP. Our programs use memory to track subgoals and navigate between the edges of regions, which yields a concise representation of a policy. Our main result is a proof of an upper bound on the size of the synthesized programs. We also provide an implementation of our synthesis algorithm and evaluate it on randomly generated gridworld instances.
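To make the setting concrete, here is a minimal illustrative sketch in Python. It is neither PRISM syntax nor the synthesis algorithm of this work: the grid size, the two regions split along the hypothetical linear predicate x < 5, the action names, the target cell, and the hand-written policy are all assumptions chosen for illustration. It only shows what a deterministic gridworld with per-region actions and a small memory-based policy program could look like.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]
SIZE = 10                # hypothetical 10x10 grid, coordinates 0..9
TARGET: State = (8, 2)   # hypothetical reachability target


@dataclass
class Region:
    name: str
    contains: Callable[[int, int], bool]   # linear predicate over (x, y)
    moves: Dict[str, Tuple[int, int]]      # action -> deterministic displacement


# Two regions separated by the linear predicate x < 5 (an assumption).
REGIONS: List[Region] = [
    Region("left", lambda x, y: x < 5, {"east": (1, 0), "north": (0, 1)}),
    Region("right", lambda x, y: x >= 5,
           {"east": (1, 0), "north": (0, 1), "south": (0, -1)}),
]


def step(state: State, action: str) -> State:
    """Apply the displacement of the region containing `state`."""
    x, y = state
    region = next(r for r in REGIONS if r.contains(x, y))
    dx, dy = region.moves[action]
    nx, ny = x + dx, y + dy
    # Moves that would leave the grid keep the agent in place.
    return (nx, ny) if 0 <= nx < SIZE and 0 <= ny < SIZE else state


def policy(state: State, crossed: bool) -> Tuple[str, bool]:
    """Hand-written policy program; `crossed` is its one bit of memory.

    Subgoal 1: reach the edge between the regions (x >= 5).
    Subgoal 2: reach TARGET (assumes the agent never overshoots x,
    since the "right" region has no westward action in this sketch).
    """
    x, y = state
    crossed = crossed or x >= 5
    if not crossed:
        return "east", crossed
    tx, ty = TARGET
    if x < tx:
        return "east", crossed
    return ("north" if y < ty else "south"), crossed


def run(start: State, max_steps: int = 100) -> List[State]:
    """Execute the policy from `start` and return the visited states."""
    path = [start]
    crossed = False
    while path[-1] != TARGET and len(path) <= max_steps:
        action, crossed = policy(path[-1], crossed)
        path.append(step(path[-1], action))
    return path
```

Calling `run((0, 6))` walks east to the region boundary, continues east to the target column, and then moves south to `TARGET`. The policy stays short because it reasons about regions and their edges rather than listing an action for every cell.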
Wed 17 Jan (displayed time zone: London)
18:15 - 20:00
Time | Duration | Format | Title | Track | Authors
18:15 | 4m | Poster | A Denotational Approach to Release/Acquire Concurrency | Student Research Competition | Yotam Dvir (Tel Aviv University)
18:19 | 4m | Talk | A Lean Formalization of Cedar | Student Research Competition | Bhakti Shah (University of Chicago)
18:24 | 4m | Talk | A Substructural Type and Effect System | Student Research Competition | Orpheas van Rooij (Radboud University)
18:29 | 4m | Poster | A type-safe generalized editor calculus (Extended Abstract) | Student Research Competition | Nikolaj Rossander Kristensen (Department of Computer Science, Aalborg University), Benjamin Bennetzen (Department of Computer Science, Aalborg University), Peter Buus Steffensen (Department of Computer Science, Aalborg University), Andreas Tor Mortensen (Department of Computer Science, Aalborg University)
18:34 | 4m | Poster | Compilation Quotient (CQ): A Metric for the Compilation Hardness of Programming Languages | Student Research Competition | Vince Szabó (Delft University of Technology)
18:38 | 4m | Poster | Compositional Programming with Full Iso-recursive Types | Student Research Competition | Litao Zhou (Shanghai Jiao Tong University; University of Hong Kong)
18:43 | 4m | Poster | Differential Privacy in an Impure World | Student Research Competition | Damián Arquez (University of Chile)
18:48 | 4m | Poster | Effect handlers in Zig (extended abstract) | Student Research Competition | Alessio Duè (University of Pisa)
18:53 | 4m | Talk | Efficient Incremental Computation for Halide | Student Research Competition | Tyler Hou (University of California, Berkeley)
18:57 | 4m | Talk | Embedding Pointful Array Programming in Python | Student Research Competition | Jakub Bachurski (University of Cambridge)
19:02 | 4m | Poster | Exploring the limitations of Contextual Modal Type Theory for Multi-Stage Programming | Student Research Competition | Theo Wang (University of Oxford)
19:07 | 4m | Poster | From Java to Kotlin with Contextual Equality Saturation | Student Research Competition | Alexandre Drewery (INRIA)
19:12 | 4m | Poster | GPU-Accelerated Synthesis of Boolean Circuits | Student Research Competition | Justin Du (University of California, San Diego), Rana Lulla (University of California San Diego), Melody Ruth (University of California San Diego)
19:17 | 4m | Talk | HOL4P4: A Heapless Small-Step Semantics and Type System for P4 | Student Research Competition | Anoud Alshnakat (KTH Royal Institute of Technology), Roberto Guanciale (KTH Royal Institute of Technology), Mads Dam (KTH)
19:21 | 4m | Poster | Linking Session-Typed Channels in Separation Logic | Student Research Competition | Thomas Somers (Radboud University)
19:26 | 4m | Poster | Optimization of a Gradual Verifier: Lazy evaluation of Iso-recursive Predicates as Equi-recursive at Runtime | Student Research Competition | Jan-Paul Ramos-Davila (Cornell University)
19:31 | 4m | Poster | Optimization of the Context-Free Language Reachability Matrix-Based Algorithm | Student Research Competition | Ilya Muravjov (Saint Petersburg State University)
19:36 | 4m | Poster | PiR (πr): Probabilistic Interpretation of Robustness | Student Research Competition | Abhinandan Pal (University of Birmingham)
19:40 | 4m | Talk | Session-Typed Effect Handlers | Student Research Competition | Wenhao Tang (University of Edinburgh)
19:45 | 4m | Poster | Tail: A Typed and Structured Document Editor | Student Research Competition | Alperen Keles (University of Maryland at College Park)
19:50 | 4m | Poster | Towards programmatic reinforcement learning: the case of deterministic gridworlds | Student Research Competition | Guruprerana Shabadi (École Polytechnique, Institut Polytechnique de Paris)
19:55 | 4m | Poster | Zero-Cost Capabilities: Retrofitting Effect Safety in Rust | Student Research Competition | George Berdovskiy (University of California, Davis)