On Grounded Planning for Embodied Tasks with Language Models

Bill Yuchen Lin, Chengsong Huang, Qian Liu, Wenda Gu, Sam Sommerer, Xiang Ren

August 29, 2022

arXiv

Language models (LMs) are shown to have commonsense knowledge of the physical world, which is fundamental for completing tasks in everyday situations. However, it is still an open question whether LMs have the ability to generate grounded, executable plans for embodied tasks. It is very challenging because LMs do not have an "eye" or "hand" to perceive the realistic environment. In this work, we show the first study on this important research question. We first present a novel problem formulation named G-PlanET, which takes as input a high-level goal and a table of objects in a specific environment. The expected output is a plan consisting of step-by-step instructions for agents to execute. To enable the study of this problem, we establish an evaluation protocol and devise a dedicated metric for assessing the quality of plans. In our extensive experiments, we show that adding flattened tables for encoding environments and using an iterative decoding strategy can both improve the LMs' ability for grounded planning. Our analysis of the results also leads to interesting non-trivial findings.

View Publication

Download

Other Publications

D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory

Tianbo Li, Min Lin, Zheyuan Hu, Kunhao Zheng, Giovannie Vignale, Kenji Kawaguchi, A.H. Castro Neto, Kotsya S. Novoselov, Shuicheng Yan

2023

ICLR 2023

deep learning generalization Quantum Physics

Kohn-Sham Density Functional Theory (KS-DFT) has been traditionally solved by the Self-Consistent Field (SCF) method. Behind the SCF loop is the physics intuition of solving a system of non-interactive single-electron wave functions under an effective potential. In this work, we propose a deep-learning approach to KS-DFT. First, in contrast to the conventional SCF loop, we propose directly minimizing the total energy by reparameterizing the orthogonal constraint as a feed-forward computation. We prove that such an approach has the same expressivity as the SCF method yet reduces the computational complexity from O(N4) to O(N3). Second, the numerical integration, which involves a summation over the quadrature grids, can be amortized to the optimization steps. At each step, stochastic gradient descent (SGD) is performed with a sampled minibatch of the grids. Extensive experiments are carried out to demonstrate the advantage of our approach in terms of efficiency and stability. In addition, we show that our approach enables us to explore more complex neural-based wave functions.

Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems

Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang

2022

arXiv

Computation and Language Information Retrieval Numerical Analysis

Numerical reasoning over natural language has been a long-standing goal for the research community. However, cutting-edge language models have proven difficult to reliably generalize to a broad range of numbers, although they have shown proficiency in reasoning over common and simple numbers. In this paper, we propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models using simple anchor numbers. Concretely, we first leverage simple numbers as anchors to probe the implicitly inferred arithmetic expressions from language models, and then explicitly apply the expressions on complex numbers to get corresponding answers. To inversely elicit arithmetic expressions, we transform and formulate the task as an analytically solvable linear system. Experimental results on several numerical reasoning benchmarks demonstrate that our approach significantly improves numerical reasoning capabilities of existing LMs. More importantly, our approach is training-free and simply works in the inference phase, making it highly portable and achieving consistent performance benefits across a variety of language models (GPT-3, T5, BART, etc) in all zero-shot, few-shot, and fine-tuning scenarios.