Research
The goal of my research is to enable robots that can operate in unstructured, real-world environments. Towards this goal, I study how robots can generalize effectively across tasks, objects, and environments by learning from large datasets. Specifically, my research focuses on methods for:
- Scalably [1, 2] and safely [3] collecting real-world robot datasets.
- Self-supervised learning of visual models from offline data [1,2,3] and using them for robotic manipulation [4,5,6,7].
- Leveraging human video datasets [1,3] and natural language annotations [2,3] to enable better robot learning.
|
Recent Talks (April 2022 @ Nuro)
|
Publications & Preprints (Highlighted Papers)
|
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
Robotics: Science and Systems (RSS), 2023
Best Paper Award Finalist
project page /
code
We present Voltron, a multi-modal foundation model for robotics trained on human videos and language to produce reusable representations and rewards. We train a single model with many downstream capabilities, from features for control to expression grounding and reward/intent inference.
|
|
Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets
Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn
Robotics: Science and Systems (RSS), 2023
project page /
code
Often, robot data isn't shared across projects. We present a new way that past project data can be used to improve downstream learning. We use a learned model to select relevant data from a large dataset of robot interactions, which augments a small set of task demonstrations for use in a behavior cloning algorithm for more efficient learning.
|
|
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
Conference on Robot Learning (CoRL), 2022
ICRA Scaling Robot Learning Workshop, 2022 (Best Paper Award)
project page /
code
We pre-train a generalizable visual representation on diverse human videos and language, and show it enables far more efficient learning across a wide range of robotic manipulation tasks.
|
|
Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning
Maximilian Du*, Olivia Y. Lee*, Suraj Nair, and Chelsea Finn
Robotics: Science and Systems (RSS), 2022
project page /
code /
press
We propose a method to enable robots to tackle challenging visually occluded manipulation tasks (like extracting keys from a bag) via end-to-end interactive imitation learning from vision and sound.
|
|
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation
Suraj Nair, Eric Mitchell, Kevin Chen, Brian Ichter, Silvio Savarese, Chelsea Finn
Conference on Robot Learning (CoRL), 2021
project page /
code
We learn language-conditioned visuomotor skills on real robots from entirely offline, pre-collected datasets and crowdsourced language annotation.
|
|
Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks
Bohan Wu, Suraj Nair, Li Fei-Fei*, Chelsea Finn*
Conference on Robot Learning (CoRL), 2021
project page
EMBR is a model-based RL algorithm that learns visuomotor skills and their groundings, which can then be sequenced with symbolic planners to complete long-horizon, multi-stage manipulation tasks on real robots.
|
|
FitVid: Overfitting in Pixel-Level Video Prediction
Mohammad Babaeizadeh, Mohammad Taghi Saffar, Suraj Nair, Sergey Levine, Chelsea Finn, Dumitru Erhan
Arxiv Preprint, 2021
project page /
code
We propose a variational video prediction model that is capable of severe overfitting on common video prediction benchmarks while having a parameter count similar to that of current SOTA models.
|
|
Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Annie S. Chen, Suraj Nair, Chelsea Finn
Robotics: Science and Systems (RSS), 2021
ICLR Workshop on Self-Supervised Reinforcement Learning, 2021 (Oral)
project page
We propose a technique for learning multi-task reward functions from a small amount of robot data and large amounts of in-the-wild human videos. By leveraging diverse human data, the learned reward function is able to generalize to new environments and tasks.
|
|
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu, Suraj Nair, Roberto Martin-Martin, Li Fei-Fei*, Chelsea Finn*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
project page
We propose a technique for video prediction which trains a hierarchy of action-conditioned VAEs in a greedy fashion, enabling efficient training of large video prediction models.
|
|
Model-Based Visual Planning with Self-Supervised Functional Distances
Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
International Conference on Learning Representations (ICLR), 2021 (Spotlight)
project page
We propose a method for offline model-based RL which learns a video prediction model and a Q-function-based distance metric, and uses them to accomplish visually specified goals.
|
|
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Annie S. Chen*, HyunJi Nam*, Suraj Nair*, Chelsea Finn
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page /
code
We propose a framework for leveraging weak human supervision to enable better robotic exploration. Using just a few minutes of human supervision, the robot collects high-quality data while unsupervised, providing better data for downstream offline RL.
|
|
Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
Brijen Thananjeyan*, Ashwin Balakrishna*, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg
Robotics and Automation Letters (RA-L) and International Conference on Robotics and Automation (ICRA), 2021
project page
An algorithm for safe reinforcement learning that uses a set of offline data to learn about constraints before policy learning, together with a pair of policies that separate the often conflicting objectives of task-directed exploration and constraint satisfaction, to learn contact-rich and visuomotor control tasks.
|
|
Goal-Aware Prediction: Learning to Model what Matters
Suraj Nair, Silvio Savarese, Chelsea Finn
International Conference on Machine Learning (ICML), 2020
project page /
code
We explore learning visual dynamics models which are conditioned on goals, and learn to model only goal-relevant quantities.
|
|
Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
Suraj Nair, Chelsea Finn
International Conference on Learning Representations (ICLR), 2020
project page /
code
We study how we can learn long-horizon vision-based tasks in self-supervised settings. Our approach, hierarchical visual foresight, can optimize for a sequence of subgoals that break down the task into easy-to-complete subsegments.
|
|
Time Reversal as Self-Supervision
Suraj Nair, Mohammad Babaeizadeh, Chelsea Finn, Sergey Levine, Vikash Kumar
International Conference on Robotics and Automation (ICRA), 2020
project page /
press
We propose a technique that uses time-reversal to learn goals and provide a high-level plan to reach them. In particular, our approach explores outward from a set of goal states, "unsolving" a task, which then enables solving the task from new initializations at test time.
|
|
Causal Induction from Visual Observations for Goal-Directed Tasks
Suraj Nair, Yuke Zhu, Silvio Savarese, Li Fei-Fei
Workshop on Causal Machine Learning, NeurIPS, 2019
project page /
code
We explore how to effectively predict causal graphs from a small set of visual observations, and how to incorporate the learned graphs into downstream goal-conditioned policy learning.
|
|
RoboNet: Large-Scale Multi-Robot Learning
Sudeep Dasari, Frederik Ebert, Stephen Tian, Suraj Nair, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Sergey Levine, Chelsea Finn
Conference on Robot Learning (CoRL), 2019
project page /
code /
press
We collect a dataset of robotic experience across 4 institutions and 7 robots, and demonstrate that robot learning algorithms leveraging this data can adapt to new environments faster than training from scratch.
|
|
Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
De-An Huang*, Suraj Nair*, Danfei Xu*, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
NTG learns to produce a task graph from a single video demonstration of an unseen task, and leverages it for one-shot imitation learning.
|
|
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
Danfei Xu*, Suraj Nair*, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese
International Conference on Robotics and Automation (ICRA), 2018
project page /
code /
Two Minute Papers
Neural Task Programming (NTP) is a meta-learning framework that learns to generate robot-executable neural programs from task demonstration video.
|
|
Reliable Real-Time Seismic Signal/Noise Discrimination With Machine Learning
Men-Andrin Meier, Zachary E Ross, Anshul Ramachandran, Ashwin Balakrishna, Suraj Nair, Peter Kundzicz, Zefeng Li, Jennifer Andrews, Egill Hauksson, Yisong Yue
Journal of Geophysical Research: Solid Earth, 2019
Efficient discrimination of real local earthquake signals from impulsive noise signals for earthquake early warning (EEW) alerts.
|
|
Annotated Reconstruction of 3D Spaces Using Drones
Suraj Nair, Anshul Ramachandran, Peter Kundzicz
MIT Undergraduate Research in Technology Conference (URTC), 2017 (Best Paper Presentation)
We reconstruct 3D voxel representations of a scene with object labels from RGB images captured from a drone, and use them for exploratory motion planning.
|
Teaching Assistant: Stanford CS 330 [2019, 2020], Deep Multi-Task and Meta Learning
Teaching Assistant: Caltech CS/EE 155 [2017], Machine Learning/Data Mining
Teaching Assistant: Caltech CS 121 [2016], Introduction to Relational Databases
Reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, CoRL, ICRA, IROS
|