COMP 599-02 Topics in AI for Robotics and Intelligent Systems
Overview
This new course explores selected topics at the intersection of AI and Robotics, especially intelligent exploration and information acquisition, and how large language models (AKA foundation models) can be used to make robotics more flexible and effective. The course does not presume the student has substantial exposure to robotics, but comfort with basic AI-relevant algorithms (such as A-star search) is expected. About half the course will be devoted to traditional lecture-style presentations, and about half will be devoted to the discussion of recent papers from the research literature. Due to the seminar-style format, student involvement will be important; some coaching will be included for students who are uncomfortable speaking up. Since this course is open to both undergraduate and graduate students, some key aspects of course structure will only be finalized during the first week, when the class composition can be observed.
Teaching Staff
Textbook
Supplementary materials (these excellent books may not get used in this course)
Detailed Lecture Schedule
To be finalized after the first lecture and before the end of the first week.

# | Date | Lecture | Topics | Supplements | References | Slides |
---|---|---|---|---|---|---|
Introduction | Motivation, logistics, scope of the field and the course, sense-plan-act paradigm. | Slides: Intro (part 1 of 2) [history] (part 2 of 2) | ||||
The diversity of robotics and how LLMs change the game | Hard and easy problems in robotics. Analytic and ML based alternatives. | |||||
Intro to Planning | Properties, definitions, deterministic methods. | Planning part 1 and Probabilistic planning and RRT | ||||
Planning 2 | Planning algorithms and procedures. | LaValle Ch. 4; Dudek & Jenkin Ch. 6 | Planning part 1 and Probabilistic planning and RRT | |||
Configuration space and topology | Nature and topology of C-space classes | |||||
Configuration space and topology (2) | Homotopy classes and complexity | |||||
Report from marine (sea) trials. Introduction to classical robot control | State space, PID control and Kalman filters | |||||
Deep learning and transformers. | Chris Manning on Backprop and Neural Networks; Backprop and Neural Networks; How to fine-tune a pretrained model | |||||
Large Language Models meet robotics | Introduction to foundation models. Also: how to give presentations. | |||||
Reinforcement learning 1 | Chapter from Jurafsky on Markov models | |||||
Reinforcement learning 2 | ||||||
Student presentations and paper discussions, group 1. | ||||||
Student presentations and paper discussions, group 1. | ||||||
Student presentations and paper discussions, group 1. | ||||||
Multi-modality in foundation models. | ||||||
Introduction to classical robot control part 2 | State space, PID control and Kalman filters | |||||
Mid-term exam | ||||||
Student presentations and paper discussions, group 2. | ||||||
Student presentations and paper discussions. | ||||||
Student presentations and paper discussions. | ||||||
Student presentations and paper discussions. | ||||||
Student presentations and paper discussions. | ||||||
Student presentations and paper discussions. | ||||||
Deployment on physical robots. Ethical issues. New developments in the field. | ||||||
Final project presentations | ||||||
Course recap and conclusion |
Assignments and course evaluation(s)
To be finalized after the first lecture and before the end of the first week. The tentative plan is as follows:
- Two assignments.
- Course Project.
- Possible final exam (potentially an oral exam, pending discussion during lecture one).
Syllabus for Fall 2023
Discussion papers
Group 1 papers
A group of papers on interesting and influential aspects of robot activity modulation from a pragmatic point of view.
- Edwin first Task Planning for Long-Horizon Cooking Tasks Based on Large Language Models web link, may require McGill VPN
- Quinn Cao (Feb 19) ChatGPT for Robotics: Design Principles and Model Abilities web link
- PhotoBot: Reference-Guided Interactive Photography via Natural Language web link
- Novak Feb 19 ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence web link
- Emma ChatMap: A Wearable Platform Based on the Multi-modal Foundation Model to Augment Spatial Cognition for People with Blindness and Low Vision web link, may require McGill VPN
- SocialED: A Python Library for Social Event Detection web link
- Adam ALFRED and ALFWorld: ALFRED came first, and ALFWorld builds on it.
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots web link
Group 2 papers
A broad group of cool and exciting papers.
- Reinforcement learning from human feedback (RLHF).
- William CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments web link
- Daniel Barroso (March 24th) Generative AI Agents in Autonomous Machines: A Safety Perspective web link
- Mikhail Attention Is All You Need https://arxiv.org/abs/1706.03762 NeurIPS link
- Titans: Learning to Memorize at Test Time
- Steve DeepSeek LLM: Scaling Open-Source Language Models with Longtermism web link
- Neural Machine Translation by Jointly Learning to Align and Translate from https://arxiv.org/abs/1409.0473
- Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey from papers with code
- Edicho: Consistent Image Editing in the Wild. Uses world knowledge, such as a robot might need, to do LLM-based editing; from papers with code
- Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly web link
- CogAgent: A Visual Language Model for GUI Agents web link
- ROBIO: Empowering Robot Path Planning with Large Language Models: osmAG Map Topology & Hierarchy Comprehension with LLMs web link
- Lancelot Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics web link
- Abstractive Summarization of YouTube Videos Using LaMini-Flan-T5 LLM https://ieeexplore.ieee.org/document/10690747
- Scaling Instruction-Finetuned Language Models (about Flan-T5) web link, may require McGill VPN also described at This hugging Face page
- ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots web link, may require McGill VPN
- Yiwei Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving web link, may require McGill VPN
Related paper: The Waymo Open Sim Agents Challenge
- DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences web link, may require McGill VPN
- Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios? web link, may require McGill VPN
- Behavior-Actor: Behavioral Decomposition and Efficient-Training for Robotic Manipulation web link, may require McGill VPN
- Sebastian TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation web link, may require McGill VPN
Related paper: Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
- Mariana NARRATE: Versatile Language Architecture for Optimal Control in Robotics web link, may require McGill VPN
- Majid Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models web link, may require McGill VPN
- Zuhayr Mahmood Fly by Book: How to Train a Humanoid Robot to Fly an Airplane using Large Language Models web link, may require McGill VPN
- Sequential Discrete Action Selection via Blocking Conditions and Resolutions web link, may require McGill VPN
- Weichen SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models web link, may require McGill VPN
- Commonsense Scene Graph-based Target Localization for Object Search web link, may require McGill VPN
- Tancred GRID: Scene-Graph-based Instruction-driven Robotic Task Planning web link, may require McGill VPN
- Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection paper
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions paper and supplementary data
- Mourad RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control paper and supplementary data
Please share your presentations
Use this link: https://www.cim.mcgill.ca/~dudek/599/upload.html
Presentation schedule (from March 17 onwards, completed presentations not shown)
- March 17:
  - Yiwei
  - Sebastian
- March 19:
  - William
  - Alexandre
  - Adam Friedman
- March 24:
  - Alex (wrap up from prior class)
  - Zuhayr
  - Marianna
  - Daniel
- March 26:
  - Raina
  - Tanc
  - Weichen
- March 31:
  - Howard
  - Majid
  - Murad
Additional materials
Project
- https://askforalfred.com/
- https://alfworld.github.io/
- https://github.com/alfworld/alfworld
- https://arxiv.org/pdf/1912.01734
- https://arxiv.org/pdf/2010.03768
Evaluation
The details of the course evaluation scheme and the format of some classes will depend on the enrollment and hence will not be fixed until after the first lecture (based on attendance and student mix in the first lecture). Evaluation will be based on three types of activity: class participation, independent work (homework/project), and a possible in-class formal presentation. Based on substantial enrollment in 2017, the in-class presentations may not be possible.
The evaluation for the course is to be based on a combination of assignment, exam, midterm and other elements as discussed in class and as posted.
Technicalities to note
Senate on January 29, 2003 approved the following resolution on academic integrity, which requires that a reminder to students be printed on every course outline:
Whereas, McGill University values academic integrity; Whereas, every term, there are new students who register for the first time at McGill and who need to be informed about academic integrity; Whereas, it is beneficial to remind returning students about academic integrity;
Be it resolved that instructors include the following statement on all course outlines:
McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/integrity for more information).
Be it further resolved that failure by an instructor to include a statement about academic integrity on a course outline shall not constitute an excuse by a student for violating the Code of Student Conduct and Disciplinary Procedures.
dudek@cim.mcgill.ca