COMP 599-002 McGill

Overview

This new course explores selected topics at the intersection of AI and Robotics, especially intelligent exploration and information acquisition, and how large language models (AKA foundation models) can be used to make robotics more flexible and effective. The course does not presume the student has substantial exposure to robotics, but comfort with basic AI-relevant algorithms (such as A-star search) is expected. About half the course will be devoted to traditional lecture-style presentations, and about half will be devoted to the discussion of recent papers from the research literature. Due to the seminar-style format, student involvement will be important, and some coaching will be included for students who are uncomfortable speaking up. Since this course is open to both undergraduate and graduate students, some key aspects of course structure will only be finalized during the first week, when the class composition can be observed.

Lecture location: McConnell Engineering MC103, Monday and Wednesday 10:00-11:30am

Teaching Staff

Instructor: Professor Gregory Dudek (gregory.dudek@mcgill.ca)

MC103

Office Hours: Monday/Wednesday 11:30-12:30 (not offered on days when there is a guest lecturer)

Teaching Assistant(s): TBD

Office Hours and location: TBD.

Zoom link http://mcgill.zoom.us/my/gdudek by arrangement

Textbook

The primary textbook is Computational Principles of Mobile Robotics, 3rd edition, by Dudek and Jenkin, Cambridge University Press, 2024. Note that we will be using the third edition, which is longer and more timely than the older second (or first) edition. Available at Amazon.ca or Indigo online.

Selected readings from the research literature.

Supplementary materials (these excellent books may not get used in this course)

Planning Algorithms, by Steven LaValle, Cambridge University Press, 2006.

Robotics, Vision, and Control, by Corke

Refresher notes on probability and stats (from Stanford undergrad TAs)

Build LLM Applications, a very pragmatic guide to using LLMs (only tangentially related to this course).

A nice book on robotic manipulation from the most-excellent Russ Tedrake at MIT.

Great book covering the mechanical side, such as dynamics: Introduction to Robotics: Mechanics and Control, by Craig

The handbook of robotics from Springer

Detailed Lecture Schedule

To be finalized after the first lecture and before the end of the first week.

The following table is incomplete and not finalized.

#	Date	Lecture	Topics	Supplements
Introduction	Motivation, logistics, scope of the field and the course, sense-plan-act paradigm.	Dudek & Jenkin Ch. 1 Aggressive UAV flight Examples of manipulators from Tedrake book	Slides: Intro (part 1 of 2) [history] (part 2 of 2)
The diversity of robotics and how LLMs change the game	Hard and easy problems in robotics. Analytic and ML based alternatives.	Dudek & Jenkin Ch. 1 Aggressive UAV flight Examples of manipulators from Tedrake book
Intro to Planning	Properties, definitions, deterministic methods.		Planning part 1 and Probabilistic planning and RRT
Planning 2	Planning algorithms and procedures.		LaValle Ch. 4 Dudek & Jenkin Ch. 6	Planning part 1 and Probabilistic planning and RRT
Configuration space and topology	Nature and topology of C-space classes
Configuration space and topology (2)	Homotopy classes and complexity
Report from marine (sea) trials. Introduction to classical robot control	State space, PID control and Kalman filters
Deep learning and transformers.		Chris Manning on Backprop and Neural Networks Backprop and Neural Networks How to fine-tune a pretrained model
Large Language Models meet robotics	Introduction to foundation models. Also: how to give presentations.
Reinforcement learning 1		Chapter from Jurafsky on Markov models
Reinforcement learning 2
Student presentations and paper discussions, group 1.
Student presentations and paper discussions, group 1.
Student presentations and paper discussions, group 1.
Multi-modality in foundation models.
Introduction to classical robot control part 2	State space, PID control and Kalman filters
Mid-term exam
Student presentations and paper discussions, group 2.
Student presentations and paper discussions.
Student presentations and paper discussions.
Student presentations and paper discussions.
Student presentations and paper discussions.
Student presentations and paper discussions.
Deployment on physical robots. Ethical issues. New developments in the field.
Final project presentations
Course recap and conclusion

Assignments and course evaluation(s)

To be finalized after the first lecture and before the end of the first week. The tentative plan is as follows:

Two assignments.
Course Project.
Possible final exam (potentially an oral exam, pending discussion during lecture one).

Syllabus for Fall 2023

Note that the lecture timing and sequence may drift slightly as the term progresses, as a function of student interests, emerging issues and other factors.

Discussion papers

Group 1 papers

A group of papers on interesting and influential aspects of robot activity modulation from a pragmatic point of view.

Edwin first Task Planning for Long-Horizon Cooking Tasks Based on Large Language Models web link, may require McGill VPN
Quinn Cao (Feb 19) ChatGPT for Robotics: Design Principles and Model Abilities web link
PhotoBot: Reference-Guided Interactive Photography via Natural Language web link
Novak Feb 19 ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence web link
Emma ChatMap: A Wearable Platform Based on the Multi-modal Foundation Model to Augment Spatial Cognition for People with Blindness and Low Vision web link, may require McGill VPN
SocialED: A Python Library for Social Event Detection web link
Adam ALFRED and ALFWorld: ALFRED came first and ALFWorld builds on it.
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots web link

Group 2 papers

A broad group of cool and exciting papers.

Reinforcement learning from human feedback (RLHF).
William CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments web link
Daniel Barroso (March 24th) Generative AI Agents in Autonomous Machines: A Safety Perspective web link
Mikhail Attention Is All You Need https://arxiv.org/abs/1706.03762 NeurIPS link
Titans: Learning to Memorize at Test Time
Steve DeepSeek LLM: Scaling Open-Source Language Models with Longtermism web link
Neural Machine Translation by Jointly Learning to Align and Translate from https://arxiv.org/abs/1409.0473
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey from papers with code
Edicho: Consistent Image Editing in the Wild, which uses world knowledge, such as a robot might need, to do LLM-based editing: from papers with code
Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly web link
CogAgent: A Visual Language Model for GUI Agents web link
ROBIO: Empowering Robot Path Planning with Large Language Models: osmAG Map Topology & Hierarchy Comprehension with LLMs web link
Lancelot Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics web link
Abstractive Summarization of YouTube Videos Using LaMini-Flan-T5 LLM https://ieeexplore.ieee.org/document/10690747
Scaling Instruction-Finetuned Language Models (about Flan-T5) web link, may require McGill VPN; also described at this Hugging Face page
ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots web link, may require McGill VPN
Yiwei Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving web link, may require McGill VPN
Related paper: The Waymo Open Sim Agents Challenge
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences web link, may require McGill VPN
Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios? web link, may require McGill VPN
Behavior-Actor: Behavioral Decomposition and Efficient-Training for Robotic Manipulation web link, may require McGill VPN
Sebastian TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation web link, may require McGill VPN
Related paper: Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Mariana NARRATE: Versatile Language Architecture for Optimal Control in Robotics web link, may require McGill VPN
Majid Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models web link, may require McGill VPN
Zuhayr Mahmood Fly by Book: How to Train a Humanoid Robot to Fly an Airplane using Large Language Models web link, may require McGill VPN
Howard Qin Sequential Discrete Action Selection via Blocking Conditions and Resolutions web link, may require McGill VPN
Weichen SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models web link, may require McGill VPN
Commonsense Scene Graph-based Target Localization for Object Search web link, may require McGill VPN
Tancred GRID: Scene-Graph-based Instruction-driven Robotic Task Planning web link, may require McGill VPN
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection paper
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions paper and supplementary data
Mourad RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control paper and supplementary data

Please share your presentations

Use this link: https://www.cim.mcgill.ca/~dudek/599/upload.html

Presentation schedule (from March 17 onwards, completed presentations not shown)

March 17:
- Yiwei
- Sebastian
March 19:
- William
- Alexandre
- Adam Friedman
March 24:
- Alex (wrap up from prior class)
- Zuhayr
- Marianna
- Daniel
March 26:
- Raina
- Tanc
- Weichen
March 31:
- Howard
- Majid
- Murad

Additional materials

Interesting video from 2021 Stanford workshop on Foundation Models, on YouTube https://www.youtube.com/watch?v=dG628PEN1fY

This video goes with the paper ("SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models", by Shyam Sundar Kannan, Vishnunandan L. N. Venkatesh & Byung-Cheol Min) presented by Weichen on March 26: Smart Multi-Agent Robot Task Planning using Large Language Models. Note that it also uses AI2-THOR, just like your project.

Project

https://askforalfred.com/ https://alfworld.github.io/ https://github.com/alfworld/alfworld https://arxiv.org/pdf/1912.01734 https://arxiv.org/pdf/2010.03768

Here is the list of presentations given in class, as shared by their authors

Class of 2025: the Future Leaders

Evaluation

The details of the course evaluation scheme and the format of some classes will depend on the enrollment and hence will not be fixed until after the first lecture (based on attendance and student mix in the first lecture). Evaluation will be based on three types of activity: class participation, independent work (homework/project), and a possible in-class formal presentation. Based on substantial enrollment in 2017, the in-class presentations may not be possible.

The evaluation for the course is to be based on a combination of assignment, exam, midterm and other elements as discussed in class and as posted.

Technicalities to note

Senate on January 29, 2003 approved the following resolution on academic integrity, which requires that a reminder to students be printed on every course outline:

Whereas, McGill University values academic integrity; Whereas, every term, there are new students who register for the first time at McGill and who need to be informed about academic integrity; Whereas, it is beneficial to remind returning students about academic integrity;

Be it resolved that instructors include the following statement on all course outlines:

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/integrity for more information).

Be it further resolved that failure by an instructor to include a statement about academic integrity on a course outline shall not constitute an excuse by a student for violating the Code of Student Conduct and Disciplinary Procedures.

dudek@cim.mcgill.ca

COMP 599-02 Topics in AI for Robotics and Intelligent Systems
