Overview

This new course explores selected topics at the intersection of AI and robotics, especially intelligent exploration and information acquisition, and how large language models (AKA foundation models) can be used to make robotics more flexible and effective. The course does not presume that the student has substantial exposure to robotics, but comfort with basic AI-relevant algorithms (such as A-star search) is expected. About half the course will be devoted to traditional lecture-style presentations, and about half will be devoted to the discussion of recent papers from the research literature. Due to the seminar-style format, student involvement will be important, and some coaching will be included for students who are uncomfortable speaking up. Since this course is open to both undergraduate and graduate students, some key aspects of course structure will only be finalized during the first week when the class composition can be observed.
Lecture location: McConnell Engineering MC103, Monday and Wednesday 10:00-11:30am

Teaching Staff

Instructor: Professor Gregory Dudek (gregory.dudek@mcgill.ca)
MC103
Office Hours: Monday/Wednesday 11:30-12:30 (not offered on days when there is a guest lecturer)

Teaching Assistant(s): TBD
Office Hours and location: TBD.
Zoom link http://mcgill.zoom.us/my/gdudek by arrangement

Textbook

  • The primary textbook is Computational Principles of Mobile Robotics, 3rd edition, by Dudek and Jenkin. Cambridge University Press, 2024. Note that we will be using the third edition, which is longer and more up to date than the older second (or first) edition. Available at Amazon.ca or Indigo online.
  • Selected readings from the research literature.
  • Supplementary materials (these excellent books may not get used in this course)

  • Planning Algorithms, by Steven LaValle, Cambridge University Press, 2006.
  • Robotics, Vision, and Control, by Corke
  • Refresher notes on probability and stats (from Stanford undergrad TAs)
  • Build LLM Applications, a very pragmatic guide to using LLMs (only tangentially related to this course).
  • A nice book on robotic manipulation from the most-excellent Russ Tedrake at MIT.
  • Great book covering the mechanical side, such as dynamics: Introduction to Robotics: Mechanics and Control, by Craig
  • The handbook of robotics from Springer

    Detailed Lecture Schedule

    To be finalized after the first lecture and before the end of the first week.
    The following table is incomplete and not finalized.
    # Date     Lecture Topics Supplements References Slides
    Introduction: Motivation, logistics, scope of the field and the course, sense-plan-act paradigm.
  • Dudek & Jenkin Ch. 1
  • Aggressive UAV flight
  • Examples of manipulators from Tedrake book
  • Slides: Intro (part 1 of 2)
    [history] (part 2 of 2)
     
    The diversity of robotics and how LLMs change the game: Hard and easy problems in robotics. Analytic and ML-based alternatives.
  • Dudek & Jenkin Ch. 1
  • Aggressive UAV flight
  • Examples of manipulators from Tedrake book
    Intro to Planning: Properties, definitions, deterministic methods.   Planning part 1 and Probabilistic planning and RRT
    Planning 2: Planning algorithms and procedures.   LaValle Ch. 4
    Dudek & Jenkin Ch. 6
    Planning part 1 and Probabilistic planning and RRT
    Configuration space and topology: Nature and topology of C-space classes
    Configuration space and topology (2): Homotopy classes and complexity
    Report from marine (sea) trials. Introduction to classical robot control: State space, PID control and Kalman filters
    Deep learning and transformers. Chris Manning on Backprop and Neural Networks Backprop and Neural Networks
    How to fine-tune a pretrained model
    Large Language Models meet robotics: Introduction to foundation models. Also: how to give presentations.
    Reinforcement learning 1 Chapter from Jurafsky on Markov models
    Reinforcement learning 2
    Student presentations and paper discussions, group 1.
    Student presentations and paper discussions, group 1.
    Student presentations and paper discussions, group 1.
    Multi-modality in foundation models.
    Introduction to classical robot control part 2: State space, PID control and Kalman filters
    Mid-term exam
    Student presentations and paper discussions, group 2.
    Student presentations and paper discussions.
    Student presentations and paper discussions.
    Student presentations and paper discussions.
    Student presentations and paper discussions.
    Student presentations and paper discussions.
    Deployment on physical robots. Ethical issues. New developments in the field.
    Final project presentations
    Course recap and conclusion        

    Assignments and course evaluation(s)

    To be finalized after the first lecture and before the end of the first week. The tentative plan is as follows:
    • Two assignments.
    • Course Project.
    • Possible final exam (potentially an oral exam, pending discussion during lecture one).

    Syllabus for Fall 2023

    Note that the lecture timing and sequence may drift slightly as the term progresses as a function of student interests, emerging issues and other factors.

    Discussion papers

    Group 1 papers

    A group of papers on interesting and influential aspects of robot activity modulation from a pragmatic point of view.
    1. Edwin first Task Planning for Long-Horizon Cooking Tasks Based on Large Language Models web link, may require McGill VPN
    2. Quinn Cao (Feb 19) ChatGPT for Robotics: Design Principles and Model Abilities web link
    3. PhotoBot: Reference-Guided Interactive Photography via Natural Language web link
    4. Novak Feb 19 ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence web link
    5. Emma ChatMap: A Wearable Platform Based on the Multi-modal Foundation Model to Augment Spatial Cognition for People with Blindness and Low Vision web link, may require McGill VPN
    6. SocialED: A Python Library for Social Event Detection web link
    7. Adam Alfred and Alfworld (Alfred came first, and Alfworld builds on it).
    8. Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots web link

    Group 2 papers

    A broad group of cool and exciting papers.
    1. Reinforcement learning from human feedback (RLHF).
    2. William CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments web link
    3. Daniel Barroso (March 24th) Generative AI Agents in Autonomous Machines: A Safety Perspective web link
    4. Mikhail Attention Is All You Need https://arxiv.org/abs/1706.03762 NeurIPS link
    5. Titans: Learning to Memorize at Test Time
    6. Steve DeepSeek LLM: Scaling Open-Source Language Models with Longtermism web link
    7. Neural Machine Translation by Jointly Learning to Align and Translate from https://arxiv.org/abs/1409.0473
    8. Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey from papers with code
    9. Edicho: Consistent Image Editing in the Wild. Uses world knowledge, such as a robot might need, to do LLM-based editing: from papers with code
    10. Exploring What Why and How: A Multifaceted Benchmark for Causation Understanding of Video Anomaly web link
    11. CogAgent: A Visual Language Model for GUI Agents web link
    12. ROBIO: Empowering Robot Path Planning with Large Language Models: osmAG Map Topology & Hierarchy Comprehension with LLMs web link
    13. Lancelot Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics web link
    14. Abstractive Summarization of YouTube Videos Using LaMini-Flan-T5 LLM https://ieeexplore.ieee.org/document/10690747
    15. Scaling Instruction-Finetuned Language Models (about Flan-T5) web link, may require McGill VPN; also described at this Hugging Face page
    16. ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots web link, may require McGill VPN
    17. Yiwei Large Language Models Powered Context-aware Motion Prediction in Autonomous Driving web link, may require McGill VPN
      Related paper: The Waymo Open Sim Agents Challenge
    18. DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences web link, may require McGill VPN
    19. Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios? web link, may require McGill VPN
    20. Behavior-Actor: Behavioral Decomposition and Efficient-Training for Robotic Manipulation web link, may require McGill VPN
    21. Sebastian TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation web link, may require McGill VPN
      Related paper: Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
    22. Mariana NARRATE: Versatile Language Architecture for Optimal Control in Robotics web link, may require McGill VPN
    23. Majid Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models web link, may require McGill VPN
    24. Zuhayr Mahmood Fly by Book: How to Train a Humanoid Robot to Fly an Airplane using Large Language Models web link, may require McGill VPN
    25. Sequential Discrete Action Selection via Blocking Conditions and Resolutions web link, may require McGill VPN
    26. Weichen SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models web link, may require McGill VPN
    27. Commonsense Scene Graph-based Target Localization for Object Search web link, may require McGill VPN
    28. Tancred GRID: Scene-Graph-based Instruction-driven Robotic Task Planning web link, may require McGill VPN
    29. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection paper
    30. ShareGPT4V: Improving Large Multi-Modal Models with Better Captions paper and supplementary data
    31. Mourad RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control paper and supplementary data

    Please share your presentations

    Use this link: https://www.cim.mcgill.ca/~dudek/599/upload.html

    Presentation schedule (from March 17 onwards, completed presentations not shown)

    • March 17:
      • Yiwei
      • Sebastian
    • March 19:
      • William
      • Alexandre
      • Adam Friedman
    • March 24:
      • Alex (wrap up from prior class)
      • Zuhayr
      • Marianna
      • Daniel
    • March 26:
      • Raina
      • Tanc
      • Weichen
    • March 31:
      • Howard
      • Majid
      • Murad

    Additional materials

  • Interesting video from 2021 Stanford workshop on Foundation Models, on YouTube https://www.youtube.com/watch?v=dG628PEN1fY
  • This video goes with the paper ("SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models", by Shyam Sundar Kannan, Vishnunandan L. N. Venkatesh & Byung-Cheol Min) presented by Weichen on March 26: Smart Multi-Agent Robot Task Planning using Large Language Models. Note that it also uses AI2-THOR, just like your project.
  • project

    https://askforalfred.com/ https://alfworld.github.io/ https://github.com/alfworld/alfworld https://arxiv.org/pdf/1912.01734 https://arxiv.org/pdf/2010.03768

    Evaluation

    The details of the course evaluation scheme and the format of some classes will depend on the enrollment and hence will not be fixed until after the first lecture (based on attendance and student mix in the first lecture). Evaluation will be based on three types of activity: class participation, independent work (homework/project), and a possible in-class formal presentation. Based on substantial enrollment in 2017, the in-class presentations may not be possible.

    The evaluation for the course will be based on a combination of assignments, exam, midterm and other elements as discussed in class and as posted.

    Technicalities to note

    Senate on January 29, 2003 approved the following resolution on academic integrity, which requires that a reminder to students be printed on every course outline:

    Whereas, McGill University values academic integrity; Whereas, every term, there are new students who register for the first time at McGill and who need to be informed about academic integrity; Whereas, it is beneficial to remind returning students about academic integrity;

    Be it resolved that instructors include the following statement on all course outlines:

    McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures (see www.mcgill.ca/integrity for more information).

    Be it further resolved that failure by an instructor to include a statement about academic integrity on a course outline shall not constitute an excuse by a student for violating the Code of Student Conduct and Disciplinary Procedures.

    dudek@cim.mcgill.ca