
2025-10-23 | MDPs

Goal: Finish adding past days and work on ARENA 2.1

Summary: Introduction to MDPs, add docs for 2025-10-18 and 2025-10-19

Work sessions

In Out
11:35 11:58
18:39 19:49

ARENA 2.1 Bandits and Markov Decision Processes

  • Quite a lot of math to go through. I think the lecture moves quite fast without many examples, so I will need to rewatch it and likely use AI to generate some practice examples.

Concept to review from Multi-Armed Bandits

  1. Expected Values

Concepts to review from Markov Decision Processes

  1. What is $ \hat{Q}_t(a) $?
  2. Discounted return $ G_t $, and why a geometric series with the discount factor $ \gamma $ is needed
  3. In the four-tuple $ (S, T, A, R) $, what is the simplex $ \Delta A $?
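
A minimal sketch to practice the three items above. It rests on assumptions that I have not checked against the lecture: that $ \hat{Q}_t(a) $ is the running sample-average estimate of arm $ a $'s expected reward, that $ G_t = \sum_{k \ge 0} \gamma^k R_{t+k+1} $, and that $ \Delta A $ is the set of probability distributions over the actions $ A $. All function names are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] over a finite reward list (assumed definition of G_t)."""
    g = 0.0
    # Work backwards so each step applies G = r + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sample_average_update(q_hat, n, reward):
    """Incremental sample average: fold one more observed reward into the estimate."""
    n += 1
    q_hat += (reward - q_hat) / n
    return q_hat, n


def is_on_simplex(probs, tol=1e-9):
    """True if probs is a valid distribution: non-negative entries that sum to 1."""
    return all(p >= -tol for p in probs) and abs(sum(probs) - 1.0) <= tol


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1, 1, 1], 0.5))  # 1.75

# With |gamma| < 1 a constant reward stream stays bounded by 1 / (1 - gamma).
print(discounted_return([1] * 1000, 0.9))

# Estimate for one arm after observing rewards 1, 0, 1.
q, n = 0.0, 0
for reward in [1, 0, 1]:
    q, n = sample_average_update(q, n, reward)
print(q, n)

print(is_on_simplex([0.2, 0.3, 0.5]), is_on_simplex([0.5, 0.6]))
```

The second print is the reason the geometric series matters: without $ \gamma < 1 $ the sum of an endless reward stream would grow without bound, while with it the total converges.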

The goal for tomorrow is to work through some simple problems on MDPs and expected values, to build up an intuition for the Bellman equation.
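
One possible practice problem of this kind, sketched under my own assumptions: a two-state toy MDP with the policy already fixed, where the Bellman expectation equation $ V(s) = \sum_{s'} P(s' \mid s) \, [r + \gamma V(s')] $ is applied repeatedly until the values settle. The states, rewards, and the `evaluate_policy` helper are hypothetical and not taken from ARENA.

```python
def evaluate_policy(transitions, gamma, sweeps=200):
    """Iterate the Bellman expectation equation for a fixed policy.

    transitions maps each state to a list of (probability, next_state, reward)
    outcomes under that policy.
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        # Synchronous update: every new value is computed from the previous sweep.
        values = {
            s: sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for s, outcomes in transitions.items()
        }
    return values


# Toy problem: A always moves to B with reward 1, B always moves back to A with reward 0.
toy = {
    "A": [(1.0, "B", 1.0)],
    "B": [(1.0, "A", 0.0)],
}
print(evaluate_policy(toy, gamma=0.5))
```

Solving the same two equations by hand, $ V(A) = 1 + 0.5 \, V(B) $ and $ V(B) = 0.5 \, V(A) $, gives $ V(A) = 4/3 $ and $ V(B) = 2/3 $, which is a useful check on the printed values.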

Updating docs

Side Notes

  • Might be a nice idea to write the more formalized math in LaTeX in a "knowledge" section for my own solidification and quick reference (like one-page phonology tutorials), since my notes are kind of messy and free-form.