2025-10-23 | MDPs
Goal: Finish adding past days and work on ARENA 2.1
Summary: Introduction to MDPs, add docs for 2025-10-18 and 2025-10-19
Work sessions
| In | Out |
|---|---|
| 11:35 | 11:58 |
| 18:39 | 19:49 |
ARENA 2.1 Bandits and Markov Decision Processes
- Quite a lot of math to go through. The lecture moves fast and has few examples, so I will need to rewatch it and likely use AI to generate some practice examples.
Concept to review from Multi-Arm Bandits
1. Expected Values
Concept to review from Markov Decision Processes
- What is $ \hat{Q}_t(a) $
- Discounted Reward $ G_t $ and why geometric sequences are needed with the coefficient $ \gamma $
- In the 4-tuple $ (S, T, A, R) $, what is the simplex $ \Delta A $
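
A minimal sketch for the first two items, assuming the standard textbook definitions: $ \hat{Q}_t(a) $ as the sample average of the rewards seen so far for action $ a $, and $ G_t = \sum_{k \ge 0} \gamma^k R_{t+k+1} $. The function names and reward numbers are made up for illustration and are not taken from the ARENA material.

```python
def discounted_return(rewards, gamma):
    """G_t = sum_k gamma**k * rewards[k], where rewards[k] stands for R_{t+k+1}."""
    g = 0.0
    # Work backwards so each step is G = r + gamma * G_next
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def sample_average(rewards_for_action):
    """Sample-average estimate of Q(a): mean of the rewards observed for action a."""
    if not rewards_for_action:
        return 0.0  # assumed default when the action has never been tried
    return sum(rewards_for_action) / len(rewards_for_action)


# Hypothetical numbers, only for illustration
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
print(sample_average([1.0, 0.0, 1.0, 1.0]))  # 0.75
```

With a constant reward of 1 and $ \gamma < 1 $, the infinite sum is a geometric series that converges to $ 1 / (1 - \gamma) $, which is why the discounted return stays finite.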
The goal for tomorrow is to work through some simple problems on MDPs and Expected Values to build up an intuition for Bellman's Equation.
Updating docs
- Added documentation for 2025-10-18 and 2025-10-19
Side Notes
- Might be a nice idea to write the more formalized math down in LaTeX in a "knowledge" section for my own solidification and quick reference (like one page phonology tutorials), since my notes are kind of messy and free form