  • A Markov chain models a process as transitions between states, where the transition probabilities depend only on the current state.
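
    To make the definition concrete, here is a minimal sketch of a first-order Markov chain over words (the toy corpus and function name are my own, purely for illustration):

    ```python
    import random
    from collections import defaultdict

    # Toy corpus; any tokenized text would do.
    corpus = "the cat sat on the mat the cat ate the rat".split()

    # Count bigram transitions: the observed successors of each word.
    transitions = defaultdict(list)
    for current, nxt in zip(corpus, corpus[1:]):
        transitions[current].append(nxt)

    def generate(start, length=8):
        """Sample a sequence where each step depends only on the current word."""
        word, out = start, [start]
        for _ in range(length):
            candidates = transitions.get(word)
            if not candidates:
                break
            word = random.choice(candidates)  # proportional to observed frequency
            out.append(word)
        return " ".join(out)

    print(generate("the"))
    ```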

    An LLM is, if anything, less a Markov chain and more similar to discrete Langevin dynamics: both have a memory (the attention mechanism for LLMs, inertia for LD) and both have noise set by a parameter (temperature in both cases; the name "temperature" in the LLM context is derived directly from thermodynamics).
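
    On the temperature point: in both statistical mechanics and LLM sampling, the next state is drawn from a Boltzmann-like distribution, p_i ∝ exp(logit_i / T). A rough sketch of temperature sampling (the logits here are made up, not from any real model):

    ```python
    import numpy as np

    def sample_with_temperature(logits, temperature=1.0):
        """Boltzmann-style sampling: p_i ∝ exp(logit_i / T)."""
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()  # shift for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return np.random.default_rng().choice(len(probs), p=probs)

    # Made-up logits over a tiny 4-token vocabulary.
    logits = [2.0, 1.0, 0.5, -1.0]
    print(sample_with_temperature(logits, temperature=0.2))  # near-greedy: almost always token 0
    print(sample_with_temperature(logits, temperature=5.0))  # noisy: much closer to uniform
    ```

    Low temperature "freezes" the distribution onto the most likely token; high temperature adds noise, which is exactly the role temperature plays in a Boltzmann distribution.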

    As far as I remember, the original attention paper doesn't reference Markov processes.

    I am not saying one cannot explain it starting from a Markov chain; it's just that the claim "we could have done this decades ago but lacked the horsepower and the data" is wrong. We didn't have a method to simulate writing. We now have a decent one, and the horsepower to train it on a lot of data.