Hey Marcos, first, thanks for reading. As you said, in practice RNNs have trouble accessing information far in the past, and that's what I was trying to portray with the window analogy. Let me try to explain it better. When an RNN propagates time step by time step, it tends to forget earlier information and mostly remembers what came in during later time steps, so it effectively has short-term memory. Because an RNN processes everything sequentially, it only has to carry a fixed-size hidden state forward, which gives it better memory complexity. Attention mechanisms process everything in parallel, so they need to hold the entire sequence in memory at once. That costs a lot more memory, but it's what lets them access information from anywhere in the sequence.
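
Here's a rough sketch in plain NumPy of what I mean (the sizes, weights, and the simplified attention are just made up for illustration, not the exact math from the post):

```python
import numpy as np

seq_len, d = 6, 4
x = np.random.randn(seq_len, d)       # toy input sequence
W = np.random.randn(d, d) * 0.1       # hypothetical recurrent weight

# RNN-style: step through time, carrying only one fixed-size hidden state.
# Early information survives only through repeated updates of h,
# which is why it tends to fade (short-term memory).
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(W @ h + x[t])         # memory: O(d), independent of seq_len

# Attention-style: compare every position with every other position at once.
# The score matrix is seq_len x seq_len, so memory grows with the square of
# the sequence length, but any position can look at any other directly.
scores = x @ x.T / np.sqrt(d)         # memory: O(seq_len^2)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ x
```

The loop only ever keeps `h` around, while the attention version materializes the full `scores` matrix. That's the trade-off I was getting at: cheaper memory versus a direct line of sight to the whole sequence.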