| Paper / Topic | Link |
|-------------------------------------------|-----------------------------------------------------------------------------------------------|
| RAG/LoRA etc | - |
| Demucs | - |
| ML from scratch series: | [Transformers](https://scholar.harvard.edu/binxuw/classes/machine-learning-scratch/materials/transformers) |
| | [Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/) |
| | [Resources rich website](https://github.com/DrugowitschLab/ML-from-scratch-seminar) |
| Diffusion Models | See the resource section at the end |
| | [How Diffusion Models Work: deeplearning.ai short course](https://learn.deeplearning.ai/diffusion-models) |
| | [U-Net paper: biomedical image segmentation](https://arxiv.org/abs/1505.04597) |
| | [minDiffusion git repo](https://github.com/cloneofsimo/minDiffusion) |
| | [Score SDE paper](https://arxiv.org/abs/2011.13456) |
| | [Unified perspective on DMs tutorial paper](https://arxiv.org/abs/2208.11970) |
| GAN Tutorial | [Lilian Weng's GAN blog](https://lilianweng.github.io/posts/2017-08-20-gan/) |
| | [GAN specialization from deeplearning.ai](https://www.deeplearning.ai/courses/generative-adversarial-networks-gans-specialization/) |
| ControlNets | [Paper](https://arxiv.org/abs/2302.05543) |
| | [Code](https://github.com/lllyasviel/ControlNet/) |
| RL tutorial | - |
| Adam original paper | - |
| Flash attention paper | [FlashAttention paper](https://arxiv.org/abs/2205.14135) |
| | [Memory-efficient attention: xformers from Meta's GitHub](https://github.com/facebookresearch/xformers) |
| Focal vs Cross-Entropy Loss | [Article](https://amaarora.github.io/posts/2020-06-29-FocalLoss.html) |
| Mamba | - |
| CLIP | - |
| Wavenet | - |
| A Neural Probabilistic Language Model | - |
| Byte pair encoding paper | - |
| BERT | - |
| Bahdanau et al., 2014 and Luong et al., 2015 | - |
| VQ-VAE | [Why are we doing the discretisation, what goal it achieves?](https://mlberkeley.substack.com/p/vq-vae) |
| Soundstream | [Learn the STFT and CNN visuals from the 3Blue1Brown channel](https://www.3blue1brown.com/) |
| | Cover CNN basics again with image and audio, then read the Transformer, AudioGen, MusicGen, and EnCodec papers as reference |
| ML Crash Course | [Part 1](https://mlberkeley.substack.com/p/part-1) / [Part 2](https://mlberkeley.substack.com/p/part-2) / [Part 3](https://mlberkeley.substack.com/p/part-3) / [Part 4](https://mlberkeley.substack.com/p/part-4) / [Part 5](https://mlberkeley.substack.com/p/part-5) |
| | [Karpathy makemore series: code parts 1-5](https://github.com/karpathy/makemore) |
| Dall-e | [Original paper](https://arxiv.org/abs/2102.12092) |
| | [Additional Info](https://mlberkeley.substack.com/p/dalle2?utm_source=substack&utm_campaign=post_embed&utm_medium=web) |
| Information theory | [Visual Information](https://colah.github.io/posts/2015-09-Visual-Information/) |
| | [Kullback-Leibler Divergence](https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained) |
| VAE tutorial paper | [What is Variational Autoencoder (VAE) Tutorial](https://jaan.io/what-is-variational-autoencoder-vae-tutorial/) |
| | [Autoencoder](https://www.compthree.com/blog/autoencoder/) |
| Layer Normalization | [Batch vs Layer Normalization](https://www.pinecone.io/learn/batch-layer-normalization/) |
| Audiogen | - |
| Attention is all you need | [Transformer Tutorial](https://www.tensorflow.org/text/tutorials/transformer) |
| | [Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) |
| Residual connection original paper | [Residual Blocks](https://towardsdatascience.com/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec) |
| RNN original paper | - |
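The "Focal vs Cross-Entropy Loss" row above compares two classification losses; a minimal pure-Python sketch of the difference (binary case, with `p_true` the probability the model assigns to the true class — an illustration, not code from any of the linked sources):

```python
import math

def cross_entropy(p_true: float) -> float:
    """Standard cross-entropy loss on the probability of the true class."""
    return -math.log(p_true)

def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss: down-weights well-classified examples via the
    modulating factor (1 - p_t)^gamma; gamma = 0 recovers cross-entropy."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# An easy example (p_t = 0.9) is down-weighted far more than a
# hard example (p_t = 0.1), which keeps training focused on hard cases.
for p in (0.1, 0.9):
    print(f"p_t={p}: CE={cross_entropy(p):.4f}, FL={focal_loss(p):.4f}")
```

The modulating factor is the entire difference between the two losses, which is why the article frames focal loss as a reweighting of cross-entropy.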
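Likewise, the KL divergence covered in the "Information theory" row can be sketched in a few lines for discrete distributions (again an illustration, not from the linked posts):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i).
    Asymmetric and non-negative; zero only when P and Q match.
    Terms with p_i = 0 contribute nothing by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q), kl_divergence(q, p))  # differ: KL is asymmetric
```

The asymmetry is the point the Count Bayesie post dwells on: D_KL(P||Q) penalizes Q for assigning low probability where P has mass, not the reverse.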
Blogs:

- Lil'Log
- Jay Alammar
- Andrej Karpathy
- Colah's Blog
- ML@Berkeley
- AI Summer
- The blog pages of the DeepMind and OpenAI labs

Paper sources:

- openreview.net
- arxiv.org
- Google Scholar

Researchers to follow:

- Jürgen Schmidhuber
- Andrej Karpathy
- Ilya Sutskever
- Ian Goodfellow
- Yann LeCun
- Yoshua Bengio
- Geoffrey Hinton
- Alexei Efros
- Andrew Ng
- Sharon Zhou