some papers

| Paper Title                               | arXiv Link                                                                                    |
|-------------------------------------------|-----------------------------------------------------------------------------------------------|
| RAG / LoRA etc.                           | [RAG](https://arxiv.org/abs/2005.11401) / [LoRA](https://arxiv.org/abs/2106.09685)            |
| Demucs                                    | -                                                                                             |
| ML from scratch series:                   | [Transformers](https://scholar.harvard.edu/binxuw/classes/machine-learning-scratch/materials/transformers) |
|                                           | [Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/)                          |
|                                           | [Resources rich website](https://github.com/DrugowitschLab/ML-from-scratch-seminar)           |
| Diffusion Models                          | [The Illustrated Stable Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/)  |
|                                           | [U-Net paper: biomedical segmentation](https://arxiv.org/abs/1505.04597)                      |
|                                           | [How Diffusion Models Work (deeplearning.ai short course)](https://learn.deeplearning.ai/diffusion-models) |
|                                           | [minDiffusion git repo](https://github.com/cloneofsimo/minDiffusion)                           |
|                                           | [Score SDE paper](https://arxiv.org/abs/2011.13456)                                            |
|                                           | [Unified perspective on DMs tutorial paper](https://arxiv.org/abs/2208.11970)                  |
| GAN Tutorial                              | [Lilian Weng's GAN blog](https://lilianweng.github.io/posts/2017-08-20-gan/)                    |
|                                           | [GAN specialization from deeplearning.ai](https://www.coursera.org/specializations/generative-adversarial-networks-gans) |
| ControlNets                               | [Paper](https://arxiv.org/abs/2302.05543)                                                     |
|                                           | [Code](https://github.com/lllyasviel/ControlNet/)                                              |
| RL tutorial                               | -                                                                                             |
| Adam original paper                       | [Paper](https://arxiv.org/abs/1412.6980)                                                      |
| Flash attention paper                     | [FlashAttention paper](https://arxiv.org/abs/2205.14135)                                      |
|                                           | [Memory-efficient attention: xformers from Meta GitHub](https://github.com/facebookresearch/xformers) |
| Focal Vs Cross Entropy                    | [Article](https://amaarora.github.io/posts/2020-06-29-FocalLoss.html)                          |
| Mamba                                     | [Paper](https://arxiv.org/abs/2312.00752)                                                     |
| CLIP                                      | [Paper](https://arxiv.org/abs/2103.00020)                                                     |
| WaveNet                                   | [Paper](https://arxiv.org/abs/1609.03499)                                                     |
| A Neural Probabilistic Language Model     | [Paper (JMLR 2003)](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)              |
| Byte pair encoding paper                  | [Paper](https://arxiv.org/abs/1508.07909)                                                     |
| BERT                                      | [Paper](https://arxiv.org/abs/1810.04805)                                                     |
| Bahdanau et al., 2014 and Luong et al., 2015 | [Bahdanau et al.](https://arxiv.org/abs/1409.0473) / [Luong et al.](https://arxiv.org/abs/1508.04025) |
| VQ-VAE                                    | [Why do we discretise, and what does it achieve?](https://mlberkeley.substack.com/p/vq-vae)   |
| Soundstream                               | [Paper](https://arxiv.org/abs/2107.03312)                                                     |
|                                           | [Learn SFT and CNN visuals from the 3Blue1Brown channel](https://www.3blue1brown.com/)        |
|                                           | Cover CNN basics again with image and audio, then read the Transformer, AudioGen, MusicGen, and EnCodec papers as references |
| ML Crash Course                           | [Part 1](https://mlberkeley.substack.com/p/part-1) / [Part 2](https://mlberkeley.substack.com/p/part-2) / [Part 3](https://mlberkeley.substack.com/p/part-3) / [Part 4](https://mlberkeley.substack.com/p/part-4) / [Part 5](https://mlberkeley.substack.com/p/part-5) |
|                                           | [Karpathy makemore series (code, parts 1–5)](https://github.com/karpathy/makemore)             |
| Dall-e                                    | [Original paper](https://arxiv.org/abs/2102.12092)                                             |
|                                           | [Additional Info](https://mlberkeley.substack.com/p/dalle2?utm_source=substack&utm_campaign=post_embed&utm_medium=web) |
| Information theory                       | [Visual Information](https://colah.github.io/posts/2015-09-Visual-Information/)                |
|                                           | [Kullback-Leibler Divergence](https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained) |
| VAE tutorial paper                        | [What is Variational Autoencoder (VAE) Tutorial](https://jaan.io/what-is-variational-autoencoder-vae-tutorial/) |
|                                           | [Autoencoder](https://www.compthree.com/blog/autoencoder/)                                      |
| Layer Normalization (key concept)         | [Batch Layer Normalization](https://www.pinecone.io/learn/batch-layer-normalization/)           |
| AudioGen                                  | [Paper](https://arxiv.org/abs/2209.15352)                                                     |
| Attention is all you need                 | [Transformer Tutorial](https://www.tensorflow.org/text/tutorials/transformer)                   |
|                                           | [Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)                   |
| Residual connection original paper        | [ResNet paper](https://arxiv.org/abs/1512.03385)                                              |
|                                           | [Residual Blocks](https://towardsdatascience.com/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec) |
| RNN original paper                        | -                                                                                             |
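The byte pair encoding row above refers to the subword tokenization of Sennrich et al. A minimal sketch of one BPE merge step, using the toy vocabulary and frequencies from that paper (the corpus and merge order below are the paper's worked example, not a general tokenizer):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across a corpus of tokenized words.

    `words` maps a tuple of symbols to its corpus frequency.
    """
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: "low" x5, "lower" x2, "newest" x6, "widest" x3
vocab = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("n", "e", "w", "e", "s", "t"): 6,
    ("w", "i", "d", "e", "s", "t"): 3,
}
pair = most_frequent_pair(vocab)  # ("e", "s"), appearing 9 times
vocab = merge_pair(vocab, pair)   # "newest" becomes n e w es t
```

Repeating these two steps until a budgeted vocabulary size is reached yields the full merge table a BPE tokenizer applies at inference time.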
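The VQ-VAE row above asks why the model discretises. The core operation is simple: the encoder's continuous output is snapped to its nearest codebook vector, so the decoder only ever sees one of a finite set of embeddings. A minimal sketch of that bottleneck (the codebook values here are made up for illustration):

```python
def quantize(z, codebook):
    """Return the index of the codebook entry nearest to z
    (squared Euclidean distance), as in the VQ-VAE bottleneck."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))

# Hypothetical 4-entry codebook of 2-D embeddings
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

z_e = (0.9, 0.2)             # continuous encoder output
k = quantize(z_e, codebook)  # index 1; the decoder receives codebook[k]
```

The payoff of the discretisation is that each latent becomes a single integer index, so a powerful autoregressive prior (as in DALL-E) can model images as sequences of discrete tokens.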
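For the Information theory row, the KL divergence linked above reduces to a one-line sum for discrete distributions. A small sketch showing its two defining quirks, that it is zero only when the distributions match and that it is asymmetric (the example distributions are arbitrary):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for discrete distributions given as
    probability lists; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]       # e.g. an empirical distribution
q = [1 / 3, 1 / 3, 1 / 3]   # e.g. a uniform model

forward = kl_divergence(p, q)   # small positive number
backward = kl_divergence(q, p)  # different small positive number
```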
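The Focal vs Cross Entropy row above compares two classification losses. The whole difference is one modulating factor, (1 - p)^gamma, which shrinks the loss on examples the model already classifies confidently. A minimal numeric sketch (probabilities chosen arbitrarily):

```python
import math

def cross_entropy(p):
    """Cross-entropy loss given the predicted probability p of the true class."""
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    """Focal loss (Lin et al.): cross-entropy scaled by (1 - p)**gamma,
    down-weighting well-classified (high-p) examples."""
    return -((1 - p) ** gamma) * math.log(p)

easy, hard = 0.95, 0.30
# The easy example's loss is scaled by (1 - 0.95)**2 = 0.0025,
# while the hard example keeps roughly half its cross-entropy.
easy_ce, easy_fl = cross_entropy(easy), focal_loss(easy)
hard_ce, hard_fl = cross_entropy(hard), focal_loss(hard)
```

This is why focal loss helps with extreme class imbalance: the many easy negatives stop dominating the gradient.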
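For the Layer Normalization row: unlike batch norm, layer norm standardizes each sample across its own features, so it needs no batch statistics. A minimal sketch of the normalization step (omitting the learned gain and bias of the full method):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one sample across its features: subtract the per-sample
    mean and divide by the per-sample standard deviation."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

h = [2.0, 4.0, 6.0, 8.0]   # hidden activations of one sample
y = layer_norm(h)           # mean ~0, variance ~1, regardless of batch
```

Because the statistics come from a single example, the operation behaves identically at train and inference time, which is one reason it is the default in Transformers.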

some blogs

Lil'Log
Jay Alammar
Andrej Karpathy
Colah's Blog
ML@Berkeley
AI Summer
Blog pages of Deepmind, OpenAI labs

websites to hunt papers

openreview.net
arxiv.org
scholar.google.com

some researchers to follow

Jürgen Schmidhuber
Andrej Karpathy
Ilya Sutskever
Ian Goodfellow
Yann LeCun
Yoshua Bengio
Geoffrey Hinton
Alexei Efros
Andrew Ng
Sharon Zhou