Posts by Tags

Attention

Paged Attention and vLLM

6 minute read

Paged Attention is the memory optimization on which the vLLM inference engine is built. Here is a summary of the Paged Attention paper and the key features that make vLLM so powerful.
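
To make the core idea concrete, here is a minimal sketch of the block-table bookkeeping behind Paged Attention. This is illustrative only, not vLLM's actual implementation; `BLOCK_SIZE` and the class names are assumptions for the example.

```python
# Minimal sketch of PagedAttention-style KV-cache paging (illustrative only).

BLOCK_SIZE = 16  # tokens per physical cache block (assumed value)

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks  # shared pool of physical block IDs
        self.blocks = []                # blocks owned by this sequence
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full,
        # so at most one partially filled block is wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop())
        self.num_tokens += 1

    def physical_slot(self, pos):
        # Translate a logical position to (block_id, offset), analogous to
        # virtual-to-physical address translation in an OS page table.
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

# Sequences share one free pool; a sequence's blocks need not be contiguous.
pool = list(range(100))
seq = BlockTable(pool)
for _ in range(20):
    seq.append_token()
print(seq.physical_slot(17))  # (second block's ID, offset 1)
```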

Autoencoders

Are Autoencoders Fundamentally Denoisers?

6 minute read

The core idea behind autoencoders is to bottleneck information flow, by restricting the number of dimensions in the latent space, so that the network is forced to prioritize which information to propagate to the next layer. In this project, I explore how this makes them a useful denoising tool.
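
As a concrete illustration of the bottleneck idea, here is a minimal PyTorch sketch; the layer sizes and dimensions are arbitrary assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Compress the input into a low-dimensional latent code, then reconstruct it."""

    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        # Encoder: squeeze information through a narrow latent space.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the input from the latent code alone.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Denoising use: feed a noisy input, train against the clean target.
model = BottleneckAutoencoder()
clean = torch.randn(32, 784)
noisy = clean + 0.1 * torch.randn_like(clean)
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
```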

Autoregressive Training

Implementing GPT from Scratch

9 minute read

This article provides the conceptual explanation needed to build a language model from scratch using the decoder-only transformer architecture. It is based on Andrej Karpathy's GPT from scratch. The code for this conceptual guide can be found here.
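
For a taste of what "decoder-only" means in code, here is a minimal single-head causal self-attention sketch in PyTorch, in the spirit of Karpathy's walkthrough; the hyperparameters are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head self-attention with a causal mask: each position may
    attend only to itself and earlier positions (the decoder-only setup)."""

    def __init__(self, n_embd=64, block_size=128):
        super().__init__()
        self.query = nn.Linear(n_embd, n_embd, bias=False)
        self.key = nn.Linear(n_embd, n_embd, bias=False)
        self.value = nn.Linear(n_embd, n_embd, bias=False)
        # Lower-triangular mask: 1 where attention is allowed.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):  # x: (batch, time, n_embd)
        B, T, C = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        att = q @ k.transpose(-2, -1) / math.sqrt(C)  # (B, T, T) scores
        att = att.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)  # each row sums to 1 over allowed positions
        return att @ v                # weighted sum of value vectors

x = torch.randn(2, 10, 64)
out = CausalSelfAttention()(x)        # shape (2, 10, 64)
```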

Circuits

Review: A Mathematical Framework for Transformer Circuits

7 minute read

This paper provides a mental model for reasoning about the internal workings of transformers and attention heads in deep neural networks. These insights help in understanding and analyzing the behavior of large models.
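
One concrete idea from the paper: an attention head factors into a QK circuit (which positions attend to which) and an OV circuit (what information gets moved). Here is a minimal sketch of those composites, using random stand-in weights and assumed dimensions.

```python
import torch

d_model, d_head = 64, 16

# Per-head projection matrices (random stand-ins for learned weights).
W_Q = torch.randn(d_head, d_model)
W_K = torch.randn(d_head, d_model)
W_V = torch.randn(d_head, d_model)
W_O = torch.randn(d_model, d_head)

# QK circuit: a bilinear form on the residual stream that scores how
# strongly a query-position vector attends to a key-position vector.
W_QK = W_Q.T @ W_K  # (d_model, d_model)

# OV circuit: a linear map describing what an attended-to position
# writes back into the residual stream.
W_OV = W_O @ W_V    # (d_model, d_model)

# Both composites have rank at most d_head, a key observation of the paper.
print(torch.linalg.matrix_rank(W_QK).item(),
      torch.linalg.matrix_rank(W_OV).item())  # 16 16
```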

Deep Learning

Are Autoencoders Fundamentally Denoisers?

6 minute read

The core idea behind autoencoders is to bottleneck information flow, by restricting the number of dimensions in the latent space, so that the network is forced to prioritize which information to propagate to the next layer. In this project, I explore how this makes them a useful denoising tool.

Review: A Mathematical Framework for Transformer Circuits

7 minute read

This paper provides a mental model for reasoning about the internal workings of transformers and attention heads in deep neural networks. These insights help in understanding and analyzing the behavior of large models.

Denoising

Are Autoencoders Fundamentally Denoisers?

6 minute read

The core idea behind autoencoders is to bottleneck information flow, by restricting the number of dimensions in the latent space, so that the network is forced to prioritize which information to propagate to the next layer. In this project, I explore how this makes them a useful denoising tool.

Einops and Einsum

KV Cache

Paged Attention and vLLM

6 minute read

Paged Attention is the memory optimization on which the vLLM inference engine is built. Here is a summary of the Paged Attention paper and the key features that make vLLM so powerful.

LLM Inference

Paged Attention and vLLM

6 minute read

Paged Attention is the memory optimization on which the vLLM inference engine is built. Here is a summary of the Paged Attention paper and the key features that make vLLM so powerful.

Language Models

Implementing GPT from Scratch

9 minute read

This article provides the conceptual explanation needed to build a language model from scratch using the decoder-only transformer architecture. It is based on Andrej Karpathy's GPT from scratch. The code for this conceptual guide can be found here.

Review: A Mathematical Framework for Transformer Circuits

7 minute read

This paper provides a mental model for reasoning about the internal workings of transformers and attention heads in deep neural networks. These insights help in understanding and analyzing the behavior of large models.

Linear Algebra

Mechanistic Interpretability

Review: A Mathematical Framework for Transformer Circuits

7 minute read

This paper provides a mental model for reasoning about the internal workings of transformers and attention heads in deep neural networks. These insights help in understanding and analyzing the behavior of large models.

PyTorch

Are Autoencoders Fundamentally Denoisers?

6 minute read

The core idea behind autoencoders is to bottleneck information flow, by restricting the number of dimensions in the latent space, so that the network is forced to prioritize which information to propagate to the next layer. In this project, I explore how this makes them a useful denoising tool.

Implementing GPT from Scratch

9 minute read

This article provides the conceptual explanation needed to build a language model from scratch using the decoder-only transformer architecture. It is based on Andrej Karpathy's GPT from scratch. The code for this conceptual guide can be found here.

Signal Processing

Are Autoencoders Fundamentally Denoisers?

6 minute read

The core idea behind autoencoders is to bottleneck information flow, by restricting the number of dimensions in the latent space, so that the network is forced to prioritize which information to propagate to the next layer. In this project, I explore how this makes them a useful denoising tool.

Tensor Operations

Transformers

Implementing GPT from Scratch

9 minute read

This article provides the conceptual explanation needed to build a language model from scratch using the decoder-only transformer architecture. It is based on Andrej Karpathy's GPT from scratch. The code for this conceptual guide can be found here.

Review: A Mathematical Framework for Transformer Circuits

7 minute read

This paper provides a mental model for reasoning about the internal workings of transformers and attention heads in deep neural networks. These insights help in understanding and analyzing the behavior of large models.

vLLM

Paged Attention and vLLM

6 minute read

Paged Attention is the memory optimization on which the vLLM inference engine is built. Here is a summary of the Paged Attention paper and the key features that make vLLM so powerful.