Paged Attention and vLLM
Published:
Paged attention is a memory-management optimization on which the vLLM inference engine is built. Here is a summary of the paged attention paper and the key features that make vLLM so powerful.
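To make the idea concrete, here is a toy sketch of the block-table scheme paged attention applies to the KV cache; the block size, pool size, and class names below are illustrative assumptions, not vLLM's actual implementation:

```python
# Toy sketch of paged attention's KV-cache bookkeeping (illustrative only, not vLLM's
# real code): the cache lives in fixed-size blocks drawn from a shared pool, and each
# sequence keeps a small table mapping logical positions to physical blocks.
import numpy as np

BLOCK_SIZE = 16   # tokens per KV block (toy value)
NUM_BLOCKS = 64   # physical blocks in the shared pool (toy value)
HEAD_DIM = 8      # toy head dimension

kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
free_blocks = list(range(NUM_BLOCKS))

class SequenceCache:
    """Per-sequence block table: logical block index -> physical block index."""
    def __init__(self):
        self.block_table = []
        self.num_tokens = 0

    def append_kv(self, kv_vector):
        # Grab a new physical block only when the current one fills up, so memory
        # grows in small chunks instead of one contiguous max-length slab.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(free_blocks.pop())
        block = self.block_table[self.num_tokens // BLOCK_SIZE]
        kv_pool[block, self.num_tokens % BLOCK_SIZE] = kv_vector
        self.num_tokens += 1

seq = SequenceCache()
for t in range(40):   # 40 generated tokens occupy only 3 blocks
    seq.append_kv(np.full(HEAD_DIM, float(t), dtype=np.float32))
print(seq.block_table)   # e.g. [63, 62, 61]
```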
Published:
The core idea behind Autoencoders is to bottleneck information flow so that the DNN is forced to prioritize what information to propagate to the next layer (by restricting the number of dimensions in the latent space). In this project, I explore how this can be a useful denoising tool.
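As a rough illustration of that bottleneck (a minimal sketch with arbitrary layer sizes, not the project's actual code), a denoising autoencoder pairs a narrow latent layer with noisy inputs and clean reconstruction targets:

```python
# Minimal denoising-autoencoder sketch (illustrative; layer sizes are arbitrary).
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):  # latent_dim << input_dim is the bottleneck
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),                 # information is squeezed through here
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train on (noisy input, clean target) pairs so the bottleneck has to drop the noise.
model = DenoisingAutoencoder()
clean = torch.rand(16, 784)
noisy = clean + 0.3 * torch.randn_like(clean)
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
```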
Published:
This article contains the conceptual explanation necessary for building a language model from scratch using the decoder-only transformer architecture. It is based on Andrej Karpathy's GPT from scratch. The code for this conceptual guide can be found here.
Published:
A paper review highlighting the key discoveries about attention heads and the algorithms used.
Published:
This paper provides a mental model for reasoning about the internal workings of transformers and attention heads in deep neural networks. The insights help in understanding and analyzing the behavior of large models.
Published:
A brief summary of einops and einsum, usage documentation, and an implementation of average pooling in CNNs using einops (inspired by the max pooling layer implemented in the original library documentation).
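For a flavor of the einops style (a short sketch with made-up tensor shapes, not the post's code), 2x2 average pooling is a single reduce call:

```python
# 2x2 average pooling over the spatial dimensions with einops.reduce.
import torch
from einops import reduce

x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width)

# Group each 2x2 patch and take its mean: (8, 3, 32, 32) -> (8, 3, 16, 16).
pooled = reduce(x, 'b c (h h2) (w w2) -> b c h w', 'mean', h2=2, w2=2)
print(pooled.shape)
```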