Hi 👋!

This is Shubham. I am a Software Developer, currently working @enumerate.ai
Welcome to my blog/portfolio, where I document my day-to-day learning.
Know more About Me | Resume

Enhancing Retrieval with Hypothetical Document Embeddings (HyDE) in RAG Systems

Find the paper here. RAG: The Retrieval-Augmented Generation (RAG) technique offers a promising approach when using large language models (LLMs) to build knowledge bases. Envision creating a chatbot capable of querying a collection of textbooks. A standard pre-trained LLM doesn’t inherently possess this capability. This is where RAG comes into play. RAG works by splitting your corpus into more manageable segments or documents. Ideally, these segments should fit within the context window of the language model in use....

October 5, 2023 · 3 min · Shubham Singh

Lost In The Middle - LLM

I’ve been working with large language models from across the spectrum, including OpenAI GPT, Anthropic, LLAMA, etc., for quite some time. For most of this journey, I leaned towards selecting larger models (with larger context windows) and cramming in as much context as possible in the hope of getting better inferences and domain-specific reasoning. However, after the release of GPT-3.5-Turbo-16K, I realized that the capabilities of these LLMs do not necessarily scale with the context window....

July 28, 2023 · 2 min · Shubham Singh

Teaching Task to Language models using: Zero Shot, One Shot & Few Shot

In this blog, we will discuss prompt engineering techniques used in working with large language models: zero-shot, one-shot, and few-shot learning. These methods aim to enable models to perform and learn new tasks quickly with little or no training data. Zero-Shot Learning: Zero-shot learning refers to the ability of a language model to perform a task without having seen any examples from that specific task during training. This is particularly useful when there is a lack of labeled data for a given task....

March 5, 2023 · 2 min · Shubham Singh

TIL [1] - Postgres & Logical Decoding

TIL (Today I Learned): short notes/articles from my day-to-day learning. Introduction: Recently I was working with a legacy pub/sub architecture built on MongoDB, which used MongoDB change streams to push real-time data changes. I had to find something similar that could be incorporated into PostgreSQL. A quick Google search will show you that neither PostgreSQL nor MySQL has anything exactly like MongoDB change streams. Something like this can also be implemented at the application level, but in that architecture the stream is centralized in our application and is not very efficient when it needs to be accessed from multiple different services....

April 7, 2022 · 3 min · Shubham Singh

HTTP DSL in Ruby

To explore the metaprogramming abilities of the Ruby programming language, I worked on building an internal DSL for making HTTP requests. To further understand the concept of an internal DSL, refer to https://martinfowler.com/bliki/InternalDslStyle.html. I call this package Mock, and it provides basic functionality for working with HTTP requests. It currently supports the get, post & delete verbs and can easily be extended to all the other verbs as well. It allows setting cookies, headers, query & params, and makes processing JSON responses easy....

February 5, 2022 · 2 min · Shubham Singh

TIL [0] - (Case Study) Getting an index scan on timestamp column with a predicate?

Problem: How do you get an index scan on an indexed timestamp column in Postgres alongside a predicate? Solution: Use only functions tagged as STABLE or IMMUTABLE in the query predicate, and explicitly specify the index predicate in the query in the case of a partial index. Case Study: Consider the following table named User. (You can jump directly to the explanation section.) User: id (INT), username (TEXT), created_at (TIMESTAMPTZ). At some point the table saw exponential growth in inserts, so it was decided to index the created_at column. Since only recent data was required (let’s say the last one month), it made sense to index the table only partially....

January 7, 2022 · 3 min · Shubham Singh

Server Side Template Injection with Flask and Jinja2

Introduction: Last week I took part in a CTF, and one of the problems from the “Web” category seemed pretty intriguing to me. Although I was able to get the flag, I couldn’t submit it in time, so no points for me 😔. The interesting thing about this problem was that the technology that needed to be exploited was something I am pretty familiar with and have a lot of experience in. Still, even after working with that tech stack for years, I was unaware of this vulnerability....

June 13, 2020 · 8 min · Shubham Singh