Exploring: Retrieval-Augmented Generation (RAG) with open-source LLMs

For some time now, I’ve been experimenting with building a chatbot powered by Llama 3, LangChain, and vector databases (initially Qdrant, later Chroma).
Why RAG?
I wanted to test whether I could build a helpful assistant grounded in a specific knowledge base. In this case, content from Heni Ardiana’s beautiful travel website, Pesona Matahari 🌻
Here’s what I tried and learned:
✅ Indexing went smoothly using LangChain’s RecursiveCharacterTextSplitter combined with FastEmbedEmbeddings (sketch after this list).
📦 Data was loaded and chunked well, giving a solid starting point for semantic search.
🤖 I deployed the chatbot and integrated it into a Discord channel for real-world interaction.
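Here’s roughly what the indexing step looked like, as a minimal sketch. I’m assuming the langchain-community package layout; the loader, URL, and chunk sizes are illustrative placeholders, not the exact values I used:

```python
# Minimal indexing sketch: load pages, chunk them, embed with FastEmbed,
# and persist to a local Chroma store. URL and sizes are placeholders.
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load pages from the knowledge base (placeholder URL).
docs = WebBaseLoader("https://example.com/pesona-matahari").load()

# Chunk the documents for semantic search; sizes here are illustrative.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Embed the chunks and persist them in a local Chroma store.
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=FastEmbedEmbeddings(),
    persist_directory="./chroma_db",
)
```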
🧪 Infra Setup:
Hosted on Oracle Cloud (OCI) using an Ampere ARM instance (CPU-only)
Used Ollama to serve Llama 3 models locally
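Pointing LangChain at the local Ollama server is pretty minimal. A rough sketch, assuming Ollama is already running and the model has been pulled with `ollama pull llama3`:

```python
# Connect LangChain to a local Ollama server hosting Llama 3.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3", temperature=0)

# One-off smoke test against the local model.
print(llm.invoke("Say hello in one sentence.").content)
```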
❌ What didn’t go so well:
Qdrant retrieval via the Python client would occasionally hang indefinitely, even though manual queries worked fine; debugging was inconclusive.
Switched to Chroma, which has performed much more reliably with LangChain (sketch below).
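For the retrieval side after the switch, here’s a minimal sketch of a Chroma-backed QA chain. The store path and the sample question are illustrative, and RetrievalQA is just one way to wire it up:

```python
# Retrieval + generation sketch: reopen the persisted Chroma store and
# run a classic retrieve-then-answer chain against the local model.
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import FastEmbedEmbeddings
from langchain_community.vectorstores import Chroma

# Reopen the store built during indexing (path from the earlier sketch).
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=FastEmbedEmbeddings(),
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

qa = RetrievalQA.from_chain_type(
    llm=ChatOllama(model="llama3"),
    retriever=retriever,
)
print(qa.invoke({"query": "What destinations does the site cover?"})["result"])
```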
📉 Evaluation:
Handles basic Q&A well
Struggles with nuanced queries: it sometimes misses key info or returns irrelevant chunks
🧭 What’s next?
Looking to explore MLflow for structured experiment tracking and improved iteration speed.
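As a rough sketch of what that tracking could look like (the parameter names and the metric here are hypothetical placeholders, not real results):

```python
# Hypothetical MLflow tracking sketch for RAG experiments: log the
# pipeline configuration per run so retrieval settings are comparable.
import mlflow

mlflow.set_experiment("rag-pesona-matahari")

with mlflow.start_run():
    mlflow.log_params({
        "chunk_size": 1000,
        "chunk_overlap": 100,
        "embedding_model": "fastembed-default",
        "llm": "llama3 (Ollama, CPU-only)",
        "vector_store": "chroma",
    })
    # Metrics would come from an eval set of question/answer pairs.
    mlflow.log_metric("retrieval_hit_rate", 0.0)  # placeholder value
```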
If you’re also building with open-source LLMs or RAG pipelines (especially on CPU-only infra!), let’s share learnings.
💬 Drop a comment or DM. Always open to connecting with fellow builders.