
Job Finder

side project live

An eval-driven LLM pipeline that reads job boards for me and surfaces only the postings worth my time.

role: Full stack developer
impact: False-positive rate tuned from 30% down to about 10%
2026

I’ve been in the tech industry for 7 years now, mostly as a contract software engineer, though I also (very briefly) tried to build a tech recruitment side hustle.

The job market is a two-sided problem and always has been. On one side you have employers flooded with candidates, many of whom are not even close to a fit. On the other side you have people trying to find jobs or gigs that pay well and where they can do good work.

Obviously the latter pains me more, even though I’ve had to hire people at times. I have to find job posts, read up on the company and the job description (JD), check whether I’m a fit, whether they seem cool, whether they’re remote, and so on. That takes real time before I even start answering the application form questions.

So I built the “Job Finder”: it searches ATS (applicant tracking system) job boards for target roles, scrapes each posting, runs it through a multi-stage keyword and LLM evaluator, and drops only the qualified ones into a review queue in my Notion “CRM”. I still apply manually (which I think you should too, to keep the quality of your applications high). But just removing the screening work has been a huge win for me.
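To make the multi-stage idea concrete, here is a minimal sketch of a keyword pass followed by an LLM pass. It is an illustration under assumptions, not the actual Job Finder code: the `Posting` fields, the keyword lists, `run_pipeline`, and the `fake_llm_stage` stand-in (which replaces a real model call) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Posting:
    # Hypothetical shape of a scraped posting.
    title: str
    company: str
    description: str
    remote: bool


# Assumed keyword lists, purely illustrative.
REQUIRED_ANY = ("typescript", "python", "full stack")
BLOCKED = ("on-site only", "security clearance")


def keyword_stage(posting: Posting) -> bool:
    """Cheap first pass: drop obvious mismatches before paying for an LLM call."""
    text = f"{posting.title} {posting.description}".lower()
    if any(term in text for term in BLOCKED):
        return False
    return any(term in text for term in REQUIRED_ANY)


def run_pipeline(
    postings: Iterable[Posting],
    llm_stage: Callable[[Posting], bool],
) -> list[Posting]:
    """Return only the postings that pass every stage, in input order."""
    review_queue = []
    for posting in postings:
        if not keyword_stage(posting):
            continue  # rejected cheaply, the LLM stage never sees it
        if llm_stage(posting):
            review_queue.append(posting)
    return review_queue


def fake_llm_stage(posting: Posting) -> bool:
    """Stand-in for a real model call; here it only checks the remote flag."""
    return posting.remote


SAMPLE_POSTINGS = [
    Posting("Senior Full Stack Engineer", "Acme", "TypeScript and React, fully remote.", True),
    Posting("Backend Engineer", "Globex", "Python role, on-site only in Berlin.", False),
    Posting("Python Developer", "Initech", "Hybrid, three days in office.", False),
    Posting("Sales Manager", "Umbrella", "Own the EMEA pipeline.", True),
]

queued = run_pipeline(SAMPLE_POSTINGS, fake_llm_stage)
print([p.company for p in queued])  # ['Acme']
```

Putting the keyword stage first keeps the expensive, non-deterministic stage for the postings that survive the cheap check. In this toy run only the Acme posting reaches the review queue.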

The biggest problem with the job finder was precision. Initially, the majority of the jobs it showed me were just not a fit.

How do you improve the output quality of a non-deterministic system? Well, you need evals. You can’t fix something (a prompt, a model, etc.) if you don’t know when and how it breaks.

So I made a few Claude skills that help me put my head down and work through the weekly pile of jobs I get. Together, we review every job and decide whether it should become part of the evals, and in what way.

Over time (and with lots of this manual data collection work) I’ve put together over 100 fixtures: examples of jobs I’d like to get and jobs I wouldn’t. This has let me reverse-engineer and prompt-engineer the evaluation pipeline, bringing the false-positive rate down from ~30% to ~10%. Not too shabby.
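For anyone wondering what scoring an evaluator against labelled fixtures can look like, here is a rough sketch. Everything in it is assumed for illustration: the `Fixture` shape, the four made-up fixtures, and the `toy_evaluate` function standing in for the real pipeline. "False-positive rate" is also used loosely above, so the sketch reports both the textbook rate (unwanted jobs surfaced, out of all unwanted jobs) and precision (wanted jobs, out of all surfaced jobs).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fixture:
    # Hypothetical fixture: a posting plus a human label.
    name: str
    posting_text: str
    wanted: bool


def score(fixtures: list[Fixture], evaluate: Callable[[str], bool]) -> dict:
    """Run the evaluator over every fixture and tally the confusion matrix."""
    tp = fp = tn = fn = 0
    for fixture in fixtures:
        surfaced = evaluate(fixture.posting_text)
        if surfaced and fixture.wanted:
            tp += 1
        elif surfaced and not fixture.wanted:
            fp += 1
        elif not surfaced and fixture.wanted:
            fn += 1
        else:
            tn += 1
    return {
        # Unwanted jobs that slipped through, out of all unwanted jobs.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Wanted jobs, out of everything that was surfaced.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Wanted jobs the evaluator wrongly dropped.
        "missed": fn,
    }


def toy_evaluate(posting_text: str) -> bool:
    """Stand-in for the real evaluator: surface anything mentioning 'remote'."""
    return "remote" in posting_text.lower()


# Made-up fixtures, only to exercise the scorer.
FIXTURES = [
    Fixture("remote-ts", "Remote TypeScript engineer, contract", True),
    Fixture("onsite-java", "On-site Java developer, relocation required", False),
    Fixture("remote-sales", "Remote sales representative", False),
    Fixture("remote-py", "Remote Python contractor", True),
]

report = score(FIXTURES, toy_evaluate)
print(report)
```

Re-running a scorer like this after every prompt or model change is what turns "it feels better" into a number you can compare week to week. In the toy data the sales posting is the lone false positive, so the rate is 1 out of 2 unwanted fixtures.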

