OK, here’s my question:

Why do all these RAG apps use giant f**king models??

There are more examples, but you get the point. I’m worked up about this because these foundation models (ChatGPT, Claude, Bard, the whole lot) are so freaking powerful that using them for information retrieval is super inefficient. Think about it: they can do so many tasks, so using them for only one small task is a waste of resources!

ChatGPT can summarize, generate, extract information, _and_ embed information!

Not only are they overkill in terms of computation, but these models underperform significantly if your data is not the typical middle-class white American kind of data. And that sucks, because there are so many interesting and useful datasets out there that don’t fall into this bucket, like:

  * A database of Eminem song lyrics
  * Websites for literally any government in the global south
  * Super-duper technical manuals for weird German manufacturing equipment
  * Store policies for your friendly neighborhood sex shop

Since we’re not like those other guys in tech, we’re going to take the “AI for everyone” mantra and actually do something to make it happen. And not only will we make it happen, but “it” will be harder, better, faster, and stronger than what those general foundation models can do!

So here’s the plan:

Three-step plan for IR domination

So we’re going to go after niche, non-English datasets like the ones listed above and show that you don’t need an enormous general LLM to do information retrieval. But wait! There’s more:

We’re going to do it without a GPU!

That’s right! These models are so small that we should be able to get fast performance without a GPU! Since we’ll only be encoding the incoming query, not generating any text, this should be pretty straightforward.
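
To make that concrete, here’s a minimal sketch of what CPU-only retrieval could look like, assuming a small multilingual sentence-transformers model and a toy two-document corpus (both are placeholders, not a commitment to any specific model or dataset):

```python
# Minimal CPU-only retrieval sketch. The model name and corpus are placeholders.
from sentence_transformers import SentenceTransformer, util

# A tiny multilingual encoder, loaded on CPU -- no GPU anywhere in this snippet.
model = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", device="cpu"
)

# In the real app the corpus embeddings get computed once, offline, and cached.
corpus = [
    "Return policy: unopened items can be returned within 30 days.",
    "Las devoluciones se aceptan dentro de los 30 días posteriores a la compra.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 1):
    # The only work at query time is encoding one short string...
    query_embedding = model.encode(query, convert_to_tensor=True)
    # ...and a cosine-similarity search over the cached corpus embeddings.
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=top_k)[0]
    return [(corpus[hit["corpus_id"]], hit["score"]) for hit in hits]

print(retrieve("¿Puedo devolver un producto?"))
```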

Moreover, to prove my point, I want to set up a battle UI between ChatGPT, Cohere multilingual, and my tiny retriever and deploy it as a Hugging Face Space. That way people can try it for themselves and see the difference that a good retriever makes!
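
A Gradio app is roughly what a Hugging Face Space runs anyway, so the battle UI could look something like the sketch below. The three `retrieve_*` functions are hypothetical stubs standing in for the real ChatGPT, Cohere, and tiny-retriever backends:

```python
# Rough sketch of the battle UI as a Gradio app. The retrieve_* functions are
# hypothetical stubs -- swap in real calls to each retrieval backend.
import gradio as gr

def retrieve_chatgpt(query: str) -> str:
    return "top passage from the OpenAI-embeddings index (stub)"

def retrieve_cohere(query: str) -> str:
    return "top passage from the Cohere multilingual index (stub)"

def retrieve_tiny(query: str) -> str:
    return "top passage from the tiny fine-tuned retriever (stub)"

def battle(query: str):
    # Run the same query through all three retrievers, side by side.
    return retrieve_chatgpt(query), retrieve_cohere(query), retrieve_tiny(query)

demo = gr.Interface(
    fn=battle,
    inputs=gr.Textbox(label="Your question"),
    outputs=[
        gr.Textbox(label="ChatGPT"),
        gr.Textbox(label="Cohere multilingual"),
        gr.Textbox(label="Tiny retriever"),
    ],
    title="Retriever battle",
)

demo.launch()
```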

The next steps:

  1. Find a good multilingual dataset and pre-process it
  2. Build the battle UI and hook it up to ChatGPT and Cohere
  3. Fine-tune a tiny BERT model on the dataset in each language
  4. Deploy those tiny models to a server with some logic to switch for each language (see the sketch after this list)
  5. Test it out and see if I can beat Cohere!
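
For step 4, the language-switching logic could honestly be this simple: detect the query language and pick the matching checkpoint. The `my-org/tiny-retriever-*` names below are hypothetical placeholders for the per-language models from step 3, with a general multilingual model as the fallback:

```python
# Sketch of the per-language routing from step 4. The "my-org/..." checkpoint
# names are hypothetical placeholders for the fine-tuned models.
from langdetect import detect
from sentence_transformers import SentenceTransformer

# One tiny model per language, all loaded on CPU.
MODELS = {
    "de": SentenceTransformer("my-org/tiny-retriever-de", device="cpu"),
    "es": SentenceTransformer("my-org/tiny-retriever-es", device="cpu"),
}
# A general multilingual encoder as the fallback for everything else.
FALLBACK = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", device="cpu"
)

def encode_query(query: str):
    # Route the query to the model fine-tuned for its language.
    try:
        lang = detect(query)
    except Exception:
        lang = "unknown"
    model = MODELS.get(lang, FALLBACK)
    return lang, model.encode(query)

lang, vector = encode_query("Wie stelle ich die Spindel der Fräsmaschine ein?")
print(lang, vector.shape)
```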