Creating the tiniest information retriever

OK, here’s my question:

Why do all these RAG apps use giant f**king models??

(I could list examples all day, but you get the point.) I’m worked up about this because these foundation models (ChatGPT, Claude, Bard, the whole lot) are so freaking powerful that using them for information retrieval is super inefficient. Think about it: they can do so many tasks that using them for only one small task is a waste of resources!

ChatGPT can summarize, generate, extract, and embed information!

Not only are they overkill in terms of compute, but these models also underperform significantly when your data isn’t the typical middle-class white American kind of data. And that sucks, because there are so many interesting and useful datasets out there that don’t fall into this bucket, like:

  • A database of Eminem song lyrics
  • Websites for literally any government in the global south
  • Super-duper technical manuals for weird German manufacturing equipment
  • Store policies for your friendly neighborhood sex shop

Since we’re not like those other guys in tech, we’re going to take the “AI for everyone” mantra and actually do something to make it happen. And not only will we make it happen, but “it” will be harder, better, faster, and stronger than what those general foundation models can do!

So here’s the plan

Three-step plan for IR domination

So we’re going to go after niche, non-English datasets like the ones listed above and show that you don’t need an enormous general LLM to do information retrieval. But wait! There’s more:

We’re going to do it without a GPU!

That’s right! These models are so small that we should be able to get fast performance without a GPU! Since we’ll only be encoding the incoming query, not generating any text, this should be pretty straightforward.
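
To make that concrete, here’s a minimal sketch of CPU-only query encoding with sentence-transformers. The MiniLM checkpoint is just a stand-in for whatever tiny fine-tuned model we actually end up with:

```python
from sentence_transformers import SentenceTransformer

# Load a small multilingual encoder; device="cpu" forces CPU inference.
# (This checkpoint is a placeholder, not the final fine-tuned model.)
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2", device="cpu")

# Encoding one incoming query is a single forward pass through a small
# encoder, so it stays fast even without a GPU.
query_embedding = model.encode("Where is the nearest train station?")
print(query_embedding.shape)  # (384,) for this MiniLM variant
```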

Moreover, to prove my point, I want to set up a battle UI between ChatGPT, Cohere multilingual, and my tiny retriever, and deploy it as a Hugging Face Space. Then folks can try it for themselves and see the difference that a good retriever makes!
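
Something like this Gradio sketch is what I have in mind (Spaces run Gradio apps natively). The three retrieve_* helpers are hypothetical stubs; each one would embed the query with its own model and return the top passages from the same corpus:

```python
import gradio as gr

# Hypothetical retrieval helpers: each would embed the query with its
# model and search the same corpus. Stubbed out here for the sketch.
def retrieve_openai(query: str) -> str:
    return "(ChatGPT-embedding results would go here)"

def retrieve_cohere(query: str) -> str:
    return "(Cohere multilingual results would go here)"

def retrieve_tiny(query: str) -> str:
    return "(tiny retriever results would go here)"

def battle(query: str):
    # Run all three retrievers on the same query so results sit side by side.
    return retrieve_openai(query), retrieve_cohere(query), retrieve_tiny(query)

with gr.Blocks() as demo:
    query = gr.Textbox(label="Ask a question")
    with gr.Row():
        openai_out = gr.Textbox(label="ChatGPT")
        cohere_out = gr.Textbox(label="Cohere multilingual")
        tiny_out = gr.Textbox(label="Tiny retriever")
    query.submit(battle, inputs=query, outputs=[openai_out, cohere_out, tiny_out])

demo.launch()
```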

The next steps:

  1. Find a good multilingual dataset and pre-process it
  2. Build the battle UI and hook it up to ChatGPT and Cohere
  3. Fine-tune a tiny BERT model on the dataset in each language
  4. Deploy those tiny models to a server with some logic to switch for each language (see the routing sketch after this list)
  5. Test it out and see if I can beat Cohere!
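
For step 4, the switching logic could be as simple as detecting the query’s language and picking the matching model. Here’s a rough sketch; the langdetect dependency and the local model paths are my assumptions, not a settled design:

```python
from langdetect import detect
from sentence_transformers import SentenceTransformer

# One tiny fine-tuned model per language (hypothetical local paths).
MODELS = {
    "de": SentenceTransformer("./models/tiny-bert-de", device="cpu"),
    "es": SentenceTransformer("./models/tiny-bert-es", device="cpu"),
    "en": SentenceTransformer("./models/tiny-bert-en", device="cpu"),
}

def encode_query(query: str):
    # Detect the query's language and fall back to English for
    # anything we didn't fine-tune on.
    lang = detect(query)
    model = MODELS.get(lang, MODELS["en"])
    return model.encode(query)
```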

Enjoyed that one? How about signing up for my mailing list below? You can also find another article to read in ./posts

 

2023-09-25