Andy Luhrs - Link

Understanding the recent criticism of the Chatbot Arena - Another day, another LLM drama. Anyone using LLMs for serious purposes needs to make sure they have evals representative of their real use cases when comparing new models. Benchmarks and leaderboards mean nothing.