Well, there’s a reason why they’re called “Large Language Models”. They can be several GB in size, and the test downloads such a model to your machine in order to properly test it. The tasks involved in the test are also repeated several times for redundancy and in order to generate a significant result.