

3·
13 days agoIt depends on what you mean by “relative responsiveness”, but you can absolutely get ~4 tokens/sec of performance on R1 671b (Q4 quantized) from a system costing a fraction of the number you quote.
It depends on what you mean by “relative responsiveness”, but you can absolutely get ~4 tokens/sec of performance on R1 671b (Q4 quantized) from a system costing a fraction of the number you quote.
It is indeed called a refund by the IRS and all tax professionals. The person(s) attempting to correct your use of “refund” are wrong, but they were probably trying to make the point that giving a lot of extra money to the government interest-free is not a smart financial idea.
Yeah I definitely get your point (and I didn’t downvote you, for the record). But I will note that ChatGPT generates text way faster than most people can read, and 4 tokens/second, while perhaps slower than reading speed for some people, is not that bad in my experience.