“Ternary models can match performance of full-precision LLMs at 1.58 bits per weight.”
If this generalizes, the whole premise behind the GPU shortage changes. Models that fit on commodity hardware kill the moat that hyperscalers spent two years building. Worth tracking whether the benchmarks survive contact with real workloads.
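
For a sense of scale, here is a back-of-envelope sketch of where the 1.58 figure comes from and what it could mean for memory. A weight restricted to three values carries log2(3) ≈ 1.585 bits of information. The 7B parameter count and the helper name `weight_memory_gib` are assumptions picked purely for illustration, not taken from the quoted claim. The estimate counts weights only (no activations, no KV cache) and assumes ideal packing, which real storage formats may not reach.

```python
import math

# Information content of one weight that can take three values, e.g. {-1, 0, +1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585


def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Ideal storage for the weights alone, in GiB.

    Ignores activations, KV cache, and any packing overhead.
    """
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical 7B-parameter model, chosen only for illustration.
    params = 7_000_000_000
    for label, bits in [("fp16", 16.0), ("ternary", BITS_PER_TERNARY_WEIGHT)]:
        print(f"{label:>8}: {weight_memory_gib(params, bits):6.2f} GiB")
```

Under these assumptions the ternary weights take roughly a tenth of the fp16 footprint (16 / 1.585 ≈ 10), which is the gap that would decide whether a model fits on commodity hardware.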