dtgm92 14 hours ago

Does this result in a regular model that, say, llama-cpp can run? Is there any way to test these ourselves?

hochmartinez 17 hours ago

"... Compared to full-precision Llama models, BitNet a4.8 reduces memory usage by a factor of 10 and achieves 4x speedup..."
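A rough back-of-envelope for where a number like "10x" could come from. This is only a sketch under stated assumptions, not the paper's own accounting. It assumes the baseline stores weights in 16 bits, that each ternary weight {-1, 0, +1} carries log2(3) ≈ 1.58 bits, and it counts weight storage only (no activations, no KV cache). The 7B parameter count is a hypothetical example.

```python
import math

# Assumption: the full-precision baseline stores each weight in 16 bits (FP16/BF16).
FP16_BITS = 16.0
# Assumption: ternary weights {-1, 0, +1} carry log2(3) bits of information each.
TERNARY_BITS = math.log2(3)


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage only, in gigabytes (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and any packing overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7e9  # hypothetical 7B-parameter model
fp16_gb = weight_memory_gb(n_params, FP16_BITS)
ternary_gb = weight_memory_gb(n_params, TERNARY_BITS)

print(f"FP16 weights:    {fp16_gb:.1f} GB")
print(f"Ternary weights: {ternary_gb:.2f} GB")
print(f"Ratio:           {fp16_gb / ternary_gb:.1f}x")
```

Under these assumptions the ratio is 16 / log2(3), about 10.1x. Packing each ternary weight into a full 2 bits instead would give 16 / 2 = 8x, so the real figure depends on how the weights are stored and what else is counted.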