Archive: 2024/2 | QT-7274

Never really desperate, only the lost of the soul.

Human determination can triumph over fate (人定勝天)

2024

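Abstract: The authors design a method that reduces the GPU memory required to run large language models (LLMs) by implementing 8-bit integer (Int8) matrix multiplication in the feed-forward and attention projection layers of Transformers. We develop a procedure for Int8 matri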

2024-02-16 Deep Learning

Deep Learning, Papers, Work