Scalable-Softmax Is Superior for Attention | Feb. 4, 2025, 8:10 a.m.
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models | Feb. 4, 2025, 9:40 a.m.
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization | Feb. 4, 2025, 9:40 a.m.
Querying Databases with Function Calling | Feb. 4, 2025, 11 a.m.
Over-Tokenized Transformer: Vocabulary Is Generally Worth Scaling | Feb. 4, 2025, 4:05 p.m.