ALTNT's Hexo Blog
Differential Transformer
Created 2025-05-01 | Updated 2025-05-15
Author: ALTNT
Link: http://blog.705553939.xyz/2025/05/01/sequence-processing/2025-ICLR-DIFF-Transformer/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.
Tags: Sequence Processing
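Contents
Differential Transformer
  Abstract
  Introduction
  2. Differential Transformer
    2.1 Differential Attention
    2.2 Overall Architecture
  3. Experiments
    3.1 Language Modeling Evaluation
    3.2 Scalability Compared with Transformer
    3.3 Long-Context Evaluation
    3.4 Key Information Retrieval
    3.5 In-Context Learning
    3.6 Contextual Hallucination Evaluation
    3.7 Activation Outlier Analysis
    3.8 Ablation Studies
  4. Conclusion
  Appendix A. Implementation of Differential Attention
  Appendix B. Language Modeling Evaluation
  Appendix C. Mathematical Reasoning Evaluation
    Mathematical Ability Evaluation
    o1-Style Reasoning Evaluation
  Appendix D. Hyperparameters for Section 3.1
  Appendix E. Hyperparameters for Section 3.2
  Appendix F. Robustness of In-Context Learning
  Appendix G. Gradient Flow of Differential Transformer
    Derivation of …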