ALTNT's Hexo Blog

Differential Transformer

Created 2025-05-01 | Updated 2025-05-15
Author: ALTNT
Link: http://blog.705553939.xyz/2025/05/01/sequence-processing/2025-ICLR-DIFF-Transformer/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.
Sequence Processing
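Contents
  1. Differential Transformer
    1.1. Abstract
    1.2. Introduction
    1.3. 2. Differential Transformer
      1.3.1. 2.1 Differential Attention
      1.3.2. 2.2 Overall Architecture
    1.4. 3. Experiments
      1.4.1. 3.1 Language Modeling Evaluation
      1.4.2. 3.2 Scalability Compared with Transformer
      1.4.3. 3.3 Long-Context Evaluation
      1.4.4. 3.4 Key Information Retrieval
      1.4.5. 3.5 In-Context Learning
      1.4.6. 3.6 Contextual Hallucination Evaluation
      1.4.7. 3.7 Activation Outliers Analysis
      1.4.8. 3.8 Ablation Studies
    1.5. 4. Conclusion
    1.6. Appendix A. Implementation of Differential Attention
    1.7. Appendix B. Language Modeling Evaluation
    1.8. Appendix C. Evaluation on Mathematical Reasoning
      1.8.1. Mathematical Ability Evaluation
      1.8.2. o1-Style Reasoning Evaluation
    1.9. Appendix D. Hyperparameters for Section 3.1
    1.10. Appendix E. Hyperparameters for Section 3.2
    1.11. Appendix F. Robustness of In-Context Learning
    1.12. Appendix G. Gradient Flow of Differential Transformer
      1.12.1. Derivation of …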