update for 07Train/
chenzomi12 committed Mar 25, 2024
1 parent 826ae75 commit 8bbf855
Showing 109 changed files with 48 additions and 50 deletions.
30 changes: 15 additions & 15 deletions 01Introduction/
@@ -137,33 +137,33 @@

## References

2. [Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).](
1. [Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).](

3. [McCulloch, W.S., Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943).](
2. [McCulloch, W.S., Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943).](

4. [The perceptron - A perceiving and recognizing automaton. Rosenblatt, F. Technical Report 85-460-1, Cornell Aeronautical Laboratory, Ithaca, New York, January, 1957.](
3. [The perceptron - A perceiving and recognizing automaton. Rosenblatt, F. Technical Report 85-460-1, Cornell Aeronautical Laboratory, Ithaca, New York, January, 1957.](

5. [Bernard Widrow. (1960). “Adaptive "Adaline" Neuron Using Chemical "memistors".” Number Technical Report 1553-2. Stanford Electron. Labs. Stanford, CA](
4. [Bernard Widrow. (1960). “Adaptive "Adaline" Neuron Using Chemical "memistors".” Number Technical Report 1553-2. Stanford Electron. Labs. Stanford, CA](

6. [Minsky, M., Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA, USA: MIT Press.](
5. [Minsky, M., Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA, USA: MIT Press.](

7. [Werbos, Paul J. "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences." (1974).](
6. [Werbos, Paul J. "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences." (1974).](

8. [Rina Dechter. 1986. Learning while searching in constraint-satisfaction-problems. In Proceedings of the Fifth AAAI National Conference on Artificial Intelligence (AAAI'86). AAAI Press, 178–183.](
7. [Rina Dechter. 1986. Learning while searching in constraint-satisfaction-problems. In Proceedings of the Fifth AAAI National Conference on Artificial Intelligence (AAAI'86). AAAI Press, 178–183.](

9. [Y. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," in Neural Computation, vol. 1, no. 4, pp. 541-551, Dec. 1989, doi: 10.1162/neco.1989.1.4.541.](
8. [Y. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," in Neural Computation, vol. 1, no. 4, pp. 541-551, Dec. 1989, doi: 10.1162/neco.1989.1.4.541.](

10. [Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006 Jul 28;313(5786):504-7. doi: 10.1126/science.1127647. PMID: 16873662.](
9. [Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006 Jul 28;313(5786):504-7. doi: 10.1126/science.1127647. PMID: 16873662.](

11. [Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255).
10. [Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255).

12. [Dong Yu, Frank Seide, and Gang Li. 2012. Conversational speech transcription using context-dependent deep neural networks. In Proceedings of the 29th International Conference on Machine Learning (ICML'12). Omnipress, Madison, WI, USA, 1–2.](
11. [Dong Yu, Frank Seide, and Gang Li. 2012. Conversational speech transcription using context-dependent deep neural networks. In Proceedings of the 29th International Conference on Machine Learning (ICML'12). Omnipress, Madison, WI, USA, 1–2.](

13. [Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. 2012. Building high-level features using large scale unsupervised learning. In Proceedings of the 29th International Conference on Machine Learning (ICML'12). Omnipress, Madison, WI, USA, 507–514.](
12. [Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. 2012. Building high-level features using large scale unsupervised learning. In Proceedings of the 29th International Conference on Machine Learning (ICML'12). Omnipress, Madison, WI, USA, 507–514.](

14. [Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (June 2017), 84–90.](
13. [Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (June 2017), 84–90.](

15. [Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: an imperative style, high-performance deep learning library. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 721, 8026–8037.](
14. [Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: an imperative style, high-performance deep learning library. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 721, 8026–8037.](

16. [Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, USA, 265–283.](
15. [Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, USA, 265–283.](
10 changes: 5 additions & 5 deletions 05Framework/01Foundation/
@@ -36,7 +36,7 @@



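Many people claim that a neural network can fit any function as long as the model is deep enough and wide enough. Does this claim hold up mathematically? Strictly speaking, a neural network does not fit arbitrary functions; its mathematical foundation is the universal approximation theorem: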

@@ -57,7 +57,7 @@ $$ loss(w)=f(w)-g $$




@@ -115,19 +115,19 @@ $$ \frac{\partial loss}{\partial w_1} = {Loss}'(L_3, y) {sigmoid}'(w_3,L_2) {sig



### Combining AI Frameworks with Programs



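Once the final loss function of the whole neural network is defined as $Loss$, the AI framework automatically differentiates the loss function, that is, it computes the partial derivative of the loss with respect to each parameter of the neural network model.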
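
To make concrete what the framework automates, the following is a minimal sketch that differentiates a three-layer sigmoid chain by hand with the chain rule and checks the result against a finite difference. The scalar network, the squared-error loss, and the names `forward`, `grad_w1`, and `numeric_grad_w1` are assumptions made for illustration; they are not taken from the original text.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(w, x):
    """Scalar three-layer chain: L1 = s(w1*x), L2 = s(w2*L1), L3 = s(w3*L2)."""
    w1, w2, w3 = w
    L1 = sigmoid(w1 * x)
    L2 = sigmoid(w2 * L1)
    L3 = sigmoid(w3 * L2)
    return L1, L2, L3


def loss(w, x, y):
    # Assumed loss for this sketch: squared error on the last layer's output.
    return 0.5 * (forward(w, x)[2] - y) ** 2


def grad_w1(w, x, y):
    """d(loss)/d(w1) written out by hand as a product of local derivatives."""
    w1, w2, w3 = w
    L1, L2, L3 = forward(w, x)
    dloss_dL3 = L3 - y                 # derivative of the squared error
    dL3_dL2 = L3 * (1.0 - L3) * w3     # sigmoid'(w3 * L2) * w3
    dL2_dL1 = L2 * (1.0 - L2) * w2     # sigmoid'(w2 * L1) * w2
    dL1_dw1 = L1 * (1.0 - L1) * x      # sigmoid'(w1 * x) * x
    return dloss_dL3 * dL3_dL2 * dL2_dL1 * dL1_dw1


def numeric_grad_w1(w, x, y, eps=1e-6):
    """Central finite difference, used only as an independent check."""
    wp = np.array(w, dtype=float)
    wm = wp.copy()
    wp[0] += eps
    wm[0] -= eps
    return (loss(wp, x, y) - loss(wm, x, y)) / (2.0 * eps)


w = np.array([0.5, -1.2, 0.8])
x, y = 1.5, 0.3
analytic = grad_w1(w, x, y)
numeric = numeric_grad_w1(w, x, y)
print(analytic, numeric)
```

Writing `grad_w1` by hand already takes four local derivatives for a single weight; an AI framework derives the same products for every parameter automatically.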




14 changes: 4 additions & 10 deletions 05Framework/02AutoDiff/
@@ -24,13 +24,12 @@

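- Decompose the program into a composition of basic expressions whose differentiation rules are known, and overload their operators using the high-level language's operator overloading
- While each overloaded operation executes, compute the derivative of that basic expression from its known differentiation rule
- Following the data dependencies between the basic expressions, combine these derivatives with the chain rule to obtain the derivative of the whole program
- Following the data dependencies between the basic expressions, combine these derivatives with the chain rule to obtain the derivative of the whole program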
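
The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `Dual` and the helper `exp` are hypothetical and are not the `ADTangent` class developed in the text.

```python
import math


class Dual:
    """Hypothetical (value, derivative) pair used only for this sketch."""

    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    # Steps 1 and 2: each overloaded operator is a basic expression whose
    # differentiation rule is known, so it returns value and derivative together.
    def __add__(self, other):
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def exp(a):
    # exp(u)' = exp(u) * u'
    e = math.exp(a.value)
    return Dual(e, e * a.deriv)


# Step 3: composing the operators follows the data dependencies, so the
# chain rule is applied implicitly as the expression is evaluated.
x = Dual(1.0, 1.0)  # seed: dx/dx = 1
c = Dual(2.0, 0.0)  # constant: derivative 0
y = exp(x * x) + c * x  # y = exp(x^2) + 2x
print(y.value, y.deriv)
```

No derivative formula for `y` is ever built; only the numeric derivative at `x = 1` is carried along with the value.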

## Implementation


import numpy as np
@@ -39,15 +38,14 @@ import numpy as np

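Note that, unlike source code transformation, operator-overloading automatic differentiation generally does not produce a derivative formula. It directly produces the final derivative value, which is why dx appears.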

class ADTangent:

    # x is the independent variable; dx is the derivative with respect to it
    def __init__(self, x, dx):
        self.x = x
        self.dx = dx

    # Overload __str__ so that printing shows both the input value and its derivative
    def __str__(self):
        context = f'value:{self.x:.4f}, grad:{self.dx}'
@@ -58,7 +56,6 @@ class ADTangent:

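The part that deserves attention is the computation of dx. Because this is forward-mode automatic differentiation, every forward computation is accompanied by a corresponding derivative computation. This derivative computation is the core of the program, but there is no need to worry: it uses only the most basic differentiation rules. Each operator finally returns a new object of the same type, ADTangent(x, dx).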

    def __add__(self, other):
        if isinstance(other, ADTangent):
@@ -74,7 +71,6 @@ class ADTangent:

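Next, subtraction, multiplication, log, and sin are overloaded. Overloading for forward mode is straightforward and follows the same pattern as the __add__ code discussed above.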

    def __sub__(self, other):
        if isinstance(other, ADTangent):
@@ -118,7 +114,6 @@ $$ f(x_1,x_2)=\ln(x_1)+x_1x_2-\sin(x_2) $$

Because we want the derivative of f with respect to the independent variable x, the dx of x is initialized to 1 and the dx of y is initialized to 0.

x = ADTangent(x=2., dx=1)
y = ADTangent(x=5., dx=0)
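
Because the diff elides the bodies of the overloaded operators, the following is a self-contained sketch of how the whole class might fit together, evaluated on the example function $f = \ln(x) + xy - \sin(y)$. The method bodies, the `_lift` helper, the wrapper function `f`, and the second evaluation with the seeds swapped are assumptions for illustration, not the author's code.

```python
import numpy as np


class ADTangent:
    """Forward-mode (value, derivative) pair with the fields used in the text."""

    def __init__(self, x, dx):
        self.x = x
        self.dx = dx

    def __str__(self):
        return f'value:{self.x:.4f}, grad:{self.dx}'

    @staticmethod
    def _lift(other):
        # Assumption: a plain number is a constant, so its derivative is 0.
        return other if isinstance(other, ADTangent) else ADTangent(other, 0.0)

    def __add__(self, other):
        other = ADTangent._lift(other)
        return ADTangent(self.x + other.x, self.dx + other.dx)

    def __sub__(self, other):
        other = ADTangent._lift(other)
        return ADTangent(self.x - other.x, self.dx - other.dx)

    def __mul__(self, other):
        other = ADTangent._lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return ADTangent(self.x * other.x,
                         self.dx * other.x + self.x * other.dx)

    def log(self):
        # ln(u)' = u' / u
        return ADTangent(np.log(self.x), self.dx / self.x)

    def sin(self):
        # sin(u)' = cos(u) * u'
        return ADTangent(np.sin(self.x), np.cos(self.x) * self.dx)


def f(a, b):
    return ADTangent.log(a) + a * b - ADTangent.sin(b)


# Seed dx=1 on x to obtain df/dx, then seed dx=1 on y to obtain df/dy.
df_dx = f(ADTangent(x=2., dx=1), ADTangent(x=5., dx=0))
df_dy = f(ADTangent(x=2., dx=0), ADTangent(x=5., dx=1))
print(df_dx)
print(df_dy)
```

Each forward pass yields the derivative with respect to one seeded input only, so a function of n inputs needs n passes to obtain the full gradient. By hand, $\partial f/\partial x = 1/x + y$ and $\partial f/\partial y = x - \cos(y)$, which the two passes should reproduce at $(2, 5)$.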
@@ -172,7 +167,7 @@ class Fun(nn.Cell):
    def construct(self, x, y):
        f = ops.log(x) + x * y - ops.sin(y)
        return f

x = Tensor(np.array([2.], np.float32))
y = Tensor(np.array([5.], np.float32))
f = Fun()(x, y)
@@ -182,7 +177,6 @@ grad = grad_all(Fun())(x, y)


