
Commit

update for 07Train/
chenzomi12 committed Mar 25, 2024
1 parent 826ae75 commit 8bbf855
Showing 109 changed files with 48 additions and 50 deletions.
30 changes: 15 additions & 15 deletions 01Introduction/01present.md
10 changes: 5 additions & 5 deletions 05Framework/01Foundation/02.fundamentals.md
@@ -36,7 +36,7 @@

Let us look at a simple, intuitive example. Suppose each circle represents one neuron: a single neuron can simulate the three logic operations AND, OR, and NOT, and three neurons arranged as a network with one hidden layer can simulate XOR. In theory, therefore, neural networks composed in this way can simulate any combination of logic functions.

- ![A neuron representing AND/OR/NOT](../images/021FW_Foundation/function_apprcimate.png)
+ ![A neuron representing AND/OR/NOT](images/021FW_Foundation/function_apprcimate.png)
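To make this concrete, here is a minimal sketch with hand-picked weights (the weight and bias values are illustrative assumptions, not taken from the original text): one threshold neuron computes AND, OR, or NOT, and three such neurons with one hidden layer compute XOR.

```python
import numpy as np

def neuron(x, w, b):
    # A single threshold neuron: fires (returns 1) when w.x + b > 0.
    return int(np.dot(w, x) + b > 0)

# Hand-picked weights (illustrative values only).
AND = lambda x: neuron(x, w=[1, 1], b=-1.5)   # fires only when both inputs are 1
OR  = lambda x: neuron(x, w=[1, 1], b=-0.5)   # fires when at least one input is 1
NOT = lambda x: neuron([x], w=[-1], b=0.5)    # inverts a single input

def XOR(x):
    # Two hidden neurons (OR and NAND) feeding one output neuron (AND):
    # a one-hidden-layer network of three neurons implements XOR.
    h1 = OR(x)
    h2 = neuron(x, w=[-1, -1], b=1.5)          # NAND
    return AND([h1, h2])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND([a, b]), OR([a, b]), XOR([a, b]))
```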

Many people say that as long as a neural network is deep enough and wide enough, it can fit any function. Is this claim mathematically sound? Strictly speaking, a neural network does not fit arbitrary functions; its mathematical justification rests on the universal approximation theorem:
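One common way the theorem is stated (a standard sigmoidal-activation formulation, given here only for reference) is:

$$ \forall \varepsilon > 0,\ \exists N,\ v_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n:\quad \sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i\, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon $$

where $f$ is any continuous function on a compact set $K \subset \mathbb{R}^n$ and $\sigma$ is a fixed sigmoidal activation function.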

@@ -57,7 +57,7 @@ $$ loss(w)=f(w)-g $$

The general deep learning workflow is: 1) build the neural network model; 2) define the loss function and the optimizer (the optimization objective); 3) train the model (compute gradients and update the weight parameters in the model); 4) finally, validate accuracy. The flow is shown in the figure below; the first three steps matter most.

- ![Deep learning workflow](../images/021FW_Foundation/deeplearning02.png)
+ ![Deep learning workflow](images/021FW_Foundation/deeplearning02.png)
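A minimal sketch of these four steps, assuming PyTorch and a toy regression dataset (both are illustrative choices, not part of the original text):

```python
import torch
from torch import nn

# Toy regression data (illustrative only).
X = torch.randn(256, 4)
y = X.sum(dim=1, keepdim=True)

# 1) Build the neural network model.
model = nn.Sequential(nn.Linear(4, 16), nn.Sigmoid(), nn.Linear(16, 1))

# 2) Define the loss function and the optimizer (the optimization objective).
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 3) Train: compute gradients and update the weight parameters.
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()        # the framework differentiates the loss automatically
    optimizer.step()

# 4) Validate accuracy (here: final loss on the toy data).
with torch.no_grad():
    print("final loss:", loss_fn(model(X), y).item())
```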

Because the AI framework already packages much of this functionality for us, when a model's accuracy falls short, an algorithm engineer can adjust the network architecture, the loss function, the optimizer, and other settings, retrain, and keep testing and validating accuracy. This is why algorithm engineers are often jokingly called "parameter-tuning engineers".

@@ -115,19 +115,19 @@ $$ \frac{\partial loss}{\partial w_1} = {Loss}'(L_3, y) {sigmoid}'(w_3,L_2) {sig

"Backward" here refers to the backward arrows in the figure: each time we differentiate the loss with respect to a parameter, we reuse the result of the previous computation together with the matching variable from the original forward expression, which makes differentiating the composite function much more convenient.

- ![Neural network computation flow](../images/021FW_Foundation/deeplearning03.png)
+ ![Neural network computation flow](images/021FW_Foundation/deeplearning03.png)

### Combining the AI framework with the program

The formula in the left part of the figure below is the composite-function representation of the neural network, and the blue box represents the AI framework. The AI framework provides developers with the mathematical operations needed to build neural network models, and it converts these complex mathematical expressions into a computational graph that the computer can process.

- ![From the neural network representation to the AI framework](../images/021FW_Foundation/deeplearning05.png)
+ ![From the neural network representation to the AI framework](images/021FW_Foundation/deeplearning05.png)

Once the final loss function of the whole network is defined as $Loss$, the AI framework automatically differentiates it, i.e., computes the partial derivative of the loss with respect to each parameter in the model.

As mentioned above, each differentiation step reuses the result of the previous computation and the matching variable from the forward expression. So we can simply build, on top of the computational graph that represents the network, a mirrored graph: the backward computational graph. The backward graph represents the partial derivatives of the model, and backpropagation is then just the chain rule unrolled over it.

- ![Automatic differentiation in an AI framework](../images/021FW_Foundation/deeplearning07.png)
+ ![Automatic differentiation in an AI framework](images/021FW_Foundation/deeplearning07.png)
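A minimal sketch of this behavior, assuming PyTorch's autograd (the small composite function below is only an illustrative stand-in for a real loss):

```python
import torch

# Leaf variables; requires_grad=True asks the framework to record the
# forward computational graph so a mirrored backward graph can be built.
w1 = torch.tensor(2.0, requires_grad=True)
w2 = torch.tensor(5.0, requires_grad=True)

# Forward pass: a small composite function standing in for Loss(w).
loss = torch.log(w1) + w1 * w2 - torch.sin(w2)

# Backward pass: the chain rule unrolled over the recorded graph.
loss.backward()

# Partial derivatives of the loss with respect to each parameter.
print(w1.grad)   # d loss / d w1 = 1/w1 + w2 = 5.5
print(w2.grad)   # d loss / d w2 = w1 - cos(w2) ≈ 1.7163
```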

By differentiating the loss with respect to the network model, training updates the model's parameters (the function-approximation process) so that the loss keeps decreasing (i.e., the model performs better and better). As long as you define the network, the AI framework takes care of this whole process for us.

14 changes: 4 additions & 10 deletions 05Framework/02AutoDiff/05.forward_mode.md
@@ -24,13 +24,12 @@

- Decompose the program into a combination of basic expressions whose differentiation rules are known, and use the high-level language's operator overloading
- While overloading each operator, emit the derivative of the corresponding basic expression according to the known differentiation rules
- Following the data dependencies between the basic expressions, combine these derivatives with the chain rule to obtain the derivative of the whole program

## Concrete implementation

First, we import the general-purpose numpy library for the actual numerical computation; if you would rather not use numpy, Python's built-in math module works as well.


```python
import numpy as np
```
@@ -39,15 +38,14 @@ import numpy as np

Note that operator-overloading automatic differentiation, unlike source-code transformation, generally does not produce a symbolic derivative formula; it directly produces the final derivative value, which is why dx appears below.


```python
class ADTangent:

    # x is the independent variable; dx is the derivative with respect to x
def __init__(self, x, dx):
self.x = x
self.dx = dx

    # __str__ is overloaded so that printing shows both the input value and its derivative
def __str__(self):
context = f'value:{self.x:.4f}, grad:{self.dx}'
@@ -58,7 +56,6 @@ class ADTangent:

What deserves attention here is the computation of dx: because this is forward-mode automatic differentiation, every forward computation is accompanied by a corresponding derivative computation. That derivative bookkeeping is the core of the program, but don't worry — only the most basic differentiation rules are involved. Finally, the method returns a new ADTangent(x, dx) object.


```python
def __add__(self, other):
if isinstance(other, ADTangent):
@@ -74,7 +71,6 @@ class ADTangent:

Next, the subtraction, multiplication, log, and sin operations are overloaded in the same way. The forward part of each overload is straightforward and essentially follows the __add__ code discussed above; a compact sketch of the corresponding derivative rules follows the snippet below.


```python
def __sub__(self, other):
if isinstance(other, ADTangent):
@@ -118,7 +114,6 @@ $$ f(x1,x2)=ln(x1)+x1x2−sin(x2) $$
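Because the diff collapses most of these overloads, the following compact, self-contained sketch (an illustrative dual-number class of its own, not necessarily the file's exact code) shows what the forward-mode rules for these operations typically look like:

```python
import numpy as np

class DualSketch:
    # A minimal dual number (value, derivative) used only to illustrate the rules.
    def __init__(self, x, dx):
        self.x, self.dx = x, dx

    def __sub__(self, other):    # (u - v)' = u' - v'
        return DualSketch(self.x - other.x, self.dx - other.dx)

    def __mul__(self, other):    # (u * v)' = u'v + uv'
        return DualSketch(self.x * other.x,
                          self.dx * other.x + self.x * other.dx)

def log(u):                      # (ln u)' = u' / u
    return DualSketch(np.log(u.x), u.dx / u.x)

def sin(u):                      # (sin u)' = u' * cos(u)
    return DualSketch(np.sin(u.x), u.dx * np.cos(u.x))
```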

Since we are differentiating f with respect to the independent variable x, when initializing the data we set dx of x to 1 and dx of y to 0.
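For reference, with these seeds the program should reproduce:

$$ f(2, 5) = \ln 2 + 2 \times 5 - \sin 5 \approx 0.6931 + 10 + 0.9589 \approx 11.652 $$

$$ \frac{\partial f}{\partial x_1} = \frac{1}{x_1} + x_2 = \frac{1}{2} + 5 = 5.5 $$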


```python
x = ADTangent(x=2., dx=1)
y = ADTangent(x=5., dx=0)
@@ -172,7 +167,7 @@ class Fun(nn.Cell):
def construct(self, x, y):
f = ops.log(x) + x * y - ops.sin(y)
return f

x = Tensor(np.array([2.], np.float32))
y = Tensor(np.array([5.], np.float32))
f = Fun()(x, y)
@@ -182,7 +177,6 @@ grad = grad_all(Fun())(x, y)

print(f)
print(grad[0])

```

Output:
