Reading log: Engineering Deep Learning from scratch

I read the following book to study the fundamentals of deep learning.

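ゼロから作るDeep Learning ―Pythonで学ぶディープラーニングの理論と実装 (in English: "Deep Learning from Scratch: The Theory and Implementation of Deep Learning, Learned with Python")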

This article records my thoughts on the book and some notes on its contents.

My thoughts

This book explains perceptrons, neural networks, and deep learning.
It uses not only text but also diagrams and mathematical expressions.
It also shows how to write sample code in Python. The code is written as functions and classes.
That means the code can be imported as modules and reused in our own Python programs. This is very useful.
None of the explanations are difficult to understand. If you already have some knowledge of machine learning before reading this book, it will be even easier to follow.
I think the most useful section is "technique of learning". This section compares several methods for optimizing parameters, and I was able to understand the differences between them.
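
The kind of comparison described above can be sketched roughly as follows. This is a minimal sketch in plain Python, not the book's code: the class names `SGD` and `Momentum`, the `update(params, grads)` interface, the bowl-shaped test function, and the learning rates are all assumptions chosen for illustration.

```python
class SGD:
    """Plain gradient descent: p <- p - lr * grad."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params:
            params[key] -= self.lr * grads[key]


class Momentum:
    """Gradient descent with a velocity term: v <- momentum * v - lr * grad, p <- p + v."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = {}  # one velocity value per parameter, created lazily

    def update(self, params, grads):
        for key in params:
            v = self.momentum * self.v.get(key, 0.0) - self.lr * grads[key]
            self.v[key] = v
            params[key] += v


def loss(params):
    # Hypothetical test function: a bowl that is much flatter along x than along y.
    return params["x"] ** 2 / 20.0 + params["y"] ** 2


def gradient(params):
    # Analytic gradient of the test function above.
    return {"x": params["x"] / 10.0, "y": 2.0 * params["y"]}


def run(optimizer, steps=30):
    # Same starting point for every optimizer so the results are comparable.
    params = {"x": -7.0, "y": 2.0}
    for _ in range(steps):
        optimizer.update(params, gradient(params))
    return loss(params)


sgd_loss = run(SGD(lr=0.1))
momentum_loss = run(Momentum(lr=0.1))
print("SGD:", sgd_loss, "Momentum:", momentum_loss)
```

Because both classes expose the same `update` method, the training loop in `run` does not need to know which method it is using. On this particular function and with these settings, Momentum ends with a lower loss than SGD after the same number of steps; other functions or learning rates may behave differently.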


  • The problem with a linear activation function is that we cannot take advantage of stacking layers: a stack of linear layers is still equivalent to a single linear layer.
  • The ReLU (Rectified Linear Unit) function outputs the input unchanged if the input is greater than 0. When the input is 0 or less, the output is 0.
  • When we calculate with multi-dimensional arrays, we can use the inner product (dot product) to compute the whole array at once.
  • We need to guard against overflow when we use the softmax function. Usually, the maximum value of the input signal is subtracted from every input before the exponential is taken, which does not change the result.
  • The final goal of machine learning is generalization performance, that is, the ability to handle unknown data that was not part of the training data.
  • We should not use recognition accuracy as the index for training, because its derivative is 0 at almost every point and the parameters cannot be updated.
  • An epoch is a unit. "1 epoch" is the number of iterations needed to use up all of the training data once. For example, with 10,000 training samples and mini-batches of 100, 100 iterations make 1 epoch.
  • "Local calculation" means that a node produces its output using only the information directly related to itself.
  • Numerical differentiation is used in practice to confirm whether an implementation of backpropagation is correct.
  • By implementing each optimization method as a class, we can easily turn it into a reusable module.
  • Dropout is a method of training while randomly deleting neurons.
  • When searching for hyperparameters, we need to narrow down the range of good values step by step and then select one value from that range.
  • Pooling is an operation that makes the spatial size smaller in the vertical and horizontal directions.
  • One advantage of making the network deeper is that the number of parameters can be reduced.
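
The notes on ReLU and on the softmax overflow countermeasure can be sketched as below. This is a minimal sketch in plain Python (lists instead of arrays); the function names `relu` and `softmax` and the sample scores are assumptions for illustration, not the book's code.

```python
import math


def relu(x):
    # Pass positive inputs through unchanged; clamp everything else to 0.
    return max(0.0, x)


def softmax(scores):
    # Subtract the maximum score first. The result is mathematically the same,
    # but every exponent is now <= 0, so math.exp cannot overflow.
    c = max(scores)
    exps = [math.exp(s - c) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# math.exp(1010.0) on its own raises OverflowError; the shifted version does not.
probs = softmax([1010.0, 1000.0, 990.0])
print(relu(-3.0), relu(2.5), probs)
```

The shift works because multiplying the numerator and the denominator of softmax by the same constant leaves the ratio unchanged.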
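
The note on using numerical differentiation to check backpropagation can be illustrated with a tiny example. This is only a sketch under assumptions: the one-neuron model, the parameter names `w0`, `w1`, `b`, and the step size `h` are made up for illustration, and the "analytic" gradient here is derived by hand rather than by a real backpropagation implementation.

```python
def loss(params, x, t):
    # One linear neuron with a squared-error loss (hypothetical toy model).
    y = params["w0"] * x[0] + params["w1"] * x[1] + params["b"]
    return 0.5 * (y - t) ** 2


def analytic_gradient(params, x, t):
    # Gradient derived by hand with the chain rule, standing in for backpropagation.
    y = params["w0"] * x[0] + params["w1"] * x[1] + params["b"]
    delta = y - t
    return {"w0": delta * x[0], "w1": delta * x[1], "b": delta}


def numerical_gradient(f, params, h=1e-4):
    # Central difference: nudge one parameter at a time and measure the change in f.
    grads = {}
    for key in params:
        original = params[key]
        params[key] = original + h
        plus = f(params)
        params[key] = original - h
        minus = f(params)
        params[key] = original  # restore the parameter before moving on
        grads[key] = (plus - minus) / (2 * h)
    return grads


params = {"w0": 0.5, "w1": -0.3, "b": 0.1}
x, t = (1.0, 2.0), 1.0

expected = analytic_gradient(params, x, t)
measured = numerical_gradient(lambda p: loss(p, x, t), params)
max_diff = max(abs(expected[k] - measured[k]) for k in params)
print(max_diff)
```

If the two gradients differ by more than a tiny amount, the hand-written (or backpropagated) gradient most likely contains a bug. Numerical differentiation is slow, so it is suited to this kind of check rather than to training itself.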
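
The dropout note can be sketched as a small layer class. This is a minimal sketch in plain Python, assuming one common formulation: neurons are masked at random during training, and at inference time the output is scaled by the keep probability instead. The class name, the `train_flg` argument, and the `rng` parameter are assumptions for illustration.

```python
import random


class Dropout:
    def __init__(self, dropout_ratio=0.5, rng=None):
        self.dropout_ratio = dropout_ratio
        # rng is a hypothetical hook so the example can be made reproducible.
        self.rng = rng if rng is not None else random.Random()
        self.mask = None

    def forward(self, x, train_flg=True):
        if train_flg:
            # Keep a neuron only when its random draw exceeds the dropout ratio.
            self.mask = [self.rng.random() > self.dropout_ratio for _ in x]
            return [xi if keep else 0.0 for xi, keep in zip(x, self.mask)]
        # At inference time nothing is deleted; scale by the keep probability.
        return [xi * (1.0 - self.dropout_ratio) for xi in x]

    def backward(self, dout):
        # Gradients flow only through the neurons that were kept in forward().
        return [d if keep else 0.0 for d, keep in zip(dout, self.mask)]


layer = Dropout(dropout_ratio=0.5, rng=random.Random(0))
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_out = layer.forward(x, train_flg=True)
grad = layer.backward([1.0] * len(x))
test_out = layer.forward(x, train_flg=False)
print(train_out, grad, test_out)
```

The mask is stored in `forward` so that `backward` blocks the gradient for exactly the neurons that were deleted.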
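
The pooling note can be illustrated with max pooling, which is one common kind of pooling. This is a minimal sketch in plain Python on a nested list; the function name `max_pool2d`, the 2x2 window, the stride of 2, and the sample input are assumptions for illustration.

```python
def max_pool2d(image, size=2, stride=2):
    # Slide a size x size window over the image and keep the maximum of each window.
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height - size + 1, stride):
        row = []
        for left in range(0, width - size + 1, stride):
            window = [
                image[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [
    [1, 2, 1, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 4, 0, 1],
]
pooled = max_pool2d(image)
print(pooled)  # [[2, 3], [4, 2]]
```

The 4x4 input shrinks to 2x2 in both the vertical and the horizontal direction, and the operation has no parameters to learn.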