CNN - Part 2

AlexNet (2012)

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  • "ImageNet Classification with Deep Convolutional Neural Networks"
  • Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton created a “large, deep convolutional neural network” that was used to win the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge).
  • 2012 was the first year in which a CNN was used to win the competition, achieving a top-5 test error rate of 15.4%.
  • The next best entry achieved an error of 26.2%, an astounding gap that shocked the computer vision community.
  • 1,000 categories.
  • Trained the network on ImageNet data; the full ImageNet dataset contains over 15 million annotated images from a total of over 22,000 categories.
  • Used ReLU for the nonlinearity functions.
  • Used data augmentation techniques consisting of image translations, horizontal reflections, and patch extractions (a minimal Keras sketch of augmentation and dropout follows the summary below).
  • Implemented dropout layers in order to combat the problem of overfitting to the training data.
  • Trained the model using batch stochastic gradient descent, with specific values for momentum and weight decay.
  • Trained on two GTX 580 GPUs for five to six days.

The neural network developed by Krizhevsky, Sutskever, and Hinton in 2012 was the coming-out party for CNNs in the computer vision community. This was the first time a model performed so well on the historically difficult ImageNet dataset. Using techniques that are still standard today, such as data augmentation and dropout, this paper illustrated the benefits of CNNs and backed them up with record-breaking performance in the competition.
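Below is a minimal Keras sketch (my own illustration, not code from the paper) of the two regularization ideas above: random shifts and horizontal flips via ImageDataGenerator to approximate the translations, reflections, and patch extractions, plus a Dropout layer before the classifier. The layer sizes are placeholders rather than AlexNet's actual configuration.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator

# data augmentation: shifts stand in for translations/patch crops,
# horizontal_flip gives the horizontal reflections
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model = Sequential([
    Conv2D(96, (11, 11), strides=4, activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((3, 3), strides=2),
    Flatten(),
    Dense(4096, activation='relu'),
    Dropout(0.5),                       # dropout to combat overfitting
    Dense(1000, activation='softmax'),  # 1,000 ImageNet classes
])
model.compile(optimizer='sgd', loss='categorical_crossentropy')
# model.fit_generator(datagen.flow(X_train, y_train, batch_size=128), steps_per_epoch=..., epochs=...)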

ZF Net (2013)

https://arxiv.org/pdf/1311.2901v3.pdf

  • “Visualizing and Understanding Convolutional Networks”
  • Matthew Zeiler and Rob Fergus
  • 11.2% error rate
  • Achieved it largely by fine-tuning the previous AlexNet structure.
  • Explained a lot of the intuition behind ConvNets and showed how to visualize the filters and weights correctly.

  • Very similar architecture to AlexNet, except for a few minor modifications.
  • AlexNet trained on 15 million images, while ZF Net trained on only 1.3 million images.
  • Instead of using 11x11 filters in the first layer (as AlexNet did), ZF Net used 7x7 filters and a smaller stride. The reasoning behind this modification is that a smaller filter size in the first conv layer helps retain more of the original pixel information in the input volume. An 11x11 filter proved to skip over a lot of relevant information, especially since this is the first conv layer.
  • As the network grows, we also see a rise in the number of filters used.
  • Used ReLUs for their activation functions, cross-entropy loss for the error function, and trained using batch stochastic gradient descent.
  • Trained on a GTX 580 GPU for twelve days.
  • Developed a visualization technique named Deconvolutional Network, which helps to examine different feature activations and their relation to the input space. Called “deconvnet” because it maps features to pixels (the opposite of what a convolutional layer does).

DeConvNet

The basic idea behind how this works is that at every layer of the trained CNN, you attach a “deconvnet” which has a path back to the image pixels. An input image is fed into the CNN and activations are computed at each level. This is the forward pass. Now, let’s say we want to examine the activations of a certain feature in the 4th conv layer. We would store the activations of this one feature map, but set all of the other activations in the layer to 0, and then pass this feature map as the input into the deconvnet. This deconvnet has the same filters as the original CNN. This input then goes through a series of unpool (reverse maxpooling), rectify, and filter operations for each preceding layer until the input space is reached.
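Below is a much-simplified sketch of that idea for a single, stride-1 conv layer of a trained Keras model (my own illustration, not the authors' code): keep the activations of one feature map, zero out the rest, rectify, and map back toward pixel space using the layer's own filters, spatially flipped and with input/output channels swapped. A real deconvnet also records the max-pooling "switches" so it can unpool precisely; that part is omitted here.

import numpy as np
from keras.models import Model
from keras import backend as K

def project_feature_map(cnn, img, layer_name, feature_idx):
    # forward pass: compute the activations of the chosen conv layer
    layer = cnn.get_layer(layer_name)
    forward = Model(inputs=cnn.input, outputs=layer.output)
    acts = forward.predict(img)                        # shape (1, H, W, C)

    # keep one feature map, set all other activations in the layer to 0
    masked = np.zeros_like(acts)
    masked[..., feature_idx] = acts[..., feature_idx]

    # rectify, then "filter" back toward the input space with the same
    # weights, spatially flipped and with in/out channels swapped
    w, b = layer.get_weights()                         # w: (kh, kw, c_in, c_out)
    w_back = np.transpose(np.flip(np.flip(w, 0), 1), (0, 1, 3, 2))

    x = K.relu(K.constant(masked))
    x = K.conv2d(x, K.constant(w_back), padding='same')
    return K.eval(x)                                   # shape (1, H, W, c_in)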

VGG Net (2014)

https://arxiv.org/pdf/1409.1556v6.pdf

  • "VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION"
  • Karen Simonyan and Andrew Zisserman
  • 7.3% error rate.
  • Simplicity and depth. A 19-layer CNN that strictly used 3x3 filters with stride and pad of 1, along with 2x2 maxpooling layers with stride 2.
  • The use of only 3x3 filters is quite different from AlexNet’s 11x11 filters in the first layer and ZF Net’s 7x7 filters. The authors’ reasoning is that a stack of two 3x3 conv layers has an effective receptive field of 5x5. This simulates a larger filter while keeping the benefits of smaller filter sizes: fewer parameters, and, with two conv layers, two ReLU layers instead of one (see the parameter-count sketch after this list).
  • Three 3x3 conv layers back to back have an effective receptive field of 7x7.
  • As the spatial size of the input volumes at each layer decreases (a result of the conv and pool layers), the depth of the volumes increases due to the growing number of filters as you go down the network.
  • It is interesting to notice that the number of filters doubles after each maxpool layer. This reinforces the idea of shrinking spatial dimensions but growing depth.
  • Worked well on both image classification and localization tasks. The authors used a form of localization as regression (see page 10 of the paper for all details).
  • Built model with the Caffe toolbox.
  • Used scale jittering as one data augmentation technique during training.
  • Used ReLU layers after each conv layer and trained with batch gradient descent.
  • Trained on 4 Nvidia Titan Black GPUs for two to three weeks.
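A quick sanity check of the parameter argument above (a sketch under assumed dimensions, not numbers from the paper): two stacked 3x3 convs versus a single 5x5 conv, both mapping C channels to C channels over the same input size.

from keras.models import Sequential
from keras.layers import Conv2D

C = 64  # an arbitrary channel count for illustration

two_3x3 = Sequential([
    Conv2D(C, (3, 3), padding='same', activation='relu', input_shape=(224, 224, C)),
    Conv2D(C, (3, 3), padding='same', activation='relu'),
])
one_5x5 = Sequential([
    Conv2D(C, (5, 5), padding='same', activation='relu', input_shape=(224, 224, C)),
])

print(two_3x3.count_params())  # 2 * (3*3*C*C + C) = 73,856
print(one_5x5.count_params())  # 5*5*C*C + C       = 102,464

Both stacks see a 5x5 effective receptive field, but the 3x3 stack uses fewer parameters and gets an extra ReLU nonlinearity in between.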

GoogLeNet (2015)

https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

Inception Module

In the Inception module diagram, the bottom green box is the input and the top one is the output of the module (rotating the diagram 90 degrees lets you line it up with the drawing of the full network). Basically, at each layer of a traditional ConvNet, you have to choose between a pooling operation and a conv operation (there is also the choice of filter size). What an Inception module lets you do is perform all of these operations in parallel. In fact, this was exactly the “naïve” idea that the authors came up with.

Now, why doesn’t the naïve version work? It would lead to way too many outputs: we would end up with an extremely large depth channel for the output volume. The way the authors address this is by adding 1x1 conv operations before the 3x3 and 5x5 layers. The 1x1 convolutions (or “network in network” layers) provide a method of dimensionality reduction. For example, say you have an input volume of 100x100x60 (these aren’t necessarily the dimensions of the image, just the input to some layer of the network). Applying 20 filters of 1x1 convolution reduces the volume to 100x100x20, so the 3x3 and 5x5 convolutions won’t have as large a volume to deal with. This can be thought of as a “pooling of features”, because we are reducing the depth of the volume, similar to how we reduce height and width with normal maxpooling layers. Another note is that these 1x1 conv layers are followed by ReLU units, which certainly can’t hurt (see Aaditya Prakash’s post for more on the effectiveness of 1x1 convolutions). The outputs of all branches are then concatenated along the depth dimension.

You may be asking yourself, “How does this architecture help?” Well, you have a module that consists of a network-in-network (1x1) layer, a medium-sized filter convolution, a large-sized filter convolution, and a pooling operation. The network-in-network conv is able to extract information about the very fine-grained details in the volume, while the 5x5 filter covers a large receptive field of the input and extracts that information as well. You also have a pooling operation that helps to reduce spatial sizes and combat overfitting. On top of all that, you have ReLUs after each conv layer, which improve the nonlinearity of the network. Basically, the network is able to perform the functions of these different operations while still remaining computationally manageable. The paper also gives more high-level reasoning involving topics like sparsity and dense connections.
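Here is a sketch of a single Inception module in Keras (the filter counts are illustrative choices in the spirit of the paper, not necessarily GoogLeNet's exact configuration): 1x1 convs reduce depth before the expensive 3x3 and 5x5 convs, and all branches are concatenated along the depth dimension.

from keras.layers import Input, Conv2D, MaxPooling2D, concatenate
from keras.models import Model

inp = Input(shape=(28, 28, 256))

# branch 1: plain 1x1 conv
b1 = Conv2D(64, (1, 1), padding='same', activation='relu')(inp)

# branch 2: 1x1 reduction, then 3x3 conv
b2 = Conv2D(96, (1, 1), padding='same', activation='relu')(inp)
b2 = Conv2D(128, (3, 3), padding='same', activation='relu')(b2)

# branch 3: 1x1 reduction, then 5x5 conv
b3 = Conv2D(16, (1, 1), padding='same', activation='relu')(inp)
b3 = Conv2D(32, (5, 5), padding='same', activation='relu')(b3)

# branch 4: 3x3 max pooling, then 1x1 projection
b4 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')(inp)
b4 = Conv2D(32, (1, 1), padding='same', activation='relu')(b4)

out = concatenate([b1, b2, b3, b4])   # output depth = 64 + 128 + 32 + 32 = 256
module = Model(inputs=inp, outputs=out)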

Main Points

  • Used 9 Inception modules in the whole architecture, with over 100 layers in total! Now that is deep…
  • No use of fully connected layers! They use an average pool instead, to go from a 7x7x1024 volume to a 1x1x1024 volume. This saves a huge number of parameters.
  • Uses 12x fewer parameters than AlexNet.
  • During testing, multiple crops of the same image were created, fed into the network, and the softmax probabilities were averaged to give the final prediction (a small sketch of this follows the list).
  • Utilized concepts from R-CNN for their detection model.
  • There are updated versions of the Inception module (e.g., Inception v2, v3, and v4).
  • Trained on “a few high-end GPUs within a week”.
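A small sketch of the test-time crop averaging mentioned above (an assumption about the general idea, not GoogLeNet's exact multi-crop scheme): take several crops of one image, run them all through the network, and average the softmax outputs.

import numpy as np
from keras.applications.imagenet_utils import preprocess_input

def predict_with_crops(model, img, crop_size=224):
    # img: a single HxWx3 array larger than crop_size on both sides
    h, w, _ = img.shape
    offsets = [(0, 0), (0, w - crop_size), (h - crop_size, 0),
               (h - crop_size, w - crop_size),
               ((h - crop_size) // 2, (w - crop_size) // 2)]   # four corners + centre
    crops = np.stack([img[r:r + crop_size, c:c + crop_size]
                      for r, c in offsets]).astype('float32')
    probs = model.predict(preprocess_input(crops))
    return probs.mean(axis=0)   # averaged softmax probabilities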

Why It’s Important

GoogLeNet was one of the first models to introduce the idea that CNN layers don’t always have to be stacked sequentially. With the Inception module, the authors showed that a creative structuring of layers can lead to improved performance and computational efficiency. This paper set the stage for some of the amazing architectures that followed in the coming years.

Microsoft ResNet (2015)

https://arxiv.org/pdf/1512.03385v1.pdf

  • "Deep Residual Learning for Image Recognition"
  • 152 layer architecture
  • Set new records in classification, detection, and localization through one incredible architecture.
  • Won ILSVRC 2015 with an incredible top-5 error rate of 3.6% (depending on their skill and expertise, humans generally hover around a 5-10% error rate).

Residual Block

The idea behind a residual block is that your input x goes through a conv-relu-conv series, which gives you some F(x). That result is then added to the original input x; call it H(x) = F(x) + x. In a traditional CNN, H(x) would simply equal F(x). So instead of computing that transformation directly (straight from x to F(x)), we’re computing the term F(x) that has to be added to the input x. Basically, the residual block computes a “delta”, a slight change to the original input x, to get a slightly altered representation (in a traditional CNN we go from x to F(x), a completely new representation that doesn’t keep any information about the original x). The authors believe that “it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping”.

Another reason this residual block may be effective is that during the backward pass of backpropagation, the gradient flows easily through the graph because the addition operations distribute the gradient.
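A minimal residual block sketch in Keras (batch normalization is omitted here for brevity, although the paper uses it): F(x) from two 3x3 convs is added back onto the identity shortcut x, and a ReLU follows the addition.

from keras.layers import Input, Conv2D, Activation, add
from keras.models import Model

x_in = Input(shape=(56, 56, 64))

f = Conv2D(64, (3, 3), padding='same', activation='relu')(x_in)
f = Conv2D(64, (3, 3), padding='same')(f)   # F(x)

h = add([f, x_in])                          # H(x) = F(x) + x
h = Activation('relu')(h)

block = Model(inputs=x_in, outputs=h)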

Main Points

  • “Ultra-deep” – Yann LeCun.
  • 152 layers…
  • Interesting note that after only the first 2 layers, the spatial size gets compressed from an input volume of 224x224 to a 56x56 volume.
  • The authors claim that a naïve increase of layers in plain nets results in higher training and test error (Figure 1 in the paper).
  • The group tried a 1202-layer network, but got a lower test accuracy, presumably due to overfitting.
  • Trained on an 8 GPU machine for two to three weeks.

Object Localization

  • Additional 4 outputs: x, y, width, height
  • Custom loss function using softmax and regression
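A sketch of how such a localization head might look in Keras (an illustration of the setup described above, not code from any of the papers): one softmax output for the class and one 4-value regression output for the bounding box, trained with a combined loss.

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

inp = Input(shape=(224, 224, 3))
x = Conv2D(32, (3, 3), padding='same', activation='relu')(inp)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)

class_out = Dense(1000, activation='softmax', name='class')(x)   # which object
bbox_out = Dense(4, activation='linear', name='bbox')(x)         # x, y, width, height

model = Model(inputs=inp, outputs=[class_out, bbox_out])
model.compile(optimizer='sgd',
              loss={'class': 'categorical_crossentropy', 'bbox': 'mse'})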

Generating Image Descriptions (2014)

https://arxiv.org/pdf/1412.2306v2.pdf

  • "Deep Visual-Semantic Alignments for Generating Image Descriptions"
  • Andrej Karpathy, Li Fei-Fei
  • Combines a CNN and a bi-directional LSTM to generate natural language descriptions of different image regions.
  • Input: an image; output: a text description.

Generative Adversarial Networks

https://arxiv.org/pdf/1312.6199v4.pdf

  • Intriguing properties of neural networks (2014)

In [11]:
!wget https://i.ytimg.com/vi/SNggmeilXDQ/maxresdefault.jpg
--2018-01-09 13:15:34--  https://i.ytimg.com/vi/SNggmeilXDQ/maxresdefault.jpg
Resolving i.ytimg.com (i.ytimg.com)... 209.85.202.101, 209.85.202.138, 209.85.202.139, ...
Connecting to i.ytimg.com (i.ytimg.com)|209.85.202.101|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 118853 (116K) [image/jpeg]
Saving to: ‘maxresdefault.jpg.1’

maxresdefault.jpg.1 100%[===================>] 116.07K  --.-KB/s    in 0.03s   

2018-01-09 13:15:34 (4.27 MB/s) - ‘maxresdefault.jpg.1’ saved [118853/118853]

Class Prediction using ResNet

In [12]:
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights='imagenet')

img_path = 'maxresdefault.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
Downloading data from https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json
24576/35363 [===================>..........] - ETA: 0s
Predicted: [('n01871265', 'tusker', 0.55347723), ('n02504013', 'Indian_elephant', 0.3473224), ('n02504458', 'African_elephant', 0.093237929)]

Extract features from an arbitrary intermediate layer with VGG19

In [19]:
from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
import numpy as np

base_model = VGG19(weights='imagenet')
base_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________
In [20]:
model = Model(inputs=base_model.input, outputs=base_model.get_layer('block4_pool').output)

img_path = 'maxresdefault.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

block4_pool_features = model.predict(x)
In [21]:
print(block4_pool_features.shape)
print(block4_pool_features)
(1, 14, 14, 512)
[ ... 1x14x14x512 array of block4_pool activations; values omitted ... ]

Fine-tune InceptionV3 on a new set of classes

In [22]:
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
87654400/87910968 [============================>.] - ETA: 0s
In [23]:
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)

# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)

# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
In [25]:
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
    
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
In [27]:
# train the model on the new data for a few epochs
# model.fit_generator(...)

# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.
In [28]:
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
0 input_7
1 conv2d_1
2 batch_normalization_1
3 activation_99
4 conv2d_2
5 batch_normalization_2
6 activation_100
7 conv2d_3
8 batch_normalization_3
9 activation_101
10 max_pooling2d_3
11 conv2d_4
12 batch_normalization_4
13 activation_102
14 conv2d_5
15 batch_normalization_5
16 activation_103
17 max_pooling2d_4
18 conv2d_9
19 batch_normalization_9
20 activation_107
21 conv2d_7
22 conv2d_10
23 batch_normalization_7
24 batch_normalization_10
25 activation_105
26 activation_108
27 average_pooling2d_1
28 conv2d_6
29 conv2d_8
30 conv2d_11
31 conv2d_12
32 batch_normalization_6
33 batch_normalization_8
34 batch_normalization_11
35 batch_normalization_12
36 activation_104
37 activation_106
38 activation_109
39 activation_110
40 mixed0
41 conv2d_16
42 batch_normalization_16
43 activation_114
44 conv2d_14
45 conv2d_17
46 batch_normalization_14
47 batch_normalization_17
48 activation_112
49 activation_115
50 average_pooling2d_2
51 conv2d_13
52 conv2d_15
53 conv2d_18
54 conv2d_19
55 batch_normalization_13
56 batch_normalization_15
57 batch_normalization_18
58 batch_normalization_19
59 activation_111
60 activation_113
61 activation_116
62 activation_117
63 mixed1
64 conv2d_23
65 batch_normalization_23
66 activation_121
67 conv2d_21
68 conv2d_24
69 batch_normalization_21
70 batch_normalization_24
71 activation_119
72 activation_122
73 average_pooling2d_3
74 conv2d_20
75 conv2d_22
76 conv2d_25
77 conv2d_26
78 batch_normalization_20
79 batch_normalization_22
80 batch_normalization_25
81 batch_normalization_26
82 activation_118
83 activation_120
84 activation_123
85 activation_124
86 mixed2
87 conv2d_28
88 batch_normalization_28
89 activation_126
90 conv2d_29
91 batch_normalization_29
92 activation_127
93 conv2d_27
94 conv2d_30
95 batch_normalization_27
96 batch_normalization_30
97 activation_125
98 activation_128
99 max_pooling2d_5
100 mixed3
101 conv2d_35
102 batch_normalization_35
103 activation_133
104 conv2d_36
105 batch_normalization_36
106 activation_134
107 conv2d_32
108 conv2d_37
109 batch_normalization_32
110 batch_normalization_37
111 activation_130
112 activation_135
113 conv2d_33
114 conv2d_38
115 batch_normalization_33
116 batch_normalization_38
117 activation_131
118 activation_136
119 average_pooling2d_4
120 conv2d_31
121 conv2d_34
122 conv2d_39
123 conv2d_40
124 batch_normalization_31
125 batch_normalization_34
126 batch_normalization_39
127 batch_normalization_40
128 activation_129
129 activation_132
130 activation_137
131 activation_138
132 mixed4
133 conv2d_45
134 batch_normalization_45
135 activation_143
136 conv2d_46
137 batch_normalization_46
138 activation_144
139 conv2d_42
140 conv2d_47
141 batch_normalization_42
142 batch_normalization_47
143 activation_140
144 activation_145
145 conv2d_43
146 conv2d_48
147 batch_normalization_43
148 batch_normalization_48
149 activation_141
150 activation_146
151 average_pooling2d_5
152 conv2d_41
153 conv2d_44
154 conv2d_49
155 conv2d_50
156 batch_normalization_41
157 batch_normalization_44
158 batch_normalization_49
159 batch_normalization_50
160 activation_139
161 activation_142
162 activation_147
163 activation_148
164 mixed5
165 conv2d_55
166 batch_normalization_55
167 activation_153
168 conv2d_56
169 batch_normalization_56
170 activation_154
171 conv2d_52
172 conv2d_57
173 batch_normalization_52
174 batch_normalization_57
175 activation_150
176 activation_155
177 conv2d_53
178 conv2d_58
179 batch_normalization_53
180 batch_normalization_58
181 activation_151
182 activation_156
183 average_pooling2d_6
184 conv2d_51
185 conv2d_54
186 conv2d_59
187 conv2d_60
188 batch_normalization_51
189 batch_normalization_54
190 batch_normalization_59
191 batch_normalization_60
192 activation_149
193 activation_152
194 activation_157
195 activation_158
196 mixed6
197 conv2d_65
198 batch_normalization_65
199 activation_163
200 conv2d_66
201 batch_normalization_66
202 activation_164
203 conv2d_62
204 conv2d_67
205 batch_normalization_62
206 batch_normalization_67
207 activation_160
208 activation_165
209 conv2d_63
210 conv2d_68
211 batch_normalization_63
212 batch_normalization_68
213 activation_161
214 activation_166
215 average_pooling2d_7
216 conv2d_61
217 conv2d_64
218 conv2d_69
219 conv2d_70
220 batch_normalization_61
221 batch_normalization_64
222 batch_normalization_69
223 batch_normalization_70
224 activation_159
225 activation_162
226 activation_167
227 activation_168
228 mixed7
229 conv2d_73
230 batch_normalization_73
231 activation_171
232 conv2d_74
233 batch_normalization_74
234 activation_172
235 conv2d_71
236 conv2d_75
237 batch_normalization_71
238 batch_normalization_75
239 activation_169
240 activation_173
241 conv2d_72
242 conv2d_76
243 batch_normalization_72
244 batch_normalization_76
245 activation_170
246 activation_174
247 max_pooling2d_6
248 mixed8
249 conv2d_81
250 batch_normalization_81
251 activation_179
252 conv2d_78
253 conv2d_82
254 batch_normalization_78
255 batch_normalization_82
256 activation_176
257 activation_180
258 conv2d_79
259 conv2d_80
260 conv2d_83
261 conv2d_84
262 average_pooling2d_8
263 conv2d_77
264 batch_normalization_79
265 batch_normalization_80
266 batch_normalization_83
267 batch_normalization_84
268 conv2d_85
269 batch_normalization_77
270 activation_177
271 activation_178
272 activation_181
273 activation_182
274 batch_normalization_85
275 activation_175
276 mixed9_0
277 concatenate_1
278 activation_183
279 mixed9
280 conv2d_90
281 batch_normalization_90
282 activation_188
283 conv2d_87
284 conv2d_91
285 batch_normalization_87
286 batch_normalization_91
287 activation_185
288 activation_189
289 conv2d_88
290 conv2d_89
291 conv2d_92
292 conv2d_93
293 average_pooling2d_9
294 conv2d_86
295 batch_normalization_88
296 batch_normalization_89
297 batch_normalization_92
298 batch_normalization_93
299 conv2d_94
300 batch_normalization_86
301 activation_186
302 activation_187
303 activation_190
304 activation_191
305 batch_normalization_94
306 activation_184
307 mixed9_1
308 concatenate_2
309 activation_192
310 mixed10
In [29]:
# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
# model.fit_generator(...)

Project suggestions:

  • Image noise cleaning: comparison of different models.
  • AutoML: automatic generation of models.
  • Generating Bulgarian news texts from given keywords. Generating song lyrics.
  • Music generation. Image generation (style transfer).
  • Spelling and grammar checker.
  • Document summarisation.
  • Nice visualization projects.
  • Data extraction.
  • Kaggle Competition.