[Implementing a Convolutional Neural Network in Python] Implementing the Conv2D Layer (with stride and padding)

2020-04-14

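This post does not repeat how the convolution operation itself works. Instead, it walks through the code step by step to see how a convolution layer is implemented.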

Code source: https://github.com/eriklindernoren/ML-From-Scratch

We start with the basic helper functions. The first is determine_padding(filter_shape, output_shape="same"):

import math

def determine_padding(filter_shape, output_shape="same"):
    # No padding
    if output_shape == "valid":
        return (0, 0), (0, 0)
    # Pad so that the output shape is the same as input shape (given that stride=1)
    elif output_shape == "same":
        filter_height, filter_width = filter_shape

        # Derived from:
        # output_height = (height + pad_h - filter_height) / stride + 1
        # In this case output_height = height and stride = 1. This gives the
        # expression for the padding below.
        pad_h1 = int(math.floor((filter_height - 1)/2))
        pad_h2 = int(math.ceil((filter_height - 1)/2))
        pad_w1 = int(math.floor((filter_width - 1)/2))
        pad_w2 = int(math.ceil((filter_width - 1)/2))

        return (pad_h1, pad_h2), (pad_w1, pad_w2)

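Explanation: the function computes the padding amounts (top, bottom, left, right) from the filter shape and the padding mode. output_shape="valid" means no padding.

Supplementary notes:

math.floor(x) returns the largest integer less than or equal to x.
math.ceil(x) returns the smallest integer greater than or equal to x.
Plugging in concrete arguments gives: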

pad_h, pad_w = determine_padding((3, 3), output_shape="same")
Output: (1, 1), (1, 1)
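The floor/ceil split only matters for even filter sizes, where the total padding (filter size minus 1) is odd and cannot be divided evenly between the two sides. The sketch below isolates the one-dimensional rule used above; the helper name same_padding_1d is an assumption made for illustration and is not part of the original code.

```python
import math

def same_padding_1d(filter_size):
    """Return (before, after) padding that keeps the size unchanged at stride 1.

    Hypothetical helper: it applies the same floor/ceil rule as
    determine_padding, but along a single dimension.
    """
    total = filter_size - 1
    return int(math.floor(total / 2)), int(math.ceil(total / 2))

for size in (1, 2, 3, 4, 5):
    print(size, same_padding_1d(size))
```

For odd sizes the two sides are equal; for even sizes the extra row or column goes on the "after" side (bottom or right).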

Next is the image_to_column(images, filter_shape, stride, output_shape='same') function:

import numpy as np

def image_to_column(images, filter_shape, stride, output_shape='same'):
    filter_height, filter_width = filter_shape
    pad_h, pad_w = determine_padding(filter_shape, output_shape)

    # Add padding to the image (only on the height and width axes)
    images_padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')

    # Calculate the indices where the dot products are to be applied between weights
    # and the image
    k, i, j = get_im2col_indices(images.shape, filter_shape, (pad_h, pad_w), stride)

    # Get content from image at those indices
    cols = images_padded[:, k, i, j]
    channels = images.shape[1]
    # Reshape content into column shape
    cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
    return cols

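Explanation: the input images has shape [batch_size, channels, height, width], the same layout PyTorch uses for image batches. In other words, images_padded is padded only along height and width. The function calls get_im2col_indices(), so let's look at that next: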

def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # First figure out what the size of the output should be
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)

    # Row offsets inside one filter window, repeated for every channel
    i0 = np.repeat(np.arange(filter_height), filter_width)
    i0 = np.tile(i0, channels)
    # Row position of the top-left corner of every window
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    # Column offsets inside one filter window, repeated for every channel
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    # Column position of the top-left corner of every window
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    # Channel index for every row of the column matrix
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)

    return (k, i, j)

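Explanation: this is hard to understand in isolation, so let's step through it with concrete arguments.

get_im2col_indices((1,3,32,32), (3,3), ((1,1),(1,1)), stride=1)
Explanation: let's follow each variable in turn. out_width and out_height need little comment: they are the width and height of the output feature map after the convolution (both 32 here).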

i0 (after repeat): np.repeat(np.arange(3), 3): [0,0,0,1,1,1,2,2,2]
i0 (after tile): np.tile([0,0,0,1,1,1,2,2,2], 3): [0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2], shape (27,)
i1: 1*np.repeat(np.arange(32), 32): [0,0,0,...,31,31,31], shape (1024,)
j0: np.tile(np.arange(3), 3*3): [0,1,2,0,1,2,...], shape (27,)
j1: 1*np.tile(np.arange(32), 32): [0,1,2,3,...,0,1,2,...,29,30,31], shape (1024,)
i: i0.reshape(-1,1) + i1.reshape(1,-1): shape (27, 1024)
j: j0.reshape(-1,1) + j1.reshape(1,-1): shape (27, 1024)
k: np.repeat(np.arange(3), 3*3).reshape(-1,1): shape (27, 1)
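The shapes are easier to interpret on an input small enough to print. The sketch below repeats the same index arithmetic for one channel, a 3×3 image and a 2×2 filter with no padding. The function name im2col_indices_sketch, and the choice to pass out_height and out_width directly, are assumptions made to keep the example short.

```python
import numpy as np

def im2col_indices_sketch(channels, out_height, out_width, filter_shape, stride=1):
    """Same index arithmetic as get_im2col_indices, with the output size passed in."""
    filter_height, filter_width = filter_shape
    i0 = np.tile(np.repeat(np.arange(filter_height), filter_width), channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)
    return k, i, j

# One channel, 3x3 image, 2x2 filter, no padding, stride 1 -> 2x2 output
k, i, j = im2col_indices_sketch(1, 2, 2, (2, 2))
print(i)  # row index of every pixel in every window
print(j)  # column index of every pixel in every window

image = np.arange(9).reshape(1, 1, 3, 3)
cols = image[:, k, i, j]   # shape (1, 4, 4): one column per filter position
print(cols[0])
```

Each column of i and j lists the (row, column) coordinates of one filter window: the first column is (0,0), (0,1), (1,0), (1,1), which is the top-left 2×2 patch, and the last column is the bottom-right patch.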
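Supplementary notes:

numpy.pad(array, pad_width, mode, **kwargs): array is the data to pad; pad_width specifies how many values to add before and after each axis; mode selects the padding method. The default mode is 'constant', which fills with constant_values (0 by default).
numpy.arange(start, stop, step, dtype=None): for example, numpy.arange(3) gives [0,1,2]
numpy.repeat(array, repeats, axis=None): for example, numpy.repeat([0,1,2], 3) gives [0,0,0,1,1,1,2,2,2]
numpy.tile(array, reps): for example, numpy.tile([0,1,2], 3) gives [0,1,2,0,1,2,0,1,2]
For more advanced usage, consult the NumPy documentation. Only what this code needs is listed here.
Even with these shapes, the code is still hard to follow, so let's continue. The key point is that k indexes the channel, i indexes the height (row) of the feature map, and j indexes the width (column). Convolving one channel with a 3×3 filter touches 3×3=9 pixels per filter position; with 3 channels, each position touches 9×3=27 pixels. The image is 32×32, and with 'same' padding and stride 1 the filter visits 32×32=1024 positions. Now look again at these three lines: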
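The four NumPy calls above can be checked directly. This is a minimal sketch; the array x is an arbitrary example input, not data from the original code.

```python
import numpy as np

a = np.arange(3)        # [0 1 2]
rep = np.repeat(a, 3)   # every element repeated 3 times in place
til = np.tile(a, 3)     # the whole array repeated 3 times
print(rep)
print(til)

# Pad only the last two axes (height and width) by one pixel on each side,
# leaving the batch and channel axes untouched.
x = np.ones((1, 1, 2, 2))
padded = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode='constant')
print(padded.shape)
print(padded[0, 0])
```

The padded array has a border of zeros around the original 2×2 block of ones, which is what image_to_column relies on for 'same' padding.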

cols = images_padded[:, k, i, j]
channels = images.shape[1]
# Reshape content into column shape
cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)

images_padded has shape (1, 3, 34, 34), so cols = images_padded[:, k, i, j] has shape (1, 27, 1024).

channels is 3.

Finally, cols = cols.transpose(1, 2, 0).reshape(3*3*3, -1) has shape (27, 1024).

When the batch size is not 1, say 64, the final cols has shape (27, 1024×64) = (27, 65536).
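These shapes can be confirmed with a compact re-implementation. The sketch below assumes symmetric integer padding (the pad argument) instead of the (before, after) tuples used above, and the name im2col_sketch is hypothetical.

```python
import numpy as np

def im2col_sketch(images, filter_shape, pad, stride=1):
    """Return the shapes before and after the transpose/reshape step.

    Assumes the same padding `pad` on every side of height and width.
    """
    batch, channels, height, width = images.shape
    fh, fw = filter_shape
    out_h = (height + 2 * pad - fh) // stride + 1
    out_w = (width + 2 * pad - fw) // stride + 1
    padded = np.pad(images, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    i0 = np.tile(np.repeat(np.arange(fh), fw), channels)
    i1 = stride * np.repeat(np.arange(out_h), out_w)
    j0 = np.tile(np.arange(fw), fh * channels)
    j1 = stride * np.tile(np.arange(out_w), out_h)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), fh * fw).reshape(-1, 1)
    gathered = padded[:, k, i, j]   # (batch, channels*fh*fw, out_h*out_w)
    cols = gathered.transpose(1, 2, 0).reshape(fh * fw * channels, -1)
    return gathered.shape, cols.shape

shapes = {}
for batch in (1, 64):
    images = np.zeros((batch, 3, 32, 32))
    shapes[batch] = im2col_sketch(images, (3, 3), pad=1)
    print(batch, shapes[batch])
```

The batch axis is moved last before the reshape, so the columns belonging to the same filter position for different images sit next to each other in cols.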

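Finally, we come to the convolution layer itself.

There is a generic Layer base class. Different layers, such as convolution, pooling and batch normalization layers, are implemented by inheriting from it: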

class Layer(object):

    def set_input_shape(self, shape):
        """ Sets the shape that the layer expects of the input in the forward
        pass method """
        self.input_shape = shape

    def layer_name(self):
        """ The name of the layer. Used in model summary. """
        return self.__class__.__name__

    def parameters(self):
        """ The number of trainable parameters used by the layer """
        return 0

    def forward_pass(self, X, training):
        """ Propagates the signal forward in the network """
        raise NotImplementedError()

    def backward_pass(self, accum_grad):
        """ Propagates the accumulated gradient backwards in the network.
        If the layer has trainable weights then these weights are also tuned in this method.
        As input (accum_grad) it receives the gradient with respect to the output of the layer and
        returns the gradient with respect to the output of the previous layer. """
        raise NotImplementedError()

    def output_shape(self):
        """ The shape of the output produced by forward_pass """
        raise NotImplementedError()

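Methods that every subclass must implement raise NotImplementedError() in the base class, so calling one that a subclass has not overridden fails immediately with an exception.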
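A minimal sketch of this pattern follows. The Layer class here is trimmed to three methods, and the Identity and Incomplete subclasses are hypothetical examples that do not appear in the original repository.

```python
class Layer(object):
    """Trimmed-down copy of the base class, for illustration only."""

    def layer_name(self):
        return self.__class__.__name__

    def parameters(self):
        return 0

    def forward_pass(self, X, training):
        raise NotImplementedError()

class Identity(Layer):
    """Hypothetical layer that overrides forward_pass and returns its input."""

    def forward_pass(self, X, training=True):
        return X

class Incomplete(Layer):
    """Hypothetical layer that forgets to override forward_pass."""
    pass

print(Identity().layer_name(), Identity().forward_pass([1, 2]))

try:
    Incomplete().forward_pass([1, 2], training=True)
    raised = False
except NotImplementedError:
    raised = True
    print("Incomplete does not implement forward_pass")
```

Methods with a sensible default, such as parameters() returning 0, are inherited unchanged, while the missing forward_pass is reported at call time.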

With this base class in place, Conv2D can be implemented on top of it:

import copy
import math
import numpy as np

class Conv2D(Layer):
    """A 2D Convolution Layer.
    Parameters:
    -----------
    n_filters: int
        The number of filters that will convolve over the input matrix. The number of channels
        of the output shape.
    filter_shape: tuple
        A tuple (filter_height, filter_width).
    input_shape: tuple
        The shape of the expected input of the layer. (batch_size, channels, height, width)
        Only needs to be specified for first layer in the network.
    padding: string
        Either 'same' or 'valid'. 'same' results in padding being added so that the output height and width
        matches the input height and width. For 'valid' no padding is added.
    stride: int
        The stride length of the filters during the convolution over the input.
    """
    def __init__(self, n_filters, filter_shape, input_shape=None, padding='same', stride=1):
        self.n_filters = n_filters
        self.filter_shape = filter_shape
        self.padding = padding
        self.stride = stride
        self.input_shape = input_shape
        self.trainable = True

    def initialize(self, optimizer):
        # Initialize the weights
        filter_height, filter_width = self.filter_shape
        channels = self.input_shape[0]
        limit = 1 / math.sqrt(np.prod(self.filter_shape))
        self.W  = np.random.uniform(-limit, limit, size=(self.n_filters, channels, filter_height, filter_width))
        self.w0 = np.zeros((self.n_filters, 1))
        # Weight optimizers
        self.W_opt  = copy.copy(optimizer)
        self.w0_opt = copy.copy(optimizer)

    def parameters(self):
        return np.prod(self.W.shape) + np.prod(self.w0.shape)

    def forward_pass(self, X, training=True):
        batch_size, channels, height, width = X.shape
        self.layer_input = X
        # Turn image shape into column shape
        # (enables dot product between input and weights)
        self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
        # Turn weights into column shape
        self.W_col = self.W.reshape((self.n_filters, -1))
        # Calculate output
        output = self.W_col.dot(self.X_col) + self.w0
        # Reshape into (n_filters, out_height, out_width, batch_size)
        output = output.reshape(self.output_shape() + (batch_size, ))
        # Redistribute axes so that batch size comes first
        return output.transpose(3,0,1,2)

    def backward_pass(self, accum_grad):
        # Reshape accumulated gradient into column shape
        accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)

        if self.trainable:
            # Take dot product between column shaped accum. gradient and column shape
            # layer input to determine the gradient at the layer with respect to layer weights
            grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
            # The gradient with respect to bias terms is the sum similarly to in Dense layer
            grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)

            # Update the layers weights
            self.W = self.W_opt.update(self.W, grad_w)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)

        # Recalculate the gradient which will be propagated back to prev. layer
        accum_grad = self.W_col.T.dot(accum_grad)
        # Reshape from column shape to image shape
        # (column_to_image is not shown in this post; see the source repository)
        accum_grad = column_to_image(accum_grad,
                                self.layer_input.shape,
                                self.filter_shape,
                                stride=self.stride,
                                output_shape=self.padding)

        return accum_grad

    def output_shape(self):
        channels, height, width = self.input_shape
        pad_h, pad_w = determine_padding(self.filter_shape, output_shape=self.padding)
        output_height = (height + np.sum(pad_h) - self.filter_shape[0]) / self.stride + 1
        output_width = (width + np.sum(pad_w) - self.filter_shape[1]) / self.stride + 1
        return self.n_filters, int(output_height), int(output_width)

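Suppose the input again has shape (1, 3, 32, 32) and we convolve it with 16 filters of size 3×3. Then self.W has shape (16, 3, 3, 3) and self.w0 has shape (16, 1).

self.X_col has shape (27, 1024) and self.W_col has shape (16, 27), so output = self.W_col.dot(self.X_col) + self.w0 has shape (16, 1024). The bias self.w0 is broadcast across the 1024 columns.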
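To check that this single matrix product really is a convolution, it can be compared against a direct loop over every output position. The sketch below uses a small random input and assumes symmetric integer padding; the im2col helper and all variable names are illustrative, not taken from the original repository.

```python
import numpy as np

def im2col(images, fh, fw, pad, stride):
    """Compact im2col, assuming the same padding `pad` on every side."""
    batch, channels, height, width = images.shape
    out_h = (height + 2 * pad - fh) // stride + 1
    out_w = (width + 2 * pad - fw) // stride + 1
    padded = np.pad(images, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    i0 = np.tile(np.repeat(np.arange(fh), fw), channels)
    i1 = stride * np.repeat(np.arange(out_h), out_w)
    j0 = np.tile(np.arange(fw), fh * channels)
    j1 = stride * np.tile(np.arange(out_w), out_h)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), fh * fw).reshape(-1, 1)
    cols = padded[:, k, i, j].transpose(1, 2, 0).reshape(fh * fw * channels, -1)
    return cols, padded, out_h, out_w

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3, 5, 5))    # batch of 2, 3 channels, 5x5
W = rng.standard_normal((4, 3, 3, 3))    # 4 filters of size 3x3
w0 = rng.standard_normal((4, 1))
pad, stride = 1, 1

# Matrix-product version, following the same steps as forward_pass
cols, padded, out_h, out_w = im2col(X, 3, 3, pad, stride)
fast = (W.reshape(4, -1).dot(cols) + w0).reshape(4, out_h, out_w, 2).transpose(3, 0, 1, 2)

# Direct version: slide each filter over each image explicitly
slow = np.zeros((2, 4, out_h, out_w))
for b in range(2):
    for f in range(4):
        for y in range(out_h):
            for x in range(out_w):
                patch = padded[b, :, y*stride:y*stride+3, x*stride:x*stride+3]
                slow[b, f, y, x] = np.sum(patch * W[f]) + w0[f, 0]

print(fast.shape, np.allclose(fast, slow))
```

The two results agree because each row of W_col flattens a filter in (channel, row, column) order, which is the same order in which k, i and j enumerate the pixels of a window.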

Here is how the layer is used:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='same', stride=1)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
Output: (1, 16, 32, 32)

Counting the parameters:

print(conv2d.parameters())
Output: 448

That is, 448 = 3×3×3×16 + 16.

Next, the same example with padding='valid':

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=1)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
print(conv2d.parameters())

Note that the shape of cols changes, because the output of the convolution is now (1, 16, 30, 30).

Output:

Shape of cols: (27, 900)

(1, 16, 30, 30)

448

Finally, an example with a stride:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=2)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
print(conv2d.parameters())

Shape of cols: (27, 225)

(1,16,15,15)

448

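A few closing notes.

Parameter count of a convolution layer: params = filter height × filter width × number of input channels × number of filters + bias terms (one per filter)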
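The formula can be written as a small function and checked against the 448 reported earlier. The function name conv2d_param_count is an assumption made for illustration.

```python
def conv2d_param_count(n_filters, channels, filter_height, filter_width):
    """Number of trainable values in a conv layer with one bias per filter."""
    weights = n_filters * channels * filter_height * filter_width
    biases = n_filters
    return weights + biases

# 16 filters of size 3x3 over a 3-channel input
print(conv2d_param_count(16, 3, 3, 3))
```

The count does not depend on the image size, the padding or the stride, which is why all three usage examples above report the same number.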

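Output size after the convolution (the result is rounded down, and padding is the amount added on each side, assumed equal on both sides):

output height = (input height + padding (height) × 2 - filter height) / stride + 1

output width = (input width + padding (width) × 2 - filter width) / stride + 1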
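The three configurations used in this post can be checked against the formula. In this sketch pad_total is the sum of the padding on both sides, matching np.sum(pad_h) in the code above, so it also covers asymmetric padding. The function name conv_output_size is hypothetical.

```python
def conv_output_size(size, filter_size, pad_total, stride):
    """Output length along one dimension; int() truncates as in output_shape()."""
    return int((size + pad_total - filter_size) / stride + 1)

same_s1 = conv_output_size(32, 3, pad_total=2, stride=1)    # 'same', stride 1
valid_s1 = conv_output_size(32, 3, pad_total=0, stride=1)   # 'valid', stride 1
valid_s2 = conv_output_size(32, 3, pad_total=0, stride=2)   # 'valid', stride 2
print(same_s1, valid_s1, valid_s2)
```

With stride 2 the division gives 15.5, and truncation yields 15: the last row and column of the input are never covered by a full filter window.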

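The transformations inside get_im2col_indices() are now clear, but why they are arranged this way still deserves further thought. Backpropagation and the optimizer will be covered in a later update once I have studied them properly.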

Original post: https://www.cnblogs.com/xiximayou/p/12706576.html

Source link: https://yq.aliyun.com/articles/755368
