代码实战带你了解深度学习中的混合精度训练
摘要:本文为大家介绍一下深度学习中的混合精度训练,并通过代码实战的方式为大家讲解实际应用的理论,并对模型进行测试。
本文分享自华为云社区《浅谈深度学习中的混合精度训练》,作者:李长安。
1 混合精度训练
混合精度训练最初是在论文Mixed Precision Training中被踢出,该论文对混合精度训练进行了详细的阐述,并对其实现进行了讲解,有兴趣的同学可以看看这篇论文。
1.1半精度与单精度
半精度(也被称为FP16)对比高精度的FP32与FP64降低了神经网络的显存占用,使得我们可以训练部署更大的网络,并且FP16在数据转换时比FP32或者FP64更节省时间。
单精度(也被称为32-bit)是通用的浮点数格式(在C扩展语言中表示为float),64-bit被称为双精度(double)。
如图所示,我们能够很直观的看到半精度的存储空间是单精度存储空间的一半。
1.2为什么使用混合精度训练
混合精度训练,指代的是单精度 float和半精度 float16 混合训练。
float16和float相比恰里,总结下来就是两个原因:内存占用更少,计算更快。
内存占用更少:这个是显然可见的,通用的模型 fp16 占用的内存只需原来的一半。memory-bandwidth 减半所带来的好处:
模型占用的内存更小,训练的时候可以用更大的batchsize。
模型训练时,通信量(特别是多卡,或者多机多卡)大幅减少,大幅减少等待时间,加快数据的流通。
计算更快:目前的不少GPU都有针对 fp16 的计算进行优化。论文指出:在近期的GPU中,半精度的计算吞吐量可以是单精度的 2-8 倍;
损失控制原理:
2 实验设计
本次实验主要从两个方面进行测试,分别在精度和速度两个部分进行对比。实验中采用ResNet-18作为测试对象,使用的数据集为美食数据集,共五种类别。
# 解压数据集 !cd data/data64280/ && unzip -q trainset.zip
2.1数据集预处理
import pandas as pd import numpy as np import os all_file_dir = 'data/data64280/trainset' img_list = [] label_list = [] label_id = 0 class_list = [c for c in os.listdir(all_file_dir) if os.path.isdir(os.path.join(all_file_dir, c))] for class_dir in class_list: image_path_pre = os.path.join(all_file_dir, class_dir) for img in os.listdir(image_path_pre): img_list.append(os.path.join(image_path_pre, img)) label_list.append(label_id) label_id += 1 img_df = pd.DataFrame(img_list) label_df = pd.DataFrame(label_list) img_df.columns = ['images'] label_df.columns = ['label'] df = pd.concat([img_df, label_df], axis=1) df = df.reindex(np.random.permutation(df.index)) df.to_csv('food_data.csv', index=0) import pandas as pd # 读取数据 df = pd.read_csv('food_data.csv') image_path_list = df['images'].values label_list = df['label'].values # 划分训练集和校验集 all_size = len(image_path_list) train_size = int(all_size * 0.8) train_image_path_list = image_path_list[:train_size] train_label_list = label_list[:train_size] val_image_path_list = image_path_list[train_size:] val_label_list = label_list[train_size:]
2.2自定义数据集
import numpy as np from PIL import Image from paddle.io import Dataset import paddle.vision.transforms as T import paddle as pd class MyDataset(Dataset): """ 步骤一:继承paddle.io.Dataset类 """ def __init__(self, image, label, transform=None): """ 步骤二:实现构造函数,定义数据读取方式,划分训练和测试数据集 """ super(MyDataset, self).__init__() imgs = image labels = label self.labels = labels self.imgs = imgs self.transform = transform # self.loader = loader def __getitem__(self, index): # 这个方法是必须要有的,用于按照索引读取每个元素的具体内容 fn = self.imgs label = self.labels # fn是图片path #fn和label分别获得imgs[index]也即是刚才每行中word[0]和word[1]的信息 for im,la in zip(fn, label): img = Image.open(im) img = img.convert("RGB") img = np.array(img).astype('float32') / 255.0 label = np.array([la]).astype(dtype='int64') # 按照路径读取图片 if self.transform is not None: img = self.transform(img) # 数据标签转换为Tensor return img, label # return回哪些内容,那么我们在训练时循环读取每个batch时,就能获得哪些内容 # ********************************** #使用__len__()初始化一些需要传入的参数及数据集的调用********************** def __len__(self): # 这个函数也必须要写,它返回的是数据集的长度,也就是多少张图片,要和loader的长度作区分 return len(self.imgs)
2.3训练准备
import paddle from paddle.metric import Accuracy import warnings warnings.filterwarnings("ignore") import paddle.vision.transforms as T transform = T.Compose([ T.Resize([224, 224]), T.ToTensor(), # T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]), # T.Transpose(), ]) train_dataset = MyDataset(image=train_image_path_list, label=train_label_list ,transform=transform) train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=16, shuffle=True) from __future__ import absolute_import from __future__ import division from __future__ import print_function import numpy as np import paddle from paddle import ParamAttr import paddle.nn as nn import paddle.nn.functional as F from paddle.nn import Conv2D, BatchNorm, Linear, Dropout from paddle.nn import AdaptiveAvgPool2D, MaxPool2D, AvgPool2D from paddle.nn.initializer import Uniform import math __all__ = ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"] class ConvBNLayer(nn.Layer): def __init__(self, num_channels, num_filters, filter_size, stride=1, groups=1, act=None, name=None, data_format="NCHW"): super(ConvBNLayer, self).__init__() self._conv = Conv2D( in_channels=num_channels, out_channels=num_filters, kernel_size=filter_size, stride=stride, padding=(filter_size - 1) // 2, groups=groups, weight_attr=ParamAttr(name=name + "_weights"), bias_attr=False, data_format=data_format) if name == "conv1": bn_name = "bn_" + name else: bn_name = "bn" + name[3:] self._batch_norm = BatchNorm( num_filters, act=act, param_attr=ParamAttr(name=bn_name + "_scale"), bias_attr=ParamAttr(bn_name + "_offset"), moving_mean_name=bn_name + "_mean", moving_variance_name=bn_name + "_variance", data_layout=data_format) def forward(self, inputs): y = self._conv(inputs) y = self._batch_norm(y) return y class BottleneckBlock(nn.Layer): def __init__(self, num_channels, num_filters, stride, shortcut=True, name=None, data_format="NCHW"): super(BottleneckBlock, self).__init__() self.conv0 = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, filter_size=1, act="relu", name=name + "_branch2a", data_format=data_format) self.conv1 = ConvBNLayer( num_channels=num_filters, num_filters=num_filters, filter_size=3, stride=stride, act="relu", name=name + "_branch2b", data_format=data_format) self.conv2 = ConvBNLayer( num_channels=num_filters, num_filters=num_filters * 4, filter_size=1, act=None, name=name + "_branch2c", data_format=data_format) if not shortcut: self.short = ConvBNLayer( num_channels=num_channels, num_filters=num_filters * 4, filter_size=1, stride=stride, name=name + "_branch1", data_format=data_format) self.shortcut = shortcut self._num_channels_out = num_filters * 4 def forward(self, inputs): y = self.conv0(inputs) conv1 = self.conv1(y) conv2 = self.conv2(conv1) if self.shortcut: short = inputs else: short = self.short(inputs) y = paddle.add(x=short, y=conv2) y = F.relu(y) return y class BasicBlock(nn.Layer): def __init__(self, num_channels, num_filters, stride, shortcut=True, name=None, data_format="NCHW"): super(BasicBlock, self).__init__() self.stride = stride self.conv0 = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, filter_size=3, stride=stride, act="relu", name=name + "_branch2a", data_format=data_format) self.conv1 = ConvBNLayer( num_channels=num_filters, num_filters=num_filters, filter_size=3, act=None, name=name + "_branch2b", data_format=data_format) if not shortcut: self.short = ConvBNLayer( num_channels=num_channels, num_filters=num_filters, filter_size=1, stride=stride, name=name + "_branch1", data_format=data_format) self.shortcut = shortcut def forward(self, inputs): y = self.conv0(inputs) conv1 = self.conv1(y) if self.shortcut: short = inputs else: short = self.short(inputs) y = paddle.add(x=short, y=conv1) y = F.relu(y) return y class ResNet(nn.Layer): def __init__(self, layers=50, class_dim=1000, input_image_channel=3, data_format="NCHW"): super(ResNet, self).__init__() self.layers = layers self.data_format = data_format self.input_image_channel = input_image_channel supported_layers = [18, 34, 50, 101, 152] assert layers in supported_layers, \ "supported layers are {} but input layer is {}".format( supported_layers, layers) if layers == 18: depth = [2, 2, 2, 2] elif layers == 34 or layers == 50: depth = [3, 4, 6, 3] elif layers == 101: depth = [3, 4, 23, 3] elif layers == 152: depth = [3, 8, 36, 3] num_channels = [64, 256, 512, 1024] if layers >= 50 else [64, 64, 128, 256] num_filters = [64, 128, 256, 512] self.conv = ConvBNLayer( num_channels=self.input_image_channel, num_filters=64, filter_size=7, stride=2, act="relu", name="conv1", data_format=self.data_format) self.pool2d_max = MaxPool2D( kernel_size=3, stride=2, padding=1, data_format=self.data_format) self.block_list = [] if layers >= 50: for block in range(len(depth)): shortcut = False for i in range(depth[block]): if layers in [101, 152] and block == 2: if i == 0: conv_name = "res" + str(block + 2) + "a" else: conv_name = "res" + str(block + 2) + "b" + str(i) else: conv_name = "res" + str(block + 2) + chr(97 + i) bottleneck_block = self.add_sublayer( conv_name, BottleneckBlock( num_channels=num_channels[block] if i == 0 else num_filters[block] * 4, num_filters=num_filters[block], stride=2 if i == 0 and block != 0 else 1, shortcut=shortcut, name=conv_name, data_format=self.data_format)) self.block_list.append(bottleneck_block) shortcut = True else: for block in range(len(depth)): shortcut = False for i in range(depth[block]): conv_name = "res" + str(block + 2) + chr(97 + i) basic_block = self.add_sublayer( conv_name, BasicBlock( num_channels=num_channels[block] if i == 0 else num_filters[block], num_filters=num_filters[block], stride=2 if i == 0 and block != 0 else 1, shortcut=shortcut, name=conv_name, data_format=self.data_format)) self.block_list.append(basic_block) shortcut = True self.pool2d_avg = AdaptiveAvgPool2D(1, data_format=self.data_format) self.pool2d_avg_channels = num_channels[-1] * 2 stdv = 1.0 / math.sqrt(self.pool2d_avg_channels * 1.0) self.out = Linear( self.pool2d_avg_channels, class_dim, weight_attr=ParamAttr( initializer=Uniform(-stdv, stdv), name="fc_0.w_0"), bias_attr=ParamAttr(name="fc_0.b_0")) def forward(self, inputs): y = self.conv(inputs) y = self.pool2d_max(y) for block in self.block_list: y = block(y) y = self.pool2d_avg(y) y = paddle.reshape(y, shape=[-1, self.pool2d_avg_channels]) y = self.out(y) return y def ResNet18(**args): model = ResNet(layers=18, **args) return model
2.4训练过程定义
import paddle import numpy import paddle.nn.functional as F import time def train(model): model.train() epochs = 5 optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()) # 用Adam作为优化函数 for epoch in range(epochs): for batch_id, data in enumerate(train_loader()): x_data = data[0] y_data = data[1] # print(y_data) predicts = model(x_data) loss = F.cross_entropy(predicts, y_data) # 计算损失 acc = paddle.metric.accuracy(predicts, y_data, k=2) loss.backward() if batch_id % 10 == 0: print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy())) optim.step() optim.clear_grad() import paddle import numpy import paddle.nn.functional as F import time def train_amp(model): model.train() epochs = 5 optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()) # 用Adam作为优化函数 for epoch in range(epochs): for batch_id, data in enumerate(train_loader()): x_data = data[0].astype('float16') y_data = data[1] scaler = paddle.amp.GradScaler(init_loss_scaling=1024) with paddle.amp.auto_cast(): predicts = model(x_data) loss = F.cross_entropy(predicts, y_data) scaled = scaler.scale(loss) # scale the loss scaled.backward() # do backward acc = paddle.metric.accuracy(predicts, y_data, k=2) if batch_id % 10 == 0: print("epoch: {}, batch_id: {}, loss is: {}, acc is: {}".format(epoch, batch_id, loss.numpy(), acc.numpy())) optim.step() optim.clear_grad()
2.5开启训练
此部分,分别对两种训练方式进行对比,主要关注模型的训练速度
model = ResNet18(class_dim=2) strat = time.time() train(model) end = time.time() print('no_amp:', end-strat) epoch: 0, batch_id: 0, loss is: [0.21116894], acc is: [1.] epoch: 1, batch_id: 0, loss is: [0.00010776], acc is: [1.] epoch: 2, batch_id: 0, loss is: [2.5868081e-05], acc is: [1.] epoch: 3, batch_id: 0, loss is: [1.442422e-05], acc is: [1.] epoch: 4, batch_id: 0, loss is: [1.1086402e-05], acc is: [1.] no_amp: 740.6813971996307 strat1 = time.time() train_amp(model) end1 = time.time() print('with amp:', end1-strat1) epoch: 0, batch_id: 0, loss is: [0.512834], acc is: [1.] epoch: 1, batch_id: 0, loss is: [0.00025519], acc is: [1.] epoch: 2, batch_id: 0, loss is: [5.9364465e-05], acc is: [1.] epoch: 3, batch_id: 0, loss is: [3.2305197e-05], acc is: [1.] epoch: 4, batch_id: 0, loss is: [2.4556812e-05], acc is: [1.] with amp: 740.9603228569031
3 总结
对于本次实验,由于迭代轮数较少,只迭代了5次,故时间上的优势没有体现出来,大家有兴趣的可以增加迭代次数,或者换更深的网络进行测试。
从训练的结果来看,使用混合精度训练,其loss值是高于未使用混合精度训练模型的。

低调大师中文资讯倾力打造互联网数据资讯、行业资源、电子商务、移动互联网、网络营销平台。
持续更新报道IT业界、互联网、市场资讯、驱动更新,是最及时权威的产业资讯及硬件资讯报道平台。
转载内容版权归作者及来源网站所有,本站原创内容转载请注明来源。
- 上一篇
Ascend CL两种数据预处理的方式:AIPP和DVPP
摘要:本文介绍了昇腾CANN提供的两种数据预处理的方式:DVPP和AIPP,介绍了两者的功能、差别及联系,并以具体代码示例介绍了如何使用DVPP和AIPP的功能。 本文分享自华为云社区《了解AscendCL数据预处理的两种方式:AIPP和DVPP》,作者:昇腾CANN。 数据预处理的典型使用场景 受网络结构和训练方式等因素的影响,绝大多数神经网络模型对输入数据都有格式上的限制。在计算机视觉领域,这个限制大多体现在图像的尺寸、色域、归一化参数等。如果源图或视频的尺寸、格式等与网络模型的要求不一致时,我们需要对其处理,使其符合模型的要求,这个操作,一般称之为数据预处理。 AIPP、DVPP,它们都能做什么 CANN提供了两套专门用于数据预处理的方式:AIPP和DVPP。 总结一下,虽然都是数据预处理,但AIPP与DVPP的功能范围不同(比如DVPP可以做图像编解码、视频编解码,AIPP可以做归一化配置),处理数据的计算单元也不同,AIPP用的AI Core计算加速单元,DVPP就是用的专门的图像处理单元。 AIPP、DVPP可以分开独立使用,也可以组合使用。组合使用场景下,一般先使用DVP...
- 下一篇
python进阶:带你学习实时目标跟踪
摘要:本程序主要实现了python的opencv人工智能视觉模块的目标跟踪功能。 本文分享自华为云社区《python进阶——人工智能实时目标跟踪,这一篇就够用了!》,作者:lqj_本人 。 前言 本程序主要实现了python的opencv人工智能视觉模块的目标跟踪功能。 项目介绍 区域性锁定目标实时动态跟踪(适用 警方追捕,无人机锁定拍摄等) 首先先介绍几种AI视觉算法 特性: 1.BOOSTING:算法原理类似于Harr cascdes(AdaBoost),是一种很老的算法。这个算法速度慢并且不准。 2.MIL:比BOOSTING准一点 3.KCF:速度比BOOSTING和MIL更快,与BOOSTING和MIL一样不能很好的处理遮挡问题。 4.CSRT:比KCF更准一些,但是速度比KCF慢 5.MedianFlow:对于快速移动的目标和外形比那花迅速的目标效果不好 6.TLD:会产生朵的false-posittives 7.MOSSE:算法速度非常快,但是准确率比不上KCF和CSRT,在一些追求算法的速度场合很适用 8.GOTURN:OpenCV中自带的唯一一个基于深度学习的算法,运...
相关文章
文章评论
共有0条评论来说两句吧...