Study Notes on Common Python Libraries for Deep Learning

Date: 2023-02-26 04:21:36

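Contents

Four commonly used libraries
I. NumPy
    1. Creating arrays
        (1) np.array()
        (2) np.zeros()
        (3) np.ones()
        (4) np.empty()
        (5) np.arange()
        (6) np.random.normal()
    2. Array computation
        (1) np.dot(arr1, arr2)
        (2) Other matrix computations
        (3) Matrix transpose
        (4) Basic array statistics methods
        (5) numpy.linalg functions
    3. Array indexing and slicing
II. pandas
    1. Series
        (1) Converting a list to a Series
        (2) Setting the index of a Series
        (3) Creating a Series from a dict
        (4) The values and index attributes
        (5) Selecting one value or a group of values by index
        (6) Automatic index alignment in arithmetic
        (7) Slicing
        (8) The name attribute of a Series and of its index
    2. DataFrame
        (1) Creating a DataFrame from a dict of arrays or lists
        (2) Setting column order with the columns parameter
        (3) Creating a DataFrame from a dict of Series
        (4) Retrieving a DataFrame column as a Series
        (5) Modifying a column by assignment
PIL
Matplotlib
Notes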

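Four commonly used libraries

NumPy is the foundation of scientific computing in Python. It provides a powerful N-dimensional array object and vectorized operations.

pandas is an efficient data analysis and processing library built on top of NumPy, and one of the most important data analysis libraries in Python.

Matplotlib is a Python library used mainly for drawing 2-D graphics. Use: plotting and visualization.

PIL is a third-party library with powerful image processing capabilities. Use: image processing.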

I. NumPy

Further reading: numpy中文网 (the Chinese NumPy site)

1. Creating arrays

(1) np.array()

    import numpy as np

    # Build a 2x3 array from a nested list
    array1 = np.array([[1, 2, 3], [4, 5, 6]])

(2) np.zeros()

Creates an array of the given shape filled with zeros.

    zeroarray = np.zeros((2, 3))
    print(zeroarray)

Output:

    [[0. 0. 0.]
     [0. 0. 0.]]

(3) np.ones()

Creates an array of the given shape filled with ones.

    onearray = np.ones((3, 4))
    print(onearray)

Output:

    [[1. 1. 1. 1.]
     [1. 1. 1. 1.]
     [1. 1. 1. 1.]]

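(4) np.empty()

Creates an array without initializing it. The initial contents are arbitrary: whatever values happen to be in the allocated memory, not samples from a random distribution.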
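Since the notes give no example for np.empty(), here is a minimal sketch of a typical use: allocate a buffer, then overwrite it before reading. The variable name buf and the fill value 7.0 are chosen only for illustration.

```python
import numpy as np

# np.empty only allocates memory; do not rely on the initial values.
buf = np.empty((2, 3))
print(buf.shape, buf.dtype)  # (2, 3) float64

# Overwrite every element before using the array.
buf.fill(7.0)
print(buf)
```

Skipping initialization can save time for large arrays that will be fully overwritten anyway; otherwise np.zeros() is the safer choice.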

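(5) np.arange()

Works like Python's range(start, stop, step) but returns an ndarray. The stop value itself is excluded.

    array = np.arange(10, 31, 5)
    print(array)

Output:

    [10 15 20 25 30]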

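(6) np.random.normal()

    # Normal distribution with the given mean, standard deviation and shape
    np.random.normal(1.75, 0.1, (2, 3))

Sample output (values differ on every run):

    array([[1.79250628, 1.83204225, 1.71973433],
           [1.58555017, 1.66339554, 1.70447666]])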
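Because the samples change on every run, results are hard to reproduce. The following sketch uses NumPy's Generator interface with a fixed seed instead; the seed value 0 and the variable names are arbitrary choices, not part of the original notes.

```python
import numpy as np

# Two generators created with the same seed produce the same samples.
rng_a = np.random.default_rng(seed=0)
rng_b = np.random.default_rng(seed=0)

heights_a = rng_a.normal(loc=1.75, scale=0.1, size=(2, 3))
heights_b = rng_b.normal(loc=1.75, scale=0.1, size=(2, 3))

print(heights_a.shape)                        # (2, 3)
print(np.array_equal(heights_a, heights_b))   # True
```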

2. Array computation

(1) np.dot(arr1, arr2)

Matrix multiplication.

    arr3 = np.array([[1, 2, 3], [4, 5, 6]])
    arr4 = np.ones([3, 2], dtype=np.int64)
    print(arr3)
    print(arr4)
    print(np.dot(arr3, arr4))

Output:

    [[1 2 3]
     [4 5 6]]
    [[1 1]
     [1 1]
     [1 1]]
    [[ 6  6]
     [15 15]]
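For 2-D arrays the @ operator gives the same result as np.dot(). The sketch below reuses the matrices from the example above under new names (a, b) and also shows the shape rule: a (2, 3) matrix times a (3, 2) matrix yields a (2, 2) matrix.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
b = np.ones((3, 2), dtype=np.int64)    # shape (3, 2)

product = a @ b                         # matrix product, shape (2, 2)
print(product.shape)                            # (2, 2)
print(np.array_equal(product, np.dot(a, b)))    # True

# By contrast, * multiplies element by element and needs compatible shapes.
squares = a * a
print(squares.tolist())                 # [[1, 4, 9], [16, 25, 36]]
```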

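(2) Other matrix computations

    import numpy as np

    arr3 = np.array([[1, 2, 3], [4, 5, 6]])
    print(arr3)
    print(np.sum(arr3, axis=1))  # axis=1 sums each row; axis=0 sums each column
    print(np.max(arr3))
    print(np.min(arr3))
    print(np.mean(arr3))
    print(np.argmax(arr3))       # index of the maximum in the flattened array
    print(np.argmin(arr3))       # index of the minimum in the flattened array

Output:

    [[1 2 3]
     [4 5 6]]
    [ 6 15]
    6
    1
    3.5
    5
    0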

(3) Matrix transpose

    arr3_tran = arr3.transpose()
    print(arr3_tran)
    print(arr3.flatten())  # flatten() returns a 1-D copy

Output:

    [[1 4]
     [2 5]
     [3 6]]
    [1 2 3 4 5 6]

(4) Basic array statistics methods
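The notes list no methods under this heading. Assuming it refers to the aggregation methods available directly on an ndarray, a short sketch might look like this (the array and variable names are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

total = arr.sum()              # sum of all elements: 21
col_means = arr.mean(axis=0)   # mean of each column: 2.5, 3.5, 4.5
row_max = arr.max(axis=1)      # maximum of each row: 3, 6
spread = arr.std()             # population standard deviation of all elements
running = arr.cumsum()         # cumulative sum over the flattened array

print(total)
print(col_means)
print(row_max)
print(spread)
print(running)
```

Each method also exists as a function (np.sum, np.mean, and so on), as used in the previous subsection.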

(5) numpy.linalg functions
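This heading also has no body in the notes. As a hedged illustration of what numpy.linalg offers, the sketch below computes a determinant and an inverse and solves a small linear system; the matrix A and vector b are made-up values.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

det = np.linalg.det(A)       # determinant of A
A_inv = np.linalg.inv(A)     # inverse of A
x = np.linalg.solve(A, b)    # solves A x = b without forming the inverse

print(np.isclose(det, 5.0))                # True
print(np.allclose(A @ A_inv, np.eye(2)))   # True
print(np.allclose(A @ x, b))               # True
```

np.linalg.solve is generally preferred over multiplying by np.linalg.inv(A) when only the solution of the system is needed.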

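3. Array indexing and slicing

    arr5 = np.arange(0, 6).reshape([2, 3])
    print(arr5)
    print(arr5[1])       # second row
    print(arr5[1][2])    # row 1, column 2 (chained indexing)
    print(arr5[1, 2])    # same element, single indexing operation
    print(arr5[1, :])    # all of row 1
    print(arr5[:, 1])    # all of column 1
    print(arr5[1, 0:2])  # row 1, columns 0 and 1

Output:

    [[0 1 2]
     [3 4 5]]
    [3 4 5]
    5
    5
    [3 4 5]
    [1 4]
    [3 4]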
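One detail the slicing examples do not show: a basic slice is a view onto the original array, so writing through it changes the original, while boolean-mask indexing and .copy() return independent data. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

row = arr[1, :]          # a view onto the second row, not a copy
row[0] = 99
print(arr[1, 0])         # 99, the original array changed too

first = arr[0].copy()    # an independent copy
first[0] = -1
print(arr[0, 0])         # 0, the original is untouched

mask = arr > 2           # boolean array of the same shape
selected = arr[mask]     # 1-D copy of the elements where mask is True
print(selected.tolist()) # [99, 4, 5]
```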

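II. pandas

pandas is a third-party Python library that provides high-performance, easy-to-use data types and analysis tools.

pandas is built on NumPy and is often used together with NumPy and Matplotlib.

Further reading: Pandas中文网 (the Chinese pandas site)

[Figure: the core pandas data structures]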

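1. Series

A Series is an object similar to a one-dimensional array. It consists of a one-dimensional array of data (of any NumPy data type) and an associated set of data labels, the index.

It can be thought of as a labeled one-dimensional array that can hold integers, floats, strings, Python objects and other data types.

(1) Converting a list to a Series

    import pandas as pd
    import numpy as np

    s = pd.Series(['a', 'b', 'c', 'd', 'e'])
    print(s)

Output:

    0    a
    1    b
    2    c
    3    d
    4    e
    dtype: object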

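(2) Setting the index of a Series

The index parameter of Series sets the list of index labels.

Unlike dict keys, Series index labels may repeat.

    s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[100, 200, 100, 400, 500])
    print(s)

Output:

    100    a
    200    b
    100    c
    400    d
    500    e
    dtype: object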
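Duplicate labels affect what a lookup returns. The sketch below, which rebuilds the same Series, shows one way to detect duplicates and how .loc behaves for a repeated versus a unique label.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[100, 200, 100, 400, 500])

print(s.index.is_unique)              # False, label 100 appears twice
print(s.index.duplicated().tolist())  # [False, False, True, False, False]

repeated = s.loc[100]   # repeated label: returns a Series with both rows
single = s.loc[200]     # unique label: returns the scalar value
print(type(repeated).__name__, repeated.tolist())  # Series ['a', 'c']
print(single)                                      # b
```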

(3) Creating a Series from a dict

    d = {'b': 1, 'a': 0, 'c': 2}
    pd.Series(d)

Output:

    b    1
    a    0
    c    2
    dtype: int64

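(4) The values and index attributes return the array representation and the index object of a Series

    print(s)
    print(s.values)
    print(s.index)

Output (pandas 2.0 and later print Index([...], dtype='int64') in place of Int64Index):

    100    a
    200    b
    100    c
    400    d
    500    e
    dtype: object
    ['a' 'b' 'c' 'd' 'e']
    Int64Index([100, 200, 100, 400, 500], dtype='int64')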

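(5) Selecting one value or a group of values from a Series by index

    print(s[100])          # label 100 is duplicated, so both rows are returned
    print(s[[400, 500]])   # a list of labels selects several rows

Output:

    100    a
    100    c
    dtype: object
    400    d
    500    e
    dtype: object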

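(6) One of the most important features of Series: data with different indexes is aligned automatically in arithmetic operations

The main difference between a Series and a multidimensional array is that operations between Series align the data by label automatically. The Series being combined therefore do not need to have the same labels; a label missing from either operand yields NaN in the result.

    obj1 = pd.Series({"Ohio": 35000, "Oregon": 16000, "Texas": 71000, "Utah": 5000})
    print(obj1)
    obj2 = pd.Series({"California": np.nan, "Ohio": 35000, "Oregon": 16000, "Texas": 71000})
    print(obj2)
    print(obj1 + obj2)

Output:

    Ohio      35000
    Oregon    16000
    Texas     71000
    Utah       5000
    dtype: int64
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    California         NaN
    Ohio           70000.0
    Oregon         32000.0
    Texas         142000.0
    Utah               NaN
    dtype: float64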
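When a label missing from one side should count as zero instead of producing NaN, Series.add() accepts a fill_value argument. This is an addition to the notes, sketched with the same two Series:

```python
import numpy as np
import pandas as pd

obj1 = pd.Series({"Ohio": 35000, "Oregon": 16000, "Texas": 71000, "Utah": 5000})
obj2 = pd.Series({"California": np.nan, "Ohio": 35000, "Oregon": 16000, "Texas": 71000})

# Labels missing from one operand are treated as 0 before adding.
total = obj1.add(obj2, fill_value=0)
print(total)

print(total["Utah"])                # 5000.0, Utah exists only in obj1
print(pd.isna(total["California"])) # True, no real value on either side
```

California stays NaN because it is absent from obj1 and NaN in obj2, so there is nothing to add on either side.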

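(7) Slicing

    s = pd.Series(np.array([1, 2, 3, 4, 5]), index=['a', 'b', 'c', 'd', 'e'])
    print(s[1:])            # everything except the first element
    print(s[:-1])           # everything except the last element
    print(s[1:] + s[:-1])   # aligned by label, so 'a' and 'e' become NaN

Output:

    b    2
    c    3
    d    4
    e    5
    dtype: int64
    a    1
    b    2
    c    3
    d    4
    dtype: int64
    a    NaN
    b    4.0
    c    6.0
    d    8.0
    e    NaN
    dtype: float64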

(8) Both the Series object and its index have a name attribute, which is closely tied to other key pandas features

    import pandas as pd
    import numpy as np

    obj1 = pd.Series({'California': np.nan, 'Ohio': 35000, 'Oregon': 16000, 'Texas': 71000})
    print(obj1)
    obj1.name = 'population'
    obj1.index.name = 'state'
    print(obj1)

Output:

    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64

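2. DataFrame

A DataFrame is a tabular data structure, similar to an Excel sheet or an SQL table.

It contains an ordered set of columns, and each column can hold a different value type (numeric, string, boolean and so on).

A DataFrame has both a row index and a column index. It can be viewed as a dict of Series that all share the same index.

(1) Creating a DataFrame from a dict of arrays or a dict of lists

    import pandas as pd

    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
    frame = pd.DataFrame(data)
    print(frame)

Output:

        state  year  pop
    0    Ohio  2000  1.5
    1    Ohio  2001  1.7
    2    Ohio  2002  3.6
    3  Nevada  2001  2.4
    4  Nevada  2002  2.9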

(2) Setting column order with the columns parameter

    # If a column order is given, the DataFrame columns are arranged in that order
    frame1 = pd.DataFrame(data, columns=['year', 'state', 'pop'])
    print(frame1)

Output:

       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9

As with a Series, if a requested column is not found in the data, it is filled with NaN values.

    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
    frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                          index=['one', 'two', 'three', 'four', 'five'])
    print(frame2)

Output:

           year   state  pop debt
    one    2000    Ohio  1.5  NaN
    two    2001    Ohio  1.7  NaN
    three  2002    Ohio  3.6  NaN
    four   2001  Nevada  2.4  NaN
    five   2002  Nevada  2.9  NaN

(3) Creating a DataFrame from a dict of Series

    d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
         'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
    print(pd.DataFrame(d))

Output (the indexes are merged, and 'one' has no value for 'd'):

       one  two
    a  1.0  1.0
    b  2.0  2.0
    c  3.0  3.0
    d  NaN  4.0

(4) Retrieving a DataFrame column as a Series

    # A column can be retrieved as a Series with dict-like notation or as an attribute.
    # The returned Series has the same index as the original DataFrame.
    frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                          index=['one', 'two', 'three', 'four', 'five'])
    print(frame2['state'])

Output:

    one        Ohio
    two        Ohio
    three      Ohio
    four     Nevada
    five     Nevada
    Name: state, dtype: object

(5) Columns can be modified by assignment

    frame2['debt'] = 16.5   # a scalar is broadcast to every row
    print(frame2)

Output:

           year   state  pop  debt
    one    2000    Ohio  1.5  16.5
    two    2001    Ohio  1.7  16.5
    three  2002    Ohio  3.6  16.5
    four   2001  Nevada  2.4  16.5
    five   2002  Nevada  2.9  16.5
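Assignment is not limited to scalars. As a further sketch (the Series val and the column name eastern are illustrative, not from the notes), assigning a Series aligns it to the DataFrame's index and leaves unmatched rows as NaN, and assigning to a column name that does not exist creates a new column.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five'])

# A Series is matched to rows by label; 'one' and 'three' get NaN.
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val

# Assigning to a new name adds a column.
frame2['eastern'] = frame2['state'] == 'Ohio'
print(frame2)
```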

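PIL

PIL is a third-party library with powerful image processing capabilities (installed today as the Pillow package).

Image composition: colors are built from the three primaries red, green and blue. In an RGB image, each color is a mix of R, G and B in some proportion, and each channel takes a value from 0 to 255 that sets its intensity.

Array representation: an image is a matrix of pixels, and each element is an RGB value.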
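The array view of an image can be checked directly by converting a PIL image to a NumPy array. This sketch builds a tiny solid-red image in memory (the size and color are arbitrary) so that it runs without any image file.

```python
import numpy as np
from PIL import Image

# A 4-pixel-wide, 2-pixel-high image where every pixel is pure red.
img = Image.new('RGB', (4, 2), color=(255, 0, 0))

pixels = np.asarray(img)
print(pixels.shape)            # (2, 4, 3): height, width, RGB channels
print(pixels.dtype)            # uint8, so each channel holds 0 to 255
print(pixels[0, 0].tolist())   # [255, 0, 0]
```

Note that img.size reports (width, height), while the array shape puts height first.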

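Image is the PIL class that represents an image.

    from PIL import Image
    import matplotlib.pyplot as plt

    # Read the image
    img = Image.open(r'E:\CS\深度学习\T.jpg')
    plt.imshow(img)
    plt.show()

    # Get the image mode (for example 'RGB')
    img_mode = img.mode
    print(img_mode)

    # Get the image width and height
    width, height = img.size
    print(width, height)

    # Rotate the image by 45 degrees
    img_rotate = img.rotate(45)
    plt.imshow(img_rotate)
    plt.show()

    # Crop the image; the box is (left, upper, right, lower)
    img_crop = img.crop((677, 0, 1823, 1273))
    plt.imshow(img_crop)
    plt.show()

    # Scale the image to 60% of its size.
    # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.
    img_resize = img.resize((int(width * 0.6), int(height * 0.6)), Image.LANCZOS)
    plt.imshow(img_resize)
    plt.show()

    # Flip the image left to right
    img_lr = img.transpose(Image.FLIP_LEFT_RIGHT)
    plt.imshow(img_lr)
    plt.show()

    # Flip the already mirrored image top to bottom
    img_bt = img_lr.transpose(Image.FLIP_TOP_BOTTOM)
    plt.imshow(img_bt)
    plt.show()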

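Matplotlib

Further reading: Matplotlib中文网 (the Chinese Matplotlib site)

Matplotlib is made up of many visualization classes, and its internal structure is complex.

matplotlib.pyplot is a collection of command-style functions for drawing all kinds of plots.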
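To make the relationship between pyplot and the underlying classes concrete, here is a minimal sketch that works with the Figure and Axes objects directly. The Agg backend and the in-memory buffer are choices made here so the example runs without a display or an output file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-1, 1, 50)

# plt.subplots() returns the Figure object and an Axes object to draw on.
fig, ax = plt.subplots(figsize=(7, 5))
line, = ax.plot(x, 2 * x + 1, color="red", label="y = 2x + 1")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

print(type(fig).__name__, type(line).__name__)  # Figure Line2D

# Render the figure to PNG bytes in memory.
png = io.BytesIO()
fig.savefig(png, format="png")
plt.close(fig)
```

Functions such as plt.plot() and plt.xlabel(), used in the examples that follow, act on the current Figure and Axes behind the scenes.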

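    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(-1, 1, 50)  # 50 evenly spaced values from -1 to 1
    y1 = 2 * x + 1
    y2 = x ** 2

    # Pass x and y1 to plot() to draw a line chart
    plt.figure()
    plt.plot(x, y1)

    # Create a new window for the second plot
    plt.figure(figsize=(7, 5))
    plt.plot(x, y2)

    # Show the figures
    plt.show()

Output: two figure windows, one with the line y = 2x + 1 and one with the parabola y = x**2.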

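    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(-1, 1, 50)
    y1 = x * 2 + 1
    y2 = x ** 2

    plt.figure(figsize=(7, 5))                     # create the drawing window
    plt.plot(x, y1, color='red', linewidth=1)      # first curve, with color and line width
    plt.plot(x, y2, color='blue', linewidth=5)     # second curve, with color and line width

    # Set the axis labels and font size
    plt.xlabel('x', fontsize=20)
    plt.ylabel('y', fontsize=20)

    # Show the figure
    plt.show()

Output: one figure containing a thin red line and a thick blue parabola, with axes labeled x and y.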

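    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(-1, 1, 50)
    y1 = x * 2 + 1
    y2 = x ** 2

    # Draw the curves; plot() returns a list, so unpack the single line object
    a1, = plt.plot(x, y1, color='red', linewidth=1)
    a2, = plt.plot(x, y2, color='blue', linewidth=5)

    # Set the legend
    plt.legend(handles=[a1, a2], labels=['aa', 'bb'], loc='best')

    # Set the x and y axis labels
    plt.xlabel('x')
    plt.ylabel('y')

    # Set the display range
    plt.xlim((0, 1))  # show only part of the x axis
    plt.ylim((0, 1))  # show only part of the y axis

    # Show the figure
    plt.show()

Output: both curves with legend entries 'aa' and 'bb', with the x and y axes restricted to the range 0 to 1.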

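Scatter plot:

    import numpy as np
    import matplotlib.pyplot as plt

    dots1 = np.random.rand(50)
    dots2 = np.random.rand(50)

    # Draw a scatter plot; c sets the color, alpha the transparency
    plt.scatter(dots1, dots2, c='red', alpha=0.5)
    plt.show()

Bar chart:

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.arange(10)
    y = 2 ** x + 10
    plt.bar(x, y, facecolor='#9999ff', edgecolor='white')
    plt.show()

Bar chart with a value label above each bar:

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.arange(10)
    y = 2 ** x + 10
    plt.bar(x, y, facecolor='#9999ff', edgecolor='white')

    # Write each bar's value, centered, just above the bar
    for ax, ay in zip(x, y):
        plt.text(ax, ay, '%.1f' % ay, ha='center', va='bottom')
    plt.show()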