Why I wrote this post

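I took the DeepLearning course on NetEase Cloud Classroom, so I had no assignments, no helper module files, and no training datasets, and I ended up building the dataset myself. I had never used Python before, so this was frustrating at first, and it was hard to find documentation for the modules I needed. I decided to write this up for anyone who runs into the same trouble and has few C coins or points to spare. Without further ado, let's begin!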

1 - Packages

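These are the packages we will need, so import them first!

I won't explain each package here; if you're interested, search for them yourself for more detail.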

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from os import walk
%matplotlib inline

2 - Read images from disk

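def readImg(roots="", label=1):
    """
    roots is the directory path. Put the images in a folder A that sits in the
    same directory as this module, and the folder can be accessed directly by name.
    """
    images = []
    for (root, dirs, files) in walk(roots):
        for name in files:
            fname = root + "/" + name
            # ndimage.imread and scipy.misc.imresize have been removed from recent
            # SciPy releases, so Pillow (imported above as Image) is used instead.
            image = Image.open(fname).convert("RGB")
            # The size can be changed, but it must be the same for all of the data;
            # (64, 64) is the size used in the course.
            image = image.resize((64, 64))
            images.append(np.array(image))
    images = np.array(images)
    # One label per image, with shape (1, number_of_images)
    labels = (np.zeros((1, images.shape[0])) + label).astype(int)
    return images, labels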

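To check the output shapes of a reader like readImg without any real photos on hand, here is a minimal sketch that writes a few random images to a temporary folder and loads them back with Pillow. The helper name load_folder and the random test images are assumptions made for illustration; they are not part of the course material.

```python
import os
import tempfile

import numpy as np
from PIL import Image


def load_folder(folder, label, size=(64, 64)):
    """Hypothetical helper: load every image in `folder` as an RGB array of `size`."""
    images = []
    for name in sorted(os.listdir(folder)):
        with Image.open(os.path.join(folder, name)) as img:
            images.append(np.array(img.convert("RGB").resize(size)))
    images = np.array(images)
    # One label per image, shape (1, number_of_images)
    labels = np.full((1, images.shape[0]), label, dtype=int)
    return images, labels


rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    # Three random images of different sizes stand in for real photos.
    for i, (h, w) in enumerate([(80, 100), (64, 64), (120, 90)]):
        pixels = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
        Image.fromarray(pixels).save(os.path.join(tmp, f"img_{i}.png"))
    demo_images, demo_labels = load_folder(tmp, label=1)

print(demo_images.shape)  # (3, 64, 64, 3)
print(demo_labels)        # [[1 1 1]]
```

Whatever the original picture sizes are, every image comes out as 64 x 64 x 3, which is what lets them be packed into a single array.
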
3 - Combine the data

Using the function above, read the images that belong to the same DB from disk and combine them.

# The first argument is the name of my folder
images_test_cat, labels_test_cat = readImg("test_set_cat", label=1)
images_test_nocat, labels_test_nocat = readImg("test_set_nocat", label=0)
images = np.vstack((images_test_cat, images_test_nocat))  # stack vertically (along the first axis)
labels = np.hstack((labels_test_cat, labels_test_nocat))  # stack horizontally (along the second axis)
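
To see why the images are stacked vertically while the labels are stacked horizontally, the sketch below runs the same two calls on placeholder arrays. The array sizes (5 cat images, 3 non-cat images) are made up for illustration, and the shuffle at the end is an optional extra, not something the steps above require.

```python
import numpy as np

# Placeholder data: 5 "cat" images and 3 "non-cat" images, each 64 x 64 RGB.
cat_images = np.zeros((5, 64, 64, 3), dtype=np.uint8)
nocat_images = np.zeros((3, 64, 64, 3), dtype=np.uint8)
cat_labels = np.ones((1, 5), dtype=int)
nocat_labels = np.zeros((1, 3), dtype=int)

# Images have the sample index on axis 0, so they are stacked along axis 0.
all_images = np.vstack((cat_images, nocat_images))
# Labels have shape (1, m), so the sample index is on axis 1.
all_labels = np.hstack((cat_labels, nocat_labels))

print(all_images.shape)  # (8, 64, 64, 3)
print(all_labels.shape)  # (1, 8)

# Optional: shuffle images and labels with the same permutation so the
# two classes are not stored in two contiguous runs.
order = np.random.default_rng(0).permutation(all_images.shape[0])
shuffled_images = all_images[order]
shuffled_labels = all_labels[:, order]
```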

4 - Write to the database

def write_dataset(dbName, images, labels):
    # The with-block closes the file even if one of the writes fails.
    with h5py.File(dbName, "w") as f:
        f.create_dataset("test_set_x", data=images)  # the first argument is the dataset name
        f.create_dataset("test_set_y", data=labels)

5 - Read the database

def read_dataset(dbName, xName, yName):
    # [:] copies the data into NumPy arrays before the file is closed.
    with h5py.File(dbName, "r") as f:
        X = f[xName][:]
        Y = f[yName][:]
    return X, Y
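
A quick way to convince yourself that writing and reading work is a round trip through a temporary file. The sketch below assumes placeholder data and a made-up file name, demo.h5; the dataset names follow the ones used above.

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder data: 4 random 64 x 64 RGB images with labels of shape (1, 4).
images = np.random.default_rng(0).integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
labels = np.array([[1, 1, 0, 0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")  # hypothetical file name

    # Write both arrays as named datasets.
    with h5py.File(path, "w") as f:
        f.create_dataset("test_set_x", data=images)
        f.create_dataset("test_set_y", data=labels)

    # Read them back; [:] copies the data out before the file closes.
    with h5py.File(path, "r") as f:
        names = sorted(f.keys())
        X = f["test_set_x"][:]
        Y = f["test_set_y"][:]

print(names)             # ['test_set_x', 'test_set_y']
print(X.shape, Y.shape)  # (4, 64, 64, 3) (1, 4)
```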

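6 - The lr_utils.py file

The code for this module is pasted below; you will need it when doing the assignment. I suggest naming your database files and datasets after the ones in this file, so that your variable names match the course's when you work through the assignment. Results from a dataset you built yourself will of course differ from the course's.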

import numpy as np
import h5py


def load_dataset():
    train_dataset = h5py.File('train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    print(train_set_x_orig.shape)
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    print(train_set_y_orig.shape)

    test_dataset = h5py.File('test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    print(train_set_y_orig.shape)
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
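
One thing to watch when writing your own files for load_dataset: it reshapes the labels with reshape((1, shape[0])), which only works when the labels are stored as a 1-D array of length m, and it also reads a list_classes dataset from the test file. The sketch below writes a test file in that layout and repeats the same reshape. The class names b"non-cat" and b"cat" are an assumption; use whatever names suit your own data.

```python
import os
import tempfile

import h5py
import numpy as np

images = np.zeros((4, 64, 64, 3), dtype=np.uint8)  # placeholder images
labels = np.array([[1, 1, 0, 0]])                  # shape (1, m), as built in step 3

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_catvnoncat.h5")

    with h5py.File(path, "w") as f:
        f.create_dataset("test_set_x", data=images)
        # Flatten to shape (m,) so the reshape in load_dataset succeeds.
        f.create_dataset("test_set_y", data=labels.ravel())
        # Assumed class names, stored as byte strings.
        f.create_dataset("list_classes", data=np.array([b"non-cat", b"cat"]))

    with h5py.File(path, "r") as f:
        y = np.array(f["test_set_y"][:])
        classes = np.array(f["list_classes"][:])

# Same reshape that load_dataset applies to the labels.
y = y.reshape((1, y.shape[0]))
print(y.shape)  # (1, 4)
print(classes)
```

If the labels were stored with shape (1, m) instead, shape[0] would be 1 and the reshape to (1, 1) would raise an error for any m greater than 1.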

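7 - Wrapping up

The link below is the download for the dataset that matches the course (Course 1, Week 2). It contains only the dataset and the module file, not the assignment questions!

Click here to download

Finally, if you want the files from this tutorial, download them from the link below! Drop in your images and they are ready to use, or change the parameters, such as the names, to whatever you prefer.

Download the database tutorial

If you'd like to learn more about h5py, see h5py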

Andrew Ng DeepLearning, Course 1 Week 2: database creation tutorial