TensorFlow Deep Learning in Practice (7): Classification Tasks Explained

盼小辉丶 2025-08-17 00:01:04

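0. Preface

Classification is a supervised learning problem in machine learning whose goal is to map input data (feature vectors) to discrete class labels. It is widely used in areas such as text classification, image recognition, spam detection, and medical diagnosis.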

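1. Classification Tasks

1.1 Introduction to Classification

The goal of a classification task is to learn a model from training data so that it can predict the class of new input data. The input is the model's independent variable, usually a feature vector $[x_1, x_2, \dots, x_n]$, where $n$ is the number of features; each feature may be continuous (such as temperature or age) or discrete (such as color or gender). The output is a class label indicating which class each input data point belongs to. For binary classification the label is usually 0 or 1; for multiclass classification the labels range over more than two classes, for example 0, 1, 2, 3.
Classification tasks fall into different types depending on the number of classes:

Binary classification: the output has only two classes, for example "yes" and "no"
Multiclass classification: the output covers several class labels and each sample belongs to exactly one of them, for example "cat", "dog", "lion", "elephant"
Multilabel classification: unlike single-label classification, each sample can belong to several classes at the same time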
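
The three task types differ mainly in how the labels are encoded. The following is a minimal NumPy sketch of typical encodings; the class names and label values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical class names, used only for illustration.
class_names = ["cat", "dog", "lion", "elephant"]

# Binary classification: one 0/1 label per sample.
binary_labels = np.array([0, 1, 1, 0])

# Multiclass classification: one integer class index per sample ...
multiclass_labels = np.array([2, 0, 3])
# ... or, equivalently, one one-hot row per sample.
one_hot_labels = np.eye(len(class_names), dtype=np.int32)[multiclass_labels]

# Multilabel classification: one multi-hot row per sample,
# where several entries may be 1 at the same time.
multilabel_labels = np.array([
    [1, 0, 1, 0],  # sample belongs to classes 0 and 2
    [0, 1, 0, 0],  # sample belongs to class 1 only
])

print(one_hot_labels)
print(one_hot_labels.sum(axis=1))     # every one-hot row sums to 1
print(multilabel_labels.sum(axis=1))  # multi-hot rows may sum to more than 1
```

In a one-hot row exactly one entry is 1, whereas a multi-hot row can contain any number of 1s.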

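1.2 Classification versus Regression

Classification and regression differ in the following ways:

In classification, data is assigned to distinct classes, whereas in regression the goal is to produce a continuous value from the given data. For example, recognizing handwritten digits is a classification task, since every handwritten digit is one of the digits 0 to 9; predicting house prices from a set of input variables is a regression task.
In classification, the model's goal is to find the decision boundary that separates the classes; in regression, the goal is to approximate a function that fits the relationship between inputs and outputs.

The figure below illustrates the difference. In classification, we look for a line (or plane, or hyperplane) that separates the classes. In regression, the goal is to find a line (or plane, or hyperplane) that fits the relationship between the given inputs and outputs.

Classification versus regression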

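2. Logistic Regression

Logistic regression is used to estimate the probability that an event occurs. The event is usually represented as a categorical dependent variable. The probability of the event is expressed with the sigmoid (logistic) function:
$\hat Y=\frac{1}{1+e^{-(b+W^Tx)}}$
The goal is to estimate the weights $W=\{w_1,w_2,\dots,w_n\}$ and the bias $b$. In logistic regression, the coefficients can be estimated by maximum likelihood or by stochastic gradient descent. If $p$ is the total number of input samples, $Y_i$ is the true label of sample $i$, and $\hat Y_i$ is its predicted probability, the loss is usually defined as the cross-entropy:
$loss=-\sum_{i=1}^p\left[Y_i\log(\hat Y_i)+(1-Y_i)\log(1-\hat Y_i)\right]$
Logistic regression is used for classification problems. For example, when analyzing medical data, we can use it to classify whether a person has cancer. If the output variable has more than two classes, multinomial logistic regression can be used, and the cross-entropy loss becomes:
$loss=-\sum_{i=1}^p\sum_{j=1}^kY_{ij}\log\hat Y_{ij}$
where $k$ is the total number of classes. With the principles of logistic regression in place, the next section applies them in practice.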
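
Before moving to TensorFlow, the sigmoid and the two cross-entropy losses can be checked numerically. The following is a minimal NumPy sketch, not the article's implementation; the function names, the toy weights, and the toy data are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy summed over all samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def categorical_cross_entropy(y_onehot, y_prob, eps=1e-12):
    """Multiclass cross-entropy summed over samples and classes."""
    return -np.sum(y_onehot * np.log(np.clip(y_prob, eps, 1.0)))

# Toy parameters and data (hypothetical values).
W = np.array([0.5, -0.25])
b = 0.1
X = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]])
Y = np.array([1, 0, 1])

Y_hat = sigmoid(b + X @ W)  # predicted probability of class 1
bce = binary_cross_entropy(Y, Y_hat)

# The same problem written with k = 2 classes and one-hot labels.
Y_onehot = np.eye(2)[Y]
Y_hat_2 = np.column_stack([1.0 - Y_hat, Y_hat])
cce = categorical_cross_entropy(Y_onehot, Y_hat_2)

print(bce, cce)  # the two losses agree
```

With $k=2$ the multiclass loss reduces to the binary one, which is why the two printed values match.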

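3. Implementing Logistic Regression with TensorFlow

Next, we implement logistic regression in TensorFlow to classify MNIST handwritten digits. The MNIST dataset contains images of handwritten digits, each labeled with the digit it shows (0 to 9), so this is a multiclass classification problem.
To implement logistic regression, we build a model with a single fully connected layer. Each class is represented by one output neuron; with 10 classes, the output layer has 10 neurons. The probability function used in logistic regression is the sigmoid, so the model uses the sigmoid activation. We build the model as follows.

(1) First, import the required libraries. Because a fully connected layer takes one-dimensional input, a Flatten layer is used to reshape each 28 x 28 two-dimensional MNIST image into a one-dimensional array of 784 elements: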

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow.keras as K
from tensorflow.keras.layers import Dense, Flatten

(2) Load the MNIST data from the tensorflow.keras datasets:

((train_data, train_labels),(test_data, test_labels)) = tf.keras.datasets.mnist.load_data()

(3) Preprocess the data by normalizing the images. MNIST images are grayscale, with each pixel intensity between 0 and 255; dividing by 255 scales the values to the range 0 to 1:

train_data = train_data/np.float32(255)
train_labels = train_labels.astype(np.int32)  
test_data = test_data/np.float32(255)
test_labels = test_labels.astype(np.int32)
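
The effect of dividing by `np.float32(255)` can be seen on a tiny array. This sketch uses a hypothetical three-pixel stand-in for an MNIST image rather than the real dataset.

```python
import numpy as np

# Hypothetical stand-in for one row of an MNIST image: uint8 values in [0, 255].
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a float32 scalar yields a float32 array scaled to [0, 1].
scaled = pixels / np.float32(255)

print(scaled.dtype, scaled.min(), scaled.max())
```

Dividing by a float32 scalar keeps the result in float32, TensorFlow's default floating-point type; dividing by the Python integer 255 would give float64 instead.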

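(4) Define the model. It has a single Dense layer with 10 neurons and an input size of 784. The model summary shows that only the Dense layer has trainable parameters:

model = K.Sequential([
    # Flatten each 28 x 28 image into a 784-element vector (no trainable parameters)
    Flatten(input_shape=(28, 28)),
    # One sigmoid output neuron per digit class
    Dense(10, activation='sigmoid')
])
model.summary()  # summary() prints the table itself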

Model architecture
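
The number of trainable parameters in the Dense layer can be worked out by hand: one weight per input-output pair plus one bias per output neuron. A minimal sketch of that arithmetic follows; the variable names are illustrative.

```python
import numpy as np

# Flatten: a 28 x 28 image becomes a vector of 784 values.
image = np.zeros((28, 28), dtype=np.float32)
flat = image.reshape(-1)

n_inputs = flat.shape[0]  # 784
n_classes = 10            # one output neuron per digit

n_weights = n_inputs * n_classes  # one weight per (input, output) pair
n_biases = n_classes              # one bias per output neuron
n_params = n_weights + n_biases

print(n_inputs, n_params)  # 784 7850
```

The Flatten layer only reshapes its input, so it contributes no parameters.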

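(5) Because the labels are integers, use the SparseCategoricalCrossentropy loss. The sigmoid layer already outputs values between 0 and 1 rather than raw logits, so from_logits is set to False. Choose the Adam optimizer and record accuracy as the metric during training. Train the model for 50 epochs with an 80:20 training-validation split:

model.compile(optimizer='adam',
              # The model outputs sigmoid activations, not raw logits
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
history = model.fit(x=train_data, y=train_labels, epochs=50, verbose=1, validation_split=0.2)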
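
With `validation_split=0.2`, Keras holds out the last 20% of the training arrays (taken before any shuffling) for validation. The sketch below works out the resulting sample counts, assuming the standard 60,000-image MNIST training set and the Keras default batch size of 32, since `fit` is called without `batch_size`.

```python
import math

n_samples = 60000       # number of images in the MNIST training set
validation_split = 0.2  # same value as passed to fit() above
batch_size = 32         # Keras default when batch_size is not given

n_val = int(n_samples * validation_split)  # held out for validation
n_train = n_samples - n_val                # used for gradient updates
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 48000 12000 1500
```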

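(6) Plot the loss curves to check model performance. As the number of epochs grows, the training loss keeps falling while the validation loss gradually rises, which indicates that the model is overfitting. Adding hidden layers can improve model performance: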

plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

Monitoring the training process
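
The point where validation loss stops improving can also be read from the recorded history instead of the plot. The following is a minimal sketch; `best_epoch` is a hypothetical helper, and the loss values are illustrative numbers, not results from the run above.

```python
def best_epoch(val_losses):
    """Return (1-based epoch, loss) at which the validation loss is lowest."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]

# Illustrative values only: the loss falls, then rises again as overfitting sets in.
val_loss = [0.40, 0.31, 0.28, 0.27, 0.29, 0.33]

epoch, loss = best_epoch(val_loss)
print(epoch, loss)  # 4 0.27
```

With a real run, the same helper could be called as `best_epoch(history.history['val_loss'])`.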

(7) To better understand the results, define two utility functions that visualize a handwritten digit and the outputs of the model's 10 neurons:

predictions = model.predict(test_data)

def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("Pred {} Conf: {:2.0f}% True ({})".format(predicted_label,
                                    100*np.max(predictions_array),
                                    true_label),
                                    color=color)

def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

(8) Plot the prediction:

i = 56
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_data)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  test_labels)
plt.show()

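The left panel shows the handwritten digit, with the predicted label, the prediction confidence, and the true label underneath. The right panel shows the outputs of the 10 neurons (the sigmoid outputs); the neuron for digit 4 has the highest value:

Prediction result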
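
A single plotted example says little about overall quality. The following is a minimal sketch of how accuracy could be computed from an array of per-class scores by taking the argmax of each row; `accuracy_from_probs` is a hypothetical helper, and the small arrays are made-up stand-ins for `predictions` and `test_labels`.

```python
import numpy as np

def accuracy_from_probs(probs, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == labels))

# Made-up scores for 4 samples and 3 classes (illustration only).
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
labels = np.array([1, 0, 1, 1])

print(accuracy_from_probs(probs, labels))  # 0.75
```

Applied to the model above, the call would be `accuracy_from_probs(predictions, test_labels)`.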

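(9) To stay true to logistic regression, the code above used a single Dense layer with a sigmoid activation. For better performance, add a Dense hidden layer and use softmax as the final activation; the following model reaches 97% accuracy on the validation set: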

better_model = K.Sequential([
                      Flatten(input_shape=(28, 28)),
                      Dense(128,  activation='relu'),
                      #Dense(64,  activation='relu'),
                      Dense(10, activation='softmax')
])
better_model.summary()

better_model.compile(optimizer='adam',
                     # The final layer already applies softmax, so the loss receives probabilities
                     loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                     metrics=['accuracy'])

history = better_model.fit(x=train_data,y=train_labels, epochs=10, verbose=1, validation_split=0.2)

plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

predictions = better_model.predict(test_data)
i = 0
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_data)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  test_labels)
plt.show()
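
The switch from sigmoid to softmax in step (9) changes how the 10 outputs relate to each other. The following NumPy sketch applies both functions to the same hypothetical pre-activation values to show the difference.

```python
import numpy as np

# Hypothetical pre-activation values (logits) for three classes.
logits = np.array([2.0, 1.0, 0.1])

# Sigmoid treats every output independently; the values need not sum to 1.
sigmoid_out = 1.0 / (1.0 + np.exp(-logits))

# Softmax normalizes across classes; the values always sum to 1.
exp_shifted = np.exp(logits - logits.max())  # subtract the max for numerical stability
softmax_out = exp_shifted / exp_shifted.sum()

print(sigmoid_out.sum(), softmax_out.sum())
print(np.argmax(sigmoid_out), np.argmax(softmax_out))  # same predicted class
```

Both functions are increasing, so they pick the same class here; only softmax yields a distribution over the classes, which is why it is the usual choice for the output layer of a multiclass model.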