Explanation of the Code:
This Python code demonstrates how to load and preprocess the CIFAR-10 dataset for training, validation, and testing with PyTorch. CIFAR-10 is a standard image-classification benchmark consisting of 60,000 32x32 color images in 10 classes (50,000 for training and 10,000 for testing). The code splits the training data into training and validation sets, applies transformations, and loads batches using the DataLoader class.
Step-by-Step Breakdown:
- Image Preprocessing (Transforms):
  - This part of the code defines the preprocessing steps applied to the CIFAR-10 images before they are passed through the neural network (sketched below).
  - Resize(64): Each image is resized from its native 32x32 to 64x64 pixels.
  - ToTensor(): The images are converted to PyTorch tensors (required for model input) with pixel values scaled to [0, 1].
  - Normalize(): Each channel is normalized using mean and standard deviation values computed from the CIFAR-10 dataset.
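A minimal sketch of the transform pipeline; the exact normalization statistics aren't shown in the post, so the commonly used CIFAR-10 per-channel means and standard deviations are assumed here:

```python
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize(64),   # upscale each 32x32 image to 64x64
    transforms.ToTensor(),   # PIL image -> float tensor in [0, 1], shape [C, H, W]
    # Assumed values: widely used CIFAR-10 per-channel mean/std
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])
```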
- Loading CIFAR-10 Training Data:
  - This code block loads the CIFAR-10 training data from the specified path (c:/Users/zeah/data/cifar-10-batches-py). The train=True flag selects the 50,000-image training portion of the dataset, and the transform=transform argument applies the previously defined transformations (sketch below).
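A sketch of the loading step; download=True is an assumption (not stated in the post) so the dataset is fetched if it isn't already on disk, and note that torchvision's root argument points at the folder that contains cifar-10-batches-py:

```python
import torchvision

train_data = torchvision.datasets.CIFAR10(
    root='c:/Users/zeah/data',  # parent folder of cifar-10-batches-py
    train=True,                 # load the 50,000-image training split
    transform=transform,
    download=True,              # assumed: fetch the data if missing
)
```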
- Splitting Training Data into Training and Validation Sets:
  - The training data is split into two parts: 90% for training and 10% for validation. Reserving a validation set lets the model's performance be monitored during training on data it never trains on, which helps detect overfitting.
  - random_split() divides the dataset randomly, as sketched below.
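A minimal sketch of the split; the fixed generator seed is an assumption added so the split is reproducible:

```python
import torch
from torch.utils.data import random_split

n_train = int(0.9 * len(train_data))   # 45,000 images
n_val = len(train_data) - n_train      # 5,000 images
train_set, val_set = random_split(
    train_data,
    [n_train, n_val],
    generator=torch.Generator().manual_seed(42),  # assumed seed for reproducibility
)
```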
- Loading CIFAR-10 Testing Data:
  - The CIFAR-10 testing data is loaded in this block (sketch below). Setting train=False loads the test set, which consists of 10,000 images. The transformations defined earlier are applied here as well.
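The corresponding sketch, under the same assumptions as the training loader:

```python
test_data = torchvision.datasets.CIFAR10(
    root='c:/Users/zeah/data',  # parent folder of cifar-10-batches-py
    train=False,                # load the 10,000-image test split
    transform=transform,
    download=True,              # assumed: fetch the data if missing
)
```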
- Creating DataLoaders for Training, Validation, and Testing:
  - DataLoader batches the data that will be fed into the model during training. The batch_size=32 argument specifies that 32 images are processed in each iteration.
  - shuffle=True is applied to the training loader so that images are fed to the model in a random order each epoch, which helps the model generalize. The validation and test loaders do not need shuffling. A sketch follows.
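A minimal sketch of the three loaders (shuffle=False for validation and test is an assumption, but it is the usual choice):

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)   # reshuffled every epoch
val_loader = DataLoader(val_set, batch_size=32, shuffle=False)      # assumed: no shuffling needed
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)   # assumed: no shuffling needed
```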
- Inspecting Data Batches:
  - These lines grab the first batch of images and their corresponding labels from the training, validation, and test sets, which is useful for inspecting data shapes and verifying that the loaders work as expected (sketch below).
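A sketch of the inspection step; the variable names are assumptions:

```python
# next(iter(loader)) pulls a single batch without writing a full loop
train_images, train_labels = next(iter(train_loader))
val_images, val_labels = next(iter(val_loader))
test_images, test_labels = next(iter(test_loader))
```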
- Printing the Shape of the Data Batches:
  - Finally, the shapes of the batches are printed to confirm that the data is loaded and prepared correctly. Since the batch size is 32 and each image has three color channels (RGB) at 64x64 pixels, the expected shape of an image batch is [32, 3, 64, 64], as shown below.
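Continuing the sketch above:

```python
print(train_images.shape)  # torch.Size([32, 3, 64, 64])
print(val_images.shape)    # torch.Size([32, 3, 64, 64])
print(test_images.shape)   # torch.Size([32, 3, 64, 64])
print(train_labels.shape)  # torch.Size([32]) -- one class index per image
```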