Data Preprocessing & Augmentation

Jiwon Kim

2023. 10. 22. 16:28

[ Zero-Centering & Normalization ]

* why zero-centering?

- recall how the gradient updates were oriented inefficiently (zig-zag) when all the inputs were positive or all negative

- classification becomes less sensitive to small changes in weights

* zero-centering : subtract the global mean from the data

* normalizing : divide the data by its standard deviation
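
The two steps above can be sketched in NumPy as follows. This is a minimal illustration, assuming the data is an (N, D) array and that the mean and standard deviation are taken per feature over the training set (the notes only say "global mean"); the function name and `eps` are made up for this example.

```python
import numpy as np

def zero_center_and_normalize(X, eps=1e-8):
    """X: (N, D) array, one example per row (assumed layout)."""
    mean = X.mean(axis=0)               # per-feature mean over the training set
    std = X.std(axis=0)                 # per-feature standard deviation
    X_centered = X - mean               # zero-centering
    X_normalized = X_centered / (std + eps)  # eps avoids division by zero
    return X_normalized, mean, std

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 255.0, size=(100, 12))  # all-positive toy inputs
X_norm, mean, std = zero_center_and_normalize(X_train)
```

The same `mean` and `std` computed on the training set would normally be reused for validation and test data, rather than recomputed.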

[ PCA & Whitening ]

* PCA : data becomes zero-centered AND axis-aligned (decorrelated : the covariance matrix becomes diagonal)

* Whitening : covariance matrix becomes the identity matrix (each axis has the same importance)
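
A rough NumPy sketch of both operations, assuming an (N, D) data matrix. The function name, the toy data, and the small `eps` added for numerical stability are assumptions for this example, not part of the notes.

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    X = X - X.mean(axis=0)               # zero-center first
    cov = X.T @ X / X.shape[0]           # (D, D) covariance matrix
    U, S, _ = np.linalg.svd(cov)         # columns of U: eigenvectors, S: eigenvalues
    X_rot = X @ U                        # PCA: rotate onto the eigenbasis
    X_white = X_rot / np.sqrt(S + eps)   # whitening: rescale each axis to unit variance
    return X_rot, X_white

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.5]])          # mixing matrix that correlates the features
X = rng.normal(size=(500, 3)) @ A
X_rot, X_white = pca_whiten(X)

cov_rot = X_rot.T @ X_rot / X_rot.shape[0]          # should be (nearly) diagonal
cov_white = X_white.T @ X_white / X_white.shape[0]  # should be close to identity
```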

[ Data Augmentation ]

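Data Augmentation : slightly transform the original cat photo : no semantic difference, but a BIG difference at the pixel level

The data we have is only a tiny fraction of real-world data, so we apply small transformations to show the model as many varied examples as possible. The goal is for the model to be invariant to such changes when it classifies real data.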

(1) Horizontal Flips

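- even after a left-right flip, it is still the same cat to the human eye

- train on both versions so that our classifier, like a human, classifies both the original and the flipped photo as 'cat'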
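
As a small illustration, a random left-right flip on an (H, W, C) NumPy array could look like this. The function name and the probability parameter `p` are chosen for this sketch only.

```python
import numpy as np

def random_horizontal_flip(image, rng, p=0.5):
    """image: (H, W, C) array. Flip left-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1, :]   # reverse the width axis
    return image

image = np.arange(2 * 3 * 1).reshape(2, 3, 1)   # tiny 2x3 single-channel "image"
flipped = random_horizontal_flip(image, np.random.default_rng(0), p=1.0)
unchanged = random_horizontal_flip(image, np.random.default_rng(0), p=0.0)
```

The label stays the same for both versions; only the pixels change.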

(2) Random Crops and Scales

- with random crops, we want the classifier to recognize 'cat' even when only part of the cat is visible

- translation invariance : two images shifted by just a few pixels should be recognized as the same, even though their pixel values are completely different

# ResNet

Training : sample random crops/scales

(1) Pick random L in range [256, 480]

(2) Resize the image so that the shorter side = L

(3) Sample random 224*224 patch from the resized image (the larger L is, the more zoomed-in the sampled patch)
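
The three training steps above can be sketched as pure geometry, without touching pixels. This is only an illustration using the standard library: the function name and return format are assumptions, and the resize and crop themselves would be done by an image library (for example Pillow's `Image.resize` and `Image.crop`, which takes a `(left, upper, right, lower)` box).

```python
import random

def sample_train_crop(width, height, crop=224, l_min=256, l_max=480, rng=random):
    short_side = rng.randint(l_min, l_max)      # (1) random L, inclusive range
    scale = short_side / min(width, height)     # (2) shorter side becomes L
    new_w = max(crop, round(width * scale))
    new_h = max(crop, round(height * scale))
    left = rng.randint(0, new_w - crop)         # (3) random 224x224 patch
    top = rng.randint(0, new_h - crop)
    return (new_w, new_h), (left, top, left + crop, top + crop)

# Example: a 640x480 photo, seeded for reproducibility
new_size, box = sample_train_crop(640, 480, rng=random.Random(0))
```

Because the patch size is fixed, a larger L means the patch covers a smaller fraction of the resized image.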

Testing : average a fixed set of crops

(1) Resize image at 5 scales : {224, 256, 384, 480, 640}

(2) For each size, use ten 224*224 crops : 4 corners + center, and their flips

(3) Use their average or max value for final classification
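
A minimal sketch of the test-time procedure, again as geometry plus score averaging. The helper names are hypothetical, the model itself is left out, and the ten crops are assumed to be the five boxes below taken from both the resized image and its horizontal flip.

```python
SCALES = [224, 256, 384, 480, 640]   # resize targets listed above

def five_crop_boxes(width, height, crop=224):
    """Boxes (left, top, right, bottom) for the 4 corners and the center."""
    right, bottom = width - crop, height - crop
    cx, cy = right // 2, bottom // 2
    origins = [(0, 0), (right, 0), (0, bottom), (right, bottom), (cx, cy)]
    return [(l, t, l + crop, t + crop) for l, t in origins]

def average_scores(score_lists):
    """Average per-class scores over all views (one score list per view)."""
    n = len(score_lists)
    return [sum(col) / n for col in zip(*score_lists)]

boxes = five_crop_boxes(256, 256)
views_per_image = len(SCALES) * len(boxes) * 2   # 5 scales x 5 boxes x {original, flipped}
final = average_scores([[0.2, 0.8], [0.4, 0.6]])  # toy scores from two views
```

Taking the element-wise max instead of the mean would be the other option mentioned in step (3).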

(3) Color Jitter

- differences in color tone

- the colors of an image vary with Hue, Saturation, and Lightness

- the classifier must be trained so that a change in HSL does not lead to a different label

- convert from RGB to HSL, add noise to one component, and then convert back to RGB
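
For a single pixel, the convert / perturb / convert-back recipe can be sketched with Python's standard `colorsys` module. The function name and `max_shift` are assumptions for this example, and only the hue component is perturbed here; a real pipeline would work on whole images at once.

```python
import colorsys
import random

def jitter_hue(pixel, max_shift=0.05, rng=random):
    """pixel: (r, g, b) floats in [0, 1]. Returns an (r, g, b) tuple."""
    r, g, b = pixel
    h, l, s = colorsys.rgb_to_hls(r, g, b)    # note: colorsys orders it H, L, S
    h = (h + rng.uniform(-max_shift, max_shift)) % 1.0   # hue wraps around
    return colorsys.hls_to_rgb(h, l, s)       # back to RGB

orange = (0.8, 0.4, 0.2)
jittered = jitter_hue(orange, rng=random.Random(0))
```

Lightness and saturation are untouched, so the result is the same pixel with a slightly different tint.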

(4) Others : varies with the data domain

https://pytorch.org/vision/main/transforms.html

Source : cs231n
