[Machine Learning Coursera Course] (Week 2) "Vector Form of Gradient Descent" Machine Learning (by Andrew Ng)

Marvin (마빈) 2022. 5. 29. 05:15

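I am writing up the Octave tutorial portion of Professor Andrew Ng's Coursera Machine Learning course. Today's post covers vectorization, a concept that carries over to other programming languages, together with the corresponding Octave code. I plan to provide Python code in the next post.

Contents of the Octave series: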

(1) Basic Operations

(2) Moving Data Around

(3) Computing Data

(4) Plotting Data

(5) Control Statements: for, while, if statement
[Link to the previous Octave posts, (1)-(5)]

(6) Vectorization (this post)

(6) Vectorization

Expressing a computation through (numerical) linear algebra library routines instead of explicit loops can make it run faster.

$$h_{\theta} (x) = \sum_{j=0}^n \theta_j x_j $$

$$=\theta^T x $$

where $\theta$ and $x$ are vectors.

Unvectorized implementation.

octave:3> prediction = 0.0;

octave:4> n = length(x)

n = 3

octave:5> for j = 1:n,

> prediction = prediction + theta(j)*x(j)

> end;

prediction = 2

prediction = 14

prediction = 44

Vectorized implementation.

octave:6> prediction = theta'*x;

octave:7> prediction

prediction = 44

The two results are identical. The vectorized version is both shorter and faster.
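The session above does not show how `theta` and `x` were defined. As a minimal sketch, the script below assumes `theta = [1; 3; 5]` and `x = [2; 4; 6]`, values chosen only because they reproduce the running sums 2, 14, 44 printed above; the original vectors may have been different.

```octave
% Assumed values (not shown in the original session); they reproduce
% the running sums 2, 14, 44 printed by the loop above.
theta = [1; 3; 5];
x     = [2; 4; 6];

% Unvectorized: accumulate theta(j) * x(j) one term at a time.
prediction_loop = 0.0;
for j = 1:length(x)
  prediction_loop = prediction_loop + theta(j) * x(j);
end

% Vectorized: a single inner product theta' * x.
prediction_vec = theta' * x;

printf("loop: %d, vectorized: %d\n", prediction_loop, prediction_vec);
```

Both variables end up holding 44. The speed advantage of the vectorized form only becomes noticeable for much longer vectors than this one.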

Gradient descent

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i) - y_i ) x_i^j$$

For $n=2$, where $x_i^j$ denotes feature $j$ of example $i$, the updates

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i) - y_i ) x_i^0$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i) - y_i ) x_i^1$

$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i) - y_i ) x_i^2$

can be written in vector form.

$$\theta := \theta - \alpha \delta$$

where $\delta = \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i) - y_i ) x_i$

with $\theta \in \mathbb{R}^{n+1}$, $\alpha \in \mathbb{R}$ (the learning rate is a scalar), and $\delta \in \mathbb{R}^{n+1}$.

Let us focus on $\delta$.

$\delta_0 = \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i) - y_i ) x_i^0$.

Here $(h_\theta (x_i) - y_i ) \in \mathbb{R}$ and $x_i \in \mathbb{R}^{n+1}$.

Written out, $\delta = \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i)-y_i) x_i = \frac{1}{m} \left[ (h_\theta (x_1) - y_1) x_1 + (h_\theta (x_2) - y_2) x_2 + ... + (h_\theta (x_m) - y_m ) x_m \right]$.

Each term $(h_\theta (x_i) - y_i) x_i$ has the form scalar $\times$ vector.

Let us express $\delta$ as a matrix product.

$\delta = \frac{1}{m} \sum_{i=1}^m (h_\theta (x_i)-y_i) x_i = \frac{1}{m} \left[ (h_\theta (x_1) - y_1) x_1 + (h_\theta (x_2) - y_2) x_2 + ... + (h_\theta (x_m) - y_m ) x_m \right]$

$= \frac{1}{m} \left[ (h_\theta (x_1) - y_1) \begin{bmatrix} x_1^0 \\ x_1^1 \\ x_1^2 \end{bmatrix} + (h_\theta (x_2) - y_2) \begin{bmatrix} x_2^0 \\ x_2^1 \\ x_2^2 \end{bmatrix} + ... + (h_\theta (x_m) - y_m) \begin{bmatrix} x_m^0 \\ x_m^1 \\ x_m^2 \end{bmatrix} \right]$

$= \frac{1}{m} \begin{bmatrix} x_1^0 & x_2^0 & ... & x_m^0 \\ x_1^1 & x_2^1 & ... & x_m^1 \\ x_1^2 & x_2^2 & ... & x_m^2 \end{bmatrix} \begin{bmatrix} (h_\theta (x_1) - y_1) \\ (h_\theta (x_2) - y_2) \\ ... \\ (h_\theta (x_m) - y_m) \end{bmatrix}$

$= \frac{1}{m} x^T (h_\theta (x) - y) $

where $y = \begin{bmatrix} y_1 \\ y_2 \\ ... \\ y_m \end{bmatrix}$, $x$ in the last line is the $m \times (n+1)$ matrix whose $i$-th row is $x_i^T$, and $h_\theta (x)$ is the vector of the $m$ predictions $h_\theta (x_i)$.
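To check the derivation numerically, the sketch below computes $\delta$ twice: once by summing scalar $\times$ vector terms in a loop, and once as a single matrix product, keeping the $\frac{1}{m}$ factor from the definition of $\delta$. The data `X`, `y`, and `theta` are made-up toy values, and the first column of `X` is assumed to be the intercept feature $x^0 = 1$.

```octave
% Hypothetical toy data: m = 4 examples, n = 2 features plus an intercept column.
X = [1 1 2;
     1 2 1;
     1 3 4;
     1 4 3];            % m x (n+1); row i is x_i'
y = [3; 4; 8; 9];       % m x 1
theta = [0; 1; 1];      % (n+1) x 1
m = length(y);

% Loop form: add (h(x_i) - y_i) * x_i for each example, then divide by m.
delta_loop = zeros(size(theta));
for i = 1:m
  xi = X(i, :)';                                   % x_i as a column vector
  delta_loop = delta_loop + (theta' * xi - y(i)) * xi;
end
delta_loop = delta_loop / m;

% Matrix form: delta = (1/m) * X' * (X*theta - y).
delta_vec = (1 / m) * X' * (X * theta - y);

disp(max(abs(delta_loop - delta_vec)));            % largest difference between the two
```

The two vectors should agree up to floating-point rounding, which is what the final `disp` reports.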

Wrapping up

[ ] To do: applying the matrix form above to the gradient descent algorithm should make the computation faster.

- Link to the post that computes the cost function and the gradient descent algorithm using for loops
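As a rough sketch of the to-do item above, a gradient descent loop built on the matrix form might look like the following. The data, the learning rate `alpha`, and the iteration count `num_iters` are all assumptions chosen for illustration; they do not come from the course material.

```octave
% Hypothetical noise-free data generated from y = 1 + 2*x1,
% so theta should approach [1; 2].
x1 = (0:0.5:4)';
X  = [ones(length(x1), 1), x1];   % intercept column plus one feature
y  = 1 + 2 * x1;
m  = length(y);

alpha     = 0.1;     % assumed learning rate
num_iters = 2000;    % assumed number of iterations
theta     = zeros(2, 1);

for iter = 1:num_iters
  delta = (1 / m) * X' * (X * theta - y);   % vectorized gradient
  theta = theta - alpha * delta;            % updates every theta_j at once
end

disp(theta');
```

Because the whole vector `theta` is updated in one statement, the simultaneous update of all $\theta_j$ comes for free, with no temporary variables needed.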

Appendix

NOTE: on how the vectors (the columns of $x^T$) and the scalars ($h_\theta (x_i) - y_i$) combine in the derivation of $x^T ( h_\theta (x) - y)$:

$\begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} $

$=\begin{bmatrix} 1 a + 3b + 5c \\ 2 a + 4 b + 6 c \end{bmatrix} $

$=\begin{bmatrix} 1 \\ 2 \end{bmatrix} a + \begin{bmatrix} 3 \\ 4 \end{bmatrix} b + \begin{bmatrix} 5 \\ 6 \end{bmatrix} c$
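The same identity can be checked in Octave. In this small sketch, the numbers in `v` are arbitrary stand-ins for $a$, $b$, $c$.

```octave
A = [1 3 5;
     2 4 6];
v = [7; 8; 9];     % hypothetical values standing in for a, b, c

% Matrix-vector product.
product = A * v;

% The same result as a linear combination of the columns of A.
combination = A(:, 1) * v(1) + A(:, 2) * v(2) + A(:, 3) * v(3);

disp([product, combination]);   % the two columns are identical
```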
