Multinomial Logistic Regression

mojo's Blog

Machine Learning

_mojo_ 2023. 3. 29. 17:41

※ Multinomial Logistic Regression

It is a classification method that generalizes logistic regression to multiclass problems, i.e. problems with more than two possible discrete outcomes.

-> Also called softmax regression and multinomial logit

Example

- Which major will a student choose, given the student's status?

- Which blood type does a person have, given the results of various diagnostic tests?

Classifying samples into one of three or more classes

- Binary classification: Classifying samples into one of two classes

How can multiple classes be separated with decision boundaries?

When there are \(k\) classes, at least \(k-1\) classifiers are needed.

In the figure above there are three classes, so the data are separated by two decision boundaries.

Considering a linear decision boundary for each class

- Use \(k \) classifiers for \(k \) classes.

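The figure above shows the data separated by three classifiers.

Roughly, the classification proceeds as follows:

(1) Determine a decision boundary between red and all other colors.

(2) Determine a decision boundary between blue and all other colors.

(3) Determine a decision boundary between green and all other colors.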
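The three steps above amount to what is commonly called a one-vs-rest scheme. The sketch below is a minimal NumPy illustration, not the author's code: the weight matrix `W` holds hypothetical hand-picked values rather than trained ones, and settling points claimed by several classifiers (or by none) through the largest score is our assumption.

```python
import numpy as np

# Hypothetical weights for three one-vs-rest linear classifiers in 2-D.
# Each row is (w_1, w_2, bias); classifier j says "class j" when its score > 0.
W = np.array([
    [ 1.0,  0.0, -1.0],   # class 0 ("red") vs. the rest
    [-1.0,  0.0, -1.0],   # class 1 ("blue") vs. the rest
    [ 0.0,  1.0, -1.0],   # class 2 ("green") vs. the rest
])

def ovr_scores(X):
    """Return an (n, 3) array of scores w_j^T x + b_j for every sample."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append 1 for the bias
    return X_aug @ W.T

def ovr_predict(X):
    """Assign each sample to the classifier with the largest score."""
    return np.argmax(ovr_scores(X), axis=1)

X = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
print(ovr_scores(X) > 0)   # which classifiers claim each point
print(ovr_predict(X))      # [0 1 2]
```

Taking the argmax of the scores gives every point exactly one label, even in regions where the individual "class j vs. rest" boundaries disagree.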

※ Example: Image Classification

CIFAR-10

- 10 labels

- 50,000 training images

- 10,000 testing images

- Each image is 32x32x3

Given an input \(x\in \mathbb{R} ^{3072\times 1}\), \(f(x; W, b)\) returns \(y\in \mathbb{R} ^{10\times 1}\)

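In the end, the function \(f\) to be learned is written as above.

Here, \(W\) and \(b\) are the parameters to be learned.

Let us rewrite the expression above in matrix form.

The vector \(x\) is the input vector obtained by reshaping the 32x32x3 image into a 3072x1 vector.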

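The parameters to be learned are \(W \in \mathbb{R}^{10\times 3072}\) and \(b \in \mathbb{R}^{10\times 1}\), chosen so that \(f\) gives the highest score to the correct class.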

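The bias \(b\) can be folded into \(W\) so that \(f\) is written with \(W\) and \(x\) alone: append \(b\) to \(W\) as an extra column and append a constant 1 to \(x\).

\( Wx + b = \begin{bmatrix} W & b \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix} \)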

Written this way, all learnable parameters sit in a single matrix, which makes the model, and what it learns from the data, simpler to express.
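A minimal NumPy sketch of the CIFAR-10 score function and the bias trick. The image and the parameters are random stand-ins (assumptions), so the scores themselves mean nothing; the point is the shapes, and the fact that \([W\ b]\) applied to \([x; 1]\) reproduces \(Wx + b\).

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes, dim = 10, 32 * 32 * 3             # CIFAR-10: 10 labels, 3072 values per image

image = rng.random((32, 32, 3))                 # stand-in for one 32x32x3 image
x = image.reshape(dim, 1)                       # flatten to a 3072x1 column vector

W = rng.normal(scale=0.01, size=(num_classes, dim))   # 10x3072 weights (random stand-in)
b = rng.normal(size=(num_classes, 1))                 # 10x1 bias (random stand-in)

scores = W @ x + b                              # f(x; W, b): one score per class, 10x1

# Bias trick: append b as an extra column of W and a constant 1 to x.
W_aug = np.hstack([W, b])                       # 10x3073
x_aug = np.vstack([x, np.ones((1, 1))])         # 3073x1
scores_aug = W_aug @ x_aug

print(scores.shape)                             # (10, 1)
print(np.allclose(scores, scores_aug))          # True
```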

Formulating Multinomial Logistic Regression

※ Computing the Logit

The logit of class \(j\) is computed as a linear function of \(x\):

\( logit = ln \frac{P(y=j|x, W)}{1-P(y=j|x, W)} = w_{j}^{T}x \)

\( \frac{P(y=j|x, W)}{1-P(y=j|x, W)} = e^{w_{j}^{T}x} \)

The value expressing whether \(x\) belongs to the \(j\)-th class is the logarithm of its odds, called the logit.

The same expression applies not only to class \(j\) but to every class.

※ What is the Softmax Function?

To represent a probability, the odds are normalized.

\( P(y=j|x, W)= \frac{e^{w_{j}^{T}x}}{\sum_{i=1}^{k}e^{w_{i}^{T}x}} \)

\(\begin{bmatrix} P(y=1|x,W)\\ P(y=2|x,W)\\ \vdots \\ P(y=k|x,W)\\ \end{bmatrix} = \frac{1}{\sum_{i=1}^{k} e^{w_{i}^{T}x} } \begin{bmatrix} e^{w_{1}^{T}x}\\ e^{w_{2}^{T}x}\\ \vdots \\ e^{w_{k}^{T}x}\\ \end{bmatrix}\)

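This function is called the softmax function.

For a given \(x\) and the weights \(W\) being learned, it expresses how likely \(x\) is to belong to the \(j\)-th class rather than to any of the other classes.

The sum in the denominator is a normalization term: it makes the probabilities of all classes add up to 1.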
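The softmax formula can be sketched in NumPy as follows. The function name `softmax` and the example scores are hypothetical. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that the text does not mention; it leaves the result unchanged because the common factor \(e^{-\max}\) cancels between numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Map a vector of scores w_j^T x to class probabilities.

    Subtracting max(z) does not change the result (the factor e^{-max}
    cancels) but keeps exp() from overflowing for large scores.
    """
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical w_j^T x for k = 3 classes
p = softmax(scores)
print(p.round(3))                        # [0.659 0.242 0.099]
print(p.sum())                           # 1 up to floating-point rounding

print(softmax([1000.0, 1000.0]))         # [0.5 0.5], no overflow
```

Without the shift, `np.exp(1000.0)` overflows to `inf` and the division yields `nan`.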

We want to maximize the probability for the correct class.

Dividing each unnormalized value by the sum of all of them turns the scores into probabilities; this normalization step guarantees that the probabilities sum to 1.

※ Formulating the Error Function

Generalizing the error function of binary classification

\( E(w) = -\sum_{i=1}^{n} \left ( y^{(i)}ln(P(y^{(i)}=1|x^{(i)},w)) + (1-y^{(i)})ln(1 - P(y^{(i)}=1|x^{(i)},w)) \right ) \)

The binary classification error above is generalized as follows.

\( E(w) = -\sum_{i=1}^{n}\sum_{j=1}^{k} \Pi \left [ y^{(i)} = j \right ] ln (P(y^{(i)} = j | x^{(i)}, w)) \)

\(\Pi \left [ y^{(i)} = j \right ] = \begin{cases} 1 & \text{if } y^{(i)}=j \\ 0 & \text{otherwise} \end{cases}\) (indicator function)

\( P(y^{(i)} = j | x^{(i)}, w) = \frac{e^{w_{j}^{T}x^{(i)}}}{\sum_{l=1}^{k}e^{w_{l}^{T}x^{(i)}}} \)

The expression above can be understood as the cross-entropy function from logistic regression, generalized to \(k\) labels.
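One way to check that this generalizes binary cross-entropy: with \(k = 2\), fix the second class's weight vector to zero (softmax is unchanged when the same vector is added to every \(w_j\), so this loses nothing). Then the softmax probability of the first class is exactly \(\frac{1}{1+e^{-w^{T}x}}\), and the two error functions coincide. The sketch below verifies this numerically; the random data and the weight vector `w` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)           # binary labels in {0, 1}
w = rng.normal(size=d)                   # hypothetical binary weight vector

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Binary cross-entropy, as in the binary error function above.
p1 = sigmoid(X @ w)
E_binary = -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

# Two-class softmax: column 0 = "y = 1" with score w^T x, column 1 = "y = 0" with score 0.
W2 = np.stack([w, np.zeros(d)])          # shape (2, d)
scores = X @ W2.T                         # shape (n, 2)
P = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
labels = np.where(y == 1, 0, 1)           # map each binary label to a class index
indicator = np.eye(2)[labels]             # Pi[y^(i) = j] as one-hot rows
E_multi = -np.sum(indicator * np.log(P))

print(np.allclose(P[:, 0], p1))          # True
print(np.isclose(E_binary, E_multi))     # True
```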

In summary, the error function is defined through the softmax function, and the indicator function keeps only the term for each sample's correct class.
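A minimal NumPy sketch of \(E(w)\) with the indicator written as a one-hot matrix. The names `softmax_rows`, `error`, and `error_picked` are hypothetical; classes are numbered \(0, \dots, k-1\) as is usual in Python, and the weights are stored as a \(k \times d\) matrix whose \(j\)-th row is \(w_j\).

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise softmax with the max-subtraction trick for stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def error(W, X, y):
    """E(W) = -sum_i sum_j Pi[y_i = j] ln P(y_i = j | x_i, W).

    W: (k, d) weights, one row w_j per class
    X: (n, d) samples, one row x^(i) per sample
    y: (n,) integer labels in {0, ..., k-1}
    """
    P = softmax_rows(X @ W.T)                  # (n, k) class probabilities
    n, k = P.shape
    indicator = np.zeros((n, k))
    indicator[np.arange(n), y] = 1.0           # Pi[y^(i) = j]
    return -np.sum(indicator * np.log(P))

def error_picked(W, X, y):
    """Same value: sum -ln P only for each sample's correct class."""
    P = softmax_rows(X @ W.T)
    return -np.sum(np.log(P[np.arange(len(y)), y]))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
y = np.array([0, 2, 1, 2, 0])
W = rng.normal(size=(3, 4))
print(np.isclose(error(W, X, y), error_picked(W, X, y)))   # True
```

Because the indicator leaves one term per sample, `error_picked` computes the same value by indexing the correct-class probability directly; that form also avoids evaluating \(0 \cdot \ln 0\) if some probability underflows to zero. With all-zero weights every class gets probability \(1/k\), so \(E = n \ln k\).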

Training Multinomial Logistic Regression

※ Training Logistic Regression

Simple concept: follow the gradient downhill

Process

(1) Pick a start position: \(w^{0} = (w_{0}, ..., w_{d}) \)

(2) Determine the descent direction: \(\Delta w = \nabla E(w^{t}) \)

(3) Choose a learning rate: \(\eta\)

(4) Update your position: \(w^{t + 1} = w^{t} - \eta \Delta w\)

(5) Repeat from step (2) until the stopping criterion is satisfied.

As before, the problem is solved by applying gradient descent.
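Steps (1) to (5) can be sketched for binary logistic regression, using the binary error function \(E(w)\) above and its gradient \(\nabla E(w) = \sum_i (h(x^{(i)}) - y^{(i)})x^{(i)}\) with \(h(x) = \frac{1}{1+e^{-w^{T}x}}\). This is a minimal full-batch version under assumptions of our own: the toy data, the zero starting point, the learning rate, and the gradient-norm stopping rule are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(w, X, y):
    """Binary cross-entropy E(w), summed over all samples."""
    p = sigmoid(X @ w)
    eps = 1e-12                                    # guards against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def gradient(w, X, y):
    """grad E(w) = sum_i (sigmoid(w^T x_i) - y_i) x_i."""
    return X.T @ (sigmoid(X @ w) - y)

def gradient_descent(X, y, eta=0.01, tol=1e-4, max_iter=1000):
    w = np.zeros(X.shape[1])                       # (1) start position w^0
    for _ in range(max_iter):
        delta_w = gradient(w, X, y)                # (2) descent direction
        if np.linalg.norm(delta_w) < tol:          # (5) stopping criterion
            break
        w = w - eta * delta_w                      # (3)+(4) update with learning rate eta
    return w

# Hypothetical toy data: two features plus a constant-1 column for the bias.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(100, 2)), np.ones((100, 1))])
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w_hat = gradient_descent(X, y)
acc = np.mean((sigmoid(X @ w_hat) > 0.5) == y)
print(error(np.zeros(3), X, y), error(w_hat, X, y), acc)
```

The gradient is summed over all samples, as in the text, so the learning rate has to be small enough for the dataset size; averaging instead of summing is a common alternative that makes `eta` independent of \(n\).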

Randomly choose an initial solution \(w^{0}\),

Repeat

Choose a random sample set \( B \subseteq D \).

\( \Delta w = \sum_{ (x^{(i)}, y^{(i)})\in B } (h(x^{(i)}) - y^{(i)})x^{(i)} \), where \( h(x) = \frac{1}{1+e^{-w^{T}x}} \)

\(w^{t + 1} = w^{t} - \eta \Delta w \)

Until stopping condition is satisfied.

This procedure is mini-batch gradient descent.

Softmax regression, which generalizes logistic regression, follows the same overall procedure; only the computation of \( \Delta w \) changes.
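The mini-batch loop above can be sketched as follows for binary logistic regression. As an assumption, the random subset \(B\) is drawn by reshuffling \(D\) every epoch and cutting it into consecutive batches (one common way to pick \(B\)), and the stopping condition is a fixed number of epochs; the function name `minibatch_gd` and the hyperparameter values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gd(X, y, eta=0.05, batch_size=16, epochs=50, seed=0):
    """Mini-batch gradient descent for binary logistic regression, h(x) = sigmoid(w^T x)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)             # random initial solution w^0
    for _ in range(epochs):                        # stopping condition: fixed epoch budget
        order = rng.permutation(n)                 # reshuffle D each epoch
        for start in range(0, n, batch_size):
            B = order[start:start + batch_size]    # indices of the random subset B of D
            delta_w = X[B].T @ (sigmoid(X[B] @ w) - y[B])
            w = w - eta * delta_w                  # w^{t+1} = w^t - eta * delta_w
    return w

# Hypothetical toy data: two features plus a constant-1 column for the bias.
rng = np.random.default_rng(42)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = minibatch_gd(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(acc)
```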

※ Solving Softmax Regression by GD

For the error function, compute the partial derivative with respect to \(w\).

\( E(w) = -\sum_{i=1}^{n}\sum_{j=1}^{k} \Pi \left [ y^{(i)} = j \right ] ln (P(y^{(i)} = j | x^{(i)}, w)) \)

Then, apply \(\Delta w = \frac{\partial E}{\partial w}\) to the gradient descent method.

\(w^{t + 1} = w^{t} - \eta \Delta w \)

What we ultimately need is the value of \(\Delta w = \frac{\partial E}{\partial w}\).

Differentiate the error function partially with respect to \(w\) and plug the result into gradient descent.
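The post stops before writing out the derivative. Differentiating \(E\) gives the standard result \(\frac{\partial E}{\partial w_j} = \sum_{i=1}^{n} \left( P(y^{(i)} = j | x^{(i)}, w) - \Pi [ y^{(i)} = j ] \right) x^{(i)}\), which has the same "prediction minus label, times input" form as the binary \(\Delta w\). The NumPy sketch below implements it, checks it against central finite differences, and then runs plain gradient descent on three hypothetical Gaussian clusters; all names, data, and hyperparameters are assumptions for illustration.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)       # stability trick, result unchanged
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def error(W, X, Y):
    """E(W) with one-hot labels Y standing in for the indicator Pi[y_i = j]."""
    P = softmax_rows(X @ W.T)
    return -np.sum(Y * np.log(P))

def gradient(W, X, Y):
    """Row j is dE/dw_j = sum_i (P(y_i = j | x_i, W) - Pi[y_i = j]) x_i."""
    P = softmax_rows(X @ W.T)                  # (n, k)
    return (P - Y).T @ X                       # (k, d)

def train(X, y, k, eta=0.001, iters=500):
    """Full-batch gradient descent: W^{t+1} = W^t - eta * dE/dW."""
    Y = np.eye(k)[y]                           # one-hot indicator matrix
    W = np.zeros((k, X.shape[1]))
    for _ in range(iters):
        W = W - eta * gradient(W, X, Y)
    return W

# Finite-difference check of the analytic gradient on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.integers(0, 4, size=6)
Y = np.eye(4)[y]
W = rng.normal(size=(4, 3))
num = np.zeros_like(W)
h = 1e-6
for j in range(4):
    for m in range(3):
        Wp, Wm = W.copy(), W.copy()
        Wp[j, m] += h
        Wm[j, m] -= h
        num[j, m] = (error(Wp, X, Y) - error(Wm, X, Y)) / (2 * h)
print(np.allclose(num, gradient(W, X, Y), atol=1e-5))   # True

# Training on three hypothetical 2-D clusters, with a constant-1 bias column.
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
labels = np.repeat(np.arange(3), 50)
pts = centers[labels] + rng.normal(scale=0.7, size=(150, 2))
Xb = np.hstack([pts, np.ones((150, 1))])
W_hat = train(Xb, labels, k=3)
acc = np.mean(np.argmax(Xb @ W_hat.T, axis=1) == labels)
print(acc)
```

Only the gradient function differs from the binary sketches; the update loop is the same, which matches the observation that softmax regression changes just the computation of \(\Delta w\).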
