[Python] Building a Decision Tree

Python/Python Practice 2025. 8. 8. 16:38

Decision Tree Definition

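A Decision Tree is one of the most intuitive and widely used supervised learning algorithms. It can be used to classify data and to make predictions. The algorithm asks a series of questions about the features of the data, branching out in the shape of a tree, until it finally arrives at a class label or a numeric prediction.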

Decision Tree Concepts

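- Node : a single question, or branching point

- Root Node : the topmost node, the first question

- Leaf Node : a final node with no children

- Branch : a path taken depending on the answer to a question

- Depth : the number of question levels. If the tree is too deep, there is a risk of overfitting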
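
These terms map onto attributes of a trained scikit-learn tree. The following is a minimal sketch, assuming the iris data and `random_state=42` from the example code in this post; the variable names are only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

n_nodes = model.tree_.node_count   # every node, including leaves
n_leaves = model.get_n_leaves()    # nodes with no children
depth = model.get_depth()          # longest path from the root to a leaf

# Node 0 is the root; its question has the form "feature <= threshold".
root_feature = iris.feature_names[model.tree_.feature[0]]
root_threshold = model.tree_.threshold[0]

print(f"nodes={n_nodes}, leaves={n_leaves}, depth={depth}")
print(f"root question: {root_feature} <= {root_threshold:.2f}")
```

Because every question here has exactly two branches, the node count is always twice the leaf count minus one.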

Decision Tree Procedure

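- When choosing the question that splits the data, an impurity measure such as Gini impurity is used to make the resulting groups as pure as possible. The goal is that the classes in each split group are not mixed.

- There are two types: classification, which predicts a categorical outcome (class), and regression, which predicts a continuous number.

- Pros : easy to understand and can be visualized. Requires almost no preprocessing.

- Cons : prone to overfitting. Even a small change in the data can produce a very different tree, so generalization performance may suffer.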
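
The overfitting drawback is commonly handled by limiting how far the tree may grow. Here is a minimal sketch, assuming the same iris split as the example code in this post; `max_depth=2` is an arbitrary illustrative limit, not a recommended value.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Unrestricted tree: keeps splitting until leaves are pure or cannot be split.
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
# Restricted tree: at most two levels of questions (illustrative value).
shallow = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_train, y_train)

for name, m in [("deep", deep), ("shallow", shallow)]:
    print(
        name,
        "depth:", m.get_depth(),
        "train acc:", round(m.score(X_train, y_train), 3),
        "test acc:", round(m.score(X_test, y_test), 3),
    )
```

A large gap between training and test accuracy is the usual sign of overfitting. On a dataset as small and clean as iris the gap may be slight.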

Decision Tree Example Code

# 1. Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
import pandas as pd
import os

# 2. Load data
iris = load_iris()
X = iris.data          # features (sepal length, sepal width, petal length, petal width)
y = iris.target        # target (0: setosa, 1: versicolor, 2: virginica)

df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
df['target_name'] = df['target'].apply(lambda x: iris.target_names[x])

save_path = "D:/PythonData"
csv_filename = "iris_dataset.csv"
full_path = os.path.join(save_path, csv_filename)
os.makedirs(save_path, exist_ok=True)
df.to_csv(full_path, index=False)

# 3. Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Create and train the model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# 5. Predict and evaluate accuracy
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# 6. Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(model, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()

1. Import libraries

from sklearn.datasets import load_iris

- Loads the iris dataset provided by scikit-learn.

from sklearn.model_selection import train_test_split

- Used to split the data into a training set and a test set.

from sklearn.tree import DecisionTreeClassifier, plot_tree

- DecisionTreeClassifier is the decision tree model, and plot_tree is used to visualize the trained tree.

2. Load data

- The load_iris() function loads the data from the library.

- The data is received separately as iris.data (features) and iris.target (labels).
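
To see what `load_iris()` actually returns, the object can be inspected directly. This is a short sketch using only the attributes already referenced in the post.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                   # (150, 4): 150 flowers, 4 measurements each
print(y.shape)                   # (150,): one class index per flower
print(iris.feature_names)        # column names for X
print(list(iris.target_names))   # class names, indexed by the values in y
```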

df = pd.DataFrame(iris.data, columns=iris.feature_names)

df['target'] = iris.target

df['target_name'] = df['target'].apply(lambda x: iris.target_names[x])

save_path = "D:/PythonData"

csv_filename = "iris_dataset.csv"

full_path = os.path.join(save_path, csv_filename)

os.makedirs(save_path, exist_ok=True)

df.to_csv(full_path, index=False)

- With the pandas library, the loaded data can be saved as a CSV file.
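
To confirm that the file was written correctly, it can be read back with `pd.read_csv`. The sketch below assumes a temporary directory in place of `D:/PythonData` so that it runs on any machine; that substitution is not part of the original code.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target
df["target_name"] = df["target"].map(lambda i: iris.target_names[i])

# A temporary directory stands in for "D:/PythonData" (assumption for portability).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_dataset.csv")
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)

print(loaded.shape)                          # (150, 6): 4 features + 2 label columns
print(loaded["target_name"].value_counts())  # number of rows per species
```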

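Sepal	Outer, leaf-like part of the flower that encloses the bud
Petal	Flower petal
Setosa	Iris species. Small and cute-looking
Versicolor	Iris species. Medium-sized
Virginica	Iris species. The largest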

3. Split data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

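- To check whether the model has fit the training data too closely, the data is split into a Train Set and a Test Set. test_size=0.2 holds out 20% of the data for testing, and random_state=42 makes the split reproducible.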
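
The effect of `test_size=0.2` can be checked from the resulting shapes. The sketch below also shows `stratify=y`, an optional argument that the original code does not use and that keeps the class ratio identical in both sets.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the post: 20% held out, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)

# Optional variant (not in the original code): keep class ratios equal.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
strat_counts = Counter(y_test_strat.tolist())
print(strat_counts)                  # 10 test samples of each class
```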

4. Create and train the model

model = DecisionTreeClassifier(random_state=42)

model.fit(X_train, y_train)

- Creates a tree-based model and trains it on the Train data.
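
Once trained, the model can classify individual samples. This is a minimal sketch of `predict` and `predict_proba` on a single test flower, assuming the same split and seed as above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

sample = X_test[:1]                         # one flower, shape (1, 4)
pred_index = model.predict(sample)[0]       # predicted class index (0, 1, or 2)
pred_name = iris.target_names[pred_index]   # map the index to a species name
proba = model.predict_proba(sample)[0]      # class fractions in the leaf reached

print(pred_name, proba)
```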

5. Evaluate accuracy

accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")

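- Measures how well the trained model predicts the test data. For a classifier, score() returns the fraction of test samples classified correctly.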
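
The value returned by `score()` can be reproduced by hand, and a confusion matrix shows which classes get mixed up. This sketch assumes the same split and seed; `accuracy_score` and `confusion_matrix` come from `sklearn.metrics` and are not used in the original code.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

y_pred = model.predict(X_test)
manual_accuracy = (y_pred == y_test).mean()   # fraction of correct predictions
print(manual_accuracy, accuracy_score(y_test, y_pred), model.score(X_test, y_test))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```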

6. Visualization

plt.figure(figsize=(12, 8))

plot_tree(model, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)

plt.show()

- Drawing the decision tree makes the model's decision process visible.
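
When a plot window is not available, the same tree can be printed as text with `export_text` from `sklearn.tree`. This is a sketch, not part of the original code; `max_depth=2` is used only to keep the printed output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# max_depth=2 is an illustrative choice to keep the output small.
model = DecisionTreeClassifier(max_depth=2, random_state=42).fit(iris.data, iris.target)

# Each indented line is one question; "class:" lines are leaf predictions.
rules = export_text(model, feature_names=iris.feature_names)
print(rules)
```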

Results

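gini	Gini impurity. gini = 0: only one class is present (perfectly pure); gini = 0.5: two classes mixed half and half; gini = 0.667: three classes mixed 1/3 each (worst case for three classes)
samples	Number of samples in the node. It decreases as the tree keeps splitting.
value	[setosa count, versicolor count, virginica count]. The number of samples per class.
class	The most frequent class in the node, chosen as the prediction.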
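
The three gini values in the table follow from the definition of Gini impurity, one minus the sum of squared class fractions. The sketch below recomputes them; `gini` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class fractions p_k."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


print(round(gini(["a"] * 10), 3))                          # 0.0
print(round(gini(["a"] * 5 + ["b"] * 5), 3))               # 0.5
print(round(gini(["a"] * 4 + ["b"] * 4 + ["c"] * 4), 3))   # 0.667
```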
