[빅분기 Big Data Analysis Engineer] Data Preprocessing Notes

League of Legends (LoL) ranked-game data: https://www.kaggle.com/datasnaek/league-of-legends
DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/lol.csv'

pandas.read_csv("csv file name or path")

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/pandas/main/lol.csv")

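If you inspect the loaded df, something looks off: the columns are not split correctly.

The fields in this file are separated by tabs (\t), so it has to be split on that delimiter.

Reload it by adding the sep='\t' option to .read_csv: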

df = pd.read_csv("https://raw.githubusercontent.com/Datamanim/pandas/main/lol.csv", sep="\t")
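To see why sep matters without downloading the file, here is a minimal sketch that parses a small tab-separated string held in memory with io.StringIO. The sample rows and column names are made up for illustration; only the behavior of the sep parameter is the point.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for lol.csv
raw = "gameId\tgameDuration\twinner\n1\t1949\t1\n2\t1851\t2\n"

# Default sep=",": there are no commas, so each whole line lands in one column
df_wrong = pd.read_csv(io.StringIO(raw))
print(df_wrong.shape)  # (2, 1)

# sep="\t": each line is split on tabs into separate columns
df_right = pd.read_csv(io.StringIO(raw), sep="\t")
print(df_right.shape)  # (2, 3)
print(list(df_right.columns))  # ['gameId', 'gameDuration', 'winner']
```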

df.head() and df.tail(): the number inside the parentheses can be changed; the default is 5.

# first 5 rows
df.head(5)

# last 5 rows
df.tail(5)
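A small sketch of head/tail on a hypothetical 10-row DataFrame (the data is invented for illustration), showing the default of 5 and a custom n:

```python
import pandas as pd

# Hypothetical 10-row DataFrame for illustration
df = pd.DataFrame({"gameId": range(100, 110), "winner": [1, 2] * 5})

print(df.head())    # default n=5: rows 0-4
print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
```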

df.shape # (51490, 61)

df.shape returns a (rows, columns) tuple, so indexing it gives the row count and the column count separately.

# number of rows
df.shape[0] # 51490

# number of columns
df.shape[1] # 61
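Because shape is a plain tuple, it can also be unpacked in one step. A minimal sketch on a made-up 3x2 DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a (n_rows, n_cols) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# len(df) also gives the row count
print(len(df) == df.shape[0])  # True
```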

df.columns

print(df.columns)

Printing it with print gives slightly cleaner output. The column names come back in a list-like object (a pandas Index), so indexing retrieves the column name at any position you want.

df.columns[0] # gameId
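A short sketch of indexing df.columns, using an empty DataFrame with illustrative column names (not necessarily the full lol.csv header). Positional, negative, and slice indexing all work, and tolist() converts the result to a regular Python list:

```python
import pandas as pd

# Illustrative column names only
df = pd.DataFrame(columns=["gameId", "creationTime", "gameDuration"])

cols = df.columns
print(type(cols).__name__)  # Index
print(cols[0])              # gameId
print(cols[-1])             # gameDuration
print(cols[:2].tolist())    # ['gameId', 'creationTime']
```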

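Checking data types:

# Series: dtype of a single column
df['column_name'].dtype

# DataFrame: column names, non-null counts, and dtypes of every column
df.info()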
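A minimal sketch contrasting the per-column dtype, the all-columns df.dtypes attribute, and df.info(), on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"gameId": [1, 2], "winner": [1, 2], "note": ["a", "b"]})

# dtype of one column (a Series)
print(df["gameId"].dtype)  # int64

# dtypes of every column at once, as a Series indexed by column name
print(df.dtypes)

# info() prints a summary (column names, non-null counts, dtypes) and returns None
df.info()
```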

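A DataFrame is made up of multiple Series (one per column). Single brackets return a Series; double brackets (a list of column names) return a DataFrame.

# Series
df['column_name']

# DataFrame
df[['column_name']]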
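A quick sketch of the single vs. double bracket difference on a hypothetical 3-row DataFrame, checking the returned type and shape:

```python
import pandas as pd

df = pd.DataFrame({"gameId": [1, 2, 3], "winner": [1, 2, 1]})

s = df["gameId"]    # single brackets -> Series (1-D)
d = df[["gameId"]]  # list inside brackets -> DataFrame (2-D, one column)

print(type(s).__name__, s.shape)  # Series (3,)
print(type(d).__name__, d.shape)  # DataFrame (3, 1)

# Selecting several columns also uses the double-bracket (list) form
print(df[["gameId", "winner"]].shape)  # (3, 2)
```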

df.index # RangeIndex(start=0, stop=51490, step=1)

The index of a dataset can be changed.
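One common way to change the index is DataFrame.set_index, which promotes an existing column to the index. A minimal sketch on made-up data (this is one option among several, not necessarily what the original post goes on to use):

```python
import pandas as pd

df = pd.DataFrame({"gameId": [101, 102], "winner": [1, 2]})
print(df.index)  # RangeIndex(start=0, stop=2, step=1)

# Use an existing column as the index (returns a new DataFrame)
df_by_game = df.set_index("gameId")
print(df_by_game.loc[102, "winner"])  # 2

# reset_index() moves the index back into a regular column
df_back = df_by_game.reset_index()
print(list(df_back.columns))  # ['gameId', 'winner']
```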
