본문 바로가기

상상의 창 블로그/배움의 창

케글연습 첫날 (22.08.23)

728x90

 

예전에 사 놓았던 케글 우승작으로 배우는 머신러닝 책을 오랜만에 다시 보게되었다.

 

전에 구매하고 조금 하다가 이래저래 못하다가 갑자기 생각이 나서 다시 해 보게 되었다.

 

예전 케글 내용이지만 오랜만에 접속해서 데이터를 내려 받고 하나 시작을 해 보았다.

 

처음 해 본 것은 스페인의 산탄데르 은행이 제시한 은행방문고객에게 제품을 추천해주는 내용의 모델을 만드는 프로젝트이다.

 

트레이닝데이터가 13만개, 변수가 48개이다.

 

전반적인 내용을 둘러보는 내용까지만 해 보았는데 오랜만에 해 보니 쉽진 않았다.

 

 

import pandas as pd import numpy as np trn = pd.read_csv('train_ver2.csv')
  •  

C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2698: DtypeWarning: Columns (5,8,11,15) have mixed types. Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)

In [2]:

 
trn.shape
  •  

Out[2]:

(13647309, 48)

In [3]:

 

 
trn.head()

 

Out[3]:


fecha_dato
ncodpers
ind_empleado
pais_residencia
sexo
age
fecha_alta
ind_nuevo
antiguedad
indrel
...
ind_hip_fin_ult1
ind_plan_fin_ult1
ind_pres_fin_ult1
ind_reca_fin_ult1
ind_tjcr_fin_ult1
ind_valo_fin_ult1
ind_viv_fin_ult1
ind_nomina_ult1
ind_nom_pens_ult1
ind_recibo_ult1
0
2015-01-28
1375586
N
ES
H
35
2015-01-12
0.0
6
1.0
...
0
0
0
0
0
0
0
0.0
0.0
0
1
2015-01-28
1050611
N
ES
V
23
2012-08-10
0.0
35
1.0
...
0
0
0
0
0
0
0
0.0
0.0
0
2
2015-01-28
1050612
N
ES
V
23
2012-08-10
0.0
35
1.0
...
0
0
0
0
0
0
0
0.0
0.0
0
3
2015-01-28
1050613
N
ES
H
22
2012-08-10
0.0
35
1.0
...
0
0
0
0
0
0
0
0.0
0.0
0
4
2015-01-28
1050614
N
ES
V
23
2012-08-10
0.0
35
1.0
...
0
0
0
0
0
0
0
0.0
0.0
0

 

5 rows × 48 columns

In [4]:

for col in trn.columns: print('{}\n'.format(trn[col].head()))

 

0 2015-01-28 1 2015-01-28 2 2015-01-28 3 2015-01-28 4 2015-01-28 Name: fecha_dato, dtype: object 0 1375586 1 1050611 2 1050612 3 1050613 4 1050614 Name: ncodpers, dtype: int64 0 N 1 N 2 N 3 N 4 N Name: ind_empleado, dtype: object 0 ES 1 ES 2 ES 3 ES 4 ES Name: pais_residencia, dtype: object 0 H 1 V 2 V 3 H 4 V Name: sexo, dtype: object 0 35 1 23 2 23 3 22 4 23 Name: age, dtype: object 0 2015-01-12 1 2012-08-10 2 2012-08-10 3 2012-08-10 4 2012-08-10 Name: fecha_alta, dtype: object 0 0.0 1 0.0 2 0.0 3 0.0 4 0.0 Name: ind_nuevo, dtype: float64 0 6 1 35 2 35 3 35 4 35 Name: antiguedad, dtype: object 0 1.0 1 1.0 2 1.0 3 1.0 4 1.0 Name: indrel, dtype: float64 0 NaN 1 NaN 2 NaN 3 NaN 4 NaN Name: ult_fec_cli_1t, dtype: object 0 1 1 1 2 1 3 1 4 1 Name: indrel_1mes, dtype: object 0 A 1 I 2 I 3 I 4 A Name: tiprel_1mes, dtype: object 0 S 1 S 2 S 3 S 4 S Name: indresi, dtype: object 0 N 1 S 2 N 3 N 4 N Name: indext, dtype: object 0 NaN 1 NaN 2 NaN 3 NaN 4 NaN Name: conyuemp, dtype: object 0 KHL 1 KHE 2 KHE 3 KHD 4 KHE Name: canal_entrada, dtype: object 0 N 1 N 2 N 3 N 4 N Name: indfall, dtype: object 0 1.0 1 1.0 2 1.0 3 1.0 4 1.0 Name: tipodom, dtype: float64 0 29.0 1 13.0 2 13.0 3 50.0 4 50.0 Name: cod_prov, dtype: float64 0 MALAGA 1 CIUDAD REAL 2 CIUDAD REAL 3 ZARAGOZA 4 ZARAGOZA Name: nomprov, dtype: object 0 1.0 1 0.0 2 0.0 3 0.0 4 1.0 Name: ind_actividad_cliente, dtype: float64 0 87218.10 1 35548.74 2 122179.11 3 119775.54 4 NaN Name: renta, dtype: float64 0 02 - PARTICULARES 1 03 - UNIVERSITARIO 2 03 - UNIVERSITARIO 3 03 - UNIVERSITARIO 4 03 - UNIVERSITARIO Name: segmento, dtype: object 0 0 1 0 2 0 3 0 4 0 Name: ind_ahor_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_aval_fin_ult1, dtype: int64 0 1 1 1 2 1 3 0 4 1 Name: ind_cco_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_cder_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_cno_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_ctju_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_ctma_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_ctop_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_ctpp_fin_ult1, dtype: int64 0 0 1 0 2 0 3 1 4 0 Name: ind_deco_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_deme_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_dela_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_ecue_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_fond_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_hip_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_plan_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_pres_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_reca_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_tjcr_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_valo_fin_ult1, dtype: int64 0 0 1 0 2 0 3 0 4 0 Name: ind_viv_fin_ult1, dtype: int64 0 0.0 1 0.0 2 0.0 3 0.0 4 0.0 Name: ind_nomina_ult1, dtype: float64 0 0.0 1 0.0 2 0.0 3 0.0 4 0.0 Name: ind_nom_pens_ult1, dtype: float64 0 0 1 0 2 0 3 0 4 0 Name: ind_recibo_ult1, dtype: int64

In [5]:

 
trn.info()

 

<class 'pandas.core.frame.DataFrame'> RangeIndex: 13647309 entries, 0 to 13647308 Data columns (total 48 columns): fecha_dato object ncodpers int64 ind_empleado object pais_residencia object sexo object age object fecha_alta object ind_nuevo float64 antiguedad object indrel float64 ult_fec_cli_1t object indrel_1mes object tiprel_1mes object indresi object indext object conyuemp object canal_entrada object indfall object tipodom float64 cod_prov float64 nomprov object ind_actividad_cliente float64 renta float64 segmento object ind_ahor_fin_ult1 int64 ind_aval_fin_ult1 int64 ind_cco_fin_ult1 int64 ind_cder_fin_ult1 int64 ind_cno_fin_ult1 int64 ind_ctju_fin_ult1 int64 ind_ctma_fin_ult1 int64 ind_ctop_fin_ult1 int64 ind_ctpp_fin_ult1 int64 ind_deco_fin_ult1 int64 ind_deme_fin_ult1 int64 ind_dela_fin_ult1 int64 ind_ecue_fin_ult1 int64 ind_fond_fin_ult1 int64 ind_hip_fin_ult1 int64 ind_plan_fin_ult1 int64 ind_pres_fin_ult1 int64 ind_reca_fin_ult1 int64 ind_tjcr_fin_ult1 int64 ind_valo_fin_ult1 int64 ind_viv_fin_ult1 int64 ind_nomina_ult1 float64 ind_nom_pens_ult1 float64 ind_recibo_ult1 int64 dtypes: float64(8), int64(23), object(17) memory usage: 4.9+ GB

In [6]:

num_cols=[col for col in trn.columns[:24] if trn[col].dtype in ['int64', 'float64']] trn[num_cols].describe()

 

Out[6]:


ncodpers
ind_nuevo
indrel
tipodom
cod_prov
ind_actividad_cliente
renta
count
1.364731e+07
1.361958e+07
1.361958e+07
13619574.0
1.355372e+07
1.361958e+07
1.085293e+07
mean
8.349042e+05
5.956184e-02
1.178399e+00
1.0
2.657147e+01
4.578105e-01
1.342543e+05
std
4.315650e+05
2.366733e-01
4.177469e+00
0.0
1.278402e+01
4.982169e-01
2.306202e+05
min
1.588900e+04
0.000000e+00
1.000000e+00
1.0
1.000000e+00
0.000000e+00
1.202730e+03
25%
4.528130e+05
0.000000e+00
1.000000e+00
1.0
1.500000e+01
0.000000e+00
6.871098e+04
50%
9.318930e+05
0.000000e+00
1.000000e+00
1.0
2.800000e+01
0.000000e+00
1.018500e+05
75%
1.199286e+06
0.000000e+00
1.000000e+00
1.0
3.500000e+01
1.000000e+00
1.559560e+05
max
1.553689e+06
1.000000e+00
9.900000e+01
1.0
5.200000e+01
1.000000e+00
2.889440e+07

 

In [9]:

cat_cols=[col for col in trn.columns[:24] if trn[col].dtype in ['O']] trn[cat_cols].describe()

 

Out[9]:


fecha_dato
ind_empleado
pais_residencia
sexo
age
fecha_alta
antiguedad
ult_fec_cli_1t
indrel_1mes
tiprel_1mes
indresi
indext
conyuemp
canal_entrada
indfall
nomprov
segmento
count
13647309
13619575
13619575
13619505
13647309
13619575
13647309
24793
13497528.0
13497528
13619575
13619575
1808
13461183
13619575
13553718
13457941
unique
17
5
118
2
235
6756
507
223
13.0
5
2
2
2
162
2
52
3
top
2016-05-28
N
ES
V
23
2014-07-28
0
2015-12-24
1.0
I
S
N
N
KHE
N
MADRID
02 - PARTICULARES
freq
931453
13610977
13553710
7424252
542682
57389
134335
763
7277607.0
7304875
13553711
12974839
1791
4055270
13584813
4409600
7960220

 

In [10]:

for col in cat_cols: uniq = np.unique(trn[col].astype(str)) print('-'*50) print('# col {}, n_uniq {}, uniq {}'.format(col, len(uniq), uniq))

 

-------------------------------------------------- # col fecha_dato, n_uniq 17, uniq ['2015-01-28' '2015-02-28' '2015-03-28' '2015-04-28' '2015-05-28' '2015-06-28' '2015-07-28' '2015-08-28' '2015-09-28' '2015-10-28' '2015-11-28' '2015-12-28' '2016-01-28' '2016-02-28' '2016-03-28' '2016-04-28' '2016-05-28'] -------------------------------------------------- # col ind_empleado, n_uniq 6, uniq ['A' 'B' 'F' 'N' 'S' 'nan'] -------------------------------------------------- # col pais_residencia, n_uniq 119, uniq ['AD' 'AE' 'AL' 'AO' 'AR' 'AT' 'AU' 'BA' 'BE' 'BG' 'BM' 'BO' 'BR' 'BY' 'BZ' 'CA' 'CD' 'CF' 'CG' 'CH' 'CI' 'CL' 'CM' 'CN' 'CO' 'CR' 'CU' 'CZ' 'DE' 'DJ' 'DK' 'DO' 'DZ' 'EC' 'EE' 'EG' 'ES' 'ET' 'FI' 'FR' 'GA' 'GB' 'GE' 'GH' 'GI' 'GM' 'GN' 'GQ' 'GR' 'GT' 'GW' 'HK' 'HN' 'HR' 'HU' 'IE' 'IL' 'IN' 'IS' 'IT' 'JM' 'JP' 'KE' 'KH' 'KR' 'KW' 'KZ' 'LB' 'LT' 'LU' 'LV' 'LY' 'MA' 'MD' 'MK' 'ML' 'MM' 'MR' 'MT' 'MX' 'MZ' 'NG' 'NI' 'NL' 'NO' 'NZ' 'OM' 'PA' 'PE' 'PH' 'PK' 'PL' 'PR' 'PT' 'PY' 'QA' 'RO' 'RS' 'RU' 'SA' 'SE' 'SG' 'SK' 'SL' 'SN' 'SV' 'TG' 'TH' 'TN' 'TR' 'TW' 'UA' 'US' 'UY' 'VE' 'VN' 'ZA' 'ZW' 'nan'] -------------------------------------------------- # col sexo, n_uniq 3, uniq ['H' 'V' 'nan'] -------------------------------------------------- # col age, n_uniq 219, uniq [' 2' ' 3' ' 4' ' 5' ' 6' ' 7' ' 8' ' 9' ' 10' ' 11' ' 12' ' 13' ' 14' ' 15' ' 16' ' 17' ' 18' ' 19' ' 20' ' 21' ' 22' ' 23' ' 24' ' 25' ' 26' ' 27' ' 28' ' 29' ' 30' ' 31' ' 32' ' 33' ' 34' ' 35' ' 36' ' 37' ' 38' ' 39' ' 40' ' 41' ' 42' ' 43' ' 44' ' 45' ' 46' ' 47' ' 48' ' 49' ' 50' ' 51' ' 52' ' 53' ' 54' ' 55' ' 56' ' 57' ' 58' ' 59' ' 60' ' 61' ' 62' ' 63' ' 64' ' 65' ' 66' ' 67' ' 68' ' 69' ' 70' ' 71' ' 72' ' 73' ' 74' ' 75' ' 76' ' 77' ' 78' ' 79' ' 80' ' 81' ' 82' ' 83' ' 84' ' 85' ' 86' ' 87' ' 88' ' 89' ' 90' ' 91' ' 92' ' 93' ' 94' ' 95' ' 96' ' 97' ' 98' ' 99' ' NA' '10' '100' '101' '102' '103' '104' '105' '106' '107' '108' '109' '11' '110' '111' '112' '113' '114' '115' '116' '117' '12' '126' '127' '13' '14' '15' '16' '163' '164' '17' '18' '19' '2' '20' '21' '22' '23' '24' '25' '26' '27' '28' '29' '3' '30' '31' '32' '33' '34' '35' '36' '37' '38' '39' '4' '40' '41' '42' '43' '44' '45' '46' '47' '48' '49' '5' '50' '51' '52' '53' '54' '55' '56' '57' '58' '59' '6' '60' '61' '62' '63' '64' '65' '66' '67' '68' '69' '7' '70' '71' '72' '73' '74' '75' '76' '77' '78' '79' '8' '80' '81' '82' '83' '84' '85' '86' '87' '88' '89' '9' '90' '91' '92' '93' '94' '95' '96' '97' '98' '99'] -------------------------------------------------- # col fecha_alta, n_uniq 6757, uniq ['1995-01-16' '1995-01-17' '1995-01-23' ... '2016-05-30' '2016-05-31' 'nan'] -------------------------------------------------- # col antiguedad, n_uniq 506, uniq [' 0' ' 1' ' 2' ' 3' ' 4' ' 5' ' 6' ' 7' ' 8' ' 9' ' 10' ' 11' ' 12' ' 13' ' 14' ' 15' ' 16' ' 17' ' 18' ' 19' ' 20' ' 21' ' 22' ' 23' ' 24' ' 25' ' 26' ' 27' ' 28' ' 29' ' 30' ' 31' ' 32' ' 33' ' 34' ' 35' ' 36' ' 37' ' 38' ' 39' ' 40' ' 41' ' 42' ' 43' ' 44' ' 45' ' 46' ' 47' ' 48' ' 49' ' 50' ' 51' ' 52' ' 53' ' 54' ' 55' ' 56' ' 57' ' 58' ' 59' ' 60' ' 61' ' 62' ' 63' ' 64' ' 65' ' 66' ' 67' ' 68' ' 69' ' 70' ' 71' ' 72' ' 73' ' 74' ' 75' ' 76' ' 77' ' 78' ' 79' ' 80' ' 81' ' 82' ' 83' ' 84' ' 85' ' 86' ' 87' ' 88' ' 89' ' 90' ' 91' ' 92' ' 93' ' 94' ' 95' ' 96' ' 97' ' 98' ' 99' ' NA' ' 100' ' 101' ' 102' ' 103' ' 104' ' 105' ' 106' ' 107' ' 108' ' 109' ' 110' ' 111' ' 112' ' 113' ' 114' ' 115' ' 116' ' 117' ' 118' ' 119' ' 120' ' 121' ' 122' ' 123' ' 124' ' 125' ' 126' ' 127' ' 128' ' 129' ' 130' ' 131' ' 132' ' 133' ' 134' ' 135' ' 136' ' 137' ' 138' ' 139' ' 140' ' 141' ' 142' ' 143' ' 144' ' 145' ' 146' ' 147' ' 148' ' 149' ' 150' ' 151' ' 152' ' 153' ' 154' ' 155' ' 156' ' 157' ' 158' ' 159' ' 160' ' 161' ' 162' ' 163' ' 164' ' 165' ' 166' ' 167' ' 168' ' 169' ' 170' ' 171' ' 172' ' 173' ' 174' ' 175' ' 176' ' 177' ' 178' ' 179' ' 180' ' 181' ' 182' ' 183' ' 184' ' 185' ' 186' ' 187' ' 188' ' 189' ' 190' ' 191' ' 192' ' 193' ' 194' ' 195' ' 196' ' 197' ' 198' ' 199' ' 200' ' 201' ' 202' ' 203' ' 204' ' 205' ' 206' ' 207' ' 208' ' 209' ' 210' ' 211' ' 212' ' 213' ' 214' ' 215' ' 216' ' 217' ' 218' ' 219' ' 220' ' 221' ' 222' ' 223' ' 224' ' 225' ' 226' ' 227' ' 228' ' 229' ' 230' ' 231' ' 232' ' 233' ' 234' ' 235' ' 236' ' 237' ' 238' ' 239' ' 240' ' 241' ' 242' ' 243' ' 244' ' 245' ' 246' '-999999' '0' '1' '10' '100' '101' '102' '103' '104' '105' '106' '107' '108' '109' '11' '110' '111' '112' '113' '114' '115' '116' '117' '118' '119' '12' '120' '121' '122' '123' '124' '125' '126' '127' '128' '129' '13' '130' '131' '132' '133' '134' '135' '136' '137' '138' '139' '14' '140' '141' '142' '143' '144' '145' '146' '147' '148' '149' '15' '150' '151' '152' '153' '154' '155' '156' '157' '158' '159' '16' '160' '161' '162' '163' '164' '165' '166' '167' '168' '169' '17' '170' '171' '172' '173' '174' '175' '176' '177' '178' '179' '18' '180' '181' '182' '183' '184' '185' '186' '187' '188' '189' '19' '190' '191' '192' '193' '194' '195' '196' '197' '198' '199' '2' '20' '200' '201' '202' '203' '204' '205' '206' '207' '208' '209' '21' '210' '211' '212' '213' '214' '215' '216' '217' '218' '219' '22' '220' '221' '222' '223' '224' '225' '226' '227' '228' '229' '23' '230' '231' '232' '233' '234' '235' '236' '237' '238' '239' '24' '240' '241' '242' '243' '244' '245' '246' '247' '248' '249' '25' '250' '251' '252' '253' '254' '255' '256' '26' '27' '28' '29' '3' '30' '31' '32' '33' '34' '35' '36' '37' '38' '39' '4' '40' '41' '42' '43' '44' '45' '46' '47' '48' '49' '5' '50' '51' '52' '53' '54' '55' '56' '57' '58' '59' '6' '60' '61' '62' '63' '64' '65' '66' '67' '68' '69' '7' '70' '71' '72' '73' '74' '75' '76' '77' '78' '79' '8' '80' '81' '82' '83' '84' '85' '86' '87' '88' '89' '9' '90' '91' '92' '93' '94' '95' '96' '97' '98' '99'] -------------------------------------------------- # col ult_fec_cli_1t, n_uniq 224, uniq ['2015-07-01' '2015-07-02' '2015-07-03' '2015-07-06' '2015-07-07' '2015-07-08' '2015-07-09' '2015-07-10' '2015-07-13' '2015-07-14' '2015-07-15' '2015-07-16' '2015-07-17' '2015-07-20' '2015-07-21' '2015-07-22' '2015-07-23' '2015-07-24' '2015-07-27' '2015-07-28' '2015-07-29' '2015-07-30' '2015-08-03' '2015-08-04' '2015-08-05' '2015-08-06' '2015-08-07' '2015-08-10' '2015-08-11' '2015-08-12' '2015-08-13' '2015-08-14' '2015-08-17' '2015-08-18' '2015-08-19' '2015-08-20' '2015-08-21' '2015-08-24' '2015-08-25' '2015-08-26' '2015-08-27' '2015-08-28' '2015-09-01' '2015-09-02' '2015-09-03' '2015-09-04' '2015-09-07' '2015-09-08' '2015-09-09' '2015-09-10' '2015-09-11' '2015-09-14' '2015-09-15' '2015-09-16' '2015-09-17' '2015-09-18' '2015-09-21' '2015-09-22' '2015-09-23' '2015-09-24' '2015-09-25' '2015-09-28' '2015-09-29' '2015-10-01' '2015-10-02' '2015-10-05' '2015-10-06' '2015-10-07' '2015-10-08' '2015-10-09' '2015-10-13' '2015-10-14' '2015-10-15' '2015-10-16' '2015-10-19' '2015-10-20' '2015-10-21' '2015-10-22' '2015-10-23' '2015-10-26' '2015-10-27' '2015-10-28' '2015-10-29' '2015-11-02' '2015-11-03' '2015-11-04' '2015-11-05' '2015-11-06' '2015-11-09' '2015-11-10' '2015-11-11' '2015-11-12' '2015-11-13' '2015-11-16' '2015-11-17' '2015-11-18' '2015-11-19' '2015-11-20' '2015-11-23' '2015-11-24' '2015-11-25' '2015-11-26' '2015-11-27' '2015-12-01' '2015-12-02' '2015-12-03' '2015-12-04' '2015-12-07' '2015-12-09' '2015-12-10' '2015-12-11' '2015-12-14' '2015-12-15' '2015-12-16' '2015-12-17' '2015-12-18' '2015-12-21' '2015-12-22' '2015-12-23' '2015-12-24' '2015-12-28' '2015-12-29' '2015-12-30' '2016-01-04' '2016-01-05' '2016-01-07' '2016-01-08' '2016-01-11' '2016-01-12' '2016-01-13' '2016-01-14' '2016-01-15' '2016-01-18' '2016-01-19' '2016-01-20' '2016-01-21' '2016-01-22' '2016-01-25' '2016-01-26' '2016-01-27' '2016-01-28' '2016-02-01' '2016-02-02' '2016-02-03' '2016-02-04' '2016-02-05' '2016-02-08' '2016-02-09' '2016-02-10' '2016-02-11' '2016-02-12' '2016-02-15' '2016-02-16' '2016-02-17' '2016-02-18' '2016-02-19' '2016-02-22' '2016-02-23' '2016-02-24' '2016-02-25' '2016-02-26' '2016-03-01' '2016-03-02' '2016-03-03' '2016-03-04' '2016-03-07' '2016-03-08' '2016-03-09' '2016-03-10' '2016-03-11' '2016-03-14' '2016-03-15' '2016-03-16' '2016-03-17' '2016-03-18' '2016-03-21' '2016-03-22' '2016-03-23' '2016-03-24' '2016-03-28' '2016-03-29' '2016-03-30' '2016-04-01' '2016-04-04' '2016-04-05' '2016-04-06' '2016-04-07' '2016-04-08' '2016-04-11' '2016-04-12' '2016-04-13' '2016-04-14' '2016-04-15' '2016-04-18' '2016-04-19' '2016-04-20' '2016-04-21' '2016-04-22' '2016-04-25' '2016-04-26' '2016-04-27' '2016-04-28' '2016-05-02' '2016-05-03' '2016-05-04' '2016-05-05' '2016-05-06' '2016-05-09' '2016-05-10' '2016-05-11' '2016-05-12' '2016-05-13' '2016-05-16' '2016-05-17' '2016-05-18' '2016-05-19' '2016-05-20' '2016-05-23' '2016-05-24' '2016-05-25' '2016-05-26' '2016-05-27' '2016-05-30' 'nan'] -------------------------------------------------- # col indrel_1mes, n_uniq 10, uniq ['1' '1.0' '2' '2.0' '3' '3.0' '4' '4.0' 'P' 'nan'] -------------------------------------------------- # col tiprel_1mes, n_uniq 6, uniq ['A' 'I' 'N' 'P' 'R' 'nan'] -------------------------------------------------- # col indresi, n_uniq 3, uniq ['N' 'S' 'nan'] -------------------------------------------------- # col indext, n_uniq 3, uniq ['N' 'S' 'nan'] -------------------------------------------------- # col conyuemp, n_uniq 3, uniq ['N' 'S' 'nan'] -------------------------------------------------- # col canal_entrada, n_uniq 163, uniq ['004' '007' '013' '025' 'K00' 'KAA' 'KAB' 'KAC' 'KAD' 'KAE' 'KAF' 'KAG' 'KAH' 'KAI' 'KAJ' 'KAK' 'KAL' 'KAM' 'KAN' 'KAO' 'KAP' 'KAQ' 'KAR' 'KAS' 'KAT' 'KAU' 'KAV' 'KAW' 'KAY' 'KAZ' 'KBB' 'KBD' 'KBE' 'KBF' 'KBG' 'KBH' 'KBJ' 'KBL' 'KBM' 'KBN' 'KBO' 'KBP' 'KBQ' 'KBR' 'KBS' 'KBU' 'KBV' 'KBW' 'KBX' 'KBY' 'KBZ' 'KCA' 'KCB' 'KCC' 'KCD' 'KCE' 'KCF' 'KCG' 'KCH' 'KCI' 'KCJ' 'KCK' 'KCL' 'KCM' 'KCN' 'KCO' 'KCP' 'KCQ' 'KCR' 'KCS' 'KCT' 'KCU' 'KCV' 'KCX' 'KDA' 'KDB' 'KDC' 'KDD' 'KDE' 'KDF' 'KDG' 'KDH' 'KDI' 'KDL' 'KDM' 'KDN' 'KDO' 'KDP' 'KDQ' 'KDR' 'KDS' 'KDT' 'KDU' 'KDV' 'KDW' 'KDX' 'KDY' 'KDZ' 'KEA' 'KEB' 'KEC' 'KED' 'KEE' 'KEF' 'KEG' 'KEH' 'KEI' 'KEJ' 'KEK' 'KEL' 'KEM' 'KEN' 'KEO' 'KEQ' 'KES' 'KEU' 'KEV' 'KEW' 'KEY' 'KEZ' 'KFA' 'KFB' 'KFC' 'KFD' 'KFE' 'KFF' 'KFG' 'KFH' 'KFI' 'KFJ' 'KFK' 'KFL' 'KFM' 'KFN' 'KFP' 'KFR' 'KFS' 'KFT' 'KFU' 'KFV' 'KGC' 'KGN' 'KGU' 'KGV' 'KGW' 'KGX' 'KGY' 'KHA' 'KHC' 'KHD' 'KHE' 'KHF' 'KHK' 'KHL' 'KHM' 'KHN' 'KHO' 'KHP' 'KHQ' 'KHR' 'KHS' 'RED' 'nan'] -------------------------------------------------- # col indfall, n_uniq 3, uniq ['N' 'S' 'nan'] -------------------------------------------------- # col nomprov, n_uniq 53, uniq ['ALAVA' 'ALBACETE' 'ALICANTE' 'ALMERIA' 'ASTURIAS' 'AVILA' 'BADAJOZ' 'BALEARS, ILLES' 'BARCELONA' 'BIZKAIA' 'BURGOS' 'CACERES' 'CADIZ' 'CANTABRIA' 'CASTELLON' 'CEUTA' 'CIUDAD REAL' 'CORDOBA' 'CORUÑA, A' 'CUENCA' 'GIPUZKOA' 'GIRONA' 'GRANADA' 'GUADALAJARA' 'HUELVA' 'HUESCA' 'JAEN' 'LEON' 'LERIDA' 'LUGO' 'MADRID' 'MALAGA' 'MELILLA' 'MURCIA' 'NAVARRA' 'OURENSE' 'PALENCIA' 'PALMAS, LAS' 'PONTEVEDRA' 'RIOJA, LA' 'SALAMANCA' 'SANTA CRUZ DE TENERIFE' 'SEGOVIA' 'SEVILLA' 'SORIA' 'TARRAGONA' 'TERUEL' 'TOLEDO' 'VALENCIA' 'VALLADOLID' 'ZAMORA' 'ZARAGOZA' 'nan'] -------------------------------------------------- # col segmento, n_uniq 4, uniq ['01 - TOP' '02 - PARTICULARES' '03 - UNIVERSITARIO' 'nan']

In [15]:

import matplotlib import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns

 

In [17]:

skip_cols =['ncodpers','renta'] for col in trn.columns: if col in skip_cols: continue print('-'*50) print('col : ', col) f, ax =plt.subplots(figsize=(20,15)) sns.countplot(x=col, data=trn, alpha=0.5) plt.show()

 

-------------------------------------------------- col : fecha_dato

-------------------------------------------------- col : ind_empleado

-------------------------------------------------- col : pais_residencia

-------------------------------------------------- col : sexo

-------------------------------------------------- col : age

-------------------------------------------------- col : fecha_alta

-------------------------------------------------- col : ind_nuevo

-------------------------------------------------- col : antiguedad

-------------------------------------------------- col : indrel

-------------------------------------------------- col : ult_fec_cli_1t

-------------------------------------------------- col : indrel_1mes

-------------------------------------------------- col : tiprel_1mes

-------------------------------------------------- col : indresi

-------------------------------------------------- col : indext

-------------------------------------------------- col : conyuemp

-------------------------------------------------- col : canal_entrada

-------------------------------------------------- col : indfall

-------------------------------------------------- col : tipodom

-------------------------------------------------- col : cod_prov

-------------------------------------------------- col : nomprov

-------------------------------------------------- col : ind_actividad_cliente

-------------------------------------------------- col : segmento

-------------------------------------------------- col : ind_ahor_fin_ult1

-------------------------------------------------- col : ind_aval_fin_ult1

-------------------------------------------------- col : ind_cco_fin_ult1

-------------------------------------------------- col : ind_cder_fin_ult1

-------------------------------------------------- col : ind_cno_fin_ult1

-------------------------------------------------- col : ind_ctju_fin_ult1

-------------------------------------------------- col : ind_ctma_fin_ult1

-------------------------------------------------- col : ind_ctop_fin_ult1

-------------------------------------------------- col : ind_ctpp_fin_ult1

-------------------------------------------------- col : ind_deco_fin_ult1

-------------------------------------------------- col : ind_deme_fin_ult1

-------------------------------------------------- col : ind_dela_fin_ult1

-------------------------------------------------- col : ind_ecue_fin_ult1

-------------------------------------------------- col : ind_fond_fin_ult1

-------------------------------------------------- col : ind_hip_fin_ult1

-------------------------------------------------- col : ind_plan_fin_ult1

-------------------------------------------------- col : ind_pres_fin_ult1

-------------------------------------------------- col : ind_reca_fin_ult1

-------------------------------------------------- col : ind_tjcr_fin_ult1

-------------------------------------------------- col : ind_valo_fin_ult1

-------------------------------------------------- col : ind_viv_fin_ult1

-------------------------------------------------- col : ind_nomina_ult1

-------------------------------------------------- col : ind_nom_pens_ult1

-------------------------------------------------- col : ind_recibo_ult1

 

 

 

(차트는 네이버 블로그로 긁을 수 없게 되어있나보다. 복사가 되질 않는다.

 

차트를 블로그에 올릴 수 있는 방법을 찾아봐야 겠다. 기분이니 차트 하나만 캡쳐해서 넣어본다.

 

 

728x90