Machine Learning in Finance - python

Charchil Kasliwal

Data Science Fanatic
A die-hard Python fan

Lets Start by Importing few Important Libraries

please make sure to save the notebook on the same directory where you have saved the data.

Enjoy!

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import os
import datetime
from functools import reduce
import warnings
warnings.filterwarnings('ignore')

Set up jupyter notebook

Extract the repositories from git
Run following kernel to get file import hassel free working

Enjoy!

from pathlib import Path
mypath = Path().absolute()
mypath = str(mypath) + '/Data'

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
onlyfiles

[‘DGS1.xls’, ‘DGS10.xls’, ‘DGS2.xls’, ‘DGS3.xls’, ‘DGS30.xls’, ‘DGS3MO.xls’, ‘DGS5.xls’]

Example data from the exel, uncleaned

First few lines contains the description about the data, which we don’t need for our analysis.

file_1 = pd.read_excel(str(mypath)+'/DGS10.xls')
file_1.head(15)

	FRED Graph Observations	Unnamed: 1
0	Federal Reserve Economic Data	NaN
1	Link: https://fred.stlouisfed.org	NaN
2	Help: https://fred.stlouisfed.org/help-faq	NaN
3	Economic Research Division	NaN
4	Federal Reserve Bank of St. Louis	NaN
5	NaN	NaN
6	DGS10	10-Year Treasury Constant Maturity Rate, Perce...
7	NaN	NaN
8	Frequency: Daily	NaN
9	observation_date	DGS10
10	1962-01-02 00:00:00	4.06
11	1962-01-03 00:00:00	4.03
12	1962-01-04 00:00:00	3.99
13	1962-01-05 00:00:00	4.02
14	1962-01-08 00:00:00	4.03

Data Import and Cleaning

Using a For loop to find all the excel files
Each loop does following:

Import the file
Remove the unwanted description.
Remove NAs
Reset Index
Rename the column
Remove records with interest rate = 0 (mostly Sunday and public holidays)
Convert date from str to datetime

files_xls = [i for i in onlyfiles if i[-3:]== 'xls']                 # import only xls

all_file={}

for j in files_xls:
    
    file = pd.read_excel('Data/'+ str(j)).dropna()                   # import excel file, removed rows with Nan
    
    file.reset_index(inplace= True, drop =True)                      # reset index

    for k in file.index:                                             # To rename the columns, for loop will search for datetime 
        if type(file.loc[k][0]) == datetime.datetime:                # untill finds a datetime and rename the column 
            break
    file.columns = file.iloc[k-1]

    for k,i in  enumerate(file[str(file.columns[0])]):               # Data cleaning for bad datetime
        if type(i) != datetime.datetime:
            file.drop(k, inplace= True)
            
    file.drop(file[file[file.columns[-1]]==0].index, inplace= True)  # Drop all rows with interest rate 0, mainly weekends
    
    file.reset_index(drop= True, inplace = True)                     # Reset Index
    
    file['observation_date'] = pd.to_datetime(file['observation_date'])
    
    all_file[str(j)] = file                                          # Save clean DataFrame in the empty dictionary
    

Merge columns from all the DataFrames

Using Pandas.merge(), inner join on date field Using reduce fuction, inner join all seven Dataframes in single line of code,

result = reduce(lambda left, right: pd.merge(left, right , sort=False,
                                              on = ['observation_date'], 
                                              how = 'outer'), all_file.values())
result.tail(15)

1	observation_date	DGS1	DGS10	DGS2	DGS3	DGS30	DGS3MO	DGS5
14146	2018-08-22	2.43	2.82	2.6	2.65	2.99	2.09	2.7
14147	2018-08-23	2.43	2.82	2.61	2.66	2.97	2.08	2.72
14148	2018-08-24	2.44	2.82	2.63	2.68	2.97	2.09	2.72
14149	2018-08-27	2.47	2.85	2.67	2.7	3	2.12	2.74
14150	2018-08-28	2.47	2.88	2.67	2.73	3.03	2.13	2.77
14151	2018-08-29	2.48	2.89	2.67	2.75	3.02	2.13	2.78
14152	2018-08-30	2.47	2.86	2.64	2.72	3	2.11	2.75
14153	2018-08-31	2.46	2.86	2.62	2.7	3.02	2.11	2.74
14154	2018-09-04	2.49	2.9	2.66	2.73	3.07	2.13	2.78
14155	2018-09-05	2.49	2.9	2.66	2.72	3.08	2.14	2.77
14156	2018-09-06	2.5	2.88	2.64	2.71	3.06	2.13	2.76
14157	2018-09-07	2.53	2.94	2.71	2.78	3.11	2.14	2.82
14158	2018-09-10	2.54	2.94	2.73	2.78	3.09	2.14	2.83
14159	2018-09-11	2.55	2.98	2.76	2.83	3.13	2.15	2.87
14160	2018-09-12	2.56	2.97	2.74	2.82	3.11	2.16	2.87

Plot Interactive Visualizations

for installation
$ pip install plotly == 2.7.0

Using plotly for python, use mouse pointer to see the exact tooltip values Enjoy the zoom by drawing a square to part of the graph to look closely.

Enjoy!

Future version will have a drop down filters

import plotly
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

#plotly.tools.set_credentials_file(username='charchil1010', api_key='h6iK0B8bluU0nNJ1oZti')

y_col =  [i for i in result.columns if i != 'observation_date']

result.iplot(kind='scatter',x='observation_date',y= y_col, size=25)

If plotly doesn’t work

Using matplotlib plot()

result.plot(x ='observation_date', y = y_col, figsize= (30,20))

<matplotlib.axes._subplots.AxesSubplot at 0x2188f0d9630>

png

This post is still In process…

2018

Extract-Transform-Load (ETL) with Python

7 minute read

Read, Clean, Create Schema and Load bunch of csv files to MySQL with Python

Machine Learning in Finance - python

3 minute read

Implement Machine learning models to predict interest rates of Bonds from scratch