Let's start by importing a few important libraries.

Please make sure to save the notebook in the same directory where you have saved the data.

Enjoy!

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import os
import datetime
from functools import reduce
import warnings
warnings.filterwarnings('ignore')

Set up the Jupyter notebook

Clone the repository from Git.
Run the following cell so the file imports work hassle-free.

Enjoy!

from pathlib import Path
mypath = Path().absolute()                                           # directory of this notebook
mypath = str(mypath) + '/Data'                                       # the Data folder next to it

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]  # list every file in Data
onlyfiles

['DGS1.xls', 'DGS10.xls', 'DGS2.xls', 'DGS3.xls', 'DGS30.xls', 'DGS3MO.xls', 'DGS5.xls']

Example data from the Excel file, uncleaned

The first few lines contain a description of the data, which we don't need for our analysis.

file_1 = pd.read_excel(str(mypath)+'/DGS10.xls')
file_1.head(15)
FRED Graph Observations Unnamed: 1
0 Federal Reserve Economic Data NaN
1 Link: https://fred.stlouisfed.org NaN
2 Help: https://fred.stlouisfed.org/help-faq NaN
3 Economic Research Division NaN
4 Federal Reserve Bank of St. Louis NaN
5 NaN NaN
6 DGS10 10-Year Treasury Constant Maturity Rate, Perce...
7 NaN NaN
8 Frequency: Daily NaN
9 observation_date DGS10
10 1962-01-02 00:00:00 4.06
11 1962-01-03 00:00:00 4.03
12 1962-01-04 00:00:00 3.99
13 1962-01-05 00:00:00 4.02
14 1962-01-08 00:00:00 4.03
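
As an aside: if the layout is stable, the description block could be skipped at read time instead of cleaned afterwards. A minimal sketch, assuming the FRED description always occupies the first 10 rows of the sheet so that the real header ('observation_date', 'DGS10') comes next:

raw = pd.read_excel(str(mypath)+'/DGS10.xls', skiprows=10)   # skip the description block (assumed to be 10 rows)
raw.head()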

Data Import and Cleaning

Using a for loop to process all the Excel files.
Each iteration does the following:

  1. Import the file.
  2. Remove the unwanted description.
  3. Remove NaN rows.
  4. Reset the index.
  5. Rename the columns.
  6. Remove records with an interest rate of 0 (mostly Sundays and public holidays).
  7. Convert the date from str to datetime.

files_xls = [f for f in onlyfiles if f.endswith('.xls')]             # keep only the .xls files

all_file = {}

for j in files_xls:

    file = pd.read_excel(join(mypath, j)).dropna()                   # import the Excel file, drop rows containing NaN

    file.reset_index(inplace=True, drop=True)                        # reset the index

    for k in file.index:                                             # scan down the first column until a datetime appears;
        if isinstance(file.iloc[k, 0], datetime.datetime):           # the row just above it holds the real column names
            break
    file.columns = file.iloc[k - 1]                                  # rename the columns

    for k, i in enumerate(file[str(file.columns[0])]):               # drop the remaining description/header rows,
        if not isinstance(i, datetime.datetime):                     # i.e. any row whose date cell is not a datetime
            file.drop(k, inplace=True)

    file.drop(file[file[file.columns[-1]] == 0].index, inplace=True) # drop rows with interest rate 0, mainly weekends and holidays

    file.reset_index(drop=True, inplace=True)                        # reset the index again

    file['observation_date'] = pd.to_datetime(file['observation_date'])  # convert dates from str to datetime

    all_file[str(j)] = file                                          # store the clean DataFrame in the dictionary
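
A quick sanity check on one of the cleaned DataFrames (the 'DGS10.xls' key comes from the loop above):

all_file['DGS10.xls'].head()     # first rows of the cleaned 10-year series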
    

Merge columns from all the DataFrames

Using pandas.merge() with an outer join on the date field, and functools.reduce() to join all seven DataFrames in a single line of code.

result = reduce(lambda left, right: pd.merge(left, right , sort=False,
                                              on = ['observation_date'], 
                                              how = 'outer'), all_file.values())
result.tail(15)
1 observation_date DGS1 DGS10 DGS2 DGS3 DGS30 DGS3MO DGS5
14146 2018-08-22 2.43 2.82 2.6 2.65 2.99 2.09 2.7
14147 2018-08-23 2.43 2.82 2.61 2.66 2.97 2.08 2.72
14148 2018-08-24 2.44 2.82 2.63 2.68 2.97 2.09 2.72
14149 2018-08-27 2.47 2.85 2.67 2.7 3 2.12 2.74
14150 2018-08-28 2.47 2.88 2.67 2.73 3.03 2.13 2.77
14151 2018-08-29 2.48 2.89 2.67 2.75 3.02 2.13 2.78
14152 2018-08-30 2.47 2.86 2.64 2.72 3 2.11 2.75
14153 2018-08-31 2.46 2.86 2.62 2.7 3.02 2.11 2.74
14154 2018-09-04 2.49 2.9 2.66 2.73 3.07 2.13 2.78
14155 2018-09-05 2.49 2.9 2.66 2.72 3.08 2.14 2.77
14156 2018-09-06 2.5 2.88 2.64 2.71 3.06 2.13 2.76
14157 2018-09-07 2.53 2.94 2.71 2.78 3.11 2.14 2.82
14158 2018-09-10 2.54 2.94 2.73 2.78 3.09 2.14 2.83
14159 2018-09-11 2.55 2.98 2.76 2.83 3.13 2.15 2.87
14160 2018-09-12 2.56 2.97 2.74 2.82 3.11 2.16 2.87
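
Because the join is an outer join, the row order follows the merge rather than the calendar, and dates that exist in the longer series but not in the shorter ones come through as NaN. An optional cleanup sketch, if a strictly chronological and complete panel is needed (the name complete is my own):

result = result.sort_values('observation_date').reset_index(drop=True)   # put the dates back in calendar order
complete = result.dropna()                                               # keep only dates quoted in all seven series
complete.shape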

Plot Interactive Visualizations

  • For installation:
    $ pip install plotly==2.7.0 cufflinks

Using Plotly for Python, hover the mouse pointer over the chart to see the exact tooltip values. Enjoy the zoom by dragging a box over part of the graph to look at it more closely.

Enjoy!

A future version will have drop-down filters.

import plotly
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

y_col = [i for i in result.columns if i != 'observation_date']   # every yield column except the date

result.iplot(kind='scatter',x='observation_date',y= y_col, size=25)
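
To keep the interactive chart outside the notebook, cufflinks can return the figure object (via its asFigure flag) so that plotly's offline plot() writes it to a standalone HTML file. A small sketch; the output filename is my own choice:

fig = result.iplot(kind='scatter', x='observation_date', y=y_col, size=25, asFigure=True)
plot(fig, filename='treasury_yields.html')   # writes a self-contained interactive HTML file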

If plotly doesn’t work

Using matplotlib plot()

result.plot(x ='observation_date', y = y_col, figsize= (30,20))

<matplotlib.axes._subplots.AxesSubplot at 0x2188f0d9630>

[Figure: matplotlib line plot of all seven Treasury yield series]
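
Since DataFrame.plot() returns a Matplotlib Axes, the static chart can also be labelled; an optional sketch (the label text is my own choice):

ax = result.plot(x='observation_date', y=y_col, figsize=(30, 20))
ax.set_xlabel('Date')
ax.set_ylabel('Yield (%)')
plt.show()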

This post is still in progress…
