{ "cells": [ { "cell_type": "raw", "metadata": {}, "source": [ "---\n", "title: Identify correlated stocks from F&O stock list\n", "summary: Download data for all the stocks traded in F&O, calculate the correlation matrix on close price and finally filter stocks with high correlation\n", "tags: [\"jugaad-data\", \"strategies\"]\n", "categories: [\"posts\"]\n", "date: 2020-08-25T02:02:57Z\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [Go back to index](../)\n", "\n", "## Finding a needle in haystack\n", "There are more than 1600 companies listed in NSE, which gives you more than 1.2 million potential pairs. This is not a very helpful number, even for the computer it might become computationally challenging to handle such a big number. So we need to apply series of fileters to narrow down our search. Here's what I'm going to do-\n", "\n", "### 1. Narrow down the universe - Consider stocks traded in F&O only\n", "Pair trading strategies require you to go long on one stock and short on the other. The only way we can short a stock right now is in futures market (as far as I know). So we can execute pair trading only for stocks in F&O and that'll be our search universe.\n", "\n", "You can download the list from this link and save in the project folder - https://archives.nseindia.com/content/fo/fo_mktlots.csv)\n", "\n", "### 2. Filter highly correlated stocks\n", "We calculate correlation of every stock with every other stock in F&O list and then filter all correlations above a threshold. You can decide the threshold by trial and error. Higher the threshold, fewer the pairs and vice verca. So you use your judgement for this.\n", "\n", "### 3. Manual filtering of the pair\n", "If required, Filter the list further with criterias like industry, sector and co-dependancy.\n", "\n", "So lets get started\n", "\n", "## Install dependencies\n", "Let us install `jugaad-data` for downloading data, `pandas` for all the analytics and `seaborn` for plotting" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install pandas jugaad-data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Download the list from NSE's website](https://archives.nseindia.com/content/fo/fo_mktlots.csv) and place it in the same folder." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Stocks in F&O: 139\n" ] } ], "source": [ "import csv\n", "# Gather all symbols from CSV\n", "with open('fo_mktlots.csv') as fp:\n", " reader = csv.reader(fp)\n", " rows = list(reader)\n", " # stocks start from 5th row and symbols are in 2nd column\n", " fno_stocks = [row[1].lstrip().rstrip() for row in rows[4:]]\n", "print(\"Stocks in F&O: {}\".format(len(fno_stocks)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download the data\n", "\n", "Considering it has been a choppy year due to Covid, let us run the experiment for last year. We will use data from first half of the 2019 (Jan to Jun) to derive ratios and second half of the 2019 to test our hypothesis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "###########################################################################################################################################\n", " DATE SERIES OPEN HIGH LOW PREV. CLOSE LTP CLOSE \\\n", "0 2019-06-28 EQ 1324.90 1382.00 1319.30 1312.45 1365.0 1359.20 \n", "1 2019-06-27 EQ 1310.55 1322.95 1292.60 1307.95 1313.1 1312.45 \n", "2 2019-06-26 EQ 1320.00 1325.05 1301.05 1324.85 1305.5 1307.95 \n", "3 2019-06-25 EQ 1345.00 1351.45 1309.95 1344.65 1328.6 1324.85 \n", "4 2019-06-24 EQ 1353.90 1357.10 1330.05 1348.25 1345.0 1344.65 \n", "\n", " VWAP 52W H 52W L VOLUME VALUE NO OF TRADES SYMBOL \n", "0 1355.70 1406.0 910.1 1568379 2.126244e+09 61027 APOLLOHOSP \n", "1 1309.95 1406.0 910.1 688891 9.024130e+08 26612 APOLLOHOSP \n", "2 1309.64 1406.0 910.1 597977 7.831336e+08 30659 APOLLOHOSP \n", "3 1326.99 1406.0 910.1 1156668 1.534883e+09 54399 APOLLOHOSP \n", "4 1339.41 1406.0 910.1 420849 5.636875e+08 30942 APOLLOHOSP \n" ] } ], "source": [ "from datetime import date\n", "from jugaad_data import nse\n", "import pandas as pd\n", "from_date = date(2019,1,1)\n", "to_date = date(2019,6,30)\n", "dfs = []\n", "for stock in fno_stocks:\n", " df = nse.stock_df(stock, from_date, to_date)\n", " dfs.append(df)\n", " print(\"#\", end='')\n", "print()\n", "print(dfs[0].head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data preparation\n", "\n", "In [varity article](https://zerodha.com/varsity/chapter/pair-stats/) there are two methods of calculating correlation-\n", "\n", "1. Based on daily closing price\n", "2. **Based on daily return series**\n", "\n", "We are going to calculate returns based on daily returns as recommended in the varsity article. I have tried the first approach as well, its output had a lot of noise, it provided high scores for seemingly uncorrelated stocks also, so use the first method with caution." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " APOLLOHOSP AMBUJACEM BANDHANBNK AUROPHARMA BEL FEDERALBNK \\\n", "0 0.035620 -0.025635 0.001766 -0.006860 -0.021314 -0.001382 \n", "1 0.003440 -0.004103 0.005140 -0.014408 0.004808 0.039253 \n", "2 -0.012756 0.019521 0.013928 0.035851 0.004390 0.004810 \n", "3 -0.014725 0.000698 -0.011613 0.022855 -0.007407 0.010204 \n", "4 -0.002670 0.010101 0.005935 -0.026807 0.029148 0.000486 \n", "\n", " BPCL CADILAHC ADANIENT CUMMINSIND ... WIPRO L&TFH \\\n", "0 0.000637 0.016187 -0.015375 -0.014892 ... -0.005672 -0.008183 \n", "1 0.002044 -0.004395 -0.006177 0.006554 ... -0.014153 0.005630 \n", "2 -0.004325 0.036892 0.019556 0.017295 ... 0.002979 0.020327 \n", "3 0.032165 0.005894 0.020981 0.003311 ... 0.004931 -0.005275 \n", "4 0.002237 -0.025526 0.006471 0.018963 ... -0.006822 -0.003940 \n", "\n", " NATIONALUM PNB POWERGRID BAJFINANCE RECLTD MUTHOOTFIN \\\n", "0 -0.017893 0.019872 -0.003852 0.010680 0.011043 0.007421 \n", "1 0.004995 -0.011407 -0.009301 0.003928 -0.006098 0.000000 \n", "2 0.020387 0.023346 0.039931 0.013436 0.019267 0.000938 \n", "3 0.003067 0.003253 0.012811 0.009731 0.032734 0.040348 \n", "4 -0.005086 -0.000650 -0.004252 -0.005749 0.007762 -0.014194 \n", "\n", " KOTAKBANK SIEMENS \n", "0 -0.003037 0.013251 \n", "1 -0.006038 0.032799 \n", "2 0.004515 0.001398 \n", "3 0.009044 0.002805 \n", "4 -0.012888 0.006209 \n", "\n", "[5 rows x 139 columns]\n" ] } ], "source": [ "all_stock_df = pd.DataFrame()\n", "for i, df in enumerate(dfs):\n", " all_stock_df[fno_stocks[i]] = (df['CLOSE'] - df['PREV. CLOSE'])/df['PREV. CLOSE']\n", "print(all_stock_df.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate correlations\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " APOLLOHOSP AMBUJACEM BANDHANBNK AUROPHARMA BEL \\\n", "APOLLOHOSP 1.000000 0.320794 0.151532 0.024116 0.305902 \n", "AMBUJACEM 0.320794 1.000000 0.217605 0.312233 0.437229 \n", "BANDHANBNK 0.151532 0.217605 1.000000 0.092184 0.137484 \n", "AUROPHARMA 0.024116 0.312233 0.092184 1.000000 0.193852 \n", "BEL 0.305902 0.437229 0.137484 0.193852 1.000000 \n", "\n", " FEDERALBNK BPCL CADILAHC ADANIENT CUMMINSIND ... \\\n", "APOLLOHOSP 0.260664 0.205749 0.156552 0.317534 0.313665 ... \n", "AMBUJACEM 0.365048 0.445099 0.248823 0.347647 0.379015 ... \n", "BANDHANBNK 0.102062 0.218680 0.169849 0.305171 0.120858 ... \n", "AUROPHARMA 0.182673 0.236757 0.516735 0.186846 0.077641 ... \n", "BEL 0.565289 0.405377 0.245842 0.439243 0.349208 ... \n", "\n", " WIPRO L&TFH NATIONALUM PNB POWERGRID BAJFINANCE \\\n", "APOLLOHOSP 0.072392 0.287634 0.328091 0.398197 0.044383 0.214700 \n", "AMBUJACEM 0.020011 0.524240 0.332308 0.431475 0.326298 0.363156 \n", "BANDHANBNK -0.078667 0.365273 0.142910 0.242020 0.105363 0.243383 \n", "AUROPHARMA 0.066967 0.349406 0.131476 0.206744 0.083969 0.150650 \n", "BEL -0.061190 0.509221 0.445504 0.509757 0.260057 0.399151 \n", "\n", " RECLTD MUTHOOTFIN KOTAKBANK SIEMENS \n", "APOLLOHOSP 0.189239 0.186098 0.056665 0.306106 \n", "AMBUJACEM 0.206820 0.377478 0.253772 0.498506 \n", "BANDHANBNK 0.103857 0.244627 0.161533 0.313044 \n", "AUROPHARMA 0.183215 0.295931 0.033207 0.344312 \n", "BEL 0.308352 0.415240 0.201867 0.393015 \n", "\n", "[5 rows x 139 columns]\n" ] } ], "source": [ "stock_corr = all_stock_df.corr()\n", "print(stock_corr.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter based on threshold\n", "I'm setting a threshold of 0.7 and I'm getting below output. You can reduce the threshold and you will see lot many pairs. One more thing to note is you'll see pairs repeating, this is because `corr(a, b) = corr(b, a)`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ACC\n", "AMBUJACEM 0.806813\n", "------------------------------------\n", "\n", "\n", " HINDPETRO IOC\n", "BPCL 0.798469 0.702593\n", "------------------------------------\n", "\n", "\n", " JSWSTEEL\n", "JINDALSTEL 0.722094\n", "------------------------------------\n", "\n", "\n", " BAJFINANCE\n", "BAJAJFINSV 0.702545\n", "------------------------------------\n", "\n", "\n", " EQUITAS\n", "UJJIVAN 0.73688\n", "------------------------------------\n", "\n", "\n", " UJJIVAN\n", "EQUITAS 0.73688\n", "------------------------------------\n", "\n", "\n", " JINDALSTEL\n", "JSWSTEEL 0.722094\n", "------------------------------------\n", "\n", "\n", " BANKBARODA SBIN PNB\n", "CANBK 0.794729 0.701325 0.803416\n", "------------------------------------\n", "\n", "\n", " AMBUJACEM\n", "ACC 0.806813\n", "------------------------------------\n", "\n", "\n", " L&TFH\n", "M&MFIN 0.732753\n", "------------------------------------\n", "\n", "\n", " BPCL IOC\n", "HINDPETRO 0.798469 0.738071\n", "------------------------------------\n", "\n", "\n", " BPCL HINDPETRO\n", "IOC 0.702593 0.738071\n", "------------------------------------\n", "\n", "\n", " ULTRACEMCO\n", "RAMCOCEM 0.729297\n", "------------------------------------\n", "\n", "\n", " CANBK PNB\n", "BANKBARODA 0.794729 0.774867\n", "------------------------------------\n", "\n", "\n", " CANBK\n", "SBIN 0.701325\n", "------------------------------------\n", "\n", "\n", " RAMCOCEM\n", "ULTRACEMCO 0.729297\n", "------------------------------------\n", "\n", "\n", " M&MFIN\n", "L&TFH 0.732753\n", "------------------------------------\n", "\n", "\n", " CANBK BANKBARODA\n", "PNB 0.803416 0.774867\n", "------------------------------------\n", "\n", "\n", " BAJAJFINSV\n", "BAJFINANCE 0.702545\n", "------------------------------------\n", "\n", "\n" ] } ], "source": [ "threshold = 0.7\n", "for stock in fno_stocks[:len(fno_stocks)]:\n", " k = stock_corr[(stock_corr[stock] > threshold) & (stock_corr[stock] < 1)][stock]\n", " if k.any():\n", " print(k.to_frame().T)\n", " print('------------------------------------\\n\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyse the output\n", "\n", "Below are the pairs identified-\n", "\n", "1. ACC and AMBUJACEM\n", "2. BPCL, HINDPETRO and IOC\n", "3. JINDALSTEL and JSWSTEEL\n", "4. BAJFINANCE and BAJAJFINSV\n", "5. UJJIVAN and EQUITAS\n", "6. BANKBARODA, SBIN, PNB and CANBK\n", "7. L&TFH and M&MFIN\n", "8. RAMCOCEM and ULTRACEMCO\n", "\n", "I dont see any anomaly in this particular output, but you should always validate the output before further exploration.\n", "\n", "### [Next step](../pair-stats/)\n", "\n", "### [Go back to index](../)\n", "\n", "### [Download notebook](../identify-stocks.ipynb)\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }