Identify correlated stocks from F&O stock list

Go back to index

Finding a needle in haystack

There are more than 1600 companies listed in NSE, which gives you more than 1.2 million potential pairs. This is not a very helpful number, even for the computer it might become computationally challenging to handle such a big number. So we need to apply series of fileters to narrow down our search. Here’s what I’m going to do-

1. Narrow down the universe - Consider stocks traded in F&O only

Pair trading strategies require you to go long on one stock and short on the other. The only way we can short a stock right now is in futures market (as far as I know). So we can execute pair trading only for stocks in F&O and that’ll be our search universe.

You can download the list from this link and save in the project folder - https://archives.nseindia.com/content/fo/fo_mktlots.csv)

2. Filter highly correlated stocks

We calculate correlation of every stock with every other stock in F&O list and then filter all correlations above a threshold. You can decide the threshold by trial and error. Higher the threshold, fewer the pairs and vice verca. So you use your judgement for this.

3. Manual filtering of the pair

Filter the list further with criterias like industry, sector and co-dependancy.

Install dependencies

Let us install jugaad-data for downloading data, pandas for all the analytics and seaborn for plotting

!pip install pandas jugaad-data

Download the list from NSE’s website and place it in the same folder.

import csv
# Gather all symbols from CSV
with open('fo_mktlots.csv') as fp:
    reader = csv.reader(fp)
    rows = list(reader)
    # stocks start from 5th row and symbols are in 2nd column
    fno_stocks = [row[1].lstrip().rstrip() for row in rows[4:]]
print("Stocks in F&O: {}".format(len(fno_stocks)))
Stocks in F&O: 139

Download the data

Considering it has been a choppy year due to Covid, let us run the experiment for last year. We will use data from first half of the 2019 (Jan to Jun) to derive ratios and second half of the 2019 to test our hypothesis

from datetime import date
from jugaad_data import nse
import pandas as pd
from_date = date(2019,1,1)
to_date = date(2019,6,30)
dfs = []
for stock in fno_stocks:
    df = nse.stock_df(stock, from_date, to_date)
    dfs.append(df)
    print("#", end='')
print()
print(dfs[0].head())
###########################################################################################################################################
        DATE SERIES     OPEN     HIGH      LOW  PREV. CLOSE     LTP    CLOSE  \
0 2019-06-28     EQ  1324.90  1382.00  1319.30      1312.45  1365.0  1359.20   
1 2019-06-27     EQ  1310.55  1322.95  1292.60      1307.95  1313.1  1312.45   
2 2019-06-26     EQ  1320.00  1325.05  1301.05      1324.85  1305.5  1307.95   
3 2019-06-25     EQ  1345.00  1351.45  1309.95      1344.65  1328.6  1324.85   
4 2019-06-24     EQ  1353.90  1357.10  1330.05      1348.25  1345.0  1344.65   

      VWAP   52W H  52W L   VOLUME         VALUE  NO OF TRADES      SYMBOL  
0  1355.70  1406.0  910.1  1568379  2.126244e+09         61027  APOLLOHOSP  
1  1309.95  1406.0  910.1   688891  9.024130e+08         26612  APOLLOHOSP  
2  1309.64  1406.0  910.1   597977  7.831336e+08         30659  APOLLOHOSP  
3  1326.99  1406.0  910.1  1156668  1.534883e+09         54399  APOLLOHOSP  
4  1339.41  1406.0  910.1   420849  5.636875e+08         30942  APOLLOHOSP  

Data wrangling

Combine stock close of all fno_stocks in a sinle dataframe-

all_stock_df = pd.DataFrame()
for i, df in enumerate(dfs):
    all_stock_df[fno_stocks[i]] = df['CLOSE']
print(all_stock_df.head())
   APOLLOHOSP  AMBUJACEM  BANDHANBNK  AUROPHARMA     BEL  FEDERALBNK    BPCL  \
0     1359.20     212.85      538.75      608.05  112.50      108.40  392.45   
1     1312.45     218.45      537.80      612.25  114.95      108.55  392.20   
2     1307.95     219.35      535.05      621.20  114.40      104.45  391.40   
3     1324.85     215.15      527.70      599.70  113.90      103.95  393.10   
4     1344.65     215.00      533.90      586.30  114.75      102.90  380.85   

   CADILAHC  ADANIENT  CUMMINSIND  ...   WIPRO   L&TFH  NATIONALUM    PNB  \
0    241.70    150.50      764.05  ...  280.50  115.15       49.40  79.55   
1    237.85    152.85      775.60  ...  282.10  116.10       50.30  78.00   
2    238.90    153.80      770.55  ...  286.15  115.45       50.05  78.90   
3    230.40    150.85      757.45  ...  285.30  113.15       49.05  77.10   
4    229.05    147.75      754.95  ...  283.90  113.75       48.90  76.85   

   POWERGRID  BAJFINANCE  RECLTD  MUTHOOTFIN  KOTAKBANK  SIEMENS  
0     206.90     3681.10   164.8      644.80     1477.1  1311.35  
1     207.70     3642.20   163.0      640.05     1481.6  1294.20  
2     209.65     3627.95   164.0      640.05     1490.6  1253.10  
3     201.60     3579.85   160.9      639.45     1483.9  1251.35  
4     199.05     3545.35   155.8      614.65     1470.6  1247.85  

[5 rows x 139 columns]

Calculate correlations

stock_corr = all_stock_df.corr()
print(stock_corr.head())
            APOLLOHOSP  AMBUJACEM  BANDHANBNK  AUROPHARMA       BEL  \
APOLLOHOSP    1.000000  -0.098696   -0.114040   -0.314229  0.437795   
AMBUJACEM    -0.098696   1.000000    0.543459    0.097391  0.492189   
BANDHANBNK   -0.114040   0.543459    1.000000   -0.326436  0.566101   
AUROPHARMA   -0.314229   0.097391   -0.326436    1.000000 -0.677914   
BEL           0.437795   0.492189    0.566101   -0.677914  1.000000   

            FEDERALBNK      BPCL  CADILAHC  ADANIENT  CUMMINSIND  ...  \
APOLLOHOSP    0.419438  0.084221 -0.198073  0.513364    0.603441  ...   
AMBUJACEM     0.479027  0.642434  0.028189  0.470602   -0.001393  ...   
BANDHANBNK    0.712406  0.604799 -0.500142  0.221851   -0.369125  ...   
AUROPHARMA   -0.568241 -0.421526  0.882661 -0.355406    0.065054  ...   
BEL           0.915799  0.782225 -0.667126  0.730377    0.130452  ...   

               WIPRO     L&TFH  NATIONALUM       PNB  POWERGRID  BAJFINANCE  \
APOLLOHOSP  0.054274 -0.069220    0.279340 -0.109935   0.449659    0.285503   
AMBUJACEM  -0.732046  0.539610   -0.046655  0.772385   0.457006    0.409431   
BANDHANBNK -0.633067 -0.123392   -0.551747  0.429121   0.196804    0.735177   
AUROPHARMA  0.090131  0.658168    0.557208  0.385672  -0.107277   -0.694677   
BEL        -0.621080 -0.154813   -0.339003  0.274645   0.585579    0.879276   

              RECLTD  MUTHOOTFIN  KOTAKBANK   SIEMENS  
APOLLOHOSP  0.158146    0.143575   0.200802  0.478500  
AMBUJACEM   0.554389    0.655019   0.340996  0.456619  
BANDHANBNK  0.639491    0.727082   0.740476  0.712231  
AUROPHARMA -0.390314   -0.439768  -0.662226 -0.574784  
BEL         0.736498    0.838437   0.792487  0.894408  

[5 rows x 139 columns]

Filter based on threshold

I’m setting a very high threshold of 0.95 and I’m getting below output. You can reduce the threshold and you will see lot many pairs. One more thing to note is you’ll see pairs repeating, this is because corr(a, b) = corr(b, a).

threshold = 0.95
for stock in fno_stocks[:len(fno_stocks)]:
    k = stock_corr[(stock_corr[stock] > threshold) & (stock_corr[stock] < 1)][stock]
    if k.any():
        print(k.to_frame().T)
        print('------------------------------------\n\n')
           LT
BEL  0.966845
------------------------------------


                SBIN
FEDERALBNK  0.957604
------------------------------------


          TORNTPHARM
CADILAHC    0.954483
------------------------------------


          SHREECEM  BAJAJFINSV  BAJFINANCE
HDFCBANK  0.950064    0.961924    0.962466
------------------------------------


         BEL
LT  0.966845
------------------------------------


            CADILAHC
TORNTPHARM  0.954483
------------------------------------


      ULTRACEMCO
ONGC    0.952448
------------------------------------


          HDFCBANK  BAJAJFINSV  RAMCOCEM  ULTRACEMCO  BAJFINANCE
SHREECEM  0.950064     0.97875  0.964504    0.950736    0.954447
------------------------------------


            HDFCBANK  SHREECEM  BAJFINANCE
BAJAJFINSV  0.961924   0.97875     0.97229
------------------------------------


            SRF  BAJFINANCE
TITAN  0.973173    0.964968
------------------------------------


          EQUITAS
UJJIVAN  0.956955
------------------------------------


          UJJIVAN
EQUITAS  0.956955
------------------------------------


     RBLBANK
ACC   0.9531
------------------------------------


          SHREECEM
RAMCOCEM  0.964504
------------------------------------


            ACC
RBLBANK  0.9531
------------------------------------


      FEDERALBNK
SBIN    0.957604
------------------------------------


        TITAN  BAJFINANCE
SRF  0.973173    0.969878
------------------------------------


                ONGC  SHREECEM
ULTRACEMCO  0.952448  0.950736
------------------------------------


            HDFCBANK  SHREECEM  BAJAJFINSV     TITAN       SRF  KOTAKBANK
BAJFINANCE  0.962466  0.954447     0.97229  0.964968  0.969878   0.950621
------------------------------------


           BAJFINANCE
KOTAKBANK    0.950621
------------------------------------

Analyse the output and filter further

Here comes the human judgment part and trying to interprete the number with other factors like are these pairs in same sector, are they dependent, who are their customers etc.

So lets identify pairs in same industry, this should be easy

Industry Pair
Pharma TORNTPHARM, CADILAHC
Financial Services HDFCBANK, BAJAJFINSV, BAJFINANCE
Financial Services UJJIVAN, EQUITAS
Financial Services SBIN, FEDERALBNK
Financial Services KOTAK, BAJAJFINANCE
Cement Industry ULTRACEMCO, SHREECEM
Cement Industry RAMCOCEM, SHREECEM
Heavy Industry LT, BEL

Let us further try to analyse some of the interesting findings-

LT and BEL

BEL is a government owned aerospace and defence company, even L&T is a big player in aerospace and defence, but L&T is doing lot of other things too, but both have a lot of dependency on governement policies, government expenditure etc, so there might be a substance to this pair and worth exploring further, hence included in the list.

Correlation of Banks with cement companies or companies like TITAN, SRF

Several cement stocks are highly correlated with Banks, now this might be a genuine relation. This might need more expert analysis to validate the reason behind correlation. When I think about it, Banks might show fairly high correlation to many stocks as their performance directly impacts the economy as a whole, so in context of pair trading these might be ignored.

(ONGC, ULTRACEMCO) or (SRF and TITAN)

Even though these pairs have a high correlation, the industries/sectors are completely different. Either this is an anomaly or I don’t have enough knowledge or experites to understand the relation

Next step

Go back to index

Download notebook