
<p>
    An ARIMA model requires a stationary time series, that is, one whose mean and variance stay roughly constant over time.
    Bitcoin, however, went from a few hundred dollars to tens of thousands of dollars in just a few years, so
    its prices clearly aren&apos;t stationary. To transform the price data into a stationary series, we take the first-order
    difference of the log of the prices, which gives the log returns. Given X as the Bitcoin closing prices, we use the following code:
</p>

<div class="section-example-container">
<pre class="python">
    import numpy as np
    X = np.diff(np.log(X))
</pre>
</div>
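<p>As a quick illustration of what this transform produces, here is a minimal sketch on made-up prices (the numbers are hypothetical, not Bitcoin data). It shows that each transformed value is the log of the ratio between consecutive prices and that the output is one element shorter than the input:</p>

```python
import numpy as np

# Hypothetical closing prices that grow by 10% each step.
prices = np.array([100.0, 110.0, 121.0, 133.1])

# First-order difference of the log prices, as in the transform above.
log_returns = np.diff(np.log(prices))

# Each entry equals log(p[t] / p[t-1]); the series is one element shorter.
expected = np.log(prices[1:] / prices[:-1])
```

<p>Because every step is a 10% rise, every transformed value is the same (log of 1.1), even though the price level keeps climbing. That is the sense in which the transform removes the trend in the level.</p>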

<p>Then, to check that the transformed data is stationary, we pass it into the following function, which runs an Augmented Dickey-Fuller (ADF) test:</p>

<div class="section-example-container">
<pre class="python">
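    def __is_stationary(self, X, significance_level=.05):
        # include above: from statsmodels.tsa.stattools import adfuller
        # ADF null hypothesis: the series has a unit root (is non-stationary)
        result = adfuller(X)
        p_value = result[1]  # adfuller returns (test statistic, p-value, ...)
        return p_value < significance_level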
</pre>
</div>

<p>After taking the most recent 400 points of Bitcoin closing prices and performing the transformation, the ADF test returns a p-value of 2.5e-16, far below our significance level of .05, so we reject the null hypothesis of non-stationarity. Since we only want to use stationary data with our ARIMA model, our algorithm stops trading when the recent historical data is no longer stationary after the transformation.</p>

<p>We then grid search different ARIMA orders to find the model that minimizes the Mean Squared Error (MSE) on unseen data. The ARIMA order is represented by (p, d, q), where p is the number of past values used for Auto-Regression (hence AR in ARIMA), d is the degree of differencing (or the order of Integration, hence I in ARIMA), and q is the number of past forecast errors accounted for in future predictions (which is a Moving Average model, hence MA in ARIMA). The possible p and q values range from 0 to 5 inclusive, while the d term is kept at 1. We iterate over each (p, d, q) combination and keep the one with the lowest MSE. Before we show the code for the grid search, let&rsquo;s see how we can evaluate an ARIMA model given a single order:</p>

<div class="section-example-container">
<pre class="python">
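    # include above: from collections import deque
    # include above: from statsmodels.tsa.arima_model import ARIMA
    #   (assumed: the legacy statsmodels ARIMA class, whose fit() accepts disp)
    def evaluate_arima_model(X, arima_order, oos_size=20):
        # hold out the last oos_size points for out-of-sample evaluation
        train_data, oos_data = X[:-oos_size], X[-oos_size:]
        # fixed-length window: appending a point drops the oldest one
        history = deque(train_data, maxlen=len(train_data))

        predictions = []
        for i in range(len(oos_data)):
            model = ARIMA(np.array(history), order=arima_order)
            model_fit = model.fit(disp=0)
            # one-step-ahead forecast
            y_hat = model_fit.forecast()[0]
            predictions.append(y_hat)
            history.append(oos_data[i])
        # include above: from sklearn import metrics
        return metrics.mean_squared_error(oos_data, predictions)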
</pre>
</div>
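<p>To see the rolling-window mechanics in isolation, the sketch below runs the same walk-forward loop with a stand-in forecaster instead of an ARIMA fit, so it needs only the standard library. The names <strong>walk_forward_mse</strong> and <strong>last_value_forecast</strong> and the toy series are hypothetical and are not part of the tutorial&apos;s code:</p>

```python
from collections import deque

def walk_forward_mse(series, forecast_fn, oos_size=20):
    """Walk-forward MSE of one-step forecasts over the last oos_size points."""
    train, oos = series[:-oos_size], series[-oos_size:]
    # Fixed-length window: appending a new point drops the oldest one.
    window = deque(train, maxlen=len(train))
    squared_errors = []
    for actual in oos:
        y_hat = forecast_fn(list(window))   # one-step-ahead forecast
        squared_errors.append((actual - y_hat) ** 2)
        window.append(actual)               # slide the window right
    return sum(squared_errors) / len(squared_errors)

def last_value_forecast(history):
    # Stand-in for an ARIMA fit: predict that the next value repeats the last.
    return history[-1]

# Toy series that cycles 0, 1, 2, 3, 4, 0, 1, ...
series = [float(i % 5) for i in range(60)]
mse = walk_forward_mse(series, last_value_forecast)
```

<p>Passing the forecaster in as a function keeps the windowing logic separate from the model, which makes the loop easy to check before plugging in a slower ARIMA fit.</p>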

<p>In <strong>evaluate_arima_model</strong>, we start with a rolling window that covers all but the last <strong>oos_size</strong> points (20 by default), and fit an ARIMA model with the given (p, d, q) order to the points in that window. We forecast one time-step into the future and record this value along with the actual value. Then, we shift the window one step to the right and repeat the last step. We repeat this process until we&rsquo;ve forecasted all of the held-out values, and then we compute the MSE by plugging the forecasted and actual values into sklearn&rsquo;s <strong>mean_squared_error</strong> function. Now for the grid search:</p>

<div class="section-example-container">
<pre class="python">
    def Train(X, p_values=range(6), d_values=[1], q_values=range(6)):
        data = transform_data(X)

        if not is_stationary(data):
            return None

        best_score, best_pdq = float("inf"), None
        for p in p_values:
            for d in d_values:
                for q in q_values:
                    order = (p,d,q)
                    try:
                        mse = evaluate_arima_model(data, order)
                        if mse < best_score:
                            best_score, best_pdq = mse, order
                    except Exception:
                        # skip orders for which the model fails to fit
                        continue

        return best_pdq
</pre>
</div>

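<p>As described earlier, we evaluate every (p, d, q) combination and keep the order with the lowest out-of-sample MSE. If the transformed data isn&apos;t stationary, <strong>Train</strong> returns <strong>None</strong> instead of an order.</p>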
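<p>The same search can be written more compactly with <strong>itertools.product</strong>. The sketch below is one possible formulation, not the tutorial&apos;s implementation: <strong>grid_search</strong> and <strong>toy_score</strong> are hypothetical names, and the scoring function is a toy stand-in for the out-of-sample MSE so the example runs without fitting any model:</p>

```python
from itertools import product

def grid_search(score_fn, p_values=range(6), d_values=(1,), q_values=range(6)):
    """Return the (p, d, q) order with the lowest score, or None if all fail."""
    best_score, best_order = float("inf"), None
    for order in product(p_values, d_values, q_values):
        try:
            score = score_fn(order)
        except Exception:
            continue  # skip orders that cannot be evaluated
        if score < best_score:
            best_score, best_order = score, order
    return best_order

def toy_score(order):
    # Hypothetical score with its minimum at (2, 1, 3); one order raises
    # to mimic a model that fails to fit.
    p, d, q = order
    if order == (0, 1, 0):
        raise ValueError("fit failed")
    return (p - 2) ** 2 + (q - 3) ** 2

best = grid_search(toy_score)
```

<p>With the default ranges this visits 36 orders. Keeping the comparison outside the <strong>try</strong> block means only evaluation failures are swallowed, not bugs in the bookkeeping.</p>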

<p>
    The trading logic is quite simple. At the start of each month, we take the past 70 points of data to find the ARIMA
    order that minimizes the out-of-sample MSE. Every day, we fit an ARIMA model with that month&apos;s order to the 50 most
    recent points of historical data and forecast one value into the future. Because our data was
    transformed through a logarithm and differencing, we need to undo the transformation on the forecasted value
    first; this can be seen in the <strong>__undo_forecast_transform</strong> method in <strong>Model.py</strong>
    under the <strong>Algorithm</strong> section. We then calculate the percent change in price using:
    <strong>forecasted price / current price - 1</strong>. Then, we emit an Insight whose direction follows the sign of the percent
    change and whose weight is the absolute value of the percent change. We use the
    <strong>InsightWeightPortfolioConstructionModel</strong> so that the weight of the Insight determines the portfolio
    allocation percentage, which means larger forecasted moves receive larger allocations.
</p>
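<p>The arithmetic behind the last few steps can be sketched as follows. This is not the code of <strong>__undo_forecast_transform</strong>, which is not shown here. It assumes the forecast is a single log return (one log followed by one difference), and the helper names and the plain string used for the direction are hypothetical:</p>

```python
import math

def undo_log_diff(forecast_log_return, current_price):
    """Map a forecasted log return back to a price level (assumed transform)."""
    # log(P_next) - log(P_now) = r  =>  P_next = P_now * exp(r)
    return current_price * math.exp(forecast_log_return)

def insight_direction_and_weight(forecast_log_return, current_price):
    forecasted_price = undo_log_diff(forecast_log_return, current_price)
    # Percent change as defined in the text: forecasted / current - 1
    pct_change = forecasted_price / current_price - 1
    direction = "up" if pct_change > 0 else "down" if pct_change < 0 else "flat"
    # The weight is the size of the forecasted move, regardless of sign.
    return direction, abs(pct_change)

# A forecasted log return equivalent to a 2% rise from a price of 10,000.
direction, weight = insight_direction_and_weight(math.log(1.02), 10_000.0)
```

<p>Under this assumption the current price cancels out of the percent change, so the weight depends only on the forecasted log return.</p>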