Navigating Bitcoin Panic-Selling using Linear Approach

COVID-19 has affected much of human activity around the globe, including Bitcoin prices. The Bitcoin price is well known for its volatility, so it is no surprise that panic-selling occurred during the pandemic. However, the mechanism to cope with these breakouts, especially the bearish ones, is contentious. Experts give numerous pieces of advice that lead to different conclusions. The same holds for Machine Learning: various kernels show different results regarding how the price will move, depending on the window size, how the data is preprocessed, and the algorithm used. This paper inspects the best combination that various machine learning kernels with a linear approach can offer for price prediction, covering the depth interval, the window size, and the algorithms themselves. This paper also proposes a new way of viewing the prediction range, called s-steps ahead prediction, using a linear model. The result shows that simple machine learning can yield a 99.715% profit even during a bearish breakout.


I. INTRODUCTION
Since the Bitcoin market is live 24/7, it offers real-time dynamics that the stock market cannot [1]. Moreover, it is easier to access, which results in high volatility in its movement [2]. Over a short time interval, it is quite hard to foresee where the market is heading, and in the long term it gets even more confusing [3]-[5]. This is understandable because Bitcoin, by its very nature, is unlike the stock market, which has a real underlying commodity to offer. Bitcoin offers a bank-free ecosystem in which everyone can transact their money in a decentralized way [6].
People often talk about how the holder behaves when the stress test occurs [7]. During breakouts, the various big players may test the market by pushing it to its resistance level or support level. When it breaks, the market could go in different directions, mostly in a violent way. But there is a catch; in every extreme turn up or turn down, there will always be a dead cat bounce effect.
Based on Fig.1, two dead cat bounce effects happened in the transactions between March 15th, 2019, and August 29th, 2019 [8]. Significant drops occurred during this period. People who have held great expectations since the start of the bullish trend might get scared by a sudden drop on the second wave of the dead cat bounce effect, since the price might never return to its highest position again. It also allows people who did not buy at a lower price to sell again at a higher price in the last period of the dead cat bounce effect, as can be seen in Fig.1. COVID-19 unintentionally caused a massive breakout, resulting in panic-selling [9]-[11]. This paper investigates various Machine Learning techniques with a linear approach to navigate this unexpected movement [12].
This paper assumes that every element of the time series contributes to how people behave and decide whether to buy or sell the coin. People use Technical Analysis to plan their portfolios based on this same assumption. The only problem is that Technical Analysis is prone to human subjectivity. If there are patterns in every decision made on the market, then it is not strange that there is some correlation between a particular price position and the next price movement.
Research by [14] achieved a 5.36% mean absolute percentage error (MAPE) in predicting the Bitcoin price using an autoregressive integrated moving average (ARIMA). On the other hand, research by [15] classified Bitcoin trends using ARIMA, Long Short-Term Memory (LSTM), and a recurrent neural network (RNN) with accuracies of 52.78%, 50.25%, and 50.05%, respectively. Research by [16] also found that Twitter significantly affects the next day's trading volume. This paper proposes a new way of viewing the prediction range, which we call s-steps ahead prediction, using a linear model. This paper also proposes a new approach for preprocessing the raw data based on the Fast Fourier Transform and Particle Swarm Optimization behavior. We also conduct a window size inspection to get a better understanding of how to analyze the Bitcoin market.

II. RESEARCH METHOD
This paper uses several phases to build the kernel. The first phase is the randomness and slope test, which gives a basic idea of how the market has flowed over the past four years. The second phase is a short periodic test to grasp what kind of model is suitable for the prediction. The last phase, called backtracking, deploys full-force prediction using the hints shown by the second phase.

A. Random Test
Four random distributions are used to check the predictability of the market: the Uniform, Exponential, Logistic, and Poisson distributions [17]-[20]. The Uniform distribution follows the probability density function $p(x) = (b - a)^{-1}$ within $[a, b)$ and 0 elsewhere. The Exponential distribution follows the probability density function $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$. The Logistic distribution follows the probability density function $f(x; \mu, \sigma) = e^{-(x-\mu)/\sigma} / \left(\sigma \left(1 + e^{-(x-\mu)/\sigma}\right)^2\right)$. Last, the Poisson distribution follows the probability mass function $f(k; \lambda) = \lambda^k e^{-\lambda} / k!$.
If the market is well predicted using a particular random prediction, then the market trend is followed by these distributions. If all the distributions fail to predict the market, then there is a chance that the market is not random, after all.

B. Slope Test
The slope can be defined as how large a change is. In mathematical terms, it can be written as $m = (p_{t+\Delta} - p_t)/\Delta$, where $p_t$ is the price at time $t$ and $\Delta$ is the time step. We might consider the slope at a specific point of a function $f(t)$ as $f'(t)$. However, it is not feasible to take the derivative, or the slope, of a market over a time range of nearly 0 seconds, $\lim \Delta \to 0$. Over a small time depth, the market's movement is cluttered with noise that makes the price fluctuate temporarily. Thus, it is more sensible to take the slope with a larger $\Delta$, which minimizes the noise effect in the market [21].

C. Linear Models
Let's assume the movement pattern is consistent over time. If $\bar{x}$ is the price series over a specific time range, then there must be a vector $\bar{w}$ that satisfies $y = w_0 + \sum_{i} w_{i+1} f(\bar{x}, i) + \epsilon$, where $y$ is the predicted price, $f$ is the function model, and $\epsilon$ is the error. The objective is to find the $\bar{w}$ for which $\epsilon$ holds the minimum value. Since $\bar{w}$ will be constant, $\bar{w}$ can also be considered the pattern of the market pair. In this paper, the model that satisfies this equation will also be called the kernel [22] [23].
Five kernels will be used to check the regularity of the movements: 1. Multi-task Lasso (MTL), 2. Lasso LARS (LLC), 3. Huber Regressor (HR), 4. Multi-task ElasticNet (MTLENC), and 5. Ridge Regression (Ridge). These kernels will be tested for accuracy on the 4-hour interval market movement. The five kernels' performance will be evaluated progressively on this market data. If a kernel does not perform well during the test, it is excluded from further schemes. Then, the kernel with plausible predictions is chosen for the full-force automatic daily trading simulation. In this paper, all kernels use the cross-validation technique, except Huber and Ridge Regression.
a) Multi-task Lasso: In Multi-task Lasso, the objective function to minimize can be expressed as follows.
$L(W, \lambda) = \|XW - Y\|_{\mathrm{Fro}}^2 + \lambda \|W\|_{21}$
b) Lasso LARS: This regression can overcome multicollinearity [25]. In the case of this kernel, the Least Angle Regression and Shrinkage (LARS) algorithm is used to estimate the Lasso parameters effectively [26] [27]. However, since this algorithm is based on iterative computation of the residuals, it might not be robust in the presence of noise [28].
c) Huber Regressor: Huber Regression treats the samples with two different loss functions, the squared loss $\ell_2$ and the absolute loss $\ell_1$: $L(y, \hat{y}) = (y - \hat{y})^2$ when the residual is less than or equal to $h$, and $L(y, \hat{y}) = |y - \hat{y}|$ when the residual is higher than $h$, where $h$ is the hyperparameter. Thus, for losses that fall into the $\ell_2$ category, this regression behaves similarly to the normal distribution. However, since this regression technique behaves similarly to the Laplace distribution for losses higher than $h$, this very nature makes Huber Regression robust to outliers [29].
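The two-branch loss described above can be illustrated with a minimal sketch. It follows the simplified piecewise form given in the text; the standard Huber loss additionally rescales the linear branch so that both pieces meet at $h$. The function names, the sample values, and the threshold `h=1.0` are illustrative assumptions, not taken from the paper.

```python
def piecewise_loss(y_true, y_pred, h=1.0):
    """Squared loss for small residuals, absolute loss beyond the threshold h."""
    residual = abs(y_true - y_pred)
    if residual <= h:
        return residual ** 2   # l2 branch: behaves like ordinary least squares
    return residual            # l1 branch: grows only linearly for outliers


def total_loss(targets, predictions, h=1.0):
    """Sum of the piecewise loss over all samples."""
    return sum(piecewise_loss(t, p, h) for t, p in zip(targets, predictions))


def total_squared_loss(targets, predictions):
    """Plain squared loss, shown for comparison."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


if __name__ == "__main__":
    # Hypothetical prices: the last sample mimics a breakout (an outlier).
    targets = [1.0, 2.0, 3.0, 50.0]
    predictions = [1.5, 2.5, 3.5, 4.0]
    print("piecewise loss:", total_loss(targets, predictions))
    print("squared loss:  ", total_squared_loss(targets, predictions))
```

In this example the single outlier contributes 46 to the piecewise loss but 2116 to the squared loss, which is why a fit driven by the piecewise loss is pulled far less by breakout samples.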
Outliers in the market can be compared to how the market behaves during a breakout [30]. The good news about a breakout is that there will be a dead cat bounce effect after the sudden move. The size of the change caused by the dead cat bounce effect can be predicted using the Fibonacci Fan technique. The question is whether, after the dead cat bounce effect, the price will correct itself or enter a new trend phase. Thus, Huber Regression makes an excellent candidate to predict the overall market movement.
d) Multi-task ElasticNet: Multi-task ElasticNet trains on the data with a mixture of the $\ell_1$, $\ell_2$-norm and $\ell_2$ regularization. It has the ability to estimate sparse coefficients. The following objective is used to perform this kernel.
$L(W, \lambda) = \|XW - Y\|_{\mathrm{Fro}}^2 + \lambda_1 \|W\|_{21} + \lambda_2 \|W\|_{\mathrm{Fro}}^2$
e) Ridge Regression: Ridge Regression, also known as Tikhonov regularization, can solve the multicollinearity problem. It starts by standardizing the data values, subtracting their mean and dividing by their standard deviation. Thus, if $\hat{W} = (X'X)^{-1} X'Y$ approaches $W$ from $Y = XW + \epsilon$ in ordinary least squares, then Ridge Regression approaches $W$ by adding a value $k$ to the diagonal elements of $X'X$: $\hat{W} = (X'X + kI)^{-1} X'Y$ [32].
Ridge Regression is said to have more stable performance when there are small changes in the data. Thus, it is one of the excellent candidates for predicting Bitcoin price movement [33].
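As a rough illustration of the ridge estimate described above, the following sketch solves $(X'X + kI)\hat{w} = X'y$ for a single target using only the standard library. It is a minimal sketch under stated assumptions: the helper names are hypothetical, the standardization step is skipped, and no intercept is fitted.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def ridge_fit(X, y, k):
    """Return the ridge weights (X'X + kI)^-1 X'y for one target column."""
    n_features = len(X[0])
    # Gram matrix X'X with k added to every diagonal element.
    gram = [[sum(row[i] * row[j] for row in X) for j in range(n_features)]
            for i in range(n_features)]
    for i in range(n_features):
        gram[i][i] += k
    xty = [sum(row[i] * target for row, target in zip(X, y))
           for i in range(n_features)]
    return solve_linear_system(gram, xty)


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 4.0]
    print(ridge_fit(X, y, k=0.0))  # ordinary least squares
    print(ridge_fit(X, y, k=1.0))  # weights shrunk toward zero
```

With `k=0` the sketch reduces to ordinary least squares; a larger `k` shrinks the weights, which is the stabilizing effect the text refers to.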

D. Fast Fourier Transform Approach
The Fast Fourier Transform (FFT) is one technique to perform the Discrete Fourier Transform (DFT). The DFT itself converts a series of complex numbers in the time domain into another series of complex numbers of equal size in the frequency domain. The DFT can be described as $P_k = \sum_{j=0}^{N-1} p_j e^{-2\pi i jk/N}$, where $N$ is the length of the series. A lower index $k$ indicates a lower frequency, and a higher index $k$ indicates a higher frequency [34].
Since the market movement at a short depth is dominated by speculator noise, it is plausible to filter the market price by its frequency. If the intention is to follow a long-term trend, it might be beneficial to remove the higher frequencies altogether. For the daily trader, it might be beneficial to remove the lower frequencies instead, leaving the full spectrum of speculator movement. This technique will be used to preprocess the raw data before it is fed to the kernel training.
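The frequency filtering idea above can be sketched with a naive DFT from the standard library. This is only an illustration under assumptions: the function names and the `cut_fraction` parameter are hypothetical, and the cut is applied symmetrically (bin $k$ and bin $N-k$ together) so that the filtered series stays real-valued, which may differ from the exact indexing used in the paper.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (O(N^2)), enough for short series."""
    n = len(signal)
    return [sum(signal[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n)) for k in range(n)]


def inverse_dft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * j * k / n)
                for k in range(n)) / n for j in range(n)]


def cut_high_frequencies(prices, cut_fraction):
    """Zero the highest `cut_fraction` of frequency bins and transform back."""
    spectrum = dft(prices)
    n = len(spectrum)
    keep = int((n // 2) * (1 - cut_fraction))  # highest frequency index kept
    filtered = []
    for k, value in enumerate(spectrum):
        frequency = min(k, n - k)  # bins k and n - k carry the same frequency
        filtered.append(value if frequency <= keep else 0)
    return [v.real for v in inverse_dft(filtered)]


if __name__ == "__main__":
    # Hypothetical series: a slow trend plus a fast alternating component.
    prices = [100 + i + (2 if i % 2 == 0 else -2) for i in range(16)]
    smoothed = cut_high_frequencies(prices, cut_fraction=0.5)
    print([round(v, 2) for v in smoothed])
```

Removing the lower frequencies instead (the daily trader case in the text) would invert the condition in the loop.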

E. Particle Swarm Optimization Approach
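The basic idea of Particle Swarm Optimization (PSO) is to combine the local trend and the global trend of the swarm. It can be denoted using the following formula.
$v_i^{t+1} = \omega v_i^t + c_1 r_1 (b_i - x_i^t) + c_2 r_2 (g - x_i^t)$
where $v_i^{t+1}$ is the next velocity of the $i$th particle, compared to the current velocity, denoted as $v_i^t$. Here $x_i^t$ is the particle's current position, $b_i$ is its own best position, $g$ is the swarm's best position, $\omega$ is the inertia weight, $c_1$ and $c_2$ weight the individual and swarm confidence, and $r_1$, $r_2$ are random numbers in $[0, 1]$.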
In this paper, a scheme that mimics this behavior is also applied. The short time interval price movement is what a limited field of view can see, whereas the long time interval movement explains why one trend dominates other particular trends. When the price shows no meaningful activity for a long time, with only small exchanges, the market's liquidity is on the verge of collapse, whether it then turns bullish or bearish. This collapse can be seen as the market's decision to take another direction. But when the volatility is vast and the market price is still searching for stability, we can safely say that this movement is remarkably similar to what individual confidence means in the PSO paradigm.
In this paper, PSO is not used for parameter optimization as in previous studies. Instead, this paper maps the individual confidence and the swarm confidence of the PSO paradigm onto a short time interval price window and a long time interval price window. This technique will also be used to preprocess the raw data before it is fed to the kernel training.

F. Data Preprocessing
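Bitcoin pairs, just like any other pairs, have five primary data fields that can be used for elemental analysis, including the opening price (OP), the closing price (CP), the maximum price (MxP), and the minimum price (MnP). Since the CP of one interval is the OP of the next, only OP, MxP, and MnP will be evaluated in this paper to avoid multicollinearity.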
Let's say $\bar{p} = \langle p_0, p_1, \ldots, p_{m-1} \rangle$ is the OP of Bitcoin over a specific time range. First, the data is expanded into matrix form, $X$ and $Y$. $X$ is defined as $X_{I \times J}$, where $x_{i,j} = p_{i+j}$, $I + J = |\bar{p}|$, and $J = n$, which is the window size. On the other hand, $Y$ is defined as $Y_{I \times 1}$, where $y_{i,1} = p_{i+n+s-1}$ and $s$ is the number of steps ahead of the price that will be predicted.
For example, if we had data collected over 10 consecutive days at a one-day interval and would like to predict Bitcoin's price three days after those ten days, then the window size $n$ will be 10 and the number of steps $s$ will be 3. In this paper, the optimum values of $n$ and $s$, where $n, s \in \mathbb{N}$, are inspected to yield the best prediction result.
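The expansion of a price sequence into the window matrix and the steps-ahead target can be sketched as follows. The function name `build_windows` and its arguments are hypothetical; the sketch also drops the trailing rows whose target would fall outside the series, which is an assumption about how the boundary is handled.

```python
def build_windows(prices, window, steps):
    """Expand a price sequence into window rows X and steps-ahead targets Y.

    Row i of X holds prices[i : i + window]; the matching target is the price
    `steps` positions after the last element of that window.
    """
    X, Y = [], []
    last_start = len(prices) - window - steps
    for i in range(last_start + 1):
        X.append(prices[i:i + window])
        Y.append(prices[i + window + steps - 1])
    return X, Y


if __name__ == "__main__":
    # Hypothetical daily opening prices for 13 days (indices 0..12).
    prices = [float(v) for v in range(100, 113)]
    X, Y = build_windows(prices, window=10, steps=3)
    print(X)  # one row: the first ten days
    print(Y)  # the price three days after the tenth day
```

With ten days of history and a three-day horizon, thirteen observations yield exactly one training row, matching the example in the text.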

G. Schemes
This paper used the 4-hour interval (4H) Bitcoin price movement data within the four-year range between August 17th, 2017, and July 24th, 2020. Fig. 2 shows the training and testing division. Based on the figure, the training data set is the first 80% of the market movement data.
After that, the data will be evaluated under six main schemes, as listed in Table 1. Scheme A uses the raw data sequence based on OP. Scheme B uses global-maximum-based normalization, hence $\bar{p}' = \bar{p} / \max \bar{p}$, where $\bar{p}$ is the price sequence and $\bar{p}'$ is the normalized sequence. This $\bar{p}'$ sequence is also expanded into matrix form, which is $X$. Scheme C uses the local maximum within the window range for its normalization, hence its expanded form is $x_{i,j} = p_{i+j} / \max \langle p_i, p_{i+1}, \ldots, p_{i+n-1} \rangle$, where $n$ is the window size. Schemes A to E are labeled with the number of steps that follow. For example, if scheme A is used to predict the next $s$ steps of the price, it is labeled scheme A-$s$. The steps inspected in this paper are $1 \le s \le 4$.
As shown in Table 2, several schemes are used to inspect the probability of riding the wave based on the FFT definition. Let's assume $\bar{P}$ is the frequency-domain version of $\bar{p}$, and $c$ is an index with $0 < c < |\bar{P}|/2$, where any index within the range $[0, c)$ is considered a lower frequency and any index within the range $[c, |\bar{P}|)$ a higher frequency.
The FFT schemes are labeled with their respective cutting percentage. For example, if scheme FFT-A discards 25% of its frequency elements, it is labeled scheme FFT-A 25% cut. The cutting variants are 25%, 50%, and 75%.
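The two normalizations used by schemes B and C can be sketched as below. The function names are hypothetical, and the sketch assumes strictly positive prices so that the maximum is never zero.

```python
def normalize_global(prices):
    """Scheme B style: divide every price by the maximum of the whole series."""
    peak = max(prices)
    return [p / peak for p in prices]


def normalize_local(prices, window):
    """Scheme C style: divide each window row by the maximum inside that window."""
    rows = []
    for i in range(len(prices) - window + 1):
        chunk = prices[i:i + window]
        peak = max(chunk)
        rows.append([p / peak for p in chunk])
    return rows


if __name__ == "__main__":
    prices = [2.0, 4.0, 8.0, 4.0]  # hypothetical opening prices
    print(normalize_global(prices))
    print(normalize_local(prices, window=2))
```

Global normalization preserves the long-range level of the price, while local normalization only keeps the shape of the movement inside each window.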

III. RESULTS
A. Random Test
The Random Test is conducted by assuming the price movement is a random walk, in which the price is increased or decreased by $r_t$, a random sequence following one of the four distributions mentioned before. Thus, $p'_{t+1} = p_t + r_t$, where $p'_{t+1}$ is the next price prediction and $p_t$ is the previous real price. In this test, we only want to predict the movement trend, whether bullish or bearish, which results in $p'_{t+1} - p_t = r_t$. This predicted price difference is compared with the actual price difference, $p_{t+1} - p_t$. First, the up and down movement of Bitcoin is calculated from the four years of Bitcoin price movement: up for $p_{t+1} - p_t \ge 0$ and down for $p_{t+1} - p_t < 0$. Then, each random distribution generates the corresponding random walk movement for four years. This process is iterated 100 times. For the Exponential and Poisson distributions, the random walk movement is generated by combining them with a Uniform distribution over the range $[-1, 1)$ to ensure the generated numbers take both negative and positive values.
The result can be seen in Fig.3, where the Uniform, Exponential, Logistic, and Poisson distributions have accuracies of 0.5001, 0.4999, 0.5001, and 0.4949, respectively. None of these distributions can guess with an accuracy higher than 0.51. Thus, the Bitcoin market does not follow any of these random processes. In other words, patterns dominate the Bitcoin market.
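The random test procedure can be sketched with the standard library. This is a simplified illustration, not the paper's implementation: the function names, the seed, and the toy price list are assumptions, and only the Uniform and the sign-randomized Exponential generators are shown.

```python
import random


def direction_labels(prices):
    """1 for up (difference >= 0), 0 for down, for each consecutive pair."""
    return [1 if b - a >= 0 else 0 for a, b in zip(prices, prices[1:])]


def uniform_step(rng):
    """Random walk increment drawn from Uniform[-1, 1)."""
    return rng.uniform(-1.0, 1.0)


def exponential_step(rng):
    """Exponential draw multiplied by Uniform[-1, 1) so the sign can vary."""
    return rng.expovariate(1.0) * rng.uniform(-1.0, 1.0)


def random_direction_accuracy(prices, draw, trials=100, seed=0):
    """Average accuracy of guessing the direction from the sign of a random step."""
    rng = random.Random(seed)
    actual = direction_labels(prices)
    total = 0.0
    for _ in range(trials):
        guesses = [1 if draw(rng) >= 0 else 0 for _ in actual]
        hits = sum(g == a for g, a in zip(guesses, actual))
        total += hits / len(actual)
    return total / trials


if __name__ == "__main__":
    prices = [100, 102, 101, 105, 104, 104, 108, 107, 110]  # hypothetical
    print(random_direction_accuracy(prices, uniform_step))
    print(random_direction_accuracy(prices, exponential_step))
```

Because the sign of each random step is independent of the data, the average accuracy stays close to 0.5 on long series, which is the baseline the text compares against.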

B. Slope Test
In this slope test, two variables are tested to determine the best combination for predicting the next price movement: the number of steps behind the price over which the difference is observed, $\Delta$, and the number of steps ahead of the price that will be predicted, $s$. If $(p_{t+\Delta} - p_t)/\Delta \ge 0$, then $p_{t+\Delta+s+1}$ is considered up, and vice versa. Fig.4 shows the dynamic between the two variables in predicting accuracy with $1 \le \Delta < 20$ and $1 \le s < 5$.

Fig.4. Slope Prediction Accuracy
It appears that the smaller the number of steps ahead, $s$, and the higher the number of steps behind, $\Delta$, the higher the accuracy. Thus, a further test is performed for $s = 1$ and $20 \le \Delta < 100$ to verify this tendency. In this further test, the prediction accuracy peaks at 0.5202. Compare it with Fig. 4, where most of the accuracies lie below 0.5. The value seems to indicate that there is indeed a pattern buried within the market noise. This result suggests that a large number of samples is not needed to determine the next movement. We need to amplify this accuracy.
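The slope test can be sketched as a simple rule: a non-negative slope over the last few steps predicts an upward move. The names `lookback` and `steps` are hypothetical stand-ins for the two tested variables, and comparing the price `steps` positions ahead with the latest observed price is an assumption about the exact indexing.

```python
def slope_direction_accuracy(prices, lookback, steps):
    """Accuracy of predicting 'up' whenever the recent slope is non-negative."""
    hits = 0
    total = 0
    for t in range(len(prices) - lookback - steps):
        slope = (prices[t + lookback] - prices[t]) / lookback
        predicted_up = slope >= 0
        actual_up = prices[t + lookback + steps] - prices[t + lookback] >= 0
        hits += predicted_up == actual_up
        total += 1
    return hits / total if total else float("nan")


if __name__ == "__main__":
    prices = [100, 101, 103, 102, 104, 107, 106, 108, 111, 110]  # hypothetical
    for lookback in (1, 2, 3):
        print(lookback, slope_direction_accuracy(prices, lookback, steps=1))
```

On a steadily rising series the rule is always right, and on a strictly alternating series with `lookback=1` it is always wrong, which shows how strongly the result depends on the two variables.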

C. Schemes
The Linear Model test gives insightful results regarding how the market behaves. Fig.6 shows the performance of the Linear Models under Scheme A over four prediction cases. All kernels perform well in detecting the pattern with a window size of only about 20. Multi-task ElasticNet Regression, however, shows stable yet not very high accuracy in predicting the market movement. Except for Scheme A-4, Multi-task ElasticNet Regression indicates that the higher the window value, the lower the accuracy in predicting the next movement. Within Scheme A, Multi-task Lasso Regression and Ridge Regression give outstanding results in detecting movement regardless of market noise. The behavior of Multi-task Lasso Regression at higher window values reinforces the Multi-task ElasticNet Regression result, which indicates that a large window is not needed to understand how the market behaves. A window size of about 20 in the 4H interval translates to a view of nearly three days. Most traders on this time scale, which is dominated by daily traders, trade based only on a small range of historical data, usually going back up to three days.
Fig.7 shows the Scheme B performance after the data is normalized with the global maximum value; in other words, the input data is restricted to the range [0, 1]. Multi-task Lasso experiences a significant accuracy drop when the values are normalized. On the contrary, Multi-task ElasticNet keeps its consistency and reaches higher accuracy. Ridge Regression performs even better at higher window values within the 2-steps prediction scope.
This result gives us the insight that, even though there is a lot of noise within a short time interval, Ridge Regression hints at a more stable pattern within a longer window and time interval. Ridge Regression captures this even though daily traders dominate the data. The Multi-task ElasticNet result suggests that this kernel is stable whether the data is normalized or not, and its accuracy increases for normalized data. Its performance graph even indicates a limit in detecting patterns because of its significant change in accuracy. Scheme B-4 shows that the accuracy obtained with a small amount of data, a window of 7, is not reproduced when the window is higher than 7. The highest accuracy, 0.5674, is reached by Ridge Regression, and the lowest, 0.4832, by Multi-task Lasso.
Fig.8 shows the Scheme C performance after the data is normalized with the local maximum value. The result is even more interesting. Overall, the accuracy is higher than in the previous scheme, especially for the 2-steps ahead prediction. All kernels, excluding Multi-task Lasso, move in the same direction as the window value grows. While scheme A suggests that a high window value is not needed for prediction, this locally normalized version of the data reveals that more dominant holders exist in a more extended time interval domain.
In other words, based on the results of schemes A, B, and C, the following deduction can be drawn, 3. Scheme C: some more dominant holders decide based on the local maximum value with historical data covering more than eight days. This result gives us a better strategy for navigating the price movement: which signals we should monitor most, and which ones we can consider less beneficial. These schemes suggest that the 2-steps ahead prediction is the most plausible. The highest accuracy reaches 0.5705. The lowest accuracy, 0.4656, is reached by Multi-task Lasso.
Fig.9 and Fig.10 show the performance when weighted prices are used as input. The first of these is scheme D, where the local-max-based normalized price is multiplied by its global-max-based normalized price. The inputs are guaranteed to lie within the range [0, 1]. Since scheme D incorporates the short view with the long view of the price movement, it is not surprising that scheme D achieves higher accuracy than the previous schemes, as shown in Fig.9.
Multi-task Lasso still has the lowest prediction accuracy, which confirms that normalized inputs do not fit well with this regression. On the other hand, Ridge Regression keeps its performance at the highest accuracy. Moreover, it gains the highest accuracy, 0.5734, beating all the previous regression schemes.
The highest accuracy, 0.5734, is reached by Multi-task ElasticNet. The lowest accuracy, 0.4806, is reached by Multi-task Lasso. In this scheme, Ridge Regression does not give the same hint as in scheme C, where the higher the window, the more accurate the result. This might be because the short time interval is incorporated, which negates the tendency of the longer time interval to be more accurate for a higher window.
Since the longer time interval dominates more than the short time interval, scheme E uses $\alpha = 0.75$ in its formula, $x = \alpha\, G(p) + (1 - \alpha)\, L(p)$, where $G(p)$ and $L(p)$ denote the global-max-based and local-max-based normalized prices. Since Multi-task Lasso did not perform well on normalized values, this scheme excludes that kernel. The performance of each kernel can be seen in Fig.10. The highest accuracy reaches 0.5684, and the lowest accuracy reaches 0.4866. The same kernel, Huber Regressor, produces both numbers. Based on the overall result, Huber Regressor is the most volatile in terms of accuracy. Ridge Regressor still keeps its stability. On the other hand, Lasso LARS provides another hint for E-2 and E-3 that a higher window might predict better when the kernel learns from historical data covering more than eight days. Although this scheme does not perform as well as scheme C or D, it has the highest minimum accuracy among the schemes.
As can be seen in Fig.11, the scheme shown there gives the lowest accuracy performance among all schemes. The highest accuracy reaches 0.5297, and the lowest accuracy reaches 0.4449. This is similar to the result of the slope test. Thus, it is clear that the majority trend does not guarantee the next movement.
Table 3 shows the overall result of each scheme. Predicting the next Bitcoin movement by weighting local-max-based normalized data with global-max-based normalized data through multiplication returns the highest prediction accuracy among all schemes.
On the other hand, determining the next price based on random thought is proven as a lousy strategy.
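The two weighting schemes discussed in this section can be sketched as follows: scheme D multiplies the local-max-based and global-max-based normalized prices, and scheme E blends them with a weight of 0.75. The function names are hypothetical, and assigning the weight `alpha` to the global (long-view) term is an assumption based on the remark that the longer interval dominates.

```python
def scheme_d(local_norm, global_norm):
    """Element-wise product of the short-view and long-view normalized prices."""
    return [l * g for l, g in zip(local_norm, global_norm)]


def scheme_e(local_norm, global_norm, alpha=0.75):
    """Weighted sum; alpha is assumed to weight the global (long-view) term."""
    return [alpha * g + (1 - alpha) * l for l, g in zip(local_norm, global_norm)]


if __name__ == "__main__":
    # Hypothetical normalized values for one window of four prices.
    local_norm = [0.5, 1.0, 0.75, 0.5]
    global_norm = [0.25, 0.5, 0.375, 0.25]
    print(scheme_d(local_norm, global_norm))
    print(scheme_e(local_norm, global_norm))
```

Both combinations keep the inputs inside [0, 1] as long as the two normalized inputs are themselves inside [0, 1].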

IV. DISCUSSION
Since scheme D has the highest accuracy, it is used as the base for our automatic market trader. But first, let's check its hit accuracy. All these schemes predict whether the next opening price is up or down. But the price movement is not that simple. Even if the next opening price is higher than the previous opening price, there is always a probability that the price went down for a moment before rising. This kind of movement is captured inside the range between the minimum price and the maximum price. This raises another question: what is the appropriate price position we should take for the next prediction? To ensure the decision's safety, the predicted next price should be checked to see whether it lies inside the range between the minimum price and the maximum price. Fig.12 shows the hit accuracy of scheme D. The highest hit accuracy, 80.3%, is attained by Huber Regressor in the 2-steps ahead scheme with a window size of two. The higher the window size, the lower the hit accuracy gets. On the other hand, the lowest hit accuracy, 13.2%, is attained by Huber Regressor in the 4-steps ahead scheme with a window size of 49. However, this result contradicts our former assumption that the higher the window, the higher the accuracy becomes.
Most Bitcoin markets charge transaction fees, called maker and taker fees. A maker order waits until another trader agrees to it and executes it. A taker order is executed immediately against the current market liquidity. Since taker orders are much more spontaneous than maker orders, some markets charge them higher fees. For example, Binance used a 0.10% fee for all maker and taker orders. Thus, to make a profit in our trading, if we buy the coin at a price $p_t$, we should sell it at a price $p_{t+1}$ higher than $p_t / 0.999^2$. In other words, only when the Bitcoin price moves up by more than 0.2003% do we gain a profit.
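The break-even arithmetic above can be checked with a short sketch. The function names are hypothetical; the 0.10% fee is the figure quoted in the text and is assumed to be charged once on the buy and once on the sell.

```python
def breakeven_sell_price(buy_price, fee_rate=0.001):
    """Lowest sell price that recovers the capital after paying the fee twice."""
    return buy_price / (1 - fee_rate) ** 2


def breakeven_move_percent(fee_rate=0.001):
    """Required price increase, in percent, before a round trip turns a profit."""
    return (1 / (1 - fee_rate) ** 2 - 1) * 100


def round_trip_value(capital, buy_price, sell_price, fee_rate=0.001):
    """Capital left after buying at buy_price and selling at sell_price."""
    coins = capital * (1 - fee_rate) / buy_price
    return coins * sell_price * (1 - fee_rate)


if __name__ == "__main__":
    print(round(breakeven_move_percent(), 4))       # about 0.2003
    target = breakeven_sell_price(10000.0)
    print(round(round_trip_value(1000.0, 10000.0, target), 6))
```

Selling exactly at the break-even price returns the initial capital, so any smaller move is a net loss even when the direction was predicted correctly.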
Since this 0.2003% increment is a tricky move for a short time interval, a high hit accuracy rate for a short time interval does not guarantee that the movement already gives a profit to the trader. Further spot trading simulation is needed to inspect the real profitability of scheme D-2.
Jurnal Infotel Vol. 12
The highest performer is Huber Regressor, which reaches 99.715% profitability with the 2-steps ahead prediction and a window size of 18. In other words, if the trader has $1000 as initial capital, with this scheme he will end up with $1997.15 as final capital. However, a loss occurs for all kernels if the window size is smaller than 4.
Table IV shows that Huber Regressor yields the highest profitability among the four kernels, yet is the most volatile. On the other hand, Ridge Regressor has the most stable performance across all window sizes, followed by Multi-task ElasticNet. Thus, Ridge Regressor could be the best choice for low-risk takers seeking stability, although its gain is not as high as the others. For high-risk takers, Huber Regressor might be the best choice.
Finally, let's look at how Huber Regressor, with its window size of 18 and 2-steps ahead scenario, keeps the capital growing during the Bitcoin panic-selling caused by the COVID-19 outbreak. Fig. 14 shows how this scheme guards against capital loss during the COVID-19 panic-selling. Even though the Bitcoin price fell by more than 50%, the capital did not lose more than 75% of its original value; this scheme minimizes the loss during the panic-selling period. After the panic-selling period is over, this scheme boosts the trading even more during the bullish or recovery period.
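A spot trading simulation of the kind referred to above could look roughly like the sketch below. The trading rule is an assumption, since the text does not spell it out: the whole capital is moved into the coin when the kernel predicts an upward move and back into cash when it predicts a downward move, with a 0.10% fee on every order. The function and argument names are hypothetical.

```python
def simulate_spot_trading(prices, predicted_up, capital=1000.0, fee_rate=0.001):
    """All-in / all-out simulation driven by up/down predictions.

    prices[i] is the price at which an order placed at step i is filled and
    predicted_up[i] is the kernel's direction forecast made at that step.
    """
    cash, coins = capital, 0.0
    for price, go_up in zip(prices, predicted_up):
        if go_up and coins == 0.0:
            coins = cash * (1 - fee_rate) / price   # buy with all cash
            cash = 0.0
        elif not go_up and coins > 0.0:
            cash = coins * price * (1 - fee_rate)   # sell everything
            coins = 0.0
    if coins > 0.0:
        cash = coins * prices[-1] * (1 - fee_rate)  # close the open position
    return cash


if __name__ == "__main__":
    prices = [100.0, 110.0, 105.0, 120.0, 90.0]     # hypothetical
    signals = [True, True, False, True, False]      # hypothetical forecasts
    final = simulate_spot_trading(prices, signals)
    print(round(final, 2), "profit %:", round((final / 1000.0 - 1) * 100, 3))
```

Profitability in this sketch is the final capital relative to the initial capital, so a final value of $1997.15 from $1000 corresponds to the 99.715% figure reported in the text.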

V. CONCLUSION
This paper combined short and long interval market trend movements by multiplying their normalized prices. Similar to the PSO approach, a high window period yields promising results for growing capital value in the Bitcoin market using Lasso LARS, Huber, Multi-task ElasticNet, and Ridge Regressor. The best kernel design is not the one that predicts the next price but the one that predicts the second next price (2-steps ahead). This result might be caused by the fluctuation of daily traders, who dominate the short time scale, combined with the trend dominated by traders who decide based on historical data covering more than eight days. This combination can be tackled well by Huber Regressor with a relatively medium window size, which protects capital during the panic-selling and boosts capital during the bullish period.