Extracting Data

⚠️

Backtest mode is no longer actively maintained. We therefore provide an alternative method of data extraction that involves neither caching nor running Backtest mode; see the cybotrade-datasource section at the end of this page.

Deprecated documentation

Besides backtesting and live trading, you can also use cybotrade to extract data from our data providers.

For a programmatic approach, you can run a Backtest and append the incoming data to a list stored on the strategy class. See below:

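import logging
from datetime import datetime, timezone
from typing import Dict, List

import pandas as pd
from cybotrade.runtime import StrategyTrader
from cybotrade.strategy import Strategy as BaseStrategy

# Example value; use the same start_time that you pass to RuntimeConfig.
START_TIME = datetime(2021, 1, 1, tzinfo=timezone.utc)

class Strategy(BaseStrategy):
    # Class-level list that collects the latest record of every update.
    data = []

    # When data comes in, append the most recent record to the 'data' list.
    async def on_datasource_interval(
        self, strategy: StrategyTrader, topic: str, data_list: List[Dict[str, str]]
    ):
        logging.info(f"{data_list}")
        datasource_data = self.data_map[topic]
        self.data.append(datasource_data[-1])
        logging.info(f"datasource_data : {datasource_data[-1]}")

    # When the backtest is over, convert the 'data' list to CSV.
    async def on_backtest_complete(self, strategy):
        df = pd.DataFrame(self.data)
        df["start_time"] = df["start_time"].astype(float)
        # Drop rows that precede the requested start (timestamps are in ms).
        df = df[df["start_time"] >= START_TIME.timestamp() * 1000]
        df = df.drop_duplicates(subset=["start_time"])
        df["time"] = pd.to_datetime(df["start_time"], unit="ms")
        df.to_csv("eth_data_2021_2024.csv")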
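
The clean-up steps in on_backtest_complete can be tried on their own, without running a backtest. The following is a minimal sketch, assuming each record is a dict whose "start_time" is an epoch timestamp in milliseconds stored as a string; the clean_records helper and the sample rows are hypothetical and not part of cybotrade.

```python
from datetime import datetime, timezone

import pandas as pd


def clean_records(records, start_time):
    """Filter, de-duplicate, and timestamp a list of datasource records."""
    df = pd.DataFrame(records)
    # Timestamps are strings in this sketch, so cast before comparing.
    df["start_time"] = df["start_time"].astype(float)
    # Keep rows at or after the requested start (epoch milliseconds).
    df = df[df["start_time"] >= start_time.timestamp() * 1000]
    # The same interval may have been appended more than once; keep one copy.
    df = df.drop_duplicates(subset=["start_time"])
    # Add a human-readable timestamp column.
    df["time"] = pd.to_datetime(df["start_time"], unit="ms")
    return df.reset_index(drop=True)


# Made-up sample rows: one before the start time and one duplicate.
sample = [
    {"start_time": "1714517999000", "value": "1.0"},  # before 2024-05-01 00:00 UTC
    {"start_time": "1714521600000", "value": "2.0"},
    {"start_time": "1714521600000", "value": "2.0"},  # duplicate interval
    {"start_time": "1714525200000", "value": "3.0"},
]
cleaned = clean_records(sample, datetime(2024, 5, 1, tzinfo=timezone.utc))
print(cleaned)
```

Only the two distinct rows at or after the start time remain in the result.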

While the Backtest-based approach is straightforward, you have to wait for the backtest to complete before any data is written, which may take some time.

This brings us to the next method:

Extracting via Caching

A far more convenient way to fetch historical data is to use our caching API.
If you have already enabled data caching, the cached data is stored in your CACHE_DIR and can be read from there directly.

With this approach, you do not need any logic for handling incoming data or the end of the backtest, which makes the extraction much simpler and faster. A minimal example looks like this:

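import asyncio
import logging
from datetime import datetime, timezone

import colorlog
from cybotrade.models import RuntimeConfig, RuntimeMode
from cybotrade.permutation import Permutation
from cybotrade.strategy import Strategy as BaseStrategy

class Strategy(BaseStrategy):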
    def __init__(self):
        handler = colorlog.StreamHandler()
        handler.setFormatter(
            colorlog.ColoredFormatter(f"%(log_color)s{Strategy.LOG_FORMAT}")
        )
        file_handler = logging.FileHandler("test-datamap.log")
        file_handler.setLevel(logging.DEBUG)
        super().__init__(log_level=logging.DEBUG, handlers=[handler,file_handler])
 
config = RuntimeConfig(
    mode=RuntimeMode.Backtest,
    datasource_topics=[
        "coinglass|1m|futures/openInterest/ohlc-history?exchange=Binance&symbol=BTCUSDT&interval=1h",
        "cryptoquant|1m|btc/market-data/funding-rates?window=hour&exchange=binance",
        "glassnode|1m|derivatives/options_25delta_skew_1_week?a=BTC&i=1h&e=deribit",
    ],
    candle_topics=[],
    active_order_interval=15,
    start_time=datetime(2024, 5, 1, 0, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 10, 30, 0, 0, 0, tzinfo=timezone.utc),
    data_count=10,
    cache_path="./CACHE_DIR",
    api_key="YOUR_API_KEY",
    api_secret="YOUR_API_SECRET",
)
 
permutation = Permutation(config)
# Placeholder hyper-parameter; this strategy does not use it.
hyper_parameters = {"placeholder": [10]}
 
async def test_datamap():
    await permutation.run(hyper_parameters, Strategy)
 
asyncio.run(test_datamap())

After running this, open your CACHE_DIR and list its contents, including hidden files.
For Unix-like systems:

ls -la

For Windows:

dir /a:h

You should see the 'hidden' cache files listed.

You can now convert a cached file to CSV with this three-line snippet:

import pandas as pd
json_data = pd.read_json("./CACHE_DIR/{YOUR_CACHED_DATA_NAME}.json")
json_data.to_csv("YOUR_CSV_NAME.csv")

For example, if your cached data is in a directory called CACHE_DIR, you can open it like this:

import pandas as pd
json_data = pd.read_json("./CACHE_DIR/.coinglass|1m|futures_openInterest_ohlc-history?exchange=Binance&interval=1h&symbol=BTCUSDT.json")
json_data.to_csv("coinglass.csv")

To avoid typing the long file name, you can copy the file's relative path from VS Code.
With this, you can convert the cached JSON to CSV in very little time.
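
If the cache holds several topics, the conversion can be looped over the whole directory. The following is a rough sketch, assuming every cache file ends in .json and can be read by pd.read_json as in the snippet above; the convert_cache_to_csv helper and the ./csv output directory are hypothetical names chosen for illustration.

```python
from pathlib import Path

import pandas as pd


def convert_cache_to_csv(cache_dir, out_dir):
    """Convert every .json file in cache_dir to a CSV file in out_dir."""
    cache_dir, out_dir = Path(cache_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # iterdir() also returns dot-prefixed (hidden) files.
    for path in sorted(cache_dir.iterdir()):
        if not path.is_file() or path.suffix != ".json":
            continue
        # Drop the leading dot so the CSV itself is not hidden.
        csv_path = out_dir / f"{path.stem.lstrip('.')}.csv"
        pd.read_json(path).to_csv(csv_path, index=False)
        written.append(csv_path)
    return written


# Only runs when a cache directory is actually present.
if Path("./CACHE_DIR").is_dir():
    for csv_file in convert_cache_to_csv("./CACHE_DIR", "./csv"):
        print(f"wrote {csv_file}")
```

Listing the directory with iterdir() rather than a shell command keeps the sketch independent of the operating system's rules for hidden files.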

cybotrade-datasource

cybotrade-datasource is a separate Python package that provides both historical data extraction and live data streaming. It is completely independent of the cybotrade package described in the rest of this documentation.

We suggest using the cybotrade-datasource package for any data use case that does not involve order execution, for example extracting data for backtesting.

The API documentation for the package is linked here. There are also some simple usage examples in the PyPI project description, which may be more straightforward than the API documentation.

Example Usage

import os
import pandas as pd
import asyncio
import cybotrade_datasource
from datetime import datetime, timezone
 
 
API_KEY = os.environ["API_KEY"]
 
 
async def main():
    data = await cybotrade_datasource.query_paginated(
        api_key=API_KEY,
        topic='cryptoquant|btc/inter-entity-flows/miner-to-miner?from_miner=f2pool&to_miner=all_miner&window=hour',
        start_time=datetime(year=2024, month=1, day=1, tzinfo=timezone.utc),
        end_time=datetime(year=2025, month=1, day=1, tzinfo=timezone.utc)
    )
    df = pd.DataFrame(data)
    print(df)
 
 
asyncio.run(main())

This example fetches one year's worth of cryptoquant data. Note that import cybotrade is not needed here. For further usage, please refer to the documentation or project description linked above.
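
The example above only prints the DataFrame. To keep the data, the result can be written to CSV. The following is a minimal sketch, assuming the returned data is a list of dicts and that a "start_time" field, when present, holds epoch milliseconds as in the Backtest example earlier on this page; the records_to_csv helper is hypothetical and not part of cybotrade-datasource.

```python
import pandas as pd


def records_to_csv(records, csv_path, time_field="start_time"):
    """Write a list of record dicts to CSV, adding a readable time column."""
    df = pd.DataFrame(records)
    # The time field is an assumption; skip the conversion if it is absent.
    if time_field in df.columns:
        df[time_field] = pd.to_numeric(df[time_field])
        df = df.sort_values(time_field).reset_index(drop=True)
        df["time"] = pd.to_datetime(df[time_field], unit="ms", utc=True)
    df.to_csv(csv_path, index=False)
    return df
```

Inside main() from the example above, this would be called as records_to_csv(data, "cryptoquant.csv") in place of the print statement.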