
Extracting Data

Besides direct backtesting and live trading, you can also extract data from our data providers through cybotrade.

For a programmatic approach to extracting data, you can run a Backtest and maintain a global array into which you append the incoming data. See below:

import logging
from typing import Dict, List

import pandas as pd

# BaseStrategy and StrategyTrader come from the cybotrade package.
class Strategy(BaseStrategy):
    data = []

    # When data comes in, append the latest row to the 'data' array.
    async def on_datasource_interval(
        self, strategy: StrategyTrader, topic: str, data_list: List[Dict[str, str]]
    ):
        logging.info(f"{data_list}")
        datasource_data = self.data_map[topic]
        self.data.append(datasource_data[-1])
        logging.info(f"datasource_data : {datasource_data[-1]}")

    # When the backtest is over, clean the 'data' array and write it to CSV.
    async def on_backtest_complete(self, strategy):
        df = pd.DataFrame(self.data)
        df["start_time"] = df["start_time"].astype(float)
        # 'start_time' here is the backtest start as a datetime (e.g. the
        # value passed to RuntimeConfig); rows before it are dropped.
        df = df[df["start_time"] >= start_time.timestamp() * 1000]
        df = df.drop_duplicates(subset=["start_time"])
        df["time"] = pd.to_datetime(df["start_time"], unit="ms")
        df.to_csv("eth_data_2021_2024.csv")
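To make the cleanup in on_backtest_complete concrete, here is a self-contained sketch of the same filter/de-duplicate/convert steps run on hypothetical rows (the column names and values are invented for illustration; real datasource rows will differ):

```python
import pandas as pd
from datetime import datetime, timezone

# Hypothetical rows resembling collected datasource data: note the
# duplicate start_time and the millisecond timestamps stored as strings.
rows = [
    {"start_time": "1714521600000", "value": 1.0},
    {"start_time": "1714521600000", "value": 1.0},  # duplicate bar
    {"start_time": "1714525200000", "value": 2.0},
]

# The backtest start, as it would be set in RuntimeConfig.
start_time = datetime(2024, 5, 1, 0, 0, 0, tzinfo=timezone.utc)

df = pd.DataFrame(rows)
df["start_time"] = df["start_time"].astype(float)
# Keep only rows at or after the backtest start (timestamps are in ms).
df = df[df["start_time"] >= start_time.timestamp() * 1000]
df = df.drop_duplicates(subset=["start_time"])
df["time"] = pd.to_datetime(df["start_time"], unit="ms")
print(len(df))  # 2 rows survive the de-duplication
```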

While this is a straightforward approach to retrieving and extracting data, it can be slow, since you have to wait for the backtest to complete.

This brings us to the next method:

Extracting via Caching

There is a far more convenient approach to fetching historical data: leveraging our provided caching API.
If you've seen how to enable data caching from the aforementioned link, you will be able to access the stored cache data in your CACHE_DIR.

With this approach, you don't need any logic for handling incoming data or detecting when the backtest ends, which makes the 'extraction' much faster. A minimal example looks like this:

class Strategy(BaseStrategy):
    def __init__(self):
        # Log to both the console (in colour) and a file.
        handler = colorlog.StreamHandler()
        handler.setFormatter(
            colorlog.ColoredFormatter(f"%(log_color)s{Strategy.LOG_FORMAT}")
        )
        file_handler = logging.FileHandler("test-datamap.log")
        file_handler.setLevel(logging.DEBUG)
        super().__init__(log_level=logging.DEBUG, handlers=[handler, file_handler])
 
config = RuntimeConfig(
    mode=RuntimeMode.Backtest,
    datasource_topics=[
        "coinglass|1m|futures/openInterest/ohlc-history?exchange=Binance&symbol=BTCUSDT&interval=1h",
        "cryptoquant|1m|btc/market-data/funding-rates?window=hour&exchange=binance",
        "glassnode|1m|derivatives/options_25delta_skew_1_week?a=BTC&i=1h&e=deribit",
    ],
    candle_topics=[],
    active_order_interval=15,
    start_time=datetime(2024, 5, 1, 0, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 10, 30, 0, 0, 0, tzinfo=timezone.utc),
    data_count=10,
    cache_path="./CACHE_DIR",
    api_key="YOUR_API_KEY",
    api_secret="YOUR_API_SECRET",
)
 
permutation = Permutation(config)
# Placeholder hyper-parameter; we only run the backtest to populate the cache.
hyper_parameters = {"hahaha": [10]}
 
async def test_datamap():
    await permutation.run(hyper_parameters, Strategy)
 
asyncio.run(test_datamap())
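The datasource topic strings above pack several fields into one line, following a "provider|interval|endpoint?query" pattern. A small helper (hypothetical, for inspection only, not part of the cybotrade API) makes that structure explicit:

```python
# Split a "provider|interval|endpoint?query" topic string into its parts.
def parse_topic(topic: str) -> dict:
    provider, interval, endpoint = topic.split("|", 2)
    path, _, query = endpoint.partition("?")
    params = dict(kv.split("=", 1) for kv in query.split("&")) if query else {}
    return {"provider": provider, "interval": interval, "path": path, "params": params}

t = parse_topic(
    "coinglass|1m|futures/openInterest/ohlc-history?exchange=Binance&symbol=BTCUSDT&interval=1h"
)
print(t["provider"], t["path"], t["params"]["symbol"])
```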

After running this, peek into your configured CACHE_DIR and run the following command.
For Unix-like:

ls -la

For Windows:

dir /a:h

You should be able to see the 'hidden' cache files (see the Cache Dir example screenshot).
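As a cross-platform alternative, you can list the cache files from Python itself. This sketch assumes the cache files are dot-prefixed .json files (as in the example further below) and that cache_path was set to ./CACHE_DIR:

```python
from pathlib import Path

cache_dir = Path("./CACHE_DIR")  # the cache_path set in RuntimeConfig
if cache_dir.exists():
    # glob(".*.json") matches the hidden, dot-prefixed cache files on any OS.
    for f in sorted(cache_dir.glob(".*.json")):
        print(f.name)
```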

You can now run this simple 3-line code snippet:

import pandas as pd
json_data = pd.read_json("./CACHE_DIR/{YOUR_CACHED_DATA_NAME}.json")
json_data.to_csv("YOUR_CSV_NAME.csv")

An example:

If your cached data is in a directory called CACHE_DIR, you should be able to open the data as such:

import pandas as pd
json_data = pd.read_json("./CACHE_DIR/.coinglass|1m|futures_openInterest_ohlc-history?exchange=Binance&interval=1h&symbol=BTCUSDT.json")
json_data.to_csv("coinglass_open_interest.csv")
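If you cached several topics, a short loop converts all of them in one go. This is a sketch under the same assumptions as above (dot-prefixed .json cache files in ./CACHE_DIR); the name-sanitising step is our own addition, since the topic-style file names contain characters like | and ? that are awkward in CSV file names:

```python
import re
from pathlib import Path

import pandas as pd

cache_dir = Path("./CACHE_DIR")  # the cache_path set in RuntimeConfig
out_dir = Path(".")              # where the CSV files should be written

if cache_dir.exists():
    for json_file in cache_dir.glob(".*.json"):
        df = pd.read_json(json_file)
        # Turn the topic-style file name into a filesystem-friendly CSV name.
        safe = re.sub(r"[^A-Za-z0-9._-]+", "_", json_file.stem.lstrip("."))
        df.to_csv(out_dir / f"{safe}.csv")
```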

An easier way to get the exact file name is to copy the cache file's relative path from VSCode.
With this, you can convert the cached JSON to CSV in no time!