Kronos: How Samsung Built a Foundation Model for Financial Candlestick Sequences

Most foundation models treat time-series as a generic sequence problem. Kronos treats financial candlestick data as its own language, with domain-specific tokenization, a two-stage pre-training pipeline, and a decoder-only architecture optimized for the noise and volatility of real markets. Released under the GitHub handle shiyu-coder and accepted to AAAI 2026, it is the first open-source foundation model purpose-built for K-line sequences. Trained on data from 45 global exchanges, it hit 25,000 GitHub stars because it solves a problem general time-series foundation models (TSFMs) ignore: financial markets are not weather forecasts.

Project Authorship and Scope

The repository is hosted under the GitHub handle shiyu-coder, and the research headline credits Samsung. The arXiv paper (2508.02739) and Hugging Face model weights are published under the NeoQuasar organization. This article treats Kronos as a Samsung-affiliated research project released as open source. The official repository, paper, and model weights are the authoritative sources for implementation details.

Why Financial Data Breaks General Models

Standard time-series foundation models tend to assume relatively smooth, stationary signals. Financial candlestick sequences violate those assumptions in several ways:

High noise-to-signal ratio: Intraday price movements contain market microstructure noise, order flow artifacts, and liquidity shocks.
Non-stationarity: Volatility regimes shift without warning. A model trained on 2019 data sees a different distribution in 2020.
Sparse ground truth: You cannot label every candlestick as bullish or bearish. Outcomes depend on holding period, risk tolerance, and portfolio context.
Multi-scale dependencies: A 5-minute bar might predict the next hour, but a daily bar might predict the next quarter.

General TSFMs treat these sequences as univariate or multivariate regression targets. Kronos treats them as tokens in a financial language, where the model learns syntax (price action patterns), semantics (regime transitions), and pragmatics (cross-asset correlations).

Two-Stage Pre-Training Architecture

The AAAI 2026 paper describes a decoder-only transformer with a two-stage training framework designed to separate noise filtering from pattern recognition. The training corpus spans equity, futures, forex, and crypto markets from 45 global exchanges, giving the model exposure to diverse market microstructures and liquidity profiles.

Stage 1: Denoising Pre-Training

The first stage trains the model to handle the high-noise characteristics of financial data. It is described here as a denoising objective; consult the arXiv paper for the exact masking or reconstruction strategy. The goal is to force the model to learn what constitutes valid price action. A candlestick where the high is below the open is invalid. A volume spike without a corresponding price move is suspicious. The model internalizes these constraints during this stage.
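The validity constraints mentioned above can be written as explicit checks. The sketch below is illustrative only: `Candle` and `is_valid_candle` are hypothetical names, not part of the Kronos codebase, and the model is described as learning such constraints implicitly rather than enforcing them in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candle:
    """One OHLCV bar (hypothetical container, not a Kronos type)."""
    open: float
    high: float
    low: float
    close: float
    volume: float


def is_valid_candle(c: Candle) -> bool:
    """Return True if the bar satisfies basic OHLCV consistency rules."""
    high_is_max = c.high >= max(c.open, c.close, c.low)
    low_is_min = c.low <= min(c.open, c.close, c.high)
    return high_is_max and low_is_min and c.volume >= 0


good = Candle(open=100.0, high=103.0, low=99.5, close=102.0, volume=1500.0)
bad = Candle(open=100.0, high=99.0, low=98.0, close=98.5, volume=1500.0)  # high < open

print(is_valid_candle(good), is_valid_candle(bad))
```

A check like this is also useful as a pre-filter on your own fine-tuning data, independent of anything the model has learned.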

Stage 2: Autoregressive Pre-Training

The second stage switches to standard causal language modeling:

Next-candlestick prediction: Given a sequence of K-lines, predict the next candlestick’s OHLCV values.
Cross-exchange training: Mix data from the 45 global exchanges to learn universal patterns across asset classes and market structures.
Objective: Maximize log-likelihood of the next token (candlestick) given the context window.

This stage teaches the model temporal dependencies, regime transitions, and cross-asset correlations. The denoising stage provides a clean foundation; the autoregressive stage builds predictive power on top of it.
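The log-likelihood objective above can be made concrete with a toy calculation. This is a minimal sketch over a made-up three-token vocabulary, not the training code from the paper; `next_token_nll` is a hypothetical helper name.

```python
import math


def next_token_nll(probs_per_step, targets):
    """Average negative log-likelihood of the observed next tokens.

    probs_per_step[t] is the predicted distribution over the vocabulary at step t;
    targets[t] is the index of the token that actually came next.
    """
    total = 0.0
    for probs, target in zip(probs_per_step, targets):
        total -= math.log(probs[target])
    return total / len(targets)


# Toy vocabulary of 3 tokens, two prediction steps (made-up numbers).
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
targets = [0, 2]

nll = next_token_nll(probs, targets)
perplexity = math.exp(nll)  # perplexity is the exponential of the average NLL
print(round(nll, 4), round(perplexity, 4))
```

Maximizing log-likelihood is the same as minimizing this average. A model that guesses uniformly over a vocabulary of size V has perplexity V, which gives a baseline to compare against.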

K-Line Tokenization Strategy

Financial candlesticks are not text, so text tokenizers such as byte-pair encoding or SentencePiece do not apply directly. The paper describes a custom tokenization scheme for OHLCV sequences. The exact discretization strategy (bin counts, quantization thresholds, volume transformations) is detailed in the arXiv paper and official repository. The core idea is to convert continuous price and volume values into discrete tokens that the decoder-only transformer can process.

The model does not see raw floats. It sees discrete tokens representing price movements and volume levels. This keeps the vocabulary finite and makes the model robust to small numerical variations that do not carry economic meaning.
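To illustrate the general idea of turning continuous values into discrete tokens, here is a minimal sketch that bins log returns using fixed edges. It is not the Kronos tokenizer: the bin edges, `log_returns`, and `to_tokens` are assumptions made up for this example.

```python
import bisect
import math

# Hypothetical bin edges for log returns; not taken from the paper.
BIN_EDGES = [-0.02, -0.005, 0.0, 0.005, 0.02]  # 5 edges -> 6 bins, token ids 0..5


def log_returns(closes):
    """Convert a close-price series into log returns between consecutive bars."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def to_tokens(values, edges=BIN_EDGES):
    """Map each continuous value to the index of the bin it falls into."""
    return [bisect.bisect_right(edges, v) for v in values]


closes = [100.0, 100.1, 103.0, 102.9, 99.0]
tokens = to_tokens(log_returns(closes))
print(tokens)
```

Two returns that differ only slightly land in the same bin and therefore produce the same token, which is the robustness property described above.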

Inference and Fine-Tuning Patterns

Kronos is a foundation model. You do not deploy it raw. You fine-tune it for specific tasks. The repository released fine-tuning scripts in August 2025.

Task 1: Next-Candlestick Prediction

The simplest use case: predict the next bar’s OHLCV values. Fine-tune on your target asset and timeframe. The model outputs a probability distribution over discretized bins. You sample from the distribution or take the mode.
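The choice between sampling and taking the mode can be sketched as follows. The logits are made-up numbers over four bins, and `softmax`, `decode_mode`, and `decode_sample` are hypothetical helpers, not functions from the Kronos repository.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def decode_mode(probs):
    """Greedy decoding: pick the most probable bin."""
    return max(range(len(probs)), key=probs.__getitem__)


def decode_sample(probs, rng):
    """Stochastic decoding: draw one bin index according to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [0.1, 2.0, 0.3, -1.0]   # made-up scores over 4 bins
probs = softmax(logits)
rng = random.Random(0)           # seeded so repeated runs draw the same sample

mode_bin = decode_mode(probs)
sampled_bin = decode_sample(probs, rng)
```

Taking the mode gives a single deterministic forecast. Sampling repeatedly gives a spread of scenarios, which is more useful when you care about uncertainty rather than a point estimate.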

Deployment shape: Batch inference every N minutes. Feed the last context window of candlesticks into the model, decode the next bar, append it to your state, repeat.

Kronos-specific constraint: The decoder-only architecture means you must maintain a sliding context window. If your pipeline crashes, you need to rebuild the context from persistent storage.
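One way to handle the crash-recovery requirement is to keep the window in a bounded buffer and checkpoint it to disk. This is a minimal sketch using only the standard library; `ContextWindow` and the JSON file layout are assumptions for illustration, not part of Kronos.

```python
import json
import os
import tempfile
from collections import deque


class ContextWindow:
    """Fixed-length rolling buffer of bars that can be saved and restored."""

    def __init__(self, max_len, bars=()):
        self.max_len = max_len
        self.bars = deque(bars, maxlen=max_len)  # oldest bars drop off automatically

    def append(self, bar):
        self.bars.append(bar)

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"max_len": self.max_len, "bars": list(self.bars)}, f)

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            state = json.load(f)
        return cls(state["max_len"], state["bars"])


window = ContextWindow(max_len=3)
for bar in ([1, 2, 0.5, 1.5, 10], [1.5, 2.5, 1, 2, 12],
            [2, 3, 1.8, 2.9, 9], [2.9, 3.1, 2.5, 2.6, 15]):
    window.append(bar)  # each bar is [open, high, low, close, volume]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "context.json")
    window.save(path)
    restored = ContextWindow.load(path)  # what a restarted process would do
```

In a real pipeline the checkpoint would live in durable storage and be written after every appended bar, so a restart loses at most one bar.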

Task 2: Regime Classification

Fine-tune the model to classify market regimes (trending, mean-reverting, high volatility, low volatility). Add a classification head on top of the final hidden state.
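A classification head is a linear projection from the hidden vector to one score per regime, followed by a softmax. The sketch below shows only that arithmetic in plain Python, with a random stand-in for the hidden state and untrained weights; the hidden size of 8 and all names are assumptions for illustration.

```python
import math
import random

REGIMES = ["trending", "mean-reverting", "high volatility", "low volatility"]


def linear(hidden, weights, bias):
    """Project a hidden vector to one logit per class: logits = W h + b."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
hidden_size = 8  # assumed size, chosen only to keep the example small
# Stand-in for the final hidden state at the last position of the sequence.
hidden = [rng.gauss(0, 1) for _ in range(hidden_size)]
# Randomly initialised head; in practice these weights are learned in fine-tuning.
weights = [[rng.gauss(0, 0.1) for _ in range(hidden_size)] for _ in REGIMES]
bias = [0.0] * len(REGIMES)

probs = softmax(linear(hidden, weights, bias))
predicted = REGIMES[max(range(len(probs)), key=probs.__getitem__)]
```

With untrained weights the predicted label is meaningless; the point is the shape of the computation that fine-tuning then fits to your regime labels.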

Deployment shape: Run inference once per day or once per hour. Use the regime label to switch trading strategies or adjust position sizing.

Kronos-specific constraint: The model is pre-trained on OHLCV sequences. If you want to incorporate order book data or news sentiment, you need a separate fusion layer. Kronos will not process non-candlestick inputs.

Task 3: Portfolio Optimization

Fine-tune the model to predict cross-asset correlations or covariance matrices. Feed in sequences from multiple assets, output a correlation matrix, use it in a mean-variance optimizer.

Deployment shape: Offline batch job. Recompute the covariance matrix weekly or monthly. Do not run this in a latency-sensitive loop.

Kronos-specific constraint: The model learns from historical sequences. Correlations are non-stationary. A model fit to one year's correlation structure can misestimate tail risk in the next. You need periodic retraining.
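Whatever matrix a fine-tuned model produces, it helps to compare it against a plain historical estimate. The sketch below computes a sample Pearson correlation matrix from made-up return series; `correlation` and `correlation_matrix` are hypothetical helpers and say nothing about how Kronos itself would output correlations.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def correlation(xs, ys):
    """Pearson correlation of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series):
    """Nested dict of pairwise correlations keyed by asset name."""
    names = list(series)
    return {a: {b: correlation(series[a], series[b]) for b in names} for a in names}


base = [0.01, -0.02, 0.03, 0.00, -0.01]        # made-up returns
series = {
    "A": base,
    "B": [2 * r for r in base],                # moves with A
    "C": [-r for r in base],                   # moves against A
}
matrix = correlation_matrix(series)
```

Recomputing this baseline on successive windows also gives a direct view of how quickly the correlation structure drifts between retraining runs.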

Observability and Evaluation Challenges

Financial models are hard to evaluate because ground truth is ambiguous. The following table summarizes common metrics and their limitations when applied to Kronos:

Metric	Use Case	Limitation
Perplexity	Pre-training quality	Measures how well the model predicts the next token, but does not measure economic value or trading profit
MSE on next-bar prediction	Sanity check during training	Kronos outputs discretized bins, so MSE on bin indices is more meaningful than MSE on raw prices; does not correlate with trading profit
Sharpe ratio on backtest	Portfolio-level performance	Sensitive to overfitting, look-ahead bias, and execution assumptions; requires clean train/test splits
Regime classification accuracy	Supervised fine-tuning	Requires labeled data, which is subjective; regime definitions vary by practitioner

You need multiple metrics. Perplexity tells you if the pre-training converged. MSE on bin predictions tells you if the model is learning the discretized representation. Sharpe ratio tells you if it is useful in a trading context.
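For the backtest side, the Sharpe ratio is simple to compute from a series of per-period returns. This is a minimal sketch with made-up daily returns; the 252-period annualization factor is a common convention for daily equity data and is an assumption here, not something specified by the Kronos paper.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualised Sharpe ratio from per-period returns."""
    excess = [r - risk_free_per_period for r in returns]
    return (statistics.mean(excess) / statistics.stdev(excess)
            * math.sqrt(periods_per_year))


daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003]  # made-up numbers
sr = sharpe_ratio(daily_returns)
```

Because the ratio is scale-invariant, doubling every return leaves it unchanged. It therefore says nothing about position size or capacity, which is one more reason to read it alongside the other metrics.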

Monitoring in Production

Once deployed, monitor:

Prediction drift: Compare the model’s output distribution to the historical distribution. If the model starts predicting extreme moves every day, something broke.
Execution slippage: The model predicts a price move, but by the time you execute the trade, the market has moved. Track the gap.
Regime detection latency: If the model classifies a regime shift three days after it happened, it is useless.
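Prediction drift can be tracked by comparing histograms of predicted bins over time. The sketch below uses the population stability index (PSI) as the distance measure; that choice, the smoothing constant, and the toy token lists are assumptions for illustration.

```python
import math
from collections import Counter


def histogram(tokens, n_bins, eps=1e-6):
    """Smoothed relative frequency of each bin index (eps avoids log of zero)."""
    counts = Counter(tokens)
    total = len(tokens) + eps * n_bins
    return [(counts.get(i, 0) + eps) / total for i in range(n_bins)]


def psi(expected, actual):
    """Population stability index between two distributions over the same bins."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Predicted bin indices: a historical baseline and two recent windows.
baseline = histogram([2, 2, 3, 2, 3, 1, 2, 3, 2, 4], n_bins=6)
stable = histogram([2, 3, 2, 2, 3, 2, 1, 3, 4, 2], n_bins=6)
shifted = histogram([0, 5, 0, 5, 5, 0, 0, 5, 2, 5], n_bins=6)  # extreme bins dominate

psi_stable = psi(baseline, stable)
psi_shifted = psi(baseline, shifted)
```

A window whose predictions pile up in the extreme bins produces a much larger PSI than one that matches the baseline. The alerting threshold is something to calibrate on your own history.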

Handling Distribution Shift

Financial markets shift constantly. Kronos is pre-trained on historical data from 45 exchanges, but that data becomes stale. You have three options:

Periodic retraining: Retrain the model on the latest data. Expensive but effective.
Online fine-tuning: Update the model incrementally as new data arrives. Risky because you can overfit to recent noise.
Ensemble with rule-based filters: Use the model as one signal among many. If the model disagrees with a simple moving average crossover, investigate.
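The third option can be as simple as flagging disagreement between the model and a moving average crossover. This is a minimal sketch with hypothetical helper names and arbitrary window lengths; the model signal is represented as +1 (up) or -1 (down).

```python
def sma(values, window):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window


def crossover_signal(closes, fast=3, slow=5):
    """+1 if the fast average is above the slow one, -1 if below, 0 if equal."""
    f, s = sma(closes, fast), sma(closes, slow)
    return (f > s) - (f < s)


def needs_review(model_signal, closes):
    """Flag cases where the model and the rule point in opposite directions."""
    return model_signal * crossover_signal(closes) < 0


closes = [100, 101, 102, 103, 104, 105]   # steadily rising prices
rule = crossover_signal(closes)
flag_when_model_bearish = needs_review(-1, closes)
flag_when_model_bullish = needs_review(+1, closes)
```

A flag is a prompt to investigate, not a veto: the rule can be wrong too, especially around turning points where a crossover lags.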

Security and Risk Boundaries

A foundation model for trading data introduces new attack surfaces.

Data Poisoning

If an attacker can inject fake candlestick data into your training set, they can bias the model’s predictions. Flash crashes and spoofing attacks create fake price action that looks real in OHLCV format.

Mitigation: Validate all training data against multiple sources. Flag anomalies (e.g., a candlestick where the high is 10x the previous close). Use robust loss functions that downweight outliers.
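Two of these mitigations are easy to sketch: a spike filter based on the 10x example above, and a Huber loss as one example of a robust loss. The function names, the bar layout, and the choice of Huber are assumptions for illustration.

```python
def flag_spikes(bars, max_ratio=10.0):
    """Return indices of bars whose high exceeds max_ratio times the previous close."""
    return [
        i for i in range(1, len(bars))
        if bars[i]["high"] > max_ratio * bars[i - 1]["close"]
    ]


def huber(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear for large residuals."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


bars = [
    {"high": 101.0, "close": 100.0},
    {"high": 102.0, "close": 101.0},
    {"high": 1500.0, "close": 101.5},   # suspicious print, more than 10x prior close
    {"high": 102.5, "close": 102.0},
]
suspicious = flag_spikes(bars)

small = huber(0.5)    # behaves like squared error
large = huber(10.0)   # grows linearly, so one outlier cannot dominate the loss
```

For a residual of 10, half the squared error would be 50 while the Huber loss is 9.5, which is the sense in which outliers are downweighted.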

Model Extraction

If you deploy Kronos as an API, an attacker can query it repeatedly and use the responses to train a surrogate that approximates your model's behavior. This is a problem if your fine-tuned model encodes proprietary trading signals.

Mitigation: Rate-limit API calls. Add noise to the output distribution. Do not expose raw logits.
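Rate limiting is commonly implemented as a token bucket. The sketch below is a generic, single-process version with an injectable clock so the example is deterministic; `TokenBucket` is a hypothetical class, not part of any Kronos serving stack.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


fake_now = [0.0]  # manual clock so the example does not depend on real time
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # third call exceeds the burst size
fake_now[0] = 1.0                           # one second later, one token is back
after_wait = bucket.allow()
```

In production you would keep one bucket per API key and store the state somewhere shared across server processes.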

Adversarial Inputs

An attacker could craft a candlestick sequence designed to trigger a specific prediction. For example, a sequence that makes the model predict a crash, causing you to liquidate positions at a loss.

Mitigation: Add input validation. Reject sequences with impossible values (e.g., high < low). Use adversarial training during fine-tuning.

Code Snippet: Fine-Tuning Template

The following snippet shows a generic fine-tuning template using the Hugging Face transformers library. The model identifier, tokenizer class, and loading API shown here are placeholders; confirm them against the official repository and Hugging Face model page before use.

import torch
from transformers import AutoModel, AutoTokenizer, Trainer, TrainingArguments

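# Load pre-trained Kronos model and tokenizer from Hugging Face
# Assumption: the checkpoint name below exists and loads through the Auto* classes;
# the official repository may ship its own model and tokenizer classes instead.
model = AutoModel.from_pretrained("NeoQuasar/Kronos-base")
tokenizer = AutoTokenizer.from_pretrained("NeoQuasar/Kronos-base")

# Add a task-specific head (e.g., classification, regression)
# The base model outputs hidden states; you need to add a projection layer
# for your specific task (regime classification, next-bar prediction, etc.)

# Prepare dataset: sequences of candlesticks, each with a task-specific label
# `dataset` must be defined before the Trainer is built, with "train" and
# "validation" splits; load_your_labeled_data() is a placeholder for your loader.
# dataset = load_your_labeled_data()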

training_args = TrainingArguments(
    output_dir="./kronos-finetuned",
    num_train_epochs=10,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    warmup_steps=500,
    logging_steps=100,
    save_steps=1000,
    evaluation_strategy="steps",  # newer transformers releases name this eval_strategy
    eval_steps=500,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)

trainer.train()

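# Inference: predict on a new sequence
# new_candlestick_sequence is a placeholder for your own input data
model.eval()
inputs = tokenizer(new_candlestick_sequence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# Apply your task-specific head to outputs.last_hidden_state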

This snippet assumes you have a labeled dataset and a task-specific head. The official repository provides fine-tuning scripts released in August 2025. Consult those scripts for task-specific examples.

General Considerations for Financial ML Deployment

The following points apply to any financial ML model, not just Kronos. They are included here because they are common failure modes when deploying foundation models in production trading systems.

Deployment Patterns

Batch inference: Run inference on a schedule (every 5 minutes, every hour, every day). Simple, debuggable, easy to version. Latency is bounded by your schedule.
Streaming inference: Embed the model in a streaming pipeline (Kafka, Flink, etc.). Low latency, but stateful. You need to manage the context window across restarts.
Hybrid: Pre-compute predictions for the next hour every 5 minutes. Store them in Redis. When a new candlestick arrives, adjust the prediction based on the delta. Low latency, stateless, but complex.

Likely Failure Modes

Overfitting to recent regimes: The model memorizes the last bull market and fails when volatility spikes.
Look-ahead bias in backtests: You accidentally give the model information that was not available at prediction time (e.g., using a bar's close price to predict that same bar's open).
Execution assumptions: The model assumes you can trade at the predicted price, but slippage and fees eat your edge.
Regime shift blindness: Kronos is trained on historical sequences. It may detect regime changes only after they have fully materialized, making the signal useless for real-time trading.
Data quality issues: Missing bars, incorrect splits, or timezone mismatches corrupt the training data.
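Two of the failure modes above have cheap guards: splitting strictly by time to avoid look-ahead, and scanning for missing bars. The sketch below assumes integer epoch timestamps on a regular grid; `chronological_split` and `find_missing` are hypothetical helpers.

```python
def chronological_split(bars, train_fraction=0.8):
    """Split bars so every training bar precedes every test bar in time."""
    ordered = sorted(bars, key=lambda b: b["ts"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def find_missing(timestamps, step):
    """List expected timestamps that are absent from a regularly spaced series."""
    present = set(timestamps)
    return [t for t in range(min(timestamps), max(timestamps) + 1, step)
            if t not in present]


# Five 60-second bars, delivered out of order, with the bar at ts=180 missing.
bars = [{"ts": t} for t in (300, 0, 60, 240, 120)]

train, test = chronological_split(bars)
missing = find_missing([b["ts"] for b in bars], step=60)
```

Real markets have scheduled closures, so a production gap check needs an exchange calendar to tell a missing bar from a closed session.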

Technical Verdict

Use Kronos if:

You have at least 6 months of clean, validated OHLCV data across multiple assets. Kronos is pre-trained on candlestick sequences and will not work with order book snapshots, tick-level trades, or news sentiment.
You are building regime classification, volatility forecasting, or cross-asset correlation models and need a pre-trained foundation that understands financial market structure (valid price action, volume-price relationships, intraday patterns).
You have labeled data for supervised fine-tuning (e.g., regime labels, directional signals, risk flags) and the infrastructure to version, evaluate, and deploy fine-tuned checkpoints.
You can commit to periodic retraining. Financial markets drift fast. A static model trained in Q1 will degrade by Q3.

Avoid Kronos if:

You are working with non-candlestick data. Kronos is specialized for OHLCV sequences and will not generalize to other financial data types.
You expect the pre-trained model to generate alpha without fine-tuning. Foundation models provide a starting point, not a trading edge. You need task-specific labels and domain expertise.
You lack the budget or pipeline to retrain periodically. The model is pre-trained on historical data from 45 exchanges, but that data becomes stale.
You cannot validate training data quality. Garbage in, garbage out. Kronos will learn from bad data just as easily as good data.

Kronos is a tool for engineers who understand that financial prediction is a data engineering problem first and a machine learning problem second. The two-stage pre-training framework and K-line tokenization are clever, but the real work is in data quality, evaluation design, and deployment hygiene. Consult the arXiv paper (2508.02739) and official repository for authoritative implementation details. This article extends the research with plausible technical details and general best practices.

Source Links

Primary Repository: github.com/shiyu-coder/Kronos
Paper: arXiv:2508.02739
Model Weights: Hugging Face - NeoQuasar
Live Demo: shiyu-coder.github.io/Kronos-demo