MdevTrade: From Idea to a Complete AI Trading Agent

It all started with an open-source repository I stumbled upon. This is a journey deep into the First Principles of applying LLMs to financial trading, and how we can use determinism to control a probabilistic brain.

I. It Started with a GitHub Repo

It started on a free evening, scrolling GitHub. I forked the TradingAgents repository on impulse. Instead of using dry if-else statements to scan RSI or MACD indicators, this project used a completely different model: Multi-Agent Personas. It simulated a miniature trading floor right in the computer’s RAM. There was a news analyst, Bull and Bear factions arguing vehemently, and a Manager stepping in to make the final decision.

At the time, I asked myself: Are Large Language Models (LLMs) truly capable of synthesizing macroeconomic news to forecast the market? Or is all that sharp reasoning just sophisticated “hallucination” born from mathematical probability?

To find the answer myself, I decided to borrow the core idea of this repo and roll up my sleeves to code a more experimental and safer version. Whether this system performs well or not, I honestly cannot say for sure, but I have already pressed the start button. I’ll just let it run for a while to see if it burns all my money or turns a profit.

To ensure I can sleep soundly at night, I set a strict risk limit: no margin, no leverage. The system only trades Spot, using Dollar-Cost Averaging (DCA) on tokenized Gold (XAUT/USDT). With Spot trading there are almost no recurring costs beyond the basic trading fees (no funding fees, no overnight interest), so the system can comfortably “hold” the asset long term without the account slowly “bleeding” out.

I decided to share this entire custom version. Below is what I did in the source code, along with my research findings.

II. Feasibility Research and Customization

Before touching the code, I sat down with the academic literature to figure out whether the idea even held up. Five takeaways shaped the customizations below.

1. The Half-Life of News and the Forecasting Sweet Spot

My initial idea was quite greedy: use AI to forecast long-term trends spanning 2 to 6 months.

However, academic research indicates that financial news has a very short half-life. Forcing a model to forecast too far ahead leaves it blind to exogenous shocks. Instead of admitting “I don’t know,” the LLM will hallucinate by linearly extrapolating today’s data into next month, leading to disaster.

My Customization: I shortened the forecasting window to a 1-to-4-week timeframe. This is the “sweet spot” where LLMs synthesize market sentiment best. At the Execution layer, I use a dynamic ATR (Average True Range) to allow the system to automatically take profits during these short-term volatility swings.
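To make the ATR idea concrete, here is a minimal sketch of an ATR-scaled take-profit level. It is an illustration under stated assumptions, not the project's actual code: the function names, the simple (unsmoothed) average, the 14-candle period, and the 2.0 multiplier are all hypothetical choices.

```python
from typing import List, Tuple

# One candle as (high, low, close). The tuple layout is an assumption.
Candle = Tuple[float, float, float]


def true_ranges(candles: List[Candle]) -> List[float]:
    """True range of every candle except the first (which has no previous close)."""
    ranges = []
    for prev, cur in zip(candles, candles[1:]):
        prev_close = prev[2]
        high, low, _ = cur
        ranges.append(max(high - low, abs(high - prev_close), abs(low - prev_close)))
    return ranges


def atr(candles: List[Candle], period: int = 14) -> float:
    """Plain average of the last `period` true ranges (not Wilder's smoothing)."""
    trs = true_ranges(candles)
    if len(trs) < period:
        raise ValueError("not enough candles for the requested period")
    return sum(trs[-period:]) / period


def take_profit_price(entry_price: float, atr_value: float, multiplier: float = 2.0) -> float:
    """Take-profit level that widens in volatile markets and tightens in quiet ones."""
    return entry_price + multiplier * atr_value
```

Because the target is expressed in ATR units rather than a fixed percentage, the same rule adapts to calm and volatile weeks without retuning.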

2. Context Dilution and Separating Compute from Inference

A fatal trap when building AI is dumping 200 days of OHLCV candles straight into the prompt, hoping it will somehow “see” the 200 SMA line. LLMs are excellent at processing text but terrible at array mathematics. Cramming a mountain of numbers into the prompt will overload the AI and cause attention dilution.

My Customization: I decided to completely separate the calculation from the inference. In the MdevTrade source code, the Market Analyst agent does not count candles itself. It calls standard Python libraries like yfinance and stockstats to calculate the SMA and MACD lines accurately. Then, it translates the results into plain-language sentences (Semantic Facts) before feeding them into the prompt. For example: “The price is currently above the 200 SMA with a positive slope, indicating a solid uptrend.” As for raw prices, the LLM only receives the last 14 to 30 days to grasp the micro-rhythm.
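The split between computing and inferring can be sketched as follows. To stay self-contained, this sketch computes the SMA in plain Python instead of calling yfinance and stockstats, and every function name and parameter here is a hypothetical stand-in rather than the project's real API.

```python
from typing import List


def sma(values: List[float], window: int) -> float:
    """Simple moving average of the last `window` values."""
    if len(values) < window:
        raise ValueError("not enough data for the requested window")
    return sum(values[-window:]) / window


def trend_fact(closes: List[float], window: int = 200, slope_lookback: int = 5) -> str:
    """Turn deterministic math into one sentence the LLM can read."""
    current = sma(closes, window)
    # SMA as it stood `slope_lookback` candles ago, used to estimate the slope.
    previous = sma(closes[:-slope_lookback], window)
    position = "above" if closes[-1] > current else "at or below"
    if current > previous:
        slope = "positive"
    elif current < previous:
        slope = "negative"
    else:
        slope = "flat"
    return f"The price is currently {position} the {window} SMA with a {slope} slope."


def build_prompt_context(closes: List[float], window: int = 200, recent_days: int = 14) -> str:
    """Semantic fact first, then only a short tail of raw closes."""
    recent = ", ".join(f"{c:.2f}" for c in closes[-recent_days:])
    return f"{trend_fact(closes, window)}\nLast {recent_days} closes: {recent}"
```

The long history is consumed by ordinary code, so the prompt carries one sentence plus a short tail of prices instead of hundreds of numbers.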

3. Reviewing Trades by Macro Cycles

How do we teach AI to learn from its mistakes? In a slow-paced DCA Spot strategy, forcing the AI to re-read every single $5 losing trade from yesterday is pointless: an isolated trade says almost nothing about the strategy, and scrutinizing it only wastes API costs.

My Customization: I switched to evaluating by Regime (Cycles). Instead of making the AI examine every short-term fluctuation, the Execution Hands calculate the weighted average cost (WAC, the volume-weighted average of the system's own fill prices) over a prolonged accumulation phase. When the trend reverses, the summary report sent back to the AI looks like this: “The May Bull cycle has ended. The WAC is $2,650. Net profit: +1.8%.” This way, the AI learns to recognize patterns in the bigger picture.
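A minimal sketch of how such a cycle report could be produced. The `Fill` type and function names are hypothetical, and trading fees are ignored for simplicity, so treat it as an illustration of the arithmetic rather than the project's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fill:
    price: float     # execution price in USDT
    quantity: float  # amount of the asset bought


def weighted_average_cost(fills: List[Fill]) -> float:
    """Total spend divided by total quantity across all fills in the cycle."""
    total_qty = sum(f.quantity for f in fills)
    if total_qty <= 0:
        raise ValueError("no quantity accumulated in this cycle")
    return sum(f.price * f.quantity for f in fills) / total_qty


def regime_summary(label: str, fills: List[Fill], exit_price: float) -> str:
    """One-line report for the whole accumulation phase (fees not included)."""
    wac = weighted_average_cost(fills)
    net_pct = (exit_price - wac) / wac * 100
    return f"The {label} cycle has ended. The WAC is ${wac:,.2f}. Net profit: {net_pct:+.1f}%."
```

The AI then receives one sentence per cycle instead of one record per trade, which keeps the feedback cheap and focused on the regime.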

4. Never Trust an AI’s Confidence Level

Should we let the AI independently decide the bet size for each trade (from $5 to $10) based on the “confidence” of its signal? The answer from research is: Absolutely not.

Research points out that LLMs are poorly calibrated when asked to quantify their own certainty. They are trained to deliver answers that sound resolute, so they frequently report “99% confidence” even when hallucinating. If you let the AI determine the capital sizing, sooner or later it will go “all-in” straight into a liquidity trap.

My Customization: I strictly maintain Fixed DCA discipline at the execution layer. The order size must be hardcoded using traditional mathematics, and the Portfolio Manager is never allowed to bend the weights.
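In code, the discipline amounts to a sizing function that never reads the model's confidence. This is a sketch under assumptions: the $5 constant, the `Signal` type, and the rule "buy only on UP" are hypothetical illustrations, not values confirmed by the source code.

```python
from dataclasses import dataclass

# Hypothetical fixed DCA amount per order, in USDT.
FIXED_ORDER_USDT = 5.0


@dataclass(frozen=True)
class Signal:
    direction: str     # "UP", "DOWN", or "NEUTRAL"
    confidence: float  # self-reported by the model; deliberately ignored below


def order_size_usdt(signal: Signal) -> float:
    """Size depends only on the direction, never on the reported confidence."""
    if signal.direction == "UP":
        return FIXED_ORDER_USDT
    return 0.0
```

A signal reporting 99% confidence and one reporting 10% produce the same order, so a hallucinated certainty cannot inflate the position.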

5. The Trap of False Consensus and Debate Structure

Initially, I thought that if you put multiple AIs into a chat room and let them argue, the truth would eventually emerge. In reality, research shows that if AIs debate for more than 2 rounds, they fall into “Sycophancy”: they start agreeing with each other to relieve conversational tension, or they get stuck in an endless loop of rephrasing the same points.

My Customization: I force the debate process into a strict 2-round framework.
- Round 1 (Blind Phase): The Bull and Bear factions are completely isolated and must write independent reports to avoid anchoring bias.
- Round 2 (Cross-Examination): Each side is only allowed to find and dismantle the single weakest point in the opponent’s logic; repeating boilerplate text is forbidden.
- Deadlock Resolution: When macroeconomic data is too noisy and the two factions are at a standstill, the Research Manager is not allowed to “split the difference.” It is forced to use hard Tie-Breaker rules (like compressing the ATR profit margin because the market is high-risk, or prioritizing the long-term trend of the 200 SMA).
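The two rounds and the tie-breaker above can be sketched as plain control flow. The agents here are stand-in callables, and `break_tie` applies both example rules at once (follow the 200 SMA and compress the ATR multiplier by a hypothetical factor of 0.5); the real project may combine them differently.

```python
from typing import Callable, Dict, Optional, Tuple

# A debate agent takes (facts, opponent_report) and returns its text.
# In the real system this would wrap an LLM call; here it is any callable.
Agent = Callable[[str, Optional[str]], str]


def run_debate(facts: str, bull: Agent, bear: Agent) -> Dict[str, str]:
    # Round 1 (blind): neither side sees the other's report.
    bull_report = bull(facts, None)
    bear_report = bear(facts, None)
    # Round 2 (cross-examination): each side sees only the opponent's Round 1 report.
    bull_rebuttal = bull(facts, bear_report)
    bear_rebuttal = bear(facts, bull_report)
    # The function ends here on purpose: there is no third round.
    return {
        "bull_report": bull_report,
        "bear_report": bear_report,
        "bull_rebuttal": bull_rebuttal,
        "bear_rebuttal": bear_rebuttal,
    }


def break_tie(price: float, sma_200: float, atr_multiplier: float,
              compression: float = 0.5) -> Tuple[str, float]:
    """Deterministic fallback for a deadlock: side with the long-term trend
    and shrink the take-profit multiplier because the market is noisy."""
    direction = "UP" if price > sma_200 else "DOWN"
    return direction, atr_multiplier * compression
```

Hardcoding the round count in the control flow, rather than asking the model to stop, is what keeps the debate from drifting into agreement.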

III. System Architecture: 5 Steps and 12 Agents

After a while, the harsh truth landed: you don’t hand the life-and-death power over your account to a probability-based system. LLMs, under the hood, are dice-rolling machines on top of neural weights. The only way through is to cage that uncertainty inside the determinism of traditional code.

Referencing the core idea of the original author, I did the exact same thing: divided the project into 5 steps with 12 agents. This is the main workflow of the project. However, I modified each node to suit my pragmatic approach.

Thus, the entire system is divided into two completely independent subsystems:

1. AI Brain (Research Graph)

Instead of writing a mile-long prompt and praying, I use LangGraph’s StateGraph to orchestrate the reasoning flow:

Step 1 - Data Gathering: 4 agents run sequentially (Market -> Social -> News -> Fundamentals) to collect data:
- Market Analyst: Fetches price data using yfinance and calculates technical indicators such as SMA and MACD using stockstats.
- News Analyst: Retrieves macroeconomic financial news, filters out noise, and focuses only on events capable of creating a 1-to-4-week trend.
- Social Analyst: Measures crowd sentiment states like Fear and Greed from platforms like StockTwits to capture the flow of Retail Traders.
- Fundamentals Analyst: Retrieves the fundamental data of the asset including P/E, PEG, and ROE to evaluate if it is cheap or expensive relative to its intrinsic value.
  [!NOTE] I coded this node to fully preserve the original author’s architecture. In reality, when applied to tokenized Gold (XAUT), corporate financial metrics do not exist, so I configured the system to skip this node.
Step 2 - Invest Debate: This is a 2-round debate. Round 1 (Blind Phase): The Bull and Bear factions independently write reports defending their views, using data from Step 1 to build their arguments (e.g., price trends, crowd sentiment). Round 2 (Cross-Examination): The 2 factions face off, scrutinizing logical flaws or skewed data in each other’s reports. Finally, the Research Manager synthesizes the sharpest arguments into a neutral report, acting as a “compass” for the decision-making phase.
Step 3 - Trading (Planning): Based on the Manager’s report, the Trader agent proposes a specific action strategy, determining entry points, stop-loss, and take-profit thresholds based on volatility margins, rather than just qualitative signals.
Step 4 - Risk Management: The Trader’s plan is reviewed by an “Appraisal Board” consisting of 3 agents with 3 different risk styles: (1) Aggressive - focuses on profit optimization, (2) Conservative - emphasizes capital preservation, and (3) Neutral - controls compliance. If there is any conflict or signs of an abnormal “all-in”, the Portfolio Manager will intervene to adjust the order size or cancel it entirely.
Step 5 - Save To Database: The AI Brain is completely blind to the actual wallet balance on Binance. The Portfolio Manager’s final task is simply to output the result as a static JSON file saved to the Database, containing exactly one decision: UP, DOWN, or NEUTRAL, along with explanatory notes.
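The hand-off in Step 5 can be pictured as a tiny, strictly validated record. The field names (`symbol`, `decision`, `notes`) and helper functions below are assumptions for illustration; the source only states that the output is JSON carrying exactly one of UP, DOWN, or NEUTRAL plus explanatory notes.

```python
import json
from dataclasses import asdict, dataclass

ALLOWED_DECISIONS = {"UP", "DOWN", "NEUTRAL"}


@dataclass(frozen=True)
class Decision:
    symbol: str    # e.g. "XAUT/USDT"
    decision: str  # exactly one of ALLOWED_DECISIONS
    notes: str     # free-text explanation from the Portfolio Manager

    def __post_init__(self) -> None:
        # Reject anything outside the three allowed values.
        if self.decision not in ALLOWED_DECISIONS:
            raise ValueError(f"unexpected decision: {self.decision!r}")


def dump_decision(decision: Decision) -> str:
    """Serialize the decision for storage; no balances or sizes are included."""
    return json.dumps(asdict(decision), sort_keys=True)


def load_decision(raw: str) -> Decision:
    """Parse and re-validate on the execution side before acting on it."""
    data = json.loads(raw)
    return Decision(symbol=data["symbol"], decision=data["decision"], notes=data["notes"])
```

Validating again on load means the execution side never has to trust that the AI side produced well-formed output.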

2. Execution Hands (Binance Graph)

This is a traditional Python system holding the API Keys and the power of life and death. It reads the JSON report from the AI Brain and decides whether to fire the order to the exchange.

The heart of the Execution Hands is the Circuit Breaker. Regardless of how confident the AI analysis is, if the total portfolio value falls more than 15% below its High-Water Mark (the highest value it has reached), the Circuit Breaker trips automatically and all trading orders are immediately frozen. This is exactly how Software 1.0 uses hard logic to control Software 2.0.
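A minimal sketch of a high-water-mark circuit breaker, assuming the 15% limit is measured as drawdown from the peak portfolio value and that a tripped breaker stays tripped until a human resets it. The class and method names are hypothetical.

```python
class CircuitBreaker:
    """Freezes trading once drawdown from the high-water mark exceeds a limit."""

    def __init__(self, max_drawdown: float = 0.15) -> None:
        self.max_drawdown = max_drawdown
        self.high_water_mark = 0.0
        self.tripped = False

    def update(self, portfolio_value: float) -> bool:
        """Record the latest portfolio value; return True if trading is allowed."""
        self.high_water_mark = max(self.high_water_mark, portfolio_value)
        if self.high_water_mark > 0:
            drawdown = 1 - portfolio_value / self.high_water_mark
            if drawdown > self.max_drawdown:
                self.tripped = True
        # Once tripped, stay tripped even if the value recovers (assumption:
        # resuming requires a manual decision, not an automatic one).
        return not self.tripped
```

The check runs before any order is sent, so an UP decision from the AI Brain is simply ignored while the breaker is tripped.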

Truth be told, I haven’t really researched this Execution layer too deeply; everything is just at the most basic level to ensure safety. I’ll probably wait until a fairy appears in my dreams to bless me with a better solution before I roll up my sleeves to code this part again!

IV. Conclusion

When I read the daily reports this squad of agents puts out, the logic holds up: it is sharp and defensible. If I sat down with the same charts and macro news myself, I doubt I’d land somewhere meaningfully different.

With such a rigorous reasoning flow, my intuition tells me that this project has a good chance of turning a profit. However, I am still in the testing phase, and the market always has unpredictable slaps in store (especially the risk of Slippage when trading Spot on low-liquidity cryptocurrencies), so only time will tell.

Regardless of the PnL outcome, this project has affirmed one thing: The success of integrating AI into a product does not lie in randomly throwing everything into a “black box prompt” and praying. It lies in the art of system architecture design. We must use rigid logic blocks (like StateGraph, Circuit Breaker, Data Fetching) to cage, navigate, and force that probabilistic machine to work with strict discipline.

[!NOTE] The entire source code of the project is public at: https://github.com/vinhmdev-com/mdevtrade
A special thanks to the author of the original TradingAgents project for the wonderful inspiration and architectural ideas.