Infrastructure
The ByNinja Trading Bot infrastructure is designed for long-running autonomous operation in unstable real-world environments. Crypto trading systems must remain operational 24/7, recover from unexpected failures automatically, and minimize downtime without requiring manual intervention.
To achieve this, the infrastructure includes:
- Watchdog execution loops
- Automatic crash recovery
- Process restart orchestration
- Graceful shutdown handling
- Persistent runtime environment
- Isolated module execution
The entire architecture is built around fault tolerance and operational continuity.
Watchdog Scripts
The main execution layer of the system is controlled through a dedicated watchdog launcher script:
./run.sh trading
./run.sh telegram
The watchdog script acts as a lightweight process supervisor responsible for:
- Environment validation
- Virtual environment activation
- Module startup
- Crash detection
- Automatic restart handling
This approach avoids the need for external process managers during development or lightweight deployments while still providing production-style resilience.
Environment Validation
Before starting any module, the script validates:
- Input arguments
- Virtual environment existence
- Runtime configuration
Example:
if [[ "$MODULE" != "trading" && "$MODULE" != "telegram" ]]; then
    echo "Invalid argument"
    exit 1
fi
The script also ensures the Python virtual environment exists:
if [ ! -f "$VENV_PATH" ]; then
    echo "Virtual environment not found"
    exit 1
fi
This prevents accidental startup with broken dependencies or incorrect execution parameters.
Isolated Runtime Environment
The infrastructure activates a dedicated Python virtual environment before execution:
source "./env/bin/activate"
This provides:
- Dependency isolation
- Stable package versions
- Reproducible runtime behavior
- Clean deployment separation
The launcher also explicitly defines PYTHONPATH:
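PYTHONPATH="$(pwd)/src" python3 -c "$CMD"
This lets modules under src import reliably without hard-coded absolute paths. Because $(pwd) expands to the current working directory, this holds when the launcher is started from the directory that contains src.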
Auto Restart System
One of the most important infrastructure features is the automatic restart loop.
The launcher continuously monitors the bot process:
while true; do
    PYTHONPATH="$(pwd)/src" python3 -c "$CMD"
done
If the process exits unexpectedly, the watchdog immediately detects the failure and relaunches the module automatically.
Crash detection logic:
EXIT_CODE=$?
if [ "$EXIT_CODE" -ne 0 ]; then
    echo "[CRASH] Restarting..."
    sleep "$RESTART_DELAY"
fi
This creates a self-healing execution model capable of recovering from failures that end the process, such as:
- Unexpected exceptions
- Network failures
- API outages
- Deadlocks (once the hung process is terminated, since the loop only reacts to process exit)
- Temporary system instability
- External process termination
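To see the restart loop and the crash check in one piece, the following is a minimal Python sketch of the same logic. It is not the project's launcher, which is the Bash script shown above. The names `supervise` and `max_restarts`, the injectable `sleep` parameter, and the choice to stop supervising on exit code 0 are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def supervise(cmd, restart_delay=3.0, max_restarts=None, sleep=time.sleep):
    """Run cmd, restarting it after each non-zero exit.

    Returns the list of exit codes observed. Hypothetical helper:
    max_restarts=None means "restart forever", like `while true`.
    """
    exit_codes = []
    while True:
        code = subprocess.call(cmd)  # blocks until the child process exits
        exit_codes.append(code)
        if code == 0:
            # Assumption: a clean exit means the module stopped on purpose.
            return exit_codes
        restarts_used = len(exit_codes) - 1
        if max_restarts is not None and restarts_used >= max_restarts:
            return exit_codes
        print(f"[CRASH] exited with code {code}, restarting in {restart_delay}s")
        sleep(restart_delay)  # cooldown before the next attempt
```

A call such as `supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2)` would run the failing command three times (one start plus two restarts) and return the three exit codes.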
Controlled Restart Delay
The infrastructure intentionally adds a restart cooldown:
RESTART_DELAY=3
This prevents:
- Infinite rapid restart loops
- CPU spikes during repeated crashes
- API hammering
- Log flooding
The short delay gives external systems time to recover before retrying execution.
Separate Service Architecture
The system separates the infrastructure into two independently restartable services:
| Service | Responsibility |
|---|---|
| Trading Module | Trading engine and strategy execution |
| Telegram Module | Remote control and notifications |
This isolation provides several advantages:
- One subsystem can fail independently
- Telegram monitoring can remain online during trading crashes
- Trading can continue even if Telegram fails
- Easier debugging and maintenance
Modules communicate through the internal TCP layer rather than direct coupling.
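The wire format of the internal TCP layer is not described here, so the following is only a sketch of how two loosely coupled modules could exchange a message over a local socket. The newline-delimited JSON framing, the function names `send_event` and `receive_one_event`, and the event fields are assumptions, not the project's actual protocol.

```python
import json
import socket


def send_event(host, port, event):
    """Connect, send one JSON event terminated by a newline, then close."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(json.dumps(event).encode("utf-8") + b"\n")


def receive_one_event(server_sock):
    """Accept one connection and return its first newline-terminated event."""
    conn, _addr = server_sock.accept()
    with conn:
        with conn.makefile("r", encoding="utf-8") as reader:
            line = reader.readline()
    return json.loads(line)
```

Because each side only depends on the message format, either module can be restarted without the other holding a reference to it; the sender simply reconnects on its next message.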
Graceful Shutdown Handling
The infrastructure supports controlled shutdown behavior using signal handling.
Example:
signal.signal(signal.SIGINT, signal_handler)
On shutdown:
- Trading stops safely
- Persistence flushes to disk
- TCP connections close cleanly
- Threads terminate gracefully
Example shutdown flow:
🛑 Shutdown signal received...
💾 Forcing save to disk...
✅ PersistentMap stopped
This is critical in trading systems because improper shutdowns may otherwise lead to:
- Lost position state
- Corrupted persistence
- Inconsistent recovery data
- Unclosed orders
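One common way to structure this kind of shutdown is to keep the signal handler tiny and let the main loop do the cleanup. The sketch below assumes that pattern; it is not taken from the project's code. The names `shutdown_requested`, `install_handlers`, and `run` are hypothetical, and registering SIGTERM alongside SIGINT is an assumption (the example above registers only SIGINT).

```python
import signal
import threading

# Set by the handler, polled by the main loop.
shutdown_requested = threading.Event()


def signal_handler(signum, frame):
    """Only record the request; cleanup happens in the main loop."""
    shutdown_requested.set()


def install_handlers():
    """Register the handler; must be called from the main thread."""
    for sig in (signal.SIGINT, signal.SIGTERM):  # SIGTERM is an assumption
        signal.signal(sig, signal_handler)


def run(step, cleanup, poll_interval=1.0):
    """Call step() until shutdown is requested, then run cleanup() once."""
    try:
        while not shutdown_requested.is_set():
            step()
            # Sleeps up to poll_interval but wakes early on shutdown.
            shutdown_requested.wait(poll_interval)
    finally:
        cleanup()  # e.g. flush persistence, close sockets, join threads
```

In a real module, `install_handlers()` would be called once at startup, `step` would perform one iteration of trading work, and `cleanup` would perform the flush and close steps listed above.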
Crash Recovery Philosophy
The infrastructure follows a simple but powerful principle:
Crash → Detect → Restart → Recover State → Continue Trading
The bot is designed under the assumption that crashes are inevitable in long-running distributed systems.
Instead of trying to prevent every possible failure, the architecture focuses on:
- Fast detection
- Reliable restart
- Persistent state recovery
- Operational continuity
This dramatically increases long-term reliability.
Integration With Persistence Layer
The infrastructure works closely with the persistence subsystem.
Before shutdown or restart:
- Runtime state is saved
- Positions are persisted
- Configuration remains intact
- Recovery metadata is stored
After restart:
- State reloads automatically
- Trading resumes safely
- Existing positions are restored
- Telegram connectivity returns automatically
This creates a resilient recovery pipeline even after unexpected crashes or server restarts.
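The save-then-reload cycle can be illustrated with a small sketch. The persistence subsystem's real API is not shown in this document, so `save_state` and `load_state` below are hypothetical helpers, and JSON on disk is an assumed storage format. The sketch writes to a temporary file and renames it into place, so a crash in the middle of a save leaves the previous file intact.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state to path atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path):
    """Return the saved state, or an empty dict on first start."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

After a restart, a module would call `load_state` once at startup and continue from whatever positions and metadata the last successful `save_state` recorded.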
Telegram Infrastructure Monitoring
The Telegram subsystem also acts as a lightweight operational monitoring layer.
Infrastructure events are forwarded directly to Telegram:
[CRASH] Module trading exited with code 1
Restarting in 3 seconds...
This gives operators instant visibility into:
- Crashes
- Restarts
- Shutdowns
- Recovery events
- TCP connectivity problems
The result is a remotely observable infrastructure without requiring complex monitoring dashboards.
Production-Oriented Reliability Design
The ByNinja infrastructure combines several production-grade operational concepts:
- Watchdog execution loops
- Automatic crash recovery
- Graceful shutdown handling
- Persistent runtime state
- Service isolation
- Self-healing restart behavior
- Remote monitoring through Telegram
- Independent subsystem recovery
Together, these components create an infrastructure capable of supporting continuous autonomous trading operations with minimal manual supervision.