Infrastructure

The ByNinja Trading Bot infrastructure is designed for long-running autonomous operation in unstable real-world environments. Crypto trading systems must remain operational 24/7, recover from unexpected failures automatically, and minimize downtime without requiring manual intervention.

To achieve this, the infrastructure includes:

  • Watchdog execution loops
  • Automatic crash recovery
  • Process restart orchestration
  • Graceful shutdown handling
  • Persistent runtime environment
  • Isolated module execution

The entire architecture is built around fault tolerance and operational continuity.


Watchdog Scripts

The main execution layer of the system is controlled through a dedicated watchdog launcher script:

./run.sh trading
./run.sh telegram

The watchdog script acts as a lightweight process supervisor responsible for:

  • Environment validation
  • Virtual environment activation
  • Module startup
  • Crash detection
  • Automatic restart handling

This approach avoids the need for external process managers during development or lightweight deployments while still providing production-style resilience.


Environment Validation

Before starting any module, the script validates:

  • Input arguments
  • Virtual environment existence
  • Runtime configuration

Example:

MODULE="$1"  # first argument: trading or telegram

if [[ "$MODULE" != "trading" && "$MODULE" != "telegram" ]]; then
    echo "Invalid argument: expected 'trading' or 'telegram'" >&2
    exit 1
fi

The script also ensures the Python virtual environment exists:

if [ ! -f "$VENV_PATH" ]; then
    echo "Virtual environment not found"
    exit 1
fi

This prevents accidental startup with broken dependencies or incorrect execution parameters.
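If the launcher were ported to Python, the same two checks could look like the sketch below. This is not the project's code: `validate_launch` and `VALID_MODULES` are hypothetical names, and the path passed in is assumed to be the virtual environment's activate script (the shell test uses `-f`, which only succeeds for an existing regular file).

```python
import os
from typing import List

# Modules the launcher accepts, mirroring the shell check.
VALID_MODULES = ("trading", "telegram")


def validate_launch(module: str, venv_activate: str) -> List[str]:
    """Return every problem found; an empty list means the module may start."""
    problems = []
    if module not in VALID_MODULES:
        problems.append(
            f"Invalid argument: {module!r}, expected one of {', '.join(VALID_MODULES)}"
        )
    # os.path.isfile mirrors `[ -f ... ]`: True only for an existing regular file.
    if not os.path.isfile(venv_activate):
        problems.append(f"Virtual environment not found: {venv_activate}")
    return problems
```

Collecting all problems instead of stopping at the first one lets the launcher report every misconfiguration in a single run before exiting with a non-zero code.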


Isolated Runtime Environment

The infrastructure activates a dedicated Python virtual environment before execution:

source "./env/bin/activate"

This provides:

  • Dependency isolation
  • Stable package versions (provided the dependencies are pinned, e.g. in a requirements file)
  • Reproducible runtime behavior
  • Clean deployment separation

The launcher also explicitly defines PYTHONPATH:

PYTHONPATH="$(pwd)/src" python3 -c "$CMD"

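This makes the packages under src/ importable without installing them. Because $(pwd) expands to the current working directory, the script must be started from the project root for this path to be correct.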
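A launcher can avoid depending on the working directory by deriving src/ from a fixed project root instead. The sketch below shows this for a hypothetical Python launcher; `child_env` and `run_module` are illustrative names, and prepending to an existing PYTHONPATH (rather than replacing it, as the shell line does) is a deliberate design change.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, Optional


def child_env(project_root: Path, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and put <project_root>/src first on PYTHONPATH."""
    env = dict(os.environ if base is None else base)
    src = str(project_root.resolve() / "src")
    existing = env.get("PYTHONPATH")
    # Prepend instead of overwrite, so entries that were already set stay importable.
    env["PYTHONPATH"] = src if not existing else src + os.pathsep + existing
    return env


def run_module(project_root: Path, cmd: str) -> int:
    """Run `python -c cmd` with the adjusted environment and return its exit code."""
    return subprocess.run([sys.executable, "-c", cmd], env=child_env(project_root)).returncode
```

Assuming the launcher file lives in the project root, it could call `run_module(Path(__file__).resolve().parent, CMD)`, which resolves to the same src/ directory no matter where the command is typed.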


Auto Restart System

One of the most important infrastructure features is the automatic restart loop.

The launcher runs the bot process in the foreground inside an endless loop, so the script regains control the moment the process exits:

while true; do
    PYTHONPATH="$(pwd)/src" python3 -c "$CMD"
done

As soon as the process exits, control returns to the loop and the module is launched again. A non-zero exit code is treated as a crash: the watchdog logs it and waits for the restart delay before relaunching.

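Crash detection logic (this check sits inside the loop, directly after the python3 command, because $? only holds the exit status of the most recent command):

EXIT_CODE=$?

if [ "$EXIT_CODE" -ne 0 ]; then
    echo "[CRASH] Restarting..."
    sleep "$RESTART_DELAY"
fi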

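This creates a self-healing execution model that recovers from any failure that ends the process, such as:

  • Unexpected exceptions
  • Network failures
  • API outages
  • Temporary system instability
  • External process termination

The loop only reacts to process exit: a deadlocked or hung process that never exits is not restarted automatically.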
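For comparison, the restart loop can be expressed in Python as sketched below. It is not the project's launcher: `watchdog` and `max_restarts` are hypothetical, and unlike the shell loop as shown (which relaunches after every exit, including exit code 0), this sketch assumes that exit code 0 is an intentional stop.

```python
import subprocess
import sys
import time
from typing import List, Optional

RESTART_DELAY = 3  # seconds; same value as the shell launcher


def watchdog(argv: List[str], restart_delay: float = RESTART_DELAY,
             max_restarts: Optional[int] = None) -> int:
    """Run argv, restarting it after every non-zero exit.

    Assumptions (not taken from the original launcher):
      * exit code 0 means an intentional stop, so the watchdog returns;
      * max_restarts bounds the loop (useful in tests); None restarts forever.
    Returns the exit code of the last run.
    """
    restarts = 0
    while True:
        # subprocess.run blocks until the child exits, so no polling is needed.
        # A child killed by signal N reports -N here (the shell's $? shows 128+N).
        exit_code = subprocess.run(argv).returncode
        if exit_code == 0:
            return 0
        if max_restarts is not None and restarts >= max_restarts:
            return exit_code
        restarts += 1
        print(f"[CRASH] exited with code {exit_code}. "
              f"Restarting in {restart_delay} seconds...", file=sys.stderr, flush=True)
        time.sleep(restart_delay)
```

Treating a clean exit as a stop signal gives operators a way to shut a module down on purpose without fighting the watchdog; whether the real launcher should behave this way is a design decision for the project.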

Controlled Restart Delay

The infrastructure intentionally adds a restart cooldown:

RESTART_DELAY=3

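This limits:

  • Rapid restart loops: a module that crashes immediately on startup is relaunched at most once every 3 seconds (about 20 times per minute)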
  • CPU spikes during repeated crashes
  • API hammering
  • Log flooding

The short delay gives external systems time to recover before retrying execution.


Separate Service Architecture

The system separates the infrastructure into two independently restartable services:

Service           Responsibility
Trading Module    Trading engine and strategy execution
Telegram Module   Remote control and notifications

This isolation provides several advantages:

  • One subsystem can fail independently
  • Telegram monitoring can remain online during trading crashes
  • Trading can continue even if Telegram fails
  • Easier debugging and maintenance

Modules communicate through the internal TCP layer rather than direct coupling.
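The message format of the internal TCP layer is not described in this section. Purely as an illustration, the sketch below assumes newline-delimited JSON over a localhost socket: `serve_once` plays the role of a module answering commands, and `send_command` the role of a client such as the Telegram module. Both names and the wire format are hypothetical.

```python
import json
import socket
from typing import Any, Callable, Dict

# Hypothetical wire format: one UTF-8 JSON object per line over localhost TCP.
Message = Dict[str, Any]


def serve_once(server: socket.socket, handle: Callable[[Message], Message]) -> None:
    """Accept one client and answer each JSON line until the client disconnects."""
    conn, _addr = server.accept()
    with conn, conn.makefile("rb") as reader:
        for line in reader:  # iteration ends when the client closes its side
            reply = handle(json.loads(line))
            conn.sendall((json.dumps(reply) + "\n").encode("utf-8"))


def send_command(port: int, message: Message, timeout: float = 5.0) -> Message:
    """Connect, send one JSON line, and return the decoded one-line reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock, \
            sock.makefile("rb") as reader:
        sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
        return json.loads(reader.readline())
```

Because each side only depends on a port number and a message format, the trading process can crash and be restarted by its watchdog while the Telegram process keeps running; the client simply gets a refused connection until the trading module is listening again.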


Graceful Shutdown Handling

The infrastructure supports controlled shutdown behavior using signal handling.

Example:

signal.signal(signal.SIGINT, signal_handler)

On shutdown:

  • Trading stops safely
  • Persistence flushes to disk
  • TCP connections close cleanly
  • Threads terminate gracefully
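The shutdown sequence listed above can be sketched with a handler that only records the request and leaves the cleanup to the main loop. The source only shows `signal_handler` being registered for SIGINT; its body here, the `cleanup_hooks` list, `run_until_shutdown`, and the extra SIGTERM registration are assumptions. The handler sets a plain flag because doing I/O or taking locks inside a Python signal handler can misbehave if the signal interrupts the main thread in the middle of the same operation.

```python
import signal
import time
from typing import Callable, List

# Plain flag written by the signal handler and read by the main loop.
shutdown_requested = False
# Hypothetical cleanup steps (flush persistence, close TCP sockets), run in order.
cleanup_hooks: List[Callable[[], None]] = []


def signal_handler(signum, frame):
    """Only record the request; the main loop performs the actual shutdown."""
    global shutdown_requested
    shutdown_requested = True


def install_handlers() -> None:
    signal.signal(signal.SIGINT, signal_handler)   # Ctrl+C
    signal.signal(signal.SIGTERM, signal_handler)  # assumption: also honour `kill` / service stop


def run_until_shutdown(step: Callable[[], None], poll_interval: float = 0.5) -> None:
    """Call step() repeatedly until shutdown is requested, then run every cleanup hook."""
    try:
        while not shutdown_requested:
            step()
            time.sleep(poll_interval)
    finally:
        print("Stopping: running cleanup hooks...")
        for hook in cleanup_hooks:
            hook()
```

Running the hooks in a `finally` block means they also execute when `step()` raises, so persisted state is flushed on both clean and failing exits.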

Example shutdown flow:

🛑 Shutdown signal received...
💾 Forcing save to disk...
✅ PersistentMap stopped

This is critical in trading systems because an abrupt, uncontrolled shutdown can lead to:

  • Lost position state
  • Corrupted persistence
  • Inconsistent recovery data
  • Unclosed orders

Crash Recovery Philosophy

The infrastructure follows a simple but powerful principle:

Crash → Detect → Restart → Recover State → Continue Trading

The bot is designed under the assumption that crashes are inevitable in long-running distributed systems.

Instead of trying to prevent every possible failure, the architecture focuses on:

  • Fast detection
  • Reliable restart
  • Persistent state recovery
  • Operational continuity

As a result, downtime after a crash is reduced to roughly the restart delay plus the time needed to start the module and reload its state.


Integration With Persistence Layer

The infrastructure works closely with the persistence subsystem.

Before shutdown or restart:

  • Runtime state is saved
  • Positions are persisted
  • Configuration remains intact
  • Recovery metadata is stored

After restart:

  • State reloads automatically
  • Trading resumes safely
  • Existing positions are restored
  • Telegram connectivity returns automatically

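This creates a resilient recovery pipeline even after unexpected crashes or server restarts. A hard crash gives the process no chance to save on the way down, so in that case the bot resumes from the most recently persisted state.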
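Recovery only works if the state file itself survives a crash in the middle of a write. A common technique is to write to a temporary file and atomically rename it over the old one, sketched below. `save_state`, `load_state`, and JSON as the storage format are assumptions; the project's PersistentMap may work differently.

```python
import contextlib
import json
import os
import tempfile
from typing import Any, Dict


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Persist state so that a crash mid-write never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) for the rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before swapping files
        os.replace(tmp_path, path)  # atomic on POSIX: readers see old or new, never partial
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


def load_state(path: str) -> Dict[str, Any]:
    """Return the last saved state, or an empty dict if nothing was saved yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

On restart the module would call `load_state` first and rebuild positions from the result; after a hard crash this is the last successfully completed save, never a truncated file.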


Telegram Infrastructure Monitoring

The Telegram subsystem also acts as a lightweight operational monitoring layer.

Infrastructure events are forwarded directly to Telegram:

[CRASH] Module trading exited with code 1
Restarting in 3 seconds...

This gives operators instant visibility into:

  • Crashes
  • Restarts
  • Shutdowns
  • Recovery events
  • TCP connectivity problems

The result is a remotely observable infrastructure without requiring complex monitoring dashboards.
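This section does not show how events reach Telegram. One possibility, sketched here with only the standard library, is a direct call to the Telegram Bot API `sendMessage` method. The bot token and chat ID are placeholders you would supply; `crash_message`, `build_send_message`, and `notify` are hypothetical helpers, and the message text copies the format shown above.

```python
import json
import urllib.parse
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def crash_message(module: str, exit_code: int, delay: int) -> str:
    """Format a crash notice in the same style as the example message."""
    return f"[CRASH] Module {module} exited with code {exit_code}\nRestarting in {delay} seconds..."


def build_send_message(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a form-encoded POST request for the Bot API sendMessage method."""
    url = f"{TELEGRAM_API}/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def notify(token: str, chat_id: str, text: str, timeout: float = 10.0) -> bool:
    """Send the message; return False instead of raising on network or parse errors."""
    try:
        request = build_send_message(token, chat_id, text)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return bool(json.load(resp).get("ok", False))
    except (OSError, ValueError):
        return False
```

`notify` swallows errors on purpose: a Telegram outage should never block or crash the restart loop that reports on it, which keeps monitoring and trading independent in the same spirit as the service split.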


Production-Oriented Reliability Design

The ByNinja infrastructure combines several production-grade operational concepts:

  • Watchdog execution loops
  • Automatic crash recovery
  • Graceful shutdown handling
  • Persistent runtime state
  • Service isolation
  • Self-healing restart behavior
  • Remote monitoring through Telegram
  • Independent subsystem recovery

Together, these components create an infrastructure capable of supporting continuous autonomous trading operations with minimal manual supervision.