vSphere Backup Manager
An enterprise-ready web interface and CLI tool to automate, schedule, and manage snapshot-based backups for virtual machines on VMware vCenter/ESXi. Designed for performance, reliability, and security.
Key Features
- Grouped Sequential Batch Backups: Select multiple VMs to execute sequentially in a single job. Execution logs and progress indicators are merged into a single view. Multiple batch jobs can run simultaneously without interference — each has isolated thread-local logging, independent progress tracking, and separate SQLite state.
- SHA-256 Checksum Verification & Cataloging: Computes SHA-256 signatures immediately after each VMDK/VMX file download and generates a machine-readable manifest.json catalog alongside each backup run.
- Pre-Upload Validation: Automatically validates local checksums prior to remote transfers (e.g., SFTP) to protect storage vaults against silent write errors or network packet loss.
- On-the-Fly ZST Verification: Supports stream-decompression on the fly to verify .zst archives against original manifest signatures without needing local disk extraction.
- Safe Force Stop (Cancellation): Safely halt running backups via the Web UI. The engine immediately aborts socket downloads and automatically cleans up the VM snapshot on the ESXi host before gracefully terminating.
- Automated Retention Policies: Define count-based (keep_count: keep the last N backups) or age-based (keep_days: clean up backups older than N days) retention policies per VM to manage storage automatically.
- Resilient Scheduling: Uses APScheduler to schedule daily, weekly, monthly, interval, 3-monthly, 6-monthly, or yearly backups. Schedules are persisted in jobs.db and automatically re-registered on app restarts.
- Telegram Bot Alerts: Send rich formatted backup status notifications via a Telegram Bot directly to any group or channel, with no open SMTP ports required. Configurable per alert level (all / failures only).
- SMTP & Sendmail Notifications: Send HTML-formatted backup completion emails via an SMTP relay or the system sendmail binary.
- Reports & Analytics Dashboard: Visual Chart.js trends for backup size and duration over time, with per-run history log table and success-rate statistics.
- Integrated NFS Mount Manager: View, mount, and manage NFS/CIFS shares directly from the Web GUI, showing real-time mount status, total size, used capacity, and free disk space.
- CBT Incremental Backups: Optional Changed Block Tracking (CBT) mode drastically reduces transfer size for recurring scheduled jobs by downloading only changed disk extents.
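The keep_count and keep_days retention rules from the feature list can be illustrated with a small selection function. This is a minimal sketch under assumptions, not the project's implementation: the function name select_expired, the (name, created_at) tuple layout, and the ability to combine both rules in one call are all hypothetical.

```python
from datetime import datetime, timedelta


def select_expired(backups, keep_count=None, keep_days=None, now=None):
    """Return the names of backups that a retention policy would delete.

    backups: list of (name, created_at) tuples, created_at being a datetime.
    keep_count: keep only the newest N backups (None disables the rule).
    keep_days: delete backups older than N days (None disables the rule).
    """
    now = now or datetime.now()
    ordered = sorted(backups, key=lambda b: b[1], reverse=True)  # newest first
    expired = set()
    if keep_count is not None:
        # Everything after the newest keep_count entries is surplus.
        expired.update(name for name, _ in ordered[keep_count:])
    if keep_days is not None:
        cutoff = now - timedelta(days=keep_days)
        expired.update(name for name, created in ordered if created < cutoff)
    # Report in newest-first order so the result is deterministic.
    return [name for name, _ in ordered if name in expired]
```

Returning the list of candidates instead of deleting directly makes it easy to log or review what a policy would remove before any files are touched.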
Requirements
- Python 3.8+
- Python packages listed in requirements.txt:
  - pyvmomi: VMware vSphere API Python SDK
  - requests: vCenter HTTPS folder API transfers
  - paramiko: SFTP remote storage replication
  - zstandard: High-ratio backup compression
  - APScheduler: Recurring backup scheduling
  - flask: Web UI framework
  - gunicorn: Production WSGI server
Installation
1. Clone the repository:

       git clone <repository_url>
       cd backupvmware

2. Set up a Python Virtual Environment:
   - Linux:

         python3 -m venv venv
         source venv/bin/activate

   - Windows:

         python -m venv venv
         .\venv\Scripts\Activate.ps1

3. Install dependencies:

       pip install -r requirements.txt
Web GUI Setup
A Flask-based web interface with a dark glassmorphic theme for managing backups, schedules, mounts, and real-time logs.
Running with PM2 (Recommended for Production)
PM2 natively supports Python applications and keeps the server running across restarts or process crashes.
1. Install PM2 (requires Node.js):

       npm install -g pm2

2. Start the Web GUI using the provided ecosystem.config.js:

       pm2 start ecosystem.config.js

   (Optional) If you are running inside a Python virtual environment (e.g. venv), edit ecosystem.config.js to point the interpreter to your venv's Python executable:

       interpreter: './venv/bin/python3'

3. Useful PM2 Commands:
   - Status Dashboard: pm2 status
   - Real-time Console Logs: pm2 logs vsphere-backup-manager
   - Restart Application: pm2 restart vsphere-backup-manager
   - Stop Application: pm2 stop vsphere-backup-manager
   - Enable Auto-start on Boot: Run pm2 startup and execute the command it prints, followed by pm2 save.
Notification Setup
Telegram Bot (Recommended — works on port 443, no SMTP server needed)
- Create a bot via @BotFather on Telegram; it will give you a Bot Token.
- Add the bot to a group or channel and send any message to it.
- Find your Chat ID using the Telegram API:
  - Open in browser: https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates
  - Look for "chat":{"id": -xxxxxxxxxx} in the response.
- Open Settings → Notifications in the Web UI:
  - Set Webhook Payload Format to Telegram Bot Alert.
  - Enter your Bot Token and Chat ID.
  - Click Send Test Notification to verify.
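To check a Bot Token and Chat ID outside the Web UI, a message can be sent straight through the Telegram Bot API's sendMessage method. The sketch below uses only the standard library; the function names are hypothetical and this is not the notification code the application itself uses.

```python
import json
import urllib.parse
import urllib.request


def build_send_message_request(token, chat_id, text):
    """Build a POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data, method="POST")


def send_message(token, chat_id, text, timeout=10):
    """Send the message and return the decoded JSON response."""
    request = build_send_message_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling send_message("<YOUR_TOKEN>", -1001234567890, "backup test") with real values should make the message appear in the group or channel; an HTTP error usually points to a wrong token or to a bot that has not been added to the chat.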
SMTP Email
- Open Settings → Notifications in the Web UI.
- Enable Email Notifications and fill in your SMTP host, port, credentials, sender, and recipient.
- Click Send Test Email to verify before saving.
Webhook (Generic HTTP POST)
- Open Settings → Notifications in the Web UI.
- Enter a webhook URL (Slack, Teams, Discord, custom endpoint, etc.).
- Choose the payload format (JSON, Form, or Slack).
- Click Send Test Notification to verify.
Alert Levels
Configure in Settings to control when notifications are sent:
| Level | Triggers on |
|---|---|
| all | Every backup completion (success, warning, or failure) |
| failures | Only on failed or finished with errors status |
| disabled | Never send notifications |
CLI Usage
You can also execute standalone backups directly from the command line:
Basic Backup
python vsphere_backup.py --host vc.example.com --user administrator@vsphere.local --vm MyVM --dest /mnt/nfs-backup --compress
Backup with Remote SFTP Replication
python vsphere_backup.py --host vc.example.com --user administrator@vsphere.local --vm MyVM --dest /tmp/backups --sftp-host backup-vault.local --sftp-user vault-user --sftp-password vault-pass
Manual Restore & Clone
Backups are stored in native VMware format (VMDK + VMX), so they can be restored directly to vCenter/ESXi without any conversion.
Backup File Structure
backups/<VM_NAME>/backup-YYYYMMDDHHMMSS/
├── manifest.json ← SHA-256 checksums + metadata
├── <VM_NAME>.vmx ← VM configuration (CPU, RAM, network, etc.)
└── <datastore_name>/
└── <VM_NAME>/
├── <VM_NAME>.vmdk ← Disk descriptor (~500 bytes, plain text)
└── <VM_NAME>-flat.vmdk ← Actual disk data (full size)
With compression enabled, files are stored as .vmdk.zst / -flat.vmdk.zst.
Restoring a VM (In-Place)
Step 1 — Decompress (if compressed)
zstd -d <VM_NAME>.vmdk.zst
zstd -d <VM_NAME>-flat.vmdk.zst
Step 2 — Verify Checksum
# Compare the output with the value in manifest.json
sha256sum <VM_NAME>-flat.vmdk
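The manual comparison can also be scripted. The sketch below hashes each file in chunks and compares it with manifest.json. The manifest layout used here, {"files": {"<relative path>": "<sha256 hex>"}}, is an assumption made for illustration; open a real manifest.json and adjust the key names to the schema the tool actually writes.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large -flat.vmdk files never load fully into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir):
    """Return the relative paths that are missing or whose hash differs."""
    backup_dir = Path(backup_dir)
    manifest = json.loads((backup_dir / "manifest.json").read_text())
    # ASSUMPTION: hypothetical schema {"files": {"<relative path>": "<sha256 hex>"}}.
    mismatched = []
    for rel_path, expected in manifest["files"].items():
        target = backup_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatched.append(rel_path)
    return mismatched
```

An empty list means every file listed in the manifest matched; anything else should be treated as a corrupt or incomplete backup before uploading to a datastore.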
Step 3 — Upload to Datastore
Option A — vSphere Web Client (easiest)
- Navigate to Storage → select the target datastore
- Create or navigate to the VM folder
- Upload the .vmx, .vmdk, and -flat.vmdk files
Option B — SCP to ESXi host
# Enable SSH on the ESXi host first, then:
scp -r ./backup-20260623020000/<datastore>/<VM_NAME>/ \
root@esxi-host:/vmfs/volumes/<datastore>/<VM_NAME>/
Option C — PowerCLI
# Copy files to ESXi datastore via datastore browser
Copy-DatastoreItem -Item ".\*.vmdk" -Destination "[datastore1] <VM_NAME>/"
Step 4 — Register the VM
Right-click the .vmx file in the datastore browser → Register VM, or use PowerCLI:
New-VM -VMFilePath "[datastore1] <VM_NAME>/<VM_NAME>.vmx" -VMHost "esxi-host"
Step 5 — Power On
Start-VM "<VM_NAME>"
Cloning from Backup (New VM)
To restore a backup as a separate new VM without affecting the original:
1. Upload files to a new folder on the datastore (e.g. <VM_NAME>-clone/).
2. Edit the .vmx file and change these lines to avoid UUID/MAC conflicts:

       displayName = "<VM_NAME>-clone"
       uuid.bios = "generate a new UUID"
       ethernet0.generatedAddress = "00:0c:29:xx:xx:xx"

3. Remove any snapshot references if present:

       # Delete or comment out lines starting with:
       snapshot.redoNotWithParent =

4. Register and power on:

       New-VM -VMFilePath "[datastore1] <VM_NAME>-clone/<VM_NAME>.vmx"
       Start-VM "<VM_NAME>-clone"
Best Practices
- Keep a copy — never restore over your only backup copy
- Test restore quarterly — verify backups actually work before you need them
- Isolated network first — always boot cloned VMs on an isolated port group to check for IP conflicts before connecting to production
- CBT resets on clone — the first backup of a cloned VM will be a full backup (CBT state does not carry over)
- Snapshot cleanup — if the backup was taken with snapshots still active, remove orphaned snapshots after restore
Safety & Architecture
1. Snapshot Isolation
The backup engine creates a temporary snapshot on the target VM, downloads the locked base files (.vmdk descriptors, -flat.vmdk disk data, and .vmx configurations) directly from the vCenter Datastore HTTP gateway, and deletes the snapshot immediately afterwards. Even on forced stop, the snapshot cleanup routine runs.
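The guarantee that cleanup runs even on a forced stop is the kind of behaviour a try/finally block provides. The following is a generic sketch of that pattern, not the engine's actual code; temporary_snapshot and its two callables are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def temporary_snapshot(create_snapshot, remove_snapshot):
    """Create a snapshot, hand it to the caller, and always remove it afterwards."""
    snapshot = create_snapshot()
    try:
        yield snapshot
    finally:
        # Runs on success, on download errors, and on cancellation raised in the block.
        remove_snapshot(snapshot)
```

With pyVmomi, create_snapshot could wrap vm.CreateSnapshot_Task(...) and remove_snapshot could wrap snapshot.RemoveSnapshot_Task(removeChildren=False), each waiting for its task to complete; that wiring is an assumption about how one might use the sketch.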
2. Thread-Safe Concurrent Job Execution
Two separate concurrency safeguards are in place:
a) Multiple different jobs running simultaneously
Each job runs in its own background thread. Log output uses a thread-local path registry (threading.local() in backup_core.py) — the overridden print() function checks the calling thread's registered log path and writes directly to that file, bypassing any global sys.stdout redirection. This eliminates the classic ValueError: I/O operation on closed file race condition where one job closing its log file would crash another job's write.
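A minimal sketch of a thread-local log path registry follows. It illustrates the idea only and is not the code in backup_core.py; register_log_path and job_print are hypothetical names, and opening the file on every call is a simplification chosen so that no file handle is ever shared between threads.

```python
import builtins
import threading

_log_ctx = threading.local()  # each thread sees its own attributes


def register_log_path(path):
    """Bind a log file path to the calling thread."""
    _log_ctx.path = path


def job_print(*args, **kwargs):
    """print() replacement that appends to the calling thread's log file."""
    path = getattr(_log_ctx, "path", None)
    if path is None:
        builtins.print(*args, **kwargs)  # no job context: normal output
        return
    kwargs.pop("file", None)  # the registered log file takes precedence
    with open(path, "a", encoding="utf-8") as fh:
        builtins.print(*args, file=fh, **kwargs)
```

Because the destination is looked up per thread at call time, one job finishing and discarding its log path has no effect on the writes of any other job.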
b) Duplicate runs of the same job prevented
An in-memory active_job_threads dictionary tracks which job IDs are currently executing and in which thread. Before starting execution, run_job_thread checks this registry. If the same job is already alive in another thread (e.g., a scheduled trigger fires at the exact same moment as a manual "Run Now" click), the duplicate is silently aborted without affecting the primary run.
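The duplicate check can be sketched as a small claim/release pair around the registry. Only the active_job_threads name comes from the description above; try_claim_job, release_job, and the explicit lock are assumptions added so that the check and the registration happen atomically.

```python
import threading

active_job_threads = {}  # job_id -> Thread currently running that job
_registry_lock = threading.Lock()


def try_claim_job(job_id):
    """Return True if the calling thread may run job_id, False if it is a duplicate."""
    me = threading.current_thread()
    with _registry_lock:
        owner = active_job_threads.get(job_id)
        if owner is not None and owner is not me and owner.is_alive():
            return False  # another live thread already runs this job
        active_job_threads[job_id] = me
        return True


def release_job(job_id):
    """Drop the claim, but only if the calling thread owns it."""
    with _registry_lock:
        if active_job_threads.get(job_id) is threading.current_thread():
            del active_job_threads[job_id]
```

A job runner would call try_claim_job at the start, return early on False, and call release_job in a finally block. Checking is_alive() means a stale entry left by a crashed thread does not block later runs.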
3. SQLite Persistence & Multi-Worker Sync
Job records, status, schedules, and configuration settings are stored in jobs.db (SQLite). The application supports running behind Gunicorn with multiple worker processes:
- Real-time progress writes: Every progress callback update from an active backup job writes directly to SQLite (save_job_to_db_direct), not just on completion.
- Route-level refresh: The /jobs, /job/<id>, and /api/job/<id>/status routes call load_jobs_db() before rendering, syncing state from SQLite across all Gunicorn workers.
- In-place merge strategy: When loading from DB, running jobs in the current process are never overwritten by older DB snapshots from other workers.
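The in-place merge rule can be sketched as follows. This is an illustration under assumptions rather than the body of load_jobs_db(): the function name, the dict-of-dicts job layout, and the locally_running set are hypothetical.

```python
def merge_jobs_from_db(memory_jobs, db_jobs, locally_running):
    """Merge DB rows into this worker's job dict without clobbering its live jobs.

    memory_jobs: dict job_id -> job dict held by this worker (mutated in place).
    db_jobs: dict job_id -> job dict freshly read from SQLite.
    locally_running: set of job IDs executing in this process.
    """
    for job_id, row in db_jobs.items():
        if job_id in locally_running and job_id in memory_jobs:
            continue  # this worker holds the freshest state for its own running jobs
        memory_jobs[job_id] = row
    return memory_jobs
```

Mutating the existing dictionary, instead of replacing it, keeps any references held by running job threads valid while still picking up jobs created or finished by other workers.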
4. SSL Configuration
Custom certificate verification options (--no-verify-ssl or Web checkbox) allow connecting to environments using self-signed vCenter certificates.
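What disabling verification means in practice can be shown with the standard ssl module. The sketch below is an assumption about how such an option could be wired, not the tool's code; build_ssl_context and its ca_file parameter are hypothetical.

```python
import ssl


def build_ssl_context(verify=True, ca_file=None):
    """Return an SSLContext; verification is relaxed only when explicitly requested."""
    if verify:
        # Trust the system CAs, or a specific CA bundle for an internal PKI.
        return ssl.create_default_context(cafile=ca_file)
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode is changed
    context.verify_mode = ssl.CERT_NONE
    return context
```

pyVmomi's SmartConnect accepts such a context through its sslContext argument. Where the vCenter certificate is signed by an internal CA, passing that CA bundle keeps verification enabled instead of turning it off.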
5. Pre-flight & Post-flight Disk Checks
Before every backup, the engine checks for and resolves consolidationNeeded conditions on the VM. After snapshot removal, another consolidation check runs automatically to keep the datastore clean.
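With pyVmomi, the consolidation check reads the VM's runtime.consolidationNeeded flag and, when it is set, starts ConsolidateVMDisks_Task(). The sketch below assumes a vm object exposing those two members and a caller-supplied wait_for_task helper; the function name and the helper are hypothetical, and the engine's own routine may differ.

```python
def consolidate_if_needed(vm, wait_for_task):
    """Consolidate the VM's disks when vSphere flags it; return True if it ran."""
    if not vm.runtime.consolidationNeeded:
        return False
    task = vm.ConsolidateVMDisks_Task()
    wait_for_task(task)  # hypothetical helper that blocks until the task finishes
    return True
```

Calling the same function once before the snapshot is created and once after it is removed matches the pre-flight and post-flight checks described above.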