Home

Vykar is a fast, encrypted, deduplicated backup tool written in Rust. It’s centered around a simple YAML config format and includes a desktop GUI and WebDAV server to browse snapshots. More about design goals.

Do not use Vykar for production backups yet, but do test it alongside other backup tools.

Features

Storage backends – local filesystem, S3 (any compatible provider), SFTP, dedicated REST server
Encryption with AES-256-GCM or ChaCha20-Poly1305 (auto-selected) and Argon2id key derivation
YAML-based configuration with multiple repositories, hooks, and command dumps for monitoring and database backups
Deduplication via FastCDC content-defined chunking with a memory-optimized engine (tiered dedup index plus local mmap-backed lookup caches)
Compression with LZ4 or Zstandard
Built-in WebDAV and desktop GUI to browse and restore snapshots
REST server with append-only enforcement, quotas, and server-side compaction
Concurrent multi-client backups – multiple machines back up to the same repository simultaneously; only the brief commit phase is serialized
Built-in scheduling via vykar daemon – runs backup cycles on a configurable interval or cron schedule
Resource limits for worker threads, backend connections, and upload/download bandwidth
Cross-platform – Linux, macOS, and Windows

Benchmarks

In our benchmarks, Vykar was the fastest tool for both backup and restore and had the lowest CPU cost, while keeping memory usage competitive.

All benchmarks were run 5x on the same idle Intel i7-6700 CPU @ 3.40GHz machine with 2x Samsung PM981 NVMe drives, with results averaged across all runs. Compression settings were chosen to keep resulting repository sizes comparable. The sample corpus is a mix of small and large files with varying compressibility. See detailed results or our benchmark script for full details.

Comparison

Workflow & UX

Aspect	Borg	Restic	Rustic	Kopia	Vykar
Configuration	CLI (YAML via Borgmatic)	CLI (YAML via ResticProfile)	TOML config file	JSON config + CLI policies	YAML config with env-var expansion
Scheduling	Via Borgmatic	Via ResticProfile	External (cron/systemd)	Built-in (interval, cron)	Built-in (`vykar daemon`)
Storage	borgstore + SSH RPC	Local, S3, SFTP, REST, rclone	Local, S3, SFTP, REST	Local, S3, Azure, GCS, B2, SFTP, WebDAV, Rclone	Local, S3, SFTP, REST + vykar-server
Automation	Via Borgmatic (hooks + DB dumps)	Via ResticProfile (hooks only)	Native hooks	Native (before/after actions)	Native hooks + generic command capture
Restore UX	FUSE mount + Vorta (third-party)	FUSE mount + Backrest (third-party)	FUSE mount	FUSE mount or WebDAV + built-in UI	Built-in WebDAV + desktop GUI
Compression	LZ4, Zstd, Zlib, LZMA, None	Zstd, None	Zstd, None	Gzip, Zstd, S2, LZ4, Deflate, Pgzip	LZ4, Zstd, None

Repository Operations & Recovery

Aspect	Borg	Restic	Rustic	Kopia	Vykar
Concurrent backups	v1: exclusive; v2: shared locks	Shared locks for backup	Lock-free	Concurrent multi-client	Session-based (commit serialized)
Repository access	SSH, append-only	rest-server, append-only	Via rustic-server	Built-in server with ACLs	REST server, append-only, quotas
Crash recovery	Checkpoints, rollback	Atomic rename	Atomic rename (caveats)	Atomic blobs (caveats)	Journals + two-phase commit
Prune / GC safety	Exclusive lock	Exclusive lock	Two-phase delete (23h)	Time-based GC (24h min)	Session-aware lock
Data verification	`check --repair`, full verify	`check --read-data`, repair	Restic-compat check	Verify + optional ECC	`check --verify-data`, server offload
Unchanged-file reuse	Persistent local filecache (v1 repo-wide; v2 per-series)	Parent snapshot tree	Parent snapshot tree(s)	Previous snapshot manifests/dirs	Per-source local filecache with parent-snapshot fallback

Security Model

Aspect	Borg	Restic	Rustic	Kopia	Vykar
Crypto construction	v1: AES-CTR + HMAC (E&M); v2: AEAD	AES-CTR + Poly1305 (E-t-M)	AES-CTR + Poly1305 (Restic-compat)	AES-GCM / ChaCha20 (AEAD)	AES-GCM / ChaCha20 (AEAD, AAD)
Key derivation	v1: PBKDF2; v2: Argon2	scrypt (fixed params)	scrypt (Restic-compat)	scrypt	Argon2id (tunable)
Content addressing	Keyed HMAC-SHA-256 / BLAKE2b	SHA-256	SHA-256 (Restic-compat)	Keyed hash (BLAKE2B-256-128 default)	Keyed BLAKE2b-256 MAC
Key zeroization	Python GC (non-deterministic)	Go GC (non-deterministic)	Rust `zeroize`	Go GC (non-deterministic)	`ZeroizeOnDrop` on all key types
Implementation safety	Python + C extensions	Go (GC, bounds-checked)	Rust (minimal unsafe)	Go (GC, bounds-checked)	Rust (minimal unsafe)

Crypto construction: AEAD (Authenticated Encryption with Associated Data) provides confidentiality and integrity in a single pass. Encrypt-and-MAC (E&M) and Encrypt-then-MAC (E-t-M) are older two-step constructions. Domain-separated AAD binds ciphertext to its intended object type and identity, preventing cross-object substitution.

Content addressing: Keyed hashing prevents confirmation-of-file attacks, where an adversary who knows a file’s content computes its expected chunk ID to confirm the file exists in the repository. Unkeyed hashing (plain SHA-256) does not prevent this.

Key zeroization: ZeroizeOnDrop overwrites key material in memory immediately when it goes out of scope. Garbage-collected runtimes (Go, Python) may leave key bytes in memory until the GC reclaims the allocation.

Inspired by

BorgBackup: architecture, chunking strategy, repository concept, and overall backup pipeline.
Borgmatic: YAML configuration approach, pipe-based database dumps.
Rustic: pack file design and architectural references from a mature Rust backup tool.
Name: From Latin vicarius (“substitute, stand-in”) — because a backup is literally a substitute for lost data.

Get Started

Follow the Quick Start guide to install Vykar, create a config, and run your first backup in under 5 minutes.

Once you’re up and running:

Configure storage backends – connect S3, SFTP, or the REST server
Set up hooks and command dumps – run scripts before/after backups, capture database dumps
Browse and restore snapshots – list, search, and restore files
Maintain your repository – prune old snapshots, check integrity, compact packs
Explore backup recipes – common patterns for databases, containers, and filesystems

Quick Start

Install

Run the install script:

curl -fsSL https://vykar.borgbase.com/install.sh | sh

Or download a pre-built binary from the releases page. A Docker image is also available. See Installing for more details.

Create a config file

Generate a starter config file, then edit it to set your repository path and source directories:

vykar config

Initialize and back up

Initialize the repository (prompts for passphrase if encrypted):

vykar init

Create a backup of all configured sources:

vykar backup

Or back up any folder directly:

vykar backup ~/Documents

Inspect snapshots

List all snapshots:

vykar list

List files inside a snapshot (use the snapshot ID shown by vykar list, or latest):

vykar snapshot list a1b2c3d4

Search for a file across recent snapshots:

vykar snapshot find --name '*.txt' --since 7d

Restore

Restore files from a snapshot to a directory:

vykar restore a1b2c3d4 /tmp/restored

For backup options, snapshot browsing, and maintenance tasks, see the workflow guides.

Installing

Quick install

curl -fsSL https://vykar.borgbase.com/install.sh | sh

Or download the latest release for your platform from the releases page.

Docker

Available as ghcr.io/borgbase/vykar on GitHub Container Registry. An apprise variant (ghcr.io/borgbase/vykar:latest-apprise) is also available with the Apprise CLI pre-installed for hook notifications.

Config file

Create a vykar.yaml for Docker. Source paths must reference /data/... (the container mount point):

repositories:
  - url: s3://my-bucket/backups
    access_key_id: "..."
    secret_access_key: "..."

sources:
  - /data/documents
  - /data/photos

encryption:
  passphrase: "change-me"

retention:
  keep_daily: 7
  keep_weekly: 4

schedule:
  enabled: true
  every: "24h"
  on_startup: true

For a local repository backend, use /repo as the repo path and mount a host directory there.

Run as daemon

docker run -d \
  --name vykar-daemon \
  --hostname my-server \
  -v /path/to/vykar.yaml:/etc/vykar/config.yaml:ro \
  -v /home/user/documents:/data/documents:ro \
  -v /home/user/photos:/data/photos:ro \
  -v vykar-cache:/cache \
  ghcr.io/borgbase/vykar

Run ad-hoc commands

With a new container (uses the entrypoint, no need to repeat vykar):

docker run --rm \
  -v /path/to/vykar.yaml:/etc/vykar/config.yaml:ro \
  -v vykar-cache:/cache \
  ghcr.io/borgbase/vykar list

Or exec into a running daemon container:

docker exec vykar-daemon vykar list

Docker Compose

services:
  vykar:
    image: ghcr.io/borgbase/vykar:latest
    hostname: my-server
    restart: unless-stopped
    environment:
      - VYKAR_PASSPHRASE
      - TZ=UTC
    volumes:
      - ./vykar.yaml:/etc/vykar/config.yaml:ro
      - /home/user/documents:/data/documents:ro
      - vykar-cache:/cache
volumes:
  vykar-cache:

Reloading configuration

Send SIGHUP to the daemon container to reload the config file without restarting:

docker kill --signal=HUP vykar-daemon

With Docker Compose:

docker compose kill -s HUP vykar

The daemon logs whether the reload succeeded or was rejected (invalid config).

Triggering a backup

Send SIGUSR1 to trigger an immediate backup cycle without waiting for the next scheduled run:

docker kill --signal=USR1 vykar-daemon

With Docker Compose:

docker compose kill -s USR1 vykar
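
The two signals above can be wrapped in a small helper so scripts do not have to remember which signal does what. This is a minimal sketch: the function name vykar_daemon_ctl is made up for illustration, and the default container name is the vykar-daemon used in the examples.

```shell
# vykar_daemon_ctl ACTION [CONTAINER]
#   reload -> SIGHUP  (re-read the config file)
#   backup -> SIGUSR1 (start a backup cycle now)
# The function name is hypothetical; the signals are the ones documented above.
vykar_daemon_ctl() {
    action=$1
    container=${2:-vykar-daemon}
    case "$action" in
        reload) signal=HUP ;;
        backup) signal=USR1 ;;
        *)
            echo "usage: vykar_daemon_ctl reload|backup [container]" >&2
            return 2
            ;;
    esac
    docker kill --signal="$signal" "$container"
}
```

Call it as vykar_daemon_ctl reload, or vykar_daemon_ctl backup my-container for a differently named container.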

Read-only status page

Set VYKAR_HTTP_LISTEN (and VYKAR_HTTP_ALLOW_PUBLIC=1 to bind on 0.0.0.0) and publish port 7575 to expose a read-only status page in the browser. See Daemon → Read-only status page for endpoints and bind-safety rules.

docker run -d \
  --name vykar-daemon \
  -p 7575:7575 \
  -e VYKAR_HTTP_LISTEN=0.0.0.0:7575 \
  -e VYKAR_HTTP_ALLOW_PUBLIC=1 \
  -v /path/to/vykar.yaml:/etc/vykar/config.yaml:ro \
  -v vykar-cache:/cache \
  ghcr.io/borgbase/vykar

Environment variables recognised by the daemon:

Variable	Equivalent flag	Purpose
`VYKAR_HTTP_LISTEN`	`--http-listen ADDR`	Bind a read-only HTTP status page (e.g. `0.0.0.0:7575`); unset means disabled
`VYKAR_HTTP_ALLOW_PUBLIC`	`--http-allow-public`	Permit non-loopback bind addresses
`VYKAR_CONFIG`	`--config PATH`	Override config file path
`VYKAR_PASSPHRASE`	—	Repository passphrase (skips interactive prompt)

Notes

Use -it with docker run for interactive commands to get progress bar output (e.g. docker run --rm -it ...)
Set --hostname to a stable name — Docker assigns random hostnames that appear in snapshot metadata
Mount source directories under /data/ and reference them as /data/... in the config
For encryption, use VYKAR_PASSPHRASE env var or Docker secrets via passcommand: "cat /run/secrets/vykar_passphrase"
Use a named volume for /cache to persist the snapshot cache across restarts
The apprise variant (ghcr.io/borgbase/vykar:latest-apprise) includes the Apprise CLI for sending notifications to 100+ services from hooks. See Notifications with Apprise.
The image includes curl, jq, and bash for use in hooks (e.g. monitoring webhooks, JSON payloads). For additional tools, extend the image with a Dockerfile:

  FROM ghcr.io/borgbase/vykar
  RUN apk add --no-cache sqlite

Available for linux/amd64 and linux/arm64
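
The passphrase note above mentions Docker secrets via passcommand. One possible way to wire this up with Docker Compose is sketched below. The secret name matches the path in the note. The file names, the host path for /repo, and the placement of passcommand under encryption (by analogy with passphrase) are assumptions, so check them against the Configuration reference.

```shell
# Write a Compose file that mounts a file-based secret into the container.
# Compose exposes it inside the container at /run/secrets/vykar_passphrase.
# Note: this overwrites docker-compose.yml and vykar.yaml in the current directory.
cat > docker-compose.yml <<'EOF'
services:
  vykar:
    image: ghcr.io/borgbase/vykar:latest
    hostname: my-server
    restart: unless-stopped
    secrets:
      - vykar_passphrase
    volumes:
      - ./vykar.yaml:/etc/vykar/config.yaml:ro
      - /home/user/documents:/data/documents:ro
      - /path/to/repo:/repo
      - vykar-cache:/cache
secrets:
  vykar_passphrase:
    file: ./vykar_passphrase.txt
volumes:
  vykar-cache:
EOF

# Minimal config that reads the passphrase from the mounted secret.
# Assumption: passcommand sits under "encryption", like passphrase does.
cat > vykar.yaml <<'EOF'
repositories:
  - url: /repo
sources:
  - /data/documents
encryption:
  passcommand: "cat /run/secrets/vykar_passphrase"
EOF
```

Create vykar_passphrase.txt yourself with restrictive permissions before starting the stack; it is deliberately not generated here.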

Ansible

An official Ansible role is available for automated deployment on Linux servers:

ansible-galaxy role install borgbase.vykar

The vykar_config variable accepts your vykar configuration directly as a YAML dict — since both Ansible and vykar use YAML, the config maps one-to-one:

- hosts: myserver
  roles:
    - role: borgbase.vykar
      vars:
        vykar_config:
          repositories:
            - url: "/backup/repo"
          encryption:
            passphrase: "mysuperduperpassword"
          sources:
            - "/home"
            - "/etc"
          schedule:
            enabled: true
            every: "24h"

See the borgbase.vykar role for all available variables.

Pre-built binaries

Extract the archive and place the vykar binary somewhere on your PATH:

# Example for Linux/macOS
tar xzf vykar-*.tar.gz
sudo cp vykar /usr/local/bin/

For Windows CLI releases:

Expand-Archive vykar-*.zip -DestinationPath .
Move-Item .\vykar.exe "$env:USERPROFILE\bin\vykar.exe"

Add your chosen directory (for example, %USERPROFILE%\bin) to PATH if needed.

Build from source

Requires Rust 1.88 or later.

git clone https://github.com/borgbase/vykar.git
cd vykar
cargo build --release

The binary is at target/release/vykar. Copy it to a directory on your PATH:

sudo cp target/release/vykar /usr/local/bin/

Verify installation

vykar --version

Next steps

Initialize and Set Up a Repository

Desktop GUI

Vykar includes a desktop GUI for managing repositories, running backups, and browsing/restoring snapshots. It is built with Slint and tray-icon.

Installing

macOS

A signed app bundle (Vykar Backup.app) is included in the release archive. Download the latest release from the releases page, extract it, and drag the app to your Applications folder.

Linux

Download the AppImage from the releases page. It bundles most dependencies and runs on x86_64 Linux distributions with glibc 2.39+ (Ubuntu 24.04+, Fedora 40+, Arch, etc.):

chmod +x vykar-gui-*-x86_64.AppImage
./vykar-gui-*-x86_64.AppImage

AppImages require FUSE 2 to run. If you get a FUSE-related error, either install it or use the extract-and-run fallback:

# Install FUSE 2 (Ubuntu 24.04+)
sudo apt install libfuse2t64

# Or run without FUSE
APPIMAGE_EXTRACT_AND_RUN=1 ./vykar-gui-*-x86_64.AppImage

Alternatively, the Intel glibc release archive includes a bare vykar-gui binary. This requires system libraries like libxdo to be installed separately:

# Debian/Ubuntu
sudo apt install libxdo3

To build from source, install the development headers:

sudo apt install libxdo-dev libgtk-3-dev libxkbcommon-dev libayatana-appindicator3-dev
cargo build --release -p vykar-gui

The binary is at target/release/vykar-gui.

Windows

The GUI is included in the Windows release archive. Download the latest release from the releases page and extract vykar-gui.exe.

Initialize and Set Up a Repository

Generate a configuration file

Create a starter config:

vykar config

Or write it to a specific path:

vykar config --dest ~/.config/vykar/config.yaml

Encryption

Encryption is enabled by default (mode: "auto"). During init, vykar benchmarks AES-256-GCM and ChaCha20-Poly1305, chooses one, and stores that concrete mode in the repository config. No config is needed unless you want to force a mode or disable encryption with mode: "none".

The passphrase is requested interactively at init time. You can also supply it via:

VYKAR_PASSPHRASE environment variable
passcommand in the config (e.g. passcommand: "pass show vykar")
passphrase in the config

Configure repositories and sources

Set the repository URL and the directories to back up:

repositories:
  - label: "main"
    url: "/backup/repo"

sources:
  - "/home/user/documents"
  - "/home/user/photos"

See Configuration for all available options.

Initialize the repository

vykar init

This creates the repository structure at the configured URL. For encrypted repositories, you will be prompted to enter a passphrase.

If your config has multiple repositories, use --repo / -R to initialize one entry at a time:

vykar init --repo main
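
With several repository entries, the per-entry init can be looped. A minimal sketch: the function name init_repos is hypothetical, and the labels you pass must match the label fields in your own config.

```shell
# init_repos LABEL...: initialize each listed repository entry in turn,
# stopping at the first failure. The function name is made up for this sketch.
init_repos() {
    for label in "$@"; do
        echo "initializing repository: $label"
        vykar init --repo "$label" || return 1
    done
}
```

For example, init_repos main offsite initializes the entries labelled main and offsite, where offsite is a made-up second label.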

Validate

Confirm the repository was created:

vykar info

Run a first backup and check results:

vykar backup
vykar list

Storage Backends

The repository URL in your config determines which backend is used.

Backend	URL example
Local filesystem	`/backups/repo`
S3 / S3-compatible (HTTPS)	`s3://endpoint[:port]/bucket/prefix`
S3 / S3-compatible (HTTP, unsafe)	`s3+http://endpoint[:port]/bucket/prefix`
SFTP	`sftp://host/path`
REST (vykar-server)	`https://host`

Transport security

HTTP transport is blocked by default for remote backends.

https://... is accepted by default.
http://... (or s3+http://...) requires explicit opt-in with allow_insecure_http: true.

repositories:
  - label: "dev-only"
    url: "http://localhost:8484"
    allow_insecure_http: true

Use plaintext HTTP only on trusted local/dev networks.

Local filesystem

Store backups on a local or mounted disk. No extra configuration needed.

repositories:
  - label: "local"
    url: "/backups/repo"

Accepted URL formats: absolute paths (/backups/repo), relative paths (./repo), or file:///backups/repo.

S3 / S3-compatible

Store backups in Amazon S3 or any S3-compatible service (MinIO, Wasabi, Backblaze B2, etc.). S3 URLs must include an explicit endpoint and bucket path.

AWS S3:

repositories:
  - label: "s3"
    url: "s3://s3.us-east-1.amazonaws.com/my-bucket/vykar"
    region: "us-east-1"                    # Default if omitted
    access_key_id: "AKIA..."
    secret_access_key: "..."

S3-compatible (custom endpoint):

The endpoint is always the URL host, and the first path segment is the bucket:

repositories:
  - label: "minio"
    url: "s3://minio.local:9000/my-bucket/vykar"
    region: "us-east-1"
    access_key_id: "minioadmin"
    secret_access_key: "minioadmin"

S3-compatible over plaintext HTTP (unsafe):

repositories:
  - label: "minio-dev"
    url: "s3+http://minio.local:9000/my-bucket/vykar"
    region: "us-east-1"
    access_key_id: "minioadmin"
    secret_access_key: "minioadmin"
    allow_insecure_http: true
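
To make the URL rule concrete (the host is the endpoint, the first path segment is the bucket, the rest is the prefix), the following sketch splits an S3 repository URL with plain shell parameter expansion. It only illustrates the documented URL shape and is not Vykar's own parser; the function name parse_s3_url is made up.

```shell
# parse_s3_url URL: split an s3:// or s3+http:// repository URL into parts.
# Illustration of the documented URL layout only, not Vykar's parser.
parse_s3_url() {
    url=$1
    case "$url" in
        s3://*)      transport=https; rest=${url#s3://} ;;
        s3+http://*) transport=http;  rest=${url#s3+http://} ;;
        *) echo "not an S3 repository URL: $url" >&2; return 1 ;;
    esac
    # The bucket is mandatory, so at least one path segment must follow the host.
    case "$rest" in
        */?*) ;;
        *) echo "missing bucket in: $url" >&2; return 1 ;;
    esac
    endpoint=${rest%%/*}      # everything before the first slash
    path=${rest#*/}           # everything after the first slash
    bucket=${path%%/*}        # first path segment
    case "$path" in
        */*) prefix=${path#*/} ;;
        *)   prefix= ;;
    esac
    printf 'transport=%s endpoint=%s bucket=%s prefix=%s\n' \
        "$transport" "$endpoint" "$bucket" "$prefix"
}

parse_s3_url "s3://minio.local:9000/my-bucket/vykar"
# prints: transport=https endpoint=minio.local:9000 bucket=my-bucket prefix=vykar
```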

S3 configuration options

Field	Description
`region`	AWS region (default: `us-east-1`)
`access_key_id`	Access key ID (required)
`secret_access_key`	Secret access key (required)
`allow_insecure_http`	Permit `s3+http://` URLs (unsafe; default: `false`)
`s3_soft_delete`	Use soft-delete for S3 Object Lock compatibility (default: `false`)

S3 append-only / ransomware protection

When using S3 directly (without vykar-server), a compromised client that has the S3 credentials can delete or overwrite any object in the bucket. S3 Object Lock preserves previous versions of all objects for a configurable retention period, giving you a window to detect and recover from an attack. Vykar’s soft-delete mode (s3_soft_delete) enables prune and compact to work without s3:DeleteObject permission by replacing deletes with zero-byte tombstone overwrites.

For full application-level append-only enforcement (rejects both overwrites and deletes of immutable keys), use vykar-server instead.

Setup

Three components work together:

S3 Object Lock — preserves previous object versions for a retention period
s3_soft_delete — vykar overwrites objects with zero-byte tombstones instead of issuing real DELETEs, so prune and compact work without needing s3:DeleteObject permission
S3 lifecycle rule — automatically cleans up non-current (expired) versions

Step 1: Create a bucket with Object Lock

Object Lock can be enabled on a new or existing bucket (existing buckets must have versioning enabled first).

# New bucket:
# For regions other than us-east-1, add:
#   --create-bucket-configuration LocationConstraint=REGION
aws s3api create-bucket \
  --bucket my-backup-bucket \
  --object-lock-enabled-for-bucket

# Or enable on an existing versioned bucket:
# aws s3api put-object-lock-configuration \
#   --bucket my-backup-bucket \
#   --object-lock-configuration '{"ObjectLockEnabled": "Enabled"}'

# Set a default retention policy (GOVERNANCE mode, 30-day retention)
aws s3api put-object-lock-configuration \
  --bucket my-backup-bucket \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "GOVERNANCE",
        "Days": 30
      }
    }
  }'

The retention period is your recovery window. If an attacker overwrites backup data, you have this many days to detect the attack and restore from the previous version. 30 days is a starting point; increase it if you need a longer detection window.

GOVERNANCE vs COMPLIANCE mode:

GOVERNANCE: Users with s3:BypassGovernanceRetention can delete locked objects before retention expires. Recommended for backup repositories.
COMPLIANCE: No one can delete locked objects until retention expires, not even the root account. Use only if regulatory requirements demand it.

Creating a bucket with Object Lock enabled automatically turns on bucket versioning.

Step 2: Add a lifecycle rule for cleanup

Without a lifecycle rule, non-current versions accumulate indefinitely. Add a rule to expire them after the retention period:

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-backup-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "CleanupExpiredVersions",
        "Status": "Enabled",
        "Filter": {},
        "NoncurrentVersionExpiration": {
          "NoncurrentDays": 30
        },
        "Expiration": {
          "ExpiredObjectDeleteMarker": true
        }
      }
    ]
  }'

Set NoncurrentDays to match your Object Lock retention period. Versions that are still locked will not be deleted — S3 respects the lock.

Step 3: Enable soft-delete in vykar

repositories:
  - label: "s3-locked"
    url: "s3://s3.us-east-1.amazonaws.com/my-backup-bucket/vykar"
    region: "us-east-1"
    access_key_id: "AKIA..."
    secret_access_key: "..."
    s3_soft_delete: true

With s3_soft_delete: true, vykar replaces DELETE calls with zero-byte PUT overwrites. The S3 backend transparently filters out these tombstones — they are invisible to list, get, exists, and size operations. Prune and compact work normally; the “deleted” data is retained as a non-current version until the Object Lock retention period expires and the lifecycle rule removes it.

The backup client needs s3:PutObject, s3:GetObject, and s3:ListBucket — no s3:DeleteObject permission required.
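
The three actions listed above could be granted with an IAM policy along these lines. This is a sketch for AWS: the output file name and the Sid values are made up, the bucket name is the example one, and other S3-compatible providers may use a different policy mechanism. It covers only the actions named in the text.

```shell
# Write a policy for the backup client without s3:DeleteObject.
# BUCKET is the example bucket used in this section; adjust as needed.
BUCKET=my-backup-bucket

cat > vykar-backup-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ObjectReadWrite",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    },
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${BUCKET}"
    }
  ]
}
EOF
```

Object-level actions apply to the objects (the /* resource), while s3:ListBucket applies to the bucket itself, which is why the policy has two statements.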

Important: s3_soft_delete must only be used with buckets that have S3 Object Lock and versioning enabled. On a plain bucket without versioning, the zero-byte overwrite is irreversible — the original data is lost.
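
Because soft-delete on an unversioned bucket destroys data, a pre-flight check before setting s3_soft_delete: true may be worthwhile. The sketch below uses two AWS CLI calls to confirm that versioning and Object Lock are both enabled. The function name is hypothetical, and it assumes admin credentials that are allowed to read the bucket configuration.

```shell
# check_soft_delete_ready BUCKET: succeed only if the bucket has both
# versioning and Object Lock enabled. Function name is made up for this sketch.
check_soft_delete_ready() {
    bucket=$1
    versioning=$(aws s3api get-bucket-versioning \
        --bucket "$bucket" --query 'Status' --output text) || versioning=unknown
    # This call fails when the bucket has no Object Lock configuration at all.
    lock=$(aws s3api get-object-lock-configuration \
        --bucket "$bucket" \
        --query 'ObjectLockConfiguration.ObjectLockEnabled' \
        --output text 2>/dev/null) || lock=missing

    if [ "$versioning" = "Enabled" ] && [ "$lock" = "Enabled" ]; then
        echo "ok: $bucket has versioning and Object Lock enabled"
    else
        echo "unsafe: versioning=$versioning object-lock=$lock" >&2
        return 1
    fi
}
```

Run check_soft_delete_ready my-backup-bucket and enable soft-delete only if it reports ok.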

Recovery after an attack

If a compromised client has overwritten objects with garbage, the original versions are preserved as non-current versions in S3. To recover, restore the pre-attack versions using the AWS CLI.

1. Identify affected objects. List versions of a specific key to find the good version:

aws s3api list-object-versions \
  --bucket my-backup-bucket \
  --prefix "packs/ab/" \
  --query 'Versions[?Key==`packs/ab/PACK_ID`].[VersionId,LastModified,Size]' \
  --output table

Versions with Size: 0 are tombstones from soft-delete. Versions with the expected size from before the attack timestamp are the ones to restore.

2. Restore a specific version by copying it back as the current version:

aws s3api copy-object \
  --bucket my-backup-bucket \
  --key "packs/ab/PACK_ID" \
  --copy-source "my-backup-bucket/packs/ab/PACK_ID?versionId=VERSION_ID"

3. Restore all objects to a point in time. To bulk-restore the latest good version of every object modified after a known-good timestamp:

# For each key, find the most recent non-current version before the attack
# timestamp and copy it back as the current version.
aws s3api list-object-versions \
  --bucket my-backup-bucket \
  --query 'Versions[?LastModified<`2025-01-15T00:00:00Z` && !IsLatest].[Key,VersionId,LastModified]' \
  --output text \
| sort -k1,1 -k3,3r \
| awk '!seen[$1]++ {print $1, $2}' \
| while read -r key version_id; do
    aws s3api copy-object \
      --bucket my-backup-bucket \
      --key "$key" \
      --copy-source "my-backup-bucket/${key}?versionId=${version_id}"
  done

The sort | awk pipeline selects only the latest version per key — it sorts by key then by timestamp (newest first), and awk keeps only the first occurrence of each key.
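
To see the selection step in isolation, the same sort and awk stages can be run on a few made-up lines shaped like the tab-separated --output text columns (key, version ID, timestamp). The keys, version IDs, and timestamps below are invented, and the helper name is hypothetical.

```shell
# latest_version_per_key: reads "KEY<TAB>VERSION_ID<TAB>LAST_MODIFIED" lines
# on stdin and prints "KEY VERSION_ID" for the newest version of each key.
latest_version_per_key() {
    # Sort by key, then by timestamp descending; awk keeps the first line per key.
    LC_ALL=C sort -k1,1 -k3,3r | awk '!seen[$1]++ {print $1, $2}'
}

# Invented sample data: two versions of one pack, one version of another.
printf '%s\t%s\t%s\n' \
    packs/ab/p1 v1 2025-01-10T08:00:00Z \
    packs/cd/p2 v7 2025-01-05T12:00:00Z \
    packs/ab/p1 v2 2025-01-12T09:30:00Z \
| latest_version_per_key
# prints:
#   packs/ab/p1 v2
#   packs/cd/p2 v7
```

Sorting the timestamps as plain text works because ISO 8601 timestamps in the same format and time zone order lexicographically in the same way as chronologically.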

After restoring the object versions, verify the repository with vykar check before restoring any data from it.

The recovery commands require s3:ListBucketVersions (to list versions), s3:GetObjectVersion (to read a specific version via ?versionId=), and s3:PutObject (to copy it back as current). The backup client should not have s3:ListBucketVersions or s3:GetObjectVersion during normal operation — use separate admin credentials for recovery.

Limitations

This setup provides a deletion delay, not strict immutability. A compromised client can still overwrite objects with garbage. The protection is that the previous version is preserved for the retention period, allowing recovery if the attack is detected in time.

For stronger guarantees, use vykar-server --append-only, which rejects both overwrites and deletes of immutable keys at the application layer.

SFTP

Store backups on a remote server via SFTP. Uses a native russh implementation (pure Rust SSH/SFTP) — no system ssh binary required. Works on all platforms including Windows.

Host keys are verified against an OpenSSH known_hosts file. Unknown hosts use TOFU (trust-on-first-use): the first key seen is stored, and a later key change causes the connection to fail.

repositories:
  - label: "nas"
    url: "sftp://backup@nas.local/backups/vykar"
    # sftp_key: "/home/user/.ssh/id_rsa"  # Path to private key (optional)
    # sftp_known_hosts: "/home/user/.ssh/known_hosts"  # Optional known_hosts path
    # sftp_timeout: 30         # Per-request timeout in seconds (default: 30, range: 5–300)

URL format: sftp://[user@]host[:port]/path. Default port is 22.

SFTP configuration options

Field	Description
`sftp_key`	Path to SSH private key (auto-detects `~/.ssh/id_ed25519`, `id_rsa`, `id_ecdsa`)
`sftp_known_hosts`	Path to OpenSSH `known_hosts` file (default: `~/.ssh/known_hosts`)
`sftp_timeout`	Per-request SFTP timeout in seconds (default: `30`, clamped to `5..=300`)
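
If you would rather not rely on trust-on-first-use, the host key can be recorded in the known_hosts file ahead of time. The sketch below assumes the OpenSSH client tools (ssh-keyscan, ssh-keygen) are available on the machine where you prepare the file; Vykar itself does not need them. The function name pin_host_key is made up.

```shell
# pin_host_key HOST [PORT] [KNOWN_HOSTS_FILE]
# Appends the server's public host keys to a known_hosts file, then prints
# the fingerprints so they can be compared with ones obtained out of band.
pin_host_key() {
    host=$1
    port=${2:-22}
    known_hosts=${3:-$HOME/.ssh/known_hosts}
    mkdir -p "$(dirname "$known_hosts")"
    ssh-keyscan -p "$port" "$host" >> "$known_hosts" || return 1
    ssh-keygen -l -f "$known_hosts"
}
```

For example, pin_host_key nas.local records the key for the NAS above. ssh-keyscan trusts whatever answers on the network, so compare the printed fingerprints with ones taken directly from the server before relying on the entry.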

REST (vykar-server)

Store backups on a dedicated vykar-server instance via HTTP/HTTPS. The server provides append-only enforcement, quotas, lock management, and server-side compaction.

repositories:
  - label: "server"
    url: "https://backup.example.com"
    access_token: "my-secret-token"          # Bearer token for authentication

REST configuration options

Field	Description
`access_token`	Bearer token sent as `Authorization: Bearer <token>`
`allow_insecure_http`	Permit `http://` REST URLs (unsafe; default: `false`)

See Server Setup for how to set up and configure the server.

All backends are included in pre-built binaries from the releases page.

Make a Backup

Run a backup

Back up all configured sources to all configured repositories:

vykar backup

By default, Vykar preserves filesystem extended attributes (xattrs). Configure this globally with xattrs.enabled, and override per source in rich sources entries.

If some files are unreadable or disappear during the run (for example, permission denied or a file vanishes), Vykar skips those files, still creates the snapshot from everything else, and returns exit code 3 to indicate partial success.
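
In a wrapper script, the partial-success exit code can be distinguished from a hard failure. A minimal sketch: the function name run_backup is made up, and treating every non-zero code other than 3 as a failure is an assumption, since only exit code 3 is described here.

```shell
# run_backup [ARGS...]: run `vykar backup` and report the outcome.
# Exit code 3 (partial success) is documented above; treating every other
# non-zero code as a failure is an assumption of this sketch.
run_backup() {
    status=0
    vykar backup "$@" || status=$?
    case "$status" in
        0) echo "backup completed" ;;
        3) echo "backup completed with skipped files (partial success)" ;;
        *) echo "backup failed (exit code $status)" >&2 ;;
    esac
    return "$status"
}
```

A monitoring hook could then alert on failures but only log a warning for exit code 3.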

Sources and labels

In its simplest form, sources are just a list of paths:

sources:
  - /home/user/documents
  - /home/user/photos

When you use multiple simple string entries, vykar groups them into one source and creates one snapshot for that grouped source. If you want separate snapshots per path, use rich entries with explicit labels.

For more complex situations you can add overrides to source groups. Each “rich” source in your config produces its own snapshot. When you use the rich source form, the label field gives each source a short name you can reference from the CLI:

sources:
  - label: "photos"
    path: "/home/user/photos"
  - label: "docs"
    paths:
      - "/home/user/documents"
      - "/home/user/notes"
    exclude: ["*.tmp"]
    hooks:
      before: "echo starting docs backup"

Back up only a specific source by label:

vykar backup --source docs

When targeting a specific repository, use --repo:

vykar backup --repo local --source docs

Ad-hoc backups

You can still do ad-hoc backups of arbitrary folders and annotate them with a label, for example before a system change, so you can identify the snapshot later in vykar list output:

vykar backup --label before-upgrade /var/www

--label is only valid for ad-hoc backups with explicit path arguments. For example, this is rejected:

vykar backup --label before-upgrade

List and verify snapshots

# List all snapshots
vykar list

# List the 5 most recent snapshots
vykar list --last 5

# List snapshots for a specific source
vykar list --source docs

# List files inside a snapshot by ID
vykar snapshot list a1b2c3d4

# Find SQL dumps in the 5 most recent snapshots
vykar snapshot find --last 5 --name '*.sql'

# Find logs from one source changed in the last week
vykar snapshot find --source myapp --since 7d --iname '*.log'

Command dumps

You can capture the stdout of shell commands directly into your backup using command_dumps. This is useful for database dumps, API exports, or any generated data that doesn’t live as a regular file on disk:

sources:
  - label: databases
    command_dumps:
      - name: postgres.sql
        command: pg_dump -U myuser mydb
      - name: redis.rdb
        command: redis-cli --rdb -

Each source with command_dumps produces its own snapshot. An explicit label is required.

Each command runs via sh -c and the captured output is stored as a virtual file under vykar-dumps/ in the snapshot. On restore, these appear as regular files:

vykar-dumps/postgres.sql
vykar-dumps/redis.rdb

If any command exits with a non-zero status, the backup is aborted.
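
One consequence of running each command through sh -c: in a pipeline, POSIX sh reports only the exit status of the last command, so a failing dump tool piped into another program can go unnoticed. This is general shell behavior rather than anything specific to Vykar. The sketch below shows the difference, using false as a stand-in for a failing dump tool and assuming bash is installed (it is included in the Docker image).

```shell
# exit_status_of CMD: run CMD through `sh -c`, as a command dump would be,
# and print the exit status that the caller sees.
exit_status_of() {
    status=0
    sh -c "$1" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# `false` stands in for a dump tool that fails; `cat` for a downstream filter.
exit_status_of 'false | cat'                         # prints 0: the failure is hidden
exit_status_of 'bash -o pipefail -c "false | cat"'   # prints 1: the failure surfaces
```

If a dump command really needs a pipeline, wrapping it in bash -o pipefail -c "..." is one way to make sure a failed dump still produces a non-zero status.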

See also: Quick Start, Configuration, Restore a Backup.

Restore a Backup

Locate snapshots

# List all snapshots
vykar list

# List the 5 most recent snapshots
vykar list --last 5

# List snapshots for a specific source
vykar list --source docs

Inspect snapshot contents

Snapshot-oriented commands take an exact snapshot ID, or latest.

# List files inside a snapshot
vykar snapshot list a1b2c3d4

# List with details (type, permissions, size, mtime)
vykar snapshot list a1b2c3d4 --long

# Limit listing to a subtree
vykar snapshot list a1b2c3d4 --path src

# Sort listing by size (available sort keys: name, size, mtime)
vykar snapshot list a1b2c3d4 --sort size

Inspect snapshot metadata

vykar snapshot info a1b2c3d4

Find files across snapshots

Use snapshot find to locate files before choosing which snapshot to restore from.

# Find PDFs modified in the last 14 days
vykar snapshot find --name '*.pdf' --since 14d

# Limit search to one source and recent snapshots
vykar snapshot find --source docs --last 10 --name '*.docx'

# Search under a subtree with case-insensitive name matching
vykar snapshot find sub --iname 'report*' --since 7d

# Combine type and size filters
vykar snapshot find --type f --larger 1M --smaller 20M --since 30d

--last must be >= 1.
--since accepts positive spans with suffix h, d, w, m (months), or y (for example: 24h, 7d, 2w, 6m, 1y).
--larger means at least this size, and --smaller means at most this size.

Restore to a directory

# Restore all files from a snapshot
vykar restore a1b2c3d4 /tmp/restored

# Restore the most recent snapshot
vykar restore latest /tmp/restored

Restore applies extended attributes (xattrs) by default. Control this with the top-level xattrs.enabled config setting.

Browse via WebDAV and browser UI (mount)

Browse snapshot contents via a local read-only WebDAV server. The same endpoint also serves a built-in HTML browser UI.

# Serve all snapshots (default: http://127.0.0.1:8080)
vykar mount

# Serve a single snapshot
vykar mount --snapshot a1b2c3d4

# Only snapshots from a specific source
vykar mount --source docs

# Custom listen address
vykar mount --address 127.0.0.1:9090

See also: Quick Start, Make a Backup.

Maintenance

Delete a snapshot

# Delete a specific snapshot by ID
vykar snapshot delete a1b2c3d4

Delete a repository

Permanently delete an entire repository and all its snapshots.

# Interactive confirmation (prompts you to type "delete")
vykar delete

# Non-interactive (for scripting)
vykar delete --yes-delete-this-repo

Prune old snapshots

Apply the retention policy defined in your configuration to remove expired snapshots. Optionally compact the repository after pruning.

vykar prune --compact

Verify repository integrity

# Structural integrity check
vykar check

# Full data verification (reads and verifies every chunk)
vykar check --verify-data

Compact (reclaim space)

After delete or prune, blob data remains in pack files. Run compact to rewrite packs and reclaim disk space.

# Preview what would be repacked
vykar compact --dry-run

# Repack to reclaim space
vykar compact

See also:
Quick Start
Server Setup (server-side compaction)
Architecture (compact algorithm details)

Backup Recipes

Vykar provides hooks, command dumps, and source directories as universal building blocks. Rather than adding dedicated flags for each database or container runtime, the same patterns work for any application.

These recipes are starting points — adapt the commands to your setup.

Databases

Databases should never be backed up by copying their data files while running. Use the database’s own dump tool to produce a consistent export.

Where possible, use command dumps — they stream stdout directly into the backup without temporary files. For tools that can’t stream to stdout, use hooks to dump to a temporary directory, back it up, then clean up.

PostgreSQL

sources:
  - label: postgres
    command_dumps:
      - name: mydb.dump
        command: "pg_dump -U myuser -Fc mydb"

For all databases at once:

sources:
  - label: postgres
    command_dumps:
      - name: all.sql
        command: "pg_dumpall -U postgres"

If you need to run additional steps around the dump (e.g. custom authentication, pre/post scripts), use hooks instead. Note that this writes the dump to disk first instead of streaming it directly through command_dumps.

sources:
  - label: postgres
    path: /var/backups/postgres
    hooks:
      before: >
        mkdir -p /var/backups/postgres &&
        pg_dump -U myuser -Fc mydb > /var/backups/postgres/mydb.dump
      after: "rm -rf /var/backups/postgres"

MySQL / MariaDB

sources:
  - label: mysql
    command_dumps:
      - name: all.sql
        command: "mysqldump -u root -p\"$MYSQL_ROOT_PASSWORD\" --all-databases"

With hooks:

sources:
  - label: mysql
    path: /var/backups/mysql
    hooks:
      before: >
        mkdir -p /var/backups/mysql &&
        mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases
        > /var/backups/mysql/all.sql
      after: "rm -rf /var/backups/mysql"

MongoDB

sources:
  - label: mongodb
    command_dumps:
      - name: mydb.archive.gz
        command: "mongodump --archive --gzip --db mydb"

For all databases, omit --db:

sources:
  - label: mongodb
    command_dumps:
      - name: all.archive.gz
        command: "mongodump --archive --gzip"

SQLite

The sqlite3 .backup command writes to a file rather than stdout, so use a hook. Copying the database file directly risks corruption if a process holds a write lock.

sources:
  - label: app-database
    path: /var/backups/sqlite
    hooks:
      before: >
        mkdir -p /var/backups/sqlite &&
        sqlite3 /var/lib/myapp/app.db ".backup '/var/backups/sqlite/app.db'"
      after: "rm -rf /var/backups/sqlite"

Redis

sources:
  - label: redis
    path: /var/backups/redis
    hooks:
      before: >
        mkdir -p /var/backups/redis &&
        redis-cli BGSAVE &&
        sleep 2 &&
        cp /var/lib/redis/dump.rdb /var/backups/redis/dump.rdb
      after: "rm -rf /var/backups/redis"

The sleep gives Redis time to finish the background save. For large datasets, check redis-cli LASTSAVE in a loop instead.
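
The LASTSAVE loop mentioned above could look like the following sketch. The function name wait_for_bgsave, the REDIS_CLI override, and the 60-second default timeout are illustrative choices, not part of Vykar or Redis. It assumes LASTSAVE advances once the background save completes; because that timestamp has one-second resolution, a save that finishes in the same second as the previous one would go unnoticed and run into the timeout.

```shell
#!/bin/sh
# Trigger BGSAVE and poll LASTSAVE until its timestamp changes.
# REDIS_CLI can be overridden (hypothetical knob, used here for testing).
wait_for_bgsave() {
    timeout="${1:-60}"                       # seconds to wait before giving up
    cli="${REDIS_CLI:-redis-cli}"
    before=$($cli LASTSAVE)                  # timestamp of the previous save
    $cli BGSAVE >/dev/null
    elapsed=0
    while [ "$($cli LASTSAVE)" = "$before" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1                         # save did not finish in time
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

Saved as a script and called from the before hook, this could replace the redis-cli BGSAVE && sleep 2 steps; a non-zero return would then signal that the dump file is not yet safe to copy.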

Docker and Containers

The same patterns work for containerized applications. Use docker exec for command dumps and hooks, or back up Docker volumes directly from the host.

These examples use Docker, but the same approach works with Podman or any other container runtime.

Docker volumes (static data)

For volumes that hold files not actively written to by a running process — configuration, uploaded media, static assets — back up the host path directly.

sources:
  - label: myapp
    path: /var/lib/docker/volumes/myapp_data/_data

Note: The default volume path /var/lib/docker/volumes/ applies to standard Docker installs on Linux. It differs for Docker Desktop on macOS/Windows, rootless Docker, Podman (/var/lib/containers/storage/volumes/ for root, ~/.local/share/containers/storage/volumes/ for rootless), and custom data-root configurations. Run docker volume inspect <n> or podman volume inspect <n> to find the actual path.

Docker volumes with brief downtime

For applications that write to the volume but can tolerate a short stop, stop the container during backup.

sources:
  - label: wiki
    path: /var/lib/docker/volumes/wiki_data/_data
    hooks:
      before: "docker stop wiki"
      finally: "docker start wiki"

Database containers

Use command dumps with docker exec to stream database exports directly from a container.

PostgreSQL in Docker:

sources:
  - label: app-database
    command_dumps:
      - name: mydb.dump
        command: "docker exec my-postgres pg_dump -U myuser -Fc mydb"

MySQL / MariaDB in Docker:

sources:
  - label: app-database
    command_dumps:
      - name: mydb.sql
        command: "docker exec my-mysql mysqldump -u root -p\"$MYSQL_ROOT_PASSWORD\" mydb"

MongoDB in Docker:

sources:
  - label: app-database
    command_dumps:
      - name: mydb.archive.gz
        command: "docker exec my-mongo mongodump --archive --gzip --db mydb"

Multiple containers

Use separate source entries so each service gets its own label, retention policy, and hooks.

sources:
  - label: nginx
    path: /var/lib/docker/volumes/nginx_config/_data
    retention:
      keep_daily: 7

  - label: app-database
    command_dumps:
      - name: mydb.dump
        command: "docker exec my-postgres pg_dump -U myuser -Fc mydb"
    retention:
      keep_daily: 30

  - label: uploads
    path: /var/lib/docker/volumes/uploads/_data

Virtual Machine Disk Images

Virtual machine disk images are an excellent use case for deduplicated backups. Large portions of a VM’s disk remain unchanged between snapshots, so Vykar’s content-defined chunking achieves high deduplication ratios — often reducing storage to a fraction of the raw image size.

Prerequisites

The guest VM must have the QEMU guest agent installed and running, and QEMU must be started with a guest agent socket (e.g. -chardev socket,path=/tmp/qga.sock,server=on,wait=off,id=qga0, attached to a virtio-serial port named org.qemu.guest_agent.0). Install socat on the host if not already present.

Freeze, Backup, Thaw

Use hooks to freeze the guest filesystem before backing up the disk image, then thaw it afterwards:

sources:
  - label: vm-images
    path: /var/lib/libvirt/images
    hooks:
      before: >
        echo '{"execute":"guest-fsfreeze-freeze"}' |
        socat - unix-connect:/tmp/qga.sock
      finally: >
        echo '{"execute":"guest-fsfreeze-thaw"}' |
        socat - unix-connect:/tmp/qga.sock

The freeze ensures the filesystem is in a clean state while Vykar reads the image. For incremental backups (every run after the first), only changed chunks are processed, so the freeze window is short.

Tips

Raw images dedup better than qcow2. The qcow2 format uses internal copy-on-write structures that can shuffle data, reducing byte-level similarity between snapshots. If practical, convert with qemu-img convert -f qcow2 -O raw.
Storing multiple VMs in one repo provides cross-VM deduplication. VMs running the same OS share many common chunks.
For environments that cannot tolerate any guest I/O pause, use QEMU external snapshots instead. This redirects writes to an overlay file via QMP blockdev-snapshot-sync, allowing the base image to be backed up with zero interruption. This is the approach used by Proxmox VE and libvirt.

Filesystem Snapshots

For filesystems that support snapshots, the safest approach is to snapshot first, back up the snapshot, then delete it. This gives you a consistent point-in-time view without stopping any services.

Btrfs

sources:
  - label: data
    path: /mnt/.snapshots/data-backup
    hooks:
      before: "btrfs subvolume snapshot -r /mnt/data /mnt/.snapshots/data-backup"
      after:  "btrfs subvolume delete /mnt/.snapshots/data-backup"

The snapshot parent directory (/mnt/.snapshots/) must exist before the first backup. Create it once:

mkdir -p /mnt/.snapshots

ZFS

sources:
  - label: data
    path: /tank/data/.zfs/snapshot/vykar-tmp
    hooks:
      before: "zfs snapshot tank/data@vykar-tmp"
      after:  "zfs destroy tank/data@vykar-tmp"

Important: The .zfs/snapshot directory is only accessible if snapdir is set to visible on the dataset. This is not the default. Set it before using this recipe:
zfs set snapdir=visible tank/data

LVM

sources:
  - label: data
    path: /mnt/lvm-snapshot
    hooks:
      before: >
        lvcreate -s -n vykar-snap -L 5G /dev/vg0/data &&
        mkdir -p /mnt/lvm-snapshot &&
        mount -o ro /dev/vg0/vykar-snap /mnt/lvm-snapshot
      after: >
        umount /mnt/lvm-snapshot &&
        lvremove -f /dev/vg0/vykar-snap

Set the snapshot size (-L 5G) large enough to hold changes during the backup.

Low-Resource Background Backup

If backups should run in the background with minimal impact on interactive work, use conservative resource limits. This will usually increase backup duration.

compression:
  algorithm: lz4

limits:
  threads: 1
  nice: 19
  connections: 1
  upload_mib_per_sec: 2
  download_mib_per_sec: 4

threads: 1 keeps backup transforms mostly sequential.
nice: 19 lowers CPU scheduling priority on Unix; it is ignored on Windows.
connections: 1 minimizes backend parallelism (SFTP pool, upload concurrency, restore readers).
upload_mib_per_sec and download_mib_per_sec cap backend throughput in MiB/s.
If this is too slow, raise upload_mib_per_sec first, then increase connections.

Network-Aware Backups

A before_backup hook that exits non-zero skips the backup. This lets you restrict backups to specific networks without any changes to Vykar itself.

WiFi SSID filtering

Only run backups when connected to a specific WiFi network.

macOS:

hooks:
  before_backup: >-
    networksetup -getairportnetwork en0
    | grep -q 'HomeNetwork'

Linux (NetworkManager):

hooks:
  before_backup: >-
    nmcli -t -f active,ssid dev wifi
    | grep -q '^yes:HomeNetwork$'

Multiple allowed SSIDs:

hooks:
  before_backup: >-
    nmcli -t -f active,ssid dev wifi
    | grep -qE '^yes:(HomeNetwork|OfficeNetwork)$'

Inverted logic — run on any network except a blocklist:

hooks:
  before_backup: >-
    ! nmcli -t -f active,ssid dev wifi
    | grep -q '^yes:CoffeeShopWiFi$'

Metered network detection

Android hotspots and tethered connections advertise metered status via DHCP. Linux network managers read this automatically, so you can skip backups on metered connections without maintaining an SSID list.

NetworkManager:

hooks:
  before_backup: >-
    METERED=$(nmcli -t -f GENERAL.METERED dev show
    | grep -m1 GENERAL.METERED
    | cut -d: -f2);
    [ "$METERED" != "yes" ] && [ "$METERED" != "guess-yes" ]

NetworkManager reports four values: yes (explicitly metered), guess-yes (heuristic, e.g. Android hotspot), no, and unknown. The hook above skips on both yes and guess-yes.

systemd-networkd:

hooks:
  before_backup: >-
    ! networkctl status
    | grep -qi 'metered.*yes'

Note: macOS has no CLI-exposed metered attribute. Use SSID filtering instead.

Monitoring

Vykar hooks can notify monitoring services on success or failure. A curl in an after hook replaces the need for dedicated integrations.

Apprise (multi-service)

Apprise sends notifications to 100+ services (Gotify, Slack, Discord, Telegram, ntfy, email, and more) from the command line. Since vykar hooks run arbitrary shell commands, you can use the apprise CLI directly — no built-in integration needed.

Install it with:

pip install apprise

If you use the Docker image, the apprise variant has it pre-installed — use the latest-apprise tag (or e.g. 0.12.6-apprise). See Docker installation.

hooks:
  after_backup:
    - >-
      apprise -t "Backup complete"
      -b "vykar {command} finished for {repository}"
      "gotify://hostname/token"
      "slack://tokenA/tokenB/tokenC"
  failed:
    - >-
      apprise -t "Backup failed"
      -b "vykar {command} failed for {repository}: {error}"
      "gotify://hostname/token"

Common service URL examples:

Service	URL format
Gotify	`gotify://hostname/token`
Slack	`slack://tokenA/tokenB/tokenC`
Discord	`discord://webhook_id/webhook_token`
Telegram	`tgram://bot_token/chat_id`
ntfy	`ntfy://topic`
Email	`mailto://user:pass@gmail.com`

You can pass multiple URLs in a single command to notify several services at once. See the Apprise wiki for the full list of supported services and URL formats.

Healthchecks

Healthchecks alerts you when backups stop arriving. Ping the check URL after each successful backup.

hooks:
  after: "curl -fsS -m 10 --retry 5 https://hc-ping.com/your-uuid-here"

To report failures too, use separate success and failure URLs:

hooks:
  after: "curl -fsS -m 10 --retry 5 https://hc-ping.com/your-uuid-here"
  failed: "curl -fsS -m 10 --retry 5 https://hc-ping.com/your-uuid-here/fail"

ntfy

ntfy sends push notifications to your phone. Useful for immediate failure alerts.

hooks:
  failed: >
    curl -fsS -m 10
    -H "Title: Backup failed"
    -H "Priority: high"
    -H "Tags: warning"
    -d "vykar backup failed on $(hostname)"
    https://ntfy.sh/my-backup-alerts

Uptime Kuma

Uptime Kuma is a self-hosted monitoring tool. Use a push monitor to track backup runs.

hooks:
  after: "curl -fsS -m 10 'http://your-kuma-instance:3001/api/push/your-token?status=up'"

Generic webhook

Any service that accepts HTTP requests works the same way.

hooks:
  after: >
    curl -fsS -m 10 -X POST
    -H "Content-Type: application/json"
    -d "{\"text\": \"Backup completed on $(hostname)\"}"
    https://hooks.slack.com/services/your/webhook/url

Daemon Mode

vykar daemon runs scheduled backup cycles as a foreground process. Each cycle executes the default actions (backup → prune → compact → check) for all configured repositories, sequentially. The shutdown flag is checked between steps.

Scheduling: sleep-loop with configurable interval (schedule.every, e.g. "6h") or cron expression (schedule.cron, e.g. "0 3 * * *"). Optional random jitter (jitter_seconds) spreads load across hosts.
Passphrase: the daemon validates at startup that all encrypted repos have a non-interactive passphrase source (passcommand, passphrase, or VYKAR_PASSPHRASE env). It cannot prompt interactively.
Scheduler lock: the daemon and GUI share a process-wide scheduler lock under the local config directory so only one scheduler is active at a time. On Unix this uses flock(2) and is released automatically on process exit.

Configuration:

schedule:
  enabled: true
  every: "6h"                  # fixed interval
  # cron: "0 3 * * *"         # OR 5-field cron (mutually exclusive with every)
  on_startup: false
  jitter_seconds: 0

Read-only status page

The daemon can serve a small read-only HTML page that mirrors the GUI overview — repository list, recent snapshots, sources, last cycle outcome, next scheduled run. It is disabled by default; opt in with --http-listen (or the VYKAR_HTTP_LISTEN environment variable):

vykar daemon --http-listen 127.0.0.1:7575

The flag takes a full host:port address. There is no implicit default — passing the flag without a value is an error. Port 7575 is the recommended convention but is not assumed.

What the page shows:

Process info: hostname, pid, version, uptime, next scheduled run
Schedule summary (interval / cron expression / Off)
Per-repository snapshot count, last snapshot time, total stored size
The 10 most recent snapshots across all repositories
Configured sources and their target repositories
Last cycle: started/finished timestamps, duration, outcome (ok / partial / errors)

The page auto-refreshes every 30 seconds via a <meta http-equiv="refresh"> tag — no JavaScript, no external assets, no cache. Data is refreshed at process startup, after every backup cycle, and after a SIGHUP reload.

Endpoints:

GET / — HTML overview
GET /healthz — 200 OK plain text, suitable for Docker / Kubernetes liveness probes
GET /api/status.json — same data as /, JSON-serialized
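
As a rough illustration, the /healthz endpoint could be wrapped in a small probe function for use in a container health check or a cron job. The name vykar_alive and the CURL override are hypothetical; the address is assumed to match whatever was passed to --http-listen.

```shell
#!/bin/sh
# Liveness probe sketch: succeeds only if /healthz answers with a 2xx status.
# CURL is overridable only so the function can be exercised without a daemon.
vykar_alive() {
    addr="${1:-127.0.0.1:7575}"   # assumed --http-listen address
    "${CURL:-curl}" -fsS -m 5 "http://${addr}/healthz" >/dev/null
}
```

A caller might run vykar_alive 127.0.0.1:7575 || echo "vykar daemon not responding" and act on the exit status.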

There are no write actions (no “Run Backup” button, no config edits) and no authentication. The page is purely an inspection surface.

Bind safety

Non-loopback bind addresses (anything outside 127.0.0.0/8 and ::1, including 0.0.0.0 and ::) are rejected at startup unless you also pass --http-allow-public (or set VYKAR_HTTP_ALLOW_PUBLIC=1):

vykar daemon --http-listen 0.0.0.0:7575 --http-allow-public

The page exposes repository names, URLs, snapshot identifiers, and source paths — information that is sensitive on most deployments. The two-flag rule prevents accidentally exposing this on a public interface. If you need to expose it beyond the host, terminate TLS and add authentication in a reverse proxy (nginx, Caddy, Traefik) — vykar speaks plain HTTP only.

+----------------+   loopback   +------------+   public TLS   +------+
| vykar daemon   | <----------- | reverse    | <------------- | user |
| 127.0.0.1:7575 |              | proxy      |                +------+
+----------------+              +------------+
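
The loopback rule described above can be approximated in a wrapper script, for example to decide up front whether --http-allow-public will be needed. This is only a textual sketch under stated assumptions, not Vykar's actual check: the function name is_loopback_bind is hypothetical, and IPv6 hosts are assumed to be written in brackets ([::1]:7575).

```shell
#!/bin/sh
# Sketch of the loopback rule: succeeds for 127.0.0.0/8 and ::1, fails otherwise.
# Assumes a host:port argument, with IPv6 hosts written in brackets.
is_loopback_bind() {
    host="${1%:*}"        # drop the trailing :port
    host="${host#\[}"     # drop IPv6 brackets, if any
    host="${host%\]}"
    if [ "$host" = "::1" ]; then
        return 0
    fi
    # Any dotted quad starting with 127 is inside 127.0.0.0/8
    printf '%s\n' "$host" | grep -Eq '^127\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$'
}
```

With this sketch, 127.0.0.1:7575 and [::1]:7575 pass, while 0.0.0.0:7575 and [::]:7575 fail and would require the second flag.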

Config reload via SIGHUP

Send SIGHUP to the daemon process to reload the configuration file without restarting:

kill -HUP $(pidof vykar)

Reload behavior:

The reload takes effect between backup cycles — a cycle in progress runs to completion first
on_startup is ignored on reload; next_run is recalculated from the schedule relative to now
If the new config is invalid (parse error, empty repositories, schedule.enabled: false, passphrase validation failure), the daemon logs a warning and continues with the previous config
If the new config is valid, repos and schedule are replaced and the next run time is recalculated

Ad-hoc backup via SIGUSR1

Send SIGUSR1 to the daemon to trigger an immediate backup cycle:

kill -USR1 $(pidof vykar)

The cycle runs between scheduled backups — a cycle in progress runs to completion first, then the triggered cycle starts
The existing schedule is preserved when the ad-hoc cycle finishes before the next scheduled slot; if it overruns the slot, the next run is recalculated from the current time (same as after any regular cycle)
With systemd: systemctl kill -s USR1 vykar

Deployment

systemd

Create a unit file at /etc/systemd/system/vykar.service:

[Unit]
Description=Vykar Backup Daemon
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStartPre=+/bin/mkdir -p %h/.cache/vykar %h/.config/vykar
ExecStart=/usr/local/bin/vykar --config /etc/vykar/config.yaml daemon
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=60

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=%h/.cache/vykar %h/.config/vykar
# If backing up to a local path, add it here too, e.g.:
# ReadWritePaths=%h/.cache/vykar %h/.config/vykar /mnt/backup/vykar
PrivateTmp=true
PrivateDevices=true

# Passphrase via environment file (optional)
# EnvironmentFile=/etc/vykar/env

[Install]
WantedBy=multi-user.target

Local repositories: the ProtectSystem=strict directive makes the filesystem read-only by default. If any repository target is a local path, add it to ReadWritePaths or the backup will fail with “Read-only file system”.

Then enable and start:

systemctl daemon-reload
systemctl enable --now vykar

Reload configuration after editing the config file:

systemctl reload vykar

Check status and logs:

systemctl status vykar
journalctl -u vykar -f

Docker

The default Docker entrypoint runs vykar daemon. See Installing — Docker for container setup, volume mounts, and Docker Compose examples.

To enable the read-only status page in Docker, set VYKAR_HTTP_LISTEN (and VYKAR_HTTP_ALLOW_PUBLIC=1 if binding to 0.0.0.0) and publish port 7575 — the entrypoint and CMD do not need to change:

docker run -d --name vykar-daemon \
  -p 7575:7575 \
  -e VYKAR_HTTP_LISTEN=0.0.0.0:7575 \
  -e VYKAR_HTTP_ALLOW_PUBLIC=1 \
  -v /etc/vykar:/etc/vykar:ro \
  vykar

Compose equivalent:

services:
  vykar:
    image: vykar
    environment:
      VYKAR_HTTP_LISTEN: "0.0.0.0:7575"
      VYKAR_HTTP_ALLOW_PUBLIC: "1"
    ports:
      - "7575:7575"
    volumes:
      - /etc/vykar:/etc/vykar:ro

To reload configuration in a running container:

docker kill --signal=HUP vykar-daemon
# or with Compose:
docker compose kill -s HUP vykar

To trigger an immediate backup:

docker kill --signal=USR1 vykar-daemon
# or with Compose:
docker compose kill -s USR1 vykar

Configuration

Vykar is driven by a YAML configuration file. Generate a starter config with:

vykar config

Config file locations

Vykar automatically finds config files in this order:

--config <path> flag
VYKAR_CONFIG environment variable
./vykar.yaml (project)
User config dir + vykar/config.yaml:
- Unix: $XDG_CONFIG_HOME/vykar/config.yaml or ~/.config/vykar/config.yaml
- Windows: %APPDATA%\vykar\config.yaml
System config:
- Unix: /etc/vykar/config.yaml
- Windows: %PROGRAMDATA%\vykar\config.yaml
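
For debugging which file would be picked up, the Unix part of this search order can be mimicked in a few lines of shell. This is a sketch, not Vykar's own lookup code: find_vykar_config is a hypothetical helper, and it assumes that an explicit --config path or VYKAR_CONFIG value wins without an existence check.

```shell
#!/bin/sh
# Print the config path that the documented Unix search order would select.
# $1: optional path, standing in for the --config flag.
find_vykar_config() {
    if [ -n "$1" ]; then
        echo "$1"
        return 0
    fi
    if [ -n "$VYKAR_CONFIG" ]; then
        echo "$VYKAR_CONFIG"
        return 0
    fi
    for candidate in \
        "./vykar.yaml" \
        "${XDG_CONFIG_HOME:-$HOME/.config}/vykar/config.yaml" \
        "/etc/vykar/config.yaml"
    do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1   # nothing found in any location
}
```

Running find_vykar_config from a project directory shows whether a local ./vykar.yaml shadows the user or system config.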

You can also set VYKAR_PASSPHRASE to supply the passphrase non-interactively.

Override the local cache directory with cache_dir at the top level:

cache_dir: "/tmp/vykar-cache"

Defaults to the platform cache directory when omitted.

Minimal example

A complete but minimal working config. Encryption defaults to auto (init benchmarks AES-256-GCM vs ChaCha20-Poly1305 and pins the repo), so you only need repositories and sources:

repositories:
  - url: "/backup/repo"

sources:
  - "/home/user/documents"

Windows:

repositories:
  - url: 'D:\Backups\repo'

sources:
  - 'C:\Users\me\Documents'

Windows paths and YAML quoting: In YAML, double-quoted strings interpret backslashes as escape sequences, so "C:\Users\..." fails because \U is parsed as the start of a Unicode escape. Use single quotes or no quotes for Windows paths:
# These work:
- 'C:\Users\me\Documents'
- C:\Users\me\Documents

# This does NOT work:
- "C:\Users\me\Documents"

Repositories

Local:

repositories:
  - label: "local"
    url: "/backups/repo"
    # Windows: url: 'D:\Backups\repo'

S3:

repositories:
  - label: "s3"
    url: "s3://s3.us-east-1.amazonaws.com/my-bucket/vykar"
    region: "us-east-1"
    access_key_id: "AKIA..."
    secret_access_key: "..."

Each entry in the repositories list accepts the following fields. url is the only required one.

Common fields (all backends):

Field	Default	Values	Description
`url`	(required)	string	Repository URL or local path
`label`	—	string	Human label for `--repo` targeting
`allow_insecure_http`	`false`	bool	Allow plaintext HTTP (required for `http://` and `s3+http://` URLs)
`min_pack_size`	32 MiB (33554432)	integer (bytes)	Minimum pack file size
`max_pack_size`	192 MiB (201326592)	integer (bytes)	Maximum pack file size (hard ceiling: 512 MiB)

S3 fields:

Field	Default	Values	Description
`region`	—	string	S3 region (defaults to `us-east-1` at runtime)
`access_key_id`	—	string	S3 access key ID
`secret_access_key`	—	string	S3 secret access key
`s3_soft_delete`	`false`	bool	Use soft delete for S3 Object Lock compatibility

SFTP fields:

Field	Default	Values	Description
`sftp_key`	—	string	Path to SSH private key. Auto-detects `~/.ssh/{id_ed25519, id_rsa, id_ecdsa}` when omitted
`sftp_known_hosts`	—	string	Path to known_hosts file. Defaults to `~/.ssh/known_hosts` at runtime
`sftp_timeout`	—	integer (seconds, 5–300)	Per-request timeout. Defaults to 30s; clamped to 5–300s range

REST server fields:

Field	Default	Values	Description
`access_token`	—	string	Bearer token for REST server auth

Per-repo override sections (optional, replace top-level when set): encryption, compression, retention, limits. Per-repo-only section: retry. Per-repo hooks are additive — both global and repo hooks are kept and executed in the order described in Execution order.

See Storage Backends for all backend-specific options.

For remote repositories, transport is HTTPS-first by default. To intentionally use plaintext HTTP (for local/dev setups), set:

repositories:
  - url: "http://localhost:8484"
    allow_insecure_http: true

For S3-compatible HTTP endpoints, use s3+http://... URLs with allow_insecure_http: true.

Multiple repositories

Add more entries to repositories: to back up to multiple destinations. Top-level settings serve as defaults; each entry can override encryption, compression, retention, and limits.

repositories:
  - label: "local"
    url: "/backups/local"

  - label: "remote"
    url: "s3://s3.us-east-1.amazonaws.com/bucket/remote"
    region: "us-east-1"
    access_key_id: "AKIA..."
    secret_access_key: "..."
    encryption:
      passcommand: "pass show vykar-remote"
    compression:
      algorithm: "zstd"             # Better ratio for remote
    retention:
      keep_daily: 30                 # Keep more on remote
    limits:
      connections: 2
      upload_mib_per_sec: 25

When limits is set on a repository entry, it replaces top-level limits for that repository.

By default, commands operate on all repositories. Use --repo / -R to target a single one:

vykar list --repo local
vykar list -R /backups/local

Retry

Retry settings for transient remote errors. Repo-level only — there is no top-level retry section. Uses exponential backoff with jitter.

repositories:
  - url: "s3://..."
    retry:
      max_retries: 5
      retry_delay_ms: 2000

Field	Default	Values	Description
`max_retries`	`5`	integer	Maximum retry attempts
`retry_delay_ms`	`1500`	integer (ms)	Initial delay between retries
`retry_max_delay_ms`	`60000`	integer (ms)	Maximum delay between retries

Note: The default (5 retries, ~1 minute of cumulative backoff on average with jitter, up to ~90s worst case) is sized to absorb a brief network gap such as WiFi reconnecting after laptop sleep. Raise max_retries further if you run on a flaky link; set it to 0 to fail fast for CI or scripted runs.
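
To get a feel for how these three settings interact, the sketch below prints an un-jittered schedule. It assumes the delay doubles after each attempt and is capped at retry_max_delay_ms; the exact growth factor and jitter formula are not specified here, so treat the numbers as a lower-bound illustration rather than Vykar's real timings. The function name backoff_schedule is hypothetical.

```shell
#!/bin/sh
# Print an assumed exponential backoff schedule (doubling, no jitter).
# Arguments: max_retries, retry_delay_ms, retry_max_delay_ms
backoff_schedule() {
    max_retries="${1:-5}"
    delay="${2:-1500}"
    cap="${3:-60000}"
    attempt=1
    total=0
    while [ "$attempt" -le "$max_retries" ]; do
        if [ "$delay" -gt "$cap" ]; then
            delay="$cap"                     # never wait longer than the cap
        fi
        echo "attempt $attempt: wait $delay ms"
        total=$((total + delay))
        delay=$((delay * 2))                 # assumed growth factor
        attempt=$((attempt + 1))
    done
    echo "total: $total ms"
}

backoff_schedule 5 1500 60000
```

With the documented defaults this prints waits of 1500, 3000, 6000, 12000, and 24000 ms, for 46500 ms in total before jitter. Under this assumption, the figures quoted in the note would come from jitter added on top of the base delays.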

3-2-1 backup strategy

Tip: Configuring both a local and a remote repository gives you a 3-2-1 backup setup: three copies of your data (the original files, the local backup, and the remote backup), on two different media types, with one copy offsite. The example above already achieves this.

Sources

Sources define what to back up — filesystem paths, command output, or both. Each source entry produces one snapshot per backup run.

Simple form:

sources:
  - "/home/user/documents"
  - "/home/user/photos"
  # Windows:
  # - 'C:\Users\me\Documents'
  # - 'C:\Users\me\Photos'

Simple entries are grouped into one source. With one simple path, the source label is derived from the directory name. With multiple simple paths, the grouped source label becomes default. Use rich entries if you want separate source labels or one snapshot per path.

Rich form (single path):

sources:
  - label: "docs"
    path: "/home/user/documents"
    exclude: ["*.tmp", ".cache/**"]
    # exclude_if_present: [".nobackup", "CACHEDIR.TAG"]
    # one_file_system: true
    # git_ignore: false
    repos: ["main"]                  # Only back up to this repo (default: all)
    retention:
      keep_daily: 7
    hooks:
      before: "echo starting docs backup"

Each path: entry produces its own snapshot. To group multiple directories into a single snapshot, use paths: (plural) instead — see below.

Rich form (multiple paths):

Use paths (plural) to group several directories into a single source. An explicit label is required:

sources:
  - label: "writing"
    paths:
      - "/home/user/documents"
      - "/home/user/notes"
    exclude: ["*.tmp"]

These directories are backed up together as one snapshot. You cannot use both path and paths on the same entry.

Inside a multi-path source, each path’s contents land in the snapshot under a prefix derived from its full absolute path: leading / stripped on Unix, drive-letter colon dropped and backslashes converted to forward slashes on Windows. For example, /etc lands at etc/…, /var/lib/machines/base/etc lands at var/lib/machines/base/etc/…, and C:\Users\me\docs lands at C/Users/me/docs/…. This lets paths with the same basename — paths: ["/etc", "/var/lib/machines/base/etc"] — coexist in one source without colliding.
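
The prefix rule can be reproduced with a short shell function, which may help when predicting where a file will appear in a restored snapshot. This is an illustrative sketch of the rule as described, not Vykar's implementation; snapshot_prefix is a hypothetical name.

```shell
#!/bin/sh
# Derive the in-snapshot prefix for a source path, following the rule above.
snapshot_prefix() {
    case "$1" in
        [A-Za-z]:\\*)
            # Windows: drop the colon after the drive letter,
            # then convert backslashes to forward slashes
            printf '%s\n' "$1" | sed -e 's/^\([A-Za-z]\):/\1/' -e 's/\\/\//g'
            ;;
        /*)
            # Unix: strip the leading slash
            printf '%s\n' "${1#/}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

snapshot_prefix '/var/lib/machines/base/etc'
snapshot_prefix 'C:\Users\me\docs'
```

The two calls print var/lib/machines/base/etc and C/Users/me/docs, matching the examples in the paragraph above.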

Field	Default	Values	Description
`path`	—	string	Single directory to back up (mutually exclusive with `paths`)
`paths`	—	list of strings	Multiple directories as one snapshot (requires `label`)
`label`	derived	string	Source label. Auto-derived from dir name for single path; required for multi-path and dump-only
`exclude`	`[]`	list of strings	Per-source exclude patterns (merged with global `exclude_patterns`)
`exclude_if_present`	—	list of strings	Per-source marker files. Inherits global `exclude_if_present` when omitted; replaces global when set
`one_file_system`	inherited	bool	Override global `one_file_system`
`git_ignore`	inherited	bool	Override global `git_ignore`
`xattrs`	inherited	`{enabled: bool}`	Override global `xattrs`
`repos`	`[]` (all)	list of strings	Restrict to named repositories
`retention`	inherited	object	Per-source retention policy
`hooks`	`{}`	object	Source-level hooks (`before`/`after`/`failed`/`finally` only)
`command_dumps`	`[]`	list	Command dump entries

Per-source overrides

Each source entry in rich form can override global settings. This lets you tailor backup behavior per directory:

sources:
  - label: "docs"
    path: "/home/user/documents"
    exclude: ["*.tmp"]
    xattrs:
      enabled: false                 # Override top-level xattrs setting for this source
    repos: ["local"]                 # Only back up to the "local" repo
    retention:
      keep_daily: 7
      keep_weekly: 4

  - label: "photos"
    path: "/home/user/photos"
    repos: ["local", "remote"]       # Back up to both repos
    retention:
      keep_daily: 30
      keep_monthly: 12
    hooks:
      after: "echo photos backed up"

Per-source fields that override globals: exclude, exclude_if_present, one_file_system, git_ignore, xattrs, repos, retention, hooks, command_dumps.

Command Dumps

Capture the stdout of shell commands directly into your backup. Useful for database dumps, API exports, or any generated data that doesn’t live as a regular file on disk.

sources:
  - label: databases
    command_dumps:
      - name: postgres.sql
        command: pg_dump -U myuser mydb
      - name: redis.rdb
        command: redis-cli --rdb -

Each source with command_dumps produces its own snapshot. An explicit label is required.

Field	Default	Values	Description
`name`	(required)	string	Virtual filename (no `/` or `\`, no duplicates within source)
`command`	(required)	string	Shell command whose stdout is captured (run via `sh -c`)

Output is stored as virtual files under vykar-dumps/ in the snapshot. On restore they appear as regular files (e.g. vykar-dumps/postgres.sql).

To include command dumps in the same snapshot as filesystem paths, add both to one source entry:

sources:
  - label: server
    paths:
      - /etc
      - /var/www
    command_dumps:
      - name: postgres.sql
        command: pg_dump -U myuser mydb

If a dump command exits with non-zero status, the backup is aborted. Any chunks already uploaded to packs remain on disk but are not added to the index; they are reclaimed on the next vykar compact run.
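
One caveat follows from commands being run via sh -c: in a plain sh pipeline, the exit status is that of the last command, so a failing dump tool piped into a compressor can still look successful. The sketch below demonstrates the difference using false | cat as a stand-in for a failing dump; it assumes bash is installed for the pipefail variant.

```shell
#!/bin/sh
# Compare pipeline exit statuses with and without bash's pipefail option.
plain_status=0
sh -c 'false | cat' || plain_status=$?          # failure of `false` is hidden

pipefail_status=0
bash -o pipefail -c 'false | cat' || pipefail_status=$?   # failure propagates

echo "plain: $plain_status, pipefail: $pipefail_status"
```

This prints plain: 0, pipefail: 1. A dump command that pipes its output could therefore be wrapped as bash -o pipefail -c '...' so that a failure in the first stage still aborts the backup.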

See Backup — Command dumps for more details and Recipes for PostgreSQL, MySQL, MongoDB, and Docker examples.

Encryption

Encryption is enabled by default (auto mode with Argon2id key derivation). You only need an encryption section to supply a passcommand, force a specific algorithm, or disable encryption.

encryption:
  mode: "chacha20poly1305"
  passphrase: "correct-horse-battery-staple"

Field	Default	Values	Description
`mode`	`"auto"`	`"auto"`, `"aes256gcm"`, `"chacha20poly1305"`, `"none"`	Encryption algorithm. `auto` benchmarks at init
`passphrase`	—	string (quoted)	Inline passphrase (not recommended for production)
`passcommand`	—	string (quoted)	Shell command that prints the passphrase

none mode requires no passphrase and creates no key file. Data is still checksummed via keyed BLAKE2b-256 chunk IDs to detect storage corruption, but is not authenticated against tampering. See Architecture — Plaintext Mode for details.

passcommand runs through the platform shell:

Unix: sh -c
Windows: powershell -NoProfile -NonInteractive -Command

For vykar daemon, encrypted repositories must have a non-interactive passphrase source available (passcommand, passphrase, or VYKAR_PASSPHRASE).

Compression

LZ4 (default) is optimised for speed — even on incompressible data the overhead is negligible, and reduced I/O usually more than compensates. ZSTD gives better compression ratios at the cost of more CPU; level 3 is a good starting point. none disables compression entirely.

compression:
  algorithm: "zstd"
  zstd_level: 6

Field	Default	Values	Description
`algorithm`	`"lz4"`	`"lz4"`, `"zstd"`, `"none"`	Compression algorithm
`zstd_level`	`3`	integer, 1–22	Zstd compression level (only used with `zstd`). 1–3 favours speed, 6–9 balances speed and ratio, 19–22 maximises ratio at significant CPU cost. Most users should stay in the 3–6 range

Use --compression on the CLI to override the configured algorithm for a single backup run:

vykar backup --compression zstd

Chunker

chunker:
  min_size: 524288      # 512 KiB
  avg_size: 2097152     # 2 MiB
  max_size: 8388608     # 8 MiB

Field	Default	Values	Description
`min_size`	512 KiB (524288)	integer (bytes)	Minimum chunk size. Must be ≤ `avg_size`
`avg_size`	2 MiB (2097152)	integer (bytes)	Average chunk size
`max_size`	8 MiB (8388608)	integer (bytes, hard cap: 16 MiB)	Maximum chunk size. Clamped to 16 MiB if set higher
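The two constraints in the table can be sketched as a small validation step. `validate_chunker` is a hypothetical helper, not a vykar API, and it assumes that a violated `min_size` ≤ `avg_size` rule is reported as an error, which the table implies but does not spell out.

```python
MAX_CHUNK_CAP = 16 * 1024 * 1024  # 16 MiB hard cap from the table above

def validate_chunker(min_size: int, avg_size: int, max_size: int):
    """Return (min, avg, max) with max_size clamped to the cap (illustrative)."""
    if min_size > avg_size:
        # Assumption: an invalid ordering is rejected rather than silently fixed.
        raise ValueError("min_size must be <= avg_size")
    return min_size, avg_size, min(max_size, MAX_CHUNK_CAP)
```

With the defaults the values pass through unchanged; a `max_size` of 64 MiB would come back as 16 MiB.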

Exclude Patterns

Vykar uses gitignore-style patterns for file exclusion. Patterns can be set globally (exclude_patterns) or per-source (exclude); both lists are merged at runtime.

Basic patterns

Wildcards and exact names match at any depth within a source:

# Global excludes — apply to every source directory
exclude_patterns:
  - "*.tmp"              # any .tmp file, at any depth
  - "*.log"              # any .log file, at any depth
  - ".cache/"            # any directory named .cache (trailing / = dirs only)
  - "__pycache__/"       # same — directories only
  - ".DS_Store"          # exact filename, any depth
  - "Thumbs.db"

Per-source excludes target specific paths within a single source:

sources:
  - path: "/home/user/videos"
    exclude:
      - "/TV"                          # Excludes <source>/TV
  - path: "/home/user/photos"
    exclude:
      - "/thumbnails"                  # Excludes <source>/thumbnails
      - "/My Albums"                   # Spaces in paths work fine

Per-source exclude patterns are added after global exclude_patterns. Both lists use the same matching rules.

Anchoring and depth

Where a pattern matches depends on whether it contains a /:

No slash (e.g., *.tmp, TV): matches at any depth, as if prefixed with **/.
Contains a slash (e.g., logs/debug, /Downloads): anchored to the source root. A leading / is optional — logs/debug and /logs/debug behave identically.
Trailing / (e.g., .cache/): only matches directories.
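The three rules above can be sketched as a small classifier. `classify_pattern` is a hypothetical helper, not part of vykar; it only shows how the presence and position of `/` changes a pattern's scope and does not perform any matching.

```python
def classify_pattern(pattern: str) -> dict:
    """Describe how a gitignore-style pattern is scoped (illustrative only)."""
    negated = pattern.startswith("!")
    body = pattern[1:] if negated else pattern
    # A trailing slash restricts the pattern to directories.
    dirs_only = body.endswith("/")
    core = body.rstrip("/") if dirs_only else body
    # A slash at the start or in the middle anchors the pattern to the source root.
    anchored = "/" in core
    return {
        "negated": negated,
        "dirs_only": dirs_only,
        "anchored": anchored,
        # The leading "/" is optional for anchored patterns, so normalise it away.
        "normalized": core.lstrip("/"),
    }
```

Under this sketch, `logs/debug` and `/logs/debug` normalise to the same anchored pattern, while `.cache/` stays unanchored and directory-only.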

Important: Patterns are matched against paths relative to each source directory, not against absolute filesystem paths. An absolute path like /home/user/videos/TV will not work; use a per-source exclude with a path relative to the source root instead:

# WRONG: silently excludes nothing
exclude_patterns:
  - "/home/user/videos/TV"

# CORRECT: anchored to the source root
sources:
  - path: "/home/user/videos"
    exclude:
      - "/TV"

Negation (re-including files)

The ! prefix overrides an earlier exclude, re-including the matched file or directory:

exclude_patterns:
  - "*.log"
  - "!important.log"       # keep important.log despite the *.log rule

Limitation: a negation cannot re-include a file if its parent directory was already excluded. The excluded directory is never traversed, so patterns for files inside it are never evaluated. To work around this, re-include each parent directory explicitly:

exclude_patterns:
  - "log*"                 # excludes logfiles/, logs/, logfile.log, etc.
  - "!logfiles/"           # re-include the directory so it is traversed
  - "!logfiles/logs/"      # same for the nested directory
  - "!logfile.log"         # now this re-includes matching files inside

Other exclusion methods

exclude_if_present:                  # Skip dirs containing any marker file
  - ".nobackup"
  - "CACHEDIR.TAG"
one_file_system: false               # Do not cross filesystem/mount boundaries (default false)
git_ignore: false                    # Respect .gitignore files (default false)
xattrs:                              # Extended attribute handling
  enabled: true                      # Preserve xattrs on backup/restore (default true, Unix-only)

Field	Default	Values	Description
`exclude_if_present`	`[]`	list of strings	Marker filenames — directories containing any of these are skipped
`one_file_system`	`false`	bool	Don’t cross filesystem/mount boundaries
`git_ignore`	`false`	bool	Respect `.gitignore` files in source dirs
`xattrs.enabled`	`true`	bool	Preserve extended file attributes on backup/restore (Unix only)

Hostname

By default, vykar records the short system hostname (everything before the first .) in each snapshot. On macOS, gethostname() returns a network-dependent FQDN (e.g. MyMac.local vs MyMac.fritz.box depending on VPN); truncating at the first dot keeps the hostname stable across network changes. On Linux and Windows, hostnames typically have no dots, so this is a no-op.
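The truncation described above amounts to a single split on the first dot. A minimal sketch follows; `short_hostname` is a hypothetical name used only for illustration.

```python
def short_hostname(raw: str) -> str:
    """Keep everything before the first dot; dot-free names pass through."""
    return raw.split(".", 1)[0]
```

Both `MyMac.local` and `MyMac.fritz.box` reduce to `MyMac`, which is why the recorded hostname stays stable across network changes.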

To override the hostname recorded in snapshots:

hostname: MyMachine

Field	Default	Values	Description
`hostname`	—	string	Override hostname in snapshots. Defaults to system short hostname at runtime

This only affects snapshot metadata — lock files and session markers always use the raw system hostname.

Retention

All fields are optional, but at least one must be set for the policy to have any effect.

retention:
  keep_daily: 7
  keep_weekly: 4
  keep_monthly: 6
  keep_within: "2d"

Field	Default	Values	Description
`keep_last`	—	integer	Keep N most recent snapshots
`keep_hourly`	—	integer	Keep N hourly snapshots
`keep_daily`	—	integer	Keep N daily snapshots
`keep_weekly`	—	integer	Keep N weekly snapshots
`keep_monthly`	—	integer	Keep N monthly snapshots
`keep_yearly`	—	integer	Keep N yearly snapshots
`keep_within`	—	duration string (`h`/`d`/`w`/`m`/`y`)	Keep all snapshots within this period. Suffixes: `h` = hours, `d` = days, `w` = weeks, `m` = months (30d), `y` = years (365d)
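The suffix table for `keep_within` can be sketched as a small parser. This is an illustration under stated assumptions: `parse_keep_within` is a hypothetical helper, and it assumes a whole number followed by exactly one suffix. Note that `m` means months here, not minutes.

```python
from datetime import timedelta

# Suffix lengths in hours, following the table above (months = 30 d, years = 365 d).
_HOURS_PER_UNIT = {"h": 1, "d": 24, "w": 7 * 24, "m": 30 * 24, "y": 365 * 24}

def parse_keep_within(value: str) -> timedelta:
    """Convert a keep_within string such as "2d" into a timedelta (illustrative)."""
    number, suffix = value[:-1], value[-1:]
    if suffix not in _HOURS_PER_UNIT or not number.isdigit():
        raise ValueError(f"unsupported keep_within value: {value!r}")
    return timedelta(hours=int(number) * _HOURS_PER_UNIT[suffix])
```

For example, `parse_keep_within("2d")` yields a two-day window and `parse_keep_within("1m")` yields thirty days.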

Compact

compact:
  threshold: 30

Field	Default	Values	Description
`threshold`	`20`	number, 0–100	Minimum % unused space to trigger repack. Reset to default if out of range

Check

Control the integrity check step during scheduled/daemon backup cycles. Standalone vykar check always runs a full 100% check regardless of these settings.

check:
  max_percent: 10
  full_every: "30d"

Field	Default	Values	Description
`max_percent`	`0`	integer, 0–100	% of packs/snapshots to verify per scheduled cycle. `0` = skip partial checks
`full_every`	`"60d"`	duration string (`s`/`m`/`h`/`d`) or `null`	Full 100% check interval. Overrides `max_percent` when due. `null` disables periodic full checks

How it works: On each daemon/GUI cycle, vykar checks a local timestamp file to determine whether a full check is due. If full_every is due (or the timestamp is missing/corrupt), a full 100% check runs and the timestamp is updated. Otherwise, if max_percent > 0, a random sample of that percentage of packs and snapshots is verified. If max_percent is 0 and full_every is not yet due, the check step is skipped entirely (no index loaded).
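The per-cycle decision described above can be sketched as a pure function. `plan_check` and its parameters are hypothetical names; the sketch assumes the timestamp file has already been read into `last_full`, with `None` standing for a missing or corrupt file.

```python
from datetime import datetime, timedelta
from typing import Optional

def plan_check(now: datetime,
               last_full: Optional[datetime],
               full_every: Optional[timedelta],
               max_percent: int) -> str:
    """Return "full", "partial", or "skip" for one scheduled cycle (illustrative)."""
    if full_every is not None:
        # A missing or unreadable timestamp is treated as "due".
        if last_full is None or now - last_full >= full_every:
            return "full"
    if max_percent > 0:
        # Not due yet: verify a random sample of max_percent percent.
        return "partial"
    # Nothing to do, so the index does not need to be loaded.
    return "skip"
```

Setting `full_every` to `None` mirrors the `null` config value: only partial checks (if any) remain.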

Standalone vykar check always runs at 100% and does not update the daemon’s timer — manual checks don’t reset the schedule.

Limits

limits:
  connections: 4
  upload_mib_per_sec: 50

Field	Default	Values	Description
`connections`	`2`	integer, 1–16	Parallel backend operations; also controls upload/restore concurrency
`threads`	`0`	integer, 0–128	CPU worker threads. `0` = auto: local repos use ceil(cores/2) clamped to [2, 4]; remote repos use min(cores, 12). `1` = mostly sequential. Also available as `--threads` on the `backup` subcommand
`nice`	`0`	integer, -20–19	Unix process niceness. `0` = unchanged. Ignored on Windows
`upload_mib_per_sec`	`0`	integer (MiB/s)	Upload bandwidth cap. `0` = unlimited
`download_mib_per_sec`	`0`	integer (MiB/s)	Download bandwidth cap. `0` = unlimited

limits.connections also sets the SFTP connection pool size, the number of in-flight backup uploads, and restore reader concurrency. Internal pipeline knobs are derived automatically from connections and threads.
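The `threads: 0` auto rule from the table can be written out as follows. `auto_threads` is a hypothetical helper; how vykar detects the core count or decides that a repository is remote is not covered here.

```python
import math

def auto_threads(cores: int, remote: bool) -> int:
    """Worker-thread count for `threads: 0`, per the table above (illustrative)."""
    if remote:
        # Remote repositories: min(cores, 12).
        return min(cores, 12)
    # Local repositories: ceil(cores / 2), clamped to the range [2, 4].
    return max(2, min(4, math.ceil(cores / 2)))
```

On an 8-core machine this gives 4 threads for a local repository and 8 for a remote one.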

Hooks

Shell commands that run at specific points in the vykar command lifecycle. Hooks can be defined at three levels: global (top-level hooks:), per-repository, and per-source.

Global / per-repository hooks support both bare prefixes and command-specific variants:

hooks:                               # Global hooks: run for backup/prune/check/compact
  before: "echo starting"
  after: "echo done"
  # before_backup: "echo backup starting"  # Command-specific hooks
  # failed: "notify-send 'vykar failed'"
  # finally: "cleanup.sh"

Per-source hooks only support bare prefixes (before, after, failed, finally) — command-specific variants like before_backup are not valid at the source level. Source hooks always run for backup since that is the only command that processes sources.

sources:
  - label: immich
    path: /raid1/immich/db-backups
    hooks:
      before: '/raid1/immich/backup_db.sh'  # Correct
      # before_backup: '...'               # NOT valid here — use 'before' instead

Hook types

Hook	Command-specific (global/repo only)	Runs when	Failure behavior
`before`	`before_<cmd>`	Before the command	Aborts the command
`after`	`after_<cmd>`	After success only	Logged, doesn’t affect result
`failed`	`failed_<cmd>`	After failure only	Logged, doesn’t affect result
`finally`	`finally_<cmd>`	Always, regardless of outcome	Logged, doesn’t affect result

Hooks only run for backup, prune, check, and compact. The bare form (before, after, etc.) fires for all four commands. The command-specific form (before_backup, failed_prune, etc.) fires only for that command and is only available at the global and per-repository levels — not in per-source hooks.

Execution order

before hooks run: global bare → repo bare → global specific → repo specific
The vykar command runs (skipped if a before hook fails)
On success: after hooks run (repo specific → global specific → repo bare → global bare)
On failure: failed hooks run (same order)
finally hooks always run last (same order)

If a before hook fails, the command is skipped and both failed and finally hooks still run.
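The ordering rules above can be sketched for global and per-repository hooks as two small functions. The dotted key names (`global.before`, `repo.before_backup`, and so on) are made up for illustration, and per-source hooks are left out because their position in the sequence is not described here.

```python
from typing import List

def hook_order(kind: str, command: str) -> List[str]:
    """List hook keys in firing order for one hook kind (illustrative)."""
    forward = [
        f"global.{kind}",
        f"repo.{kind}",
        f"global.{kind}_{command}",
        f"repo.{kind}_{command}",
    ]
    # `before` hooks run in this order; after/failed/finally run in reverse.
    return forward if kind == "before" else forward[::-1]

def phases(before_ok: bool, command_ok: bool) -> List[str]:
    """Lifecycle phases that run, given the outcome of each step (illustrative)."""
    if not before_ok:
        # A failing `before` hook skips the command but still triggers cleanup.
        return ["before", "failed", "finally"]
    outcome = "after" if command_ok else "failed"
    return ["before", "command", outcome, "finally"]
```

For instance, `hook_order("after", "backup")` starts with `repo.after_backup` and ends with `global.after`.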

Each hook key maps to a shell command (string) or list of commands.

Variable substitution

Hook commands support {variable} placeholders that are replaced before execution. Values are automatically shell-escaped.

Variable	Description
`{command}`	The vykar command name (e.g. `backup`, `prune`)
`{repository}`	Repository URL
`{label}`	Repository label (empty if unset)
`{error}`	Error message (empty if no error)
`{source_label}`	Source label (empty if unset)
`{source_path}`	Source path list (Unix `:`, Windows `;`)

The same values are also exported as environment variables: VYKAR_COMMAND, VYKAR_REPOSITORY, VYKAR_LABEL, VYKAR_ERROR, VYKAR_SOURCE_LABEL, VYKAR_SOURCE_PATH.

{source_path} / VYKAR_SOURCE_PATH joins multiple paths with : on Unix and ; on Windows.

hooks:
  failed:
    - 'notify-send "vykar {command} failed: {error}"'
  after_backup:
    - 'echo "Backed up {source_label} to {repository}"'

See Recipes for practical hook examples: database dumps, filesystem snapshots, network-aware backups, and monitoring notifications.

Schedule

Configure the built-in daemon scheduler for automatic periodic backups. Used with vykar daemon.

schedule:
  enabled: true
  every: "6h"
  on_startup: true

Field	Default	Values	Description
`enabled`	`false`	bool	Enable scheduled backups
`every`	—	duration string (`s`/`m`/`h`/`d`)	Interval between runs. Falls back to `24h` when neither `every` nor `cron` is set. Mutually exclusive with `cron`
`cron`	—	5-field cron expression	Cron schedule. Mutually exclusive with `every`
`on_startup`	`false`	bool	Run backup immediately when daemon starts
`jitter_seconds`	`0`	integer	Random delay 0–N seconds added to each run
`passphrase_prompt_timeout_seconds`	`300`	integer (seconds)	Timeout for interactive passphrase prompts

Interval mode

The every field accepts s (seconds), m (minutes), h (hours), or d (days) suffixes; a plain integer is treated as days. If neither every nor cron is set, the default interval is 24h.

Cron mode

The cron field accepts a standard 5-field cron expression (minute hour dom month dow). Six-field (with seconds) and seven-field expressions are rejected.

schedule:
  enabled: true
  cron: "0 3 * * *"          # daily at 3:00 AM
  jitter_seconds: 60

Common cron examples:

"0 3 * * *" — daily at 3:00 AM
"30 2 * * 1-5" — weekdays at 2:30 AM
"0 */6 * * *" — every 6 hours on the hour
"0 0 * * 0" — weekly on Sunday at midnight

every and cron are mutually exclusive — setting both is a configuration error.

Jitter (jitter_seconds) applies in both modes. In cron mode, jitter is added after the computed cron tick. Keep jitter small relative to the cron cadence to avoid skipping slots.

When multiple repositories are configured, schedule values are merged: enabled and on_startup are OR’d across repos, jitter_seconds and passphrase_prompt_timeout_seconds take the maximum, and every uses the shortest interval.
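The merge rules above can be sketched over plain dictionaries. `merge_schedules` and the `every_seconds` key (an already-parsed form of `every`) are hypothetical; how `cron` expressions from several repositories are combined is not described here, so the sketch leaves them out.

```python
def merge_schedules(schedules):
    """Combine per-repository schedule dicts into one (illustrative)."""
    intervals = [s["every_seconds"] for s in schedules if s.get("every_seconds")]
    return {
        # Booleans are OR'd across repositories.
        "enabled": any(s.get("enabled", False) for s in schedules),
        "on_startup": any(s.get("on_startup", False) for s in schedules),
        # Numeric settings take the maximum; missing keys use the documented defaults.
        "jitter_seconds": max((s.get("jitter_seconds", 0) for s in schedules), default=0),
        "passphrase_prompt_timeout_seconds": max(
            (s.get("passphrase_prompt_timeout_seconds", 300) for s in schedules),
            default=300,
        ),
        # The shortest interval wins.
        "every_seconds": min(intervals) if intervals else None,
    }
```

Two repositories with intervals of 6 h and 1 h therefore result in a single 1 h cycle.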

Environment Variable Expansion

Config files support environment variable placeholders in values:

repositories:
  - url: "${VYKAR_REPO_URL:-/backup/repo}"
    # access_token: "${VYKAR_ACCESS_TOKEN}"

Supported syntax:

${VAR}: requires VAR to be set (hard error if missing)
${VAR:-default}: uses default when VAR is unset or empty

Notes:

Expansion runs on raw config text before YAML parsing.
Variable names must match [A-Za-z_][A-Za-z0-9_]*.
Malformed placeholders fail config loading.
No escape syntax is supported for literal ${...}.
${VAR} in YAML comments is also expanded (since expansion runs before YAML parsing).
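The expansion rules listed above can be approximated with a regular expression over the raw text. This is a sketch, not vykar's parser: `expand` is a hypothetical function, the environment is passed in as a dict, and it assumes a default value cannot contain `}`.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")

def expand(text: str, env: dict) -> str:
    """Expand ${VAR} and ${VAR:-default} in raw config text (illustrative)."""
    # Every "${" must start a well-formed placeholder, otherwise loading fails.
    for start in re.finditer(r"\$\{", text):
        if not _PLACEHOLDER.match(text, start.start()):
            raise ValueError(f"malformed placeholder at offset {start.start()}")

    def replace(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is not None:
            # ${VAR:-default}: fall back when VAR is unset or empty.
            return value if value else default
        if value is None:
            raise KeyError(f"required variable {name} is not set")
        return value

    return _PLACEHOLDER.sub(replace, text)
```

Because the function works on raw text, a placeholder inside a YAML comment is expanded (or fails) exactly like one in a value.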

Loading `.env` files

Use env_file to load variables from one or more files before expansion. This is useful for Docker-style .env files that store credentials:

env_file: .db.env
# or multiple files:
# env_file:
#   - .db.env
#   - .app.env

repositories:
  - url: /backup/repo

sources:
  - label: databases
    command_dumps:
      - name: db.sql
        command: "mysqldump -u '${DB_USER}' -p'${DB_PASSWORD}' '${DB_DATABASE}'"

Where .db.env contains:

DB_USER=myuser
DB_PASSWORD=s3cret
DB_DATABASE=myapp

Paths are resolved relative to the config file’s directory. The supported .env format is:

KEY=VALUE — plain assignment
export KEY=VALUE — export prefix is stripped
KEY="VALUE" or KEY='VALUE' — quotes are stripped
Blank lines and lines starting with # are skipped
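A minimal parser for the format listed above might look like this. `parse_env` is a hypothetical helper; skipping lines without `=` is an assumption of this sketch, since the behaviour for such lines is not documented here.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines as described above (illustrative)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        if line.startswith("export "):
            line = line[len("export "):].lstrip()  # strip the export prefix
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: lines without "=" are ignored
        value = value.strip()
        # Strip one pair of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env
```

Feeding it the `.db.env` file shown earlier yields the three `DB_*` variables as plain strings.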

Shell expansion in `command_dumps`

Commands in command_dumps and hooks run via sh -c, so the shell performs its own variable expansion. There are two ways to reference variables:

Syntax	Expanded by	On missing var
`${VAR}`	vykar (at config load)	Hard error
`$VAR`	shell (at runtime)	Empty string (silent)

When using env_file, prefer ${VAR} — vykar loads the file first, then expands the placeholder, giving you an immediate error if the variable is missing.

If you cannot use env_file, you can source the .env file directly in the command:

command_dumps:
  - name: db.sql
    command: '. /path/to/.db.env && mysqldump -u "$DB_USER" -p"$DB_PASSWORD" "$DB_DATABASE"'

This pattern is self-contained and works without any wrapper script, but missing variables will silently produce empty strings.

Command Reference

Below is a list of all available commands. Each command and subcommand provides its own --help output for command-specific options, and vykar --help shows global options.

Command	Description
`vykar`	Run full backup process: `backup`, `prune`, `compact`, `check`. This is useful for automation.
`vykar config`	Generate a starter configuration file
`vykar init`	Initialize a new backup repository
`vykar backup`	Back up files to a new snapshot
`vykar restore`	Restore files from a snapshot
`vykar list`	List snapshots
`vykar snapshot list`	Show files and directories inside a snapshot
`vykar snapshot info`	Show metadata for a snapshot
`vykar snapshot find`	Find matching files across snapshots and show change timeline (`added`, `modified`, `unchanged`)
`vykar snapshot delete`	Delete a specific snapshot
`vykar delete`	Delete an entire repository permanently
`vykar prune`	Prune snapshots according to retention policy
`vykar break-lock`	Remove stale repository locks left by interrupted processes when lock conflicts block operations
`vykar daemon`	Run scheduled backup cycles in the foreground. See Daemon Mode.
`vykar check`	Verify repository integrity (`--verify-data` for full content verification)
`vykar info`	Show repository statistics (snapshot counts and size totals)
`vykar compact`	Free space by repacking pack files after delete/prune
`vykar mount`	Browse snapshots via a local read-only WebDAV server and built-in browser UI

Exit codes

0: Success
1: Error (command failed)
3: Partial success (backup completed, but one or more files were skipped)

vykar backup and the default vykar workflow can return 3 when a backup succeeds with skipped unreadable/missing files.

Design Goals

Vykar synthesizes the best ideas from a decade of backup tool development into a single Rust binary. These are the principles behind its design.

One tool, not an assembly

Configuration, scheduling, monitoring, hooks, and health checks belong in the backup tool itself — not in a constellation of wrappers and scripts bolted on after the fact.

Config-first

Your entire backup strategy lives in a single YAML file that can be version-controlled, reviewed, and deployed across machines. A repository path and a list of sources is enough to get going.

repositories:
  - url: /backups/myrepo
sources:
  - path: /home/user/documents
  - path: /home/user/photos

Universal primitives over specific integrations

Vykar doesn’t have dedicated flags for specific databases or services. Instead, hooks and command dumps let you capture the output of any command — the same mechanism works for every database, container, or workflow.

sources:
  - label: databases
    path: /var/backups/db
    hooks:
      before: "pg_dump -Fc mydb > /var/backups/db/mydb.dump"
      after:  "rm -f /var/backups/db/mydb.dump"

Labels, not naming schemes

Snapshots get auto-generated IDs. Labels like personal or databases represent what you’re backing up and group snapshots for retention, filtering, and restore — without requiring unique names or opaque hashes.

vykar list -S databases --last 5
vykar restore --source personal latest

Encryption by default

Encryption is enabled unless you explicitly set mode: "none". In auto mode, Vykar selects AES-256-GCM or ChaCha20-Poly1305 by benchmarking both at init. Chunk IDs use keyed hashing to prevent content fingerprinting against the repository.

The repository is untrusted

All data is encrypted and authenticated before it leaves the client. The optional REST server enforces append-only access and quotas, so even a compromised client cannot delete historical backups.

Browse without dependencies

vykar mount starts a built-in WebDAV server and web interface. Browse and restore snapshots from any browser or file manager — on any platform, in containers, with zero external dependencies.

Performance through Rust

No GIL bottleneck, no garbage collection pauses, predictable memory usage. FastCDC chunking, parallel compression, and streaming uploads keep the pipeline saturated. Built-in resource limits for threads, backend connections, and upload/download bandwidth let Vykar run during business hours.

Discoverability in the CLI

Common operations are short top-level commands. Everything targeting a specific snapshot lives under vykar snapshot. Flags are consistent everywhere: -R is always a repository, -S is always a source label.

vykar backup
vykar list
vykar snapshot find -name "*.xlsx"
vykar snapshot diff a3f7c2 b8d4e1

No lock-in

The repository format is documented, the source is open under the GPL-3.0 license, and the REST server is optional. The config is plain YAML with no proprietary syntax.

Architecture

Technical reference for vykar’s cryptographic, chunking, compression, concurrency, and repository-layout design decisions.

Cryptography

Encryption

AEAD with 12-byte random nonces (AES-256-GCM or ChaCha20-Poly1305).

Rationale:

Authenticated encryption with modern, audited constructions
auto mode benchmarks AES-256-GCM vs ChaCha20-Poly1305 at init and stores one concrete mode per repo
Strong performance across mixed CPU capabilities (with and without hardware AES acceleration)
32-byte symmetric keys (simpler key management than split-key schemes)
AEAD AAD always includes the 1-byte type tag; for identity-bound objects it also includes a domain-separated object context (for example: index, snapshot ID, chunk ID, filecache, or snapshot_cache)

Key usage model: The master encryption_key is used directly as the AEAD symmetric key for all encryption operations throughout the lifetime of the repository. There is no per-session or per-snapshot key derivation. Cryptographic isolation between objects relies on random 12-byte nonces (unique per encryption call) and domain-separated AAD (binding ciphertext to object type and identity). With 96-bit random nonces, the birthday-bound collision threshold is approximately 2^48 encryptions under a single key — well beyond realistic backup workloads.
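The birthday estimate behind the 2^48 figure can be checked numerically. The sketch below uses the standard approximation p ≈ 1 − exp(−n(n−1)/2^(b+1)) for n uniformly random b-bit nonces; the function name is hypothetical.

```python
import math

def nonce_collision_probability(encryptions: int, nonce_bits: int = 96) -> float:
    """Birthday estimate p = 1 - exp(-n(n-1) / 2^(bits+1)) for random nonces."""
    exponent = encryptions * (encryptions - 1) / 2 ** (nonce_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)
```

Under this estimate, 2^32 encryptions give a collision probability of roughly 1.2e-10, while 2^48 encryptions give roughly 0.39. If one assumes one encryption per chunk at the default 2 MiB average chunk size, 2^32 chunks already correspond to about 8 PiB of data.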

Plaintext Mode (`none`)

When encryption is set to none, vykar uses a PlaintextEngine — an identity transform where encrypt() and decrypt() return data unchanged. AAD is ignored (there is no AEAD construction to bind it to). The format layer detects plaintext mode via is_encrypting() == false and uses the shorter wire format: [1-byte type_tag][plaintext] (1-byte overhead instead of 29 bytes).

This mode does not provide authentication or tamper protection — it is designed for trusted storage where confidentiality is unnecessary. Data integrity against accidental corruption is still provided via keyed BLAKE2b-256 chunk IDs (see Hashing / Chunk IDs below).

Key Derivation

The master key (64 bytes: 32-byte encryption key + 32-byte chunk ID key) is generated from OS entropy (OsRng) at repository init. It is never derived from the passphrase. Instead, the passphrase is used to derive a Key Encryption Key (KEK) via Argon2id, and the KEK wraps the master key with AES-256-GCM. The encrypted master key blob is stored at keys/repokey alongside the KDF parameters (algorithm, memory/time/parallelism costs, salt) and the wrapping nonce. Changing the passphrase re-wraps the same master key without re-encrypting any repository data.

Rationale:

Two-layer scheme (random data key, passphrase-derived wrapping key) separates key strength from passphrase quality
Argon2id is a modern memory-hard KDF recommended by OWASP and IETF
Resists both GPU and ASIC brute-force attacks

In none mode no passphrase or key file is needed. The chunk_id_key is deterministically derived as BLAKE2b-256(repo_id). Since repo_id is stored unencrypted in the repo config, this key is not secret — it exists only so that the same keyed hashing path is used in all modes. No keys/repokey file is created.

Hashing / Chunk IDs

Keyed BLAKE2b-256 MAC using the 32-byte chunk_id_key that is part of the master key.

Rationale:

Prevents content confirmation attacks (an adversary cannot check whether known plaintext exists in the backup without the key)
BLAKE2b is faster than SHA-256 in pure software implementations. On CPUs with hardware SHA-256 acceleration (SHA-NI on x86, SHA extensions on ARM), SHA-256 can be faster; BLAKE2b was chosen for consistent performance across all architectures without requiring hardware-specific instruction sets
Trade-off: keyed IDs prevent dedup across different encryption keys (acceptable for vykar’s single-key-per-repo model)

In none mode the same keyed BLAKE2b-256 construction is used, but the key is derived from the public repo_id rather than a secret master key. The MAC therefore acts as a checksum for corruption detection, not as authentication against tampering. vykar check --verify-data recomputes chunk IDs and compares them to detect bit-rot or storage corruption — this works identically across all encryption modes.
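The keyed-hash construction can be illustrated with Python's `hashlib`, which supports keyed BLAKE2b natively. This is a sketch of the general technique only: the function names are hypothetical, and the exact bytes vykar feeds into the hash (for example, any framing around the chunk or `repo_id`) are an assumption.

```python
import hashlib
import hmac

def chunk_id(chunk_id_key: bytes, plaintext: bytes) -> bytes:
    """Keyed BLAKE2b-256 over the chunk plaintext (illustrative)."""
    return hashlib.blake2b(plaintext, key=chunk_id_key, digest_size=32).digest()

def plaintext_mode_key(repo_id: bytes) -> bytes:
    """`none` mode: an unkeyed BLAKE2b-256 of repo_id, which is public, not secret."""
    return hashlib.blake2b(repo_id, digest_size=32).digest()

def verify_chunk(chunk_id_key: bytes, expected_id: bytes, plaintext: bytes) -> bool:
    """Recompute the ID and compare, as a data-verifying check would."""
    return hmac.compare_digest(chunk_id(chunk_id_key, plaintext), expected_id)
```

The same chunk hashed under two different keys yields unrelated IDs, which is the dedup trade-off noted above; a single flipped byte makes `verify_chunk` return `False`.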

Content Processing

Chunking

FastCDC (content-defined chunking) via the fastcdc v3 crate.

Default parameters: 512 KiB min, 2 MiB average, 8 MiB max (configurable in YAML). chunker.max_size is hard-capped at 16 MiB during config validation.

Rationale:

Newer algorithm, benchmarks faster than Rabin fingerprinting
Good deduplication ratio with configurable chunk boundaries

Compression

Per-chunk compression with a 1-byte tag prefix. Supported algorithms: LZ4, ZSTD, and None. The tag identifies the codec only, not the compression level — the ZSTD level is a repo-wide configuration setting. Recompression at a different level requires decompressing and recompressing every chunk.

Rationale:

Per-chunk tags allow mixing algorithms within a single repository
LZ4 for speed-sensitive workloads, ZSTD for better compression ratios. LZ4 is recommended over None for most workloads — even on incompressible data the overhead is negligible, and the reduced I/O and transfer size typically more than compensate
No repository-wide format version lock-in for compression choice
ZSTD compression reuses a thread-local compressor context per level, reducing allocation churn in parallel backup paths
Decompression enforces a hard output cap (32 MiB) to bound memory usage and mitigate decompression-bomb inputs

Deduplication

Content-addressed deduplication uses keyed ChunkId values (BLAKE2b-256 MAC). Identical plaintext produces the same ChunkId, so the second copy is not stored; only refcounts are incremented.

vykar supports three index modes for dedup lookups:

Full index mode — in-memory ChunkIndex (HashMap<ChunkId, ChunkIndexEntry>)
Dedup-only mode — lightweight DedupIndex (ChunkId -> stored_size) plus IndexDelta for mutations
Tiered dedup mode — TieredDedupIndex:
- session-local HashMap for new chunks in the current backup
- Xor filter (xorf::Xor8) as probabilistic negative check
- mmap-backed on-disk dedup cache for exact lookup

During backup, enable_tiered_dedup_mode() is used by default. If the mmap cache is missing/stale/corrupt, vykar safely falls back to dedup-only HashMap mode.
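The tiered lookup order can be sketched with stand-ins for each tier. Everything here is illustrative: `TieredLookup` is a hypothetical class, a dict replaces the mmap-backed cache, and a set of 1-byte fingerprints replaces the xor filter (like the real filter, it can report false positives but never false negatives).

```python
class TieredLookup:
    """Session map, then filter, then exact cache lookup (illustrative)."""

    def __init__(self, cached: dict):
        self.session = {}      # new chunks written during the current backup
        self.cache = cached    # stand-in for the mmap-backed dedup cache
        # Stand-in for the xor filter: first byte of every cached chunk ID.
        self.filter = {cid[0] for cid in cached}
        self.exact_lookups = 0

    def lookup(self, cid: bytes):
        """Return the stored size for cid, or None if the chunk is unknown."""
        if cid in self.session:
            return self.session[cid]
        if cid[0] not in self.filter:
            return None        # definitely absent, no exact lookup needed
        self.exact_lookups += 1
        return self.cache.get(cid)  # the filter may be wrong; this decides
```

Most never-seen chunks are rejected by the filter alone, so the exact lookup is only paid for likely hits and the occasional false positive.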

Two-level dedup check (in Repository::bump_ref_if_exists):

Persistent dedup tier — full index, dedup-only index, or tiered dedup index (depending on mode)
Pending pack writers — blobs buffered in data/tree PackWriters that have not yet been flushed

This prevents duplicates both across backups and within a single backup run.
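The two-level check can be sketched with two dictionaries. Only the method name `bump_ref_if_exists` comes from the text above; the class, its fields, and the `store`/`flush` helpers are hypothetical simplifications that track refcounts and nothing else.

```python
class DedupSketch:
    """Persistent tier plus pending pack writers (illustrative)."""

    def __init__(self):
        self.index = {}    # chunk_id -> refcount, already committed
        self.pending = {}  # chunk_id -> refcount, buffered and not yet flushed

    def bump_ref_if_exists(self, cid: bytes) -> bool:
        # Level 1: persistent tier. Level 2: blobs still in pack writers.
        for tier in (self.index, self.pending):
            if cid in tier:
                tier[cid] += 1
                return True
        return False

    def store(self, cid: bytes) -> bool:
        """Return True if the chunk had to be written, False if deduplicated."""
        if self.bump_ref_if_exists(cid):
            return False
        self.pending[cid] = 1
        return True

    def flush(self) -> None:
        """Move buffered entries into the persistent tier."""
        self.index.update(self.pending)
        self.pending.clear()
```

Storing the same chunk twice before a flush hits the pending tier; storing it again after the flush hits the persistent tier. In both cases nothing is written a second time.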

Serialization

All persistent data structures use msgpack via rmp_serde. Structs serialize as positional arrays (not named-field maps) for compactness. This means field order matters — adding or removing fields requires careful versioning, and #[serde(skip_serializing_if)] must not be used on Item fields (it would break positional deserialization of existing data).

RepoObj Envelope

Every repo object and local encrypted cache blob uses the same RepoObj envelope (repo/format.rs). The wire format depends on the encryption mode:

Encrypted:  [1-byte type_tag][12-byte nonce][ciphertext + 16-byte AEAD tag]
Plaintext:  [1-byte type_tag][plaintext]

The type tag identifies the object kind via the ObjectType enum:

Tag	ObjectType	Used for
0	Config	Repository configuration (stored unencrypted)
1	Manifest	Legacy manifest object tag (unused in v2 repositories)
2	SnapshotMeta	Per-snapshot metadata
3	ChunkData	Compressed file/item-stream chunks
4	ChunkIndex	Encrypted `IndexBlob` stored at `index`
5	PackHeader	Reserved legacy tag (current pack files have no trailing header object)
6	FileCache	Local file-level cache (inode/mtime skip)
7	PendingIndex	Transient crash-recovery journal
8	SnapshotCache	Local snapshot-list cache

The type tag byte is always included in the AAD (additional authenticated data). For identity-bound objects, AAD also includes a domain-separated object context, binding ciphertext to both object type and identity (for example, ChunkData to its ChunkId, SnapshotMeta to snapshot ID, ChunkIndex to b"index", FileCache to b"filecache", and SnapshotCache to b"snapshot_cache").
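The two wire formats can be taken apart with simple slicing. The sketch below only splits the envelope into its fields and performs no decryption; `split_envelope` and the dictionary keys are hypothetical names.

```python
NONCE_LEN = 12  # bytes, random per encryption call
TAG_LEN = 16    # bytes, AEAD authentication tag appended to the ciphertext

def split_envelope(blob: bytes, encrypted: bool) -> dict:
    """Split a RepoObj envelope into its fields (illustrative, no crypto)."""
    if not blob:
        raise ValueError("empty envelope")
    type_tag = blob[0]
    if not encrypted:
        # Plaintext: [1-byte type_tag][plaintext]
        return {"type_tag": type_tag, "payload": blob[1:]}
    if len(blob) < 1 + NONCE_LEN + TAG_LEN:
        raise ValueError("truncated encrypted envelope")
    # Encrypted: [1-byte type_tag][12-byte nonce][ciphertext + 16-byte tag]
    return {
        "type_tag": type_tag,
        "nonce": blob[1:1 + NONCE_LEN],
        "payload": blob[1 + NONCE_LEN:],
    }
```

The fixed overhead follows directly from the layout: 1 + 12 + 16 = 29 bytes when encrypted, 1 byte in plaintext mode.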

Repository Format

On-Disk Layout

RepoConfig.version = 2 describes the current repository layout.

<repo>/
|- config                    # Repository metadata (unencrypted msgpack)
|- keys/repokey              # Encrypted master key (Argon2id-wrapped; absent in `none` mode)
|- index                     # Encrypted IndexBlob { generation, chunks }
|- index.gen                 # Unencrypted advisory u64 generation hint
|- snapshots/<id>            # Encrypted snapshot metadata; source of truth for snapshot listing
|- sessions/<id>.json        # Session presence markers (concurrent backups)
|- sessions/<id>.index       # Per-session crash-recovery journals (absent after clean backup)
|- packs/<xx>/<pack-id>      # Pack files containing compressed+encrypted chunks (256 shard dirs)
`- locks/                    # Advisory lock files

Local Optimization Caches (Client Machine)

These files live under a per-repo local cache root. By default this is the platform cache directory + vykar (for example, ~/.cache/vykar/<repo_id_hex>/... on Linux, ~/Library/Caches/vykar/<repo_id_hex>/... on macOS). If cache_dir is set in config, that path becomes the cache root. These are optimization artifacts, not repository source of truth.

<cache>/<repo_id_hex>/
|- filecache                 # File metadata -> cached ChunkRefs
|- snapshot_list             # Snapshot ID -> SnapshotEntry cache
|- dedup_cache               # Sorted ChunkId -> stored_size (mmap + xor filter)
|- restore_cache             # Sorted ChunkId -> pack_id, pack_offset, stored_size (mmap)
`- full_index_cache          # Sorted full index rows for local rehydration/cache rebuilds

The index caches are validated against the current index generation. The authenticated source of truth is IndexBlob.generation inside index; index.gen is only an advisory hint used to avoid unnecessary remote index downloads on read paths. A stale or missing sidecar causes cache misses or full-index fallback, not correctness issues.

The snapshot_list cache is separate: on open/refresh, the client lists snapshots/, removes stale local entries, loads only new snapshot blobs, and persists the resulting snapshot list locally. This avoids O(n) snapshot metadata GETs on every open.

The same per-repo cache root is also used as the preferred temp location for intermediate files (e.g. cache rebuilds).

Repository And Cache Topology

flowchart LR
    subgraph Repo["Repository (authoritative)"]
        direction TB
        config["config"]
        repokey["keys/repokey"]
        index["index"]
        indexgen["index.gen"]
        snapshots["snapshots/‹id›"]
        packs["packs/‹xx›/‹id›"]
        sessions["sessions/‹id›.json"]
        journal["sessions/‹id›.index"]
        locks["locks/*.json"]
    end

    subgraph Cache["Local cache (best-effort)"]
        direction TB
        filecache["filecache"]
        snapshotlist["snapshot_list"]
        dedupcache["dedup_cache"]
        restorecache["restore_cache"]
        fullindex["full_index_cache"]
    end

    index --> dedupcache
    index --> restorecache
    index --> fullindex
    snapshots --> snapshotlist
    filecache -. reuse .-> index
    indexgen -. hint .-> index

Key Data Structures

IndexBlob — the encrypted object stored at the index key. It combines the current cache-validity token with the chunk index.

Field	Type	Description
generation	u64	Authenticated cache-validity token rotated when the index changes
chunks	ChunkIndex	Full chunk-to-pack mapping

ChunkIndex — HashMap<ChunkId, ChunkIndexEntry>, persisted inside IndexBlob. The central lookup table for deduplication, restore, and compaction.

Field	Type	Description
refcount	u32	Number of snapshots referencing this chunk
stored_size	u32	Size in bytes as stored (compressed + encrypted)
pack_id	PackId	Which pack file contains this chunk
pack_offset	u64	Byte offset within the pack file
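As a rough illustration, the two tables above could map onto Rust types like the following. Field names come from the tables; the 32-byte array aliases for `ChunkId` and `PackId` and the `locate` helper are assumptions made for this sketch, not the actual definitions.

```rust
use std::collections::HashMap;

/// Assumed to be 32-byte identifiers; the real newtypes may differ.
pub type ChunkId = [u8; 32];
pub type PackId = [u8; 32];

#[derive(Debug, Clone, PartialEq)]
pub struct ChunkIndexEntry {
    pub refcount: u32,
    pub stored_size: u32,
    pub pack_id: PackId,
    pub pack_offset: u64,
}

pub type ChunkIndex = HashMap<ChunkId, ChunkIndexEntry>;

pub struct IndexBlob {
    pub generation: u64,
    pub chunks: ChunkIndex,
}

/// Hypothetical helper: resolve a chunk to (pack, offset, stored length).
pub fn locate(index: &IndexBlob, id: &ChunkId) -> Option<(PackId, u64, u32)> {
    index
        .chunks
        .get(id)
        .map(|e| (e.pack_id, e.pack_offset, e.stored_size))
}
```

Dedup checks, restore planning, and compaction analysis all reduce to lookups or scans over this one map.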

Manifest — runtime-only in-memory snapshot list derived from snapshots/ and the local snapshot_list cache. It is not persisted to repository storage.

Field	Type	Description
version	u32	Format version (currently 1)
timestamp	DateTime	Last modification time
snapshots	Vec<SnapshotEntry>	One entry per snapshot

SnapshotListCache — local encrypted map from snapshot ID hex to SnapshotEntry. It is refreshed incrementally from snapshots/ and exists only to avoid repeatedly downloading every snapshot blob on open.

Each SnapshotEntry contains: name, id (32-byte random), time, source_label, label, source_paths, hostname.

SnapshotMeta — per-snapshot metadata stored at snapshots/<id>.

Field	Type	Description
name	String	User-provided snapshot name
hostname	String	Machine that created the backup
username	String	User that ran the backup
time / time_end	DateTime	Backup start and end timestamps
chunker_params	ChunkerConfig	CDC parameters used for this snapshot
comment	String	Optional snapshot comment field; currently written as `""` by backup flows
item_ptrs	Vec<ChunkId>	Chunk IDs containing the serialized item stream
stats	SnapshotStats	File count, original/compressed/deduplicated sizes
source_label	String	Config label for the source
source_paths	Vec<String>	Directories that were backed up
label	String	Legacy compatibility field; new snapshots currently write `""`

SnapshotStats — per-snapshot counters stored inside SnapshotMeta.stats.

Field	Type	Description
nfiles	u64	Number of backed-up regular files plus command-dump virtual files
original_size	u64	Total plaintext bytes before compression/dedup
compressed_size	u64	Total bytes after compression
deduplicated_size	u64	Bytes newly stored after deduplication
errors	u64	Number of soft file-read errors skipped during backup

deduplicated_size records the bytes newly stored at the time the snapshot was created. It depends on the global repository state at that moment and becomes stale if other snapshots are later deleted — a snapshot that originally shared all its chunks (showing deduplicated_size ≈ 0) may become the sole owner of those chunks after the other snapshot is removed. Treat this field as a creation-time accounting metric, not a durable measure of a snapshot’s unique storage footprint.

Item — a single filesystem entry within a snapshot’s item stream.

Field	Type	Description
path	String	Relative path within the backup
entry_type	ItemType	`RegularFile`, `Directory`, or `Symlink`
mode	u32	Unix permission bits
uid / gid	u32	Owner and group IDs
user / group	Option<String>	Owner and group names
mtime	i64	Modification time (nanoseconds since epoch)
atime / ctime	Option<i64>	Access and change times
size	u64	Original file size
chunks	Vec<ChunkRef>	Content chunks (regular files only)
link_target	Option<String>	Symlink target
xattrs	Option<HashMap>	Extended attributes

ChunkRef — reference to a stored chunk, used in Item.chunks:

Field	Type	Description
id	ChunkId	Content-addressed chunk identifier
size	u32	Uncompressed (original) size
csize	u32	Stored size (compressed + encrypted)

csize is stored per-reference so the restore path can pass it as a size hint to the ZSTD bulk decompressor, avoiding the overhead of a streaming decoder. Without it, each chunk decompression would either need an index lookup or fall back to the slower streaming path.

Pack Files

Chunks are grouped into pack files (~32 MiB) instead of being stored as individual files. This reduces file count by 1000x+, critical for cloud storage costs (fewer PUT/GET ops) and filesystem performance (fewer inodes).

Pack File Format

[8B magic "VGERPACK"][1B version=1]
[4B blob_0_len LE][blob_0_data]
[4B blob_1_len LE][blob_1_data]
...
[4B blob_N_len LE][blob_N_data]

Per-blob length prefix (4 bytes): enables forward scanning of all blobs from byte 9 to EOF
Each blob is a complete RepoObj envelope: [1B type_tag][12B nonce][ciphertext+16B AEAD tag]
Each blob is independently encrypted (can read one chunk without decrypting the whole pack)
No trailing per-pack header object — the chunk index already records which blobs reside in which pack at which offset, making a per-pack blob manifest redundant. Pack analysis for compaction enumerates blobs by forward-scanning length prefixes. Trade-off: if the index is lost, rebuilding requires a full sequential scan of all pack data (reading every byte); a trailing header would allow reading just the last N bytes per pack. In practice index loss is rare (single encrypted blob, written atomically) and check --verify-data already performs a full pack scan
Pack ID = unkeyed BLAKE2b-256 of entire pack contents, stored at packs/<shard>/<hex_pack_id>
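The forward scan over length prefixes can be sketched as below. This is a minimal illustration of the layout above, not the project's parser: `scan_pack` is a hypothetical name, errors are plain strings, and the returned offsets point at the blob data rather than at the length prefix (which of the two the index records is not stated here).

```rust
const MAGIC: &[u8; 8] = b"VGERPACK";
const PACK_HEADER_SIZE: usize = 9; // 8-byte magic + 1-byte version

/// Returns (offset_of_blob_data, blob_len) for every blob in the pack.
pub fn scan_pack(bytes: &[u8]) -> Result<Vec<(usize, usize)>, String> {
    if bytes.len() < PACK_HEADER_SIZE || bytes[..8] != MAGIC[..] {
        return Err("bad magic".into());
    }
    if bytes[8] != 1 {
        return Err(format!("unsupported version {}", bytes[8]));
    }
    let mut blobs = Vec::new();
    let mut pos = PACK_HEADER_SIZE;
    while pos < bytes.len() {
        // 4-byte little-endian length prefix.
        let prefix = bytes.get(pos..pos + 4).ok_or("truncated length prefix")?;
        let len = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
        let start = pos + 4;
        let end = start.checked_add(len).ok_or("length overflow")?;
        if end > bytes.len() {
            return Err("truncated blob".into());
        }
        blobs.push((start, len));
        pos = end;
    }
    Ok(blobs)
}
```

Each returned slice would then be handed to the RepoObj envelope decoder; the scan itself never needs the encryption key.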

Data Packs vs Tree Packs

Two separate PackWriter instances:

Data packs — file content chunks. Dynamic target size. Assembled in heap Vec<u8> buffers.
Tree packs — item-stream metadata. Fixed at min(min_pack_size, 4 MiB) and assembled in heap Vec<u8> buffers.

Dynamic Pack Sizing

Pack sizes grow with repository size. Config exposes floor and ceiling:

repositories:
  - url: /backups/repo
    min_pack_size: 33554432     # 32 MiB (floor, default)
    max_pack_size: 201326592    # 192 MiB (default)

Data pack sizing formula:

target = clamp(min_pack_size * sqrt(num_data_packs / 50), min_pack_size, max_pack_size)

max_pack_size has a hard ceiling of 512 MiB. Values above that are rejected at repository init/open.

Data packs in repo	Target pack size
< 50	32 MiB (floor)
200	64 MiB
800	128 MiB
1,800+	192 MiB (default cap)

If you raise max_pack_size, target size can grow further, up to the 512 MiB hard ceiling.

num_data_packs is computed at open() by counting distinct pack_id values in the ChunkIndex (zero extra I/O). During a backup session, the target is recalculated after each data-pack flush, so the first large backup benefits from scaling immediately.
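The sizing formula can be checked against the table with a few lines of Rust. This is a sketch under the assumption that the computation uses floating-point square root and truncation; `target_pack_size` is a hypothetical name, and the function assumes `min_pack_size <= max_pack_size` has already been validated.

```rust
pub const MIB: u64 = 1024 * 1024;

/// target = clamp(min_pack_size * sqrt(num_data_packs / 50), min, max)
pub fn target_pack_size(num_data_packs: u64, min_pack_size: u64, max_pack_size: u64) -> u64 {
    let scale = (num_data_packs as f64 / 50.0).sqrt();
    let raw = (min_pack_size as f64 * scale) as u64;
    // Ord::clamp panics if min > max, so callers must validate the config first.
    raw.clamp(min_pack_size, max_pack_size)
}
```

For example, with the defaults, 200 data packs give sqrt(200 / 50) = 2, so the target is 2 × 32 MiB = 64 MiB, matching the table row.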

Data Flow

Backup Pipeline

The backup runs in two phases so multiple clients can upload concurrently (see Concurrent Multi-Client Backups).

Phase 1: Upload (no exclusive lock)

flowchart LR
    register["Register session"] --> recover["Recover journal"]
    recover --> upload["Upload packs"]
    upload --> journal["Refresh journal"]
    journal --> stage["Stage SnapshotMeta"]

generate session_id (128-bit random hex)
register_session() → write sessions/<session_id>.json, probe for active lock
open repo (full index loaded once)
begin_write_session(session_id) → journal key = sessions/<session_id>.index
  → prune stale local file-cache entries
  → recover own sessions/<session_id>.index if present (batch-verify packs, promote into dedup structures)
  → enable tiered dedup mode (mmap cache + xor filter, fallback to dedup HashMap)
  → derive upload/pipeline limits from `limits.connections` + `limits.threads`
  → execute `command_dumps` first:
    → stream each command's stdout directly into chunk storage
    → add virtual items under `vykar-dumps/` to the item stream
    → abort backup on non-zero exit or timeout
  → walk sources with excludes + one_file_system + exclude_if_present
    → cache-hit path: reuse cached ChunkRefs and bump refs
    → cache-miss path:
      → pipeline path (if effective worker threads > 1):
        → walk emits regular files and segmented large files
          (segmentation applies when file_size > 64 MiB;
           segment size is min(64 MiB, pipeline_buffer_bytes))
        → worker threads read/chunk/hash and classify each chunk:
          - xor prefilter says "maybe present" → hash-only chunk
          - xor prefilter miss (or no filter) → compress + encrypt prepacked chunk
        → sequential consumer validates segment order, performs dedup checks
          (persistent dedup tier + pending pack writers), commits new chunks,
          and handles xor false positives via inline transform
        → ByteBudget enforces pipeline_buffer_bytes as a hard in-flight memory cap
          (64 MiB × effective threads, clamped to 64 MiB..1 GiB)
      → sequential fallback path (effective worker threads == 1)
  → serialize items incrementally into item-stream chunks (tree packs)
  → pack SnapshotMeta in memory (do not write snapshots/<id> yet)

Phase 2: Commit (exclusive lock, brief)

flowchart LR
    lock["Acquire lock"] --> refresh["Refresh snapshots"]
    refresh --> reconcile["Reconcile delta"]
    reconcile --> persist["Persist index"]
    persist --> commit["Write snapshot<br/>commit point"]
    commit --> cleanup["Cleanup + unlock"]

acquire_lock_with_retry(10 attempts, 500ms base, exponential backoff + jitter)
commit_concurrent_session():
  → flush packs/pending uploads (pack flush triggers: target size, 10,000 blobs, or 300s age)
  → refresh snapshot list from snapshots/ (via local snapshot cache diff)
  → check snapshot name uniqueness against fresh list
  → if delta is non-empty:
      → reload full index from storage
      → delta.reconcile(fresh_index): new_entries already present → refcount bumps;
        missing bump targets → Err(StaleChunksDuringCommit)
      → verify_delta_packs on reconciled delta
      → apply reconciled delta to fresh index
      → persist IndexBlob + advisory index.gen
  → if delta is empty but local dedup caches need rebuilding:
      → reload full index from storage for cache rebuild
  → write snapshots/<id> (commit point)
  → rebuild local dedup/restore/full-index caches as needed
  → update in-memory manifest
  → persist local file cache
deregister_session() → delete sessions/<session_id>.json (while holding lock)
release_lock()
clear sessions/<session_id>.index

Error Paths

  → on VykarError::Interrupted (Ctrl-C):
    → flush_on_abort(): seal partial packs, join upload threads, write final sessions/<id>.index
    → deregister_session(), release advisory lock, exit code 130
  → on soft file error (PermissionDenied / NotFound before commit):
    → skip file, increment snapshot.stats.errors, continue
    → exit code 3 (partial success) if any files were skipped

Snapshot refresh uses two modes:

open() uses resilient refresh: listed-but-missing snapshots and GET failures are warned and skipped
commit-time refresh uses strict I/O: listed-but-missing snapshots and GET failures abort the commit so a transient error cannot hide an existing snapshot name

Decrypt and deserialize failures are warned and skipped in both modes. Snapshot names are only available after successful decrypt + deserialize, so the implementation chooses availability over letting one garbage blob brick all future opens or commits in append-only mode.

Restore Pipeline

flowchart LR
    open["Open repo<br/>no index"] --> resolve["Resolve snapshot"]
    resolve --> cache{"Restore cache<br/>valid?"}
    cache -- yes --> items1["Load items<br/>via cache"]
    cache -- no --> items2["Load full index<br/>+ items"]
    items1 --> decode["Stream-decode<br/>two passes"]
    items2 --> decode
    decode --> plan["Plan coalesced<br/>read groups"]
    plan --> read["Parallel reads<br/>decrypt + write"]
    read --> meta["Restore metadata"]

open repository without index (`open_without_index`)
  → resolve snapshot
  → try mmap restore cache (validated by index_generation)
  → load item stream:
    → preferred: lookup tree-pack chunk locations via restore cache
    → fallback: load full index and read item stream normally
  → stream-decode items in two passes:
    → pass 1 create directories
    → pass 2 create symlinks and plan file chunk writes
  → build coalesced pack read groups via the full index
  → parallel coalesced range reads by pack/offset
    (merge when gap <= 256 KiB and merged range <= 16 MiB)
    → `limits.connections` reader workers fetch groups, decrypt + decompress-with-size-hint chunks
    → validate plaintext size and write to all targets (max 16 open files per worker)
  → restore file metadata (mode, mtime, optional xattrs)
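The range-coalescing step can be illustrated as follows. This is a simplified sketch for a single pack: `ReadGroup` and `coalesce` are hypothetical names, and the real planner also tracks which chunks and target files each group serves.

```rust
pub const MAX_GAP: u64 = 256 * 1024; // merge when gap <= 256 KiB
pub const MAX_MERGED: u64 = 16 * 1024 * 1024; // and merged range <= 16 MiB

/// Half-open byte range [start, end) within one pack.
#[derive(Debug, Clone, PartialEq)]
pub struct ReadGroup {
    pub start: u64,
    pub end: u64,
}

/// `ranges` are (offset, len) pairs for the chunks needed from one pack.
pub fn coalesce(mut ranges: Vec<(u64, u64)>) -> Vec<ReadGroup> {
    ranges.sort_unstable();
    let mut groups: Vec<ReadGroup> = Vec::new();
    for (off, len) in ranges {
        let end = off + len;
        if let Some(last) = groups.last_mut() {
            let gap = off.saturating_sub(last.end);
            let merged_end = last.end.max(end);
            if gap <= MAX_GAP && merged_end - last.start <= MAX_MERGED {
                // Close enough: extend the previous group instead of issuing a new read.
                last.end = merged_end;
                continue;
            }
        }
        groups.push(ReadGroup { start: off, end });
    }
    groups
}
```

Reading a small gap of unwanted bytes is usually cheaper than a second request, while the 16 MiB cap bounds per-read memory.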

Item Stream

Snapshot metadata (the list of files, directories, and symlinks) is not stored as a single monolithic blob. Instead:

Items are serialized one-by-one as msgpack and appended to an in-memory buffer
When the buffer reaches ~128 KiB, it is chunked and stored as a tree pack chunk (with a finer CDC config: 32 KiB min / 128 KiB avg / 512 KiB max)
The resulting ChunkId values are collected into item_ptrs in the SnapshotMeta

This design means the item stream benefits from deduplication — if most files are unchanged between backups, the item-stream chunks are mostly identical and deduplicated away.

Command dumps participate in this same item stream. A source with command_dumps produces a synthetic vykar-dumps/ directory entry plus one regular-file Item per dump, so restores treat dump output like ordinary files.

Restore also consumes item streams incrementally (streaming deserialization) instead of materializing the full Vec<Item> up front. When the mmap restore cache is valid, item-stream chunk lookups can avoid loading the full chunk index. File-data read groups are still planned against the full index, which avoids unrecoverable stale-location failures.

Operations

Locking

vykar uses a two-tier locking model to allow concurrent backup uploads while serializing commits and maintenance.

Session Markers (shared, non-exclusive)

During the upload phase of a backup, a lightweight JSON marker is written to sessions/<session_id>.json. Multiple backup clients can coexist in this tier simultaneously — session markers do not block each other.

Each marker contains: hostname, PID, registered_at, and last_refresh. On registration, the client probes for an active advisory lock (3 retries, 2 s base delay, exponential backoff + 25 % jitter). If the lock is held (maintenance in progress), the session marker is deleted and the backup aborts with Locked.

Each client owns a dedicated heartbeat thread (SessionGuard with a 15-minute timer) that refreshes the marker independently of the upload pipeline — so a host that happens to have very long-running uploads still keeps its marker fresh. Markers whose last_refresh is strictly older than 45 minutes (three missed heartbeats) are treated as stale.
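The staleness rule can be expressed as a small predicate. This is a sketch: `is_stale` is a hypothetical name, and treating a `last_refresh` in the future (clock skew) as "not stale" is an assumption rather than documented behavior.

```rust
use std::time::{Duration, SystemTime};

pub const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(15 * 60);
pub const STALE_AFTER: Duration = Duration::from_secs(45 * 60); // three missed heartbeats

/// A marker is stale only when last_refresh is strictly older than 45 minutes.
pub fn is_stale(last_refresh: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(last_refresh) {
        Ok(age) => age > STALE_AFTER,
        // last_refresh lies in the future: staleness cannot be proven (assumption).
        Err(_) => false,
    }
}
```

A marker exactly 45 minutes old is therefore still treated as live.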

Advisory Lock (exclusive)

Lock files at locks/<timestamp>-<uuid>.json
Each lock contains: hostname, PID, and acquisition timestamp
Oldest-key-wins: after writing its lock, a client lists all locks — if its key isn’t lexicographically first, it deletes its own lock and returns an error
Stale cleanup: locks older than 6 hours are automatically removed before each acquisition attempt
Recovery: vykar break-lock forcibly removes stale lock objects when interrupted processes leave lock conflicts
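The oldest-key-wins rule can be sketched against an in-memory stand-in for the locks/ directory. `try_acquire` and the key strings are hypothetical; the sketch assumes lock keys sort lexicographically in acquisition order, which is what a timestamp prefix is for.

```rust
use std::collections::BTreeSet;

/// Write our lock, list all locks, and keep ours only if it sorts first.
pub fn try_acquire(store: &mut BTreeSet<String>, my_key: &str) -> Result<(), String> {
    // 1. Write our own lock object.
    store.insert(my_key.to_string());
    // 2. List all locks; a BTreeSet iterates in lexicographic order.
    let first = store.iter().next().cloned();
    if first.as_deref() == Some(my_key) {
        Ok(())
    } else {
        // 3. Someone else holds an older key: remove ours and report the conflict.
        store.remove(my_key);
        Err(format!("locked by {}", first.unwrap_or_default()))
    }
}
```

Because every contender writes first and checks second, two clients racing for the lock cannot both conclude that they won.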

The advisory lock is used for:

Backup commit phase: acquired with acquire_lock_with_retry (10 attempts, 500 ms base delay, exponential backoff + 25 % jitter). Held only for the brief commit — typically seconds.
Maintenance commands (delete, prune, compact): acquired via with_maintenance_lock(), which additionally cleans stale session markers (>45 min since last_refresh) while preserving their companion .index journals so the next backup can recover uploaded-but-uncommitted chunks. Orphaned .index files from a prior cleanup run (marker already gone) are removed. Then maintenance checks for remaining active sessions — if any non-stale sessions exist (including malformed markers that cannot be proven stale), the lock is released and VykarError::ActiveSessions is returned, preventing compaction from deleting packs that upload-phase backups depend on.
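The retry delays can be illustrated with a small helper. The text specifies only the base delay, attempt count, and 25 % jitter; the doubling schedule and symmetric jitter below are assumptions, and `backoff_delay` is a hypothetical name. The random value is passed in so the function stays deterministic.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
/// `jitter_unit` is a caller-supplied random value in [-1.0, 1.0].
pub fn backoff_delay(attempt: u32, base: Duration, jitter_unit: f64) -> Duration {
    // Assumed schedule: base * 2^attempt.
    let exp = base.as_secs_f64() * 2f64.powi(attempt as i32);
    // Spread the delay by up to +/- 25 % so contending clients do not retry in lockstep.
    let jittered = exp * (1.0 + 0.25 * jitter_unit.clamp(-1.0, 1.0));
    Duration::from_secs_f64(jittered)
}
```

Under these assumptions, a 500 ms base gives nominal waits of 0.5 s, 1 s, 2 s, and so on before jitter is applied.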

Command Summary

Command	Upload phase	Commit/mutate phase
`backup`	Session marker only (shared)	Advisory lock (exclusive, brief)
`delete`, `prune`, `compact`	—	Maintenance lock (exclusive + session check)
`list`, `restore`, `check`, `info`	—	No lock (read-only)

When using vykar-server, the same lock and session objects are stored through the REST backend under locks/* and sessions/*; there is no separate lock-specific server API.

Signal Handling

Two-stage signal handling applies to all commands:

First SIGINT/SIGTERM sets a global shutdown flag; iterative loops (backup, prune, compact) check it and return VykarError::Interrupted
Second signal restores the default handler (immediate kill)
SIGHUP (daemon only): sets a reload flag; the daemon re-reads the config file between backup cycles. Invalid config is logged and ignored — the daemon continues with the previous config.
SIGUSR1 (daemon only): sets a trigger flag; the daemon runs an immediate backup cycle between scheduled runs. The existing schedule is preserved unless the ad-hoc cycle overruns the scheduled slot.
On backup abort: flush_on_abort() seals partial packs, joins upload threads, writes final sessions/<id>.index journal for recovery
Advisory lock is released before exit; CLI exits with code 130

Refcount Lifecycle

Chunk refcounts track how many snapshots reference each chunk, driving the dedup → delete → compact lifecycle:

flowchart TD
    backup["Backup<br/>new chunk or dedup hit"] --> refs["ChunkIndex refcount updated"]
    refs --> delete["Delete / prune<br/>remove snapshot first"]
    delete --> zero["Refcount reaches 0<br/>index entry removed"]
    delete -. crash here .-> inflated["Inflated refcounts<br/>safe, space not reclaimed yet"]
    zero --> orphan["Dead bytes remain in pack files"]
    orphan --> compact["Compact rewrites or deletes packs"]
    compact --> reclaimed["Space reclaimed"]

Backup — store_chunk() adds a new entry with refcount=1, or increments an existing entry’s refcount on dedup hit
Delete / Prune — delete snapshots/<id> first, then decrement chunk refs in the index and save it
Crash window — if the process dies after snapshot deletion but before index save, refcounts stay inflated; this is safe and only keeps chunks live longer than necessary
Orphaned blobs — after delete/prune commits, the encrypted blob data remains in pack files (the index no longer points to it, but the bytes are still on disk)
Compact — rewrites packs to reclaim space from orphaned blobs

This design means delete is fast (a snapshot-object delete plus index updates), while space reclamation is deferred to compact.
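The refcount transitions can be sketched with a plain map. This is an illustration only: `Refcounts` and `release` are hypothetical names, `store_chunk` here models just the counting side of the real function, and the 32-byte `ChunkId` is an assumption.

```rust
use std::collections::HashMap;

pub type ChunkId = [u8; 32]; // assumed width

#[derive(Default)]
pub struct Refcounts(HashMap<ChunkId, u32>);

impl Refcounts {
    /// Backup: returns true when the chunk is new (refcount becomes 1) and must be stored.
    pub fn store_chunk(&mut self, id: ChunkId) -> bool {
        let rc = self.0.entry(id).or_insert(0);
        *rc += 1;
        *rc == 1
    }

    /// Delete/prune: drop one reference. Returns true when the entry was removed,
    /// i.e. the chunk's bytes are now dead weight for compact to reclaim.
    pub fn release(&mut self, id: &ChunkId) -> bool {
        let remaining = match self.0.get_mut(id) {
            Some(rc) => {
                *rc = rc.saturating_sub(1);
                *rc
            }
            None => return false,
        };
        if remaining == 0 {
            self.0.remove(id);
            true
        } else {
            false
        }
    }
}
```

Nothing in `release` touches pack files, which is why the pack bytes stay on disk until compact runs.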

Crash Recovery

If a backup is interrupted after packs have been flushed but before commit, those packs would be orphaned. The pending index journal prevents re-uploading their data on the next run:

During backup, every 8 data-pack flushes, vykar writes a sessions/<session_id>.index blob to storage containing pack→chunk mappings for all flushed packs in this session
On the next backup with the same session ID, if the journal exists, packs are batch-verified by listing shard directories (avoiding per-pack HEAD requests on REST/S3 backends)
Verified chunks are promoted into the dedup structures so subsequent dedup checks find them
After a successful commit, the sessions/<session_id>.index blob is deleted
flush_on_abort() writes a final journal before exiting, maximizing recovery coverage

If a backup process crashes or is killed without clean shutdown, its session marker (sessions/<id>.json) remains on storage. Maintenance commands (compact, delete, prune) will see it via list_session_entries() and refuse to run until the marker ages out. cleanup_stale_sessions() removes markers whose last_refresh is strictly older than 45 minutes (three missed 15-minute heartbeats). Companion .index journals are preserved across this cleanup so the next backup can recover any uploaded-but-uncommitted chunks; only .index files whose .json marker was already absent (orphaned from a prior run) are deleted. Malformed .json markers (unparseable JSON or bad timestamps) are left in place and reported as blocking sessions — they require operator intervention (break-lock --sessions).

Concurrent Multi-Client Backups

Multiple machines or scheduled jobs can back up to the same repository concurrently. The expensive work (walking files, compressing, encrypting, uploading packs) runs in parallel across all clients without coordination. Only the brief index+snapshot commit requires mutual exclusion.

Session Lifecycle

Each backup client registers a session marker at sessions/<session_id>.json before opening the repository. After the repository is open, a dedicated SessionGuard heartbeat thread refreshes the marker every 15 minutes, independent of the upload pipeline. At commit time, the client acquires the exclusive advisory lock, commits its changes, deregisters the session (while still holding the lock — the guard’s Drop stops and joins the heartbeat thread before deleting the marker, so no refresh can race with the delete), then releases the lock.

Each session’s crash-recovery journal is co-located at sessions/<session_id>.index, keeping all per-session state in a single directory.

Why Sessions Block Maintenance but Not Each Other

Two concurrent backups do not block each other during upload — each operates on a private IndexDelta and private sessions/<id>.index journal. Maintenance commands (compact, delete, prune) must block on active sessions because compaction can delete packs that upload-phase clients are still referencing. with_maintenance_lock() acquires the advisory lock, cleans stale sessions, then fails with ActiveSessions if any remain.

IndexDelta Reconciliation

Each backup session accumulates index mutations in an IndexDelta: new_entries (newly uploaded chunks) and refcount_bumps (dedup hits on existing chunks). At commit time, the delta is reconciled against the current on-storage index:

If the delta is non-empty, the full index is reloaded from storage and the delta is reconciled against it:
- new_entries for chunks already present in the fresh index (another client uploaded the same chunk) are converted to refcount_bumps
- refcount_bumps referencing chunks no longer in the index (deleted by a concurrent maintenance operation) cause StaleChunksDuringCommit — the backup must be retried
Pack verification (verify_delta_packs) runs after reconciliation to avoid false negatives when chunks were absorbed as refcount bumps.
If the delta is empty, no remote index write is needed. The client only reloads the full index when local dedup caches need rebuilding.
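The two reconciliation rules can be sketched as follows. This is a simplified model, not the actual types: chunk IDs are stand-in integers, the fresh index is reduced to a map of refcounts, `new_entries` is assumed to carry the number of references taken during the session, and `reconcile` is written as a free function.

```rust
use std::collections::BTreeMap;

pub type ChunkId = u64; // stand-in for the real chunk identifier

#[derive(Debug, Default, PartialEq)]
pub struct IndexDelta {
    /// Chunks uploaded in this session -> references taken in this session.
    pub new_entries: BTreeMap<ChunkId, u32>,
    /// Dedup hits on chunks that already existed in the index.
    pub refcount_bumps: BTreeMap<ChunkId, u32>,
}

#[derive(Debug, PartialEq)]
pub struct StaleChunksDuringCommit(pub Vec<ChunkId>);

pub fn reconcile(
    mut delta: IndexDelta,
    fresh: &BTreeMap<ChunkId, u32>,
) -> Result<IndexDelta, StaleChunksDuringCommit> {
    // Rule 2: every bump target must still exist in the fresh index.
    let missing: Vec<ChunkId> = delta
        .refcount_bumps
        .keys()
        .filter(|id| !fresh.contains_key(*id))
        .copied()
        .collect();
    if !missing.is_empty() {
        return Err(StaleChunksDuringCommit(missing));
    }
    // Rule 1: new entries that another client already committed become bumps.
    let absorbed: Vec<ChunkId> = delta
        .new_entries
        .keys()
        .filter(|id| fresh.contains_key(*id))
        .copied()
        .collect();
    for id in absorbed {
        if let Some(refs) = delta.new_entries.remove(&id) {
            *delta.refcount_bumps.entry(id).or_insert(0) += refs;
        }
    }
    Ok(delta)
}
```

After reconciliation, everything left in `new_entries` is genuinely new to the repository, which is the set that pack verification still has to cover.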

Index Then Snapshot Commit Point

The index is always written before snapshots/<id>. A crash between these two writes leaves orphan entries in the index (no snapshot references them) — harmless, cleaned up by the next compact. Once snapshots/<id> is written, the backup is committed. Delete/prune intentionally invert this ordering: snapshot object first, then index save, so crashes leave inflated refcounts instead of visible snapshots whose chunks were already removed from the index.

Compact

After delete or prune, chunk refcounts are decremented and entries with refcount 0 are removed from the ChunkIndex — but the encrypted blob data remains in pack files. The compact command rewrites packs to reclaim this wasted space.

Algorithm

flowchart TB
    subgraph Phase1["Phase 1: Analysis (read-only)"]
        direction LR
        enum["Enumerate packs"] --> size["Query pack sizes"]
        size --> live["Compute live/dead bytes"]
        live --> filter["Filter by threshold"]
    end

    subgraph Phase2["Phase 2: Repack"]
        direction LR
        repack["Read live blobs"] --> write["Write new pack"]
        write --> save["Save index"]
        save --> delete["Delete old pack"]
    end

    Phase1 --> Phase2

Phase 1 — Analysis (read-only, no pack downloads):

Enumerate all pack files across 256 shard dirs (packs/00/ through packs/ff/)
Query each pack’s size via metadata-only calls (HEAD/stat), parallelized from limits.connections (remote: min(connections*3, 24), local: min(connections, 8))
Compute live bytes per pack from the ChunkIndex: live_bytes = Σ(4 + stored_size) for each indexed blob in that pack
Derive dead_bytes = pack_payload - live_bytes, where pack_payload = pack_size - PACK_HEADER_SIZE; packs where live_bytes > pack_payload are marked corrupt
Compute unused_ratio = dead_bytes / pack_size per pack
Track pack health counters (packs_corrupt, packs_orphan) in addition to live/dead bytes
Filter packs where unused_ratio >= threshold
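The per-pack arithmetic in Phase 1 can be sketched as below. `PackHealth`, `analyze_pack`, and `is_candidate` are hypothetical names, and the threshold is treated as a fraction between 0 and 1, which is an assumption about the unit of `--threshold`.

```rust
pub const PACK_HEADER_SIZE: u64 = 9; // 8-byte magic + 1-byte version
pub const BLOB_LEN_PREFIX: u64 = 4;

#[derive(Debug, PartialEq)]
pub enum PackHealth {
    /// The index claims more live bytes than the pack can hold.
    Corrupt,
    Ok { live_bytes: u64, dead_bytes: u64, unused_ratio: f64 },
}

/// `live_stored_sizes` are the stored_size values of indexed blobs in this pack.
pub fn analyze_pack(pack_size: u64, live_stored_sizes: &[u32]) -> PackHealth {
    let live_bytes: u64 = live_stored_sizes
        .iter()
        .map(|&s| BLOB_LEN_PREFIX + u64::from(s))
        .sum();
    let payload = pack_size.saturating_sub(PACK_HEADER_SIZE);
    if live_bytes > payload {
        return PackHealth::Corrupt;
    }
    let dead_bytes = payload - live_bytes;
    PackHealth::Ok {
        live_bytes,
        dead_bytes,
        unused_ratio: dead_bytes as f64 / pack_size as f64,
    }
}

/// A pack is a repack candidate when its unused ratio meets the threshold.
pub fn is_candidate(health: &PackHealth, threshold: f64) -> bool {
    matches!(health, PackHealth::Ok { unused_ratio, .. } if *unused_ratio >= threshold)
}
```

Only pack sizes and the in-memory index feed this computation, which is why the analysis phase needs no pack downloads.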

Phase 2 — Repack: For each candidate pack (most wasteful first, respecting --max-repack-size cap):

If backend supports server_repack, send a repack plan and apply returned pack remaps
Otherwise run client-side repack:
- If all blobs are dead → delete the pack file directly
- Else validate pack header (magic + version) via get_range(0..9) and cross-check each on-disk blob length prefix against the index’s stored_size
- Read live blobs as encrypted passthrough (no decrypt/re-encrypt cycle), write a new pack, update index mappings
Persist index updates before old pack deletion (save_state())
Delete old pack(s)

Crash Safety

The crash-safety invariant is visible in the Phase 2 ordering above: the index never points to a deleted pack. Sequence: write new pack → save index → delete old pack. A crash before the index save leaves the new pack orphaned; a crash after it leaves the old pack orphaned. Either case is harmless and is cleaned up on the next compact.

CLI

vykar compact [--threshold N] [--max-repack-size 2G] [-n/--dry-run]

Parallel Pipeline

Backup uses a bounded pipeline:

flowchart LR
    walk["Walk<br/>(sequential)"] --> workers["Workers ×N<br/>read / chunk / hash"]
    workers --> consumer["Consumer<br/>(sequential)<br/>dedup + commit"]
    consumer --> uploads["Uploads<br/>bounded concurrency"]
    budget["ByteBudget"] -. caps in-flight bytes .-> workers
    budget -. caps in-flight bytes .-> consumer

Sequential walk stage emits file work
Parallel workers in a crossbeam-channel pipeline read/chunk/hash files and classify chunks (hash-only vs prepacked)
A ByteBudget enforces a hard cap on in-flight pipeline bytes (derived from limits.threads)
Consumer stage commits chunks and updates dedup/index state sequentially (including segment-order validation for large files)
Pack uploads run in background with bounded in-flight upload concurrency

Large files are split into fixed-size 64 MiB segments and processed through the same worker pool. Segmentation applies only when file_size > 64 MiB, and the effective segment size is clamped to the derived pipeline byte budget.

Configuration:

limits:
  threads: 4                       # backup transform workers (0 = auto: local ceil(cores/2)∈[2,4], remote min(cores,12))
  connections: 2                   # backend/upload/restore concurrency (1-16)
  nice: 10                         # Unix nice value
  upload_mib_per_sec: 100          # upload bandwidth cap (MiB/s, 0 = unlimited)
  download_mib_per_sec: 0          # download bandwidth cap (MiB/s, 0 = unlimited)

Internal backup pipeline knobs are derived automatically:

threads_effective = threads == 0 ? (local ? ceil(cores/2)∈[2,4] : min(cores, 12)) : threads
pipeline_depth = max(connections, 2)
pipeline_buffer_bytes = clamp(threads_effective * 64 MiB, 64 MiB..1 GiB)
segment_size = 64 MiB, transform_batch = 32 MiB, max_pending_actions = 8192
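The derivation above translates directly into code. In this sketch, `PipelineLimits` and `derive_limits` are hypothetical names, and the core count and backend locality are passed in as parameters rather than detected.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

#[derive(Debug, PartialEq)]
pub struct PipelineLimits {
    pub threads_effective: usize,
    pub pipeline_depth: usize,
    pub pipeline_buffer_bytes: u64,
    pub segment_size: u64,
}

pub fn derive_limits(
    threads: usize,
    connections: usize,
    cores: usize,
    local_backend: bool,
) -> PipelineLimits {
    let threads_effective = if threads == 0 {
        if local_backend {
            // ceil(cores / 2), kept within [2, 4]
            ((cores + 1) / 2).clamp(2, 4)
        } else {
            cores.min(12)
        }
    } else {
        threads
    };
    let pipeline_buffer_bytes = (threads_effective as u64 * 64 * MIB).clamp(64 * MIB, GIB);
    PipelineLimits {
        threads_effective,
        pipeline_depth: connections.max(2),
        pipeline_buffer_bytes,
        // Segments never exceed the in-flight byte budget.
        segment_size: (64 * MIB).min(pipeline_buffer_bytes),
    }
}
```

For instance, `threads: 0` on an 8-core machine with a local repository resolves to 4 workers and a 256 MiB in-flight budget.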

Testing

A single bug in serialization, encryption, or refcount tracking can silently destroy data. vykar’s testing strategy uses layered tiers so that each tier catches a different class of defect — from logic errors in individual functions through emergent failures in multi-step workflows across storage backends.

Unit Tests

~190 tests across 21 modules in vykar-core, covering each subsystem in isolation. Fast feedback (seconds), deterministic, no I/O side effects beyond tempdirs.

Category	Focus
Format & serialization	RepoObj envelope round-trips, pack header parsing, item serde
Chunk index	Add/remove/refcount/generation, dedup-only and tiered modes
Locking	Advisory lock acquire/release, stale cleanup, fence detection
Prune & retention	Policy evaluation (keep-last, keep-daily/weekly/monthly/yearly)
Repair	Plan-only vs apply modes, post-repair cleanliness assertions
Compact	Pack analysis, repack candidate selection, crash-safety ordering
Snapshot lifecycle	Manifest operations, delete ordering, multi-source configs

Property Tests

7 proptest blocks, each running 1000 random cases. These catch edge cases that hand-written examples miss — off-by-one in chunk boundaries, subtle serde field-order regressions, or nonce/context-binding failures in AEAD. Regressions are reproducible via proptest’s seed persistence.

Property	Invariant verified
Encryption round-trip	`decrypt(encrypt(P, ctx), ctx) == P` for both AES-256-GCM and ChaCha20-Poly1305; wrong-context decryption fails
Item serde round-trip	Arbitrary files, directories, and symlinks survive msgpack positional encode/decode
ChunkIndex serde round-trip	Varying refcounts, pack offsets, and generation numbers survive encode/decode
Chunker completeness & determinism	No gaps or overlaps; same input always produces same boundaries; size bounds respected; stream and slice APIs agree
Backup-restore round-trip	Arbitrary nested file trees (empty, small, large files; nested directories) restore byte-identical
Compression round-trip	`decompress(compress(codec, data)) == data` for all codecs (None, LZ4, ZSTD); output within size bound
IndexDelta state-machine	Refcount conservation after apply and after reconcile-then-apply with concurrent overlaps

Matrix Tests

9 corruption types tested against detection, repair, and resilience paths using test-case parametrization. Each corruption is applied to a known-good repository, then check, repair, and follow-up backup are run with assertions on the outcome.

Corruption	`check` detects	`repair` fixes	Backup succeeds after
BitFlipInPack	yes (Ok + errors)	yes	yes
BitFlipInBlob	yes (Ok + errors)	yes	yes
TruncatePack	yes (Ok + errors)	yes	yes
ZeroFillRegion	yes (Ok + errors)	yes	yes
DeletePack	yes (Ok + errors)	yes	yes
CorruptSnapshot	yes (Ok + errors)	yes	yes
DeleteIndex	yes (Ok + errors)	—	—
TruncateIndex	yes (Err)	not possible (Err)	—
CorruptConfig	yes (Err)	not possible (Err)	—

Fuzz Tests

7 coverage-guided fuzz targets via cargo-fuzz (libFuzzer). Each target feeds adversarial byte sequences into a parser, deserializer, or decrypt path, mutating from committed corpus seeds toward crashes, hangs, and OOM. Complements proptest by running for hours/days and optimizing for code-path coverage rather than round-trip invariants.

Target	Function under test	Risk surface
`fuzz_pack_scan`	`scan_pack_blobs_bytes`	Integer overflow in length fields, truncated frames
`fuzz_decompress`	`decompress` + `decompress_metadata`	Decompression bombs, corrupt LZ4/Zstd frames
`fuzz_msgpack_snapshot_meta`	`from_slice::<SnapshotMeta>`	Large collection size declarations
`fuzz_msgpack_index_blob`	`from_slice::<IndexBlob>`	Massive chunk index allocation
`fuzz_item_stream`	`for_each_decoded_item`	Streaming framing via `Deserializer::position()`, EOF handling
`fuzz_file_cache_decode`	`FileCache::decode_from_plaintext`	Manual msgpack marker parsing, allocation cap, legacy fallback
`fuzz_unpack_object`	`unpack_object` + `unpack_object_expect_with_context`	AEAD envelope parse, nonce extraction, context/AAD wiring, tag authentication

Corpus seeds are committed and deterministic. CI runs each target for 300 seconds weekly on the nightly toolchain; make fuzz-check replays the corpus without new fuzzing for fast regression checks.

Integration Tests

End-to-end tests at two levels, plus a memory-regression check:

In-process (vykar-core/tests/, ~2600 lines): init → backup → list → restore → delete → prune → compact → check cycles exercising the commands API directly. Covers encryption modes, multi-source configs, lifecycle transitions, concurrent session logic, and crash-recovery journal round-trips.
CLI-level (vykar-cli/tests/, ~1300 lines): spawn the vykar binary and assert on exit codes, stdout/stderr, and restored file content. Covers config parsing, multi-repo selection, and end-to-end command syntax.
Memory regression: backup and restore of a controlled corpus with RSS sampling; asserts peak RSS stays below fixed caps (512 MiB backup, 384 MiB restore) to catch memory regressions in the pipeline.

Scenario & Stress Tests

YAML-driven scenario runner (scripts/testbench) executes multi-phase workflows against all four storage backends (local, REST, S3/MinIO, SFTP).

Scenarios: configurable corpus (mixed file types, sizes up to 2 GB), phases including init → backup → verify (restore + diff) → check → churn → prune → compact → cleanup. Churn simulation applies configurable adds, deletes, and modifications with growth caps to test incremental backup and dedup correctness over time.
Stress mode: up to 1000 iterations of backup → list → restore → verify → delete → compact → prune with periodic check and optional check --verify-data. Catches state-accumulation bugs (leaking refcounts, index bloat, stale cache entries) that only manifest after many cycles.
Multi-backend coverage: ensures storage-abstraction bugs do not hide behind the local filesystem.

Roadmap

Planned

Feature	Description	Priority
GUI Config Editing	Structured editing of the config in the GUI, currently only via YAML	High
Linux GUI packaging	Native `.deb`/`.rpm` packages and a repository for streamlined installation	High
Windows GUI packaging	MSI installer and/or winget package for first-class Windows support	High
Snapshot filtering	By host, tag, path, date ranges	Medium
Async I/O	Non-blocking storage operations	Medium
JSON output mode	Structured JSON output for all CLI commands to enable scripting and integration with monitoring tools	Medium
Per-token permissions	Expand permissions from full/append-only to also limit reading and maintenance	Medium
Hardlink & special file support	Extend `ItemType` with `Hardlink`, `BlockDevice`, `CharDevice`, `Fifo`, `Socket`; inode tracking during walk; `link()`/`mknod` during restore	Medium
macOS dataless file modes	Per-source `dataless: skip|hydrate|hydrate-evict` to control whether cloud-only files (iCloud Drive, Dropbox, OneDrive, etc.) are skipped, hydrated for backup, or hydrated then evicted via `NSFileProviderManager`. v1 ships skip-with-parent-reuse as the hardcoded default.	Medium
Nominal snapshot timestamp	Add optional `time_nominal` to `SnapshotMeta` for the data’s real-world timestamp (e.g. ZFS snapshot time), distinct from backup start/end times	Low

Implemented

Feature	Description
Pack files	Chunks grouped into ~32 MiB packs with dynamic sizing, separate data/tree packs
Retention policies	`keep_daily`, `keep_weekly`, `keep_monthly`, `keep_yearly`, `keep_last`, `keep_within`
snapshot delete command	Remove individual snapshots, decrement refcounts
prune command	Apply retention policies, remove expired snapshots
check command	Structural integrity + optional `--verify-data` for full content verification
Type-safe PackId	Newtype for pack file identifiers with `storage_key()`
compact command	Rewrite packs to reclaim space from orphaned blobs after delete/prune
REST server	axum-based backup server with auth, append-only enforcement, quotas, freshness tracking, and server-side compaction
REST backend	`StorageBackend` over HTTP with range-read support
Tiered dedup index	Backup dedup via session map + xor filter + mmap dedup cache, with safe fallback to HashMap dedup mode
Restore mmap cache	Restore-cache-first item-stream lookup with safe fallback to the full index when cache entries are stale or incomplete
Append-only repository layout v2	Snapshot listing derived from immutable `snapshots/<id>` blobs; `index` stores authenticated generation and `index.gen` is an advisory cache hint
Bounded parallel pipeline	Byte-budgeted pipeline with bounded worker/upload concurrency derived from `limits.threads` and `limits.connections`
Heap-backed pack assembly	Pack writers use heap-backed buffers after the mmap path was removed for reliability on some systems
cache_dir override	Configurable root for file cache, dedup/restore/full-index caches, and preferred mmap temp-file location
Parallel transforms	rayon-backed compression/encryption within the bounded pipeline
break-lock command	Forced stale-lock cleanup for backend/object lock recovery
Compact pack health accounting	Compact analysis reports/tracks corrupt and orphan packs in addition to reclaimable dead bytes
File-level cache	inode/mtime/ctime skip for unchanged files — avoids read, chunk, compress, encrypt. Keys are 16-byte BLAKE2b path hashes (with transparent legacy migration). Stored locally under the per-repo cache root (default platform cache dir + `vykar`, or `cache_dir` override).
Daemon mode	`vykar daemon` runs scheduled backup→prune→compact→check cycles with two-stage signal handling
Server-side pack verification	`vykar check` delegates pack integrity checks to vykar-server when available; `--distrust-server` opts out
Upload integrity	REST `PUT` includes `X-Content-BLAKE2b` header; server verifies during streaming write
vykar-protocol crate	Shared wire-format types and pack/protocol version constants between client and server
Type-safe SnapshotId	Newtype for snapshot identifiers with `storage_key()` for `snapshots/<id>` objects
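
The File-level cache row above mentions 16-byte BLAKE2b path hashes as cache keys. A minimal sketch of such a key follows. It assumes the path is hashed as UTF-8 bytes with an unkeyed BLAKE2b. The function name `path_cache_key` is hypothetical, and Vykar's actual derivation (path normalization, keying, or salting) may differ.

```python
import hashlib


def path_cache_key(path: str) -> bytes:
    """Return a 16-byte BLAKE2b digest of the path.

    Assumption: the path is encoded as UTF-8 and hashed without a key.
    """
    return hashlib.blake2b(path.encode("utf-8"), digest_size=16).digest()


key = path_cache_key("/home/user/documents/report.txt")
print(len(key), key.hex())
```

Fixed-size keys keep the cache compact regardless of path length, at the cost of not being able to recover the path from the key.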

Setup

Vykar includes a dedicated backup server for secure, policy-enforced remote backups. TLS is typically handled by a reverse proxy such as nginx or Caddy.

Why a dedicated REST server instead of plain S3

Dumb storage backends (S3, WebDAV, SFTP) work well for basic backups, but they cannot enforce policy or do server-side work. vykar-server adds capabilities that object storage alone cannot provide.

Capability	S3 / dumb storage	vykar-server
Append-only mode	S3 Object Lock + soft-delete preserves previous versions for a configurable retention period; overwrites are not blocked but are recoverable within the retention window	Rejects deletes and overwrites of immutable keys; only `index`, `index.gen`, `locks/`, and `sessions/` remain mutable
Server-side compaction	Client must download and re-upload all live blobs	Server repacks locally on disk from a compact plan
Quota enforcement	Requires external bucket policy/IAM setup	Built-in byte quota checks on writes
Backup freshness monitoring	Requires external polling and parsing	Tracks `last_backup_at` on new snapshot writes
Upload integrity	Relies on backend checksums only	Verifies `X-Content-BLAKE2b` during uploads
Structural health checks	Client has to fetch data to verify structure	Server validates repository shape directly

All data remains client-side encrypted. The server never has the encryption key and cannot read backup contents.

Install

Download a binary for your platform from the releases page.

Server configuration

All settings are passed as CLI flags. The authentication token is read from the VYKAR_TOKEN environment variable so it does not appear in process arguments.

CLI flags

Flag	Default	Description
`-l, --listen`	`localhost:8585`	Address to listen on
`-d, --data-dir`	`/var/lib/vykar`	Root directory where repositories are stored
`--append-only`	`false`	Reject `DELETE` and overwriting immutable keys (config, keys, snapshots, packs). Mutable keys (index, index.gen, locks, sessions) remain writable.
`--log-format`	`pretty`	Log output format: `json` or `pretty`
`--quota`	auto-detect	Storage quota (`500M`, `10G`, plain bytes). If omitted, the server detects filesystem quota or falls back to free space
`--network-threads`	`4`	Async threads for handling network connections
`--io-threads`	`6`	Threads for blocking disk I/O (reads, writes, hashing)
`--debug`	`false`	Enable debug logging
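
The `--quota` row accepts values such as `500M`, `10G`, or plain bytes. The sketch below shows one way such values could be parsed. It assumes binary multiples (1 M = 1024 × 1024 bytes) and also accepts `K` and `T` suffixes. Neither assumption is confirmed by this documentation, and `parse_quota` is a hypothetical helper, not part of vykar-server.

```python
import re

# Assumption: suffixes are binary multiples; the source only shows M and G.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_quota(text: str) -> int:
    """Convert a quota string like '500M', '10G', or '1048576' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized quota value: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


for value in ("500M", "10G", "1048576"):
    print(value, "->", parse_quota(value))
```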

Environment variables

Variable	Required	Description
`VYKAR_TOKEN`	Yes	Shared bearer token for authentication

Start the server

export VYKAR_TOKEN="some-secret-token"
vykar-server --data-dir /var/lib/vykar --append-only --quota 10G

Run as a systemd service

Create an environment file at /etc/vykar/vykar-server.env with restricted permissions:

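sudo mkdir -p /etc/vykar
# Create the file with restrictive ownership and mode before writing the secret,
# so the token is never briefly world-readable.
sudo install -m 600 -o vykar -g vykar /dev/null /etc/vykar/vykar-server.env
echo 'VYKAR_TOKEN=some-secret-token' | sudo tee /etc/vykar/vykar-server.env > /dev/null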

Create /etc/systemd/system/vykar-server.service:

[Unit]
Description=Vykar backup REST server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=vykar
Group=vykar
EnvironmentFile=/etc/vykar/vykar-server.env
ExecStart=/usr/local/bin/vykar-server --data-dir /var/lib/vykar --append-only
Restart=on-failure
RestartSec=2
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=full
ProtectHome=true
ReadWritePaths=/var/lib/vykar

[Install]
WantedBy=multi-user.target

Then reload and enable:

sudo systemctl daemon-reload
sudo systemctl enable --now vykar-server.service
sudo systemctl status vykar-server.service

Reverse proxy

vykar-server listens on HTTP and expects a reverse proxy to handle TLS. Pack uploads can be up to 512 MiB, so the proxy must allow large request bodies.

Nginx

server {
    listen 443 ssl http2;
    server_name backup.example.com;

    ssl_certificate     /etc/letsencrypt/live/backup.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/backup.example.com/privkey.pem;

    client_max_body_size    600m;
    proxy_request_buffering off;

    location / {
        proxy_pass http://127.0.0.1:8585;
    }
}

Caddy

backup.example.com {
    request_body {
        max_size 600MB
    }
    reverse_proxy 127.0.0.1:8585
}

Client configuration (REST backend)

repositories:
  - label: "server"
    url: "https://backup.example.com"
    access_token: "some-secret-token"

encryption:
  mode: "auto"

sources:
  - "/home/user/documents"

All standard repository commands (init, backup, list, info, restore, delete, prune, check, compact) work over REST without changing the CLI workflow.

Health check

# No auth required
curl http://localhost:8585/health

Returns JSON like:

{"status":"ok","version":"0.1.0"}

Server Internals

Technical reference for vykar-server: crate layout, REST API surface, authentication, policy enforcement, and server-side maintenance helpers.

For deployment and configuration, see Setup.

Crate Layout

Component	Location	Purpose
vykar-server	`crates/vykar-server/`	axum HTTP server and admin operations
vykar-protocol	`crates/vykar-protocol/`	Shared wire-format types, pack format constants, and transport validation (no I/O or crypto)
RestBackend	`crates/vykar-storage/src/rest_backend.rs`	`StorageBackend` implementation over HTTP

REST API

The server exposes normal storage-object routes plus a small set of admin query endpoints. Repository state still lives as ordinary keys under the configured data_dir.

Storage object routes

Method	Path	Maps to	Notes
`GET`	`/{*path}`	`get(key)`	Returns `200` + body or `404`. With a `Range` header, this becomes a ranged read and returns `206`.
`HEAD`	`/{*path}`	`exists(key)`	Returns `200` with metadata or `404`.
`PUT`	`/{*path}`	`put(key, data)`	Raw bytes body. REST clients send `X-Content-BLAKE2b`; the server verifies it while streaming the write.
`DELETE`	`/{*path}`	`delete(key)`	Returns `204` or `404`. Rejected with `403` in append-only mode.
`GET`	`/{*path}?list`	`list(prefix)`	Returns a JSON array of matching keys.
`POST`	`/{*path}?mkdir`	`create_dir(key)`	Creates directory scaffolding.
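
The `PUT` row says the server verifies `X-Content-BLAKE2b` while streaming the write. A minimal sketch of that pattern follows: hash each chunk as it is written, then compare at the end. It assumes the header carries a hex-encoded 32-byte digest, which this documentation does not specify. `write_verified` is a hypothetical name.

```python
import hashlib
import io


def write_verified(stream, expected_hex: str, sink, chunk_size: int = 64 * 1024) -> bool:
    """Copy stream to sink chunk by chunk and report whether the digest matched.

    Assumption: the expected value is a hex-encoded BLAKE2b digest of 32 bytes.
    """
    hasher = hashlib.blake2b(digest_size=32)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)  # hash incrementally, no full-body buffering
        sink.write(chunk)
    return hasher.hexdigest() == expected_hex.lower()


body = b"encrypted pack bytes" * 1000
header_value = hashlib.blake2b(body, digest_size=32).hexdigest()
target = io.BytesIO()
print(write_verified(io.BytesIO(body), header_value, target))
```

On a mismatch, a real server would also need to discard the partially written object. The sketch only reports the result.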

Admin routes

Method	Path	Description
`POST`	`/?init`	Create repo directory scaffolding (`keys`, `snapshots`, `locks`, `packs/00..ff`)
`POST`	`/?batch-delete`	Delete a JSON list of keys
`POST`	`/?batch-delete&cleanup-dirs`	Delete keys and try to remove now-empty parent directories
`POST`	`/?repack`	Server-side pack repack using a client-supplied plan
`POST`	`/?verify-packs`	Server-side pack verification using a client-supplied plan
`GET`	`/?stats`	Repository size, object count, pack count, `last_backup_at`, and quota info
`GET`	`/?verify-structure`	Structural repository validation
`GET`	`/?list`	List all keys in the repository
`GET`	`/health`	Unauthenticated liveness endpoint returning `status` and `version`

There are no dedicated /locks endpoints. Clients store lock and session objects through the normal object API (locks/*, sessions/*).
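
As an illustration of the scaffolding that `POST /?init` is described as creating (`keys`, `snapshots`, `locks`, `packs/00..ff`), the sketch below builds the same directory shape locally. It assumes lowercase two-digit hex shard names. `init_repo_layout` is a hypothetical helper, not the server's code.

```python
import tempfile
from pathlib import Path


def init_repo_layout(root: Path) -> None:
    """Create the top-level directories and 256 pack shard directories."""
    for name in ("keys", "snapshots", "locks"):
        (root / name).mkdir(parents=True, exist_ok=True)
    for shard in range(256):
        # Assumption: shards are named 00..ff in lowercase hex.
        (root / "packs" / f"{shard:02x}").mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    init_repo_layout(repo)
    shards = sorted(p.name for p in (repo / "packs").iterdir())
    print(len(shards), shards[0], shards[-1])  # prints "256 00 ff"
```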

Authentication

All routes except GET /health require Authorization: Bearer <token>. The token comes from the VYKAR_TOKEN environment variable and is checked with a constant-time comparison.
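
A minimal sketch of the check described above, using Python's `hmac.compare_digest` for the constant-time comparison. The function name and the header-parsing details are assumptions; the server itself is written in Rust and may handle edge cases differently.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Accept only 'Bearer <token>' where the token matches the expected value."""
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    # compare_digest avoids leaking how many leading bytes matched via timing.
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))


print(is_authorized("Bearer some-secret-token", "some-secret-token"))
print(is_authorized("Bearer wrong", "some-secret-token"))
```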

Append-Only Enforcement

When the server runs with --append-only:

DELETE on any object path returns 403 Forbidden
PUT to an existing key returns 403 unless the key is on the mutable-allowlist
Mutable-allowlist: index, index.gen, locks/*, sessions/* — these may be overwritten freely
All other keys (config, keys/*, snapshots/*, packs/*) are immutable once written
/?batch-delete is rejected
/?repack operations that delete old packs are rejected

This protects existing history from a compromised client while still allowing normal backup commits. In particular, snapshot blobs under snapshots/ are immutable — a compromised client cannot hide historical backups by overwriting or deleting them.
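
The object-level rules above can be sketched as a small decision function. It follows the rule list literally (every `DELETE` is rejected, and `PUT` is rejected only for existing keys outside the mutable allowlist). The function names are hypothetical, and the admin-route rules (`/?batch-delete`, `/?repack`) are not modeled.

```python
from typing import Optional


def is_mutable_key(key: str) -> bool:
    """Keys that may be overwritten freely in append-only mode."""
    return key in ("index", "index.gen") or key.startswith(("locks/", "sessions/"))


def append_only_status(method: str, key: str, exists: bool) -> Optional[int]:
    """Return 403 if the request must be rejected, or None if it may proceed."""
    if method == "DELETE":
        return 403
    if method == "PUT" and exists and not is_mutable_key(key):
        return 403
    return None


print(append_only_status("PUT", "snapshots/abc", exists=True))   # overwrite of immutable key
print(append_only_status("PUT", "snapshots/abc", exists=False))  # first write is allowed
print(append_only_status("PUT", "index", exists=True))           # mutable key
```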

Quota Enforcement

Quota is enforced on writes. If --quota is omitted, the server auto-detects a limit from filesystem quota information or free space. If a write would exceed the active limit, the request is rejected before or during upload.

The stats response includes:

{
  "total_bytes": 1073741824,
  "total_objects": 234,
  "total_packs": 42,
  "last_backup_at": "2026-02-11T14:30:00Z",
  "quota_bytes": 5368709120,
  "quota_used_bytes": 1073741824,
  "quota_source": "Explicit"
}

Backup Freshness Monitoring

The server updates last_backup_at when it observes a new snapshots/* key being written for the first time. This marks the completion of a backup commit.
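
A monitoring script could combine the quota fields and `last_backup_at` from the stats response. The sketch below parses a response shaped like the example above and flags a stale backup. The 36-hour threshold and the function name are illustrative assumptions, and the sketch assumes `last_backup_at` is always present as an ISO 8601 UTC string.

```python
import json
from datetime import datetime, timedelta, timezone


def summarize_stats(raw: str, now: datetime, max_age: timedelta) -> dict:
    """Derive quota usage and backup age from a stats JSON document."""
    stats = json.loads(raw)
    # Replace the trailing 'Z' so older Python versions can parse the timestamp.
    last = datetime.fromisoformat(stats["last_backup_at"].replace("Z", "+00:00"))
    age = now - last
    return {
        "quota_used_fraction": stats["quota_used_bytes"] / stats["quota_bytes"],
        "backup_age": age,
        "stale": age > max_age,
    }


example = """{
  "total_bytes": 1073741824,
  "last_backup_at": "2026-02-11T14:30:00Z",
  "quota_bytes": 5368709120,
  "quota_used_bytes": 1073741824
}"""
print(summarize_stats(example, datetime.now(timezone.utc), timedelta(hours=36)))
```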

Server-Side Verify Packs

vykar check can offload pack verification to the server when the backend is REST and the server supports /?verify-packs.

The client sends a verification plan describing packs and expected blob boundaries. The server validates:

pack header magic and version
blob boundaries and length-prefix structure
BLAKE2b hash of pack contents

If the user passes vykar check --distrust-server, the client falls back to downloading and verifying data locally.

Server-Side Repack

vykar compact can use /?repack to rewrite packs server-side without downloading encrypted blobs to the client.

High-level flow:

The client opens the repo and analyzes pack liveness from the index.
The client sends a repack plan describing source packs and live blob offsets.
The server copies the referenced encrypted blobs into new pack files, preserving the pack wire format.
The server returns new pack keys and offsets so the client can update the chunk index.

This is encrypted passthrough: the server never decrypts chunk payloads.
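
The passthrough idea can be sketched with in-memory byte strings: live blob ranges are copied verbatim into a new buffer, and the new offsets are reported back. The plan shape `(source_pack_key, offset, length)` is a hypothetical simplification. The real pack header and length-prefix framing are omitted here, so this is not the actual pack wire format.

```python
from typing import Dict, List, Tuple


def repack(packs: Dict[str, bytes], plan: List[Tuple[str, int, int]]):
    """Copy live blob ranges into one new buffer without interpreting them.

    Returns the new buffer and a list of (source_key, old_offset, new_offset).
    """
    out = bytearray()
    moved = []
    for pack_key, offset, length in plan:
        blob = packs[pack_key][offset:offset + length]
        if len(blob) != length:
            raise ValueError(f"range {offset}+{length} exceeds pack {pack_key!r}")
        moved.append((pack_key, offset, len(out)))
        out += blob  # bytes are copied as-is; nothing is decrypted
    return bytes(out), moved


packs = {"a": b"AAAAdeadBBBB", "b": b"CCCC"}
new_pack, moved = repack(packs, [("a", 0, 4), ("a", 8, 4), ("b", 0, 4)])
print(new_pack, moved)
```

Because the copy works on opaque byte ranges, the server needs no key material, which matches the encrypted-passthrough property described above.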

Structure Checks

GET /?verify-structure validates repository shape without needing encryption keys. It checks:

required directories and expected key layout
pack shard naming and pack header magic/version
malformed or obviously invalid pack files

This complements client-side vykar check, which still owns full cryptographic verification.

RestBackend

crates/vykar-storage/src/rest_backend.rs implements StorageBackend with ureq. In addition to the trait surface, it exposes helper methods used by client commands:

batch_delete()
stats()
verify_packs()
repack()

It also sends X-Content-BLAKE2b on PUT requests and validates Content-Range on ranged reads.
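
What validating Content-Range might look like on the client side is sketched below, using the standard HTTP `bytes start-end/total` form. The exact checks RestBackend performs are not described here, so the strict start/end match is an assumption, and `check_content_range` is a hypothetical name.

```python
import re


def check_content_range(header: str, start: int, end: int) -> bool:
    """Return True if the header describes exactly the requested byte range."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", header.strip())
    if match is None:
        return False
    got_start, got_end = int(match.group(1)), int(match.group(2))
    if (got_start, got_end) != (start, end):
        return False
    total = match.group(3)
    # '*' means the total length is unknown; otherwise the range must fit inside it.
    return total == "*" or got_end < int(total)


print(check_content_range("bytes 0-1023/4096", 0, 1023))
print(check_content_range("bytes 0-1023/4096", 0, 2047))
```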

Client config:

repositories:
  - label: server
    url: https://backup.example.com
    access_token: "secret-token-here"

Related: Setup, Architecture
