
Git LFS Alternatives: When Large File Storage Becomes a Problem

Git LFS pain points — bandwidth caps, storage costs, CI/CD issues — and practical alternatives: files.link, DVC, git-annex, and direct S3.

What Git LFS Does and Why Teams Adopt It

Git was designed for source code — small text files that diff efficiently. When teams need to track large binary files (design assets, machine learning models, video, compiled binaries), Git struggles. Every clone downloads the entire history of every file, and large binaries bloat the repository size permanently.

Git LFS (Large File Storage) solves this by replacing large files in your repository with small pointer files. The actual file content lives on a separate LFS server, and Git downloads only the versions you check out. Your repository stays small, clones are fast, and git log works normally.
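For illustration, here is what Git actually commits in place of the binary — a tiny text stub in the LFS pointer format (the oid and size values below are made up):

```shell
# A Git LFS pointer file: this is what lives in the repository while the
# real content sits on the LFS server. Illustrative oid/size values.
pointer='version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 524288000'
printf '%s\n' "$pointer"
```

When you check out a commit, the LFS smudge filter swaps this stub for the real file content.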

On paper, it is elegant. In practice, teams run into real problems — and those problems get worse as the project grows.

The Pain Points of Git LFS

Bandwidth limits and storage quotas

GitHub includes 1 GB of free LFS storage and 1 GB of bandwidth per month. After that, you buy data packs at $5/month for 50 GB of bandwidth and 50 GB of storage. This sounds reasonable until your CI/CD pipeline clones the repository 50 times a day, each time pulling LFS files. A team of 10 developers plus 3 CI runners can burn through the bandwidth quota in a week.

GitLab offers 10 GB of LFS storage on free plans, but bandwidth is counted against your overall transfer limit. Bitbucket's limits are similar. Self-hosted solutions avoid these limits but introduce the operational cost of running your own LFS server.

CI/CD complications

Every CI/CD pipeline that clones the repository needs LFS credentials. This means:

  • Configuring git lfs install in your CI environment
  • Storing LFS credentials as CI secrets
  • Handling LFS authentication failures (which produce cryptic error messages)
  • Paying bandwidth costs for every CI run

Many teams end up adding GIT_LFS_SKIP_SMUDGE=1 to skip LFS downloads in CI, then selectively fetching only the files they need. This works but adds complexity to every pipeline.

# Common CI workaround: skip LFS, then selectively fetch
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/org/repo.git
cd repo
git lfs pull --include="assets/needed-file.bin"

Migration pain

Once LFS is set up, migrating existing large files into LFS requires rewriting Git history with git lfs migrate. This changes every commit hash, breaks open pull requests, and forces every team member to re-clone the repository. It is a one-way door that affects the entire team.

# Migrate existing files to LFS — rewrites ALL history
git lfs migrate import --include="*.psd,*.zip" --everything
# Every team member must now re-clone

LFS locking

Binary files cannot be merged. If two people edit the same Photoshop file simultaneously, one person's work is lost. Git LFS has a file locking feature to prevent this, but it requires explicit git lfs lock and git lfs unlock commands. Teams frequently forget to lock files, and stale locks that are never released block other team members until someone force-unlocks them.
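The locking workflow looks like this in practice (the file path is illustrative, and locking requires a Git host whose LFS server supports the locks API):

```shell
# Mark the pattern as lockable once; this writes to .gitattributes.
git lfs track --lockable "*.psd"

git lfs lock design/hero.psd      # take the lock before editing
git lfs locks                     # list current locks and their owners
git lfs unlock design/hero.psd    # release the lock after pushing
```

The failure mode is the middle step: nothing forces an editor to run git lfs lock first, so discipline (or a pre-edit checklist) is still required.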

Storage costs at scale

LFS stores every version of every tracked file. If you have a 500 MB machine learning model that gets updated weekly, after a year you have 26 GB of LFS storage for that single file. At GitHub's pricing, that is meaningful. Deleting old LFS objects requires running git lfs prune, which only works locally — the server-side storage is harder to reclaim.
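The arithmetic behind that number is a simple back-of-envelope calculation:

```shell
# Every weekly update of a 500 MB model is retained by LFS indefinitely.
size_mb=500
updates_per_year=52
total_gb=$(( size_mb * updates_per_year / 1000 ))
echo "LFS storage for that one file after a year: ~${total_gb} GB"
```

Multiply by a handful of tracked models and the storage bill dwarfs the cost of the repository itself.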

Alternative 1: Dedicated File Storage Service

The most direct alternative is to stop storing large files in Git entirely. Use a dedicated file storage service and reference files by URL in your codebase.

How it works with [files.link](/git-lfs-storage):

# Upload the asset to files.link
curl -X POST https://api.files.link/v1/files \
  -H "Authorization: YOUR_API_KEY" \
  -F "file=@model-weights.bin" \
  -F "folderId=ASSETS_FOLDER_ID"

# Response includes a permanent CDN URL
# https://cdn.files.link/p/abc123/model-weights.bin

Store the URL in a configuration file or environment variable. Your CI/CD pipeline downloads assets via HTTP — no LFS credentials, no Git authentication, no bandwidth quotas.

# In your CI pipeline — download assets via HTTP
curl -O https://cdn.files.link/p/abc123/model-weights.bin

Advantages:

  • No Git history bloat — large files never enter the repository
  • No bandwidth quotas on Git hosting
  • CDN delivery — assets are served from edge locations, not pulled from a Git server
  • CI/CD pipelines just use curl — no LFS setup required
  • Version management is explicit (upload a new file, get a new URL)

Trade-offs:

  • File versions are not tied to Git commits (you manage the mapping yourself)
  • Requires a separate service account and API key
  • No automatic diffing or history in git log

This approach works well for CI/CD artifact storage where build outputs, test fixtures, and deployment assets need to be shared across pipelines without polluting the Git repository.

Alternative 2: DVC (Data Version Control)

DVC is designed specifically for machine learning workflows. It tracks large files, datasets, and model artifacts outside Git, using cloud storage (S3, GCS, Azure Blob) as the backend.

How it works:

# Initialize DVC in your Git repository
dvc init

# Track a large file — creates a .dvc pointer file
dvc add data/training-set.tar.gz

# Push the actual data to your configured remote (S3, GCS, etc.)
dvc push

# Another developer pulls the data
dvc pull

DVC creates .dvc pointer files (similar to LFS pointer files) that are committed to Git. The actual data lives on your own cloud storage.
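For illustration, a .dvc pointer file is a small YAML stub (the hash and size below are made up, and the exact fields vary slightly across DVC versions):

```shell
# What `dvc add` commits to Git in place of the data: a YAML pointer.
# Illustrative md5/size values; the real data lives on your DVC remote.
dvc_pointer='outs:
- md5: a304afb96060aad90176268345e10355
  size: 1048576000
  path: training-set.tar.gz'
printf '%s\n' "$dvc_pointer"
```

Because the pointer is versioned in Git, checking out an old commit and running dvc pull restores the dataset as it was at that commit.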

Advantages:

  • You control the storage backend — use your own S3 bucket with your own pricing
  • Pipeline tracking — DVC can define data processing pipelines with dependency graphs
  • Experiment tracking — built-in support for ML experiment comparison
  • No bandwidth quotas from Git hosting providers

Trade-offs:

  • Another tool to install and learn (Python-based, pip install)
  • You manage the cloud storage infrastructure yourself
  • Team onboarding requires DVC setup on every developer's machine
  • Not widely adopted outside of ML/data science workflows

Alternative 3: git-annex

git-annex is the oldest and most flexible large file solution for Git. It predates Git LFS and supports a wide range of storage backends — local drives, SSH remotes, S3, rsync, even BitTorrent.

How it works:

# Initialize git-annex
git annex init "my laptop"

# Add a large file
git annex add large-video.mp4

# Sync with a remote
git annex sync --content

Advantages:

  • Extremely flexible storage backends
  • Content can exist on multiple remotes simultaneously
  • Fine-grained control over which files are present locally
  • Works fully offline — sync when ready
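That per-file control over local presence looks like this (file name illustrative):

```shell
# Fetch only the content you need, drop it when done. git-annex refuses
# to drop the last known copy of a file unless you force it.
git annex get large-video.mp4      # download content from a remote
git annex drop large-video.mp4     # free local disk; content stays on remotes
git annex whereis large-video.mp4  # list which remotes hold the content
```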

Trade-offs:

  • Complex — the learning curve is steep compared to Git LFS
  • Limited support on Windows
  • Not well-supported by GitHub, GitLab, or Bitbucket (no built-in UI integration)
  • Sparse community compared to Git LFS or DVC

Alternative 4: Direct S3 with Version Tags

The simplest alternative is to skip any Git integration entirely. Upload large files to S3 (or any object storage), and reference them by URL or version tag in your codebase.

# Upload with a version tag
aws s3 cp model-v2.3.bin s3://my-assets/models/model-v2.3.bin

# Reference in your config file
echo "MODEL_URL=https://my-assets.s3.amazonaws.com/models/model-v2.3.bin" >> .env

Advantages:

  • Zero Git tooling overhead
  • Full control over storage, pricing, and retention
  • Works with any CI/CD system via standard AWS CLI

Trade-offs:

  • Manual version management (you choose the naming convention)
  • No integration with Git history — git log shows nothing about file changes
  • Requires AWS credentials management
  • No built-in CDN (unless you add CloudFront yourself)
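One way to soften the manual-versioning trade-off is a naming convention that embeds the current Git commit, so each uploaded object is traceable back to code. This is a convention of our own, not anything AWS prescribes; the bucket and paths are illustrative:

```shell
# Derive an object key from the current commit (falls back to a date
# stamp outside a Git repository), then print the upload command.
version=$(git rev-parse --short HEAD 2>/dev/null || date +%Y%m%d)
key="models/model-${version}.bin"
echo "aws s3 cp model.bin s3://my-assets/${key}"
```

The convention is one-directional: the object name points at a commit, but git log still knows nothing about the file.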

Comparison of Alternatives

Each alternative optimizes for a different use case. Here is how they compare across the dimensions that matter most.

| Dimension | Git LFS | files.link | DVC | git-annex | S3 Direct |
| --- | --- | --- | --- | --- | --- |
| Git integration | Native | None (URL reference) | Pointer files | Deep integration | None |
| Storage backend | Git host or custom server | Managed (global CDN included) | S3, GCS, Azure (your account) | Any (S3, SSH, local, etc.) | S3 (your account) |
| CI/CD setup | LFS credentials required | HTTP download (curl) | dvc pull + cloud credentials | git-annex + remote config | AWS CLI + credentials |
| Bandwidth costs | Git host quotas apply | Included in credits | Your cloud storage costs | Your remote costs | S3 transfer costs |
| Version tracking | Tied to Git commits | Manual (new URL per version) | Tied to Git commits | Tied to Git commits | Manual (naming convention) |
| CDN delivery | No | Yes (450+ locations) | No | No | Optional (add CloudFront) |
| Learning curve | Low | Low | Medium | High | Low |
| Best for | Small teams, moderate file sizes | CI/CD assets, web apps | ML/data science pipelines | Complex multi-remote setups | Simple, full control |

When Git LFS Is Still the Right Choice

Despite its pain points, Git LFS is the right tool in specific scenarios:

  • Your large files change infrequently. If you track a handful of binary assets that update once a month, the bandwidth costs are negligible and the Git integration is convenient.
  • Your team expects Git semantics. Designers and artists who use Git want git pull to give them the latest assets without learning a second tool.
  • You are on a self-hosted Git server. Running your own GitLab or Gitea instance with LFS eliminates bandwidth quotas entirely. The storage costs are just disk space on your server.
  • You need file versions tied to code commits. If reverting to commit abc123 must also revert the binary assets to their state at that commit, LFS (or DVC/git-annex) provides this automatically.
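If you do stay with LFS, tracking is configured per pattern. Running git lfs track "*.psd" (after a one-time git lfs install) writes a line like the following to .gitattributes, which you then commit:

```shell
# The .gitattributes entry that `git lfs track "*.psd"` writes:
lfs_attr='*.psd filter=lfs diff=lfs merge=lfs -text'
printf '%s\n' "$lfs_attr"
```

From that point on, any new .psd file added to the repository is stored as an LFS pointer automatically.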

The honest recommendation: start by asking whether large files belong in your Git repository at all. If they do, Git LFS is the simplest starting point. If they do not — if they are build artifacts, deployment assets, media files, or ML models that change on a different cadence than your code — a dedicated file storage service or direct cloud storage is a cleaner architecture.