Git LFS Alternatives: When Large File Storage Becomes a Problem
Git LFS pain points — bandwidth caps, storage costs, CI/CD issues — and practical alternatives: files.link, DVC, git-annex, and direct S3.
What Git LFS Does and Why Teams Adopt It
Git was designed for source code — small text files that diff efficiently. When teams need to track large binary files (design assets, machine learning models, video, compiled binaries), Git struggles. Every clone downloads the entire history of every file, and large binaries bloat the repository size permanently.
Git LFS (Large File Storage) solves this by replacing large files in your repository with small pointer files. The actual file content lives on a separate LFS server, and Git downloads only the versions you check out. Your repository stays small, clones are fast, and git log works normally.
On paper, it is elegant. In practice, teams run into real problems — and those problems get worse as the project grows.
The Pain Points of Git LFS
Bandwidth limits and storage quotas
GitHub includes 1 GB of free LFS storage and 1 GB of bandwidth per month. After that, you buy data packs at $5/month for 50 GB of bandwidth and 50 GB of storage. This sounds reasonable until your CI/CD pipeline clones the repository 50 times a day, each time pulling LFS files. A team of 10 developers plus 3 CI runners can burn through the bandwidth quota in a week.
GitLab offers 10 GB of LFS storage on free plans, but bandwidth is counted against your overall transfer limit. Bitbucket's limits are similar. Self-hosted solutions avoid these limits but introduce the operational cost of running your own LFS server.
CI/CD complications
Every CI/CD pipeline that clones the repository needs LFS credentials. This means:
- Configuring git lfs install in your CI environment
- Storing LFS credentials as CI secrets
- Handling LFS authentication failures (which produce cryptic error messages)
- Paying bandwidth costs for every CI run
Many teams end up adding GIT_LFS_SKIP_SMUDGE=1 to skip LFS downloads in CI, then selectively fetching only the files they need. This works but adds complexity to every pipeline.
```shell
# Common CI workaround: skip LFS, then selectively fetch
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/org/repo.git
cd repo
git lfs pull --include="assets/needed-file.bin"
```

Migration pain
Once LFS is set up, migrating existing large files into LFS requires rewriting Git history with git lfs migrate. This changes every commit hash, breaks open pull requests, and forces every team member to re-clone the repository. It is a one-way door that affects the entire team.
```shell
# Migrate existing files to LFS — rewrites ALL history
git lfs migrate import --include="*.psd,*.zip" --everything
# Every team member must now re-clone
```

LFS locking
Binary files cannot be merged. If two people edit the same Photoshop file simultaneously, one person's work is lost. Git LFS has a file locking feature to prevent this, but it requires explicit git lfs lock and git lfs unlock commands. Teams frequently forget to lock files, and locked files that are not unlocked block other team members.
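A minimal locking workflow, assuming the remote supports the LFS locking API, looks like this:

```shell
# One-time: mark the file type as lockable in .gitattributes
git lfs track "*.psd" --lockable

# Take the lock before editing
git lfs lock assets/hero.psd

# ... edit, commit, push ...

# Release the lock so teammates can edit
git lfs unlock assets/hero.psd

# See who currently holds which locks
git lfs locks
```

The failure mode described above is exactly the middle steps being skipped: nothing forces a designer to run git lfs lock before opening the file.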
Storage costs at scale
LFS stores every version of every tracked file. If you have a 500 MB machine learning model that gets updated weekly, after a year you have 26 GB of LFS storage for that single file. At GitHub's pricing, that is meaningful. Deleting old LFS objects requires running git lfs prune, which only works locally — the server-side storage is harder to reclaim.
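The arithmetic is easy to check:

```shell
# One 500 MB model, updated weekly, with every version retained by LFS
MODEL_MB=500
UPDATES_PER_YEAR=52
echo "$((MODEL_MB * UPDATES_PER_YEAR / 1000)) GB of LFS storage after one year"
```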
Alternative 1: Dedicated File Storage Service
The most direct alternative is to stop storing large files in Git entirely. Use a dedicated file storage service and reference files by URL in your codebase.
How it works with [files.link](/git-lfs-storage):
```shell
# Upload the asset to files.link
curl -X POST https://api.files.link/v1/files \
  -H "Authorization: YOUR_API_KEY" \
  -F "file=@model-weights.bin" \
  -F "folderId=ASSETS_FOLDER_ID"

# Response includes a permanent CDN URL
# https://cdn.files.link/p/abc123/model-weights.bin
```

Store the URL in a configuration file or environment variable. Your CI/CD pipeline downloads assets via HTTP — no LFS credentials, no Git authentication, no bandwidth quotas.
```shell
# In your CI pipeline — download assets via HTTP
curl -O https://cdn.files.link/p/abc123/model-weights.bin
```

Advantages:
- No Git history bloat — large files never enter the repository
- No bandwidth quotas on Git hosting
- CDN delivery — assets are served from edge locations, not pulled from a Git server
- CI/CD pipelines just use curl — no LFS setup required
- Version management is explicit (upload a new file, get a new URL)
Trade-offs:
- File versions are not tied to Git commits (you manage the mapping yourself)
- Requires a separate service account and API key
- No automatic diffing or history in git log
This approach works well for CI/CD artifact storage where build outputs, test fixtures, and deployment assets need to be shared across pipelines without polluting the Git repository.
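One way to manage the URL-to-version mapping yourself is a small manifest committed to Git; the file name, variable, and URL below are illustrative, not part of any files.link API:

```shell
# assets.env: a committed manifest pinning each asset to an exact CDN URL
cat > assets.env <<'EOF'
MODEL_WEIGHTS_URL=https://cdn.files.link/p/abc123/model-weights.bin
EOF

# CI step: load the manifest, then fetch what the build needs
. ./assets.env
echo "fetching $MODEL_WEIGHTS_URL"
# curl -fsSO "$MODEL_WEIGHTS_URL"
```

Because the manifest lives in the repository, checking out an old commit at least tells you which asset versions that commit expected, even though the bytes live elsewhere.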
Alternative 2: DVC (Data Version Control)
DVC is designed specifically for machine learning workflows. It tracks large files, datasets, and model artifacts outside Git, using cloud storage (S3, GCS, Azure Blob) as the backend.
How it works:
```shell
# Initialize DVC in your Git repository
dvc init

# Track a large file — creates a .dvc pointer file
dvc add data/training-set.tar.gz

# Push the actual data to your configured remote (S3, GCS, etc.)
dvc push

# Another developer pulls the data
dvc pull
```

DVC creates .dvc pointer files (similar to LFS pointer files) that are committed to Git. The actual data lives on your own cloud storage.
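The pointer file is small, human-readable YAML, roughly this shape (the hash and size below are made up):

```yaml
# data/training-set.tar.gz.dvc — committed to Git in place of the data
outs:
- md5: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
  size: 2147483648
  path: training-set.tar.gz
```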
Advantages:
- You control the storage backend — use your own S3 bucket with your own pricing
- Pipeline tracking — DVC can define data processing pipelines with dependency graphs
- Experiment tracking — built-in support for ML experiment comparison
- No bandwidth quotas from Git hosting providers
Trade-offs:
- Another tool to install and learn (Python-based, pip install)
- You manage the cloud storage infrastructure yourself
- Team onboarding requires DVC setup on every developer's machine
- Not widely adopted outside of ML/data science workflows
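Pointing DVC at your own bucket is a one-time configuration step (the bucket name below is a placeholder):

```shell
# Register an S3 bucket as the default remote; stored in .dvc/config,
# which is committed to Git so the whole team shares it
dvc remote add -d storage s3://my-dvc-bucket/cache

# dvc push and dvc pull now use this remote by default
```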
Alternative 3: git-annex
git-annex is the oldest and most flexible large file solution for Git. It predates Git LFS and supports a wide range of storage backends — local drives, SSH remotes, S3, rsync, even BitTorrent.
How it works:
```shell
# Initialize git-annex
git annex init "my laptop"

# Add a large file
git annex add large-video.mp4

# Sync with a remote
git annex sync --content
```

Advantages:
- Extremely flexible storage backends
- Content can exist on multiple remotes simultaneously
- Fine-grained control over which files are present locally
- Works fully offline — sync when ready
Trade-offs:
- Complex — the learning curve is steep compared to Git LFS
- Limited support on Windows
- Not well-supported by GitHub, GitLab, or Bitbucket (no built-in UI integration)
- Sparse community compared to Git LFS or DVC
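The "fine-grained control" point above is the heart of the workflow: content is fetched and dropped per file.

```shell
# Fetch the content of one file from whichever remote has it
git annex get large-video.mp4

# Free local disk space; the file stays tracked, content stays on remotes
git annex drop large-video.mp4

# Show which remotes hold a copy
git annex whereis large-video.mp4
```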
Alternative 4: Direct S3 with Version Tags
The simplest alternative is to skip any Git integration entirely. Upload large files to S3 (or any object storage), and reference them by URL or version tag in your codebase.
```shell
# Upload with a version tag
aws s3 cp model-v2.3.bin s3://my-assets/models/model-v2.3.bin

# Reference in your config file
echo "MODEL_URL=https://my-assets.s3.amazonaws.com/models/model-v2.3.bin" >> .env
```

Advantages:
- Zero Git tooling overhead
- Full control over storage, pricing, and retention
- Works with any CI/CD system via standard AWS CLI
Trade-offs:
- Manual version management (you choose the naming convention)
- No integration with Git history — git log shows nothing about file changes
- Requires AWS credentials management
- No built-in CDN (unless you add CloudFront yourself)
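If a manual naming convention feels fragile, S3's built-in object versioning keeps history under a single key instead (bucket and key names below are placeholders):

```shell
# One-time: turn on versioning for the bucket
aws s3api put-bucket-versioning \
  --bucket my-assets \
  --versioning-configuration Status=Enabled

# Overwrites now create new versions instead of replacing the object
aws s3api list-object-versions --bucket my-assets --prefix models/model.bin
```

The trade-off is that you then reference versions by opaque S3 version IDs rather than readable names, so many teams stick with the naming convention anyway.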
Comparison of Alternatives
Each alternative optimizes for a different use case. Here is how they compare across the dimensions that matter most.
| Dimension | Git LFS | files.link | DVC | git-annex | S3 Direct |
|---|---|---|---|---|---|
| Git integration | Native | None (URL reference) | Pointer files | Deep integration | None |
| Storage backend | Git host or custom server | Managed (global CDN included) | S3, GCS, Azure (your account) | Any (S3, SSH, local, etc.) | S3 (your account) |
| CI/CD setup | LFS credentials required | HTTP download (curl) | dvc pull + cloud credentials | git-annex + remote config | AWS CLI + credentials |
| Bandwidth costs | Git host quotas apply | Included in credits | Your cloud storage costs | Your remote costs | S3 transfer costs |
| Version tracking | Tied to Git commits | Manual (new URL per version) | Tied to Git commits | Tied to Git commits | Manual (naming convention) |
| CDN delivery | No | Yes (450+ locations) | No | No | Optional (add CloudFront) |
| Learning curve | Low | Low | Medium | High | Low |
| Best for | Small teams, moderate file sizes | CI/CD assets, web apps | ML/data science pipelines | Complex multi-remote setups | Simple, full control |
When Git LFS Is Still the Right Choice
Despite its pain points, Git LFS is the right tool in specific scenarios:
- Your large files change infrequently. If you track a handful of binary assets that update once a month, the bandwidth costs are negligible and the Git integration is convenient.
- Your team expects Git semantics. Designers and artists who use Git want git pull to give them the latest assets without learning a second tool.
- You are on a self-hosted Git server. Running your own GitLab or Gitea instance with LFS eliminates bandwidth quotas entirely. The storage costs are just disk space on your server.
- You need file versions tied to code commits. If reverting to commit abc123 must also revert the binary assets to their state at that commit, LFS (or DVC/git-annex) provides this automatically.
The honest recommendation: start by asking whether large files belong in your Git repository at all. If they do, Git LFS is the simplest starting point. If they do not — if they are build artifacts, deployment assets, media files, or ML models that change on a different cadence than your code — a dedicated file storage service or direct cloud storage is a cleaner architecture.