Git LFS Alternatives: When Large File Storage Becomes a Problem
Git LFS pain points — bandwidth caps, storage costs, CI/CD issues — and practical alternatives: files.link, DVC, git-annex, and direct S3.
What Git LFS Does and Why Teams Adopt It
Git was designed for source code — small text files that diff efficiently. When teams need to track large binary files (design assets, machine learning models, video, compiled binaries), Git struggles. Every clone downloads the entire history of every file, and large binaries bloat the repository size permanently.
Git LFS (Large File Storage) solves this by replacing large files in your repository with small pointer files. The actual file content lives on a separate LFS server, and Git downloads only the versions you check out. Your repository stays small, clones are fast, and git log works normally.
On paper, it is elegant. In practice, teams run into real problems — and those problems get worse as the project grows.
The Pain Points of Git LFS
Bandwidth limits and storage quotas
GitHub includes 1 GB of free LFS storage and 1 GB of bandwidth per month. After that, you buy data packs at $5/month for 50 GB of bandwidth and 50 GB of storage. This sounds reasonable until your CI/CD pipeline clones the repository 50 times a day, each time pulling LFS files. A team of 10 developers plus 3 CI runners can burn through the bandwidth quota in a week.
GitLab offers 10 GB of LFS storage on free plans, but bandwidth is counted against your overall transfer limit. Bitbucket's limits are similar. Self-hosted solutions avoid these limits but introduce the operational cost of running your own LFS server.
CI/CD complications
Every CI/CD pipeline that clones the repository needs LFS credentials. This means:
- Configuring git lfs install in your CI environment
- Storing LFS credentials as CI secrets
- Handling LFS authentication failures (which produce cryptic error messages)
- Paying bandwidth costs for every CI run
Many teams end up adding GIT_LFS_SKIP_SMUDGE=1 to skip LFS downloads in CI, then selectively fetching only the files they need. This works but adds complexity to every pipeline.
```shell
# Common CI workaround: skip LFS, then selectively fetch
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/org/repo.git
cd repo
git lfs pull --include="assets/needed-file.bin"
```

Migration pain
Once LFS is set up, migrating existing large files into LFS requires rewriting Git history with git lfs migrate. This changes every commit hash, breaks open pull requests, and forces every team member to re-clone the repository. It is a one-way door that affects the entire team.
```shell
# Migrate existing files to LFS — rewrites ALL history
git lfs migrate import --include="*.psd,*.zip" --everything
# Every team member must now re-clone
```

LFS locking
Binary files cannot be merged. If two people edit the same Photoshop file simultaneously, one person's work is lost. Git LFS has a file locking feature to prevent this, but it requires explicit git lfs lock and git lfs unlock commands. Teams frequently forget to lock files, and locked files that are not unlocked block other team members.
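A minimal locking workflow, assuming the remote supports the LFS locking API, looks like this:

```shell
# One-time: mark the file type as lockable in .gitattributes
git lfs track "*.psd" --lockable

# Take the lock before editing
git lfs lock assets/hero.psd

# ... edit, commit, push ...

# Release the lock so teammates can edit
git lfs unlock assets/hero.psd

# See who currently holds which locks
git lfs locks
```

The failure mode described above is exactly the middle steps being skipped: nothing forces a designer to run git lfs lock before opening the file.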
Storage costs at scale
LFS stores every version of every tracked file. If you have a 500 MB machine learning model that gets updated weekly, after a year you have 26 GB of LFS storage for that single file. At GitHub's pricing, that is meaningful. Deleting old LFS objects requires running git lfs prune, which only works locally — the server-side storage is harder to reclaim.
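The arithmetic is easy to check:

```shell
# One 500 MB model, updated weekly, with every version retained by LFS
MODEL_MB=500
UPDATES_PER_YEAR=52
echo "$((MODEL_MB * UPDATES_PER_YEAR / 1000)) GB of LFS storage after one year"
```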
Alternative 1: Dedicated File Storage Service
The most direct alternative is to stop storing large files in Git entirely. Use a dedicated file storage service and reference files by URL in your codebase.
How it works with [files.link](/git-lfs-storage):
```shell
# Upload the asset to files.link
curl -X POST https://api.files.link/v1/files \
  -H "Authorization: YOUR_API_KEY" \
  -F "file=@model-weights.bin" \
  -F "folderId=ASSETS_FOLDER_ID"

# Response includes a permanent CDN URL
# https://cdn.files.link/p/abc123/model-weights.bin
```

Store the URL in a configuration file or environment variable. Your CI/CD pipeline downloads assets via HTTP — no LFS credentials, no Git authentication, no bandwidth quotas.
```shell
# In your CI pipeline — download assets via HTTP
curl -O https://cdn.files.link/p/abc123/model-weights.bin
```

Advantages:
- No Git history bloat — large files never enter the repository
- No bandwidth quotas on Git hosting
- CDN delivery — assets are served from edge locations, not pulled from a Git server
- CI/CD pipelines just use curl — no LFS setup required
- Version management is explicit (upload a new file, get a new URL)
Trade-offs:
- File versions are not tied to Git commits (you manage the mapping yourself)
- Requires a separate service account and API key
- No automatic diffing or history in git log
This approach works well for CI/CD artifact storage where build outputs, test fixtures, and deployment assets need to be shared across pipelines without polluting the Git repository.
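One way to manage the URL-to-version mapping yourself is a small manifest committed to Git; the file name, variable, and URL below are illustrative, not part of any files.link API:

```shell
# assets.env: a committed manifest pinning each asset to an exact CDN URL
cat > assets.env <<'EOF'
MODEL_WEIGHTS_URL=https://cdn.files.link/p/abc123/model-weights.bin
EOF

# CI step: load the manifest, then fetch what the build needs
. ./assets.env
echo "fetching $MODEL_WEIGHTS_URL"
# curl -fsSO "$MODEL_WEIGHTS_URL"
```

Because the manifest lives in the repository, checking out an old commit at least tells you which asset versions that commit expected, even though the bytes live elsewhere.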
Alternative 2: DVC (Data Version Control)
DVC is designed specifically for machine learning workflows. It tracks large files, datasets, and model artifacts outside Git, using cloud storage (S3, GCS, Azure Blob) as the backend.
How it works:
```shell
# Initialize DVC in your Git repository
dvc init

# Track a large file — creates a .dvc pointer file
dvc add data/training-set.tar.gz

# Push the actual data to your configured remote (S3, GCS, etc.)
dvc push

# Another developer pulls the data
dvc pull
```

DVC creates .dvc pointer files (similar to LFS pointer files) that are committed to Git. The actual data lives on your own cloud storage.
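The pointer file is small, human-readable YAML, roughly this shape (the hash and size below are made up):

```yaml
# data/training-set.tar.gz.dvc — committed to Git in place of the data
outs:
- md5: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
  size: 2147483648
  path: training-set.tar.gz
```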
Advantages:
- You control the storage backend — use your own S3 bucket with your own pricing
- Pipeline tracking — DVC can define data processing pipelines with dependency graphs
- Experiment tracking — built-in support for ML experiment comparison
- No bandwidth quotas from Git hosting providers
Trade-offs:
- Another tool to install and learn (Python-based, pip install)
- You manage the cloud storage infrastructure yourself
- Team onboarding requires DVC setup on every developer's machine
- Not widely adopted outside of ML/data science workflows
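Pointing DVC at your own bucket is a one-time configuration step (the bucket name below is a placeholder):

```shell
# Register an S3 bucket as the default remote; stored in .dvc/config,
# which is committed to Git so the whole team shares it
dvc remote add -d storage s3://my-dvc-bucket/cache

# dvc push and dvc pull now use this remote by default
```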
Alternative 3: git-annex
git-annex is the oldest and most flexible large file solution for Git. It predates Git LFS and supports a wide range of storage backends — local drives, SSH remotes, S3, rsync, even BitTorrent.
How it works:
```shell
# Initialize git-annex
git annex init "my laptop"

# Add a large file
git annex add large-video.mp4

# Sync with a remote
git annex sync --content
```

Advantages:
- Extremely flexible storage backends
- Content can exist on multiple remotes simultaneously
- Fine-grained control over which files are present locally
- Works fully offline — sync when ready
Trade-offs:
- Complex — the learning curve is steep compared to Git LFS
- Limited support on Windows
- Not well-supported by GitHub, GitLab, or Bitbucket (no built-in UI integration)
- Sparse community compared to Git LFS or DVC
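The "fine-grained control" point above is the heart of the workflow: content is fetched and dropped per file.

```shell
# Fetch the content of one file from whichever remote has it
git annex get large-video.mp4

# Free local disk space; the file stays tracked, content stays on remotes
git annex drop large-video.mp4

# Show which remotes hold a copy
git annex whereis large-video.mp4
```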
Alternative 4: Direct S3 with Version Tags
The simplest alternative is to skip any Git integration entirely. Upload large files to S3 (or any object storage), and reference them by URL or version tag in your codebase.
```shell
# Upload with a version tag
aws s3 cp model-v2.3.bin s3://my-assets/models/model-v2.3.bin

# Reference in your config file
echo "MODEL_URL=https://my-assets.s3.amazonaws.com/models/model-v2.3.bin" >> .env
```

Advantages:
- Zero Git tooling overhead
- Full control over storage, pricing, and retention
- Works with any CI/CD system via standard AWS CLI
Trade-offs:
- Manual version management (you choose the naming convention)
- No integration with Git history — git log shows nothing about file changes
- Requires AWS credentials management
- No built-in CDN (unless you add CloudFront yourself)
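If a manual naming convention feels fragile, S3's built-in object versioning keeps history under a single key instead (bucket and key names below are placeholders):

```shell
# One-time: turn on versioning for the bucket
aws s3api put-bucket-versioning \
  --bucket my-assets \
  --versioning-configuration Status=Enabled

# Overwrites now create new versions instead of replacing the object
aws s3api list-object-versions --bucket my-assets --prefix models/model.bin
```

The trade-off is that you then reference versions by opaque S3 version IDs rather than readable names, so many teams stick with the naming convention anyway.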
Comparison of Alternatives
Each alternative optimizes for a different use case. Here is how they compare across the dimensions that matter most.
| Dimension | Git LFS | files.link | DVC | git-annex | S3 Direct |
|---|---|---|---|---|---|
| Git integration | Native | None (URL reference) | Pointer files | Deep integration | None |
| Storage backend | Git host or custom server | Managed (global CDN included) | S3, GCS, Azure (your account) | Any (S3, SSH, local, etc.) | S3 (your account) |
| CI/CD setup | LFS credentials required | HTTP download (curl) | dvc pull + cloud credentials | git-annex + remote config | AWS CLI + credentials |
| Bandwidth costs | Git host quotas apply | Included in credits | Your cloud storage costs | Your remote costs | S3 transfer costs |
| Version tracking | Tied to Git commits | Manual (new URL per version) | Tied to Git commits | Tied to Git commits | Manual (naming convention) |
| CDN delivery | No | Yes (450+ locations) | No | No | Optional (add CloudFront) |
| Learning curve | Low | Low | Medium | High | Low |
| Best for | Small teams, moderate file sizes | CI/CD assets, web apps | ML/data science pipelines | Complex multi-remote setups | Simple, full control |
When Git LFS Is Still the Right Choice
Despite its pain points, Git LFS is the right tool in specific scenarios:
- Your large files change infrequently. If you track a handful of binary assets that update once a month, the bandwidth costs are negligible and the Git integration is convenient.
- Your team expects Git semantics. Designers and artists who use Git want git pull to give them the latest assets without learning a second tool.
- You are on a self-hosted Git server. Running your own GitLab or Gitea instance with LFS eliminates bandwidth quotas entirely. The storage costs are just disk space on your server.
- You need file versions tied to code commits. If reverting to commit abc123 must also revert the binary assets to their state at that commit, LFS (or DVC/git-annex) provides this automatically.
The honest recommendation: start by asking whether large files belong in your Git repository at all. If they do, Git LFS is the simplest starting point. If they do not — if they are build artifacts, deployment assets, media files, or ML models that change on a different cadence than your code — a dedicated file storage service or direct cloud storage is a cleaner architecture.