
Git Large File Storage

Lavinia Nicoleta Gîrteală
Software Engineering
University Polytechnic of Timișoara
Timișoara, 2017
Table of Contents
1. Binary File Storage
   1.1. Binary File Storage Problem
   1.2. Alternatives for Binary File Storage Problem
   1.3. Git LFS (Git Large File Storage)
   1.4. Working with Git LFS from Scratch
2. Introduction to Git
   2.1. Git, a Distributed Version Control System
   2.2. Git Object Model
   2.3. GitHub Enterprise
   2.4. GitFlow
3. Applying Git LFS
   3.1. Target Repositories
4. Repository Analysis Proposed by Lars Schneider
   4.1. General Considerations
   4.2. Analysis Methods
   4.3. Remarks
5. Repository Analysis
   5.1. General Considerations
   5.2. Analysis Methods
   5.3. Analysis Result
6. Migration to Git LFS
   6.1. Migration Strategies
   6.2. Migration of Target Repositories to Git LFS
7. Test
   7.1.1. Test Medium Size Repository
   7.1.2. Test Big Size Repository
8. Conclusion
Acknowledgement
References

1. Binary File Storage

1.1. Binary File Storage Problem

Git, like most version control systems, is designed to track source code. However, it is almost impossible to build a software system only with text files and avoid including binary files. The real problem comes when the binary files need to be updated constantly. When updating a binary file, Git cannot store the delta (the list of modifications from one version to another), so it stores the entire content of the file, and the repository size increases accordingly. If a single binary file is edited and committed 10 times, the repository grows by roughly the equivalent of 10 such files plus the size of the changes from one version to another.
In practice, software systems contain plenty of binary files which are updated often. It is recommended to separate the code from the binary content as much as possible, but a 100% separation is impossible to reach. A telling example is a system which contains a lot of graphic content. If the recommendation were followed by the book, it would imply storing the text source code in Git and keeping a separate version control system for the graphic content only. This leads to new challenges, such as keeping the traceability between version control systems and having a strong build system which integrates everything. If continuous integration is added to the discussion, a new challenge appears when setting up the entire environment. Shortly said, for large software systems which include not only text files but also binary content, it is impossible to reach a perfect separation of data and, at the same time, keep everything else running. Separating text from binary improves performance when working with Git, but it implies a chain of challenges across the entire software development process, from keeping the traceability from requirements to source code, to building the entire system and preparing the environment for continuous integration. In such situations, separating the text content from the binary one costs more time and money than maintaining a large repository.
Storing large binary content is a problem all companies which develop software face nowadays. The inconvenience is not only the maintenance of big repositories, but also the slow performance of every command run against them. Multiplied by the number of people working on a project and by the number of projects a company is involved in, the challenge becomes even bigger.

1.2. Alternatives for Binary File Storage Problem

Checking large binary files into a distributed version control system is a bad idea because the repository size quickly becomes unmanageable. Numerous operations take longer to complete and cloning a repository turns into a seemingly endless process.
Since the binary file storage problem is a daily trouble, there are multiple third-party implementations that try to solve it. In the following paragraphs I give a short overview of the alternatives.
A. git-annex
In git-annex, tracked binary files are stored separately, in a different location from the source code. In the original repository, a symbolic link pointing to the file's key in the new location is created.
In their blog, workingconcept.com describes git-annex as an "experienced librarian waiting at the information desk" because "you need to walk up to ask your question or return your book, but the librarian can help you in a variety of ways by getting books moved around, checking up on things, and generally being a pro at cataloging stuff".
The following example explains how git-annex can be used in a Git repository.
Step 1: Starting from an empty Git repository, it has to be configured to use git-annex by running git annex init.
Step 2: Files are added normally to the working directory and staged with git add --all. In order to start tracking them with git-annex, the files are transformed into symbolic links by running git annex add --all. Once the files are turned into symlinks, the related objects can be found in the .git/annex/objects directory.
Step 3: The remote repository is updated with git push origin. The important thing is that the files themselves are not pushed to the origin, only the symbolic links that point to them. The only location where the files exist is the local machine where the repository was initially created. In order to push the files themselves upstream, the content on the server has to be synchronized by running git annex sync --content.
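The three steps condense into the following shell session (a sketch following the steps above; the remote name origin is illustrative, and the commit, which the steps leave implicit, is added for completeness):

git init
git annex init                # Step 1: prepare the repository for git-annex
git add --all                 # Step 2: stage the files normally
git annex add --all           # Step 2: turn the staged files into symlinks under .git/annex/objects
git commit -m "Track binaries with git-annex"
git push origin               # Step 3: pushes the symlinks only, not the content
git annex sync --content      # Step 3: synchronize the annexed content with the server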
Concluding, git-annex is a bit less focused in its approach. Git-annex uses its own commands for working with files, which makes its learning curve a bit steeper than that of alternatives relying on filters. Git-annex has been written in Haskell, and the majority of it is licensed under the GPL, version 3 or higher. Because git-annex uses symlinks, Windows users are forced to use a special direct mode that makes usage more unintuitive.
B. git-fat
With git-fat, the large files to be tracked are specified in .gitattributes. Normal Git commands are used to interact with the repository without thinking about which files are fat and which are not; the fat files are treated specially. Git-fat allows users to separate the storage of large files from the source while still having them in the working directory of the project.
The Python Software Foundation gives a good starting example of how to work with git-fat:
Step 1: Files to be tracked by git-fat have to be specified in the .gitattributes file.
Step 2: Configure the remote used for storing binary files in the .gitfat file. Optionally, the ssh user and port can be specified if non-standard, as well as an http remote for anonymous clones.
Step 3: git-fat has to be initialized; afterwards, files which match the patterns in .gitattributes will be tracked.
Step 4: To push fat files to the remote and pull them from it, git fat push and git fat pull are used.
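A minimal configuration matching these steps could look as follows (a sketch: the *.zip pattern and the rsync remote are illustrative assumptions, since git-fat stores its objects on an rsync-reachable host):

# .gitattributes - files matching the pattern are handled by git-fat (Step 1)
*.zip filter=fat -crlf

# .gitfat - remote used for storing the binary content (Step 2)
[rsync]
remote = storage.example.com:/srv/git-fat-store

git fat init     # Step 3: enable the fat filter; matching files are tracked from now on
git fat push     # Step 4: upload fat objects to the remote
git fat pull     # Step 4: download fat objects from the remote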
Git-fat is licensed under the BSD 2-Clause license. It is developed in Python, which creates more dependencies for Windows users to install. However, the installation itself is straightforward with pip.

C. git-media
Licensed under the MIT license and supporting a workflow similar to the above-mentioned git-fat, git-media is probably the oldest of the available solutions. Git-media uses a similar filter approach and supports Amazon S3, local filesystem paths, SCP, Atmos and WebDAV as backends for storing large files. Git-media is written in Ruby, which makes installation on Windows not so straightforward. The main drawbacks are that it is no longer developed, it is hard to grasp because it provides ambiguous commands and, moreover, it is not fully Windows compatible.
The following chart (Fig. 1.2.4) shows the worldwide search trends in this area over the last 5 years. Git LFS leads compared to git-annex, git-fat and git-media.

Other alternatives which are not as popular as the ones presented above are:
- git-bigfiles: a fork of the Git project, which may make it incompatible with upstream Git; moreover, the project is dead.
- git-bigstore: initially implemented as an alternative to git-media.
- git-sym: the newest project in this area, but it provides complex commands and the integrity of the data is questionable at the moment.
Fig. 1.2.4. – Trends in the binary files topic

1.3. Git LFS (Git Large File Storage)

Git LFS is an alternative for the binary file storage problem, developed by Atlassian, GitHub and open source contributors. Its main purpose is to reduce the impact of large binary files in the repository by downloading the relevant file version during checkout rather than during cloning or fetching. Git LFS replaces the specified files in the repository with pointer files which are never visible in daily work; they are handled automatically by Git LFS.
The following scenarios occur when working with Git LFS:
Scenario 1: Whenever a file tracked by LFS is added, Git LFS replaces its content with a pointer file and stores the original content in the local LFS cache.
Scenario 2: Whenever a push is performed, all LFS files referenced by the pushed commits are transferred from the local LFS cache to the LFS store linked to the original remote repository.
Scenario 3: Whenever a checkout operation is performed, the files tracked by LFS, which are initially pointers, are replaced with the content itself, either from the local LFS cache or from the remote LFS storage.
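The pointer file that replaces the content in the repository is a small text file whose layout is fixed by the LFS specification; the oid and size below are illustrative:

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345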

1.4. Working with Git LFS from Scratch

When introducing Git LFS in a new repository, a few things have to be taken into consideration. Before starting to work with it, Git LFS has to be available on the local machine. There are two options: either install Git LFS separately or install a recent version of Git (greater than 2.12) which includes LFS.
In a new repository, LFS is enabled by running git lfs install. Objects tracked by LFS are specified in the .gitattributes file. It is the responsibility of the project manager to decide what is going to be tracked in the project. Whenever it is decided to track a certain file or extension, git lfs track what_to_track is used. The effect of this command is that .gitattributes is updated with the specified glob and, when a file matching the glob is added, Git LFS replaces its content with a pointer and stores the content in the local Git LFS cache. When the repository is pushed to the server, the LFS objects are stored separately, so the actual size of the repository is built up from the non-LFS objects only.
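As an example, the following session introduces LFS in a fresh repository (the *.psd glob, file name and remote are illustrative):

git init
git lfs install                 # enable LFS for this repository
git lfs track "*.psd"           # record the glob in .gitattributes
git add .gitattributes          # the tracking rule itself is versioned
git add design.psd              # a pointer is staged; the content goes to the local LFS cache
git commit -m "Add design assets via LFS"
git push origin master          # LFS objects are uploaded separately from the Git objects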

When working with Git LFS, users have to ensure that the pre-push hook is enabled. The pre-push hook is a script stored in the .git/hooks location of each repository. Its purpose is to transfer the Git LFS objects to the server during the push operation.
There are two options to clone a repository which contains files tracked by LFS. The first one is to use the normal git clone, which brings the repository locally and, during checkout, downloads the LFS objects from the LFS server. The second option is to use git lfs clone, a command which speeds up the cloning process: it waits until the checkout is complete and then downloads the LFS objects as a batch. Atlassian explains this performance improvement as follows: "This takes advantage of parallelized downloads, and dramatically reduces the number of HTTP requests and processes spawned (which is especially important for improving performance on Windows)". When cloning a repository which contains files tracked by LFS, the pre-push hook is automatically enabled.
In order to update the local repository with the new changes on the remote, git pull or git lfs pull can be used. Just like the clone case, the performance improvement of git lfs pull relies on downloading the LFS objects as a batch.
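The two clone options and the batch pull look as follows (the URL is a placeholder):

git clone https://example.com/repo.git       # LFS objects downloaded one by one during checkout
git lfs clone https://example.com/repo.git   # checkout first, then the LFS objects as a single batch
git lfs pull                                 # batch-download the LFS objects for the current checkout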
Migrating a Git LFS repository can be done by performing the following steps:
Step 1: Create a bare clone of the repository: git clone --bare …
Step 2: Download the LFS objects: git lfs fetch --all
Step 3: Mirror-push the local repository to the new location: git push --mirror …
Step 4: Push the LFS objects to the new location using the HTTP repository URL: git lfs push --all …
Observations: during the migration process, when pushing the LFS objects to the new location, some files are marked to be skipped. The reason is that the server already contains those objects. An error has been encountered when trying to push the LFS objects using the SSH URL; a quick solution is to use the HTTP one.
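Put together, a migration run looks roughly like this (both URLs are placeholders; as noted above, the HTTP URL is used for the LFS push):

git clone --bare https://server-a.example.com/repo.git
cd repo.git
git lfs fetch --all                                       # download every LFS object, for all revisions
git push --mirror https://server-b.example.com/repo.git   # mirror all refs to the new location
git lfs push --all https://server-b.example.com/repo.git  # upload the LFS objects; duplicates are skipped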

2. Introduction to Git

2.1. Git, a Distributed Version Control System

Git is a version control system used for tracking file modifications and coordinating work on files which are updated by different people at the same time. It is mainly used for software development, but it can track changes in any set of files. As a distributed version control system, its strengths are speed, data integrity and support for non-linear workflows.
Git was created by Linus Torvalds in 2005 for the development of the Linux kernel, with other kernel developers contributing to its initial development. Since 2005, Junio Hamano has been responsible for maintaining Git.
As with most other distributed version control systems, and unlike most client-server systems, every Git directory on every computer is a full-fledged repository with complete history and full version tracking abilities, independent of network access or a central server.
The major difference between Git and any other version control system is the way data is stored. "Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time." (Pro Git, 1st Edition, 2009)
In the Pro Git book, the way Git stores data is compared to a snapshot of a miniature filesystem: "every time a commit is created or the state of the project is saved, it basically takes a picture of what all the files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git does not store the file again, just a link to the previous identical file it has already stored".
Centralized VCS | Distributed VCS
--------------- | ---------------
History of changes on a central server (single repository for all developers) | Local copy of the entire work's history
Version checkout of the project | No partial checkout of the project
Updates are performed directly onto the central repository | Updates go from the working copy to the developer's local repository; an additional synchronization step is needed to exchange content between the distributed repositories
Repository can be used only if connected to it | It is not necessary to be online to change revisions or add changes

2.2. Git Object Model

A Git repository is nothing more than a collection of objects, each object having an identifier called a SHA. The hash is generated based on the content of the file and is used to uniquely point to a specific version of a file.
Every object is defined by three things: type, size and content. The size represents the size of the contents, and the contents depend on the type of the object. There are four different types of objects: blob, tree, commit and tag.
A blob object is nothing but a chunk of binary data. It does not refer to anything else or have attributes of any kind, not even a file name. Since the blob is entirely defined by its data, if two files in a directory tree (or in multiple different versions of the repository) have the same contents, they will share the same blob object. The object is totally independent of its location in the directory tree, and renaming a file does not change the object that file is associated with.
A tree can be compared to a directory – it references a bunch of other trees and/or blobs (i.e. files and subdirectories). A tree object contains a list of entries, each with a mode, object type, SHA1 name, and name, sorted by name. It represents the contents of a single directory tree.
An object referenced by a tree may be a blob, representing the contents of a file, or another tree, representing the contents of a subdirectory. Since trees and blobs, like all other objects, are named by the SHA1 hash of their contents, two trees have the same SHA1 name if and only if their contents (including, recursively, the contents of all subdirectories) are identical. This allows Git to quickly determine the differences between two related tree objects, since it can ignore any entries with identical object names.
A commit points to a single tree, marking it as what the project looked like at a certain point in time. It contains meta-information about that point in time, such as a timestamp, the author of the changes since the last commit, a pointer to the previous commit(s), etc.

A commit is defined by:
- A tree: the SHA1 name of a tree object, representing the contents of a directory at a certain point in time.
- Parent(s): the SHA1 name of some number of commits which represent the immediately previous step(s) in the history of the project.
- An author: the name of the person responsible for this change, together with its date.
- A committer: the name of the person who actually created the commit, with the date it was done. This may be different from the author; for example, if the author wrote a patch and emailed it to another person who used the patch to create the commit.
- A comment describing this commit.
Note that a commit does not itself contain any information about what actually changed; all changes are calculated by comparing the contents of the tree referred to by this commit with the trees associated with its parents. In particular, Git does not attempt to record file renames explicitly, though it can identify cases where the existence of the same file data at changing paths suggests a rename.
A tag is a way to mark a specific commit as special in some way. It is normally used to tag certain commits as specific releases or something along those lines. A tag object contains an object name, object type, tag name, the name of the person who created the tag, and a message, which may contain a signature.
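All four object types can be inspected directly with git cat-file; for instance (the abbreviated SHA is illustrative):

git cat-file -t 2431da67            # print the object type: commit, tree, blob or tag
git cat-file -p 2431da67            # pretty-print a commit: tree SHA, parents, author, committer, message
git cat-file -p "2431da67^{tree}"   # list the entries (mode, type, SHA1, name) of the commit's tree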
The Git Object Model is very well depicted in the Git Book. For a given repository structure (Fig. 2.2.1.), the following objects (Fig. 2.2.2.) are presented:

Fig. 2.2.1. – Project structure
Fig. 2.2.2. – Object Model based on Fig. 2.2.1

2.3. GitHub Enterprise

GitHub is a web-based Git repository and Internet hosting service. It offers all of the distributed version control and source code management functionality of Git, as well as adding its own features. It provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project.
GitHub Enterprise is similar to GitHub's public service but is designed for use by large-scale enterprise software development teams where the enterprise wishes to host their repositories behind a corporate firewall.

2.4. GitFlow

“GitFlow is a branching model for Git created by Vincent Driessen. It has attracted a lot of attention because it is very well suited to collaboration and scaling the development team.”
This branching model underlines the following advantages:
- Parallel development: new development is isolated from the released code. At the same time, new work is done on feature branches without interfering with the existing runnable code. When a feature is ready to be integrated, its branch is merged into the main body of the code.
- Collaboration: feature branches encourage collaboration because the features are clearly separated and independently developed, which makes them easy for collaborators to follow.
- Release staging area: development is done in a separate branch, where all feature branches are merged. By this, the develop branch is considered a staging area for all the components to be released.
- Support for emergency fixes: when a correction is required on a version which has been released, it is done in a hotfix branch, branched off of the release branch. By this, interference with the current development is avoided; the correction is performed only on the released version, then it can be integrated into the develop branch in order to make the fix available for the next release.
New development is done in feature branches which are branched off of the develop branch. When a feature is fully implemented, it is merged into the develop branch, so it is ready for release. Releases are kept out of the development, so they are done in release branches, branched from the develop branch. The code in the release branch is not yet given to the customer, but it follows the deploy > test > fix > redeploy > retest cycle until it meets the required quality standards. The final version is merged into master and into develop. The revision on master is the one delivered to the customer. Hotfixes are done in hotfix branches and merged back into master and develop to ensure the hotfix is not accidentally lost when the next release is planned.
Vincent Driessen depicted GitFlow as follows – Fig. 2.4.1.


Fig. 2.4.1. – GitFlow

3. Applying Git LFS

3.1. Target Repositories

For the analysis, I propose two repositories which differ in size and in the files they contain. The size of the first repository (hud2gen) is 1.17GB and the size of the second one (fpke) is 10GB.
The purpose is to analyze the entire history of the repositories and to observe the different types of files and the different project structures, so that in the end a list of files which could be tracked by LFS can be provided.
The development in the proposed repositories is based on GitFlow. Git is a non-linear version control system; in such a system, the history of the project evolves on branches. GitFlow is recommended in large projects because the release of the project can be controlled easily.
Since the development of each project is done on several branches (Fig. 3.1.1. and Fig. 3.1.2.), a full repository analysis is needed in order to identify the proper candidates for Git LFS. The goal is to use these candidates in the migration to LFS and afterwards to observe the improvements.
Fig. 3.1.1. – Medium size repository – branch development


The analysis has to be performed on a stable version. The proposed repositories are developed and integrated continuously, so they need to be transferred from the productive server to a test server. After the migration to the test server, the repositories can be locked, so no other users can access them and modify their content.

Fig. 3.1.2. – Large size repository – branch development

4. Repository Analysis Proposed by Lars Schneider

4.1. General Considerations

Lars Schneider has documented the impediments users face when their repositories become large. He breaks the notion of a large repository down into 5 dimensions. Quoting, Lars says that "exceeding certain thresholds for these dimensions can impact the performance of your local Git operations dramatically".
The first notion he explains is the repository large by file size. Since Git has been designed for source code, text files are handled very well: Git runs a compression algorithm on each file and stores the deltas from one modification to another. The problems occur when binary files are introduced in the repository: for binary files, Git cannot store the deltas, which means that for each modification performed on a certain binary file, a new object with the new size will be stored in the repository.
For example, starting with a 1GB binary file and considering the file is updated 5 times, each time growing by 100MB, the repository ends up storing six full versions of the file, well over 6GB in total. Compare this to the behavior of Git when handling text files: since only deltas are saved, an initial 1GB text file will occupy less than 1.5GB after 5 modifications.
Due to the distributed nature of Git, a repository contains the entire history. If at some point a binary file was introduced in the repository and modified several times, the cumulated size of the file is transferred every time the repository is cloned. Whether the binary file is still present in the repository or has been deleted, it occurs at some point in the history, which leads to the behavior explained above.
Lars underlines the idea that file size is not the only factor which influences the Git operations on a repository. He gives the following example: "A set of 100 image files with a size of 0.3 MB each would usually have no significant impact on your repository size. However, if your hard-working designer changes them every day, then your repository could grow in a single year to the uncomfortable size of 10 GB (0.3 MB x 100 files x 365 days). Larger files and even smallish files that change often will always increase your repository size".
Analyzing these examples, the two most important factors which increase the size of a repository are: large binary files and many small binary files which change often. Depending on the project, "large binary file" and "many small binary files which change often" have different meanings: the thresholds for "large file", "many" and "change often" have different values.

“Rule of thumb: Repositories with files smaller than 500KB are fine.” (Lars Schneider)
The second notion Lars Schneider explains in terms of large repositories is the repository large by file count. A repository can face slow performance of Git operations if there is a large number of files in the head commit. "A common mistake that leads to this problem is to add the source code of external libraries to a Git repository", Lars points out. This affects not only the performance of Git operations; the versioning of external libraries is also lost, which makes eventual bug fixing or corrections harder to manage. If the repository "has a large number of files for legitimate reason, then a sparse checkout can be a viable option to address the performance problems", but this is out of scope for this thesis.
“Rule of thumb: Repositories with less than a 100k files are fine.” (Lars Schneider)
The third notion in Lars' article refers to the repository large by number of commits. A long-running project with a deep history impacts the daily work of users; performance problems occur in this situation as well when performing Git commands on the local repository. A solution is the shallow clone: in other words, cloning only a certain number of commits by specifying the depth of the project. This method is not 100% viable if operations which need the entire history are desired.
“Rule of thumb: Repositories with less than a 100k commits are fine.” (Lars Schneider)
Lars Schneider briefly explains the notion of repository large by branch/tag count. This is similar to the gigantic history, where cloning and local Git operations can be slow.
“Rule of thumb: Repositories with less than a 10k branches and tags are fine.” (Lars Schneider)
The last notion is the repository large by submodule count. It is not the subject of this thesis.
“Rule of thumb: Repositories with less than a 25 submodules are fine.” (Lars Schneider)
The focus of this thesis is on repositories with large files.

4.2. Analysis Methods

Lars Schneider has developed a collection of shell and Python scripts for analyzing a Git repository. The purpose of his scripts is to identify potential candidates which could decrease the size of the repository in question. Lars proposes the following:
A. git-find-deleted-files.sh
B. git-find-dirs-many-files.sh
C. git-find-dirs-unwanted.sh
D. git-find-large-files.sh
E. git-find-lfs-extensions.py
In the coming paragraphs I detail each of the scripts.

A. git-find-deleted-files.sh
The script parses the entire history of a Git repository and counts the number of deleted files per directory. The script must be called in the root of the target repository (root of the working directory).
The output has the format:
[number_of_deleted_files][space][present|deleted][space][parent_dir_path]
In the latest commit on the master branch (2f28538) in the original repository, the following steps are performed by the script:
Step 1: identify the deleted files in the entire history of the repository by running the command:
git -c diff.renameLimit=10000 log --diff-filter=D --summary | grep ' delete mode ...... ' | sed 's/ delete mode ...... //'
Step 2: each path identified in the first step is treated as follows:
if the parent directory still exists in the working directory
then its path is printed out preceded by the "present" message
else {
    do {
        directory_path = parent directory of directory_path;
    } while (directory_path does not exist in the working directory
             && directory_path is not the root);
    print out directory_path preceded by the "deleted" message
}

Example:
dir/dir1/dir2/dir3/dir4/file has been detected as a deleted file in the first step;
directory_path = dir/dir1/dir2/dir3/dir4
Does directory_path exist in the working directory? => NO => directory_path = dir/dir1/dir2/dir3
Does directory_path exist in the working directory? => NO => directory_path = dir/dir1/dir2
Does directory_path exist in the working directory? => YES => print out "deleted dir/dir1/dir2"

Step 3: sort the information and apply uniq -c, so in the end the output has the following format:
[number_of_deleted_files][space][present|deleted][space][parent_dir_path]

B. git-find-dirs-many-files.sh
The script parses the working directory of a repository and identifies directories whose number of files exceeds a certain value. The threshold can be specified as a parameter; the default value is 100. The script must be called in the root of the target repository (root of the working directory).
The output has the format: [number_of_files][space][path]
In the latest commit on the master branch (2f28538) in the original repository, the identification of directories containing a large number of files is done as follows:
Step 1:
DIRS=$(find . -type d -not -path "./.git/*" -exec bash -c 'COUNT=$(find "$0" -type f | wc -l); echo "$COUNT $0"' {} \; | sort -r)
- find all directories, except directories under .git/;
- count the files in each directory (recursing into all subdirectories);
- for each entry in the list of directories, save the following information: [number_of_files][space][relative_path_of_dir];
- sort the entries in reverse order;

Step 2: for each entry, get the number of corresponding files; since the list is sorted in decreasing order, stop as soon as this number no longer exceeds the threshold:
for DIR in $DIRS; do
    if [ $(($(echo $DIR | sed 's/ \..*//'))) -le $FILE_COUNT ]; then
        break
    fi
    echo $DIR
done

C. git-find-dirs-unwanted.sh
The script searches the entire history of a Git repository for potentially unwanted directories. The script must be called in the root of the target repository (root of the working directory).
The output has the format: [number_of_files][space][path].
In the latest commit on the master branch (2f28538) in the original repository, the identification of unwanted directories is done as follows:
Step 1: a list of unwanted directories is specified at the beginning of the script, as part of the grep command; e.g. -e prv
Step 2: for each path identified as an "unwanted" one: if it still exists in the working directory, it is printed out preceded by the number of files (counted recursively in all subdirectories); else (if it has already been deleted), the path is printed out preceded by the "deleted" message.

D. git-find-large-files.sh
The script lists the largest files in a Git repository (by compressed object size in Git). The threshold (to print out only files greater than a certain size) can be specified as a parameter; the default value is 500KB. The script must be called in the root of the target repository (root of the working directory).

The output is displayed in a table format with the following columns:
- Size: the size of the compressed object in the Git repository (KB);
- Head: specifies whether the object belongs to the head revision:
  o Y – the object belongs to the latest commit (by date);
  o P – the path of the object is in the latest commit (by date);
  o N – neither the object nor its path is in the latest commit (by date);
- Location: the location of the file in the Git repository;
Example:
Size   Head  Location
63932  N     _out/10271953_VW_F/MQBVW14S1FGC044100_VW_04410001_RES_F.prg
63932  N     _out/10244624_VW/MQBVW14S1FGC044100_VW_04410001_RES.prg
37152  N     pkg/videowc/mdl/videowc.zip
...
1350   Y     ide/core/doc/MSVC_SCRIPTS_USER_MANUAL.chm
1345   N     pkg/hmimodel/mdl/hmi/HMI_Model_unpacked_VW.zip
1343   N     pkg/hmimodel/mdl/hmi/HMI_Model_unpacked_VW.zip
1330   N     pkg/fls/core/HUDCM16S1C021200.mhx
1302   P     pkg/dio/tool/DIOGen/out/PRT_MLBevo_HUD2G_v1.0.xlsm
1301   P     pkg/dio/tool/DIOGen/out/PRT_MLBevo_HUD2G_v1.0.xlsm

The script performs the following steps:
Step 1: list and sort all packed objects from .git/objects/pack
The official Git documentation explains the content of the pack folder: "The initial format in which Git saves objects on disk is called a loose object format. However, Git occasionally packs up several of these objects into a single binary file called a packfile in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server. The packing is done in .git/objects/pack and it contains a packfile and an index":
- .git/objects/pack/pack-*.pack: file containing the contents of all the objects that have been removed from the filesystem.
- .git/objects/pack/pack-*.idx: file containing offsets into the packfile.
If a file has several versions (objects), the latest one is stored intact, whereas the original version is stored as a delta. The reason for this is the high probability of accessing the newest version of a file rather than an older one.

In order to access the packed objects, Git provides the command git verify-pack -v .git/objects/pack/pack-*.idx. The command has been run on a small repository for testing purposes, with the following result:
2431da676938450a4d72e260db3bf7b0f587bbc1 commit 223 155 12
69bcdaff5328278ab1c0812ce0e07fa7d26a96d7 commit 214 152 167
80d02664cb23ed55b226516648c7ad5d0a3deb90 commit 214 145 319
43168a18b7613d1281e5560855a83eb8fde3d687 commit 213 146 464
092917823486a802e94d727c820a9024e14a1fc2 commit 214 146 610
702470739ce72005e2edff522fde85d52a65df9b commit 165 118 756
d368d0ac0678cbe6cce505be58126d3526706e54 tag    130 122 874
fe879577cb8cffcdf25441725141e310dd7d239b tree   136 136 996
d8329fc1cc938780ffdd9f94e0d364e0ea74f579 tree    36  46 1132
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree   136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree     6  17 1314 1 \
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree     8  19 1331 1 \
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506
0155eb4229851634a0f03eb265b69f5a2d56f341 tree    71  76 1350
83baae61804e65cc73a7201a7252750c76066a30 blob    10  19 1426
fa49b077972391ad58037050f2a75f74e3671e92 blob     9  18 1445
b042a60ef7dff760008df33cee372b945b6e884e blob 22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob     9  20 7262 1 \
  b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob    10  19 7282
non delta: 15 objects
chain length = 1: 3 objects
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok

The verify-pack command provides the data in a table format: [SHA-1] [type] [size] [size-in-packfile] [offset-in-packfile] [depth] [base-SHA-1]
Step 2: filter the output of git verify-pack in order to keep only the blobs (files), remove the chain length and sort by the compressed size: grep blob | grep -v chain | sort -k4nr
Step 3: then, for each object entry, process the following information:
- extract the compressed size and return it in KB; if it is lower than 500KB (or the specified threshold), the entry is skipped;
- extract the SHA-1 (blob identifier);
- identify the location in the repository tree: git rev-list --all --objects | grep $SHA | sed "s/$SHA //"
- identify whether the object and the path are in the latest commit (by date): git rev-list --all --objects --max-count=1 | grep $SHA >/dev/null;
Step 4: print out the generated information (size, presence in the head revision, location in the repository) for each blob.
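Steps 1 to 4 condense into roughly the following pipeline (a sketch: the 500KB threshold is hard-coded here, and invoking git rev-list once per blob is exactly the cost addressed by the improved scripts in chapter 5):

git verify-pack -v .git/objects/pack/pack-*.idx \
  | grep blob | grep -v chain | sort -k4nr \
  | while read sha type size packed rest; do
      [ "$packed" -lt 512000 ] && break              # list is sorted, nothing larger follows
      location=$(git rev-list --all --objects | grep "^$sha" | sed "s/$sha //")
      echo "$((packed / 1024)) KB  $location"
    done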

E. git-find-lfs-extensions.py
The script parses the working directory of a repository and identifies file extensions that could be tracked by LFS. The threshold can be specified as a parameter – size in KB; the default value is 0.5 MB. The script must be called in the root of the target repository (root of the working directory).
The output is displayed in a table format with the following columns:
- Type: file type: all, binary, text, binary w/o ext, text w/o ext;
- Extension: file extension;
- LShare: percentage of files with the current extension that are larger than the threshold;
- LCount: number of files with the current extension that are larger than the threshold;
- Count: total number of files with the current extension;
- Size: [MB] size of all files with the current extension;
- Min: [MB] size of the smallest file with the current extension;
- Max: [MB] size of the largest file with the current extension;
In the end, it prints out a recommendation for files which could be tracked by LFS.

4.3. Remarks

After investigating Lars Schneider's proposal, I came to the conclusion that the repository analysis is insufficient and needs improvements. My remarks lead to the necessity of creating new scripts based on Lars' proposal; these ideas are detailed in the next chapter.
In the coming paragraphs I explain the inconsistencies I have found.
The major remark is related to git-find-large-files.sh. To identify files which are present in the HEAD commit, Lars Schneider uses the git rev-list --max-count=1 command. The inconsistency is that git rev-list --max-count=1 analyzes the latest commit by date, not the HEAD commit. The problem comes when further analysis is required starting from the output of git-find-large-files.sh.
files by relative path . Taking into consideration the second option, track files by relative path, a
deep analysis of paths in repository is necessary. This impl ies the observation of a specific
commit, the project structure and the files included in some directories.
The challenge is to obtain an accurate analysis of the branches where the development is active.
As I explained in the previous chapter, Target Repositories , the repositories in question follow
the GitFlow, which means the develop branch ( develop is a generic name; it can differ, but its
purpose is the same) is going to be observed. Since the development is done on feature branches,
there will always be new commits on the feature branches compared to the latest commit on
develop branch. The path analysis consists in the observation of the latest commit on the develop
branch.
The git-find-large-files.sh script parses the entire history of a repository and prints out objects which are greater in size than a specific threshold, followed by information related to their presence in the HEAD commit – information which in reality reflects the presence in the latest commit by date. This value is no longer valid when the latest commit by date differs from the HEAD commit, a scenario encountered all the time.
The second major remark is related to the behavior of the git rev-list command. It is explained by the following two use cases:
Use Case 1: create a file, update its content and move it to another folder. Perform the operations on different branches and observe the output of git rev-list.
Step 1: in an empty local repository, on the master branch, create file FileA and commit;
Output of rev-list --objects --all:
blob of FileA: 78981922613b2afb6025042ff6bd878ac1994e85 FileA
Step 2: on the master branch, update FileA and commit;
Output of rev-list --objects --all:
blob of FileA from Step 1: 78981922613b2afb6025042ff6bd878ac1994e85 FileA
blob of the new FileA: 204dd92b61ad8d300efe79df681287af1074b3eb FileA
Step 3: on the master branch, move FileA to folder/FileA and commit;
Output of rev-list --objects --all:
blob of FileA from Step 1: 78981922613b2afb6025042ff6bd878ac1994e85 FileA
blob of FileA from Step 2, but with updated path: 204dd92b61ad8d300efe79df681287af1074b3eb folder/FileA
Step 4: on another branch, move folder/FileA to folder/subfolder/FileA and commit;
Output of rev-list --objects --all:
blob of FileA from Step 1: 78981922613b2afb6025042ff6bd878ac1994e85 FileA
blob of FileA from Step 2, but with updated path: 204dd92b61ad8d300efe79df681287af1074b3eb folder/subfolder/FileA
Step 5: checkout master and perform no changes;
Output of rev-list --objects --all:
blob of FileA from Step 1: 78981922613b2afb6025042ff6bd878ac1994e85 FileA
blob of FileA from Step 2, but with updated path: 204dd92b61ad8d300efe79df681287af1074b3eb folder/subfolder/FileA
Remark: the path in the working directory is folder/FileA!
Use Case 2: add files with the same content, change the location of the files and observe the behavior of git rev-list.
Step 1: on the master branch, add FileA and FileB with the same content and commit;
Output of rev-list --objects --all:
blob of FileA: a718f2e0247887686429e1c7c3965b6efbb6338e FileA
Remark: the path of FileB is missing!
Step 2: on the master branch, move FileB to folder/FileB and commit;
Output of rev-list --objects --all:
blob of FileA: a718f2e0247887686429e1c7c3965b6efbb6338e FileA
Remark: the path of FileB is still missing!
Step 3: on the master branch, add FileC with the same content as FileA and commit;
Output of rev-list --objects --all:
blob of FileA: a718f2e0247887686429e1c7c3965b6efbb6338e FileA
Remark: the paths of FileB and FileC are missing!
Step 4: on the master branch, add File1 (alphabetically before FileA) with the same content as FileA and commit;
Output of rev-list --objects --all:
blob of File1: a718f2e0247887686429e1c7c3965b6efbb6338e File1
Remark: the paths of FileA, FileB and FileC are missing!

After following the above two use cases, I have reached the conclusion that a blob object always keeps the latest path. When checking out an older revision (where the same blob has another path) and running git rev-list, the path of the blob in question is the last updated one, not the one present in the HEAD revision. If multiple paths exist in the latest commit, all pointing to the same blob object (meaning they have the same content), then only the alphabetically first path appears in the rev-list output.

A third, minor remark is related to git-find-lfs-extensions.py. The original script checks for the existence of any argument instead of checking specifically for the threshold argument. The first argument represents the script itself, the second argument represents the threshold. If the script is called without the threshold argument, it throws an error. The fix implies the following change: if len(sys.argv) becomes if len(sys.argv) > 1.
This fix is included in the pull request https://github.com/larsxschneider/git-repo-analysis/pull/7

5. Repository Analysis

5.1. General Considerations

The repository analysis proposed by Lars Schneider is the starting point for this chapter. The analysis methods presented in the next subchapter came after inspecting and understanding the concept of repository analysis and the necessity of having this information in order to prepare the repository for LFS.
Another factor which motivated the new scripts is the execution time of Lars' scripts. The shell scripts have been run on Windows, in the Git Bash environment, which led to poor execution times.
The execution time has to be drastically reduced, given that several analysis steps are needed. It is hardly acceptable to start the analysis and get the output after several days, especially when a repetitive analysis may be required. On the other hand, the current analysis is incomplete, so any extension of the initial scripts would lead to an even worse execution time in the presented environment.

5.2. Analysis Methods

In order to gather accurate information about the repository content and its evolution in time, I propose the following:
A. git-find-large-files.pl (improved)
B. git-extension-metrics.pl
C. git-path-metrics.pl
D. git-change-rate-metrics.pl
There is a wrapper available, git-lfs-analysis.pl, which performs the full repository analysis by calling the scripts in the proper order. The scripts can also be run independently.
The order of script execution is as listed above. The git-find-large-files.pl script is the first one because it generates the information related to the entire content of the repository, output which is going to be used by the subsequent scripts. Besides the output of git-find-large-files.pl, git-extension-metrics.pl, git-path-metrics.pl and git-change-rate-metrics.pl need additional information which is taken from a configuration file.
The following parameters can be configured in the configuration file:
- OBJECT_THRESHOLD: threshold of compressed object size in bytes above which a file is seen as large;
- SHARE_NUM_THRESHOLD: threshold in % for the share of large files by count = (number of files with object size greater than OBJECT_THRESHOLD) / all files [only files matching the path];
- SHARE_SIZE_THRESHOLD: threshold in % for the share of large files by size = (sum of object sizes of files with object size greater than OBJECT_THRESHOLD) / sum of object sizes of all files [only files matching the path];
- CSV_INPUT_FILE: comma-separated values (*.csv) input file to be analyzed;
- CSV_SEPARATOR: separator for the *.csv output file. Accepted values: , (comma) and ; (semicolon). The value must be quoted;
- CHANGE_RATE_THRESHOLD: threshold in % for the number of changes;
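For orientation, a configuration populated with the values used later in section 5.3 could look like this (the key=value syntax and the file name are assumptions for illustration, not the documented input format of the scripts):

# analysis.cfg (hypothetical name)
OBJECT_THRESHOLD=512000          # bytes; compressed objects above this count as large
SHARE_NUM_THRESHOLD=20           # %
SHARE_SIZE_THRESHOLD=50          # %
CHANGE_RATE_THRESHOLD=15         # %
CSV_INPUT_FILE=large-files.csv   # output of git-find-large-files.pl
CSV_SEPARATOR=";"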

The script dependencies are depicted below (Fig. 5.2.1.).
Fig. 5.2.1. – LFS Analysis

A. git-find-large-files.pl
The initial script needed to be updated in order to collect the following information in a table format: file size, file compressed size, information related to the existence in the HEAD commit, blob SHA, file path and file extension.
The improved script generates a file containing size and path information for each file in the current Git repository. Intermediate outputs are generated into the cache directory. If 'cacheonly' is specified, then all Git commands are skipped and the content of the cache directory is used instead. Lars Schneider generated the information related to packed objects and stored it in an internal variable. For big repositories (greater in size than 10GB), the process is considerably slowed down because a huge amount of data is stored internally. By caching the information, the execution time is reduced. Moreover, if another round of analysis is required on the same repository, the interrogation of the Git objects can be skipped.
The script accepts three parameters: the output file, the separator used to delimit the information in the output file and a cache directory.
The output file is displayed in a table format with the following columns:
- Column A: Object Size;
- Column B: Object Compressed Size;
- Column C: Head: Y or N – information related to the path presence in the HEAD revision;
- Column D: Object SHA;
- Column E: Location: Object Path;
- Column F: File Extension;
- Column G: Repository Size [Bytes] plus additional comments;
The output generation is done in the following steps:
Step 1: Cache the information related to the packed objects, all objects in the repository and the objects in HEAD:
VerifyPackFile: git verify-pack -v $GitDir/objects/pack/pack-*.idx, where $GitDir represents the .git folder.
RevListFile: git rev-list --all --objects
HeadListFile: git ls-tree -r $Tree, where $Tree represents the HEAD commit.

Step 2: Get the repository size in bytes; it is needed by the subsequent scripts. It is preferable to get this information internally, rather than to obtain it manually and pass it forward as a parameter.
Step 3: Read the RevListFile into a hash: [object_sha] [object_path]
Step 4: Read the HeadListFile into a hash: [object_sha] [object_path]
Step 5: Iterate over all objects in the repository (information stored in VerifyPackFile) and generate the output file.
Remark: If multiple paths (moves) point to the same blob, then only the latest committed path will be mentioned in the RevListFile. If multiple paths (copies) point to the same blob, then only the alphabetically first path will be mentioned in the RevListFile.

B. git-extension -metrics.pl
The script perfo rms an extension analysis on a G it repository by analyzing the output of git-find-
large -files.pl and computing different metrics for extensions . The metrics are displayed in a
table format with the following columns:
 Column A: Extension;
 Column B: File type: BIN (binary file) or TXT (text file);
 Column C: Large Count Share: Share of large objects ( greater than
OBJECT_THRESHOLD ) count compared to all objects count [%] ;
 Column D: Criterion 1 – Large Count Share > SHARE_NUM_THRESHOLD : YES or NO;
 Column E: Large Size Share: Share of large objects (greater than OBJECT_THRESHOLD )
size compared to all objec ts size [%] ;
 Column F: Criterion 2 – Large Size Share > SHARE_SIZE _THRESHOLD : YES or NO;
 Column G: Packed size of all objects [MB] ;
 Column H: Unpacked size of all objects [MB] ;
 Column I: Repo Packed Size Share [%]: Packed size of all objects / repository s ize [%] ;
 Column J: Additional comments;
 Column K: Large count: Number of large objects (objects greater than
OBJECT_THRESHOLD );
 Column L: All Count: Number of all objects
 Column M: Large Packed Size [MB] : Packed size of large objects (objects greater than
OBJECT_THRESHOLD );
 Column N: Large Packed Size [Bytes];

Git Large File Storage

Lavinia Nicoleta Gîrteală
Software Engineering
University Polytechnic of Timișoara
Timișoara, 2017 Page 34
 Column O: All Packed size[Bytes]: Packed size of all objects;
 Column P: All size [Bytes]: Unpacked size of all objects;
 Column Q: Repository Size [MB];
 Column R: Repository Size [Bytes];
The columns Q and R store general information, so they are not filled for each extension. The
information related to r epository size is stored in Q2 and R2, whereas the metrics information
start with line 3 (the first 2 lines in the output file r epresent the header and contain general
information).
The challenge is to determine the file type for files which are not present on the disk. In order to obtain this information, the algorithm used by Git to determine the file type has been implemented. The algorithm consists in extracting the first 8000 characters of the blob content and searching for the NULL character; if NULL is found in the chunk, the file is considered to be binary, else it is a text file. Since Git provides a method to access the file content even if it is not physically present on the disk (git cat-file -p [blobSHA]), the identification of the file type can be implemented easily.
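A shell equivalent of this check could look as follows (a sketch: it counts NUL bytes in the first 8000 bytes of the blob content, mirroring the heuristic described above; blob_type is a hypothetical helper name):

# Classify a blob as BIN or TXT from its SHA alone, without a checkout.
blob_type() {
    local nuls
    nuls=$(git cat-file -p "$1" | head -c 8000 | tr -dc '\000' | wc -c)
    if [ "$nuls" -gt 0 ]; then echo BIN; else echo TXT; fi
}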

C. git-path-metrics.pl
The script performs a path analysis on a Git repository by analyzing the output of git-find-large-files.pl and computing different metrics per path. The metrics are displayed in the same table format as the output of git-extension-metrics.pl; the only difference is that column A holds the path.
Besides the configuration file, the script needs as input a list of paths to be investigated. This list of paths has to be generated manually; there is no automatic process which prints out this information. In order to create the list of paths to be passed to git-path-metrics.pl, I propose the TreeSize tool. TreeSize is a disk space manager for Windows which helps not only in visualizing the disk space usage and giving a detailed analysis down to the lowest directory levels, but also in exporting numerous reports. This software can be used to investigate the folder population in a Git repository and the disk space allocated to each path in the repository.
The path metrics are computed as follows:
Step 1: Iterate over the output of git-find-large-files.pl and, for each entry, try to match it with the value stored in the Location column.

Step 2: If there is a match between the current entry in the output of git-find-large-files.pl and a path from the given list, store the needed information in an internal hash table.
Step 3: When the iteration over the objects in the repository is done (Step 1), print out the information stored in the internal hash table; a sketch follows below.
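A rough bash equivalent of this matching loop (the CSV column order and the file names are assumptions, matching the sketch shown earlier):

#!/usr/bin/env bash
# Illustrative only: aggregate object count and packed size per given path
# prefix, assuming large-files.csv lines of "sha,location,size,packed,...".
while IFS= read -r prefix; do
    awk -F',' -v p="$prefix" '
        index($2, p) == 1 { count++; packed += $4 }
        END { printf "%s,%d,%d\n", p, count, packed }
    ' large-files.csv
done < path-list.txt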

D. git-change-rate-metrics.pl
The script generates a table of change rates of blobs in a Git repository. It analyzes each path present in the output of git-find-large-files.pl and displays the following information:
 Column A: Path: path of the blob relative to the repository;
 Column B: File type: TXT or BIN;
 Column C: Criterion: Change Rate greater than or equal to CHANGE_RATE_THRESHOLD;
 Column D: HEAD: Y or N - whether the path is present in the HEAD revision;
 Column E: Change Rate [%];
 Column F: Last Compressed Size [MB];
 Column G: Last Compressed Size [Bytes];
 Column H: Number of Changes;
 Column I: Number of all commits;
Lars Schneider explains the problem of small files which change often and their impact on the repository size. He underlines the fact that not only large files contribute to the performance decrease, but also a large number of small files which are constantly updated.
This observation led to the git-change-rate-metrics.pl script. The purpose of this script is to observe the evolution of files during the development of the project. It counts the number of changes of a file and relates this value to the total number of commits. The idea is to find files which change often by comparing the change rate to the threshold defined in the configuration file, as sketched below.
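A minimal sketch of the change-rate idea for a single path (the path is a placeholder; the real script derives all paths from its inputs):

#!/usr/bin/env bash
# Illustrative change-rate check for one path; the thesis script is Perl
# and computes this for every path in the large-files output.
path="some/archive.zip"            # placeholder path
CHANGE_RATE_THRESHOLD=15           # value used in the thesis analysis

changes=$(git log --oneline --follow -- "$path" | wc -l)
total=$(git rev-list --count HEAD)
rate=$(( 100 * changes / total ))

echo "$path: $changes changes in $total commits ($rate%)"
[ "$rate" -ge "$CHANGE_RATE_THRESHOLD" ] && echo "=> change-rate candidate"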

5.3. Analysis Result

The target repositories have been analyzed using the proposed scripts, so in the end a list of candidates for LFS has been provided.
For the analysis of both repositories, the medium size one (hud2gen) and the large size one (fpke), the following values have been set in the configuration file:
OBJECT_THRESHOLD=512000, SHARE_NUM_THRESHOLD=20, SHARE_SIZE_THRESHOLD=50, CHANGE_RATE_THRESHOLD=15.
The medium size repository contains 54023 blobs; the largest one is an mhx file with a size of 30511129 bytes, the compressed size of this blob being 8037192 bytes. The list of objects is printed out in the large-files.csv file (the output of git-find-large-files.pl). A sample of large-files.csv for the medium size repository is annexed to the thesis - ANNEX B.
The highest change rate in the medium size repository belongs to a zip file, 11%; the file has been modified 366 times out of 3481 total commits in the repository. Even so, this value does not exceed the CHANGE_RATE_THRESHOLD. A sample of change-rate-metrics.csv for the medium size repository is annexed to the thesis - ANNEX C.
The extension metrics suggest the following extensions as candidates for LFS in the medium size repository:
*.7z => consumes 21.37 MB of repository (Unpacked size = 40.12 MB)
*.abs => consumes 4.35 MB of repository (Unpacked size = 12.38 MB)
*.bin => consumes 22.56 MB of repository (Unpacked size = 74.17 MB)
*.bpl => consumes 0.65 MB of repository (Unpacked size = 2.04 MB)
*.chm => consumes 2.38 MB of repository (Unpacked size = 2.66 MB)
*.ncb => consumes 2.52 MB of repository (Unpacked size = 8.00 MB)
*.pdb => consumes 4.10 MB of repository (Unpacked size = 35.03 MB)
*.pdf => consumes 3.63 MB of repository (Unpacked size = 5.96 MB)
*.ppt => consumes 3.67 MB of repository (Unpacked size = 8.11 MB)
*.prg => consumes 182.08 MB of repository (Unpacked size = 573.44 MB)
*.ttf => consumes 2.47 MB of repository (Unpacked size = 4.94 MB)
*.udb => consumes 30.87 MB of repository (Unpacked size = 119.90 MB)
*.vfc => consumes 0.73 MB of repository (Unpacked size = 3.39 MB)
*.xlsm => consumes 5.58 MB of repository (Unpacked size = 6.82 MB)
*.zip => consumes 498.28 MB of repository (Unpacked size = 6903.38 MB)
=> 785.26 MB (= 65%) of 1202.47 MB (repository size) could be tracked by Git
LFS (Unpacked size of LFS objects = 7800.32 MB)

A sample of extension-metrics.csv for the medium size repository is annexed to the thesis - ANNEX D.
The path analysis in the medium size repository has been performed on a list of paths provided a priori. A sample of the path-metrics.csv for the medium size repository is annexed to the thesis - ANNEX E.
The big size repository has been analyzed in the same manner as the medium size one. The first step was to generate the list of objects in large-files.csv. A sample of large-files.csv for the big size repository is annexed to the thesis - ANNEX F. The big size repository contains 184655 blobs; the largest one is a dat file with a size of 1874441 bytes, the compressed size of this blob being 690 bytes.
The highest change rate in the big size repository belongs to a txt file, 3%; the file has been modified 1225 times out of 35107 total commits in the repository. Even so, this value does not exceed the CHANGE_RATE_THRESHOLD. A sample of change-rate-metrics.csv for the big size repository is annexed to the thesis - ANNEX G.
The extension metrics suggest the following extensions as candidates for LFS in the big size repository:
*.7z => consumes 159.55 MB of repository (Unpacked size = 159.50 MB)
*.abs => consumes 0.96 MB of repository (Unpacked size = 3.20 MB)
*.bin => consumes 434.22 MB of repository (Unpacked size = 880.99 MB)
*.bpl => consumes 0.65 MB of repository (Unpacked size = 2.04 MB)
*.chm => consumes 173.87 MB of repository (Unpacked size = 175.14 MB)
*.dla => consumes 5.66 MB of repository (Unpacked size = 25.21 MB)
*.gz => consumes 1.10 MB of repository (Unpacked size = 1.10 MB)
*.ilk => consumes 10.58 MB of repository (Unpacked size = 59.67 MB)
*.ipl => consumes 6.02 MB of repository (Unpacked size = 21.61 MB)
*.jar => consumes 260.56 MB of repository (Unpacked size = 513.11 MB)
*.mer => consumes 257.67 MB of repository (Unpacked size = 528.12 MB)
*.mes => consumes 167.09 MB of repository (Unpacked size = 5179.73 MB)
*.nupkg => consumes 0.57 MB of repository (Unpacked size = 0.88 MB)
*.pdb => consumes 18.39 MB of repository (Unpacked size = 139.75 MB)
*.pdf => consumes 71.34 MB of repository (Unpacked size = 91.10 MB)
*.ppe => consumes 39.45 MB of repository (Unpacked size = 103.17 MB)
*.pptx => consumes 7.15 MB of repository (Unpacked size = 9.35 MB)
*.sdf => consumes 133.40 MB of repository (Unpacked size = 1608.75 MB)
*.swf => consumes 9.63 MB of repository (Unpacked size = 9.64 MB)
*.uboot => consumes 4.32 MB of repository (Unpacked size = 4.34 MB)
*.uimage => consumes 3.53 MB of repository (Unpacked size = 3.54 MB)
*.whl => consumes 7.50 MB of repository (Unpacked size = 7.55 MB)
*.zip => consumes 5819.34 MB of repository (Unpacked size = 45987.38 MB)

=> 7592.54 MB (= 79%) of 9557.88 MB (repository size) could be tracked by Git
LFS (Unpacked size of LFS objects = 55514.85 MB)

A sample of extension-metrics.csv for the big size repository is annexed to the thesis - ANNEX H.
The path analysis in the big size repository has been performed on a list of paths provided a priori. A sample of the path-metrics.csv for the big size repository is annexed to the thesis - ANNEX I.
A sample of the log file for both the medium and the big repository is attached to the thesis in ANNEX A.

The result of this analysis is used further to migrate the repositories to LFS and to confirm the success of the analysis in the testing phase. This is described in the next chapters.

6. Migration to Git LFS

6.1. Migration Strategies

When deciding to introduce LFS in the development of an already started project, there are different ways to do it. I propose three strategies which can be applied when users seek performance improvements.

A. Start tracking files with LFS from the current configuration
In order to continue the development in the current repository, but including Git LFS, the following steps need to be performed (steps 3 and 4 are sketched after the list):
Step 1: Stop the current development by locking the repository on the server and create a baseline.
Step 2: Analyze the repository and identify candidates for LFS.
Step 3: Unlock the repository and update the .gitattributes file with the extensions to be tracked by LFS.
Step 4: Ensure all users use the proper version of Git (which includes LFS).
Step 5: Continue the development.
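A minimal sketch of steps 3 and 4 (the tracked patterns are examples taken from the analysis results of the medium size repository):

# Enable the LFS hooks for the current user (requires a Git client with LFS).
git lfs install

# Register extensions identified by the analysis; this updates .gitattributes.
git lfs track "*.zip" "*.prg" "*.udb"
git add .gitattributes
git commit -m "Track large binary extensions with Git LFS"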
When choosing this alternative, the main advantage is that the project history remains intact. It is not affected by any Git LFS operation, so traceability is still valid. A considerable drawback is the confusing file history: a file is first seen as a normal binary file and then, after introducing Git LFS, as a file handled by LFS. Another

disadvantage is that the size of the initial repository is not decreased by introducing LFS; it merely grows more slowly once the files are tracked with LFS. Performance improvements are therefore not visible for large repositories.
B. Create a new repository and start tracking files with LFS

In order to continue the development of the project in a new repository which tracks files with LFS from scratch, the following steps need to be performed (a sketch follows after the list):
Step 1: Stop the current development, lock the repository and create a baseline.
Step 2: Analyze the repository and identify candidates for LFS.
Step 3: Create a new repository and update the .gitattributes file with the extensions to be tracked by LFS.
Step 4: Ensure all users use the proper version of Git (which includes LFS).
Step 5: Start the development in the new repository.
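The difference from Strategy A is that tracking is configured before any binary content is committed. A sketch, with repository name and URL as placeholders:

# Initialize the new repository and register the LFS patterns first.
git init project-lfs && cd project-lfs
git lfs install
git lfs track "*.zip" "*.prg" "*.udb"
git add .gitattributes
git commit -m "Initial commit: LFS tracking configured"
git remote add origin git@server:project/project-lfs.git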
When choosing this alternative, the fact that two repositories are used for developing the same project can be a plus and a minus at the same time. Once the new repository is created, the development on the initial one must stop, which may lead to confusion among users. On the other hand, the original repository remains intact, which means that traceability is still valid. Moreover, introducing LFS in a new repository brings numerous improvements in terms of performance, the major one being the reduced size of the new repository.

C. Convert the entire repository
When choosing to convert the entire repository as the solution for switching to Git LFS, the following steps need to be performed (a condensed sketch follows after the list):
Step 1: Stop the development and lock the repository.
Step 2: Analyze the repository and identify candidates for LFS.
Step 3: Create a mirror clone of the original repository.
Step 4: Create a new empty repository on the server, which will be the remote of the converted repository.
Step 5: Ensure there are no subbranches of the master branch in the original repository. If any exist, rename them before conversion, run the conversion, and rename them back when the conversion process finishes.
Step 6: Run the conversion using bozaro/git-lfs-migrate (ensure the conversion is run with the -g flag).
Step 7: Push the converted repository to the server (git fsck && git push --mirror …).
Note: The original repository must remain locked!
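A condensed sketch of steps 3, 6 and 7 (server URL, repository names and tracked patterns are placeholders):

# Step 3: mirror clone the original repository.
git clone --mirror git@server:project/original.git original.git

# Step 6: convert with bozaro/git-lfs-migrate; the patterns come from the analysis.
java -jar git-lfs-migrate.jar \
     -s original.git \
     -d converted.git \
     -g git@server:project/converted.git \
     "*.zip" "*.bin"

# Step 7: verify and push the converted mirror to its new remote.
cd converted.git
git fsck && git push --mirror git@server:project/converted.git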
As with the previous migration strategy, the performance improvement is considerable regardless of the initial size of the repository. Since files are tracked with LFS from the beginning of the development, this leads to a major reduction of the repository size. However, when migrating the repository, new commit IDs are generated, which results in a brand new project history. The chain of disadvantages continues with traceability, which is no longer applicable because of the new commits. Despite all these drawbacks, this method can be used for testing purposes, when measuring the performance improvements on a large repository which contains files tracked with LFS.

6.2. Migration of Target Repositories to Git LFS

In the 3rd chapter, Applying Git LFS, the purpose of this thesis is underlined. Starting from two target repositories different in size and content, the challenge is to introduce Git LFS in order to solve the binary file storage problem. In order to adopt Git LFS in a proper way, a repository analysis is required; the information provided by the analysis is further used in the conversion process. The target is to observe the behavior of the repositories which contain files tracked by LFS and to compare, for each repository, the execution time of the basic Git commands influenced by the repository size between the original repository and the corresponding converted one.
For testing purposes, the strategy of converting the entire repository is chosen.
In the conversion process, GitHub recommends the git-lfs-migrate tool provided by Artem V. Navrotskiy (also known as bozaro on github.com). The tool developed by bozaro (a Java tool) requires the following parameters:
 -s: absolute path to the source repository; the repository must be a bare one;
 -d: absolute path to the destination repository; the converted repository is going to be a bare one;
 -g: URL of the destination repository; the location on the server where the converted repository is going to be pushed;
 a list of paths/extensions to be tracked by LFS.
The medium size repository and the big size one have been converted following the same steps, which are described below. The attached screenshot is part of the conversion process of the big repository.
Step 1: Mirror clone the original repository;

Step 2: Create a new empty repository on the test server (repository name: cds-sqe-lfs/sw.sys.fpke_17s1_fgc_converted_LFS);
Step 3: Rename all subbranches of the master branch;
A strange behavior of the bozaro tool has been noticed if the source repository does not contain a master branch, but subbranches of the master branch. In this case, the tool automatically creates the master branch in the destination repository, but it finishes with an error when trying to add the subbranches of master. In Git it is not possible to have a branch and subbranches of the same branch at the same time. An issue has been opened with bozaro in order to take into consideration the behavior of the tool in the presented use case, but currently there is no solution for this. The issue can be accessed here: https://github.com/bozaro/git-lfs-migrate/issues/41
As a quick fix, before conversion, all subbranches of the master branch have to be renamed in order to avoid any conflict. In a normal repository, the renaming could be done easily by navigating to .git/refs/heads and changing the name of the master folder to a placeholder. Since we are talking about a bare repository, the renaming has to be done using Git commands.
The renaming has been done as follows:
#!/usr/bin/bash
# Rename every subbranch of master (e.g. master/feature) so that the
# conversion tool can create its own master branch without conflicts.
branchMatches=$(git branch --all | grep 'master/')
for branch in $branchMatches; do
    initialvalue=$branch
    resulted_value="${branch/master/placeholder}"
    git branch -m "$initialvalue" "$resulted_value"
done

Step 4: Convert using the bozaro tool;
Step 5: Rename back the new subbranches to their original names and delete the default master branch created automatically by the conversion (see the sketch after this list);
Step 6: Push the local converted repository to the test server;
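A minimal sketch of step 5, run inside the converted bare repository; it mirrors the renaming script above (the branch name given to symbolic-ref is a placeholder):

# A ref named "master" cannot coexist with refs under "master/", so the
# auto-created master branch is removed first; if HEAD points at it,
# repoint HEAD beforehand.
git symbolic-ref HEAD refs/heads/placeholder/main   # placeholder branch name
git branch -D master

# Restore the original subbranch names.
for branch in $(git branch --all | grep 'placeholder/'); do
    git branch -m "$branch" "${branch/placeholder/master}"
done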

7. Test

The conversion steps have been monitored and measured, and some performance tests have also been performed. The results can be found in the next paragraphs.
Besides performance tests, the daily work use cases have been reproduced on the converted repositories in order to ensure their integrity. The following daily work tests have been performed:
 Push new commits which include updates on files tracked by LFS, then check the integrity of the files.
 Fetch, merge and checkout commits which include files tracked by LFS, then check the integrity of the files.
 Pull and checkout commits which include files tracked by LFS, then check the integrity of the files.
 Checkout a branch different from the default one, then check the integrity of the files tracked by LFS.
Daily work tests have been performed on both repositories (medium size and big size) in three locations (Timisoara, Guadalajara and Singapore). All tests passed.

7.1.1. Test Medium Size Repository

Target repositories:
Original repository: sw.sys.hud2gen_original
Converted repository: sw.sys.hud2gen_original_converted

 Mirror clone the original repository


 Convert the original repository to LFS using the bozaro method, tracking the candidates obtained in the analysis process
java -jar "d:/Git LFS/git-lfs-migrate/git-lfs-migrate.jar" -s
sw.sys.hud2gen_original.git -d converted_repository.git "*.udb"
"*.prg" "*.7z" "*.abs" "*.bin" "*.vfc" "*.chm" "*.bpl" "*.xlsm"
"*.ncb" "*.zip" "*.ttf" "*.pdb" "*.ppt" "*.pdf"

 Push the converted repository to the server
Method 1 (this method is recommended by GitHub when handling large LFS content):
a. Push the LFS objects first
git remote set-url origin git@git-id-test0.conti.de:cds-sqe-lfs/sw.sys.hud2gen_original_converted.git

time git lfs push --all origin

b. Disable the LFS pre-push hook and push the non-LFS objects
cd hooks
mv pre-push pre-push.disabled
time git push --mirror origin

Method 2: push the entire repository using a single command
time git push --mirror origin

 Clone the original repository

 Clone the converted repository

All tests have been performed on a machine in Babenhausen, Germany, with the following Git
configuration:

7.1.2. Test Big Size Repository

Target repositories:
Original repository: sw.sys.fpke_17s1_fgc_original
Converted repository: sw.sys.fpke_17s1_fgc_converted

 Mirror clone the original repository (test performed on a machine in Babenhausen, Germany)

 Convert the original repository to LFS using the bozaro method, tracking the candidates obtained in the analysis process (test performed on a machine in Babenhausen, Germany)
java -jar "d:/Git LFS/git-lfs-migrate/git-lfs-migrate.jar" -s
sw.sys.fpke_17s1_fgc_original.git -d sw.sys.fpke_17s1_fgc_converted.git
-g git@git-id-test0.conti.de:cds-sqe-lfs/sw.sys.fpke_17s1_fgc_original_converted.git
"*.7z" "*.abs" "*.bin" "*.bpl" "*.chm" "*.dla" "*.gz" "*.ilk" "*.ipl" "*.jar"
"*.mer" "*.mes" "*.nupkg" "*.pdb" "*.pdf" "*.ppe" "*.pptx" "*.sdf" "*.swf"
"*.uboot" "*.uimage" "*.whl" "*.zip"
Conversion time: 3h 56m

 Clone the original repository in three locations
a. Timisoara

b. Guadalajara

c. Singapore

 Clone the converted repository in three locations
a. Timisoara

b. Guadalajara

The clone of the converted repository initially failed; 4 files were not downloaded due to a "Permission denied" error. The root cause is the HTTP_PROXY environment variable, which produces the error. The problem is still under investigation; if HTTP_PROXY is removed, the clone finishes successfully.


c. Singapore

All tests have been performed on machines with the following Git configuration:

8. Conclusion

Git LFS is a trustworthy solution for the Git binary storage problem because, first of all, it is maintained and constantly developed by GitHub. According to Google Trends, it has recently been the most searched term among the existing candidate solutions.
It has been proved that Git LFS increases the performance of repositories with a huge history. The performance improvements, especially in the interaction with the server, are evident not only for large repositories, but also for small ones. The execution time of the cloning process is significantly reduced for a repository containing files tracked by LFS compared to the same repository without LFS, and the larger the repository, the greater the improvement in cloning time.
If not used correctly, Git LFS can do more harm than good. In order to adopt a correct Git LFS workflow in projects, an analysis is required. The starting point of this thesis is the approach of Lars Schneider to repository analysis. His ideas have been observed and improved, and further scripts have been derived starting from git-find-large-files.sh. The initial git-find-large-files.sh has been ported to Perl in order to improve the execution time of the script on Windows machines in a Git Bash environment, then extended and corrected in order to print out all the objects in the Git repository together with additional information. This output has been passed to subsequent scripts which perform an object analysis and, in the end, come to a suggestion regarding the candidates for Git LFS.
Using the tool proposed by Artem V. Navrotskiy, git-lfs-migrate, together with the list of candidates obtained from the analysis scripts, the target repository is migrated to LFS. After migration, a set of tests is defined and executed in different locations in order to observe the improvement.
As shown in the previous chapter, if used in a proper way and if the proper candidates are identified in the analysis process, Git LFS brings a considerable improvement, not only in size, but also in the elapsed time when communicating with the server.
For a repository with an initial size of 9557.88 MB, after running the conversion using the output of the repository analysis scripts, the size decreased to 1964.34 MB. The repository size has been reduced by 79%.
In terms of time performance, the cloning process has been measured and compared in three different locations, as shown in Table 8.1.

Table 8.1: Cloning times of the original and converted repositories

                       Timisoara    Guadalajara    Singapore
Original Repository    309m 52s     119m 42s       67m 28s
Converted Repository   45m 52s      13m 1s         17m 21s
Improvement            86%          90%            75%

In conclusion, the repository analysis proposed by me provides reliable information which can be taken as a reference when starting to track files with LFS or when converting a repository to LFS. The interaction with the server (clone) is much faster for a repository which contains files tracked with LFS compared to the same repository without LFS. Introducing LFS does not affect the daily work on the repository.
[Figure: cloning time (min) of the original vs. the converted repository in Timisoara, Guadalajara and Singapore]

Acknowledgement

I would like to thank Johannes Kramer, Chief Software Architect of the Agile Toolchain team in the Interior Instrumentation and Driver Business Unit of Continental Automotive, for his involvement, cooperation and support in overcoming the obstacles I have been facing throughout the research. He has given me considerable technical support during the implementation of the proposed solutions for the repository analysis. At the same time I am thankful to Bertrand Lacroute, the Project Owner of the Agile Toolchain team in the Interior Instrumentation and Driver Business Unit of Continental Automotive, for allowing me to use productive data in this research.

I would also like to acknowledge Lars Schneider, Technical Lead GitHub Solutions at Autodesk, for giving me the starting point in the repository analysis process. Lars has provided good documentation for the problem of binary file storage in Git, material which I took as a reference for the beginning of the research.

An important contribution to the finalization of the research has been made by the Agile Toolchain team in the Interior Instrumentation and Driver Business Unit of Continental Automotive. I would like to thank the entire team in Timisoara for reviewing my results and conclusions on this large topic, and the teams in Guadalajara and Singapore for performing tests and providing me the test results which confirm the success of this project.

Last but not least, special thanks go to the entire Interior Instrumentation and Driver Business Unit of Continental Automotive, the department which considered this solution innovative and embraced my proposal, so that pilot productive projects have already been nominated to use Git LFS in the way I have described in this document.

References

[1] S. Chacon, B. Straub, "Pro Git", 2nd edition, Apress, 2014
[2] W. Gajda, "Git Recipes. A Problem-Solution Approach", 2013
[3] M. McQuaid, "Git in Practice", 1st edition
[4] L. Schneider, https://github.com/larsxschneider/git-repo-analysis
[5] A. V. Navrotskiy, https://github.com/bozaro/git-lfs-migrate
[6] R. L. Schwartz, B. Foy, T. Phoenix, "Beginning Perl", 6th edition, O'Reilly Media, Inc., 2011
[7] R. Blum, C. Bresnahan, "Linux Command Line and Shell Scripting Bible", 3rd edition, John Wiley & Sons, 2015
[8] V. Driessen, "A Successful Git Branching Model", January 2010
[9] GitFlow - https://datasift.github.io/gitflow/IntroducingGitFlow.html
[10] Git-Annex, https://git-annex.branchable.com/
[11] J. Brown, Git-Fat, https://github.com/jedbrown/git-fat
[12] A. Lebedev, Git-Media, https://github.com/alebedev/git-media
[13] Git LFS, https://github.com/git-lfs/git-lfs
