think git
3a253b - Initial commit
This talk is about the software git
What is git?
Most of you have probably heard of git, but if you haven't -
git is a version control tool.
That's just a fancy way of saying that you track the changes you are making to a file.
You've probably used a version control tool, or maybe even invented one that works for you.
If you've ever worked on a paper using Word, you've probably created a version control system that works for you by saving a file with different "version" tags.
People do this because they like having a backup of all their past changes. If someone wants to delete a paragraph, you keep a copy of the old version just in case.
When you are collaborating with someone, this becomes more challenging. Tracking changes and merging content added by different people is time consuming.
As you can imagine, this presents a problem in software development.
It is likely that multiple people will work on the same file, sometimes even at the same time.
Even small software projects can have tens of files in multiple folders.
Additionally, developers frequently experiment with features, and this should never affect the "master" copy of the project.
git to the rescue! Essentially, git can be a history of every change you have made to every file in the project.
But, unfortunately, you have to understand how it works to use it effectively.
Most people, when introduced to git, are thrown into using it and usually don't understand how it works. When something you have never encountered before occurs, you might not know what to do.
I know I've deleted projects and downloaded a fresh copy of a repo because I accidentally created a conflict of some sort.
With this talk, I'd like to introduce what I'm calling the big ideas of git that will help you understand how git works.
git is a supplement to your workflow
The first thing to understand is that git is a supplement to your current workflow. You can work the way you usually work, and all you have to do is use git every time you want to save your progress.
Workflow
Edit files
$EDITOR
Save changes
git commit
Before we talk about git, let's talk about workflow.
This is a simple example of a workflow. You edit a file and you save the file.
Once you reach what you consider a stable state, you can save the file under a different name as a backup, or save it on Dropbox, or however you usually implement a version control system.
When you use git however, you don't have to save a different version of the file.
When you are ready to save the current version of the file as a backup, you do that in git by taking a snapshot of the current state.
You can take a "snapshot" of the current state of a file by using a command called git commit.
What really happens is that when you create a "snapshot", git records the current state of all the files in the directory being tracked.
git commit
So what does a snapshot look like?
Let's assume that a snapshot looks like this circle over here.
As I mentioned earlier, a snapshot contains the current state of the file or directory.
When git creates a snapshot, it also computes a number and attaches that number to it.
git uses a hashing algorithm (SHA-1) to calculate this number, and the number is effectively unique: it is very unlikely that two such numbers will be the same.
Every snapshot / commit you create will have its own number attached to it, and this number is called a HASH. The full hash is 40 hexadecimal characters long, but you can typically use the first 6 or 7 characters to uniquely identify a snapshot in your project.
Let's say you have created this first commit, and now you make more changes to the file and want to save them.
When you make changes to your files, and save them using git commit, git creates a new snapshot with a new HASH.
git also does a few other things.
1) It allows you to create a message that goes along with that commit. By default, a commit has to have a message!
2) When a new commit is created, it saves a reference to the parent commit. A snapshot can have multiple parents, i.e. you can take changes from two commits and merge them into one. The HASH of the current commit depends on the path that led to it, i.e. it depends on what the parent is. If you change the content of a parent, the parent's hash changes, and so does every commit hash after it.
3) git stores a reference to whichever commit represents the current project state in the label HEAD.
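To make point 2 concrete, here is a minimal sketch in Python of how a parent's hash can feed into a child's hash. The function name `commit_hash` and the payload layout are made up for illustration; real git hashes a commit object with its own format, so the values below will not match real git hashes.

```python
import hashlib
from typing import Sequence


def commit_hash(content: str, message: str, parents: Sequence[str]) -> str:
    """Toy commit id: SHA-1 over the parent ids, the message and the content."""
    payload = "\n".join([
        "parents " + " ".join(parents),
        "message " + message,
        "content " + content,
    ])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Two commits, the second one pointing at the first as its parent.
first = commit_hash("# farm\n", "Initial commit", [])
second = commit_hash("# farm\nchickens\n", "Add chickens", [first])

# Change only the content of the first commit and rebuild the second one
# with exactly the same content and message as before.
first_edited = commit_hash("# Farm\n", "Initial commit", [])
second_rebuilt = commit_hash("# farm\nchickens\n", "Add chickens", [first_edited])

print(first[:7], second[:7])
print(first != first_edited, second != second_rebuilt)
```

Both comparisons come out true: the child was rebuilt with identical content, but because its parent id changed, its own id changed too.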
In the word document VCS, we created multiple versions but we did not track how we got to each state. git allows us to do that by storing all this information
You may be wondering, where does git store all this - we will get to that.
This, as you can imagine, forms a graph.
What's your story
And as with any graph, you can choose to tell a story.
In this case, we want to work on a feature of some software, and while working on it we get a new idea.
You can go back to a previous state of the project, and branch off to work on a new idea - not affecting your current work.
And, when you are ready, you can merge these changes back into your master branch.
You can go back and review individual changes made in any commit.
You can even undo changes, or apply these changes to a different state of your project.
You could also discard a branch entirely, or switch between branches.
Branches are awesome, and we'll talk more about branches a little later.
The point here is that this graph exists.
This graph is the documentation of how your project came to be, i.e. the history of your project is a graph that you can traverse at will.
Big idea #1
git history is a graph
So that's big idea number 1: git history is a graph that you can traverse, allowing you to move to any "saved" state in a project.
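A minimal sketch of what "traversing the graph" means, in Python. The commit ids and the `ancestors` helper are hypothetical; the only idea taken from git is that every commit knows its parents.

```python
# Toy history: each commit id maps to the list of its parent ids.
history = {
    "a1": [],            # initial commit
    "b2": ["a1"],        # more work on master
    "c3": ["a1"],        # a branch started from a1 to try a new idea
    "d4": ["b2", "c3"],  # merge commit with two parents
}


def ancestors(commit, parents_of):
    """Return every commit reachable from `commit`, itself included."""
    seen = []
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(parents_of[current])
    return seen


print(sorted(ancestors("d4", history)))  # ['a1', 'b2', 'c3', 'd4']
print(ancestors("c3", history))          # ['c3', 'a1']
```

Starting from the merge commit you can reach the whole history; starting from the side branch you only see that branch and the commit it started from.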
git help --all
usage: git [--version] [--help] [-C <path>] [-c name=value]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
<command> [<args>]
available git commands in '/Applications/Xcode.app/Contents/Developer/usr/libexec/git-core'
add clone fast-import interpret-trailers notes remote-testsvn submodule
add--interactive column fetch log p4 repack subtree
am commit fetch-pack ls-files pack-objects replace svn
annotate commit-tree filter-branch ls-remote pack-redundant request-pull symbolic-ref
apply config fmt-merge-msg ls-tree pack-refs rerere tag
archimport count-objects for-each-ref mailinfo patch-id reset unpack-file
archive credential format-patch mailsplit prune rev-list unpack-objects
bisect credential-cache fsck merge prune-packed rev-parse update-index
bisect--helper credential-cache--daemon fsck-objects merge-base pull revert update-ref
blame credential-osxkeychain gc merge-file push rm update-server-info
branch credential-store get-tar-commit-id merge-index quiltimport send-email upload-archive
bundle cvsexportcommit grep merge-octopus read-tree send-pack upload-pack
cat-file cvsimport gui--askpass merge-one-file rebase sh-i18n--envsubst var
check-attr cvsserver hash-object merge-ours receive-pack shell verify-commit
check-ignore daemon help merge-recursive reflog shortlog verify-pack
check-mailmap describe http-backend merge-resolve relink show verify-tag
check-ref-format diff http-fetch merge-subtree remote show-branch web--browse
checkout diff-files http-push merge-tree remote-ext show-index whatchanged
checkout-index diff-index imap-send mergetool remote-fd show-ref write-tree
cherry diff-tree index-pack mktag remote-ftp stage
cherry-pick difftool init mktree remote-ftps stash
citool difftool--helper init-db mv remote-http status
clean fast-export instaweb name-rev remote-https stripspace
git commands available from elsewhere on your $PATH
loglive
'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
So we know that git history is a graph, right?
And this graph that we created has to be stored somewhere, right?
This graph is stored in a .git folder.
Every folder on your computer that has a .git folder is a git repository.
There is only one such .git folder in a git repo.
All git commit does is add content to that .git folder.
And git commit is just one function that git provides.
What if you wanted to do more, like manipulate an existing snapshot, delete a set of changes, or reorder your history?
git offers you other tools to do that.
git is essentially a toolkit that contains a bunch of functions that help organize snapshots of content.
This is a list of all the functions available to you.
Note - git is not GitHub. What is GitHub? GitHub hosts git repositories on the internet, and that's basically all it does.
So, as I was saying this is the list of functions
git help --all
usage: git [--version] [--help] [-C <path>] [-c name=value]
[--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
[-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
[--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
<command> [<args>]
available git commands in '/Applications/Xcode.app/Contents/Developer/usr/libexec/git-core'
add clone submodule
add--interactive fetch log
am commit
annotate
apply tag
reset
archive format-patch
bisect merge
bisect--helper pull revert
blame gc push rm
branch
grep
gui--askpass rebase
daemon help reflog
diff remote
checkout
cherry-pick init stash
mv status
clean instaweb
git commands available from elsewhere on your $PATH
loglive
'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
But you don't have to use every one of the functions available. I've only ever had to use a handful.
Because Git was initially a toolkit for a VCS rather than a full user-friendly VCS, it has a bunch of verbs that do low-level work and were designed to be chained together UNIX style or called from scripts. These commands are generally referred to as “plumbing” commands, and the more user-friendly commands are called “porcelain” commands.
https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
Workflow
git init
Edit files
$EDITOR
Group changes
git add
Review changes
git status
Save changes
git commit
Let's talk about our workflow again. To actually use git we need to add a couple of steps to the workflow we talked about earlier. We group changes before committing them.
And we do this using the git add command.
If we open a folder and try to "git add" a file, git will throw an error, because it doesn't know where to add the file to. We need to create a .git folder first, and we do that using git init. You only need to do this once for a project, at the very beginning. If a project is already using git, then that .git folder exists and you can run any git command.
The git add command adds the content you specify to a "staging area".
You can add multiple files at the same time to the staging area.
And then, when you call the git commit function, git takes everything that is in the staging area and records it in a new snapshot.
Think of the content in the staging area as the changes you would like your next snapshot to represent.
Before you commit these changes to a new snapshot, you might want to review your changes.
git status is used to review your changes.
And to actually save this snapshot you can use git commit.
Here is another way to look at this
The most important takeaway here is that you use git add to create what you want your next snapshot to look like.
The working directory is the files in your folder.
You make changes to your working directory and git add them to the staging area.
You can add multiple files.
Next, you commit the staged files into a git repository.
git commit updates the .git folder and thereby updates the graph.
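The three areas can be modelled in a few lines of Python. This is only a toy, assuming plain dictionaries stand in for the working directory, the staging area and the history; the names `working_dir`, `index`, `add` and `commit` are chosen for illustration and are not how git is implemented.

```python
working_dir = {"readme": "# farm\n", "notes.txt": "draft\n"}  # files in your folder
index = {}     # staging area: what the next snapshot should look like
commits = []   # stand-in for the history kept inside .git


def add(path):
    """Copy the current content of one file into the staging area."""
    index[path] = working_dir[path]


def commit(message):
    """Record a snapshot of everything that is currently staged."""
    commits.append({"message": message, "files": dict(index)})


add("readme")                                 # stage readme only
working_dir["readme"] = "# farm\nchickens\n"  # keep editing after staging
commit("Add readme")

print(commits[0]["files"])
```

The snapshot holds only `readme`, and it holds the version that was staged, not the later edit and not `notes.txt`. That is the sense in which git add decides what the next snapshot looks like.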
demo
The .git folder is a database of key-value pairs that tracks your project history. You can start a git repository by typing `git init`. When you add a file, git computes a SHA-1 hash from the file's content (plus a short header), and that hash is the key under which the content is stored. Once you commit, a tree object refers to this hash and a commit object refers to the tree. If you create a file on your machine called `readme`, write `# farm` in it, and `git add` it to an empty repo, you should get the following folder under .git/objects:
├── b8
│ └── fa28116ec360ae79b59da237ca991ab31a696e
Computes the hash of the file.
Stores the contents of the README file using the hash of the file to name the file.
Adds a reference to the README file to the git index
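The first two steps can be sketched in Python. git hashes the content together with a small header of the form `blob <size>` followed by a NUL byte, and names the stored object after the result; the helper names `git_blob_id` and `object_path` are made up here, and the sketch leaves out the zlib compression git applies to the stored file.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 of a git blob: the header 'blob <size>' + NUL byte, then the content."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def object_path(object_id: str) -> str:
    """Objects live under .git/objects/<first 2 characters>/<remaining 38>."""
    return ".git/objects/" + object_id[:2] + "/" + object_id[2:]


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(object_path(git_blob_id(b"# farm\n")))
```

Because the id depends only on the content, two files with identical content map to the same object, whatever they are called.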
git only cares about content, not file names. It uses content as a heuristic to find out whether a file has been renamed: if too much of the content changed between two commits in which a rename occurred, git will think it is a different file. You can set this threshold manually; I believe the default is 50%.
When you git commit the staged file, git creates a tree object and a commit object. The blob itself was already written when you ran git add.
git does calculate deltas, though, when it packs objects together, for example when it needs to push. Again, this can be set manually. You can say - take longer than you usually do, but make me the smallest pack you can.
This is the structure of a single commit/snapshot. "Snapshot" is the unofficial name; "commit" is the git internal object.
A tree is like a directory and can link to other trees or blobs. Blobs are like files.
If the bottom file is modified and committed to the repository in a new snapshot, git will still use the same blobs for the unmodified files.
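Here is a small Python sketch of that reuse, assuming a dictionary as the object store and a plain SHA-1 of the content as the key (real git adds a header and uses tree objects; `store`, `put` and `snapshot` are illustrative names).

```python
import hashlib

store = {}  # content-addressed store: id -> content


def put(data: bytes) -> str:
    """Store content under the hash of that content and return the id."""
    object_id = hashlib.sha1(data).hexdigest()
    store[object_id] = data
    return object_id


def snapshot(files: dict) -> dict:
    """A toy 'tree': file name -> id of the blob holding its content."""
    return {name: put(data) for name, data in files.items()}


v1 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma\n"})
v2 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma, edited\n"})

print(v1["a.txt"] == v2["a.txt"])  # True: same blob in both snapshots
print(len(store))                  # 4 objects for 6 file entries
```

Two full snapshots of three files cost four stored objects, because only the modified file needed a new blob.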
There are other things going on here, like the tag which is just a label. Every branch name is just a label too and we'll touch on that later. Now for a quick demo
Big idea #2
Difference between HEAD, Index and Working directory
git branch master
Let's talk about branches. Branches are just labels. The default branch is master. HEAD represents your current commit. When you add a snapshot successfully, git moves the label of the current branch to the new commit, so HEAD now refers to that new commit.
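A toy model of "branches are just labels", in Python. The dictionaries and the functions `commit`, `branch` and `checkout` are stand-ins chosen for this sketch; the point is only that a branch is a name pointing at one commit, and HEAD names the current branch.

```python
commits = {}              # commit id -> parent id (None for the first commit)
refs = {"master": None}   # branch name -> commit id the label points at
HEAD = "master"           # HEAD names the branch you are currently on


def commit(commit_id):
    """Create a commit on the current branch and move that branch's label."""
    commits[commit_id] = refs[HEAD]
    refs[HEAD] = commit_id


def branch(name):
    """A new branch is just a new label on the current commit."""
    refs[name] = refs[HEAD]


def checkout(name):
    """Switching branches only changes which label HEAD follows."""
    global HEAD
    HEAD = name


commit("c1")
commit("c2")
branch("idea")
checkout("idea")
commit("c3")

print(refs)  # {'master': 'c2', 'idea': 'c3'}
```

Creating the branch copied nothing; only the `idea` label moved when the third commit was made, and `master` stayed where it was.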
Big idea #3
branches are just labels
demo
demo showing git mergetool
Best Mergetools
There are mergetools out there other than vim: kdiff3 is open source and free on all platforms, and p4merge is great.
demo
demo showing git rebase
demo
demo showing git rebase interactive
Big idea #4
Local commits are yours to do with what you like
Everything we've seen so far has been local. git pull = git fetch + git merge
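The equation git pull = git fetch + git merge can be sketched with the same kind of toy model. Everything here is an assumption made for illustration: repositories are dictionaries, the label `origin/master` stands for the local copy of the remote's branch, and the merge step only handles the simple fast-forward case.

```python
remote = {"commits": {"c1": None, "c2": "c1", "c3": "c2"},
          "refs": {"master": "c3"}}
local = {"commits": {"c1": None, "c2": "c1"},
         "refs": {"master": "c2", "origin/master": "c2"}}


def fetch(local, remote):
    """Download new commits and update the remote's label; local work is untouched."""
    local["commits"].update(remote["commits"])
    local["refs"]["origin/master"] = remote["refs"]["master"]


def is_ancestor(commits, old, new):
    """Walk parent links from `new` and report whether `old` is on the way."""
    while new is not None:
        if new == old:
            return True
        new = commits[new]
    return False


def merge_fast_forward(local):
    """Move the local master label forward to origin/master when possible."""
    target = local["refs"]["origin/master"]
    if not is_ancestor(local["commits"], local["refs"]["master"], target):
        raise ValueError("histories diverged; a real merge commit would be needed")
    local["refs"]["master"] = target


fetch(local, remote)
after_fetch = dict(local["refs"])
merge_fast_forward(local)

print(after_fetch)    # {'master': 'c2', 'origin/master': 'c3'}
print(local["refs"])  # {'master': 'c3', 'origin/master': 'c3'}
```

After the fetch, only the `origin/master` label has moved. Your own `master` moves in the second step, which is why fetching on its own is always safe.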
demo
demo showing git fetch
Big idea #5
A remote is a special branch, but a branch nonetheless
Difference between working directory, staging area and .git repository
Local history is whatever you make it
Push to master only if tests pass
Write good commit messages!
Discuss workflow with team
Less of this and more of this
Source code for this presentation
- Free
- Fast
- Secure
- Supports multiple non linear workflows
- Easy to learn
git is free, fast, secure and supports different workflows.
And most importantly, it is easy to learn.
Free as in [beer, speech]
Free to download for Windows, OSX, Linux. Free to modify under the GNU General Public License version 2.
Small
The Mozilla project's CVS repository is about 3 GB; it's about 12 GB in Subversion's fsfs format. In Git it's around 300 MB.
git repositories are usually smaller than CVS repositories for the same source code and history. git is efficient at storing content and changes in content.
Fast
git repositories are fast. The table above shows some comparisons of svn vs git. svn requires a central repository to operate, whereas git is entirely local, which means no network latency. It also means that the entire project is on your local machine, i.e. EVERYONE who has cloned the repository has the entire history of the project on their machine as well. That is a distributed backup.
Distributed non linear workflows
Subversion-Style Workflow
git allows you to use it the way you want. You can use it exactly as you would use svn.
Distributed non linear workflows
Integration Manager Workflow
git also allows you to pick and compile specific changes from anyone else who has made their repository public.
Distributed non linear workflows
Dictator and Lieutenants Workflow
This is the workflow followed by the Linux developers, where Linus has a repository on his local machine that no one else has access to.
- git is not GitHub
- git is not Dropbox
- git is not svn
git is different from GitHub. GitHub is a web service that hosts a free, public remote copy of your repository. git is the toolkit that builds the repository. They are not the same, and there are loads of other places where you can store your code remotely: GitLab, Bitbucket etc. You can even set up a computer in your home to act as a remote git server.
git is not Dropbox, and although it can be used for sharing Word documents and PDFs, this may not be such a good idea, because it will not scale well. git works best for text files.
CVS
SVN stores differences in the form of deltas.
Git
git, however, stores snapshots. People often assume that git stores the differences between files, but that is not true. git tracks content and not files. The way git wins is by reusing the same "blob" in different snapshots if the file has not changed.
- git is British English slang for "unpleasant person".
- Linus Torvalds likes to name projects after himself
The Parable by Tom Preston-Werner (co-founder of GitHub)
This article walks through the steps one might take when trying to create a version control system from scratch. In this hypothetical scenario, one starts off by creating multiple folders as backups. The final VCS that is arrived at is very similar to git.
Linus on why he created git