think git – What is git? – git functions





On GitHub: kdheepak89 / think-git

think git

3a253b - Initial commit
This talk is about the software git

What is git?

Most of you have probably heard of git, but if you haven't: git is a version control tool. That's just a fancy way of saying you track the changes you make to a file. You've probably used a version control tool already, or maybe even invented one that works for you.


If you've ever worked on a paper in Word, you've probably created a version control system of your own by saving the file with different "version" tags. People do this because they like having a backup of all their past changes. If you want to delete a paragraph, you keep a copy of the old version just in case. When you are collaborating with someone, this becomes more challenging: tracking changes and merging content added by different people is time consuming. As you can imagine, this is a problem in software development. Multiple people are likely to work on the same file, sometimes even at the same time. Even small software projects can have tens of files in multiple folders. Additionally, developers frequently experiment with features, and this should never affect the "master" copy of the project.


git to the rescue! Essentially, git can be a history of all the changes you have made to every file in the project. Unfortunately, you have to understand how it works to use it effectively. Most people are thrown into using git when they are first introduced to it, and usually don't understand how it works. When something you have never encountered before occurs, you might not know what to do. I know I've deleted projects and downloaded a fresh copy of a repo because I accidentally created a conflict of some sort. With this talk, I'd like to introduce what I'm calling the big ideas of git, which will help you understand how git works.

git is a supplement to your workflow

The first thing to understand is that git is a supplement to your current workflow. You can work the way you usually work, and all you have to do is use git every time you want to save your progress.

Workflow

Edit files
$EDITOR
Save changes
git commit
Before we talk about git, let's talk about workflow. This is a simple example of a workflow. You edit a file and you save the file. Once you reach what you consider a stable state, you can save the file under a different name as a backup, or save it on Dropbox, or however you usually implement a version control system. When you use git, however, you don't have to save a different version of the file. When you are ready to save the current version of the file as a backup, you do that in git by taking a snapshot of the current state. You take a "snapshot" by using a command called git commit. What really happens is that when you create a "snapshot", git records the current state of all the files being tracked in the directory.

git commit

So what does a snapshot look like? Let's assume that a snapshot looks like this circle over here. As I mentioned earlier, a snapshot contains the current state of the file or directory. When git creates a snapshot, it also calculates a number and attaches it to the snapshot. git uses a hashing algorithm (SHA-1) to calculate this number, and it is effectively unique: it is very unlikely that two such numbers will ever be the same. Every snapshot / commit you create has its own number attached to it, and this number is called a HASH. You can typically use the first 6 or 7 characters to uniquely identify a snapshot in your project. Let's say you have created this first commit, and now you make more changes to the file and want to save them. When you make changes to your files and save them using git commit, git creates a new snapshot with a new HASH. git also does a few other things. 1) It lets you write a message that goes along with that commit. By default, a commit has to have a message! 2) When a new commit is created, it saves a reference to the parent commit. A snapshot can have multiple parents, i.e. you can take changes from two commits and merge them into one. The commit HASH depends on the path, i.e. it depends on what the parent is. If you change the parent hash, i.e. change the content in the parent, every commit hash after that will be different. 3) Whatever commit represents the current project state, git stores a reference to that commit in the label HEAD. In the Word document VCS, we created multiple versions but we did not track how we got to each state. git allows us to do that by storing all this information. You may be wondering where git stores all this. We will get to that. This, as you can imagine, forms a graph.
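To make the hash, parent and HEAD ideas concrete, here is a minimal sketch you could run in a terminal. It assumes git is installed; the temporary directory, the file name notes.txt, the commit messages and the "Demo" identity are all made up for illustration.

```shell
# Throwaway repository so nothing touches a real project.
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for these demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# First snapshot.
echo "first draft" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "Initial commit"
first=$(git -C "$repo" rev-parse HEAD)

# Second snapshot, built on top of the first.
echo "second draft" >> "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "Expand notes"
second=$(git -C "$repo" rev-parse HEAD)

# HEAD^ names the parent of the commit that HEAD points to.
parent=$(git -C "$repo" rev-parse 'HEAD^')

echo "first:            $first"
echo "second (HEAD):    $second"
echo "parent of second: $parent"

# The abbreviated form of the hash that most git output shows.
git -C "$repo" rev-parse --short HEAD
```

The parent of the second commit is the first commit, which is exactly the edge that turns a pile of snapshots into a graph.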

What's your story

And like with any graph, you can choose to tell a story. In this case, we want to work on a feature of a piece of software, and while working on it we get a new idea. You can go back to a previous state of the project and branch off to work on the new idea, without affecting your current work. And, when you are ready, you can merge these changes back into your master branch. You can go back and review the individual changes made in any commit. You can even undo changes, or apply them to a different state of your project. You could also discard a branch entirely, or switch between branches. Branches are awesome, and we'll talk more about branches a little later. The point here is that this graph exists. This graph is the documentation of how your project came to be, i.e. the history of your project is a graph that you can traverse at will.

Big idea #1

git history is a graph
So that's big idea number 1: git history is a graph that you can traverse, allowing you to move to any "saved" state in a project.

git functions

git help --all
usage: git [--version] [--help] [-C <path>] [-c name=value]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]

available git commands in '/Applications/Xcode.app/Contents/Developer/usr/libexec/git-core'

  add                       clone                     fast-import               interpret-trailers        notes                     remote-testsvn            submodule
  add--interactive          column                    fetch                     log                       p4                        repack                    subtree
  am                        commit                    fetch-pack                ls-files                  pack-objects              replace                   svn
  annotate                  commit-tree               filter-branch             ls-remote                 pack-redundant            request-pull              symbolic-ref
  apply                     config                    fmt-merge-msg             ls-tree                   pack-refs                 rerere                    tag
  archimport                count-objects             for-each-ref              mailinfo                  patch-id                  reset                     unpack-file
  archive                   credential                format-patch              mailsplit                 prune                     rev-list                  unpack-objects
  bisect                    credential-cache          fsck                      merge                     prune-packed              rev-parse                 update-index
  bisect--helper            credential-cache--daemon  fsck-objects              merge-base                pull                      revert                    update-ref
  blame                     credential-osxkeychain    gc                        merge-file                push                      rm                        update-server-info
  branch                    credential-store          get-tar-commit-id         merge-index               quiltimport               send-email                upload-archive
  bundle                    cvsexportcommit           grep                      merge-octopus             read-tree                 send-pack                 upload-pack
  cat-file                  cvsimport                 gui--askpass              merge-one-file            rebase                    sh-i18n--envsubst         var
  check-attr                cvsserver                 hash-object               merge-ours                receive-pack              shell                     verify-commit
  check-ignore              daemon                    help                      merge-recursive           reflog                    shortlog                  verify-pack
  check-mailmap             describe                  http-backend              merge-resolve             relink                    show                      verify-tag
  check-ref-format          diff                      http-fetch                merge-subtree             remote                    show-branch               web--browse
  checkout                  diff-files                http-push                 merge-tree                remote-ext                show-index                whatchanged
  checkout-index            diff-index                imap-send                 mergetool                 remote-fd                 show-ref                  write-tree
  cherry                    diff-tree                 index-pack                mktag                     remote-ftp                stage
  cherry-pick               difftool                  init                      mktree                    remote-ftps               stash
  citool                    difftool--helper          init-db                   mv                        remote-http               status
  clean                     fast-export               instaweb                  name-rev                  remote-https              stripspace

git commands available from elsewhere on your $PATH

  loglive

'git help -a' and 'git help -g' list available subcommands and some
concept guides. See 'git help <command>' or 'git help <concept>'
to read about a specific subcommand or concept.
So we know that git history is a graph, right? And this graph has to be stored somewhere, right? It is stored in a .git folder. Every folder on your computer that has a .git folder is a git repository, and there is only one such .git folder in a git repo. All git commit does is add content to that .git folder. And git commit is just one function that git provides. What if you wanted to do more, like manipulate an existing snapshot, delete a set of changes, or reorder your history? git offers you other tools to do that. git is essentially a toolkit that contains a bunch of functions that help organize snapshots of content. This is a list of all the functions available to you. Note: git is not GitHub. What is GitHub? GitHub stores git repositories on the internet, and that's basically all it does. So, as I was saying, this is the list of functions.
git help --all

  add                 blame               diff                instaweb            remote
  add--interactive    branch              fetch               log                 reset
  am                  checkout            format-patch        merge               revert
  annotate            cherry-pick         gc                  mv                  rm
  apply               clean               grep                pull                stash
  archive             clone               gui--askpass        push                status
  bisect              commit              help                rebase              submodule
  bisect--helper      daemon              init                reflog              tag
But you don't have to use every one of the functions available. I've only ever had to use a handful. Because git was initially a toolkit for a VCS rather than a full user-friendly VCS, it has a bunch of verbs that do low-level work and were designed to be chained together UNIX style or called from scripts. These commands are generally referred to as “plumbing” commands, and the more user-friendly commands are called “porcelain” commands. https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain

Workflow

git init
Edit files
$EDITOR
Group changes
git add
Review changes
git status
Save changes
git commit
Let's talk about our workflow again. To actually use git, we need to add a couple of steps to the workflow we talked about earlier. We group changes before committing them, and we do this using the git add command. If we open a folder and try to "git add" a file, git will throw an error, because it doesn't know where to add the file. We need to create a .git folder first, and we do that using git init. You only need to do this once for a project, at the very beginning. If a project is already using git, then that .git folder exists and you can run any git command. The git add command adds the content you specify to a "staging area". You can add multiple files to the staging area at the same time. Then, when you call git commit, git takes everything that is in the staging area and puts it into a new snapshot. Think of the content in the staging area as the changes you would like your next snapshot to represent. Before you commit these changes to a new snapshot, you might want to review them, and git status is used for that. And to actually save this snapshot, you use git commit.
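The whole workflow fits in a few lines. This is a minimal sketch in a throwaway directory; the file name readme, the commit message and the "Demo" identity are placeholders, and `--porcelain` is used only because it gives git status a compact, script-friendly format.

```shell
# Throwaway directory standing in for a new project.
repo=$(mktemp -d)

# Hypothetical identity used only for this demo commit.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q "$repo"                          # once per project: creates the .git folder
echo "# farm" > "$repo/readme"               # edit files
git -C "$repo" add readme                    # group changes in the staging area
staged=$(git -C "$repo" status --porcelain)  # review changes
echo "before commit: $staged"                # "A  readme" = new file, staged
git -C "$repo" commit -q -m "Add readme"     # save the staged snapshot
after=$(git -C "$repo" status --porcelain)   # nothing left to report
echo "after commit:  [$after]"
```

Running plain `git status` instead of `git status --porcelain` gives the longer, human-friendly version of the same information.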


Here is another way to look at this. The most important takeaway is that you use git add to build what you want your next snapshot to look like. The working directory is the set of files in your folder. You make changes in your working directory and git add them to the staging area. You can add multiple files. Next, you commit the staged files into the git repository. git commit updates the .git folder, thereby updating the graph.
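One way to see that the staging area really is separate from the working directory is to edit a file again after staging it. In this sketch (file name, contents and identity are invented for the example) the commit records what was staged, not what is currently on disk.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for this demo commit.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "draft one" > "$repo/file.txt"
git -C "$repo" add file.txt                        # staging area now holds "draft one"

echo "draft two, not staged" > "$repo/file.txt"    # working directory moves on

git -C "$repo" commit -q -m "Commit what was staged"

committed=$(git -C "$repo" show HEAD:file.txt)     # content stored in the snapshot
working=$(cat "$repo/file.txt")                    # content in the working directory
pending=$(git -C "$repo" status --porcelain)       # " M file.txt" = modified, not staged

echo "in the commit:        $committed"
echo "in the working dir:   $working"
echo "status after commit:  $pending"
```

A second `git add file.txt` followed by `git commit` would be needed to get the newer draft into the history.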

demo

The .git folder is a database of key-value pairs that tracks your project history. You can start a git repository by typing `git init`. When you add a file, git runs a SHA-1 hash on its content and stores the content as a blob object named after that hash. The tree and commit objects that refer to this hash are created later, when you commit. If you create a file on your machine called `readme`, write `# farm` in it, and `git add` it to an empty repo, you should get the following folder under .git/objects:
├── b8
│   └── fa28116ec360ae79b59da237ca991ab31a696e
In other words, git add does three things: it computes the hash of the file, it stores the contents of the readme file using that hash as the name, and it adds a reference to the readme file to the git index. git only cares about content. It uses content as a heuristic to find out if a file has been renamed: if too much of the content has changed between two commits when a rename occurred, git will think it is a different file. You can set this threshold manually; I believe the default is 50%. When you git commit the staged file, git creates a tree and a commit object that point to the blob. git does calculate diffs (deltas) though, but only when it packs objects, for example when it needs to push. Again, this can be tuned manually. You can say: take longer than you usually do, but make me the smallest pack you can.
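You can watch git add create that object yourself. This sketch uses the plumbing commands `git hash-object` and `git cat-file`; the temporary directory is an assumption of the example. If your readme matches the one in the talk byte for byte, the hash should match the folder listing above, but the script does not rely on that.

```shell
repo=$(mktemp -d)
git init -q "$repo"
echo "# farm" > "$repo/readme"

# Ask git which hash this content gets, without storing anything yet.
hash=$(git -C "$repo" hash-object readme)

# Staging the file writes the blob into the object database.
git -C "$repo" add readme

# Objects live at .git/objects/<first two hex chars>/<remaining chars>.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
ls "$repo/.git/objects/$dir/$file"

kind=$(git -C "$repo" cat-file -t "$hash")      # type of the stored object
content=$(git -C "$repo" cat-file -p "$hash")   # its content
echo "$hash is a $kind containing: $content"
```

Note that no commit has been made at this point: the blob exists as soon as the file is staged.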


This is the structure of a single commit/snapshot. "Snapshot" is the unofficial name; "commit" is the name of git's internal object. A tree is like a directory and can link to other trees or blobs. Blobs are like files.
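The commit, tree and blob layers can be inspected with `git cat-file`. The following is a sketch with an invented layout (a readme plus one file in a src folder) and a placeholder identity; it walks from the commit down to a blob.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for this demo commit.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

mkdir "$repo/src"
echo "# farm" > "$repo/readme"
echo "cow" > "$repo/src/animals.txt"
git -C "$repo" add readme src/animals.txt
git -C "$repo" commit -q -m "Add readme and animals"

# The commit object: points at one tree, plus author, committer and message.
git -C "$repo" cat-file -p HEAD

# The top-level tree: one blob entry (readme) and one tree entry (src).
tree=$(git -C "$repo" rev-parse 'HEAD^{tree}')
git -C "$repo" cat-file -p "$tree"

# Object types at each level.
commit_type=$(git -C "$repo" cat-file -t HEAD)
tree_type=$(git -C "$repo" cat-file -t "$tree")
subtree_type=$(git -C "$repo" cat-file -t HEAD:src)
blob_type=$(git -C "$repo" cat-file -t HEAD:readme)
echo "$commit_type -> $tree_type -> $subtree_type / $blob_type"
```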


If the bottom file was modified and committed to the repository in a new snapshot, git will still use the same blobs for the unmodified files. There are other things going on here, like the tag, which is just a label. Every branch name is just a label too, and we'll touch on that later. Now for a quick demo.
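Blob reuse is easy to check: compare the blob hashes of each file across two commits. In this sketch the two file names, their contents and the identity are made up; only b.txt changes between the snapshots.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for these demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "unchanged content" > "$repo/a.txt"
echo "version 1" > "$repo/b.txt"
git -C "$repo" add a.txt b.txt
git -C "$repo" commit -q -m "First snapshot"

echo "version 2 of this file" > "$repo/b.txt"   # modify only b.txt
git -C "$repo" add b.txt
git -C "$repo" commit -q -m "Second snapshot"

# <commit>:<path> resolves to the blob stored for that path in that commit.
a_before=$(git -C "$repo" rev-parse HEAD~1:a.txt)
a_after=$(git -C "$repo" rev-parse HEAD:a.txt)
b_before=$(git -C "$repo" rev-parse HEAD~1:b.txt)
b_after=$(git -C "$repo" rev-parse HEAD:b.txt)

echo "a.txt: $a_before -> $a_after"   # same blob in both snapshots
echo "b.txt: $b_before -> $b_after"   # a new blob for the new content
```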

demo

Big idea #2

Difference between HEAD, index and working directory

Branches

git branch master

Let's talk about branches. Branches are just labels. The default branch is usually called master (newer setups may call it main). HEAD represents your current commit. When you add a snapshot successfully, git moves the current branch label to the new commit, and HEAD follows along to that new, current commit.
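Here is a small sketch of the "label" behaviour, with invented file names and identity. The default branch name is read from git instead of being assumed, since it depends on your configuration.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for these demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "one" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "First commit"

# Name of the branch that is currently checked out (master or main).
current=$(git -C "$repo" rev-parse --abbrev-ref HEAD)

# Creating a branch only adds a second label on the same commit.
git -C "$repo" branch feature
at_creation=$(git -C "$repo" rev-parse feature)
head_at_creation=$(git -C "$repo" rev-parse HEAD)

# A new commit moves the checked-out branch label; "feature" stays put.
echo "two, a longer line" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "Second commit"

echo "$current -> $(git -C "$repo" rev-parse --short "$current")"
echo "feature -> $(git -C "$repo" rev-parse --short feature)"
```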

git checkout -b feature

git checkout master

git merge feature
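The branch slides above can be replayed in a throwaway repository. This is a minimal sketch that assumes the branch merged at the end is `feature`; the file names, messages and identity are made up, and the default branch name is read from git rather than assumed.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for these demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "base" > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "Base commit"
main_branch=$(git -C "$repo" rev-parse --abbrev-ref HEAD)

# Branch off and work on the new idea.
git -C "$repo" checkout -q -b feature
echo "new idea" > "$repo/feature.txt"
git -C "$repo" add feature.txt
git -C "$repo" commit -q -m "Work on feature"

# Switch back; the default branch also gains a commit of its own.
git -C "$repo" checkout -q "$main_branch"
echo "other work" > "$repo/main.txt"
git -C "$repo" add main.txt
git -C "$repo" commit -q -m "Work on default branch"

# Merge: the new commit has two parents, one per line of history.
git -C "$repo" merge -q -m "Merge feature" feature
first_parent=$(git -C "$repo" rev-parse 'HEAD^1')
second_parent=$(git -C "$repo" rev-parse 'HEAD^2')
feature_tip=$(git -C "$repo" rev-parse feature)

git -C "$repo" log --oneline --graph
```

If the default branch had not moved in the meantime, git would simply fast-forward the label and no merge commit would be created.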

Big idea #3

branches are just labels

Merge Conflict

demo

demo showing git mergetool

Best Mergetools

There are mergetools out there other than vim. kdiff3 is open source and free on all platforms. p4merge is great.

Changing history

git rebase master

demo

demo showing git rebase
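For anyone following along without the live demo, here is a rough stand-in, not the demo from the talk. It assumes a `feature` branch that forked before the default branch gained a commit; all file names, messages and the identity are invented. Rebasing replays the feature commit on top of the default branch, which gives it a new parent and therefore a new hash.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical identity used only for these demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "base" > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "Base commit"
main_branch=$(git -C "$repo" rev-parse --abbrev-ref HEAD)

# One commit on a feature branch.
git -C "$repo" checkout -q -b feature
echo "new idea" > "$repo/feature.txt"
git -C "$repo" add feature.txt
git -C "$repo" commit -q -m "Work on feature"

# Meanwhile, one commit on the default branch.
git -C "$repo" checkout -q "$main_branch"
echo "other work" > "$repo/main.txt"
git -C "$repo" add main.txt
git -C "$repo" commit -q -m "Work on default branch"

# Replay the feature commit on top of the default branch.
git -C "$repo" checkout -q feature
old_tip=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" rebase -q "$main_branch"
new_tip=$(git -C "$repo" rev-parse HEAD)
new_parent=$(git -C "$repo" rev-parse 'HEAD^')

echo "feature before rebase: $old_tip"
echo "feature after rebase:  $new_tip"
git -C "$repo" log --oneline --graph
```

The history is now a straight line, and the old feature commit is no longer reachable from any branch.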

demo

demo showing git rebase interactive

Big idea #4

Local commits are yours to do with what you like

Remote


Everything we've seen so far has been local.

git pull = git fetch + git merge
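The two halves of a pull can be run separately to see what each one does. This sketch fakes the remote with a second local repository called "upstream" (an assumption made so the example needs no network); names, files and identity are placeholders.

```shell
work=$(mktemp -d)

# Hypothetical identity used only for these demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# "upstream" plays the role of the remote repository.
git init -q "$work/upstream"
echo "one" > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m "First commit"

# Our local copy; clone registers upstream under the name "origin".
git clone -q "$work/upstream" "$work/local"
branch=$(git -C "$work/local" rev-parse --abbrev-ref HEAD)

# Someone adds a commit to the remote after we cloned.
echo "two" >> "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m "Second commit"

# Step 1: fetch updates origin/<branch> but leaves our own branch alone.
git -C "$work/local" fetch -q origin
before=$(git -C "$work/local" rev-parse HEAD)
remote_tip=$(git -C "$work/local" rev-parse "origin/$branch")
echo "local:  $before"
echo "remote: $remote_tip"

# Step 2: merge brings our branch up to date (a fast-forward here).
git -C "$work/local" merge -q "origin/$branch"
after=$(git -C "$work/local" rev-parse HEAD)
echo "local after merge: $after"
```

Between the two steps, `origin/<branch>` behaves like any other branch label: you can log it, diff against it, or decide not to merge it at all.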

demo

demo showing git fetch

Big idea #5

A remote is a special branch, but a branch nonetheless

Review of big ideas

git history is a graph

Difference between working directory, staging area and .git repository

Branches are just labels

Remote is a branch too

Local history is whatever you make it

Best practices

Commit related changes

Commit often

Branches are inexpensive

Push to master only if tests pass

Write good commit messages!

Discuss workflow with team

Less of this and more of this

Source code for this presentation

Additional slides

Advantages of git

  • Free
  • Fast
  • Secure
  • Supports multiple non linear workflows
  • Easy to learn
git is free, fast, secure and supports different workflows. And most importantly, it is easy to learn.

Free as in [beer, speech]


Free to download for Windows, OSX, Linux. Free to modify under the GNU General Public License version 2.

Small

The Mozilla project's CVS repository is about 3 GB; it's about 12 GB in Subversion's fsfs format. In Git it's around 300 MB. A git repository is usually smaller than the CVS repository for the same source code and history. git is efficient at storing content and changes in content.

Fast

git is fast. The table above shows some comparisons of svn vs git. svn requires a central repository to operate, whereas most git operations are entirely local. This means no network latency. It also means a few other things: the entire project is on your local machine, i.e. EVERYONE who has cloned the repository has the entire history of the project on their machine as well. Distributed backup.

Distributed non linear workflows

Subversion-Style Workflow

git allows you to use it the way you want. You can use it exactly the way you would use svn.

Distributed non linear workflows

Integration Manager Workflow

git also allows you to pick and combine specific changes from anyone else who has made their repository public.

Distributed non linear workflows

Dictator and Lieutenants Workflow

This is the workflow followed by the Linux devs, where Linus has a repository on his local machine that no one else has access to.

  • git is not GitHub
  • git is not Dropbox
  • git is not svn
git is different from GitHub. GitHub is a web service that hosts a free, public remote copy of your repository. git is the toolkit that builds the repository. They are not the same, and there are loads of other places you can store your code remotely: GitLab, Bitbucket etc. You can even set up a computer in your home to act as a remote git server. git is not Dropbox either. Although it can be used for sharing Word documents and PDFs, this may not be such a good idea, because it will not scale well. git works best for text files.

CVS

SVN stores differences in the form of deltas.

Git

git, however, stores snapshots. People often assume that git stores the differences between files, but that is not true. git tracks content, not files. The way git wins is by reusing the same "blob" in different snapshots if the file has not changed.

A brief history of Git

  • git is British English slang for "unpleasant person".

- Linus Torvalds likes to name projects after himself

The Parable by Tom Preston-Werner (co-founder of GitHub)

This article goes through the steps one might take to create a version control system from scratch. In this hypothetical scenario, one starts off by creating multiple folders as backups. The final VCS that is arrived at is very similar to git. The author is a co-founder of GitHub.

git blame README

Linus on why he created git