This article explains to beginners and intermediate Linux users how to install, configure and use Git. It's also for those who are comfortable with Linux but hesitate to use Git because they are not familiar with it.
Git is a distributed version control system (DVCS) created by Linus Torvalds. It's mostly used by developers, but it can also be used to store your dot files (a dot file begins with a '.', and the term generally refers to .bashrc, .vimrc or other such set-up/configuration files), important scripts, etc.
It differs from many other version control systems because it does not store data as a list of per-file changes. Instead, when you save your project in Git, it takes a snapshot of how the files look at that instant and stores a reference to that snapshot. The core of Git is a key-value data store. No matter what data you insert into Git, it checksums it using the SHA-1 algorithm and creates a 40-character hex key. Therefore, the name of the file is not really relevant to how Git stores its content.
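To make the key-value idea concrete, here is a minimal sketch of how the 40-character key can be reproduced by hand. It assumes the default SHA-1 object format and that the sha1sum tool is installed; the sample text is made up for illustration.

```shell
# Hypothetical sample content; any text works.
content='my first line of text'

# Size in bytes of the content plus its trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Git hashes a short header ("blob <size>" and a NUL byte) followed by the content.
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the key it would use for the same content.
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "$manual"
echo "$via_git"
```

Both lines should print the same key, and no file name is involved anywhere in the calculation.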
GitHub is a Web-based hosting service; it offers a remote Git repository where you can host your files.
Note: You don't need GitHub to use Git. You can host your remote repository on a company server as well.
Git terminology
Let's get familiar with some terminology:
- Working tree: This is the directory in which you put the files that you want Git to manage. This is where you edit files before they are staged, committed and pushed to the remote repository. Basically, this could be any directory on your local file system that has a Git repository associated with it. When you run the git init command in a normal directory, it transforms that directory into a working tree (working directory), because the command creates a .git sub-directory right inside the directory in which you ran it. We'll soon see how this is done.
- Repository (Git directory): The repository, also called the Git directory, is where all your game-changing Git files are stored and where all the Git goodness starts. It stores all the metadata, objects, etc, for your project. A local repository exists on your laptop and is associated with your working tree. It's the .git directory that gets created when you run git init, as stated earlier.
- A remote repository exists outside of your laptop, somewhere on a remote server. In our case, it will be on GitHub.
- Git objects: There are different kinds of objects that Git uses, such as:
Tree – This contains one or more trees and blobs. A tree is like a file system directory, and it can point to other Git trees. Think of it as a directory holding other sub-directories (trees) and files (blobs).
Blob – This holds the contents of a normal file, such as a text file, a source code file, or even a picture. A blob stores only the file contents; the file name is recorded in the tree that points to it. Basically, a blob can be thought of as something that's not a directory.
Commit – This is generated the moment you run the git commit command. It has the metadata as well as a pointer to the root tree of the project, so that the snapshots can be recreated whenever needed.
- Git index: This is the staging area of Git. When you run the git add command, you add files to this staging area. When you run the git commit command, these files are committed to your local Git database. You then push the commits to the remote repository.
Note: The index is NOT the repository, nor is it the working tree/directory.
- Git commit: This is a point-in-time snapshot of the files you have staged. You run the git commit command and the files are committed to the local Git database, as stated earlier. Git checksums the staged files and directories and creates a commit object that points to them.
- Branch: This is a pointer to a commit. The default branch name in the Git version used here is master (newer versions let you configure it). The master branch appears the first time we do a commit. If we do another commit, master then points to this new commit.
There is other terminology, but this is enough for starters.
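The object types in the list above can be checked with the git cat-file -t command. What follows is only a sketch: it works in a throwaway directory, and the file name and the identity are made up for illustration.

```shell
# Throwaway repository so nothing in your real directories is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt                          # copies the content into the index
git commit -q -m "First commit"            # creates the commit and tree objects

git cat-file -t HEAD                       # type of the latest commit object
git cat-file -t 'HEAD^{tree}'              # type of the root tree it points to
git cat-file -t HEAD:notes.txt             # type of the file content
git symbolic-ref --short HEAD              # the branch that points to the commit
```

The three cat-file commands print commit, tree and blob, in that order.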
Opening a GitHub account
So let's open a GitHub account first. Head to www.github.com and sign up. It's free.
Note: Please note down the e-mail address and the password you use to create this account. We will need them in a while when we configure Git on the laptop or desktop.
Once registration is complete, you can sign in. On the top right side of the page, you will see the name you chose and there will be a + sign next to it. Click on it and then on the new repository option. Take a look at the screenshot in Figure 1.
On the next page, choose any name for your repository. Mine's called mydotfiles. Provide a brief description of your repository. Please do not select the 'Initialise this repository with a README' option because we will be creating a README ourselves.
Click on the Create repository button and when the next page loads, make a note of the URL shown.
Figure 3 shows how it looks in my case. This is a screenshot of the top half of the page that loads up when you hit the Create repository button showing the Git remote repository name.
Important note: In this tutorial, the directory on your laptop has the same name as the repository on GitHub, so you'll either need an existing directory with that name or you'll need to create one. Git does not strictly require the names to match, but keeping them the same makes it easy to see which local directory is synced with which repository on GitHub.
Installing and configuring Git for first time use
Install Git by running:
yum install git
Once you have it installed, run the git --version command to see which version you are on. This is how it looks on my laptop:
[pmu@t430 ~]$ git --version
git version 1.8.3.1
[pmu@t430 ~]$
Now, let's configure Git for first-time use. We will set up the name and e-mail address on our local laptop or desktop. This step has to be run only the very first time we set up Git on a laptop or desktop.
Important: Please ensure that you use the same e-mail ID that you used to create the GitHub account.
[pmu@t430 ~]$ git config --global user.name pmu
[pmu@t430 ~]$ git config --global user.email pmu.rwx@gmail.com
Check the Git man page to see what other options can be set. Once done, you can check the options with the following command:
[pmu@t430 ~]$ git config --list
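As a small aside, the same keys can also be set for a single repository by leaving out --global, and a single value can be read back with --get. Here is a sketch in a throwaway repository; the name and e-mail address are placeholders, not real accounts.

```shell
# Throwaway repository; your global settings are left alone.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Without --global, the values are stored in this repository's .git/config only.
git config user.name "Example User"
git config user.email "user@example.com"

# Read back one setting instead of listing everything.
git config --get user.email
```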
Initialising Git
We will now create a directory with the same name as that of the repository.
[pmu@t430 ~]$ mkdir mydotfiles
Let's get into the directory and initialise Git by running the git init command. Once you do that, there should be a .git directory created in there with a few files and directories under it. In Git terminology, the mydotfiles directory has now become a working tree. Now, the mydotfiles directory and everything under it can be managed by Git and uploaded to GitHub.
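The initialisation step can be sketched as follows. To keep the example harmless, it creates mydotfiles under a temporary directory (an assumption made for this sketch) rather than in your home directory.

```shell
# Work under a temporary directory for this sketch.
base="$(mktemp -d)"
cd "$base"

mkdir mydotfiles
cd mydotfiles
git init -q                            # creates the .git sub-directory

ls -a                                  # the listing should now include .git
git rev-parse --is-inside-work-tree    # prints "true" inside a working tree
```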
Creating files for some Git action
We will create the Readme file now and add some content to it. I created a Readme.txt file in Vim, saved it and then ran the more command to show the output in the terminal.
[pmu@t430 mydotfiles]$ vim Readme.txt
[pmu@t430 mydotfiles]$ more Readme.txt
This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
You can add anything you want to the file.
Now let's run the git status command. This is a very helpful command that lets us know exactly what stage we are at.
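Here is a rough sketch of what git status reports before and after a file is added, using the compact --short format. The repository is a throwaway one and the file content is made up.

```shell
# Throwaway repository for the sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q

printf 'This is a Readme file.\n' > Readme.txt

before=$(git status --short)   # "??" marks a file Git is not tracking yet
git add Readme.txt             # stage the file (put it in the index)
after=$(git status --short)    # "A" marks a newly staged file

echo "$before"
echo "$after"
```

The first line printed is `?? Readme.txt` and the second is `A  Readme.txt`.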
Now let's look at the concept that's unique to Git. The moment we add a file with the git add Readme.txt command, Git creates a hash checksum of its content and refers to the content using that checksum. In other words, we call the file Readme.txt but Git refers to it by its checksum. If we were to look for files in a directory, we'd run the ls command. With Git, we run the git ls-files --stage command.
[pmu@t430 mydotfiles]$ git ls-files --stage
100644 b70f72952f495b2aae83f2ff1a50b5ee8d001edb 0 Readme.txt
[pmu@t430 mydotfiles]$
Look at the long string – b70f72952f495b2aae83f2ff1a50b5ee8d001edb. That's how Git refers to what we call the Readme.txt file. So what does this file have? Well, we could check it as follows:
[pmu@t430 mydotfiles]$ git show b70f
This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
That's the Git equivalent of more Readme.txt or cat Readme.txt. That's exactly the same content we have in our Readme.txt file. Note that we can supply just the first four characters of the hash, as long as they are unambiguous within the repository. So this means that Git doesn't really track or manage file content using the filename we give it. It just cares about the hashed checksum. In other words, if two or more files have exactly the same content, then the hash checksum generated for those files will be exactly the same.
The following example will make it even clearer.
Let's copy the Readme.txt file under another name without changing any of its content.
[pmu@t430 mydotfiles]$ cp Readme.txt Oncemore.txt
[pmu@t430 mydotfiles]$ ls -l
total 8
-rw-rw-r--. 1 pmu pmu 72 Sep 22 21:03 Oncemore.txt
-rw-rw-r--. 1 pmu pmu 72 Sep 22 20:53 Readme.txt
[pmu@t430 mydotfiles]$
Let's run the git status command and see how it responds.
[pmu@t430 mydotfiles]$ git status
# On branch master
# Initial commit
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file: Readme.txt
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#   Oncemore.txt
[pmu@t430 mydotfiles]$
It shows the newly created file. Note that it states Readme.txt as a new file, because we just added it with the git add Readme.txt command. We also have a copy of that file with the name Oncemore.txt. Since we have still not added it, it shows up as an untracked file.
Will Git show some different output with the git ls-files --stage command this time?
[pmu@t430 mydotfiles]$ git ls-files --stage
100644 b70f72952f495b2aae83f2ff1a50b5ee8d001edb 0 Readme.txt
[pmu@t430 mydotfiles]$
No, it does not. So, let's go ahead and add the Oncemore.txt file. We will use a '.' this time so that it adds all the files (Readme.txt and Oncemore.txt).
[pmu@t430 mydotfiles]$ git add .
[pmu@t430 mydotfiles]$ git status
# On branch master
# Initial commit
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#   new file: Oncemore.txt
#   new file: Readme.txt
[pmu@t430 mydotfiles]$
Note how the Oncemore.txt file now shows up as a new file instead of as untracked. So, how does Git see these two files? Let's find out again with the git ls-files --stage command.
[pmu@t430 mydotfiles]$ git ls-files --stage
100644 b70f72952f495b2aae83f2ff1a50b5ee8d001edb 0 Oncemore.txt
100644 b70f72952f495b2aae83f2ff1a50b5ee8d001edb 0 Readme.txt
[pmu@t430 mydotfiles]$
Well, it's basically two files with different names, but with the same checksum.
The following output confirms it:
[pmu@t430 mydotfiles]$ git ls-files --stage | awk '{print $2}' | sort | uniq
b70f72952f495b2aae83f2ff1a50b5ee8d001edb
[pmu@t430 mydotfiles]$
Indeed, the checksum value is the same; it's just that there are two file names associated with it. The git show command lets us peek into the file using the checksum value. If you and I were to see the file content, we would use cat/more/less or something like that. But Git uses the git show command and we supply the checksum to it.
[pmu@t430 mydotfiles]$ git show b70f
This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
If you were expecting to see the content mentioned twice, you're still thinking the filename way, which is not how Git works. For Git, it's just b70f72952f495b2aae83f2ff1a50b5ee8d001edb. This is what Git calls a blob.
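One way to convince yourself that two identical files share a single blob is to count the objects Git has actually stored. This is only a sketch, run in a throwaway repository with made-up content.

```shell
# Throwaway repository for the sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q

printf 'same content in both files\n' > Readme.txt
cp Readme.txt Oncemore.txt
git add .                      # stage both files

git ls-files --stage           # two names, one checksum
git count-objects              # number of loose objects stored so far

objects=$(git count-objects | cut -d' ' -f1)
unique=$(git ls-files --stage | awk '{print $2}' | sort -u | wc -l | tr -d ' ')
echo "stored objects: $objects, unique checksums: $unique"
```

Both counts come out as 1, even though two files were added.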
Let's leave Oncemore.txt alone for a while now and just focus on Readme.txt.
We'll now run the git commit command, which will add the file to the local Git repository.
[pmu@t430 mydotfiles]$ git commit -m "Adding my first file - Readme.txt" Readme.txt
[master (root-commit) 26ab994] Adding my first file - Readme.txt
 1 file changed, 1 insertion(+)
 create mode 100644 Readme.txt
[pmu@t430 mydotfiles]$
The output shows that the Readme.txt file is now committed to the repository. The part added in quotes after -m is the message or comment. The output above tells us that there is one insertion, and in create mode 100644 the last three digits, 644, indicate the permissions of the file. So what's the number 26ab994? Let's find out using the git show command once again:
[pmu@t430 mydotfiles]$ git show 26ab994
commit 26ab994663499b21d8e2de7fc0f53925954fae7c
Author: pmu <pmu.rwx@gmail.com>
Date:   Mon Sep 22 21:08:18 2014 +0530

    Adding my first file - Readme.txt

diff --git a/Readme.txt b/Readme.txt
new file mode 100644
index 0000000..b70f729
--- /dev/null
+++ b/Readme.txt
@@ -0,0 +1 @@
+This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
That is the commit object's hash checksum. The commit is the Git object that has the author's name, the date and time, the comment, and some more information; git show also prints the changes the commit introduced, including the checksum of the committed file content. Note that in the line index 0000000..b70f729, the last seven digits are the first seven digits of the checksum that Git created for our Readme.txt file.
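The git show output is a friendly view of the commit. To see what the commit object itself stores, git cat-file -p can be used. This is a sketch in a throwaway repository, with a made-up identity and file content.

```shell
# Throwaway repository for the sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

printf 'This is a Readme file.\n' > Readme.txt
git add Readme.txt
git commit -q -m "Adding my first file - Readme.txt"

# Print the raw commit object: a tree line, author, committer and the message.
git cat-file -p HEAD

# The first line of the commit names the tree (the snapshot) it points to.
tree_in_commit=$(git cat-file -p HEAD | sed -n '1s/^tree //p')
echo "$tree_in_commit"
```

Note that the commit does not mention Readme.txt at all; it points to a tree, and the tree lists the file.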
Let's run the git status command again.
[pmu@t430 mydotfiles]$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#   new file: Oncemore.txt
[pmu@t430 mydotfiles]$
This makes sense, because we committed only the Readme.txt file, but didn't commit the Oncemore.txt file. Git's telling us exactly that: we still have not committed the Oncemore.txt file.
Pushing to GitHub
Now, it's time to push our Readme.txt file onto GitHub, which is our remote repository. Well, first let's check if we already have a remote repository configured.
[pmu@t430 mydotfiles]$ git remote -v
[pmu@t430 mydotfiles]$
No, we don't have one. So let's go ahead and add the remote repository. Do you remember that in the beginning of the tutorial, we had noted down the repository URL? That's the one we will use. The command for that is git remote add origin followed by the URL of your GitHub repository (ending in .git).
In my case it will look like what follows:
[pmu@t430 mydotfiles]$ git remote add origin https://github.com/ugrankar/mydotfiles.git
Let's see what the git remote -v command says this time.
[pmu@t430 mydotfiles]$ git remote -v
origin https://github.com/ugrankar/mydotfiles.git (fetch)
origin https://github.com/ugrankar/mydotfiles.git (push)
[pmu@t430 mydotfiles]$
It shows the repository added.
Here's what the command means:
- git remote add: This adds a remote repository to Git's configuration.
- origin: This is the conventional name for the remote location (git clone uses it by default), so that you can use it instead of typing the lengthy https://github.com/ugrankar/mydotfiles.git line. I could have used the word pmu instead of origin, or even abracadabra. It doesn't matter. But origin is the name that's used most often, so we will use that.
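To see that the remote name is only a label, here is a sketch that registers the same URL under two names. The URL is a placeholder for illustration; git remote add does not contact the server, so nothing is uploaded.

```shell
# Throwaway repository for the sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder URL; substitute the one you noted down from GitHub.
url="https://github.com/example-user/mydotfiles.git"

git remote add origin "$url"
git remote add abracadabra "$url"   # any other name works just as well

git remote -v                       # lists both names with the same URL
git config --get remote.abracadabra.url
```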
Before we run the next command, open up your GitHub repository in your browser.
Figure 4 shows how mine looks.
Now, let's push the file to GitHub using the git push origin master command.
[pmu@t430 mydotfiles]$ git push origin master
Note: You'll have to use the same username and password that you used to create and access your GitHub account. There's a method to add SSH keys, but let's go this way for starters.
Refresh the GitHub page and you'll see your Readme.txt file there. Note the line that says latest commit 26ab994663, which is the commit object.
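If you'd like to practise pushing without a GitHub account, a bare repository on your own disk can stand in for the remote. The following is a sketch under that assumption; the paths are temporary and the identity is made up.

```shell
# A bare repository plays the role of the remote server.
remote="$(mktemp -d)/mydotfiles.git"
git init -q --bare "$remote"

# A throwaway working tree with one commit.
work="$(mktemp -d)"
cd "$work"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
printf 'This is a Readme file.\n' > Readme.txt
git add Readme.txt
git commit -q -m "Adding my first file - Readme.txt"

git remote add origin "$remote"
branch="$(git symbolic-ref --short HEAD)"  # master or main, depending on your Git set-up
git push -q origin "$branch"

# Ask the remote which commits it now has.
git ls-remote origin
```

The checksum printed by git ls-remote should match the output of git rev-parse HEAD in the working tree.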
Let's get back to the output of the git push command.
Note the line where it mentions Counting objects and Total 3 (delta 0). That's three objects. So what are these objects?
Git has a command, git rev-list --objects --all, that lists all the objects.
[pmu@t430 mydotfiles]$ git rev-list --objects --all
The git rev-list --objects --all command shows all the objects.
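The three objects can also be labelled by type in one go. Here is a sketch, run in a throwaway repository with a single commit of a single file; the identity and content are made up.

```shell
# Throwaway repository with one commit of one file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
printf 'This is a Readme file.\n' > Readme.txt
git add Readme.txt
git commit -q -m "Adding my first file - Readme.txt"

# Print the type next to every object that rev-list reports.
git rev-list --objects --all | while read -r id path; do
    printf '%s %s %s\n' "$(git cat-file -t "$id")" "$id" "$path"
done

count=$(git rev-list --objects --all | wc -l | tr -d ' ')
echo "objects: $count"
```

The list has one commit, one tree and one blob, which is where the Total 3 comes from.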
So let's check them one by one using the git show command.
[pmu@t430 mydotfiles]$ git show 26ab
commit 26ab994663499b21d8e2de7fc0f53925954fae7c
Author: pmu <pmu.rwx@gmail.com>
Date:   Mon Sep 22 21:08:18 2014 +0530

    Adding my first file - Readme.txt

diff --git a/Readme.txt b/Readme.txt
new file mode 100644
index 0000000..b70f729
--- /dev/null
+++ b/Readme.txt
@@ -0,0 +1 @@
+This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
So that's the commit object.
Let's check more:
[pmu@t430 mydotfiles]$ git show 82b77
tree 82b77

Readme.txt
That's the tree object. It records what the mydotfiles directory, as I would call it, contained at the time of the commit. Don't confuse it with the working tree, which is the directory on disk.
And now for the next entry:
[pmu@t430 mydotfiles]$ git show b70f
This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
That's our Readme.txt file.
Note: Git says the tree has just the Readme.txt file in it. But we also had the Oncemore.txt file in there, didn't we? Yes, we did, but we had just added the file, not committed it.
So let's commit the Oncemore.txt file and see what we get:
[pmu@t430 mydotfiles]$ git commit -m "Now adding the Oncemore.txt file" Oncemore.txt
[master 2aa51b8] Now adding the Oncemore.txt file
 1 file changed, 1 insertion(+)
 create mode 100644 Oncemore.txt
[pmu@t430 mydotfiles]$
Run the git status command like the last time.
[pmu@t430 mydotfiles]$ git status
# On branch master
nothing to commit, working directory clean
[pmu@t430 mydotfiles]$
The output states that we have nothing to commit. Let's check the objects we have:
[pmu@t430 mydotfiles]$ git rev-list --objects --all
There are now five objects. The first and the third in the list above seem to be new, so lets check them out.
[pmu@t430 mydotfiles]$ git show 2aa5
commit 2aa51b8cae195fb0d786115898179b20b51a094b
Author: pmu <pmu.rwx@gmail.com>
Date:   Mon Sep 22 21:20:26 2014 +0530

    Now adding the Oncemore.txt file

diff --git a/Oncemore.txt b/Oncemore.txt
new file mode 100644
index 0000000..b70f729
--- /dev/null
+++ b/Oncemore.txt
@@ -0,0 +1 @@
+This is a Readme file, my first file that I'll try to upload on GitHub.
[pmu@t430 mydotfiles]$
[pmu@t430 mydotfiles]$ git show 6116
tree 6116

Oncemore.txt
Readme.txt
[pmu@t430 mydotfiles]$
Note that 2aa5 is the commit object created after we ran the git commit command for Oncemore.txt and 6116 is the new tree object, which now shows two files, Readme.txt and Oncemore.txt.
Note that the checksum starting with b70f is now associated with the Oncemore.txt file.
We can do a git push using the command below.
[pmu@t430 mydotfiles]$ git push origin master
Take a look at your GitHub page and it should show you that it's uploaded. Figure 6 shows how my page looks now.
Note how the comment for Oncemore.txt is the same as it was for Readme.txt. We had committed Oncemore.txt with the comment 'Now adding the Oncemore.txt file', so why does the page retain the comment that we used while committing Readme.txt? Once again, for Git, the file name does not matter. It sees that the commit is for the same blob and hence retains the comment.
This time, it uploads two objects: the commit object (for Oncemore.txt) and the tree object. Why the tree object again? That's because the tree gets a new checksum after we added the Oncemore.txt file to it. Run the git show command with the four-character hex and you'll see what I mean.
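The point about the tree getting a new checksum while the blob stays the same can be checked with a short sketch. It runs in a throwaway repository; the identity and the file content are made up.

```shell
# Throwaway repository for the sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

printf 'This is a Readme file.\n' > Readme.txt
git add Readme.txt
git commit -q -m "Adding my first file - Readme.txt"

cp Readme.txt Oncemore.txt
git add Oncemore.txt
git commit -q -m "Now adding the Oncemore.txt file"

old_tree=$(git rev-parse 'HEAD~1^{tree}')  # tree of the first commit
new_tree=$(git rev-parse 'HEAD^{tree}')    # tree of the second commit
echo "old tree: $old_tree"
echo "new tree: $new_tree"

git ls-tree HEAD                           # two names pointing to the same blob

# 2 commits + 2 trees + 1 shared blob
total=$(git rev-list --objects --all | wc -l | tr -d ' ')
echo "objects: $total"
```

The two tree checksums differ, the blob checksum listed by git ls-tree is the same for both files, and the total is five objects.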
Now, let's try to change the content of the Oncemore.txt file and see what happens.
I've added a line so that the Oncemore.txt file looks like what follows:
[pmu@t430 mydotfiles]$ more Oncemore.txt
This is a Readme file, my first file that I'll try to upload on GitHub.
This is a new line added to the Oncemore.txt file. It's not there in the Readme.txt file.
[pmu@t430 mydotfiles]$
Let's run git status and see what it tells us:
[pmu@t430 mydotfiles]$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#   modified: Oncemore.txt
no changes added to commit (use "git add" and/or "git commit -a")
[pmu@t430 mydotfiles]$
The file shows up as modified. So, we need to add the file again, just as Git instructs us to. Let's do that and see what happens.
[pmu@t430 mydotfiles]$ git add Oncemore.txt
[pmu@t430 mydotfiles]$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#   modified: Oncemore.txt
[pmu@t430 mydotfiles]$
Has the checksum now changed?
[pmu@t430 mydotfiles]$ git ls-files --stage
100644 f10c9773eef39357a15a18183d1a4d42b349267d 0 Oncemore.txt
100644 b70f72952f495b2aae83f2ff1a50b5ee8d001edb 0 Readme.txt
[pmu@t430 mydotfiles]$
Indeed, it has. Now, Git sees this as a completely different file…or rather, a different checksum.
Now, let's run the git commit command with exactly the same comment we had used earlier while committing the Oncemore.txt file.
[pmu@t430 mydotfiles]$ git commit -m "Now adding the Oncemore.txt file" Oncemore.txt
[master f911d56] Now adding the Oncemore.txt file
 1 file changed, 1 insertion(+)
[pmu@t430 mydotfiles]$
Now, we have a new commit object – f911d56. It's time for a git push now.
Refresh your GitHub page and you'll see that the Oncemore.txt file has exactly the same comment that you added just two minutes before.
Well, understand that Git didn't remember the comment that you used the first time while committing Oncemore.txt. This is the comment that you added just now. You could very well have added a different comment.
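Since every comment belongs to its own commit, the history of a single file can be listed with git log followed by the file name. Here is a sketch in a throwaway repository; the second comment is deliberately different and, like the identity, is made up for illustration.

```shell
# Throwaway repository for the sketch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

printf 'first line\n' > Oncemore.txt
git add Oncemore.txt
git commit -q -m "Now adding the Oncemore.txt file"

printf 'a new line\n' >> Oncemore.txt
git commit -q -m "Added a second line to Oncemore.txt" Oncemore.txt

# One line per commit that touched the file, newest first.
git log --format='%h %s' -- Oncemore.txt

latest=$(git log --format=%s -- Oncemore.txt | head -n 1)
commits=$(git log --format=%s -- Oncemore.txt | wc -l | tr -d ' ')
echo "latest comment: $latest ($commits commits)"
```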
Cloning from GitHub
Now that you've got your files up there on GitHub, what can you do with them? Cloning is one option. Basically, when you clone a repo, you get a local copy of a Git repository so that you can fiddle around with it. Let's create a directory in which we will clone the repository.
Note: There is no need to run git init this time, because we are not creating a repo that we will be pushing up. We're just cloning an existing repository.
[pmu@t430 mydotfiles]$ mkdir clonedir
[pmu@t430 mydotfiles]$ cd clonedir
[pmu@t430 clonedir]$ git clone https://github.com/ugrankar/mydotfiles.git
And what does it have?
[pmu@t430 clonedir]$ ls -la
total 0
drwxrwxr-x. 3 pmu pmu 23 Sep 22 22:44 .
drwxrwxr-x. 4 pmu pmu 68 Sep 22 22:44 ..
drwxrwxr-x. 3 pmu pmu 53 Sep 22 22:44 mydotfiles
[pmu@t430 clonedir]$ cd mydotfiles/
[pmu@t430 mydotfiles]$ ls
Oncemore.txt Readme.txt
[pmu@t430 mydotfiles]$ ls -a
. .. .git Oncemore.txt Readme.txt
[pmu@t430 mydotfiles]$
There, you've got the same files that you had uploaded!
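Cloning can also be tried without a network connection, because git clone accepts a path to a repository on the same machine. The sketch below assumes a throwaway source repository with a made-up identity and content, and passes a second argument to choose the name of the target directory.

```shell
# A throwaway source repository with one commit.
src="$(mktemp -d)"
cd "$src"
git init -q
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
printf 'This is a Readme file.\n' > Readme.txt
git add Readme.txt
git commit -q -m "Adding my first file - Readme.txt"

# Clone it into a directory with a name of our choosing.
dst="$(mktemp -d)/clonedir"
git clone -q "$src" "$dst"

cd "$dst"
ls -a            # the files plus the .git directory
git remote -v    # the clone remembers where it came from, under the name origin
```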