Wednesday, July 18, 2007

ext2 filesystem concepts

1. fast symbolic link

Ext2fs implements fast symbolic links. A fast symbolic link does not use any data block on the filesystem. The target name is not stored in a data block but in the inode itself. This policy can save some disk space (no data block needs to be allocated) and speeds up link operations (there is no need to read a data block when accessing such a link). Of course, the space available in the inode is limited so not every link can be implemented as a fast symbolic link. The maximal size of the target name in a fast symbolic link is 60 characters. We plan to extend this scheme to small files in the near future.
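The rule above can be sketched in a few lines. This is only an illustration: the 60-byte limit is taken from the paragraph above, and the helper name is_fast_symlink is hypothetical, not part of any ext2 tool.

```python
# Sketch only: the limit comes from the text above; the helper is hypothetical.
FAST_SYMLINK_MAX = 60  # bytes available inside the inode for the target name


def is_fast_symlink(target: str) -> bool:
    """Return True if the link target would fit inside the inode itself."""
    return len(target.encode("utf-8")) <= FAST_SYMLINK_MAX


if __name__ == "__main__":
    print(is_fast_symlink("/usr/lib/libc.so"))  # short target: fits in the inode
    print(is_fast_symlink("/" + "x" * 100))     # long target: needs a data block
```

A target that does not fit would be stored in an ordinary data block, as a regular ("slow") symbolic link.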

Ext2 filesystem introduction

1. The EXT2 Inode

Figure 9.2: EXT2 Inode

In the EXT2 file system, the inode is the basic building block; every file and directory in the file system is described by one and only one inode. The EXT2 inodes for each Block Group are kept in the inode table together with a bitmap that allows the system to keep track of allocated and unallocated inodes. The figure above shows the format of an EXT2 inode; amongst other information, it contains the following fields:

mode
This holds two pieces of information; what this inode describes and the permissions that users have to it. For EXT2, an inode can describe one of file, directory, symbolic link, block device, character device or FIFO.
Owner Information
The user and group identifiers of the owners of this file or directory. This lets the file system grant the right kinds of access,
Size
The size of the file in bytes,
Timestamps
The time that the inode was created and the last time that it was modified,
Datablocks
Pointers to the blocks that contain the data that this inode is describing. The first twelve are pointers to the physical blocks containing the data described by this inode and the last three pointers contain more and more levels of indirection. For example, the double indirect blocks pointer points at a block of pointers to blocks of pointers to data blocks. This means that files less than or equal to twelve data blocks in length are more quickly accessed than larger files.
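To make the cost of indirection concrete, here is a small sketch that reports how many levels of indirection are needed to reach a given data block of a file. It assumes 4-byte block numbers and a 1024-byte block size by default; the function name indirection_level is made up for this illustration.

```python
DIRECT_POINTERS = 12  # the first twelve pointers reference data blocks directly


def indirection_level(block_index: int, block_size: int = 1024) -> int:
    """Return 0 for a direct block, 1, 2 or 3 for single, double or triple indirect.

    block_index is the zero-based position of the data block within the file.
    Assumption: every block number stored on disk is 4 bytes wide.
    """
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    pointers_per_block = block_size // 4
    if block_index < DIRECT_POINTERS:
        return 0
    block_index -= DIRECT_POINTERS
    span = pointers_per_block  # number of data blocks reachable at this level
    for level in (1, 2, 3):
        if block_index < span:
            return level
        block_index -= span
        span *= pointers_per_block
    raise ValueError("block index beyond triple indirection")


if __name__ == "__main__":
    for index in (0, 11, 12, 267, 268, 65804):
        print(index, indirection_level(index))
```

With 1024-byte blocks, blocks 0 to 11 are direct, blocks 12 to 267 need one extra read, and so on, which is why small files are the fastest to access.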

You should note that EXT2 inodes can describe special device files. These are not real files but handles that programs can use to access devices. All of the device files in /dev are there to allow programs to access Linux's devices. For example the mount program takes as an argument the device file that it wishes to mount.

2. The EXT2 Superblock

The Superblock contains a description of the basic size and shape of this file system. The information within it allows the file system manager to use and maintain the file system. Usually only the Superblock in Block Group 0 is read when the file system is mounted but each Block Group contains a duplicate copy in case of file system corruption. Amongst other information it holds the:

Magic Number
This allows the mounting software to check that this is indeed the Superblock for an EXT2 file system. For the current version of EXT2 this is 0xEF53.
Revision Level
The major and minor revision levels allow the mounting code to determine whether or not this file system supports features that are only available in particular revisions of the file system. There are also feature compatibility fields which help the mounting code to determine which new features can safely be used on this file system,
Mount Count and Maximum Mount Count
Together these allow the system to determine if the file system should be fully checked. The mount count is incremented each time the file system is mounted and when it equals the maximum mount count the warning message "maximal mount count reached, running e2fsck is recommended" is displayed,
Block Group Number
The Block Group number that holds this copy of the Superblock,
Block Size
The size of the block for this file system in bytes, for example 1024 bytes,
Blocks per Group
The number of blocks in a group. Like the block size this is fixed when the file system is created,
Free Blocks
The number of free blocks in the file system,
Free Inodes
The number of free Inodes in the file system,
First Inode
This is the inode number of the first inode in the file system. The first inode in an EXT2 root file system would be the directory entry for the '/' directory.
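As an illustration of how mounting code might read some of these fields, the sketch below decodes the start of a superblock with Python's struct module. The field order (thirteen 32-bit fields followed by the mount count, maximum mount count and magic number as 16-bit fields, all little-endian) and the 1024-byte offset of the superblock on the device are assumptions stated here, not something the text above specifies; the dictionary keys are invented for readability.

```python
import struct

EXT2_MAGIC = 0xEF53
SUPERBLOCK_OFFSET = 1024  # assumption: the superblock starts 1024 bytes into the device

# Assumed layout: 13 unsigned 32-bit fields, then mount count (u16),
# maximum mount count (s16) and magic number (u16), little-endian.
_SB_FORMAT = "<13IHhH"


def parse_superblock(raw: bytes) -> dict:
    """Decode a few superblock fields from raw bytes starting at the superblock."""
    fields = struct.unpack_from(_SB_FORMAT, raw, 0)
    if fields[15] != EXT2_MAGIC:
        raise ValueError("not an EXT2 superblock (bad magic number)")
    mount_count, max_mount_count = fields[13], fields[14]
    return {
        "free_blocks": fields[3],
        "free_inodes": fields[4],
        "block_size": 1024 << fields[6],  # stored as a shift relative to 1024
        "blocks_per_group": fields[8],
        "mount_count": mount_count,
        "max_mount_count": max_mount_count,
        # A full check is due once the mount count reaches the maximum.
        "needs_check": max_mount_count > 0 and mount_count >= max_mount_count,
    }


# Usage against a real image (path is hypothetical):
#   with open("disk.img", "rb") as image:
#       image.seek(SUPERBLOCK_OFFSET)
#       info = parse_superblock(image.read(1024))
```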

3. The EXT2 Group Descriptor

Each Block Group has a data structure describing it. Like the Superblock, all the group descriptors for all of the Block Groups are duplicated in each Block Group in case of file system corruption.

Each Group Descriptor contains the following information:

Blocks Bitmap
The block number of the block allocation bitmap for this Block Group. This is used during block allocation and deallocation,
Inode Bitmap
The block number of the inode allocation bitmap for this Block Group. This is used during inode allocation and deallocation,
Inode Table
The block number of the starting block for the inode table for this Block Group. Each inode is represented by the EXT2 inode data structure described above.
Free blocks count, Free Inodes count, Used directory count

The group descriptors are placed one after another and together they make the group descriptor table. Each Block Group contains the entire table of group descriptors after its copy of the Superblock. Only the first copy (in Block Group 0) is actually used by the EXT2 file system. The other copies are there, like the copies of the Superblock, in case the main copy is corrupted.
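Because the descriptors sit back to back, the whole table can be decoded with a simple loop. The sketch below assumes each descriptor is 32 bytes: three 32-bit block numbers, three 16-bit counters, and padding. That size and the Python names are assumptions made for this example.

```python
import struct
from typing import List, NamedTuple


class GroupDescriptor(NamedTuple):
    block_bitmap: int       # block number of the block allocation bitmap
    inode_bitmap: int       # block number of the inode allocation bitmap
    inode_table: int        # first block of the inode table
    free_blocks_count: int
    free_inodes_count: int
    used_dirs_count: int


_GD_FORMAT = "<3I3H"  # little-endian: three block numbers, three counters
GD_SIZE = 32          # assumption: each descriptor occupies 32 bytes on disk


def parse_group_descriptors(table: bytes, group_count: int) -> List[GroupDescriptor]:
    """Decode group_count descriptors laid out one after another in table."""
    return [
        GroupDescriptor(*struct.unpack_from(_GD_FORMAT, table, i * GD_SIZE))
        for i in range(group_count)
    ]
```

Indexing the returned list by Block Group number gives the bitmap and inode table locations needed by the allocation and lookup steps.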

4. EXT2 Files

Finding a File in an EXT2 File System

A Linux filename has the same format as all Unix filenames have. It is a series of directory names separated by forward slashes ("/") and ending in the file's name. One example filename would be /home/rusling/.cshrc where /home and /rusling are directory names and the file's name is .cshrc. Like all other Unix systems, Linux does not care about the format of the filename itself; it can be any length and consist of any of the printable characters. To find the inode representing this file within an EXT2 file system the system must parse the filename a directory at a time until it gets to the file itself.

The first inode we need is the inode for the root of the file system and we find its number in the file system's superblock. To read an EXT2 inode we must look for it in the inode table of the appropriate Block Group. If, for example, the root inode number is 42, then we need the 42nd inode from the inode table of Block Group 0. The root inode is for an EXT2 directory, in other words the mode of the root inode describes it as a directory and its data blocks contain EXT2 directory entries.
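Finding "the appropriate Block Group" is simple arithmetic. The sketch below assumes that inode numbers start at 1 and that every Block Group holds the same number of inodes; locate_inode is a hypothetical helper name.

```python
from typing import Tuple


def locate_inode(inode_number: int, inodes_per_group: int) -> Tuple[int, int]:
    """Return (block group, zero-based index within that group's inode table).

    Assumption: inode numbers start at 1, so inode 1 is entry 0 of group 0.
    """
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    index = inode_number - 1
    return index // inodes_per_group, index % inodes_per_group


if __name__ == "__main__":
    # Inode 42 with 128 inodes per group: group 0, entry 41 (the 42nd entry).
    print(locate_inode(42, 128))  # prints (0, 41)
```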

home is just one of the many directory entries and this directory entry gives us the number of the inode describing the /home directory. We have to read this directory (by first reading its inode and then reading the directory entries from the data blocks described by its inode) to find the rusling entry which gives us the number of the inode describing the /home/rusling directory. Finally we read the directory entries pointed at by the inode describing the /home/rusling directory to find the inode number of the .cshrc file and from this we get the data blocks containing the information in the file.
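The lookup described above is a loop over path components. The sketch below models directories as an in-memory dictionary instead of reading them from disk; the inode numbers, the dictionary and the function name are all invented for the example.

```python
from typing import Dict


def resolve_path(path: str, directories: Dict[int, Dict[str, int]], root_inode: int) -> int:
    """Walk path one component at a time and return the final inode number.

    directories maps a directory's inode number to its entries (name -> inode).
    """
    inode = root_inode
    for component in path.split("/"):
        if not component:  # skip the empty strings produced by leading or doubled slashes
            continue
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(f"inode {inode} is not a directory")
        if component not in entries:
            raise FileNotFoundError(component)
        inode = entries[component]
    return inode


if __name__ == "__main__":
    # Hypothetical inode numbers: root is 42, as in the example above.
    dirs = {42: {"home": 100}, 100: {"rusling": 200}, 200: {".cshrc": 300}}
    print(resolve_path("/home/rusling/.cshrc", dirs, 42))  # prints 300
```

Each loop iteration corresponds to one "read the inode, then read its directory entries" step in the text.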

Changing the Size of a File in an EXT2 File System

One common problem with a file system is its tendency to fragment. The blocks that hold the file's data get spread all over the file system and this makes sequentially accessing the data blocks of a file more and more inefficient the further apart the data blocks are. The EXT2 file system tries to overcome this by allocating the new blocks for a file physically close to its current data blocks or at least in the same Block Group as its current data blocks. Only when this fails does it allocate data blocks in another Block Group.

Whenever a process attempts to write data into a file the Linux file system checks to see if the data has gone off the end of the file's last allocated block. If it has, then it must allocate a new data block for this file. Until the allocation is complete, the process cannot run; it must wait for the file system to allocate a new data block and write the rest of the data to it before it can continue. The first thing that the EXT2 block allocation routines do is to lock the EXT2 Superblock for this file system. Allocating and deallocating changes fields within the superblock, and the Linux file system cannot allow more than one process to do this at the same time. If another process needs to allocate more data blocks, it will have to wait until this process has finished. Processes waiting for the superblock are suspended, unable to run, until control of the superblock is relinquished by its current user. Access to the superblock is granted on a first come, first served basis and once a process has control of the superblock, it keeps control until it has finished. Having locked the superblock, the process checks that there are enough free blocks left in this file system. If there are not enough free blocks, then this attempt to allocate more will fail and the process will relinquish control of this file system's superblock.

If there are enough free blocks in the file system, the process tries to allocate one.

If the EXT2 file system has been built to preallocate data blocks then we may be able to take one of those. The preallocated blocks do not actually exist; they are just reserved within the allocated block bitmap. The VFS inode representing the file that we are trying to allocate a new data block for has two EXT2 specific fields, prealloc_block and prealloc_count, which are the block number of the first preallocated data block and how many of them there are, respectively. If there were no preallocated blocks or block preallocation is not enabled, the EXT2 file system must allocate a new block. The EXT2 file system first looks to see if the data block after the last data block in the file is free. Logically, this is the most efficient block to allocate as it makes sequential accesses much quicker. If this block is not free, then the search widens and it looks for a data block within 64 blocks of the ideal block. This block, although not ideal, is at least fairly close and within the same Block Group as the other data blocks belonging to this file.

If even that block is not free, the process starts looking in all of the other Block Groups in turn until it finds some free blocks. The block allocation code looks for a cluster of eight free data blocks somewhere in one of the Block Groups. If it cannot find eight together, it will settle for less. If block preallocation is wanted and enabled it will update prealloc_block and prealloc_count accordingly.

Wherever it finds the free block, the block allocation code updates the Block Group's block bitmap and allocates a data buffer in the buffer cache. That data buffer is uniquely identified by the file system's supporting device identifier and the block number of the allocated block. The data in the buffer is zeroed and the buffer is marked as "dirty" to show that its contents have not been written to the physical disk. Finally, the superblock itself is marked as "dirty" to show that it has been changed and it is unlocked. If there were any processes waiting for the superblock, the first one in the queue is allowed to run again and will gain exclusive control of the superblock for its file operations. The process's data is written to the new data block and, if that data block is filled, the entire process is repeated and another data block allocated.
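The search order described above (ideal block, then a nearby block, then a cluster of free blocks) can be sketched as follows. This is a simplified model, not the kernel's code: it uses one flat bitmap instead of per-group bitmaps, searches only forward from the goal, and ignores locking and preallocation. All names are hypothetical.

```python
from typing import List, Optional


def _find_free_run(bitmap: List[bool], length: int) -> Optional[int]:
    """Return the start of the first run of `length` free blocks, or None."""
    run = 0
    for i, used in enumerate(bitmap):
        run = 0 if used else run + 1
        if run == length:
            return i - length + 1
    return None


def allocate_block(bitmap: List[bool], goal: int,
                   window: int = 64, cluster: int = 8) -> Optional[int]:
    """Pick a free block and mark it used. bitmap[i] is True when block i is in use."""
    size = len(bitmap)
    chosen = None
    # 1. The ideal block: the one right after the file's last data block.
    if 0 <= goal < size and not bitmap[goal]:
        chosen = goal
    # 2. Otherwise, any free block within `window` blocks of the goal.
    if chosen is None:
        for candidate in range(max(goal + 1, 0), min(goal + window + 1, size)):
            if not bitmap[candidate]:
                chosen = candidate
                break
    # 3. Otherwise, a cluster of free blocks anywhere, settling for fewer if needed.
    if chosen is None:
        for want in range(cluster, 0, -1):
            chosen = _find_free_run(bitmap, want)
            if chosen is not None:
                break
    if chosen is not None:
        bitmap[chosen] = True  # update the allocation bitmap
    return chosen
```

Returning None corresponds to the case where the allocation fails because no free blocks are left.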

5. EXT2 Directories

Figure 9.3: EXT2 Directory

In the EXT2 file system, directories are special files that are used to create and hold access paths to the files in the file system. Figure 9.3 shows the layout of a directory entry in memory.

A directory file is a list of directory entries, each one containing the following information:

inode
The inode for this directory entry. This is an index into the array of inodes held in the Inode Table of the Block Group. In figure 9.3, the directory entry for the file called file has a reference to inode number i1,
name length
The length of this directory entry in bytes,
name
The name of this directory entry.

The first two entries for every directory are always the standard . and .. entries meaning "this directory" and "the parent directory" respectively.
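A directory's data block can be decoded by stepping from one entry to the next. The sketch below assumes a particular on-disk layout that the text above does not spell out: a 4-byte inode number, a 2-byte record length (the distance to the next entry), a 2-byte name length, then the name, all little-endian. The helper pack_entry exists only to build sample data.

```python
import struct
from typing import List, Tuple

_HEADER = "<IHH"  # assumed: inode number, record length, name length
_HEADER_SIZE = struct.calcsize(_HEADER)  # 8 bytes


def pack_entry(inode: int, name: str) -> bytes:
    """Build one sample entry, padded so the next entry starts on a 4-byte boundary."""
    raw = name.encode("utf-8")
    rec_len = (_HEADER_SIZE + len(raw) + 3) & ~3
    return struct.pack(_HEADER, inode, rec_len, len(raw)) + raw.ljust(rec_len - _HEADER_SIZE, b"\0")


def parse_directory(block: bytes) -> List[Tuple[str, int]]:
    """Return (name, inode number) pairs found in a directory data block."""
    entries = []
    offset = 0
    while offset + _HEADER_SIZE <= len(block):
        inode, rec_len, name_len = struct.unpack_from(_HEADER, block, offset)
        if rec_len < _HEADER_SIZE:  # malformed or end of data: stop rather than loop forever
            break
        if inode != 0:  # assumption: inode 0 marks an unused entry
            start = offset + _HEADER_SIZE
            entries.append((block[start:start + name_len].decode("utf-8"), inode))
        offset += rec_len
    return entries


if __name__ == "__main__":
    sample = pack_entry(2, ".") + pack_entry(2, "..") + pack_entry(15, "file")
    print(parse_directory(sample))  # prints [('.', 2), ('..', 2), ('file', 15)]
```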


Saturday, May 19, 2007

Everyday GIT With 20 Commands

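The GIT suite contains more than 100 commands, and the manual page for each one explains what the command does and how to use it in detail. But unless you know which command achieves what you want to do, you have no idea which manual page to start with; conversely, if you already know which manual page to read, you probably do not need the manual.

Does this mean you must understand every command before you can use git? Not at all. The commands you need to know differ slightly depending on your role, but whatever role you play, learning a small subset of the commands is enough for day-to-day work. This document is meant to serve as a cheat sheet and as an entry point into git for people in various roles.

Anyone who has a repository needs the commands in [Basic Repository] (which in practice means everyone, because every git working tree is a repository).

Next, anyone who needs to commit needs the commands in [Individual Developer (Standalone)], whether or not the repository is accessible to other people.

If you work with other people, you will also need the commands in [Individual Developer (Participant)].

People who play the [Integrator] role need to learn a few more commands in addition to the above.

The commands in [Repository Administration] are for system administrators who help developers maintain and exchange git repositories.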

Basic Repository

We use these commands to maintain and operate git repositories.

Examples

Check the repository for problems and remove cruft.
$ git fsck-objects (1)
$ git prune
$ git count-objects (2)
$ git repack (3)
$ git prune (4)
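  1. we do not use "--full" here; running without it is generally cheaper and still assures the repository health reasonably well.

  2. check how many loose objects there are and report how much space is wasted by not repacking them.

  3. without "-a", git repacks incrementally. As a rule of thumb, repacking every 4-5MB of accumulated loose objects works well.

  4. after the repack, use prune to remove the duplicate loose objects.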

Repack a small project into a single pack.
$ git repack -a -d (1)
$ git prune
  1. pack all the objects reachable from the refs into one pack and remove unneeded other packs

Individual Developer (Standalone)

A standalone individual developer does not exchange patches with other people, and works alone in a single repository, using the following commands.

Examples

Extract a tarball and create a working tree and a new repository to keep track of it.
$ tar zxf frotz.tar.gz
$ cd frotz
$ git-init-db
$ git add . (1)
$ git commit -m 'import of frotz source tree.'
$ git tag v2.43 (2)
  1. add everything under the current directory.

  2. make a lightweight, unannotated tag.

Create a topic branch and develop.
$ git checkout -b alsa-audio (1)
$ edit/compile/test
$ git checkout -- curses/ux_audio_oss.c (2)
$ git add curses/ux_audio_alsa.c (3)
$ edit/compile/test
$ git diff (4)
$ git commit -a -s (5)
$ edit/compile/test
$ git reset --soft HEAD^ (6)
$ edit/compile/test
$ git diff ORIG_HEAD (7)
$ git commit -a -c ORIG_HEAD (8)
$ git checkout master (9)
$ git pull . alsa-audio (10)
$ git log --since='3 days ago' (11)
$ git log v2.43.. curses/ (12)
  1. create a new topic branch.

  2. revert your botched changes in "curses/ux_audio_oss.c".

  3. you need to tell git if you added a new file; removal and modification will be caught if you do "commit -a" later.

  4. to see what changes you are committing.

  5. commit everything as you have tested, with your sign-off.

  6. take the last commit back, keeping what is in the working tree.

  7. look at the changes since the premature commit we took back.

  8. redo the commit undone in the previous step, using the message you originally wrote.

  9. switch to the master branch.

  10. merge a topic branch into your master branch

  11. review commit logs; other forms to limit output can be combined and include --max-count=10 (show 10 commits), --until=2005-12-10.

  12. view only the changes that touch what's in curses/ directory, since v2.43 tag.

Individual Developer (Participant)

A developer working as a participant in a group project needs to learn how to communicate with others, and uses these commands in addition to the ones needed by a standalone developer.

  • git-clone(1) from the upstream to prime your local repository.

  • git-pull(1) and git-fetch(1) from "origin" to keep up-to-date with the upstream.

  • git-push(1) to shared repository, if you adopt CVS style shared repository workflow.

  • git-format-patch(1) to prepare e-mail submission, if you adopt Linux kernel-style public forum workflow.

Examples

Clone the upstream and work on it. Feed changes to upstream.
$ git clone git://git.kernel.org/pub/scm/.../torvalds/linux-2.6 my2.6
$ cd my2.6
$ edit/compile/test; git commit -a -s (1)
$ git format-patch origin (2)
$ git pull (3)
$ git whatchanged -p ORIG_HEAD.. arch/i386 include/asm-i386 (4)
$ git pull git://git.kernel.org/pub/.../jgarzik/libata-dev.git ALL (5)
$ git reset --hard ORIG_HEAD (6)
$ git prune (7)
$ git fetch --tags (8)
  1. repeat as needed.

  2. extract patches from your branch for e-mail submission.

  3. "pull" fetches from "origin" by default and merges into the current branch.

  4. immediately after pulling, look at the changes done upstream since last time we checked, only in the area we are interested in.

  5. fetch from a specific branch from a specific repository and merge.

  6. revert the pull.

  7. garbage collect leftover objects from reverted pull.

  8. from time to time, obtain official tags from the "origin" and store them under .git/refs/tags/.

Push into another repository.
satellite$ git clone mothership:frotz/.git frotz (1)
satellite$ cd frotz
satellite$ cat .git/remotes/origin (2)
URL: mothership:frotz/.git
Pull: master:origin
satellite$ echo 'Push: master:satellite' >>.git/remotes/origin (3)
satellite$ edit/compile/test/commit
satellite$ git push origin (4)

mothership$ cd frotz
mothership$ git checkout master
mothership$ git pull . satellite (5)
  1. mothership machine has a frotz repository under your home directory; clone from it to start a repository on the satellite machine.

  2. clone creates this file by default. It arranges "git pull" to fetch and store the master branch head of mothership machine to local "origin" branch.

  3. arrange "git push" to push local "master" branch to "satellite" branch of the mothership machine.

  4. push will stash our work away on "satellite" branch on the mothership machine. You could use this as a back-up method.

  5. on mothership machine, merge the work done on the satellite machine into the master branch.

Branch off of a specific tag.
$ git checkout -b private2.6.14 v2.6.14 (1)
$ edit/compile/test; git commit -a
$ git checkout master
$ git format-patch -k -m --stdout v2.6.14..private2.6.14 |
git am -3 -k (2)
  1. create a private branch based on a well known (but somewhat behind) tag.

  2. forward port all changes in private2.6.14 branch to master branch without a formal "merging".

Integrator

A fairly central person acting as the integrator in a group project receives changes made by others, reviews and integrates them and publishes the result for others to use, using these commands in addition to the ones needed by participants.

Examples

My typical GIT day.
$ git status (1)
$ git show-branch (2)
$ mailx (3)
& s 2 3 4 5 ./+to-apply
& s 7 8 ./+hold-linus
& q
$ git checkout master
$ git am -3 -i -s -u ./+to-apply (4)
$ compile/test
$ git checkout -b hold/linus && git am -3 -i -s -u ./+hold-linus (5)
$ git checkout topic/one && git rebase master (6)
$ git checkout pu && git reset --hard master (7)
$ git pull . topic/one topic/two && git pull . hold/linus (8)
$ git checkout maint
$ git cherry-pick master~4 (9)
$ compile/test
$ git tag -s -m 'GIT 0.99.9x' v0.99.9x (10)
$ git fetch ko && git show-branch master maint 'tags/ko-*' (11)
$ git push ko (12)
$ git push ko v0.99.9x (13)
  1. see what I was in the middle of doing, if any.

  2. see what topic branches I have and think about how ready they are.

  3. read mails, save ones that are applicable, and save others that are not quite ready.

  4. apply them, interactively, with my sign-offs.

  5. create topic branch as needed and apply, again with my sign-offs.

  6. rebase internal topic branch that has not been merged to the master, nor exposed as a part of a stable branch.

  7. restart "pu" every time from the master.

  8. and bundle topic branches still cooking.

  9. backport a critical fix.

  10. create a signed tag.

  11. make sure I did not accidentally rewind master beyond what I already pushed out. "ko" shorthand points at the repository I have at kernel.org, and looks like this:

    $ cat .git/remotes/ko
    URL: kernel.org:/pub/scm/git/git.git
    Pull: master:refs/tags/ko-master
    Pull: maint:refs/tags/ko-maint
    Push: master
    Push: +pu
    Push: maint

    In the output from "git show-branch", "master" should have everything "ko-master" has.

  12. push out the bleeding edge.

  13. push the tag out, too.

Repository Administration

A repository administrator uses the following tools to set up and maintain access to the repository by developers.

  • git-daemon(1) to allow anonymous download from repository.

  • git-shell(1) can be used as a restricted login shell for shared central repository users.

update hook howto has a good example of managing a shared central repository.

Examples

Run git-daemon to serve /pub/scm from inetd.
$ grep git /etc/inetd.conf
git stream tcp nowait nobody \
/usr/bin/git-daemon git-daemon --inetd --syslog --export-all /pub/scm

The actual configuration line should be on one line.

Run git-daemon to serve /pub/scm from xinetd.
$ cat /etc/xinetd.d/git-daemon
# default: off
# description: The git server offers access to git repositories
service git
{
disable = no
type = UNLISTED
port = 9418
socket_type = stream
wait = no
user = nobody
server = /usr/bin/git-daemon
server_args = --inetd --syslog --export-all --base-path=/pub/scm
log_on_failure += USERID
}

Check your xinetd(8) documentation and setup; this is from a Fedora system. Others might be different.

Give push/pull only access to developers.
$ grep git /etc/passwd (1)
alice:x:1000:1000::/home/alice:/usr/bin/git-shell
bob:x:1001:1001::/home/bob:/usr/bin/git-shell
cindy:x:1002:1002::/home/cindy:/usr/bin/git-shell
david:x:1003:1003::/home/david:/usr/bin/git-shell
$ grep git /etc/shells (2)
/usr/bin/git-shell
  1. log-in shell is set to /usr/bin/git-shell, which does not allow anything but "git push" and "git pull". The users should get an ssh access to the machine.

  2. in many distributions /etc/shells needs to list what is used as the login shell.

CVS-style shared repository.
$ grep git /etc/group (1)
git:x:9418:alice,bob,cindy,david
$ cd /home/devo.git
$ ls -l (2)
lrwxrwxrwx 1 david git 17 Dec 4 22:40 HEAD -> refs/heads/master
drwxrwsr-x 2 david git 4096 Dec 4 22:40 branches
-rw-rw-r-- 1 david git 84 Dec 4 22:40 config
-rw-rw-r-- 1 david git 58 Dec 4 22:40 description
drwxrwsr-x 2 david git 4096 Dec 4 22:40 hooks
-rw-rw-r-- 1 david git 37504 Dec 4 22:40 index
drwxrwsr-x 2 david git 4096 Dec 4 22:40 info
drwxrwsr-x 4 david git 4096 Dec 4 22:40 objects
drwxrwsr-x 4 david git 4096 Nov 7 14:58 refs
drwxrwsr-x 2 david git 4096 Dec 4 22:40 remotes
$ ls -l hooks/update (3)
-r-xr-xr-x 1 david git 3536 Dec 4 22:40 update
$ cat info/allowed-users (4)
refs/heads/master alice\|cindy
refs/heads/doc-update bob
refs/tags/v[0-9]* david
  1. place the developers into the same git group.

  2. and make the shared repository writable by the group.

  3. use update-hook example by Carl from Documentation/howto/ for branch policy control.

  4. alice and cindy can push into master, only bob can push into doc-update. david is the release manager and is the only person who can create and push version tags.

HTTP server to support dumb protocol transfer.
dev$ git update-server-info (1)
dev$ ftp user@isp.example.com (2)
ftp> cp -r .git /home/user/myproject.git
  1. make sure your info/refs and objects/info/packs are up-to-date

  2. upload to public HTTP server hosted by your ISP.

x86_64, irq: check remote IRR bit before migrating level triggered irq

On the x86_64 kernel, level-triggered irq migration is initiated in the context of that interrupt (after executing the irq handler), and the following steps are performed to do the irq migration.

1. mask IOAPIC RTE entry; // write to IOAPIC RTE
2. EOI; // processor EOI write
3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination
// and interrupt vector due to per cpu vector
// allocation.
4. unmask IOAPIC RTE entry; // write to IOAPIC RTE

Because of the per cpu vector allocation in x86_64 kernels, when the irq migrates to a different cpu, a new vector (corresponding to the new cpu) will get allocated.

An EOI write to the local APIC has the side effect of generating an EOI write for level-triggered interrupts (normally this is a broadcast to all IOAPICs). The EOI broadcast generated as a side effect of the EOI write to the processor may be delayed while the other IOAPIC writes (steps 3 and 4) go through.

Normally, the EOI generated by the local APIC for a level-triggered interrupt contains the vector number. The IOAPIC will take this vector number, search the IOAPIC RTE entries for an entry with a matching vector number, and clear the remote IRR bit (indicating EOI). However, if the vector number is changed (as in step 3), the IOAPIC will not find the RTE entry when the EOI is received later. This will cause the remote IRR to get stuck, causing the interrupt to hang (no more interrupts from this RTE).

The current x86_64 kernel assumes that the remote IRR bit is cleared by the time the IOAPIC RTE is reprogrammed. Fix this assumption by checking the remote IRR bit and, if it is still set, delaying the irq migration to the next interrupt arrival event (hopefully, next time the remote IRR bit will be cleared before the IOAPIC RTE is reprogrammed).
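The idea of the fix can be modelled with a toy simulation. This is not kernel code: RouteEntry, try_migrate and deliver_eoi are hypothetical names, and the model only captures the vector-matching behaviour described above.

```python
from dataclasses import dataclass


@dataclass
class RouteEntry:
    """Toy model of one IOAPIC RTE (hypothetical, for illustration only)."""
    vector: int
    destination: int
    remote_irr: bool = False  # set while a level-triggered irq awaits its EOI
    masked: bool = False


def deliver_eoi(rte: RouteEntry, vector: int) -> None:
    """The IOAPIC clears remote IRR only if the EOI's vector matches the entry."""
    if rte.vector == vector:
        rte.remote_irr = False


def try_migrate(rte: RouteEntry, new_destination: int, new_vector: int) -> bool:
    """Reprogram the entry only when remote IRR is clear; otherwise defer.

    Returns True if the migration happened, False if it must be retried
    on the next interrupt arrival.
    """
    if rte.remote_irr:
        return False  # the EOI has not reached the IOAPIC yet: do not change the vector
    rte.masked = True                   # step 1: mask
    rte.destination = new_destination   # step 3: reprogram
    rte.vector = new_vector
    rte.masked = False                  # step 4: unmask
    return True


if __name__ == "__main__":
    rte = RouteEntry(vector=0x31, destination=0, remote_irr=True)
    print(try_migrate(rte, 1, 0x41))  # prints False: EOI still pending, migration deferred
    deliver_eoi(rte, 0x31)            # the delayed EOI arrives and matches the old vector
    print(try_migrate(rte, 1, 0x41))  # prints True: safe to reprogram now
```

Without the remote IRR check, the first call would change the vector to 0x41 and the later EOI for 0x31 would match nothing, leaving remote_irr stuck.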