1. The EXT2 Inode
Figure 9.2: EXT2 Inode
In the EXT2 file system, the inode is the basic building block; every file and directory in the file system is described by one and only one inode. The EXT2 inodes for each Block Group are kept in the inode table, together with a bitmap that allows the system to keep track of allocated and unallocated inodes. Figure 9.2 shows the format of an EXT2 inode; amongst other information, it contains the following fields:
- mode
- This holds two pieces of information: what this inode describes and the permissions that users have to it. For EXT2, an inode can describe a file, directory, symbolic link, block device, character device or FIFO.
- Owner Information
- The user and group identifiers of the owners of this file or directory. These allow the file system to grant only the right sorts of access,
- Size
- The size of the file in bytes,
- Timestamps
- The time that the inode was created and the last time that it was modified,
- Datablocks
- Pointers to the blocks that contain the data that this inode is describing. The first twelve are pointers to the physical blocks containing the data described by this inode and the last three pointers contain more and more levels of indirection. For example, the double indirect blocks pointer points at a block of pointers to blocks of pointers to data blocks. This means that files of twelve data blocks or fewer are accessed more quickly than larger files, because no indirect block has to be read first.
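As a rough illustration of the Datablocks pointers described above, the following sketch works out how many levels of indirection are needed to reach a given data block of a file. It assumes 1024 byte blocks (the example block size used in this chapter) and 4 byte block pointers; the function name and its parameters are invented for this example and are not taken from the kernel sources.

```python
DIRECT_POINTERS = 12  # the first twelve pointers address data blocks directly


def indirection_level(block_index, block_size=1024, pointer_size=4):
    """Return 0 for a direct block, or 1, 2, 3 for single, double, triple indirect.

    block_index counts the file's data blocks from zero. The 4 byte pointer
    size is an assumption of this sketch.
    """
    pointers_per_block = block_size // pointer_size
    if block_index < DIRECT_POINTERS:
        return 0
    remaining = block_index - DIRECT_POINTERS
    for level in (1, 2, 3):
        capacity = pointers_per_block ** level  # data blocks reachable at this level
        if remaining < capacity:
            return level
        remaining -= capacity
    raise ValueError("block index is beyond three levels of indirection")


for index in (0, 11, 12, 267, 268):
    print(index, indirection_level(index))
```

With these assumptions a block of pointers holds 256 entries, so data blocks 0 to 11 are reached directly, blocks 12 to 267 through the single indirect pointer, and block 268 is the first one that needs the double indirect pointer.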
You should note that EXT2 inodes can describe special device files. These are not real files but handles that programs can use to access devices. All of the device files in /dev are there to allow programs to access Linux's devices. For example, the mount program takes as an argument the device file that it wishes to mount.
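The two pieces of information held in the mode field, including whether an inode describes a device file, can be pulled apart with Python's standard stat module. This is only a sketch: it assumes that the EXT2 mode uses the same bit layout as the st_mode value that the stat module understands, and describe_mode and the example mode values are made up for illustration.

```python
import stat

# Each predicate tests the "what does this inode describe" part of the mode.
KINDS = [
    (stat.S_ISREG, "file"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISFIFO, "FIFO"),
]


def describe_mode(mode):
    """Return (kind, permission string) for a mode value."""
    for predicate, kind in KINDS:
        if predicate(mode):
            return kind, stat.filemode(mode)
    return "unknown", stat.filemode(mode)


print(describe_mode(0o100644))  # a regular file
print(describe_mode(0o040755))  # a directory
print(describe_mode(0o060660))  # a block special device file
```

The permission string is in the same form that ls -l prints, so a block device that its owner and group may read and write comes out as brw-rw----.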
2. The EXT2 Superblock
The Superblock contains a description of the basic size and shape of this file system. The information within it allows the file system manager to use and maintain the file system. Usually only the Superblock in Block Group 0 is read when the file system is mounted, but each Block Group contains a duplicate copy in case of file system corruption. Amongst other information, it holds the following:
- Magic Number
- This allows the mounting software to check that this is indeed the Superblock for an EXT2 file system. For the current version of EXT2 this is 0xEF53.
- Revision Level
- The major and minor revision levels allow the mounting code to determine whether or not this file system supports features that are only available in particular revisions of the file system. There are also feature compatibility fields which help the mounting code to determine which new features can safely be used on this file system,
- Mount Count and Maximum Mount Count
- Together these allow the system to determine if the file system should be fully checked. The mount count is incremented each time the file system is mounted and, when it equals the maximum mount count, the warning message "maximal mount count reached, running e2fsck is recommended" is displayed,
- Block Group Number
- The Block Group number that holds this copy of the Superblock,
- Block Size
- The size of the block for this file system in bytes, for example 1024 bytes,
- Blocks per Group
- The number of blocks in a group. Like the block size this is fixed when the file system is created,
- Free Blocks
- The number of free blocks in the file system,
- Free Inodes
- The number of free Inodes in the file system,
- First Inode
- This is the inode number of the first inode in the file system. The first inode in an EXT2 root file system would be the directory entry for the '/' directory.
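To make the Superblock fields above more concrete, here is a minimal sketch that decodes a few of them from raw bytes. The byte offsets follow the usual on-disk EXT2 Superblock layout (little endian, with the primary copy starting 1024 bytes into the volume); they are not given in this chapter, so treat them as an assumption of the example. The function names and the example values are hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # where the primary Superblock starts on the volume
EXT2_MAGIC = 0xEF53


def parse_superblock(raw):
    """Decode a handful of fields from a raw EXT2 Superblock."""
    free_blocks, free_inodes = struct.unpack_from("<II", raw, 12)
    (log_block_size,) = struct.unpack_from("<I", raw, 24)
    (blocks_per_group,) = struct.unpack_from("<I", raw, 32)
    mount_count, max_mount_count, magic = struct.unpack_from("<HhH", raw, 52)
    (minor_revision,) = struct.unpack_from("<H", raw, 62)
    (revision,) = struct.unpack_from("<I", raw, 76)
    if magic != EXT2_MAGIC:
        raise ValueError("not an EXT2 Superblock: magic is 0x%04X" % magic)
    return {
        "block_size": 1024 << log_block_size,  # stored as a power of two shift
        "blocks_per_group": blocks_per_group,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "mount_count": mount_count,
        "max_mount_count": max_mount_count,
        "revision": (revision, minor_revision),
        # Sketch of the check described above; treating a non-positive
        # maximum as "never warn" is an assumption of this example.
        "check_recommended": 0 < max_mount_count <= mount_count,
    }


def make_example_superblock():
    """Build a synthetic Superblock so the sketch runs without a disk image."""
    raw = bytearray(1024)
    struct.pack_into("<II", raw, 12, 5000, 1200)            # free blocks, free inodes
    struct.pack_into("<I", raw, 24, 0)                      # 1024 byte blocks
    struct.pack_into("<I", raw, 32, 8192)                   # blocks per group
    struct.pack_into("<HhH", raw, 52, 20, 20, EXT2_MAGIC)   # mount counts, magic
    return bytes(raw)


info = parse_superblock(make_example_superblock())
print(info["block_size"], info["check_recommended"])
```

To try this against a real file system image, one would seek to SUPERBLOCK_OFFSET in the image file, read 1024 bytes and pass them to parse_superblock.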
3. The EXT2 Group Descriptor
Each Block Group has a data structure describing it. Like the Superblock, all the group descriptors for all of the Block Groups are duplicated in each Block Group in case of file system corruption.
Each Group Descriptor contains the following information:
- Blocks Bitmap
- The block number of the block allocation bitmap for this Block Group. This is used during block allocation and deallocation,
- Inode Bitmap
- The block number of the inode allocation bitmap for this Block Group. This is used during inode allocation and deallocation,
- Inode Table
- The block number of the starting block for the inode table for this Block Group. Each inode is represented by the EXT2 inode data structure described above,
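- Free blocks count, Free Inodes count, Used directory count
- The number of free blocks, the number of free inodes and the number of directories in use in this Block Group.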
The group descriptors are placed one after another and together they make the group descriptor table. Each Block Group contains the entire table of group descriptors after its copy of the Superblock. Only the first copy (in Block Group 0) is actually used by the EXT2 file system. The other copies are there, like the copies of the Superblock, in case the main copy is corrupted.
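Because the descriptors are simply placed one after another, reading the group descriptor table amounts to unpacking fixed size records. The sketch below assumes the usual 32 byte on-disk descriptor (three 32 bit block numbers, three 16 bit counts, then padding); that layout is not spelled out in this chapter, and the names and example values are hypothetical.

```python
import struct
from collections import namedtuple

# Three block numbers, three counts, then 14 bytes of padding and reserved space.
GROUP_DESC_FORMAT = "<IIIHHH14x"
GROUP_DESC_SIZE = struct.calcsize(GROUP_DESC_FORMAT)

GroupDescriptor = namedtuple(
    "GroupDescriptor",
    "block_bitmap inode_bitmap inode_table free_blocks free_inodes used_dirs",
)


def parse_group_descriptor_table(raw, group_count):
    """Return one GroupDescriptor per Block Group from the raw table bytes."""
    return [
        GroupDescriptor(*struct.unpack_from(GROUP_DESC_FORMAT, raw, i * GROUP_DESC_SIZE))
        for i in range(group_count)
    ]


# A synthetic table describing two Block Groups.
table = (
    struct.pack(GROUP_DESC_FORMAT, 3, 4, 5, 7000, 1700, 2)
    + struct.pack(GROUP_DESC_FORMAT, 8195, 8196, 8197, 7900, 2048, 0)
)

for number, group in enumerate(parse_group_descriptor_table(table, 2)):
    print(number, group.inode_table, group.free_blocks)
```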
4. EXT2 Files
Finding a File in an EXT2 File System
A Linux filename has the same format as all Unix filenames have. It is a series of directory names separated by forward slashes ("/") and ending in the file's name. One example filename would be /home/rusling/.cshrc where /home and /rusling are directory names and the file's name is .cshrc. Like all other Unix systems, Linux does not care about the format of the filename itself; it can be any length and consist of any of the printable characters. To find the inode representing this file within an EXT2 file system the system must parse the filename a directory at a time until we get to the file itself.
The first inode we need is the inode for the root of the file system and we find its number in the file system's superblock. To read an EXT2 inode we must look for it in the inode table of the appropriate Block Group. If, for example, the root inode number is 42, then we need the 42nd inode from the inode table of Block Group 0. The root inode is for an EXT2 directory; in other words, the mode of the root inode describes it as a directory and its data blocks contain EXT2 directory entries.
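Finding "the appropriate Block Group" for an inode number is simple arithmetic. The sketch below assumes that inode numbers start at 1 and that every Block Group holds the same number of inodes; the inodes per group value, the 128 byte inode size and the function names are assumptions made for this example rather than details given in the text.

```python
def locate_inode(inode_number, inodes_per_group):
    """Return (block group, zero-based index within that group's inode table)."""
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    return divmod(inode_number - 1, inodes_per_group)


def inode_byte_offset(index, inode_table_block, block_size=1024, inode_size=128):
    """Byte offset on the volume of the inode at `index` in an inode table.

    inode_table_block is the starting block of the table, as recorded in the
    Block Group's group descriptor.
    """
    return inode_table_block * block_size + index * inode_size


group, index = locate_inode(42, inodes_per_group=2048)
print(group, index)                                    # Block Group 0, the 42nd inode
print(inode_byte_offset(index, inode_table_block=5))
```

With 2048 inodes in each group, inode 42 is the 42nd entry (index 41) in the inode table of Block Group 0, while inode 2049 would be the first entry in Block Group 1.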
home is just one of the many directory entries and this directory entry gives us the number of the inode describing the /home directory. We have to read this directory (by first reading its inode and then reading the directory entries from the data blocks described by its inode) to find the rusling entry which gives us the number of the inode describing the /home/rusling directory. Finally we read the directory entries pointed at by the inode describing the /home/rusling directory to find the inode number of the .cshrc file and from this we get the data blocks containing the information in the file.
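The walk just described can be modelled in a few lines. This is a toy model rather than real file system code: each directory is represented as a dictionary from entry name to inode number, keyed by the directory's own inode number, and all of the inode numbers are invented for the example.

```python
ROOT_INODE = 2  # illustrative; in practice the number comes from the file system

# directory inode number -> {entry name: inode number}
DIRECTORIES = {
    2: {".": 2, "..": 2, "home": 11},
    11: {".": 11, "..": 2, "rusling": 12},
    12: {".": 12, "..": 11, ".cshrc": 13},
}


def lookup(path, root_inode=ROOT_INODE, directories=DIRECTORIES):
    """Resolve an absolute path to an inode number, one directory at a time."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    inode = root_inode
    for component in path.split("/"):
        if not component:
            continue  # skip the empty strings produced by leading or doubled slashes
        entries = directories.get(inode)
        if entries is None:
            raise NotADirectoryError(path)
        if component not in entries:
            raise FileNotFoundError(path)
        inode = entries[component]
    return inode


print(lookup("/home/rusling/.cshrc"))
```

Each pass around the loop corresponds to reading one directory's inode and then its directory entries, which is why long paths cost more to resolve than short ones.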
Changing the Size of a File in an EXT2 File System
One common problem with a file system is its tendency to fragment. The blocks that hold the file's data get spread all over the file system and this makes sequentially accessing the data blocks of a file more and more inefficient the further apart the data blocks are. The EXT2 file system tries to overcome this by allocating the new blocks for a file physically close to its current data blocks or at least in the same Block Group as its current data blocks. Only when this fails does it allocate data blocks in another Block Group.
Whenever a process attempts to write data into a file the Linux file system checks to see if the data has gone off the end of the file's last allocated block. If it has, then it must allocate a new data block for this file. Until the allocation is complete, the process cannot run; it must wait for the file system to allocate a new data block and write the rest of the data to it before it can continue.
The first thing that the EXT2 block allocation routines do is to lock the EXT2 Superblock for this file system. Allocating and deallocating changes fields within the superblock, and the Linux file system cannot allow more than one process to do this at the same time. If another process needs to allocate more data blocks, it will have to wait until this process has finished. Processes waiting for the superblock are suspended, unable to run, until control of the superblock is relinquished by its current user. Access to the superblock is granted on a first come, first served basis and once a process has control of the superblock, it keeps control until it has finished. Having locked the superblock, the process checks that there are enough free blocks left in this file system. If there are not enough free blocks, then this attempt to allocate more will fail and the process will relinquish control of this file system's superblock.
If there are enough free blocks in the file system, the process tries to allocate one.
If the EXT2 file system has been built to preallocate data blocks then we may be able to take one of those. The preallocated blocks do not actually exist; they are just reserved within the allocated block bitmap. The VFS inode representing the file that we are trying to allocate a new data block for has two EXT2 specific fields, prealloc_block and prealloc_count, which are the block number of the first preallocated data block and how many of them there are, respectively. If there were no preallocated blocks or block preallocation is not enabled, the EXT2 file system must allocate a new block. The EXT2 file system first looks to see if the data block after the last data block in the file is free. Logically, this is the most efficient block to allocate as it makes sequential accesses much quicker. If this block is not free, then the search widens and it looks for a data block within 64 blocks of the ideal block. This block, although not ideal, is at least fairly close and within the same Block Group as the other data blocks belonging to this file.
If even that block is not free, the process starts looking in all of the other Block Groups in turn until it finds some free blocks. The block allocation code looks for a cluster of eight free data blocks somewhere in one of the Block Groups. If it cannot find eight together, it will settle for fewer. If block preallocation is wanted and enabled, it will update prealloc_block and prealloc_count accordingly.
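The search order described above (the ideal block, then a nearby block, then a cluster in another Block Group) can be sketched as follows. This is a simplified model and not the kernel's allocation code: the whole file system's block bitmap is a Python list in which True means "in use", the nearby search only looks forward from the ideal block, preallocation and locking are left out, and all names are hypothetical.

```python
SEARCH_WINDOW = 64  # how far past the ideal block the nearby search looks
CLUSTER_SIZE = 8    # preferred run of free blocks in the other Block Groups


def find_free_run(bitmap, start, end, length):
    """Return the first block in [start, end) that begins `length` free blocks."""
    run = 0
    for block in range(start, end):
        run = run + 1 if not bitmap[block] else 0
        if run == length:
            return block - length + 1
    return None


def choose_block(bitmap, goal, blocks_per_group):
    """Pick a block to allocate, given the ideal block number `goal`."""
    group_count = len(bitmap) // blocks_per_group
    goal_group = goal // blocks_per_group
    group_end = (goal_group + 1) * blocks_per_group

    # 1. The ideal block: the one right after the file's last data block.
    if not bitmap[goal]:
        return goal

    # 2. A block within 64 blocks of the ideal one, in the same Block Group.
    window_end = min(goal + 1 + SEARCH_WINDOW, group_end)
    near = find_free_run(bitmap, goal + 1, window_end, 1)
    if near is not None:
        return near

    # 3. The other Block Groups in turn, preferring a cluster of eight free
    #    blocks and settling for a shorter run only if no group has one.
    for length in range(CLUSTER_SIZE, 0, -1):
        for step in range(1, group_count):
            start = ((goal_group + step) % group_count) * blocks_per_group
            found = find_free_run(bitmap, start, start + blocks_per_group, length)
            if found is not None:
                return found
    return None  # no free block found in the other groups


# Three Block Groups of 128 blocks, all in use except one single block in
# group 1 and a run of eight blocks in group 2.
bitmap = [True] * 384
bitmap[130] = False
for block in range(300, 308):
    bitmap[block] = False
print(choose_block(bitmap, goal=10, blocks_per_group=128))
```

In this example the single free block in group 1 is passed over in favour of the cluster of eight in group 2, which mirrors the preference described in the text.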
Wherever it finds the free block, the block allocation code updates the Block Group's block bitmap and allocates a data buffer in the buffer cache. That data buffer is uniquely identified by the file system's supporting device identifier and the block number of the allocated block. The data in the buffer is zeroed and the buffer is marked as "dirty" to show that its contents have not been written to the physical disk. Finally, the superblock itself is marked as "dirty" to show that it has been changed and it is unlocked. If there were any processes waiting for the superblock, the first one in the queue is allowed to run again and will gain exclusive control of the superblock for its file operations. The process's data is written to the new data block and, if that data block is filled, the entire process is repeated and another data block allocated.
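The way a buffer is identified and marked can be illustrated with a toy buffer cache. This is a sketch under simple assumptions, not a picture of the real Linux buffer cache: buffers live in a dictionary keyed by (device identifier, block number), and the class, its methods and the device identifier are all invented for the example.

```python
class BufferCache:
    """A toy buffer cache keyed by (device identifier, block number)."""

    def __init__(self, block_size=1024):
        self.block_size = block_size
        self.buffers = {}

    def new_block(self, device, block_number):
        """Create a zeroed buffer for a freshly allocated block, marked dirty."""
        buffer = {"data": bytearray(self.block_size), "dirty": True}
        self.buffers[(device, block_number)] = buffer
        return buffer

    def write_back(self):
        """Pretend to write every dirty buffer to disk; return the keys written."""
        written = sorted(key for key, buf in self.buffers.items() if buf["dirty"])
        for key in written:
            self.buffers[key]["dirty"] = False
        return written


cache = BufferCache()
DEVICE = 0x0301  # hypothetical device identifier
buf = cache.new_block(DEVICE, 300)
buf["data"][:5] = b"hello"  # the process's data goes into the new block
print(cache.write_back())
```

The dirty flag is what lets the write to the physical disk happen later: the process continues once its data is in the buffer, and the buffer is written out when the cache is flushed.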
5. EXT2 Directories
Figure 9.3: EXT2 Directory
In the EXT2 file system, directories are special files that are used to create and hold access paths to the files in the file system. Figure 9.3 shows the layout of a directory entry in memory.
A directory file is a list of directory entries, each one containing the following information:
- inode
- The inode for this directory entry. This is an index into the array of inodes held in the Inode Table of the Block Group. In Figure 9.3, the directory entry for the file called file has a reference to inode number i1,
- name length
- The length of the name of this directory entry in bytes,
- name
- The name of this directory entry.
The first two entries for every directory are always the standard . and .. entries meaning "this directory" and "the parent directory" respectively.
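A directory's data block can be decoded by stepping from one entry to the next. The sketch below assumes the later on-disk entry layout (ext2_dir_entry_2), in which each entry starts with a 32 bit inode number, a 16 bit record length, an 8 bit name length and an 8 bit file type, followed by the name. The record length and file type are not among the fields listed above, and the helper names and inode numbers are hypothetical.

```python
import struct

DIRENT_HEADER = "<IHBB"  # inode, record length, name length, file type
HEADER_SIZE = struct.calcsize(DIRENT_HEADER)


def pack_entry(inode, name, rec_len=None):
    """Build one directory entry; by default it is padded to a multiple of 4."""
    encoded = name.encode("utf-8")
    if rec_len is None:
        rec_len = (HEADER_SIZE + len(encoded) + 3) & ~3
    header = struct.pack(DIRENT_HEADER, inode, rec_len, len(encoded), 0)
    return (header + encoded).ljust(rec_len, b"\0")


def parse_directory_block(block):
    """Return a list of (name, inode number) pairs found in a directory block."""
    entries = []
    offset = 0
    while offset + HEADER_SIZE <= len(block):
        inode, rec_len, name_len, _file_type = struct.unpack_from(
            DIRENT_HEADER, block, offset
        )
        if rec_len < HEADER_SIZE:
            break  # malformed entry; stop rather than loop forever
        if inode != 0:  # an inode number of zero marks an unused entry
            start = offset + HEADER_SIZE
            entries.append((block[start:start + name_len].decode("utf-8"), inode))
        offset += rec_len  # the record length leads to the next entry
    return entries


# A synthetic 1024 byte directory block: ".", ".." and a file called "file".
block = pack_entry(12, ".") + pack_entry(11, "..")
block += pack_entry(13, "file", rec_len=1024 - len(block))

for name, inode in parse_directory_block(block):
    print(inode, name)
```

The last entry's record length is stretched to the end of the block, which is how the remaining free space in a directory block is accounted for in this layout.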