ReiserFS

From Wikipedia, the free encyclopedia
ReiserFS 3.6
Developer(s): Namesys
Full name: ReiserFS
Introduced: 2001 with Linux 2.4.1
Partition identifier: Apple_UNIX_SVR2 (Apple Partition Map), 0x83 (MBR), EBD0A0A2-B9E5-4433-87C0-68B6B72699C7 (GPT)

Structures
Directory contents: B+ tree
File allocation: Bitmap[1]

Limits
Max. volume size: 16 TiB[2]
Max. file size: 1 EiB (8 TiB on 32-bit systems)[2]
Max. number of files: 2^32 − 3 (~4 billion)[2]
Max. filename length: 4032 bytes, limited to 255 by Linux VFS
Allowed characters in filenames: All bytes except NUL and '/'

Features
Dates recorded: modification (mtime), metadata change (ctime), access (atime)
Date range: December 14, 1901 – January 18, 2038
Date resolution: 1 s
Forks: Extended attributes
File system permissions: Unix permissions, ACLs and arbitrary security attributes
Transparent compression: No
Transparent encryption: No

Other
Supported operating systems: Linux, ReactOS

ReiserFS is a general-purpose, journaled computer file system initially designed and implemented by a team at Namesys led by Hans Reiser. ReiserFS is currently supported on Linux (without quota support) and is licensed under GPLv2. Introduced in version 2.4.1 of the Linux kernel, it was the first journaling file system to be included in the standard kernel. ReiserFS was the default file system in Novell's SUSE Linux Enterprise until Novell decided on October 12, 2006 to move to ext3 for future releases.[3]

Namesys considered ReiserFS version 3.6, which introduced a new on-disk format allowing larger file sizes and is now occasionally referred to as Reiser3, to be stable and feature-complete. With the exception of security updates and critical bug fixes, Namesys ceased development on it to concentrate on its successor, Reiser4. Namesys went out of business in 2008 after Reiser's conviction for murder. The product is now maintained as open source by volunteers.[4] Version 3.6.27 of reiserfsprogs was released on 25 July 2017.[5]

Features

At the time of its introduction, ReiserFS offered features that had not been available in existing Linux file systems:

  • Metadata-only journaling (also block journaling, since Linux 2.6.8), its most-publicized advantage over what was the stock Linux file system at the time, ext2.[citation needed]
  • Online resizing (growth only), with or without an underlying volume manager such as LVM.[citation needed] Since then, Namesys has also provided tools to resize (both grow and shrink) ReiserFS file systems offline.[citation needed]
  • Tail packing, a scheme to reduce internal fragmentation. Tail packing can have a significant performance impact. Reiser4 may have improved this by packing tails where it does not negatively affect performance.[6]
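The internal fragmentation that tail packing targets can be illustrated with a little arithmetic. The following is a minimal sketch, assuming a 4096-byte block size (an assumption for illustration, not a figure from this article); the function names are hypothetical. It computes how many bytes are left unused when each file's final partial block (its "tail") occupies a block of its own.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; chosen for illustration only


def blocks_without_tail_packing(file_size: int) -> int:
    """Whole blocks needed when every file tail gets a block of its own."""
    return -(-file_size // BLOCK_SIZE)  # ceiling division


def slack_without_tail_packing(file_size: int) -> int:
    """Bytes allocated but unused in the file's last block."""
    return blocks_without_tail_packing(file_size) * BLOCK_SIZE - file_size


def tail_length(file_size: int) -> int:
    """Bytes in the final partial block (the tail); 0 if the size is block-aligned."""
    return file_size % BLOCK_SIZE


def total_slack(file_sizes: list[int]) -> int:
    """Unused bytes summed over a set of files stored without tail packing."""
    return sum(slack_without_tail_packing(size) for size in file_sizes)
```

Under these assumptions a 100-byte file leaves 3996 bytes of its block unused. Packing tails end-to-end into shared leaf nodes recovers most of that space, at the cost of the extra work the bullet above describes.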

Design

ReiserFS stores file metadata ("stat items"), directory entries ("directory items"), inode block lists ("indirect items"), and tails of files ("direct items") in a single, combined B+ tree keyed by a universal object ID. Disk blocks allocated to nodes of the tree are "formatted internal blocks". Blocks for leaf nodes (in which items are packed end-to-end) are "formatted leaf blocks". All other blocks are "unformatted blocks" containing file contents. Directory items with too many entries or indirect items which are too long to fit into a node spill over into the right leaf neighbour. Block allocation is tracked by free space bitmaps in fixed locations.
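The idea of keeping every kind of item in one ordered structure can be sketched with a sorted list standing in for the leaf level of the tree. This is a toy model only: the `(object_id, offset)` key layout and the `ToyTree` class are assumptions made for illustration and do not reproduce the real on-disk key format or B+ tree balancing.

```python
import bisect

# Item kinds named in the text; the key layout used below is an illustrative assumption.
STAT, DIRECTORY, INDIRECT, DIRECT = "stat", "directory", "indirect", "direct"


class ToyTree:
    """Sorted list standing in for the leaf level of the combined tree."""

    def __init__(self):
        self._keys = []   # sorted list of (object_id, offset) keys
        self._items = []  # (kind, payload), stored at the same index as its key

    def insert(self, object_id, offset, kind, payload):
        """Insert one item while keeping all items ordered by key."""
        key = (object_id, offset)
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._items.insert(i, (kind, payload))

    def items_of(self, object_id):
        """Return (offset, kind, payload) for one object, located by binary search."""
        lo = bisect.bisect_left(self._keys, (object_id, -1))
        hi = bisect.bisect_left(self._keys, (object_id + 1, -1))
        return [(self._keys[i][1],) + self._items[i] for i in range(lo, hi)]


tree = ToyTree()
tree.insert(7, 0, STAT, {"mode": 0o644})
tree.insert(3, 0, STAT, {"mode": 0o755})
tree.insert(7, 1, DIRECT, b"hello")      # a small file's tail kept in the tree
tree.insert(3, 1, DIRECTORY, ["a.txt"])  # a directory's entries
```

Because all items of an object sort next to each other, `tree.items_of(7)` returns the stat item and the direct item of object 7 from one contiguous range, whatever order they were inserted in.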

By contrast, ext2 and other Berkeley FFS-like file systems of that time simply used a fixed formula for computing inode locations, hence limiting the number of files they may contain.[7] Most such file systems also store directories as simple lists of entries, which makes directory lookups and updates linear time operations and degrades performance on very large directories. The single B+ tree design in ReiserFS avoids both of these problems due to better scalability properties.
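The "fixed formula" can be made concrete with the way ext2 is commonly documented to locate an inode: the inode number alone determines the block group and the slot within that group's inode table. The sketch below assumes that scheme and the classic 128-byte inode size; the function and parameter names are hypothetical.

```python
def locate_inode(inode_number, inodes_per_group, inode_size=128):
    """Map a 1-based inode number to (block group, index in group, byte offset in the group's inode table)."""
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(inode_number - 1, inodes_per_group)
    return group, index, index * inode_size


def max_inodes(block_groups, inodes_per_group):
    """Upper bound on the number of files, fixed when the file system is created."""
    return block_groups * inodes_per_group
```

With 8192 inodes per group, inode 8193 is slot 0 of group 1. A file system with 16 such groups can never hold more than 131072 inodes, however much free space remains, which is the limitation the paragraph above refers to.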

Version history

The development of a filesystem involves three areas that have to be considered separately:

  • The on-disk format, that is, the way data is structured on the media.
  • The implementation of the filesystem driver, which enables the operating system to read and write data.
  • The maintenance tools for creating, deleting, resizing, defragmenting and checking partitions on the media.

Therefore, this section contains three tables: one for the 3.5.x series of the filesystem driver, which was used to read and write the 3.5 on-disk format; one for the 3.6.x series of the filesystem driver, which is used to read and write the 3.6 on-disk format; and one for the tools contained in the reiserfsprogs package.

Since the 3.6 on-disk format is regarded as feature-complete and stable, it is frozen and no longer changes. The filesystem driver for Linux, which is part of the official kernel, and the reiserfsprogs tools, which operate in user space, are still maintained and receive updates (rarely also new features) as of 2019.

The defrag program was never fully implemented, although there were attempts as late as 2014. One reason it was never finished was the rise of solid-state drives (SSDs), which, unlike classic spinning hard disk drives (HDDs), do not need to be defragmented. More precisely, they should not be defragmented at all, because unnecessary write operations shorten the life of an SSD.

| ReiserFS 3.5.x | Release date | Linux kernel | Notes |
| 3.5.9 | mid 1999 | 2.2.x |  |
| 3.5.24 | 2000-08-05 |  | Fixes for get_num_ver, zeroing of truncated bytes-in, unformatted nodes, r5 hash handling (for reiserfsck), null transaction handle in do_balance (for fsck), bad path checking in fsck, bad delimiting keys are now caught, tail locking, and 'du' works properly.[8] |
| 3.5.25 | 2000-09-07 |  | This release includes a fix for files with holes, an ioctl_patch, and improved fsck code. r5_hash is now set as the default. |
| 3.5.26 | 2000-09-22 |  | A fix for performance degradation of 3.5.25. |
| 3.5.29 | 2000-12-23 |  | This release is the latest version patched for 2.2.18. Please visit the alternative download URL or homepage to download this version. |
| 3.5.30 | 2001-02-22 |  | Bug fix in reiserfs ioctl code for lilo and decrement key bug fix. |
| 3.5.31 | 2001-03-27 |  | Bugfixes, a missed case in hash detection code, a new REISERFS_CHECK mode (you can mount by 3.5.x then by 3.6.x and again by 3.5.x), and a check for the count parameter in reiserfs_file_write. |

| ReiserFS 3.6.x | Linux kernel | Release date | Notes |
| 3.6.x | ? | ? | New on-disk format. "In reiserfs up until version 3.5 the offset and the type fields were both 4 byte values. This meant, that the maximum file size was limited to roughly 2^32 bytes, or 4GB (2^32 bytes plus the data of one more indirect item plus the tail, actually). To increase the maximum file size in the file system, in version 3.6, the offset field was increased to 60 bits, and the type field shrunk to 4 bits. This now allows for a theoretical maximum file size of 2^60 bytes, but since there can be only 2^32 blocks with a maximum of 2^16 bytes per block, the file system itself only supports 2^48 bytes"[9] |
| 3.6.12 | 2.4.0-test6 | 2000-08-12 | A port to linux-2.4.0-test6, a fix for memory corruption with long file names, r5_hash is the default hash (gives the best performance), and improved journal code. |
| 3.6.13 |  | 2000-08-19 | This version features a fix for another 'du' bug, a new ioctl command: REISERFS_IOC_UNPACK (for LILO support), and gives a way to unpack any reiserfs file with tail. The documentation has also been updated. |
| 3.6.14 | 2.4.0-test7 | 2000-09-01 | This release has been ported to kernel 2.4.0-test7 and minor documentation fixes have been made. |
| 3.6.15 | 2.4.0-test8 | 2000-09-15 | Porting of the code to linux-2.4.0-test8, an optimization in get_block to avoid unneeded transactions, a new scheme of block allocating (to speed up work with big files), large rewrites to truncate and writepage to fix deadlocks and race conditions with file tails, and a new benchmarks dir with mongo.sh benchmark in the utils directory. |
| 3.6.16 |  | 2000-09-18 | This release adds a fix for a bug that was found in 3.6.15 and the new scheme of blocks allocation in 3.6.15 has been reverted (performances were bad). |
| 3.6.17 |  | 2000-09-21 | This release adds a fix for the "StarOffice + Reiserfs" bug that was found in 3.6.16. |
| 3.6.18 | 2.4-test9 | 2000-10-13 | A port to 2.4-test9, a fix for the not-more-than 64k subdirectories, and misc. experimental features. |
| 3.6.23 | 2.4.0-test12 | 2000-12-23 | This release is the latest beta patch for kernel 2.4.0-test12. |
|  | 2.4.1 | 2001-01-29 | ReiserFS became part of the Linux kernel.[10] From here on, any further changes to the filesystem driver are collected at Linus Torvalds' Git repository.[11] |
|  | 2.4.13 | ? | Jeff Mahoney: reiserfs endian safeness. Chris Mason: reiserfs O_SYNC/fsync performance improvements |
|  | 2.6.8 | 2004-08-14 | Chris Mason (back then an employee of SUSE, later the lead developer of Btrfs) committed block allocator optimizations, btree readahead and data logging support (the data journal)[12] |
|  | 2.6.9 | 2004-10-19 | Chris Mason: Reiserfs v3 barrier support: Add reiserfs support for flush barriers, mount with -o barrier=flush to enable them. Barriers are triggered on fsync and for log commits |
|  | 2.6.10 | 2004-12-24 | Jeff Mahoney: Add I/O error handling to journal operations |
|  | 2.6.12 | 2005-06-17 | Paolo 'Blaisorblade' Giarrusso: make resize option auto-get new device size. Jeff Mahoney: Add selinux support |
|  | 2.6.19 | 2006-11-29 | Jeff Mahoney: On-demand bitmap loading - speeds up mounting speed and eliminates minimum window size for bitmap searching - improves the allocator in some corner-cases |
|  | 2.6.30 | 2009-06-09 | Jeff Mahoney: Use generic xattr handlers; Journaled xattrs; Use generic readdir for operations across all xattrs; Add atomic addition of selinux attributes during inode creation |
|  | 2.6.33 | 2010-02-24 | Reiserfs de-BKLification: "One of the biggest shortcomings of reiserfs v3 (and one of the reasons why most distros use Ext instead) is that its codebase handles concurrency using a single big lock - the BKL (Big Kernel Lock). This means that its SMP scalability is very poor. This release won't fix that issue, but it replaces the BKL with a reiserfs-specific solution. In this release, there are no more traces of the BKL inside reiserfs. It has been converted into a recursive mutex. This sounds dirty but plugging a traditional lock into reiserfs would involve a deeper rewrite as the reiserfs architecture is based on the ugly big kernel lock rules. Due to the subtle semantics of the locking changes, some workloads may have small performance regressions and other have improvements." |
|  | 3.1 | 2011-10-24 | Christoph Hellwig: Default to barrier=flush |
|  | 4.12 | 2017-07-02 | Jan Kara: Pull quota, reiserfs, udf and ext2 updates: "The branch contains changes to quota code so that it does not modify persistent flags in inode->i_flags (it was the only place in kernel doing that) and handle it inside filesystem's quotaon/off handlers instead. The branch also contains two UDF cleanups, a couple of reiserfs fixes and one fix for ext2 quota locking" |
|  | 4.1.50 |  | Jan Kara: Don't clear SGID when inheriting ACLs. Jeff Mahoney: don't preallocate blocks for extended attributes; fix race in prealloc discard. Arnd Bergmann: avoid a -Wmaybe-uninitialized warning |
|  | 4.1.52 |  | Andrew Morton: fs/reiserfs/journal.c: add missing reiserfs_warning() arg. Jan Kara: Make cancel_old_flush() reliable |
|  | 4.11.5 |  | Jan Kara: Make flush bios explicitly sync |
|  | 4.12.4 |  | Jan Kara: Don't clear SGID when inheriting ACLs |
|  | 4.14.17 |  | Jeff Layton: remove unneeded i_version bump |
|  | 4.14.36 |  | Andrew Morton: fs/reiserfs/journal.c: add missing reiserfs_warning() arg |
|  | 4.14.57 |  | Eric Biggers: fix buffer overflow with long warning messages |
|  | 4.14.67 |  | Jann Horn: fix broken xattr handling (heap corruption, bad retval) |
|  | 4.14.70 |  | Arnd Bergmann: change j_timestamp type to time64_t |
|  | 4.14.84 |  | Jann Horn: propagate errors from fill_with_dentries() properly |
|  | 4.16.4 |  | Andrew Morton: fs/reiserfs/journal.c: add missing reiserfs_warning() arg |
|  | 4.17.19 |  | Jann Horn: fix broken xattr handling (heap corruption, bad retval) |
|  | 4.17.9 |  | Eric Biggers: fix buffer overflow with long warning messages |
|  | 4.18.5 |  | Jann Horn: fix broken xattr handling (heap corruption, bad retval) |
|  | 4.18.8 |  | Arnd Bergmann: change j_timestamp type to time64_t |
|  | 4.19 |  | Jann Horn: fix broken xattr handling (heap corruption, bad retval). Arnd Bergmann: change j_timestamp type to time64_t; remove obsolete print_time function; use monotonic time for j_trans_start_time |
|  | 4.19.5 |  | Jann Horn: propagate errors from fill_with_dentries() properly |
|  | 4.20 |  | Masahiro Yamada: remove workaround code for GCC 3.x. Jann Horn: propagate errors from fill_with_dentries() properly |
|  | 4.4.114 |  | Jan Kara: Don't clear SGID when inheriting ACLs. Jeff Mahoney: don't preallocate blocks for extended attributes; fix race in prealloc discard |
|  | 4.4.118 |  | Arnd Bergmann: avoid a -Wmaybe-uninitialized warning |
|  | 4.4.123 |  | Jan Kara: Make cancel_old_flush() reliable |
|  | 4.4.129 |  | Andrew Morton: fs/reiserfs/journal.c: add missing resierfs_warning() arg |
|  | 4.4.152 |  | Jann Horn: fix broken xattr handling (heap corruption, bad retval) |
|  | 4.4.156 |  | Arnd Bergmann: change j_timestamp type to time64_t |
|  | 4.4.165 |  | Jann Horn: propagate errors from fill_with_dentries() properly |
|  | 4.4.23 |  | Jeff Mahoney: fix "new_insert_key may be used uninitialized ..." |
|  | 4.4.27 |  | Al Viro: switch to generic_{get,set,remove}xattr(). Mike Galbraith: Unlock superblock before calling reiserfs_quota_on_mount() |
|  | 4.7.10 |  | Mike Galbraith: Unlock superblock before calling reiserfs_quota_on_mount() |
|  | 4.7.6 |  | Jeff Mahoney: fix "new_insert_key may be used uninitialized ..." |
|  | 4.8.4 |  | Mike Galbraith: Unlock superblock before calling reiserfs_quota_on_mount() |
|  | 4.9.114 |  | Eric Biggers: fix buffer overflow with long warning messages |
|  | 4.9.124 |  | Jann Horn: fix broken xattr handling (heap corruption, bad retval) |
|  | 4.9.127 |  | Arnd Bergmann: change j_timestamp type to time64_t |
|  | 4.9.141 |  | Jann Horn: propagate errors from fill_with_dentries() properly |
|  | 4.9.79 | 2018-01-31 | Jeff Mahoney: don't preallocate blocks for extended attributes; fix race in prealloc discard |
|  | 4.9.80 | 2018-02-04 | Jeff Layton: remove unneeded i_version bump |
|  | 4.9.84 | 2018-02-25 | Arnd Bergmann: avoid a -Wmaybe-uninitialized warning |
|  | 4.9.89 | 2018-03-22 | Jan Kara: Make cancel_old_flush() reliable |
|  | 4.9.96 | 2018-04-14 | Andrew Morton: fs/reiserfs/journal.c: add missing resierfs_warning() arg |
|  | 5.2 | 2019-07-07 | Al Viro: convert to ->free_inode(). Bharath Vedartham: add comment to explain endianness issue in xattr_hash; fs/reiserfs/journal.c: Make remove_journal_hash static. Jan Kara: A couple of small bugfixes and cleanups for quota, udf, ext2, and reiserfs |

| reiserfsprogs version | Release date | Notes[13] |
| n.a. | 2001-02-05 | mkreiserfs: can make filesystem with 1 data block; 3.6 format is now default |
| 3.6.27 | 2017-07-24 | build: properly define version in reiserfscore.pc. misc: include <sys/sysmacros.h>. xattrs: handle both hash forms in reiserfs_check_xattr |
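
The size limits quoted in the 3.6.x table for the new on-disk format can be checked with a short calculation. The sketch below packs a 60-bit offset and a 4-bit type into one 64-bit word and reproduces the 2^48-byte figure. Placing the type in the top four bits is an assumption made for illustration, and the function names are hypothetical.

```python
OFFSET_BITS = 60  # width of the offset field in format 3.6, per the quoted description
TYPE_BITS = 4     # width of the type field in format 3.6
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack_offset_type(offset, item_type):
    """Combine offset and type into one 64-bit word (bit placement is assumed)."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 60 bits")
    if not 0 <= item_type < (1 << TYPE_BITS):
        raise ValueError("type does not fit in 4 bits")
    return (item_type << OFFSET_BITS) | offset


def unpack_offset_type(word):
    """Split a packed 64-bit word back into (offset, type)."""
    return word & OFFSET_MASK, word >> OFFSET_BITS


MAX_BLOCKS = 2**32      # block numbers are 32-bit, per the quoted description
MAX_BLOCK_SIZE = 2**16  # largest block size, per the quoted description
FILESYSTEM_LIMIT = MAX_BLOCKS * MAX_BLOCK_SIZE  # 2**48 bytes
```

The two fields together fill exactly 64 bits, and `FILESYSTEM_LIMIT` equals 2^48 bytes as stated in the quotation. Assuming 4 KiB blocks instead of the maximum block size, 2^32 blocks give 16 TiB, which matches the maximum volume size listed at the top of the article.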

Performance

Compared with ext2 and ext3 in version 2.4 of the Linux kernel, when dealing with files under 4 KiB and with tail packing enabled, ReiserFS may[14] be faster. This was said[by whom?] to be of great benefit in Usenet news spools, HTTP caches, mail delivery systems, and other applications where performance with small files is critical. However, in practice[according to whom?] news spools use a feature called cycbuf, which holds articles in one large file; fast HTTP caches and several revision control systems use a similar approach, nullifying these performance advantages. For email servers, ReiserFS was problematic due to semantic problems explained below. Also, ReiserFS had a problem with very fast filesystem aging when compared to other filesystems — in several usage scenarios filesystem performance decreased dramatically with time.[citation needed]

Before Linux 2.6.33,[15] ReiserFS heavily used the big kernel lock (BKL) — a global kernel-wide lock — which does not scale very well[16][17] for systems with multiple cores, as the critical code parts are only ever executed by one core at a time.

Usage

ReiserFS was the default file system in SuSE Linux since version 6.4 (released in 2000),[18][19] until switching to ext3 in SUSE Linux Enterprise 10.2/openSUSE 11, announced in 2006.[20][21]

Jeff Mahoney of SUSE wrote a post on 14 September 2006 proposing to move from ReiserFS to ext3 for the default installation file system.[16] Some reasons he mentioned were scalability, "performance problems with extended attributes and ACLs", "a small and shrinking development community", and that "Reiser4 is not an incremental update and requires a reformat, which is unreasonable for most people."[16] On October 4 he wrote a response comment on a blog in order to clear up some issues.[22] He wrote that his proposal for the switch was unrelated to Hans Reiser being under trial for murder.[23] Mahoney wrote he "was concerned that people would make a connection where none existed" and that "the timing is entirely coincidental and the motivation is unrelated."[22]

Criticism

Some directory operations (including unlink(2)) are not synchronous on ReiserFS, which can result in data corruption with applications relying heavily on file-based locks (such as mail transfer agents qmail[24] and Postfix[25]) if the machine halts before it has synchronized the disk.[26]
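
An application that cannot assume synchronous directory updates can force them explicitly. The following is a minimal sketch of the general POSIX technique of calling fsync on the parent directory after removing a file; `durable_unlink` is a hypothetical helper name, and the sources cited above do not say whether this is sufficient on ReiserFS in every case. It relies on being able to open a directory read-only, which works on Linux but not on all platforms.

```python
import os


def durable_unlink(path):
    """Remove path, then fsync its parent directory so the removal is flushed to disk."""
    parent = os.path.dirname(os.path.abspath(path))
    os.unlink(path)
    dir_fd = os.open(parent, os.O_RDONLY)  # open the directory itself
    try:
        os.fsync(dir_fd)  # ask the kernel to write the directory change out
    finally:
        os.close(dir_fd)
```

A mail transfer agent written this way would treat a lock file as released only after `durable_unlink` returns, instead of relying on the file system to make the unlink durable on its own.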

There are no programs that specifically defragment a ReiserFS file system, although tools have been written that automatically copy the contents of fragmented files in the hope that more contiguous blocks of free space can be found. A "repacker" tool was planned for the successor file system, Reiser4, to deal with file fragmentation.[27] With the rise of solid-state drives (SSDs), this problem became irrelevant: unlike hard disk drives, SSDs have no moving parts, so fragmentation does not slow them down. Defragmenting an SSD is even discouraged, because the additional writes shorten its lifetime.

fsck

The tree rebuild process of ReiserFS's fsck has attracted much criticism by the *nix community: If the file system becomes so badly corrupted that its internal tree is unusable, performing a tree rebuild operation may further corrupt existing files or introduce new entries with unexpected contents,[28] but this action is not part of normal operation or a normal file system check and has to be explicitly initiated and confirmed by the administrator.

ReiserFS v3 images should not be stored on a ReiserFS v3 partition (e.g., backups or disk images for emulators) without transforming them (e.g., by compressing or encrypting) in order to avoid confusing the rebuild. Reformatting an existing ReiserFS v3 partition can also leave behind data that could confuse the rebuild operation and make files from the old system reappear. This also allows malicious users to intentionally store files that will confuse the rebuilder. As the metadata is always in a consistent state after a file system check, corruption here means that contents of files are merged in unexpected ways with the contained file system's metadata. The ReiserFS successor, Reiser4, fixes this problem.
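
Transforming an image before storing it can be as simple as compressing it. The sketch below uses Python's standard gzip module; the function names are hypothetical, and the source does not state which particular transformations are enough to keep the rebuild from recognizing the stored data.

```python
import gzip
import shutil


def store_image_compressed(image_path, dest_path):
    """Copy a raw file system image to dest_path as a gzip stream."""
    with open(image_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def restore_image(compressed_path, image_path):
    """Recreate the raw image from its compressed copy."""
    with gzip.open(compressed_path, "rb") as src, open(image_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

The compressed copy no longer contains the image's blocks in their raw form, and `restore_image` recovers the original bytes when the image is needed again.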

Earlier issues

Versions of ReiserFS in Linux kernels before 2.4.16 were considered unstable by Namesys and were not recommended for production use, especially in conjunction with NFS.[29]

Early implementations of ReiserFS (prior to that in Linux 2.6.2) were also susceptible to out-of-order write hazards. The current journaling implementation in ReiserFS is on par with ext3's "ordered" journaling level.

References

  1. ^ Reiser FS node layout, Namesys, archived from the original on 2006-06-14
  2. ^ a b c "Reiser FS Specifications", FAQ, Namesys, archived from the original on 2006-07-05
  3. ^ Shankland, Stephen (2006-10-16). "Novell makes file storage software shift". Business Tech. cnet.
  4. ^ Shankland, Stephen (January 16, 2008). "Namesys vanishes, but Reiser project lives on". CNet. Archived from the original on March 27, 2016. Retrieved 2008-01-26.
  5. ^ ""Fossies" - the Fresh Open Source Software Archive". July 25, 2017. Retrieved 2019-07-25.
  6. ^ Reiser, Hans. "Reiser4 is released!". Archived from the original on 2007-10-24. Retrieved 2006-07-15.
  7. ^ Mingming Cao, Theodore Y. Ts'o, Badari Pulavarty, Suparna Bhattacharya (2005-07-26). "State of the Art: Where we are with the Ext3 file system". 2005 Linux Symposium. Ottawa, Canada: IBM Linux Technology Center. Retrieved 2007-03-08.
  8. ^ http://freshmeat.sourceforge.net/projects/reiserfs/releases?page=2
  9. ^ https://web.archive.org/web/20121019000728/http://homes.cerias.purdue.edu:80/~florian/reiser/reiserfs.php
  10. ^ https://www.diskinternals.com/glossary/reiserfs/
  11. ^ https://github.com/torvalds/linux/tree/master/fs/reiserfs
  12. ^ https://kernelnewbies.org/Linux_2_6_8
  13. ^ https://fossies.org/linux/reiserfsprogs/ChangeLog
  14. ^ "PHP Manual". php.net. The PHP Group. Retrieved 5 December 2018.
  15. ^ "kill-the-BKL". git.kernel.org.
  16. ^ a b c Jeff Mahoney (2006-09-14). "Proposal: Change in default fs for releases >= 10.2". gmane.org. Retrieved 2009-08-23.
  17. ^ discussion thread stored at gmane.org
  18. ^ "Archive:SuSE Linux 6.4". openSUSE wiki. Retrieved 2017-06-28.
  19. ^ "SUSE LINUX 9.1 Administration Guide: Major File Systems in Linux". Novell. Retrieved 2017-06-28.
  20. ^ Shankland, Stephen (16 October 2006). "Novell makes file storage software shift". CNET.
  21. ^ Sharma, Mayank (12 October 2006). "Novell will switch from ReiserFS to ext3". Linux.com.
  22. ^ a b comment by Jeff Mahoney (2006-10-04). "SUSE 10.2 Ditching ReiserFS as its' default FS? (comment 29)". linux.wordpress.com / archive.org. Archived from the original on 2006-11-09. Retrieved 2009-08-23.
  23. ^ CBS 5 / AP / BCN (2006-09-14). "Oakland Police Search Home Of Missing Woman's Ex". cbs5.com / archive.org. Archived from the original on 2006-11-06. Retrieved 2009-08-23.
  24. ^ Daniel Robbins (2001), "Advanced file system implementor's guide". Retrieved 5 July 2006.
  25. ^ Matthias Andree (2001), LKML post on Postfix synchronity assumptions. Retrieved 15 July 2006.
  26. ^ NEOHAPSIS - Peace of Mind Through Integrity and Insight
  27. ^ Hans Reiser, Reiser4 design, repacker. Archived 2007-10-24 at the Wayback Machine. Retrieved 5 July 2006.
  28. ^ Theodore Ts'o LKML post. Retrieved 5 July 2006.
  29. ^ ReiserFS download page, see warning. Retrieved 5 July 2006.