Glossary Term

File format

File Format Specifications and Patents

- File formats often have a published specification describing exactly how the data is encoded.
- A specification makes it possible to test whether a program handles the format correctly.
- Some developers treat their specifications as trade secrets and do not publish them.
- When no specification is published, other developers must either reverse engineer the format or acquire the specification document from its developer.
- File formats with publicly available specifications tend to be more widely supported.
- Patent law is often used to protect file formats.
- Some formats rely on patented algorithms.
- Until 2004, the GIF file format required the use of a patented compression algorithm (LZW).
- The GIF patent expired in the US in mid-2003 and worldwide in mid-2004.
- Enforcement of the patent, while it was still in force, prompted the development of alternative formats such as PNG.

Identifying File Type

- Different operating systems take different approaches to determining a file's format.
- Reading a foreign file format often requires combining several approaches.
- The filename extension is a popular method used by many operating systems.
- The filename extension is the portion of the filename after the final period.
- Because the number of three-letter extensions is limited, unrelated formats can end up sharing an extension, which causes confusion.

Internal Metadata and File Headers

- Another way to identify a file format is to use information stored inside the file itself.
- This information may be written specifically for identification, or it may be a binary string that always appears at a specific location in files of that format.
- Internal metadata provides reliable identification of the file format.
- Modern operating systems use this method in addition to other approaches.
- Internal metadata helps in reading and working with foreign file formats.
- File headers contain metadata about the file and its contents.
- They are usually stored at the start of the file but can also appear elsewhere in it.
- Text-based file headers are human-readable and easy to examine.
- Binary formats usually have binary headers.
- An operating system may use file headers to quickly gather information about a file.
- File headers can store information about an image's format, size, resolution, and color space.
- They can also contain authoring information such as the creator, date, and camera settings.
- Software uses the metadata in file headers during loading and afterwards.
- Text-based file headers take up more space but can be examined with simple software such as a text editor.
- Binary headers require more complex interpretation, and editing them risks corrupting the metadata.

Magic Numbers and Shebang Lines

- Magic numbers are identifiers stored inside the file itself for file type recognition.
- Any unique feature of a file format can be used as a magic number.
- Compared with filename extensions, magic numbers offer better guarantees that the format is identified correctly.
- They can often reveal more precise information about the file.
- Magic numbers are inefficient for displaying large lists of files, because each file must be opened and read.
- Shebang lines in script files are a special case of magic numbers.
- A shebang line identifies a specific command interpreter and its options.
- The magic number in a shebang line is human-readable text: the characters #! at the start of the file.
- Shebang lines are used to execute scripts with the correct interpreter.
- They are commonly used on Unix-like systems.

External Metadata and Symlinks

- External metadata stores information about the file format in the file system rather than in the file.
- This approach keeps metadata separate from the main data and the file name.
- External metadata is less portable than filename extensions or magic numbers, because it can be lost when a file moves to a different file system.
- Archive files such as zip files address the problem of handling metadata.
- An archive collects multiple files together with their metadata in a single file, which is compressed and optionally encrypted.
- Symlinks are symbolic links that point to another file or directory.
- They provide a way to create shortcuts or references to other files or directories.
- Symlinks can be used to create aliases or alternative names for files or directories.
- They are commonly used in Unix-like operating systems.
- Symlinks can be created with the ln command using its -s option.
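
As a rough illustration of the symlink points above, the following Python sketch creates a symbolic link and reads a file through it. The function names make_alias and demo and the file names report.txt and latest.txt are hypothetical, chosen only for this example. It assumes a Unix-like system; on Windows, creating symlinks may require extra privileges.

```python
import os
import tempfile


def make_alias(target: str, link_path: str) -> str:
    """Create a symbolic link at link_path that points to target.

    Roughly equivalent to running `ln -s target link_path`.
    Returns the target path stored in the link.
    """
    os.symlink(target, link_path)
    return os.readlink(link_path)


def demo() -> dict:
    """Create a file and an alias for it in a throwaway directory."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "report.txt")  # hypothetical name
        link = os.path.join(workdir, "latest.txt")    # hypothetical name
        with open(target, "w") as f:
            f.write("hello")
        stored = make_alias(target, link)
        # Opening the link transparently opens the file it points to.
        with open(link) as f:
            content = f.read()
        return {
            "is_link": os.path.islink(link),
            "points_to_target": stored == target,
            "content": content,
        }


if __name__ == "__main__":
    print(demo())
```

The link stores only a path, so it becomes a dangling reference if the target is later moved or deleted.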
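
The identification approaches described earlier (filename extension, magic number, and shebang line) can be sketched in a few lines of Python. This is a minimal sketch rather than a complete detector: the signature table lists only a handful of well-known leading byte sequences, and the names MAGIC_SIGNATURES, sniff_header, extension_of, and identify are hypothetical helpers invented for this example.

```python
import os

# A small, illustrative signature table. The byte sequences are the
# well-known leading bytes of each format; the labels and the choice
# of formats are assumptions made for this sketch.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
]


def sniff_header(header: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for signature, label in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return label
    if header.startswith(b"#!"):
        # Shebang line: the rest of the first line names the
        # command interpreter and any options passed to it.
        first_line = header.split(b"\n", 1)[0]
        interpreter = first_line[2:].decode("utf-8", "replace").strip()
        return f"script for {interpreter}"
    return "unknown"


def extension_of(filename: str) -> str:
    """Return the part of the name after the final period, lowercased."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def identify(path: str) -> tuple:
    """Return (extension, sniffed format) so a caller can compare both."""
    with open(path, "rb") as f:
        header = f.read(64)  # reading a short prefix is enough here
    return extension_of(path), sniff_header(header)
```

Returning both values lets the caller notice a mismatch, for example a file named with a .gif extension whose contents begin with the PNG signature. Note that identify has to open every file it inspects, which is the inefficiency mentioned above for large file listings.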