Unix File Types: Going Beyond Regular Files And Directories

Definition of a Unix File

In Unix-style operating systems, a file is an abstract data object: a sequence of bytes on which the kernel imposes no structure, usually reachable through one or more names. Processes access and manipulate files through system calls provided by the operating system kernel. To users, files appear in a hierarchical directory structure residing in a file system. Underneath, the kernel stores file contents persistently on physical media such as hard drives and SSDs.

Regular Files and Directories

Regular files and directories make up the bulk of the hierarchical file system visible to users. Regular files contain arbitrary data: text, images, video, executables and so on. They are created, opened, read, written, repositioned and closed through system calls such as open(), read(), write(), lseek() and close(). Directories act as containers for files and other directories, mapping textual names to underlying objects stored in the file system. The ls command lists directory contents.
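
The life cycle of a regular file described above can be sketched in a few lines of C. This is a minimal illustration rather than a definitive implementation; the helper name roundtrip_file and its parameters are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes msg to path, seeks back to the start and
 * reads the data into buf. Returns the number of bytes read, or -1. */
ssize_t roundtrip_file(const char *path, const char *msg,
                       char *buf, size_t buflen)
{
    /* Create the file if needed and truncate any old contents. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;

    /* After write() the offset sits at the end, so rewind before reading. */
    if (write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, buf, buflen);

    close(fd);
    return n;
}
```

Calling roundtrip_file with a scratch path and a short message should return the message length and leave the same bytes in buf.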

Block Special Files

Block special files, or block device files, refer to devices that transfer data in fixed-size chunks, or blocks, such as hard drives and flash drives. They provide buffered, randomly addressable access to the hardware: programs still call read() and write(), and the kernel moves data to and from the device in whole blocks, usually through a cache. Common block devices include hard disk drives (HDDs), solid-state drives (SSDs), floppy drives and CD/DVD drives.

Character Special Files

Character special files, or character device files, refer to devices that handle data as a stream of bytes. They provide unbuffered, sequential access to the underlying hardware device through the same read() and write() system calls used for other files; stdio functions such as fgetc() and fputc() are only a buffering layer on top of those calls. Common character devices include terminals, serial ports, parallel ports, sound cards, modems and mice. Some video and networking-related devices may also be exposed as character device files.
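
As a small illustration, the sketch below checks whether a path is a character device and reads from one with plain read(). It assumes a Linux-like /dev containing /dev/null and /dev/zero; the helper names is_char_device and read_device are hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if path is a character device, 0 if not, -1 on error. */
int is_char_device(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return S_ISCHR(st.st_mode) ? 1 : 0;
}

/* Reads up to len bytes from a device node using the ordinary read(2).
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_device(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

On a typical Linux system, is_char_device("/dev/null") reports a character device, and reading from /dev/zero fills the buffer with zero bytes.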

Named Pipes

Named pipes, also called FIFOs, allow two or more processes to communicate in a first-in-first-out manner. One process opens the pipe file for writing and another opens it for reading; bytes written at one end are read at the other in the same order. A named pipe carries only data: passing open file descriptors between processes requires Unix domain sockets instead. Because the pipe has a name in the file system, it works between parent and child processes as well as between unrelated processes.

Sockets

Sockets provide communication between processes on the same or different machines. After creating a socket and establishing a connection, processes send and receive data by reading from and writing to the socket's file descriptor. Sockets are the programming interface to transport protocols like TCP and UDP rather than part of the protocols themselves. Unix domain sockets communicate locally within one machine and can appear in the file system as socket files, while internet sockets carry data across the network and have no file system entry.

Symbolic Links

Symbolic links, or symlinks, act as shortcuts or aliases to other files and directories. The symlink file contains the target pathname as text. Opening a symlink and then reading or writing operates on the target, while calls such as lstat(), readlink() and unlink() operate on the link itself. Symlinks give access to a file through multiple paths without storing multiple copies. Most programs follow them transparently, as if operating on the target file; if the target is removed, the link is left dangling.
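
The difference between looking at a link and looking through it can be sketched with lstat() and stat(). The helper names below are hypothetical, and the example is an illustration rather than a complete tool.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates an empty regular file at target and a symlink to it at link.
 * Returns 0 on success, -1 on error. */
int make_file_and_link(const char *target, const char *link)
{
    int fd = open(target, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    close(fd);
    return symlink(target, link);
}

/* lstat() does not follow links: reports on the path itself. */
int is_symlink(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* stat() follows links: reports on whatever the path resolves to. */
int resolves_to_regular_file(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

For a link created this way, is_symlink() returns 1 on the link and 0 on the target, while resolves_to_regular_file() returns 1 for both.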

Device Files

Device files refer collectively to block, character and other device-related special files that provide access points for hardware devices connected to the system, conventionally under /dev. These may include disk drives, printers, graphics cards and sensors; network adapters are a notable exception on Linux, where they are reached through the socket API rather than a device node. Drivers interact with the hardware and expose file representations that applications can work with using regular file operations.

Door Files

Door files, or doors, are an inter-process communication mechanism found on Solaris and its derivatives. A server process attaches a door to a name in the file system; a client that opens the door and invokes it triggers a procedure that runs in the server process and returns its results to the client. Doors therefore act as a fast, local procedure-call interface to functionality implemented outside the kernel. Where they are supported, ls -l marks them with D.

Event Poll Files

Event poll, or epoll, is a Linux facility for efficiently multiplexing and monitoring I/O events on many file descriptors. An epoll instance is not a file that appears in the directory tree; it is a kernel object referred to by a file descriptor obtained from epoll_create1(). Instead of blocking on one descriptor at a time, a process registers the descriptors it is interested in and then waits once for events on any of them. This allows scalable I/O handling across large numbers of connections.
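
A minimal, Linux-only sketch of this pattern follows. It watches a single descriptor for readability; the helper name wait_readable is hypothetical, and a real server would keep one epoll instance alive and register many descriptors with it.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    int epfd = epoll_create1(0);         /* the epoll instance is itself an fd */
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event out;
    int rc = -1;

    /* Register interest in fd, then wait for at most one event. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        rc = epoll_wait(epfd, &out, 1, timeout_ms);

    close(epfd);
    return rc;
}
```

With a pipe, wait_readable() on the read end times out while the pipe is empty and reports readiness once a byte has been written to the other end.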

Multiplexed Files

Multiplexed files are a historical or vendor-specific type: they are not part of POSIX, and most modern systems do not provide them. Where they exist, they map to an underlying master file but give each opener a separate channel, allowing concurrent access where exclusive access would otherwise conflict. They logically represent multiple virtual instances of one actual file.

Crypt Files

Crypt files, or encrypted files, store data in encrypted form to prevent unauthorized access. Encryption is not a separate file type in the sense of the ls type character: an encrypted file is still reported as a regular file, and the protection comes from an encrypting file system layer or an external tool. Cryptographic algorithms are used along with passphrases and keys to encrypt and decrypt file contents. When such a layer is in place, data is encrypted automatically on write and decrypted on read.

Understanding Major Differences Between File Types

While regular files simply contain streams of bytes, many other Unix file types act as special interfaces for accessing devices, communicating with processes, transferring data or mapping to system resources. Each file type has specific semantics governing its access, manipulation and purpose. For instance, character devices transfer byte streams whereas block devices work with buffered blocks. Pipes facilitate inter-process communication and sockets enable networking between processes. Symlinks act as aliases for other files. Understanding these differences makes it easier to pick the right type for an application's needs.

Identifying File Types From ls Output

The ls -l command displays file type information along with permissions, ownership and other metadata. The first character of each output line identifies the file type:

- regular file
d directory
b block special file
c character special file
l symbolic link file
s socket file
p named pipe

Knowing the coding scheme helps recognize file types at a glance during day-to-day system usage.

Working With Different File Types

While regular files simply store data, other file types require an understanding of their semantics and purpose. Device files can be opened and accessed with the ordinary open(), read() and write() calls, subject to permissions, but device-specific control operations go through ioctl(), and fstat() reports which type of file a descriptor refers to. Sockets must be set up through the socket API before data can be sent or received. Once a descriptor is open, read() and write() also work on pipes and connected sockets. Door files act as interfaces to server processes implementing application functionality. Epoll reports events on watched file descriptors. Usage conventions differ across file types.
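
As one small example of a type-specific control call, the sketch below uses ioctl() with the FIONREAD request to ask how many bytes are waiting on a descriptor. It assumes Linux, where FIONREAD is available from <sys/ioctl.h> and works on pipes; the helper name bytes_pending is hypothetical.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns the number of bytes waiting to be read on fd, or -1 on error.
 * FIONREAD is a control request, not a data transfer: nothing is consumed. */
int bytes_pending(int fd)
{
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) != 0)
        return -1;
    return n;
}
```

On a freshly created pipe the helper reports 0 for the read end, and after three bytes are written to the write end it reports 3.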

Example Code Snippets For Creating Special Files

Here are some sample code snippets for programmatically creating certain special file types from a C program for reference:

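// Block device file (creating device nodes normally requires root)
#include <sys/stat.h>       // mknod(), S_IFBLK
#include <sys/sysmacros.h>  // makedev() on Linux/glibc; location varies elsewhere
int main() {
  // Device number 2048 in Linux encoding: major 8, minor 0
  return mknod("foo", S_IFBLK | 0600, makedev(8, 0)) == 0 ? 0 : 1;
}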

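// Character device file (creating device nodes normally requires root)
#include <sys/stat.h>       // mknod(), S_IFCHR
#include <sys/sysmacros.h>  // makedev() on Linux/glibc; location varies elsewhere
int main() {
  // Major 1, minor 3 is the null device on Linux
  return mknod("bar", S_IFCHR | 0600, makedev(1, 3)) == 0 ? 0 : 1;
}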

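// Symbolic link
#include <unistd.h>  // symlink()
int main() {
  // First argument is the target text, second is the link to create
  return symlink("./target", "./alias") == 0 ? 0 : 1;
}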

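// Named pipe
#include <sys/types.h>
#include <sys/stat.h>  // mkfifo()
int main() {
  // Permission bits are octal and filtered by the process umask
  return mkfifo("my_pipe", 0644) == 0 ? 0 : 1;
}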

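// Socket (an AF_INET socket is a descriptor only, with no file system entry)
#include <sys/socket.h>  // socket(), AF_INET, SOCK_STREAM
int main() {
  int sockfd = socket(AF_INET, SOCK_STREAM, 0);
  return sockfd >= 0 ? 0 : 1;
}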

Using mknod to Make Device Files

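The mknod system call creates special files such as block or character device nodes. It takes three arguments: the pathname, a mode that combines a file type constant (S_IFBLK or S_IFCHR) with octal permission bits, and a device number that packs the major and minor ids, typically built with makedev(). Creating device nodes normally requires superuser privileges.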

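// Make /foo as a block device (major 3, minor 0); permission bits are octal
mknod("/foo", S_IFBLK | 0644, makedev(3, 0));

// Make /bar as a character device (illustrative major 4, minor 64)
mknod("/bar", S_IFCHR | 0600, makedev(4, 64));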

Major ids determine the associated device driver while minor ids specify an instance or channel. The kernel maps these ids to handle data transfer for the target hardware device like a disk or terminal.
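
Going the other way, the major and minor ids of an existing device node can be read back from the st_rdev field. The sketch below assumes Linux with glibc, where major(), minor() and makedev() come from <sys/sysmacros.h>; the helper name device_numbers is hypothetical.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Stores the major and minor ids of the device node at path.
 * Returns 0 on success, -1 if path is missing or not a device node. */
int device_numbers(const char *path, unsigned *maj_out, unsigned *min_out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;

    /* st_rdev holds the device number the node refers to. */
    *maj_out = major(st.st_rdev);
    *min_out = minor(st.st_rdev);
    return 0;
}
```

On Linux, /dev/null is conventionally major 1, minor 3, so calling device_numbers on it should report those values; other systems number their devices differently.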

Named Pipe Creation and Usage

Named pipes can be created using the mkfifo command or by calling mkfifo() from a program. Pipe files show type ‘p’ in ls -l output. Data written to one end flows through the pipe and is read at the other end in first-in-first-out order. Opening a FIFO normally blocks until another process opens the opposite end, and the kernel buffers only a limited, system-defined amount of data, so a writer blocks once the buffer is full and a reader blocks while it is empty.

# Create mypipe
$ mkfifo mypipe

# Reader process (blocks until a writer opens the pipe)
$ cat mypipe

# Writer process (run from a second terminal)
$ echo "Text" > mypipe

Named pipes carry data between related processes, such as a parent and its descendants, as well as between fully independent processes that share nothing but the pipe's pathname.
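
The same exchange can be sketched from C with mkfifo() and fork(). This is an illustration under simple assumptions (a short message, one writer, one reader); the helper name fifo_roundtrip is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a FIFO at path, lets a child process write msg into it and
 * reads the data back in the parent. Returns bytes read, or -1. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *buf, size_t buflen)
{
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child (writer): open() blocks until the parent opens for reading. */
        int wfd = open(path, O_WRONLY);
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w < 0 ? 1 : 0);
    }

    /* Parent (reader): open() blocks until the child opens for writing. */
    ssize_t n = -1;
    int rfd = open(path, O_RDONLY);
    if (rfd >= 0) {
        n = read(rfd, buf, buflen);
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return n;
}
```

The two open() calls rendezvous with each other, which is the blocking behavior seen when cat waits for echo in the shell session above.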

Making Sockets for IPC

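Sockets are created using the socket system call, which returns a file descriptor referring to the new socket. A server typically binds the socket to an address (for internet sockets, an IP address and port), listens, and accepts connections; a client connects to that address. Data transfer can then happen via reads and writes. Socket types such as SOCK_STREAM and SOCK_DGRAM determine the communication characteristics: a connected, reliable byte stream versus connectionless datagrams.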

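#include <sys/socket.h>  // socket(), AF_INET, SOCK_STREAM
#include <unistd.h>      // close()
int main() {

  // Create a TCP socket; socket() returns -1 on failure
  int sockfd = socket(AF_INET, SOCK_STREAM, 0);
  if (sockfd < 0) return 1;

  // Connect/communicate

  // Close socket
  close(sockfd);
  return 0;
}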

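Stream sockets provide reliable, ordered delivery, and the same API serves both local IPC and communication across a network, which makes sockets a natural fit for distributed applications and network programming.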
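
To fill in the connect and communicate step for local IPC, here is a hedged sketch of the server and client sides using a Unix domain stream socket. The helper names fill_addr, listen_at and connect_to are hypothetical, and AF_UNIX is used instead of AF_INET so that the example needs no network configuration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fills addr for a Unix domain socket at path; -1 if path is too long. */
static int fill_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof addr->sun_path)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Server side: socket -> bind -> listen. Returns the listening fd or -1. */
int listen_at(const char *path)
{
    struct sockaddr_un addr;
    if (fill_addr(&addr, path) != 0)
        return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: socket -> connect. Returns the connected fd or -1. */
int connect_to(const char *path)
{
    struct sockaddr_un addr;
    if (fill_addr(&addr, path) != 0)
        return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would follow listen_at() with accept() to obtain a per-connection descriptor; after that, both sides exchange data with read() and write() and finish with close().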

Creating and Resolving Symlinks

Symlinks are created using the ln -s command. The readlink command prints the pathname text stored in the symlink, while realpath follows the whole chain of symlinks and prints the final, absolute pathname.

  
$ ln -s target shortcut

$ readlink shortcut
target

$ realpath shortcut
/home/.../target

Symlinks offer convenient aliases while hiding where the target actually lives.