2.10.1 Procfs Implementation
Procfs is implemented as a dynamically loadable kernel module, /kernel/fs/procfs, and is loaded automatically by the system at boot time. /proc is mounted during system startup by virtue of the default /proc entry in the /etc/vfstab file. The mount phase causes the invocation of the procfs prinit() (initialize) and prmount() file-system-specific functions, which initialize the vfs structure for procfs and create and initialize a vnode for the top-level directory file, /proc.
The kernel memory space for the /proc files is, for the most part, allocated dynamically, with an initial static allocation for the number of directory slots required to support the maximum number of processes the system is configured to support (see Section 2.5).
A kernel procdir (procfs directory) pointer is initialized as a pointer to an array of procent (procfs directory entry) structures. The size of this array is derived from the v.v_proc variable established at boot time, representing the maximum number of processes the system can support. Each entry in procdir maintains a pointer to the process structure and a link to the next entry in the array. The procdir array is indexed through the pr_slot field in the process's pid structure. The procdir slot is allocated to the process from the array and initialized at process creation time (fork()) (see Figure 2.4).
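The slot handling described above can be pictured with a small user-level sketch. It is not the kernel code: the sk_ names are hypothetical, the proc structure is reduced to a pid, and it assumes that the "link to the next entry" is what chains unused slots into a free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel proc structure; only the pid is kept. */
typedef struct sk_proc { int p_pid; } sk_proc_t;

/* One directory slot: the owning process plus a link to the next slot. */
typedef struct sk_procent {
	sk_proc_t		*pe_proc;	/* process using this slot, or NULL */
	struct sk_procent	*pe_next;	/* next free slot (assumption) */
} sk_procent_t;

static sk_procent_t *sk_procdir;	/* array of nslots entries */
static sk_procent_t *sk_procfree;	/* head of the free-slot list */
static int sk_nslots;

/* Allocate the array once and chain every slot onto the free list. */
int
sk_procdir_init(int v_proc)
{
	assert(v_proc > 0);
	sk_procdir = calloc((size_t)v_proc, sizeof (sk_procent_t));
	if (sk_procdir == NULL)
		return (-1);
	for (int i = 0; i < v_proc - 1; i++)
		sk_procdir[i].pe_next = &sk_procdir[i + 1];
	sk_procfree = &sk_procdir[0];
	sk_nslots = v_proc;
	return (0);
}

/* At fork time: take a free slot, bind it to p, return the slot index. */
int
sk_procdir_alloc(sk_proc_t *p)
{
	sk_procent_t *pep = sk_procfree;

	if (pep == NULL)
		return (-1);		/* every slot is in use */
	sk_procfree = pep->pe_next;
	pep->pe_proc = p;
	pep->pe_next = NULL;
	return ((int)(pep - sk_procdir));
}

/* Index the array by slot number to find the process, if any. */
sk_proc_t *
sk_procdir_lookup(int slot)
{
	if (slot < 0 || slot >= sk_nslots)
		return (NULL);
	return (sk_procdir[slot].pe_proc);
}

/* At process exit: return the slot to the free list. */
void
sk_procdir_free(int slot)
{
	assert(slot >= 0 && slot < sk_nslots);
	sk_procdir[slot].pe_proc = NULL;
	sk_procdir[slot].pe_next = sk_procfree;
	sk_procfree = &sk_procdir[slot];
}
```

Because the array is sized once from the process limit, allocation and release are constant-time pointer swaps, and a lookup by slot number is a single array index.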
The specific format of the procfs directory entries is described in the procfs kernel code. It is modeled after a typical on-disk file system: Each directory entry in the kernel is described with a directory name, offset into the directory, a length field, and an inode number. The inode number for a /proc file object is derived internally from the file object type and process PID. Note that /proc directory entries are not cached in the directory name lookup cache (dnlc); by definition they are already in physical memory.
Because procfs is a file system, it is built on the virtual file system (VFS) and vnode framework. In Solaris, an instance of a file system is described by a vfs object, and the underlying files are each described by a vnode. Procfs builds the vfs and vnode structures, which are used to reference the file-system-specific functions for operations on the file systems (for example, mount, unmount), and file-system-specific functions on the /proc directories and file objects (for example, open, read, write).
Beyond the vfs and vnode structures, the procfs implementation defines two primary data structures that describe file objects in the /proc file system. The first, prnode, is the file-system-specific data linked to the vnode. Just as the kernel UFS implementation defines an inode as a file-system-specific structure that describes a UFS file, procfs defines a prnode to describe a procfs file. Every file in the /proc directory has a vnode and prnode.
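The text does not give the exact encoding of the inode number, so the following is only one possible scheme, shown to make the idea concrete: the node type is packed into the low bits and the PID into the remaining bits. The function names and the 6-bit type field are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low SK_TYPE_BITS bits hold the node type. */
#define	SK_TYPE_BITS	6u
#define	SK_TYPE_MASK	((1u << SK_TYPE_BITS) - 1u)

/* Combine a PID and a file object type into one inode number. */
uint64_t
sk_make_ino(uint32_t pid, unsigned type)
{
	assert(type <= SK_TYPE_MASK);
	return (((uint64_t)pid << SK_TYPE_BITS) | type);
}

/* Recover the PID from an inode number built by sk_make_ino(). */
uint32_t
sk_ino_pid(uint64_t ino)
{
	return ((uint32_t)(ino >> SK_TYPE_BITS));
}

/* Recover the node type from an inode number built by sk_make_ino(). */
unsigned
sk_ino_type(uint64_t ino)
{
	return ((unsigned)(ino & SK_TYPE_MASK));
}
```

Any scheme of this kind yields a number that is unique per (process, file type) pair without storing anything on disk, which is all that a stat(2) caller needs.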
typedef struct prnode {
    vnode_t         *pr_next;       /* list of all vnodes for process */
    kmutex_t        pr_mutex;       /* locks pr_files and child pr_flags */
    uint_t          pr_hatid;       /* hat layer id for page data files */
    prcommon_t      *pr_common;     /* common data structure */
    prcommon_t      *pr_pcommon;    /* process common data structure */
    vnode_t         **pr_files;     /* contained files array (directory) */
    uint_t          pr_index;       /* position within parent */
    vnode_t         *pr_pidfile;    /* substitute vnode for old /proc */
    vnode_t         *pr_realvp;     /* real vnode, file in object,fd dirs */
    proc_t          *pr_owner;      /* the process that created this node */
    vnode_t         *pr_vnode;      /* pointer to vnode */
    struct contract *pr_contract;   /* contract pointer */
    int             pr_cttype;      /* active template type */
} prnode_t;
See usr/src/uts/common/fs/proc/prdata.h
The second structure, prcommon, resides at the directory level for /proc directory files. That is, the /proc/<pid> and /proc/<pid>/lwp/<lwpid> directories each link to a prcommon structure. The underlying nondirectory file objects within /proc/<pid> and /proc/<pid>/lwp/<lwpid> do not have an associated prcommon structure, because prcommon's function is to synchronize access to the file objects associated with a process or an LWP within a process.
/*
 * Common file object to which all /proc vnodes for a specific process
 * or lwp refer. One for the process, one for each lwp.
 */
typedef struct prcommon {
    kmutex_t        prc_mutex;      /* to wait for the proc/lwp to stop */
    kcondvar_t      prc_wait;       /* to wait for the proc/lwp to stop */
    ushort_t        prc_flags;      /* flags */
    uint_t          prc_writers;    /* number of write opens of prnodes */
    uint_t          prc_selfopens;  /* number of write opens by self */
    pid_t           prc_pid;        /* process id */
    model_t         prc_datamodel;  /* data model of the process */
    proc_t          *prc_proc;      /* process being traced */
    kthread_t       *prc_thread;    /* thread (lwp) being traced */
    int             prc_slot;       /* procdir slot number */
    int             prc_tslot;      /* lwpdir slot number, -1 if reaped */
    int             prc_refcnt;     /* this structure's reference count */
    struct pollhead prc_pollhead;   /* list of all pollers */
} prcommon_t;
See usr/src/uts/common/fs/proc/prdata.h
The prcommon structure provides procfs clients with a common file abstraction of the underlying data files within a specific directory.
Structure linkage is maintained at the proc structure and LWP level, which reference their respective /proc file vnodes. Every process links to its primary /proc vnode (that is, the vnode that represents the /proc/<pid> file), and maintains an LWP directory list reference to the per-LWP /proc entries.
/*
 * An lwp directory entry.
 * If le_thread == NULL, this is an unreaped zombie lwp.
 */
typedef struct lwpent {
    kthread_t       *le_thread;     /* the active lwp, NULL if zombie */
    uint16_t        le_waiters;     /* total number of lwp_wait()ers */
    uint16_t        le_dwaiters;    /* number that are daemons */
    clock_t         le_start;       /* start time of this lwp */
    struct vnode    *le_trace;      /* pointer to /proc lwp vnode */
} lwpent_t;

/*
 * Elements of the lwp directory, p->p_lwpdir[].
 *
 * We allocate lwp directory entries separately from lwp directory
 * elements because the lwp directory must be allocated as an array.
 * The number of lwps can grow quite large and we want to keep the
 * size of the kmem_alloc()d directory as small as possible.
 *
 * If ld_entry == NULL, the entry is free and is on the free list,
 * p->p_lwpfree, linked through ld_next. If ld_entry != NULL, the
 * entry is used and ld_next is the thread-id hash link pointer.
 */
typedef struct lwpdir {
    struct lwpdir   *ld_next;       /* hash chain or free list */
    struct lwpent   *ld_entry;      /* lwp directory entry */
} lwpdir_t;

struct proc {
    kthread_t       *p_tlist;       /* circular list of threads */
    lwpdir_t        *p_lwpdir;      /* thread (lwp) directory */
    lwpdir_t        *p_lwpfree;     /* p_lwpdir free list */
    lwpdir_t        **p_tidhash;    /* tid (lwpid) lookup hash table */
    uint_t          p_lwpdir_sz;    /* number of p_lwpdir[] entries */
    uint_t          p_tidhash_sz;   /* number of p_tidhash[] entries */
    ...
    struct vnode    *p_trace;       /* pointer to primary /proc vnode */
    struct vnode    *p_plist;       /* list of /proc vnodes for process */
    ...
};
See usr/src/uts/common/sys/proc.h
Indexing of the process p_lwpdir is based on the /proc directory entry slot for the target LWP. The lwpent_t references the vnode for an LWP's /proc/<pid>/lwp/<lwpid>/ through the vnode and prnode_t, as illustrated in Figure 2.10.

Figure 2.10. Procfs Structures
Figure 2.10 shows a single process with two LWPs that link to the underlying procfs objects. Each LWP in the process links to its procfs prnode through the lwpent_t vnode path shown. The LWP's prnode links back to the process's prnode through the pr_pcommon pointer. The connection to the /proc directory slot is through the process's pid_t pr_slot link (not shown in Figure 2.10; see Figure 2.4). /proc/<pid>/lwp/<lwpid> slots are linked for each LWP in their respective prc_tslot field.
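The linkage in Figure 2.10 can be followed in code. The sketch below uses heavily trimmed stand-ins for the kernel structures (prefixed sk_, with only the fields needed for the walk) and assumes that the vnode's private-data pointer, v_data, is what leads from a vnode to its prnode.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures shown earlier. */
typedef struct sk_prcommon { int prc_pid; int prc_tslot; } sk_prcommon_t;
typedef struct sk_vnode { void *v_data; } sk_vnode_t;	/* v_data -> prnode */
typedef struct sk_prnode {
	sk_prcommon_t	*pr_common;	/* this node's common structure */
	sk_prcommon_t	*pr_pcommon;	/* the process's common structure */
} sk_prnode_t;
typedef struct sk_lwpent { sk_vnode_t *le_trace; } sk_lwpent_t;
typedef struct sk_lwpdir {
	struct sk_lwpdir	*ld_next;
	sk_lwpent_t		*ld_entry;
} sk_lwpdir_t;
typedef struct sk_proc {
	sk_lwpdir_t	*p_lwpdir;
	unsigned	p_lwpdir_sz;
} sk_proc_t;

/*
 * Walk p_lwpdir[slot] -> lwpent -> /proc lwp vnode -> prnode.
 * Returns NULL if the slot is out of range, free, or has no /proc
 * vnode attached yet.
 */
sk_prnode_t *
sk_lwp_prnode(const sk_proc_t *p, unsigned slot)
{
	assert(p != NULL);
	if (slot >= p->p_lwpdir_sz)
		return (NULL);

	sk_lwpent_t *lep = p->p_lwpdir[slot].ld_entry;
	if (lep == NULL || lep->le_trace == NULL)
		return (NULL);
	return ((sk_prnode_t *)lep->le_trace->v_data);
}
```

From the returned prnode, pr_pcommon leads back to the process-level common structure, which is how an LWP-level file reaches the state shared by all files of the same process.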
When a reference is made to a procfs directory and underlying file object, the kernel dynamically creates the necessary structures to service a client request for file I/O. More succinctly, the procfs structures and links are created and torn down dynamically. They are not created when the process is created (aside from the procdir procfs directory entry and directory slot allocation). They appear to be always present because the files are available whenever an open(2) request is made or a lookup is done on a procfs directory or data file object. (It is something like the light in your refrigerator: it's always on when you look, but off when the door is closed.)
The data made available through procfs is, of course, always present in the kernel proc structures and other data structures that, combined, form the complete process model in the Solaris kernel. By hiding the low-level details of the kernel process model and abstracting the interesting information and control channels in a relatively generic way, procfs provides a service to client programs interested in extracting bits of data about a process or somehow controlling the execution flow. The abstractions are created when requested and are maintained as long as necessary to support file access and manipulation requests for a particular file.
File I/O operations through procfs follow the conventional methods of first opening a file to obtain a file descriptor, then performing subsequent read/write operations, and closing the file when the operation is completed. The creation and initialization of the prnode and prcommon structures occur when the procfs-specific vnode operations are entered through the vnode switch table mechanism as a result of a client (application program) request. The actual procfs vnode operations have specific functions for the lookup and read operations on the directory and data files within the /proc directory.
Procfs implements lookup and read requests through arrays of function pointers, indexed by file type, that resolve to the procfs file-type-specific routines. The file type is maintained at two levels. At the vnode level, procfs files are defined as VPROC file types (v_type field in the vnode). The prnode includes a type field (pr_type) that defines the specific procfs file type being described by the prnode.
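From the client side, the open, read, close sequence looks the same as for any other file. The helper below is a minimal sketch using only POSIX calls; the name sk_read_file is made up for illustration, and nothing in it is specific to procfs.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path read-only, read up to len bytes into buf, and close the
 * descriptor again. Returns the number of bytes read, or -1 if the
 * open or the read fails.
 */
ssize_t
sk_read_file(const char *path, void *buf, size_t len)
{
	assert(path != NULL && buf != NULL);

	int fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);

	ssize_t n = read(fd, buf, len);
	(void) close(fd);
	return (n);
}
```

On a Solaris system, a caller could pass a path such as /proc/self/psinfo together with a psinfo_t buffer from <procfs.h> and compare the return value with sizeof (psinfo_t); the open is what triggers the kernel-side creation of the prnode described above.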
/*
 * Node types for /proc files (directories and files contained therein).
 */
typedef enum prnodetype {
    ...
    PR_STATUS,      /* /proc/<pid>/status */
    PR_LSTATUS,     /* /proc/<pid>/lstatus */
    PR_PSINFO,      /* /proc/<pid>/psinfo */
    ...
} prnodetype_t;
See usr/src/uts/common/fs/proc/prdata.h
The procfs file types correspond directly to the description of /proc files and directories that are listed at the beginning of this section (and in the proc(4) man page).
When a client opens a procfs file, the vnode kernel layer is entered (vn_open()), and a series of lookups is performed to construct the full path name of the desired /proc file. Macros in the vnode layer invoke file-system-specific operations. In this example, VOP_LOOKUP() resolves to the procfs pr_lookup() function. pr_lookup() checks access permissions and vectors to the procfs function appropriate for the directory file type, for example, pr_lookup_piddir() to perform a lookup on a /proc/<pid> directory. Each of the pr_lookup_xxx() directory lookup functions does some directory-type-specific work and calls prgetnode() to fetch the prnode.
prgetnode() creates the prnode for the /proc file and initializes several of the prnode and vnode fields. For /proc PID and LWPID directories (/proc/<pid>, /proc/<pid>/lwp/<lwpid>), the prcommon structure is created, linked to the prnode, and partially initialized. Note that for /proc directory files, the vnode type is changed from VPROC (set initially) to VDIR, to correctly reflect the file type as a directory (it is a procfs directory, but a directory file nonetheless).
Once the path name is fully constructed, the VOP_OPEN() macro invokes the file-system-specific open() function. The procfs propen() code does some additional prnode and vnode field initialization and file access testing for specific file types. Once propen() completes, control is returned to vn_open() and ultimately a file descriptor representing a procfs file is returned to the caller.
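The node-creation rules in the preceding paragraph can be summarized in a user-level sketch. It is not the kernel's prgetnode(): the sk_ types are trimmed stand-ins, the enum lists only four illustrative node types, and malloc-style allocation replaces the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { SK_VDIR, SK_VPROC } sk_vtype_t;
typedef enum {				/* illustrative subset only */
	SK_PR_PIDDIR,			/* /proc/<pid> */
	SK_PR_LWPDIR,			/* /proc/<pid>/lwp */
	SK_PR_LWPIDDIR,			/* /proc/<pid>/lwp/<lwpid> */
	SK_PR_PSINFO			/* /proc/<pid>/psinfo */
} sk_nodetype_t;

typedef struct sk_prcommon { int prc_pid; } sk_prcommon_t;
typedef struct sk_vnode { sk_vtype_t v_type; void *v_data; } sk_vnode_t;
typedef struct sk_prnode {
	sk_nodetype_t	pr_type;
	sk_prcommon_t	*pr_common;
	sk_vnode_t	*pr_vnode;
} sk_prnode_t;

/* Release everything sk_prgetnode() allocated; NULL is accepted. */
void
sk_prfreenode(sk_prnode_t *pnp)
{
	if (pnp == NULL)
		return;
	free(pnp->pr_common);
	free(pnp->pr_vnode);
	free(pnp);
}

/* Create a prnode/vnode pair for the given node type. */
sk_prnode_t *
sk_prgetnode(sk_nodetype_t type)
{
	assert(type <= SK_PR_PSINFO);

	sk_prnode_t *pnp = calloc(1, sizeof (*pnp));
	if (pnp == NULL)
		return (NULL);
	pnp->pr_vnode = calloc(1, sizeof (sk_vnode_t));
	if (pnp->pr_vnode == NULL) {
		sk_prfreenode(pnp);
		return (NULL);
	}
	pnp->pr_type = type;
	pnp->pr_vnode->v_data = pnp;
	pnp->pr_vnode->v_type = SK_VPROC;	/* set initially */

	switch (type) {
	case SK_PR_PIDDIR:
	case SK_PR_LWPIDDIR:
		/* PID and LWPID directories get a prcommon. */
		pnp->pr_common = calloc(1, sizeof (sk_prcommon_t));
		if (pnp->pr_common == NULL) {
			sk_prfreenode(pnp);
			return (NULL);
		}
		/* FALLTHROUGH */
	case SK_PR_LWPDIR:
		pnp->pr_vnode->v_type = SK_VDIR;	/* it is a directory */
		break;
	default:
		break;				/* data file: stays VPROC */
	}
	return (pnp);
}
```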
The reading of a procfs data file object is similar in flow to the open scenario, in which the execution of a read system call on a procfs file ultimately causes the code to enter the procfs prread() function. For each available file object (data structure), the procfs implementation defines a specific read function: pr_read_psinfo(), pr_read_pstatus(), pr_read_lwpsinfo(), etc. The specific function is entered from prread() through an array of function pointers indexed by the file type, the same method employed for the previously described lookup operations.
The Solaris 10 implementation of procfs, in which both 32-bit and 64-bit binary executables can run on a 64-bit kernel, provides 32-bit versions of the data files available in the /proc hierarchy. For each data structure that describes the contents of a /proc file object, a 32-bit equivalent is available in a 64-bit Solaris kernel (for example, lwpstatus and lwpstatus32, psinfo and psinfo32). In addition to the 32-bit structure definitions, each of the pr_read_xxx() functions has a 32-bit equivalent in the procfs kernel module, a function that deals specifically with the 32-bit data model of the calling program. Procfs users are not exposed to the multiple data model implementation in the 64-bit kernel. When prread() is entered, it checks the data model of the calling program and invokes the correct function as required by the data model of the caller. An exception to this is a read of the address space (/proc/<pid>/as) file; the caller must be the same data model. A 32-bit binary cannot read the as file of a 64-bit process. A 32-bit process can read the as file of another 32-bit process running on a 64-bit kernel.
The pr_read_xxx() functions essentially read the data from its original source in the kernel and write the data to the corresponding procfs data structure fields, thereby making the requested data available to the caller. For example, pr_read_psinfo() reads data from the targeted process's proc structure, credentials structure, and address space (as) structure and writes it to the corresponding fields in the psinfo structure. Access to the kernel data required to satisfy the client requests is synchronized with the proc structure's mutex lock, p_lock. This approach protects the per-process or LWP kernel data from being accessed by more than one client thread at a time.
Writes to procfs files are much less frequent. Aside from writing to the directories to create data files on command, writes are predominantly to the process or LWP control file (ctl) to issue control messages. Control messages (documented in proc(4)) include stop/start messages, signal tracing and control, fault management, execution control (for example, system call entry and exit stops), and address space monitoring.
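A table of function pointers indexed by file type can be sketched as follows. The enum values, the sk_ names, and the trivial read routines (which only write the file name into the buffer) are placeholders chosen for illustration; the real routines fill in the corresponding /proc data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of node types; SK_PR_NFILES is the table size. */
typedef enum {
	SK_PR_STATUS,
	SK_PR_LSTATUS,
	SK_PR_PSINFO,
	SK_PR_NFILES
} sk_prnodetype_t;

typedef int (*sk_read_fn_t)(char *buf, size_t len);

/* Copy a tag into buf; returns its length, or -1 if buf is too small. */
static int
sk_fill(char *buf, size_t len, const char *tag)
{
	size_t n = strlen(tag);

	if (n >= len)
		return (-1);
	memcpy(buf, tag, n + 1);
	return ((int)n);
}

static int sk_read_status(char *b, size_t n)  { return (sk_fill(b, n, "status")); }
static int sk_read_lstatus(char *b, size_t n) { return (sk_fill(b, n, "lstatus")); }
static int sk_read_psinfo(char *b, size_t n)  { return (sk_fill(b, n, "psinfo")); }

/* One slot per file type; the type value is the array index. */
static const sk_read_fn_t sk_read_function[SK_PR_NFILES] = {
	[SK_PR_STATUS]	= sk_read_status,
	[SK_PR_LSTATUS]	= sk_read_lstatus,
	[SK_PR_PSINFO]	= sk_read_psinfo,
};

/* Generic entry point: vector to the routine for this file type. */
int
sk_prread(sk_prnodetype_t type, char *buf, size_t len)
{
	assert(buf != NULL);
	if ((unsigned)type >= SK_PR_NFILES || sk_read_function[type] == NULL)
		return (-1);
	return (sk_read_function[type](buf, len));
}
```

Adding a new file type then means adding one enum value and one table entry; the generic entry point does not change.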
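The data-model check can be illustrated with a toy structure. Everything here is hypothetical: the two-field sk_psinfo types, the SK_ILP32/SK_LP64 constants, and the choice to clamp a value that does not fit in 32 bits are assumptions of this sketch, not the kernel's definitions or behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { SK_ILP32, SK_LP64 } sk_model_t;	/* caller's data model */

/* Native (64-bit) layout and its 32-bit equivalent. */
typedef struct { uint64_t pr_size; int32_t pr_pid; } sk_psinfo_t;
typedef struct { uint32_t pr_size; int32_t pr_pid; } sk_psinfo32_t;

/* Native read routine: copy the structure out unchanged. */
static size_t
sk_read_psinfo(const sk_psinfo_t *src, void *dst)
{
	memcpy(dst, src, sizeof (*src));
	return (sizeof (*src));
}

/* 32-bit read routine: convert field by field, then copy out. */
static size_t
sk_read_psinfo32(const sk_psinfo_t *src, void *dst)
{
	sk_psinfo32_t p32;

	/* Assumption: clamp values that do not fit in 32 bits. */
	p32.pr_size = (src->pr_size > UINT32_MAX) ?
	    UINT32_MAX : (uint32_t)src->pr_size;
	p32.pr_pid = src->pr_pid;
	memcpy(dst, &p32, sizeof (p32));
	return (sizeof (p32));
}

/* Pick the routine that matches the caller's data model. */
size_t
sk_prread_psinfo(sk_model_t caller, const sk_psinfo_t *src, void *dst)
{
	assert(src != NULL && dst != NULL);
	if (caller == SK_ILP32)
		return (sk_read_psinfo32(src, dst));
	return (sk_read_psinfo(src, dst));
}
```

The caller never selects a variant explicitly; it simply reads the file and receives the layout that matches how it was compiled.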
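A control message is a long operation code followed by whatever operand that operation takes. The helper below is a sketch of assembling such a message in a buffer before a single write(2) to the ctl file; the name sk_ctl_pack is hypothetical, and the real operation codes come from <sys/procfs.h> rather than from this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build one control message in buf: a long operation code followed by
 * arglen bytes of operand (arglen may be 0 for codes with no operand).
 * Returns the message length, or 0 if buf is too small.
 */
size_t
sk_ctl_pack(void *buf, size_t buflen, long code, const void *arg,
    size_t arglen)
{
	assert(buf != NULL);
	assert(arg != NULL || arglen == 0);

	if (buflen < sizeof (code) || buflen - sizeof (code) < arglen)
		return (0);
	memcpy(buf, &code, sizeof (code));
	if (arglen != 0)
		memcpy((char *)buf + sizeof (code), arg, arglen);
	return (sizeof (code) + arglen);
}
```

A client would pass the returned length as the byte count of the write to /proc/<pid>/ctl, so that the code and its operand arrive together as one message.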