Shared library
A shared library or shared object is a file that is intended to be shared by executable files and further shared object files. Modules used by a program are loaded from individual shared objects into memory at load time or runtime, rather than being copied by a linker when it creates a single monolithic executable file for the program.
Shared libraries can be statically linked at compile time, meaning that references to the library modules are resolved and the modules are allocated memory when the executable file is created. Often, however, linking of shared libraries is postponed until they are loaded.
Most modern operating systems[NB 1] can have shared library files of the same format as the executable files. This offers two main advantages. First, only one loader is needed for both, rather than two (a single loader is considered well worth its added complexity). Second, executables can also be used as shared libraries if they have a symbol table. Typical combined executable and shared library formats are ELF and Mach-O (both on Unix-like systems) and PE (Windows).
In some older environments such as 16-bit Windows or MPE for the HP 3000, only stack-based (local) data was allowed in shared-library code, or other significant restrictions were placed on shared-library code.
Memory sharing
Library code may be shared by multiple processes, both in memory and on disk. If virtual memory is used, the processes execute the same physical pages of RAM, which are mapped into the address space of each process. This reduces memory use and load time. For instance, on the OpenStep system, applications were often only a few hundred kilobytes in size and loaded quickly; most of their code was located in libraries that had already been loaded for other purposes by the operating system.
Programs can accomplish RAM sharing by using position-independent code, as in Unix, which leads to a complex but flexible architecture, or by using common virtual addresses, as in Windows and OS/2. The latter systems ensure, by various means such as pre-mapping the address space and reserving slots for each shared library, that code has a high probability of being shared. A third alternative is single-level store, as used by the IBM System/38 and its successors. This allows position-dependent code, but places no significant restrictions on where code can be placed or how it can be shared.
In some cases, different versions of shared libraries can cause problems, especially when libraries of different versions have the same file name, and different applications installed on a system each require a specific version. Such a scenario is known as DLL hell, named after the Windows and OS/2 DLL file. Most modern operating systems after 2001 have clean-up methods to eliminate such situations or use application-specific "private" libraries.[1]
Dynamic linking
Dynamic linking or late binding is linking performed while a program is being loaded (load time) or executed (runtime), rather than when the executable file is created. A dynamically linked library (dynamic-link library, or DLL, under Windows and OS/2; shareable image under OpenVMS;[2] dynamic shared object, or DSO, under Unix-like systems) is a library intended for dynamic linking. Only a minimal amount of work is done by the linker when the executable file is created; it only records what library routines the program needs and the index names or numbers of the routines in the library. The majority of the work of linking is done at the time the application is loaded (load time) or during execution (runtime). Usually, the necessary linking program, called a "dynamic linker" or "linking loader", is actually part of the underlying operating system. (However, it is possible, and not exceedingly difficult, to write a program that uses dynamic linking and includes its own dynamic linker, even for an operating system that itself provides no support for dynamic linking.)
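The division of work described above can be illustrated with a toy model. The following Python sketch is not how any real dynamic linker is implemented; it only assumes that an executable records (library name, routine name) pairs and that a loader resolves them when the program starts. The names link_at_load_time, installed and libmath are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a "library" is a mapping from routine names to callables.
Library = Dict[str, Callable[..., object]]


def link_at_load_time(
    imports: List[Tuple[str, str]],
    installed: Dict[str, Library],
) -> Dict[str, Callable[..., object]]:
    """Resolve each recorded (library, routine) pair to an actual routine."""
    resolved: Dict[str, Callable[..., object]] = {}
    for library_name, routine_name in imports:
        if library_name not in installed:
            raise LookupError(f"library not found: {library_name}")
        library = installed[library_name]
        if routine_name not in library:
            raise LookupError(f"{routine_name} not found in {library_name}")
        resolved[routine_name] = library[routine_name]
    return resolved


# What the static linker leaves behind in the executable: names only, no code.
imports = [("libmath", "square")]
# What happens to be present on the system when the program is loaded.
installed = {"libmath": {"square": lambda x: x * x}}

table = link_at_load_time(imports, installed)
print(table["square"](7))
```

Because the executable holds only names, replacing the entry in installed changes the routine the program runs without rebuilding the program, which is both the benefit and the risk of late binding.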
Programmers originally developed dynamic linking in the Multics operating system, starting in 1964, and the MTS (Michigan Terminal System), built in the late 1960s.[3]
Optimizations
Since shared libraries on most systems do not change often, systems can compute a likely load address for each shared library on the system before it is needed and store that information in the libraries and executables. If every shared library that is loaded has undergone this process, then each will load at its predetermined address, which speeds up the process of dynamic linking. This optimization is known as prebinding on macOS and prelinking on Linux. IBM z/VM uses a similar technique, called "Discontiguous Saved Segments" (DCSS).[4] Disadvantages of this technique include the time required to precompute these addresses every time the shared libraries change, the inability to use address space layout randomization, and the requirement of sufficient virtual address space (a problem that 64-bit architectures alleviate, at least for the time being).
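The idea of precomputing load addresses can be sketched as follows. This is a simplified illustration only, not the algorithm used by any real prebinding or prelinking tool: it assumes a 4096-byte page size, a hypothetical starting address, and hypothetical library names and sizes, and it simply lays libraries out one after another without overlap.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size, for illustration only


def assign_load_addresses(sizes: Dict[str, int], base: int = 0x10000000) -> Dict[str, int]:
    """Give each library a page-aligned, non-overlapping base address."""
    addresses: Dict[str, int] = {}
    next_address = base
    for name in sorted(sizes):  # sorted so the layout is deterministic
        addresses[name] = next_address
        pages = -(-sizes[name] // PAGE_SIZE)  # ceiling division
        next_address += pages * PAGE_SIZE
    return addresses


# Hypothetical library sizes in bytes.
layout = assign_load_addresses({"libb.so": 100, "liba.so": 5000})
for name, address in layout.items():
    print(name, hex(address))
```

The sketch also shows why the layout must be recomputed when a library changes: if liba.so grows past a page boundary, every address after it shifts.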
Locating libraries at runtime
Loaders for shared libraries vary widely in functionality. Some depend on the executable storing explicit paths to the libraries. Any change to the library naming or layout of the file system will cause these systems to fail. More commonly, only the name of the library (and not the path) is stored in the executable, with the operating system supplying a method to find the library on disk, based on some algorithm.
If a shared library that an executable depends on is deleted, moved, or renamed, or if an incompatible version of the library is copied to a place that is earlier in the search, the executable fails to load. This problem, called dependency hell, exists on many platforms; the (infamous) Windows variant is commonly known as DLL hell. It cannot occur if each version of each library is uniquely identified and each program references libraries only by their full unique identifiers. The "DLL hell" problems with earlier Windows versions arose from using only the names of libraries, which were not guaranteed to be unique, to resolve dynamic links in programs. (To avoid "DLL hell", later versions of Windows rely largely on options for programs to install private DLLs, essentially a partial retreat from the use of shared libraries, along with mechanisms to prevent replacement of shared system DLLs with earlier versions of them.)
Microsoft Windows
Microsoft Windows checks the registry to determine the proper place to load DLLs that implement COM objects, but for other DLLs it checks directories in a defined order: first, the directory from which it loaded the program (private DLL[1]); then any directories set by calling the SetDllDirectory() function; the System32, System, and Windows directories; the current working directory; and finally the directories specified by the PATH environment variable.[5] Applications written for the .NET Framework (since 2002) also check the Global Assembly Cache as the primary store of shared DLL files, to avoid DLL hell.
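The search order described above can be written down as a small function. This is a sketch of the order as listed in this section, not a reimplementation of the Windows loader; the function name dll_search_order and all example directories are hypothetical, and real systems apply further rules that are not modelled here.

```python
import ntpath
from typing import List, Sequence


def dll_search_order(
    app_dir: str,
    set_dll_dirs: Sequence[str],
    windows_dir: str,
    cwd: str,
    path_value: str,
) -> List[str]:
    """Return the directories to try, in the order listed in the text."""
    order = [app_dir]                                   # program's own directory
    order.extend(set_dll_dirs)                          # SetDllDirectory() entries
    order.append(ntpath.join(windows_dir, "System32"))  # system directories
    order.append(ntpath.join(windows_dir, "System"))
    order.append(windows_dir)
    order.append(cwd)                                   # current working directory
    # PATH is a semicolon-separated list; empty entries are skipped.
    order.extend(entry for entry in path_value.split(";") if entry)
    return order


order = dll_search_order(
    r"C:\App", [r"C:\App\plugins"], r"C:\Windows", r"D:\work", r"C:\Tools;;C:\Bin"
)
for directory in order:
    print(directory)
```

Because the program's own directory comes first, a DLL installed next to the executable takes precedence over a system-wide copy with the same name.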
OpenStep
OpenStep used a more flexible system, collecting a list of libraries from a number of known locations (similar to the PATH concept) when the system first starts. Moving libraries around causes no problems at all, although users incur a time cost when first starting the system.
Unix-like systems
Most Unix-like systems have a "search path" specifying file-system directories in which to look for dynamic libraries. Some systems specify the default path in a configuration file; others hard-code it into the dynamic loader. Some executable file formats can specify additional directories in which to search for libraries for a particular program. The search path can usually be overridden with an environment variable, although the override is disabled for setuid and setgid programs, so that a user cannot force such a program to run arbitrary code with root permissions. Developers of libraries are encouraged to place their dynamic libraries in directories in the default search path. On the downside, this can make installation of new libraries problematic, and these "known" locations quickly become home to an increasing number of library files, making management more complex.
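A search of this kind can be sketched as below. The precedence shown (environment override first, then directories recorded in the executable, then the defaults) is an assumption made for illustration; real dynamic loaders differ in the exact order and consult additional sources such as caches. The function names, the library name libfoo.so and the directories are hypothetical.

```python
import os
import posixpath
from typing import Callable, List, Optional, Sequence


def search_dirs(
    embedded_dirs: Sequence[str],
    env_value: str,
    default_dirs: Sequence[str],
    privileged: bool,
) -> List[str]:
    """Build the ordered list of directories to search (assumed precedence)."""
    dirs: List[str] = []
    # The environment override is ignored for setuid/setgid programs.
    if env_value and not privileged:
        dirs.extend(d for d in env_value.split(":") if d)
    dirs.extend(embedded_dirs)  # directories recorded in the executable
    dirs.extend(default_dirs)   # system default search path
    return dirs


def locate_library(
    name: str,
    dirs: Sequence[str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first existing candidate path, or None if there is none."""
    for directory in dirs:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    return None


dirs = search_dirs(["/opt/app/lib"], "/home/u/lib:/tmp/lib", ["/lib", "/usr/lib"], privileged=False)
# A fake file system, so the example does not depend on the host machine.
fake_files = {"/usr/lib/libfoo.so", "/tmp/lib/libfoo.so"}
print(locate_library("libfoo.so", dirs, exists=fake_files.__contains__))
```

In this example the copy in /tmp/lib is found before the one in /usr/lib, which illustrates why the override is disabled for privileged programs.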
Dynamic loading
Dynamic loading, a subset of dynamic linking, involves a dynamically linked library loading and unloading at runtime on request. Such a request may be made implicitly or explicitly. Implicit requests are made when a compiler or static linker adds library references that include file paths or simply file names. Explicit requests are made when applications make direct calls to an operating system's API.
Most operating systems that support dynamically linked libraries also support dynamically loading such libraries via a run-time linker API. For instance, Microsoft Windows uses the API functions LoadLibrary, LoadLibraryEx, FreeLibrary and GetProcAddress with Microsoft Dynamic Link Libraries; POSIX-based systems, including most UNIX and UNIX-like systems, use dlopen, dlclose and dlsym. Some development systems automate this process.
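An explicit request of this kind can be shown from Python, whose standard ctypes module is one example of a development system that wraps the run-time linker API. The sketch below assumes a POSIX system on which the C library can be located; on such systems ctypes.CDLL is built on dlopen and attribute lookup on dlsym. The helper name load_and_call_strlen is hypothetical, and the sketch does not unload the library afterwards.

```python
import ctypes
import ctypes.util


def load_and_call_strlen(text: bytes) -> int:
    """Load the C library at runtime and call its strlen routine."""
    # find_library returns a name such as "libc.so.6", or None if not found.
    libc_name = ctypes.util.find_library("c")
    # On POSIX, CDLL(None) falls back to the symbols already loaded
    # into the current process, which include the C library.
    libc = ctypes.CDLL(libc_name)

    # Look the routine up by name and describe its signature to ctypes.
    strlen = libc.strlen
    strlen.argtypes = [ctypes.c_char_p]
    strlen.restype = ctypes.c_size_t
    return strlen(text)


print(load_and_call_strlen(b"shared"))
```

Nothing about strlen is known to the Python program until the call is made: the library is found, mapped and searched for the symbol at runtime, which is what distinguishes dynamic loading from linking at load time.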
Notes
- Some older systems, e.g., Burroughs MCP and Multics, also have only a single format for executable files, regardless of whether they are shared.
Further reading
- How To Write Shared Libraries by Ulrich Drepper (with much background info)
References
- Anderson, Rick (2000-01-11). "The End of DLL Hell". microsoft.com. Archived from the original on 2001-06-05. Retrieved 2012-01-15. "Private DLLs are DLLs that are installed with a specific application and used only by that application."
- "VSI OpenVMS Linker Utility Manual" (PDF). VSI. August 2019. Retrieved 2021-01-31.
- "A History of MTS". Information Technology Digest. 5 (5).
- IBM Corporation (2011). Saved Segments Planning and Administration (PDF). Retrieved 2022-01-29.
- "Dynamic-Link Library Search Order". Microsoft Developer Network Library. Microsoft. 2012-03-06. Archived from the original on 2012-05-09. Retrieved 2012-05-20.