Open Software Foundation Research Institute(1)
The design of the MK++ microkernel pushes the envelope of software
engineering for system software design. Its innovative combination of
object-orientation and strict dependency layering provides not only a high
assurance base, but one that preserves the performance of its traditionally
designed predecessors. Advanced system properties, such as bounded (real-time)
execution time, are considered in a way whose assurance can be asserted. The
MK++ approach to software engineering and system structuring is presented, as
well as the results of the first phase of the MK++ program.
Over the past two years, the Research Institute of the Open Software
Foundation has produced a new microkernel [OSFES], based on the concepts of
Mach, with the long term objective of supporting high performance, secure,
scalable, real-time computing. This microkernel, the MK++ microkernel, meets the
goal of attaining high performance while providing high assurance through
innovative combinations of object orientation and strict dependency layering.
The MK++ program is a response to the fundamental operating system problem. System software suffers from the following two realities:
From these realities follow the goals of the MK++ program --
This paper will discuss the nature of the MK++ microkernel -- how it relates
to its predecessors, how it was engineered, how it contributes to advances in
providing assurance, how it is structured and what its resulting performance is.
The MK++ project was an outgrowth of OSF's MK system efforts [OSFMK] (which themselves were an outgrowth of CMU's Mach work [CMUMach]). Various shortcomings of the MK code base contributed to the decision to write a new microkernel from scratch:
Although the major focus of the project, as well as of this paper, concerns
the engineering of MK++ with the intent of addressing the system
structuring/assurance issue, the MK++ project also addresses the complexity
To be an effective microkernel, MK++ has to be an essential microkernel -- a microkernel that implements only essential features. That is, the interfaces, features and mechanisms it exports need to be basic, direct, predictable and actually needed for the higher level system servers to do their job.
MK++ does provide all the necessary microkernel services: IPC, VM, tasks and threads, clocks and devices. The services have been shown to be sufficient in that they support all of the functionality of the higher level system servers, allowing the overall system to be self-hosting(2), (3). The essential character is in the details --
The last item pertains to a key element of the MK++ project and its future:
the development of compile, build and run-time composable services.
If it were necessary to pick a single area in which the significance of the
MK++ project appears, it would be software engineering. Of course,
the traditional issues considered as defining software engineering apply --
project management, configuration management, documentation, testing, etc. --
but the significant advance presented by MK++ is in the architecting of system
software -- the combining of object-orientation and strict dependency layering
in innovative ways.
MK++'s predecessor, MK5, as well as much of modern OS architecture research (i.e., micro-kernel and server based), concerns extra-kernel components and their relationships (messages, domains, objects...). Release 1 of MK++, discussed here, concerns the intra-kernel components and their relationships (layering, objects...). Future work concerns the relationship of intra- and extra- kernel components ("colocation" and beyond).
The virtues of object-orientation are so well known that they need no repeating here. The benefits of object orientation extend naturally into the area of system software --
Surprisingly, the use of objects inside operating systems has received little
attention. OS use typically involves kernel support for user objects, such as in
[Eden] and [Argus]. [Chorus] and [Spring] are composed of objects internally
(C++), but their internal object orientation is neither exposed nor exploited.
Object orientation is used extensively in [Choices] to demonstrate polymorphism
in aspects of system software within sample system structures, but not to
illustrate the practical construction of a high performance system.
Various aspects of how MK++ uses objects to encapsulate aspects of the system are discussed in Section 5. The emphasis is on how MK++ integrates object orientation with dependency ordering (layering) --
Layering, the forgotten stepchild of software engineering, concerns
understanding and ordering dependencies. A dependency can take many forms -- "is
called by", "waits for", "sets value", "sustains memory", "provides processing",
and so on.
Layering concerns identifying distinct "layers" where a single layer can have arbitrary dependency relationships between its components, but that layer may only have dependencies on layers "lower than" it, and not on layers "higher than" it. A layer encapsulates dependencies such that a dependency graph formed out of the layers has no circularities.
The definition of a layer expands the definition of the interfaces exposed by that layer via dependency constraints. Layers simplify reasoning about complex interactions (for example, decoupling thread management from thread scheduling) by providing an ordering upon the analysis of the interactions.
The existence of layers naturally orders activities within the architecture. For example, a lock hierarchy naturally follows from the layering hierarchy. Also, system initialization by layer order simplifies and structures initialization.
Layering is common in high assurance systems. First suggested by Dijkstra [THE], it was subsequently applied seriously to operating systems in [Multics], Guardian and SRI's [PSOS]. The use of layering was considered so fundamental to the structuring and analysis of high assurance software that it became an "Orange Book" requirement for security evaluation class B3 [TCSEC].
Unfortunately, layering is generally ignored in most OS work. It is perceived as applicable to functional decomposition but not to object decomposition. It is also perceived as overly constraining design, producing bad performance as a result. Many of these perceptions came from previous layered system efforts that defined layers in terms of the functionality that became available at each layer, with strict notions about the nature of that functionality and how it related to other functionality.
It is possible, though, and highly desirable, to apply the notions of layering to an object oriented structure.
Layering and object orientation complement each other --
Figure 1 illustrates the various elements involved in the combination of
layering and object orientation, as realized in MK++. A rectangle illustrates an
object, encapsulating data and methods (functions). Within a layer (the region
between two gray lines), the objects may be mutually dependent. An object within
a layer may only be dependent on objects in lower layers. The figure shows two
types of dependencies: a direct downward dependency (perhaps indicating object
invocation), and downward inheritance, where an object effectively spans layers,
but where both layering and object-orientation contribute to encapsulate and
order dependencies between the components of the overall object. Combining these
elements provides the most flexible, most structured approach to decomposition
of a complex concurrent system.
Although layering and object orientation can be viewed as orthogonal views of
decomposition, there are various subtleties to their combination, especially in
regard to parameterized types (whose layering is affected by the layers of the
types which are parameters) and polymorphic (virtual) methods (whose layering is
affected both by the layering of the abstract definition and by that of their
concrete implementations).
The following examples show how combining layering and object orientation provides the basis for decomposing the complex entities and relationships that appear in system software.
Figure 2 shows the decomposition of a complex system entity(4).
In the figure, there is a system object visible to users. That object has
several sets of state:
These distinct sets of state, along with the mechanisms and concurrency rules that govern them, are encapsulated in distinct layers. The use of inheritance allows the composition of complex abstractions into a single object that spans layers. Note that the unidirectional nature of inheritance corresponds to the unidirectional dependency order of strict layering.
Figure 3 shows the decomposition of a complex system relationship. In the
figure, there is a many to many relationship between the two sets of objects
(the A's and the B's). There is state associated with the relationship, so the
relationship is encapsulated by an object. Maintaining the relationship between
the two related objects is only part of the purpose of the objects so related --
maintaining the relationship involves the related objects as well as the
relationship object. Although the relationship between an A and a B is one
relationship, there are aspects of the relationship that are more closely
aligned with the A or the B -- in particular, each side of the relationship
(since it concerns the linkage to an A or a B) has its own data, methods and
concurrency mechanisms. By decomposing the relationship into two parts,
separated by layering but joined with inheritance, the relationship is
decomposed into two, smaller encapsulated relationships that layer with the
objects with which they are most connected.
The MK++ kernel is designed to meet the assurance requirements of the [TCSEC]
B3 security evaluation class. Relative to the architecture of the system, the
criteria specify that a significant use of layering, abstraction and data hiding
is required. What is not so clearly specified is the nature of how those layers,
abstractions and data are to be described. This general topic of interface
specification is a major concern in software engineering. After years of debate,
one thing is clear -- what is needed is a better definition of an interface.
Most work in object-oriented interface design (type theory) concerns what might be called "interface syntax" -- type correctness. While interface syntax is of fundamental importance to object-oriented work in general, and MK++ in particular, the significant question to be answered is how to specify "interface semantics" -- constraint correctness. Answering this question gets to the heart of a problem very familiar to software engineers -- why it is that a perfectly functioning piece of software no longer works if run in any configuration other than the one in which it was developed. The reason is that there were dependencies and constraints implicit in the interfaces that are no longer being satisfied. The following two examples illustrate problems with unspecified interface properties.
In Figure 4, thread 1 invokes object A in such a way that object A is still locked upon return (perhaps A was locked on entry). Thread 2 subsequently invokes object C, expecting unrestricted access to it, not knowing that object C internally invokes object A. Although the relationship between C and A may well be private to the subsystem, the concurrency relationship is apparently not.
In Figure 5, thread 1 once again invokes object A, leaving it locked. Thread
1 then invokes object B, expecting unrestricted access, but object B unknowingly
invokes A and deadlocks.
There is research being conducted in the area of semantic expression in
interfaces, such as the Architectural Connector Model [CMUFC]. Such approaches
talk about properties of interfaces and do provide some means for proving
correctness of interface matching, but they do not address the pragmatics of
interface design and implementation nor their evolution.
In the object community, there is work such as the adaptive polymorphism propagation patterns of [Demeter] which provides an approach, but lacks the framework and discipline of layered design.
The term that describes the complete expression of "interface" and what is implied by that interface is contract. The term contract, though, is currently overused and misunderstood. For example, in [Spring], subcontracts do include issues such as data transmission and caching, but concurrency requirements and other dependency issues continue to be avoided. A complete contract includes the following elements:
The cost of maintaining such a contract could well be very expensive.
Clearly, one would want to minimize the size of the contract, which is done by
careful encapsulation of interface issues. Still, some interface properties will
be exposed, and this exposure is compounded by the interrelationships between
interfaces. As shown by the resource conflict examples above, some properties of
one interface to a subsystem are affected by the use of other interfaces (state
invariants, concurrency, serializing resource access (bounding service time),
information flow). This observation, along with the pragmatics of describing the
contract for an evolving set of interfaces (where some interfaces may simply be
variants or optimizations of others), suggests that the way to think about an
interface to a subsystem is not simply in terms of the set of descriptions of
each of the interfaces, but in terms of abstract behavior-property groups
exported by the collective interface.
Figure 6 diagrams the relationship between the top level system specification, the designs of the subsystems and the object interfaces in terms of abstract behavior-property groups.
From the point of view of the top level system specification, abstract
behavior-property groups indicate where in the system architecture the
properties of the abstract state described in the specification are enforced and
transitions are made between states (refer to [OSFFM] for a formal view of
this). For example, given a high level assertion such as "a thread is bound to
one and only one task", a group indicates the elements of the design that make
this true and what elements of the design are allowed to know the binding. From
the point of view of an object's interface, abstract behavior-property groups
relate the interface with the properties that affect it and that it affects, and
relate the interface to the containing subsystem's design and place in the
overall system architecture. For example, a group indicates what other elements
of the design interact with the concurrency mechanisms that affect a particular
mechanism. From the point of view of the design of a subsystem, abstract
behavior-property groups relate the subsystem to how it contributes to
satisfying the overall system specification, and how its design relates to the
actual objects contained in that subsystem. For example, a group may indicate
whether a subsystem can depend on a system assertion appearing true for it, or
whether the assertion may not always hold during critical regions within the
boundaries of the subsystem because the subsystem itself has a role in
maintaining the assertion.
Future work concerns the specification of abstract behavior-property groups for the MK++ subsystems and the pragmatics (process and tools) of maintaining them.
Figure 7 shows the internal layering of the MK++ kernel. The striping
indicates how the various subsystems and layers contribute to the service areas
provided by the kernel. The important thing to notice from this figure is how
multiple layers contribute to implement any given service area, and how the
layers forming a service area are sandwiched between those forming other service
areas.
The complete description and rationale for each subsystem and layer forms the High Level Design for MK++ [OSFHLD]. This section will only highlight some key encapsulation issues.
MK++ endeavors to properly schedule all activities; except for rare cases,
all processing performed in the kernel on behalf of some requestor is performed
by that requestor's thread and scheduled on its behalf. The Processor Scheduling
layer controls the execution context of each processor (owning the processor
transitions associated with interrupts, faults and the return from interrupts
and faults) and the scheduling state of each execution point. So as to be able
to properly schedule all activities, Processor Scheduling needs to be the lowest
layer in the system structure.
On the other hand, when a thread is performing a kernel operation, the processing starts at the top of the kernel and eventually returns there. A thread cannot be terminated while it is processing in the kernel, so termination (as well as other higher level thread management primitives) involves layers much higher than Processor Scheduling (namely, all the way through Resource Management Objects, which governs the top level kernel object resource relationships).
The need to separate these two closely related mechanisms -- thread scheduling and thread management -- was a great challenge, and one of MK++'s greatest accomplishments.
Since the high level thread management primitives need to concern themselves with the thread's entrance into and exit from the kernel, the primitives key off that transition. The vast majority of the kernel's code has no involvement with, or concern for, thread management. (The exception concerns aborting threads out of interruptible waits associated with external user events.) Processor Scheduling has a minor, albeit key, role in thread management. When a thread management primitive is to be applied to a thread, Processor Scheduling is informed not of the nature of the operation, but simply that there is one. When the target thread next tries to return to user mode, the return-to-user-mode processor context switch logic (in Processor Scheduling) prevents the return, effectively making the thread appear as if it took a thread management trap. The handler for that trap knows the thread is at the kernel/user boundary, not involved in any kernel processing, and therefore in a position to be manipulated.(6)
This separation permits the scheduler to be a simple six-state state machine, with no understanding of higher level thread management issues, nor any need to be modified as thread management evolves.
The MK++ microkernel provides a virtual memory environment for the tasks it
supports. Through the external memory management interfaces, server tasks can
participate in the paging mechanisms. The management of the actual physical
memory paging pool, though, is done by the kernel, in the Resident Memory
Management layer.
Although the management of the paging pool is the major physical memory management responsibility, Resident Memory Management provides additional services for use internal to the kernel to provide optimized usage of physical memory, such as for I/O buffers.
To minimize the cost of physical memory allocation and de-allocation operations, Resident Memory Management wants to export direct access to physical pages, without forcing a virtual memory mapping upon them. Simply giving out the physical addresses of "loose" pages, though, makes the pages hard to account for, and makes the mechanisms by which they can be accessed hard to encapsulate. Instead, Resident Memory Management exports page list objects that encapsulate these issues. It is a page list object that is used by the Device Services framework as an I/O buffer, by IPC to represent large message data, and by User Address Space to represent a wired-allocated region.
MK++ supports a message passing model. Even I/O operations involve the
sending of messages to indicate I/O operations. The need to provide interrupt
level delivery of messages is what motivates the layering of Transfer Management
(IPC message queueing) as a low layer, callable from interrupt context.
This layering presents a problem. When an I/O operation completes, it is possible that the port into which the completion message is to be sent no longer exists. Transfer Management is too low a layer to do anything about this; also, in interrupt context, it cannot acquire the necessary locks to locate the port. The solution to this problem is the creation of connection agent objects.
At the kernel interface, a task can locate a port for the sake of sending a message into it, receiving a message from it, or modifying the state of the port. Deep in the kernel, locating ports is not possible. Instead, when the original request for service came in (for example, when the I/O operation was originally requested), a connection agent was created to represent the future ability to send a message, the target port for the future message(s) was located, and the connection agent was bound to the port. The connection agent represents a guarantee that a message can be sent in the future. If the port is destroyed before that time, the port's message queue automatically re-routes any attached connection agents to a special message queue used for messages to be discarded. When the connection agent is subsequently used, it blindly accepts a message and "sends" it, with no extra concern in interrupt context.
The messages, themselves, encapsulate certain special processing. Network input presents a problem in that a single request for input (a call requesting the routing of network input packets to a task) results in a potentially unbounded stream of messages. At interrupt level, the network driver cannot invoke the general purpose (blocking) kernel allocation routines, so the input packet message buffers must be supplied to it some other way. Message objects allocated for network input have overloaded C++ memory allocation operators so that, after they are sent and received by the target task, they automatically return themselves to the network driver for use in sending future packet messages [OSFalloc].
Connection is the kernel's internal name for a port. Connection is an abstract base class for all message targets (user ports and kernel objects) which are invoked by sending messages to them. The mapping of capabilities to connections is done by Connection Management. This is another example of encapsulation -- only Connection Management knows when a connection dies; the rest of the kernel is shielded from the races involved by redirection mechanisms hidden in Connection Management.
All user visible kernel objects (tasks, threads, etc.) derive from the abstract base kernel object implemented in Connection Management. The complexities of kernel object management are encapsulated in the constructors and destructors of that class.
Although it is not the case that there is a base hardware management layer
that encapsulates all machine dependencies, machine dependencies are
nonetheless well encapsulated in their own classes. The nature of this
encapsulation takes several forms depending on the circumstances -- in some
cases a machine independent class forms the basis for a machine dependent class
which in turn forms the basis for a machine independent class.
The careful attention to encapsulation, coupled with sophisticated use of the language and compiler, has resulted in a dramatic reduction in machine dependent code and the barest use of assembly language. In particular, the infamous "locore.s" low-level assembly language module no longer exists. Also, machine dependencies in the address translation map module (aka, pmap) have been reduced to a few hundred lines.
The big area of machine dependency, of course, remains the device drivers. During its design, the device services were specifically layered so as to provide the standard services drivers expect and to interface to them in a standard manner (as related in the DDI/DKI specifications) so as to largely encapsulate the existing drivers. For the initial PC version of MK++, though, where passing a B3 security evaluation was a concern, additional driver and interface work was done to the freely available drivers.
The nature of the MK++ device services has proven to be of value during the development of the PC version -- the MK++ framework is so fussy, and the assertions embedded in it so strong, that MK++ uncovered countless driver bugs.
Generally speaking, one of the virtues of object technology is the ability to wrap boundaries around legacy code, such as device drivers. For the future of MK++, the balance between full compatibility for drivers and enforcing strong boundaries is a question awaiting the evaluation of modern driver frameworks.
Although the requirement for MK++ release 1 was to imitate the MK5 microkernel, the design did have its eye on the future needs for this technology.
Being of object oriented construction, much of the future advancement of MK++
will consist of replacing or improving objects. Improvements in algorithms and
data structures will certainly take this approach. As MK++ is ported to other
hardware architectures, new or improved abstractions will appear to encapsulate
issues and features not addressed in release 1.
More fundamental changes, such as the movement to an RPC basis, will, of course, have a more dramatic effect. It is these cases that will show the value of the layering done in release 1. Although it is conceivable that not only classes but whole subsystems could be replaced as the future unfolds, the highest level of decomposition, the layering, should stand. The system layering, the result of extensive analysis of the nature and purpose of the system, showed how to decompose and separate VM, IPC and the rest. Now that this decomposition and separation have been accomplished, the ability to wholesale change an aspect of the system without having to rethink the entire system structure becomes possible.
Since the structure of MK++ is radically different from its predecessors, and the approach taken towards its construction so strict, many were skeptical that the goals of performance parity with its predecessors could be achieved. Indeed, this was not an easy goal to achieve, but achieved it was.
The same set of concerns that led to an overall reduction in complexity also
led toward an overall focus on basic, direct mechanisms tailored for their uses.
The kernel interface stresses a messaging (IPC) paradigm and, therefore, so does the design. The decomposition of the IPC service specifically produced subcomponents that efficiently performed a specific aspect of IPC messaging in an isolated way. These subcomponents were then combined in optimal fashions for the various uses of IPC: kernel invocation, exception handling, I/O operations, etc. The lowest level mechanisms were designed so as to be usable in interrupt context (device I/O completion processing, for example) and to minimize preemption latency. Message buffer handling was also optimized for various uses, by overloading C++ memory operators.
Strict dependency analysis, coupled with the interface atomicity analysis, revealed many synchronization simplifications. One way to think of this is that any given layer need only worry about itself; by design, how a layer does its job does not matter, and what higher layers do to it does not matter. Each layer then spends little time worrying about overall system state and more time worrying about getting its job done in a direct and straightforward way. In this way, the basic uses of system mechanisms (IPC, page fault handling, etc.) work in a main-line way. For the more complex uses of the mechanisms, the main-line mechanisms still work as is; the complex uses simply do a little more, making their performance only incrementally worse than the simple cases. The cost of mechanisms becomes related to how much work is requested.
For a microkernel, the management of memory is very important. MK++'s predecessor was known for its handling of virtual memory; MK++ adds direct handling of physical memory as well. Management of memory for kernel data structures and I/O buffers was greatly simplified as well as optimized. The logical movement of memory between system components, such as results from paging, was also optimized by direct use of the lower level physical memory management subsystem.
Finally, because MK++ is of object oriented construction, complex optimizations could realistically be realized by encapsulating them within classes and methods.
Ultimately, it is the implementation that has to perform, and special work
was necessary to meet our goals. Fortunately, MK++ is implemented in C++, which
is ideally suited for system construction, providing extensive control over the
generated code.
It was a requirement that we achieve our performance goals without sacrificing modularity. MK++ is so finely decomposed, and the exported mechanisms sufficiently complicated, that any system invocation involves the execution of dozens, if not hundreds, of C++ methods. Because the kernel is written in C++, many of these methods can be compiled in-line, thereby eliminating the function call overhead.
A question involved the use of virtual functions. A C++ virtual function is used when an abstract interface is being invoked, for which multiple possible concrete implementations exist. A virtual function call is only slightly more expensive than a regular call, but the main performance loss results from the fact that the target method(s) are not compiled in-line. The points within key execution paths where abstraction is used then become important. One property of passing through a virtual function call is that the called routine has more knowledge about the type of objects involved than does the caller. The minimization of virtual function usage followed from careful analysis of the type information known at points within the execution paths. A typical kernel invocation spends its first part operating in a fashion that is independent of the target object; the second part is object dependent. Once the invocation passes through an abstract interface to the target object (which is in some lower layer), that object typically knows exactly what to do (and is bound, by the layering of the design, to deal only with classes of objects knowable by it -- typically objects of concrete type directly related to the target object). MK++ kernel invocations average around four virtual function calls.
Even with extensive in-lining, the compiler in use (gcc 2.6.0) did not produce optimal straight-line instruction streams. The key factor leading to inefficient code generation was the excessive checking for error conditions. That is, if some method determined an error and its caller (and its caller, and so on) needed to check to see if there was an error, the compiler did not optimize away all of the redundant checks. The general solution to this problem follows from the fact that it is a microkernel -- its interfaces are a collection of "sharp tools" that do not need to be kind when the caller makes a mistake. That is, although the kernel must check for caller errors, in cases where the kernel finds an error that is clearly a user programming error (such as specifying a kernel address for a user buffer), instead of taking pains to undo what was done so far and return an error, the kernel simply "patches" the error (by substituting null, for example), thereby allowing the error path to rejoin the straight-line path.
Additional code stream optimizations resulted from carefully ordering the "then" and "else" clauses for expected results, and moving error path logic from the mainstream. Although considerable low level work was necessary to achieve the performance goals, the goals were met while preserving the fine degree of modularity.
In spite of the efforts described above, the general case IPC path in MK++ is
7% slower than in MK7. However, if preemption is enabled in MK7 (as it always is
in MK++), then MK++ is 3% faster.
Specific cases of IPC, though, are faster in MK++. The IPC path for invoking kernel operations reaps the benefit of selecting optimized buffer management, resulting in a 16% savings. The server (syscall) invocation path also has optimized buffer handling, but also takes advantage of customized thread synchronization, again resulting in a 16% savings. These optimizations are simple to provide in MK++, because the strict layering allows such optimizations to be done in complete safety.
Most other system mechanisms (basic page fault handling, thread management, etc.) are in parity.
An interesting comparison is the task creation call. This operation is quite involved, and not specially optimized in either kernel. Yet, the nature of the operations to be performed is similar enough that the two kernels come out tied. Some underlying mechanisms in one kernel are slower than in the other, and some are faster. All in all, they average out.
As far as higher level system performance (server performance), the AIM results are comparable, given equivalent optimizations.
The key observation is that the use of object orientation and layering has not led to a loss of performance, given sufficient attention to detail. Sufficient experience has been gained in extracting performance from the wide array of services already implemented in MK++ to enable the future development of additional services with performance comparable to "traditional" development.
MK++ presents a modular, coherent code base providing real-time capability, high
trust, and support for future generations of hardware. Extensive use of advanced
software engineering practices (object orientation, C++, strict dependency
ordering, state of the art configuration management, a hierarchy of design
documentation, formal method based testing) is made.
MK++ represents the most structured approach to the decomposition of a complex concurrent system, combining object-orientation with strict dependency layering, while providing high performance.