Safety Program Architecture

When we consider how to safeguard computer actions, the difference between good actions and bad ones is essentially that the former are authorized by policy and the latter violate it. A large class of "bad" actions happen because some agent (a person or a program; it is not always the former) is allowed to do something outside the space of one program running in its own memory and is not blocked or questioned where it should be. Most of the time this involves opening some dataset, whether a disk file of data, a file containing a program, or a communications channel that gets treated in a somewhat file-like way (a pipe, a socket, and so on).

It is also true that most objects on a computer are not particularly sensitive. The note reminding someone to pick up a gallon of milk on the way home, personal scheduling notes, and the like are usually of no interest to thieves. The sensitive programs or files, however, can be very sensitive indeed. When you consider where to protect such things, the further the protection sits from the thing being protected, the harder it is to be sure the protection will work: it becomes too easy to disguise the identity of sensitive things, so limiting access with a firewall is very coarse-grained protection. If, on the other hand, access control and authorization are done at the level where the machine itself defines the object, the control is hard to evade and entirely clear about what is being protected.

Safety was conceived as a first approximation of this: a system which would protect file access of all kinds, and whose protection could not be circumvented by simple use of privileges. Since, at the time Safety was devised, running a program also involved opening a file, extensive checking of file access could block unwanted actions as well as unwanted accesses.

It should be clear that this kind of access checking needs to be more than seeing whether a user is statically permitted to open a file. Users change their motives, and computers ascribe a user identity to whatever they run based, usually, on who is logged in, even though the code actually running may not have been consciously started by a user command. Thus a better auth system (authentication and/or authorization) needs to be able to weigh as much evidence as can be had to decide whether access to some object (initially object = file, but I argue here that it needs to be more widely defined) should be granted (a small sketch of such an evidence-weighted decision appears at the end of this opening section).

If you examine the functions the Safety program implements, you will note that the hierarchical storage functions, the user-mode undelete functions, and the storage speed-up functions are not directly security related. I suggest, though, that a security system needs to consider what it can provide to assist users: there is always a temptation for an attacker to shut the security system down during an attack, or in preparation for one. Having the security system provide other functions that are seen as directly assisting users makes it unattractive to simply turn it off.

The kinds of evidence Safety uses are items available in VMS, but the concept is not limited to those. Later experiments have also shown that automated scanners can detect sensitive files with high probability, and the results of such scans can be used to avoid manually flagging sensitive files or programs.
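To make the point about evidence-weighted access decisions concrete, here is a minimal sketch in Python. The evidence fields, weights, and threshold are my own invention for illustration, not anything Safety itself defined; the only point is that the decision consumes several kinds of evidence rather than a single static permission bit.

    # Hypothetical sketch: an access decision that weighs several kinds of
    # evidence instead of consulting a single static permission bit.

    from dataclasses import dataclass

    @dataclass
    class AccessEvidence:
        user_permitted: bool      # static ACL says this user may open the object
        program_allowed: bool     # the requesting image is on the object's allowed list
        privileges_elevated: bool # process holds more privilege than the task needs
        recent_anomalies: int     # count of recent suspicious actions by this username

    def access_score(ev: AccessEvidence) -> float:
        """Combine the evidence into a rough trust score (weights are invented)."""
        score = 0.0
        score += 0.4 if ev.user_permitted else -1.0
        score += 0.3 if ev.program_allowed else -0.5
        score -= 0.3 if ev.privileges_elevated else 0.0
        score -= 0.1 * ev.recent_anomalies
        return score

    def allow_access(ev: AccessEvidence, threshold: float = 0.5) -> bool:
        # A real interceptor would also choose among deny / redirect / delay here.
        return access_score(ev) >= threshold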
With systems like this, user identity is no longer precisely a static tag; exceptions can be made based on things like behavior. Choice of programs and level of privilege are two such exceptions in Safety; recent history of activity under the username can be another, depending on what can be made available. Safety does assume there is an underlying login capability providing a username that is well enough checked to be usable, and it assumes the operating system is not utterly compromised. (The hope is that by blocking unwanted program accesses, paths to corrupting the OS may be blocked.) Thus "blue pill" or "red pill" style attacks, and attacks based on changing the boot path and inserting persistent code, are presumed absent. Attacks based on direct physical access to hardware are also not defended against, nor can they be very well by any code living within the OS. Safety is therefore not a complete cure for security holes.

It does, however, have many useful features. Its ability to block access by processes holding too many privileges will interest some observers, and its ability to notice that someone is doing something unauthorized (e.g., trying to access a file with a program not in the allowed set) and open a different file rather than the one requested is another unusual capability worth noting. The latter ability makes it easy to set traps for evildoers. Also, when an access violation is noted, Safety does not normally report this back to the user. By default it reports a hardware error instead (giving an intruder the idea that the underlying disk got a parity error and is possibly failing), so if access is not redirected somewhere safe, it is denied without alerting the user that he has been caught. (A still more interesting scheme consisted of causing the user process to hang persistently, so that its memory could be examined but not erased or altered. Safety could have been set up to allow such action, but this was not put in initially, because a third-party security designer has to be able to convince customers that the product cannot cause their production systems to stop working. Something that might exhaust memory until reboot could be seen as too dangerous to try.)

With these observations, let us get to describing how Safety works and how it should be extended in a more current system. The program "Safety" was devised for VMS in the early 1990s as an access-control and authentication layer just above the filesystem layer. It used the FDT (function decision table) access points in the drivers, which are the points where information for I/O requests is taken out of the requesting process and sent to filesystems, rather than the more conventional path of intercepting the start of the filesystem (the XQP, or possibly an ACP). The FDT interface is documented and was expected to be relatively stable, whereas the FDT-to-XQP interface is not so well documented and is likely to change more. Safety used an access control entry (ACE) attached to the files it was monitoring (including directory files) to hold the description of what controls were to be applied, and acted as an interpreter of that information. (A version was written at one time which kept the information in a separate file; that also worked.)
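A rough sketch of what such a per-file control record and its interpretation might look like follows. The field names and layout are invented for illustration and are not the actual ACE format Safety stored; the sketch only shows the interpreter choosing among allow, redirect-to-decoy, and deny-disguised-as-hardware-error.

    # Hypothetical sketch of a per-file control record and the interpreter's
    # response choices (allow, deny with a fake error, or redirect to a decoy).

    from dataclasses import dataclass, field
    from enum import Enum
    from typing import Optional

    class Verdict(Enum):
        ALLOW = "allow"
        FAKE_HW_ERROR = "report fake hardware error"   # deny without tipping off the user
        REDIRECT = "open decoy file instead"

    @dataclass
    class ControlRecord:
        allowed_images: set = field(default_factory=set)  # programs permitted to open the file
        max_privileges: int = 0                            # privilege level ceiling
        decoy_path: Optional[str] = None                   # trap file to open on violation

    def interpret(rec: ControlRecord, image_name: str, privileges: int) -> tuple:
        """Return (verdict, path_to_open_or_None) for one open request."""
        if image_name in rec.allowed_images and privileges <= rec.max_privileges:
            return (Verdict.ALLOW, None)
        if rec.decoy_path is not None:
            return (Verdict.REDIRECT, rec.decoy_path)      # set a trap for the intruder
        return (Verdict.FAKE_HW_ERROR, None)               # deny, disguised as a disk error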
To avoid time-of-check-to-time-of-use attacks (e.g., renames of file paths after the checking was done and before the file access was done), Safety would do exactly one access to a file to get the control information, then force all subsequent access in the operation to be done by file identifier, an attribute guaranteed to refer always to the same underlying file object. This protected against such attacks. It might be noted that a system intercept like this should be able to see rename requests from anywhere, and might employ its own locking to ensure that the file accesses behave like a single, atomic access. If that does not work, it would be necessary to attempt kludges like checking inode and device numbers before and after (a sketch of such a check appears at the end of this passage). Simply taking an object's path and using it repeatedly invites the kind of timing attacks mentioned. (It is also clear that the control information itself needs to be protected from attack. In a current environment, the controls Safety employed would be found inadequate. This perhaps militates toward using separate storage for the security information, which would be easier to guard than an ACE, which was not, after all, designed to be attack-proof, despite the simple-minded checks that were in place to verify the ACE's authenticity at runtime.)

Safety was limited to disk access, as that was thought to be a useful set of functions in the early 1990s; network, memory, pipe, and other I/O channels could be added to what was monitored later if there was interest. The fact that VMS was hardly mentioned by its vendor did not help such interest grow. A more current system would need to monitor all these channels, and should also be able to monitor other functions. Database access, I/O rate, network locality and remoteness, and the flagging of accesses that change the trust one has in the process doing I/O are all needed. A "security interpreter" like this should be able to monitor not only open, close, rename, and delete, but also some read and write functions. For one example, it is useful at times to know whether some object is being accessed at an anomalously high rate, for either read or write. It takes only a few machine instructions to maintain counters of reads or writes with every such operation, yet these counters could trigger notification of unusual goings-on, or delay access. (Suspicious activity might encounter growing delay rather than hard blocks, for example; see the rate-counter sketch below.)

Database access is hard for an operating system to monitor normally, as the details of what is going on exist within a large and complex application and are not standardized. However, by monitoring some I/O for particular patterns, it is sometimes possible to learn useful things about access and to permit alarms or controls. These would have to be specified as interpreter programs. An example might be the kind of monitoring of I/O for SQL keywords that I wrote a watcher for. By watching input from a process and searching it for patterns of SQL keywords, it was possible to spot unusual SQL commands that would not be expected on a particular channel (a sketch follows below). This was good for looking for SQL injection. Not every situation will lend itself to looking for anomalies (and one must remember that databases often have end-of-week or end-of-month processing differing by orders of magnitude from daily use patterns).
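For the rename race discussed at the start of this passage, a user-mode approximation of the before-and-after check might look like the following. Here os.stat/os.fstat and the device/inode pair stand in for a VMS file identifier; this is a sketch of the kludge, not what Safety actually did at the FDT level.

    # Sketch of the "check device/inode before and after" kludge for avoiding
    # path-rename races. A kernel-level intercept would instead pin the file
    # identifier once and use it for every later operation.

    import os

    def open_checked(path: str, flags: int = os.O_RDONLY):
        before = os.stat(path)              # identity of the object the path names now
        fd = os.open(path, flags)           # open (the path could be re-pointed in between)
        after = os.fstat(fd)                # identity of the object actually opened
        if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
            os.close(fd)
            raise PermissionError("object changed between check and use")
        return fd                           # all further access goes through fd, not the path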
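The read/write rate idea can likewise be sketched with a counter per object and a delay that grows with the observed rate. The window length, threshold, and delay curve below are invented illustration values.

    # Sketch of cheap per-object read/write counters feeding an escalating delay.

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 10
    THRESHOLD = 1000          # operations per window considered "anomalously high"

    _counts = defaultdict(lambda: [0, time.monotonic()])  # object -> [count, window_start]

    def note_io(obj_id: str) -> None:
        """Count one read or write; delay the caller increasingly as the rate climbs."""
        count, start = _counts[obj_id]
        now = time.monotonic()
        if now - start > WINDOW_SECONDS:              # start a new counting window
            _counts[obj_id] = [1, now]
            return
        count += 1
        _counts[obj_id][0] = count
        if count > THRESHOLD:
            excess = count - THRESHOLD
            time.sleep(min(0.5, 0.001 * excess))      # growing delay rather than a hard block
            # A real monitor might also raise an alarm or lower the process's trust here.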
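The SQL-keyword watcher might reduce, in user-mode form, to scanning a channel's traffic for keyword patterns that should not appear there. The keyword list and alarm rule here are placeholders, not the watcher I actually wrote.

    # Sketch of a channel watcher that flags SQL-looking traffic on channels
    # where raw SQL is not expected. Keyword list and rule are placeholders.

    import re

    SQL_PATTERN = re.compile(
        r"\b(select|insert|update|delete|drop|union|exec)\b", re.IGNORECASE)

    def looks_like_sql(data: bytes, min_keywords: int = 2) -> bool:
        """Return True if the buffer contains enough SQL keywords to be suspicious."""
        text = data.decode("latin-1", errors="replace")
        return len(SQL_PATTERN.findall(text)) >= min_keywords

    def watch_channel(buffers, expect_sql: bool = False):
        """Yield an alarm for each buffer carrying unexpected SQL-like content."""
        for buf in buffers:
            if not expect_sql and looks_like_sql(buf):
                yield ("possible SQL injection or misdirected query", buf[:80])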
Still, sometimes databases can be watched, and a watcher exterior to the DBMS can be harder to attack than the DBMS itself.

Network access should be checked also. One of the key features of network access is its locality, both geographic (using, perhaps, geolocation information that might be cached to improve performance) and topological. If a process wants to access a system called foo.bar, you might do well to know that it is not in fact accessing evil.com. Doing this by name is easily attacked with DNS or routing attacks, but one can use network address closeness to suggest organizational closeness, avoiding traffic that goes to unrelated third parties. Inside a company, one can know which network addresses are inside and which are outside. The object here is to use the remoteness of an access to judge the trust to be given to the process doing the access. By doing this inside an access control layer, the security interpreter can perhaps allow, block, or redirect (as Safety does) access to sensitive internal information by the guilty process. This kind of decision should be possible as soon as an unexpected "external" access is made (a sketch of such a locality test appears at the end of this passage).

A considerable amount of control will be needed to set all this up and keep the monitoring in place. One thing Safety lacked was robust handling of file creation, where security attributes might be inherited from the directory, or might be set by scanning the new file to determine sensitivity. During inheritance, it would be possible to reset softlinks from one file version to the latest, if there were a desire to keep that Safety feature, or to reset HSM links or the like. As a first approximation to a solution, I would suggest an expanded set of directory flags that tell how new files copied into a directory, or created there, should be checked; a scanning pass could then be used for further refinement. My experiments showed that simply looking for the frequencies of a few dozen regular expressions in files, selected for relevance to a company's business, does a reasonable job of locating which datasets are likely to be sensitive (see the scanner sketch below). Very short files might not need such checking, and other heuristics can help flag what can be left out. The idea is that such scans can identify sensitive material in large collections of data roughly as well as human review can, and much more cheaply. Having useful "sensitive" flags on material makes the job of protecting it much easier than trying to protect everything.

The point of all this is that we will be setting up the system so that all available information about what someone is doing, and what evidence there is for that person's identity and motives, can be used in determining what they are allowed to access or do. The existing features in the Safety product are a start, and should be kept. However, there is a lot of administrative overhead in setting access rules for everything, so what will be feasible is a set of mostly heuristic rules, not a set of finely tuned ones. Safety assumed that the most valuable information in an environment can be picked out by hand. In a large enterprise this won't work; it is not clear it will work even for one individual's system. Rather, the network neighborhood will initially be divided into "local system" and "everywhere else", trust defaults will be set up for both, and defaults will exist all over.
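A minimal sketch of the address-closeness idea follows, assuming an administrator can enumerate "inside" prefixes; the prefixes and trust values are placeholders rather than recommendations.

    # Sketch: judge trust by network locality, using address prefixes the
    # administrator declares "inside". Prefixes and trust levels are placeholders.

    import ipaddress

    INSIDE_NETWORKS = [
        ipaddress.ip_network("10.0.0.0/8"),       # example internal ranges
        ipaddress.ip_network("192.168.0.0/16"),
    ]

    def locality_trust(peer_addr: str) -> float:
        """Default trust level: local host > inside the organization > everywhere else."""
        addr = ipaddress.ip_address(peer_addr)
        if addr.is_loopback:
            return 1.0                            # local system
        if any(addr in net for net in INSIDE_NETWORKS):
            return 0.6                            # organizationally "close"
        return 0.1                                # everywhere else: low default trust

    # A security interpreter could lower a process's trust as soon as it makes an
    # unexpected external connection, then deny or redirect its access to files
    # flagged sensitive.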
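The sensitivity scanner can be sketched as counting hits on a small set of business-relevant regular expressions. The patterns, size cutoff, and threshold below are illustrative only, not a tuned rule set.

    # Sketch of a sensitivity scanner: count matches of a few business-relevant
    # regular expressions and flag files whose hit density is high.

    import re

    PATTERNS = [
        re.compile(rb"confidential", re.IGNORECASE),
        re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like strings
        re.compile(rb"salary|payroll", re.IGNORECASE),
    ]

    MIN_SIZE = 256          # very short files are skipped
    HITS_PER_KB = 0.5       # hit density above which a file is flagged sensitive

    def is_sensitive(path: str) -> bool:
        with open(path, "rb") as f:
            data = f.read()
        if len(data) < MIN_SIZE:
            return False
        hits = sum(len(p.findall(data)) for p in PATTERNS)
        return hits / (len(data) / 1024.0) >= HITS_PER_KB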
Once some rules can be laid out for what is sensitive, scans of files can be used to mark sensitive items, and we would have inheritance, so that items written by processes that have accessed sensitive items are also marked sensitive and then queued for rescan, so that nonsensitive items don't simply drift upward in sensitivity. Deciding what permissions should be used is harder where questions like "where is the control?" or "who is controlling the operations here?" arise. Wherever an untrusted program runs, it could be an agent for remote control; where a remote connection exists, the same could be happening. It might be useful to use explicit human input as a clue that local control exists, so that, for example, if clicks from a local device or typed characters are seen and some new file is opened shortly afterward, that open appears to be under human control. Where some I/O is started later, without any evidence of direct human agency, it should be flagged as potentially under alien control and thus given less trust (a sketch of this heuristic follows below). This kind of thing flies in the face of vast amounts of automatic function, but it seems necessary if the trust accorded to a human operator is to be higher than that given to unknown agents.

The granularity of trust boundaries also needs to be finer than a whole process, since actions like injecting a .dll into a process can happen. An access layer will thus be seen to require considerable ability to probe what goes on within programs, not just to watch their I/O. If an injected library has to be brought in by an open-type operation, catching the fact of its addition is straightforward. If that is not the case (and it certainly need not be), then when some object is being opened, the source process's execution thread counts or similar trails might need to be inspected. Even then, if code is simply placed in nonpaged pool and run from a timer queue or the like, it may run with minimal evidence of its association, and need not involve any kernel thread control structures whose process association would be easy to find. This kind of monitoring logic will need to be worked out over time.

The whole idea of access control also presumes that a somewhat stable underpinning is present. If it is possible to inject a hypervisor beneath the OS, or to get a processor to violate the rules that are supposed to keep processes, modes, and so on separate, then access control can be impossible until this is repaired. Safety is directed at attacks that succeed because controls do not match what is wanted, mainly due to insufficient ability to use information about what is going on. It does not assume that a person will always have the same motives or trustworthiness, but tries to infer, from whatever it can see, whether the person is acting trustworthily at the time an action is done.
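The human-agency clue mentioned above might be approximated by checking whether recent keyboard or mouse activity preceded the request. The time window, the trust adjustments, and the last_human_input_time() helper are all assumptions for illustration; the helper stands in for whatever input monitoring the platform actually provides.

    # Sketch: treat a file open as more trustworthy if recent local keyboard or
    # mouse activity preceded it. Window and adjustment values are assumptions.

    import time
    from typing import Optional

    HUMAN_WINDOW_SECONDS = 5.0     # how recently a click or keystroke must have occurred

    def last_human_input_time() -> float:
        """Placeholder: return the monotonic time of the last local click or keystroke."""
        raise NotImplementedError("platform-specific input monitoring goes here")

    def agency_adjustment(now: Optional[float] = None) -> float:
        """Positive adjustment if a human plausibly initiated the action, negative otherwise."""
        now = time.monotonic() if now is None else now
        try:
            recent = (now - last_human_input_time()) <= HUMAN_WINDOW_SECONDS
        except NotImplementedError:
            recent = False                      # no evidence of human agency available
        return +0.2 if recent else -0.3         # unattended I/O gets less default trust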