Safety Program Architecture

When we consider how to safeguard computer actions, the difference between good actions and bad ones is essentially that the former are authorized by policy and the latter violate it. A large class of "bad" actions happen because some agent (a person or a program; it is not always the former) is allowed to do something outside the space of one program running in its own memory and is not blocked or questioned where it should be. Most of the time this involves opening some dataset, whether a disk file of data, a file containing a program, or a communications channel that gets treated in a somewhat file-like way (a pipe, a socket, and so on).

It is also true that most objects on a computer are not particularly sensitive. The note reminding someone to pick up a gallon of milk on the way home, personal scheduling notes, and the like are usually of no interest to thieves. The sensitive programs or files, however, can be very sensitive indeed. When you consider where to protect such things, the further the protection sits from the thing being protected, the harder it is to be sure the protection will work: it becomes too easy to disguise the identity of sensitive things, so limiting access with a firewall is very coarse-grained protection. If, on the other hand, access control and authorization are done at the level where the machine itself defines the object, the control is hard to evade and entirely clear about what is being protected.

Safety was conceived as a first approximation of this: a system which would protect file access of all kinds, and whose protection could not be circumvented by simple use of privileges. Since, at the time Safety was devised, running a program also involved opening a file, extensive checking of file access could block unwanted actions as well as unwanted accesses.

It should be clear that this kind of access checking needs to be more than seeing whether a user is statically permitted to open a file. Users change their motives, and computers ascribe a user identity to whatever they run based, usually, on who is logged in, even though the code actually running may not have been consciously started by a user command. Thus a better auth system (authentication and/or authorization) needs to be able to weigh as much evidence as can be had to decide whether access to some object (initially object = file, but I argue here that it needs to be more widely defined) should be granted (a small sketch of such an evidence-weighted decision appears at the end of this opening section).

If you examine the functions the Safety program implements, you will note that the hierarchical storage functions, the user-mode undelete functions, and the storage speed-up functions are not directly security related. I suggest, though, that a security system needs to consider what it can provide to assist users: there is always a temptation for an attacker to shut the security system down during an attack, or in preparation for one. Having the security system provide other functions that are seen as directly assisting users makes it unattractive to simply turn it off.

The kinds of evidence Safety uses are items available in VMS, but the concept is not limited to those. Later experiments have also shown that automated scanners can detect sensitive files with high probability, and the results of such scans can be used to avoid manually flagging sensitive files or programs.
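To make the point about evidence-weighted access decisions concrete, here is a minimal sketch in Python. The evidence fields, weights, and threshold are my own invention for illustration, not anything Safety itself defined; the only point is that the decision consumes several kinds of evidence rather than a single static permission bit.

    # Hypothetical sketch: an access decision that weighs several kinds of
    # evidence instead of consulting a single static permission bit.

    from dataclasses import dataclass

    @dataclass
    class AccessEvidence:
        user_permitted: bool      # static ACL says this user may open the object
        program_allowed: bool     # the requesting image is on the object's allowed list
        privileges_elevated: bool # process holds more privilege than the task needs
        recent_anomalies: int     # count of recent suspicious actions by this username

    def access_score(ev: AccessEvidence) -> float:
        """Combine the evidence into a rough trust score (weights are invented)."""
        score = 0.0
        score += 0.4 if ev.user_permitted else -1.0
        score += 0.3 if ev.program_allowed else -0.5
        score -= 0.3 if ev.privileges_elevated else 0.0
        score -= 0.1 * ev.recent_anomalies
        return score

    def allow_access(ev: AccessEvidence, threshold: float = 0.5) -> bool:
        # A real interceptor would also choose among deny / redirect / delay here.
        return access_score(ev) >= threshold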
With systems like this, user identity is no longer precisely a static tag; exceptions can be made based on things like behavior. Choice of programs and level of privilege are two such exceptions in Safety; recent history of activity under the username can be another, depending on what can be made available. Safety does assume there is an underlying login capability providing a username that is well enough checked to be usable, and it assumes the operating system is not utterly compromised. (The hope is that by blocking unwanted program accesses, paths to corrupting the OS may be blocked.) Thus "blue pill" or "red pill" style attacks, and attacks based on changing the boot path and inserting persistent code, are presumed absent. Attacks based on direct physical access to hardware are also not defended against, nor can they be very well by any code living within the OS. Safety is therefore not a complete cure for security holes.

It does, however, have many useful features. Its ability to block access by processes holding too many privileges will interest some observers, and its ability to notice that someone is doing something unauthorized (e.g., trying to access a file with a program not in the allowed set) and open a different file rather than the one requested is another unusual capability worth noting. The latter ability makes it easy to set traps for evildoers. Also, when an access violation is noted, Safety does not normally report this back to the user. By default it reports a hardware error instead (giving an intruder the idea that the underlying disk got a parity error and is possibly failing), so if access is not redirected somewhere safe, it is denied without alerting the user that he has been caught. (A still more interesting scheme consisted of causing the user process to hang persistently, so that its memory could be examined but not erased or altered. Safety could have been set up to allow such action, but this was not put in initially, because a third-party security designer has to be able to convince customers that the product cannot cause their production systems to stop working. Something that might exhaust memory until reboot could be seen as too dangerous to try.)

With these observations, let us get to describing how Safety works and how it should be extended in a more current system. The program "Safety" was devised for VMS in the early 1990s as an access-control and authentication layer just above the filesystem layer. It used the FDT (function decision table) access points in the drivers, which are the points where information for I/O requests is taken out of the requesting process and sent to filesystems, rather than the more conventional path of intercepting the start of the filesystem (the XQP, or possibly an ACP). The FDT interface is documented and was expected to be relatively stable, whereas the FDT-to-XQP interface is not so well documented and is likely to change more. Safety used an access control entry (ACE) attached to the files it was monitoring (including directory files) to hold the description of what controls were to be applied, and acted as an interpreter of that information. (A version was written at one time which kept the information in a separate file; that also worked.)
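A rough sketch of what such a per-file control record and its interpretation might look like follows. The field names and layout are invented for illustration and are not the actual ACE format Safety stored; the sketch only shows the interpreter choosing among allow, redirect-to-decoy, and deny-disguised-as-hardware-error.

    # Hypothetical sketch of a per-file control record and the interpreter's
    # response choices (allow, deny with a fake error, or redirect to a decoy).

    from dataclasses import dataclass, field
    from enum import Enum
    from typing import Optional

    class Verdict(Enum):
        ALLOW = "allow"
        FAKE_HW_ERROR = "report fake hardware error"   # deny without tipping off the user
        REDIRECT = "open decoy file instead"

    @dataclass
    class ControlRecord:
        allowed_images: set = field(default_factory=set)  # programs permitted to open the file
        max_privileges: int = 0                            # privilege level ceiling
        decoy_path: Optional[str] = None                   # trap file to open on violation

    def interpret(rec: ControlRecord, image_name: str, privileges: int) -> tuple:
        """Return (verdict, path_to_open_or_None) for one open request."""
        if image_name in rec.allowed_images and privileges <= rec.max_privileges:
            return (Verdict.ALLOW, None)
        if rec.decoy_path is not None:
            return (Verdict.REDIRECT, rec.decoy_path)      # set a trap for the intruder
        return (Verdict.FAKE_HW_ERROR, None)               # deny, disguised as a disk error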
To avoid time-of-check-to-time-of-use attacks (e.g., renames of file paths after the checking was done and before the file access was done), Safety would do exactly one access to a file to get the control information, then force all subsequent access in the operation to be done by file identifier, an attribute guaranteed to refer always to the same underlying file object. This protected against such attacks. It might be noted that a system intercept like this should be able to see rename requests from anywhere, and might employ its own locking to ensure that the file accesses behave like a single, atomic access. If that does not work, it would be necessary to attempt kludges like checking inode and device numbers before and after (a sketch of such a check appears at the end of this passage). Simply taking an object's path and using it repeatedly invites the kind of timing attacks mentioned. (It is also clear that the control information itself needs to be protected from attack. In a current environment, the controls Safety employed would be found inadequate. This perhaps militates toward using separate storage for the security information, which would be easier to guard than an ACE, which was not, after all, designed to be attack-proof, despite the simple-minded checks that were in place to verify the ACE's authenticity at runtime.)

Safety was limited to disk access, as that was thought to be a useful set of functions in the early 1990s; network, memory, pipe, and other I/O channels could be added to what was monitored later if there was interest. The fact that VMS was hardly mentioned by its vendor did not help such interest grow. A more current system would need to monitor all these channels, and should also be able to monitor other functions. Database access, I/O rate, network locality and remoteness, and the flagging of accesses that change the trust one has in the process doing I/O are all needed. A "security interpreter" like this should be able to monitor not only open, close, rename, and delete, but also some read and write functions. For one example, it is useful at times to know whether some object is being accessed at an anomalously high rate, for either read or write. It takes only a few machine instructions to maintain counters of reads or writes with every such operation, yet these counters could trigger notification of unusual goings-on, or delay access. (Suspicious activity might encounter growing delay rather than hard blocks, for example; see the rate-counter sketch below.)

Database access is hard for an operating system to monitor normally, as the details of what is going on exist within a large and complex application and are not standardized. However, by monitoring some I/O for particular patterns, it is sometimes possible to learn useful things about access and to permit alarms or controls. These would have to be specified as interpreter programs. An example might be the kind of monitoring of I/O for SQL keywords that I wrote a watcher for. By watching input from a process and searching it for patterns of SQL keywords, it was possible to spot unusual SQL commands that would not be expected on a particular channel (a sketch follows below). This was good for looking for SQL injection. Not every situation will lend itself to looking for anomalies (and one must remember that databases often have end-of-week or end-of-month processing differing by orders of magnitude from daily use patterns).
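For the rename race discussed at the start of this passage, a user-mode approximation of the before-and-after check might look like the following. Here os.stat/os.fstat and the device/inode pair stand in for a VMS file identifier; this is a sketch of the kludge, not what Safety actually did at the FDT level.

    # Sketch of the "check device/inode before and after" kludge for avoiding
    # path-rename races. A kernel-level intercept would instead pin the file
    # identifier once and use it for every later operation.

    import os

    def open_checked(path: str, flags: int = os.O_RDONLY):
        before = os.stat(path)              # identity of the object the path names now
        fd = os.open(path, flags)           # open (the path could be re-pointed in between)
        after = os.fstat(fd)                # identity of the object actually opened
        if (before.st_dev, before.st_ino) != (after.st_dev, after.st_ino):
            os.close(fd)
            raise PermissionError("object changed between check and use")
        return fd                           # all further access goes through fd, not the path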
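The read/write rate idea can likewise be sketched with a counter per object and a delay that grows with the observed rate. The window length, threshold, and delay curve below are invented illustration values.

    # Sketch of cheap per-object read/write counters feeding an escalating delay.

    import time
    from collections import defaultdict

    WINDOW_SECONDS = 10
    THRESHOLD = 1000          # operations per window considered "anomalously high"

    _counts = defaultdict(lambda: [0, time.monotonic()])  # object -> [count, window_start]

    def note_io(obj_id: str) -> None:
        """Count one read or write; delay the caller increasingly as the rate climbs."""
        count, start = _counts[obj_id]
        now = time.monotonic()
        if now - start > WINDOW_SECONDS:              # start a new counting window
            _counts[obj_id] = [1, now]
            return
        count += 1
        _counts[obj_id][0] = count
        if count > THRESHOLD:
            excess = count - THRESHOLD
            time.sleep(min(0.5, 0.001 * excess))      # growing delay rather than a hard block
            # A real monitor might also raise an alarm or lower the process's trust here.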
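The SQL-keyword watcher might reduce, in user-mode form, to scanning a channel's traffic for keyword patterns that should not appear there. The keyword list and alarm rule here are placeholders, not the watcher I actually wrote.

    # Sketch of a channel watcher that flags SQL-looking traffic on channels
    # where raw SQL is not expected. Keyword list and rule are placeholders.

    import re

    SQL_PATTERN = re.compile(
        r"\b(select|insert|update|delete|drop|union|exec)\b", re.IGNORECASE)

    def looks_like_sql(data: bytes, min_keywords: int = 2) -> bool:
        """Return True if the buffer contains enough SQL keywords to be suspicious."""
        text = data.decode("latin-1", errors="replace")
        return len(SQL_PATTERN.findall(text)) >= min_keywords

    def watch_channel(buffers, expect_sql: bool = False):
        """Yield an alarm for each buffer carrying unexpected SQL-like content."""
        for buf in buffers:
            if not expect_sql and looks_like_sql(buf):
                yield ("possible SQL injection or misdirected query", buf[:80])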
Still, sometimes databases can be watched, and a watcher exterior to the DBMS can be harder to attack than the DBMS itself.

Network access should be checked also. One of the key features of network access is its locality, both geographic (using, perhaps, geolocation information that might be cached to improve performance) and topological. If a process wants to access a system called foo.bar, you might do well to know that it is not in fact accessing evil.com. Doing this by name is easily attacked with DNS or routing attacks, but one can use network address closeness to suggest organizational closeness, avoiding traffic that goes to unrelated third parties. Inside a company, one can know which network addresses are inside and which are outside. The object here is to use the remoteness of an access to judge the trust to be given to the process doing the access. By doing this inside an access control layer, the security interpreter can perhaps allow, block, or redirect (as Safety does) access to sensitive internal information by the guilty process. This kind of decision should be possible as soon as an unexpected "external" access is made (a sketch of such a locality test appears at the end of this passage).

A considerable amount of control will be needed to set all this up and keep the monitoring in place. One thing Safety lacked was robust handling of file creation, where security attributes might be inherited from the directory, or might be set by scanning the new file to determine sensitivity. During inheritance, it would be possible to reset softlinks from one file version to the latest, if there were a desire to keep that Safety feature, or to reset HSM links or the like. As a first approximation to a solution, I would suggest an expanded set of directory flags that tell how new files copied into a directory, or created there, should be checked; a scanning pass could then be used for further refinement. My experiments showed that simply looking for the frequencies of a few dozen regular expressions in files, selected for relevance to a company's business, does a reasonable job of locating which datasets are likely to be sensitive (see the scanner sketch below). Very short files might not need such checking, and other heuristics can help flag what can be left out. The idea is that such scans can identify sensitive material in large collections of data roughly as well as human review can, and much more cheaply. Having useful "sensitive" flags on material makes the job of protecting it much easier than trying to protect everything.

The point of all this is that we will be setting up the system so that all available information about what someone is doing, and what evidence there is for that person's identity and motives, can be used in determining what they are allowed to access or do. The existing features in the Safety product are a start, and should be kept. However, there is a lot of administrative overhead in setting access rules for everything, so what will be feasible is a set of mostly heuristic rules, not a set of finely tuned ones. Safety assumed that the most valuable information in an environment can be picked out by hand. In a large enterprise this won't work; it is not clear it will work even for one individual's system. Rather, the network neighborhood will initially be divided into "local system" and "everywhere else", trust defaults will be set up for both, and defaults will exist all over.
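A minimal sketch of the address-closeness idea follows, assuming an administrator can enumerate "inside" prefixes; the prefixes and trust values are placeholders rather than recommendations.

    # Sketch: judge trust by network locality, using address prefixes the
    # administrator declares "inside". Prefixes and trust levels are placeholders.

    import ipaddress

    INSIDE_NETWORKS = [
        ipaddress.ip_network("10.0.0.0/8"),       # example internal ranges
        ipaddress.ip_network("192.168.0.0/16"),
    ]

    def locality_trust(peer_addr: str) -> float:
        """Default trust level: local host > inside the organization > everywhere else."""
        addr = ipaddress.ip_address(peer_addr)
        if addr.is_loopback:
            return 1.0                            # local system
        if any(addr in net for net in INSIDE_NETWORKS):
            return 0.6                            # organizationally "close"
        return 0.1                                # everywhere else: low default trust

    # A security interpreter could lower a process's trust as soon as it makes an
    # unexpected external connection, then deny or redirect its access to files
    # flagged sensitive.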
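The sensitivity scanner can be sketched as counting hits on a small set of business-relevant regular expressions. The patterns, size cutoff, and threshold below are illustrative only, not a tuned rule set.

    # Sketch of a sensitivity scanner: count matches of a few business-relevant
    # regular expressions and flag files whose hit density is high.

    import re

    PATTERNS = [
        re.compile(rb"confidential", re.IGNORECASE),
        re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like strings
        re.compile(rb"salary|payroll", re.IGNORECASE),
    ]

    MIN_SIZE = 256          # very short files are skipped
    HITS_PER_KB = 0.5       # hit density above which a file is flagged sensitive

    def is_sensitive(path: str) -> bool:
        with open(path, "rb") as f:
            data = f.read()
        if len(data) < MIN_SIZE:
            return False
        hits = sum(len(p.findall(data)) for p in PATTERNS)
        return hits / (len(data) / 1024.0) >= HITS_PER_KB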
Once some rules can be laid out for what is sensitive, scans of files can be used to mark sensitive items, and we would have inheritance, so that items written by processes that have accessed sensitive items are also marked sensitive and then queued for rescan, so that nonsensitive items don't simply drift upward in sensitivity. Deciding what permissions should be used is harder where questions like "where is the control?" or "who is controlling the operations here?" arise. Wherever an untrusted program runs, it could be an agent for remote control; where a remote connection exists, the same could be happening. It might be useful to use explicit human input as a clue that local control exists, so that, for example, if clicks from a local device or typed characters are seen and some new file is opened shortly afterward, that open appears to be under human control. Where some I/O is started later, without any evidence of direct human agency, it should be flagged as potentially under alien control and thus given less trust (a sketch of this heuristic follows below). This kind of thing flies in the face of vast amounts of automatic function, but it seems necessary if the trust accorded to a human operator is to be higher than that given to unknown agents.

The granularity of trust boundaries also needs to be finer than a whole process, since actions like injecting a .dll into a process can happen. An access layer will thus be seen to require considerable ability to probe what goes on within programs, not just to watch their I/O. If an injected library has to be brought in by an open-type operation, catching the fact of its addition is straightforward. If that is not the case (and it certainly need not be), then when some object is being opened, the source process's execution thread counts or similar trails might need to be inspected. Even then, if code is simply placed in nonpaged pool and run from a timer queue or the like, it may run with minimal evidence of its association, and need not involve any kernel thread control structures whose process association would be easy to find. This kind of monitoring logic will need to be worked out over time.

The whole idea of access control also presumes that a somewhat stable underpinning is present. If it is possible to inject a hypervisor beneath the OS, or to get a processor to violate the rules that are supposed to keep processes, modes, and so on separate, then access control can be impossible until this is repaired. Safety is directed at attacks that succeed because controls do not match what is wanted, mainly due to insufficient ability to use information about what is going on. It does not assume that a person will always have the same motives or trustworthiness, but tries to infer, from whatever it can see, whether the person is acting trustworthily at the time an action is done.
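The human-agency clue mentioned above might be approximated by checking whether recent keyboard or mouse activity preceded the request. The time window, the trust adjustments, and the last_human_input_time() helper are all assumptions for illustration; the helper stands in for whatever input monitoring the platform actually provides.

    # Sketch: treat a file open as more trustworthy if recent local keyboard or
    # mouse activity preceded it. Window and adjustment values are assumptions.

    import time
    from typing import Optional

    HUMAN_WINDOW_SECONDS = 5.0     # how recently a click or keystroke must have occurred

    def last_human_input_time() -> float:
        """Placeholder: return the monotonic time of the last local click or keystroke."""
        raise NotImplementedError("platform-specific input monitoring goes here")

    def agency_adjustment(now: Optional[float] = None) -> float:
        """Positive adjustment if a human plausibly initiated the action, negative otherwise."""
        now = time.monotonic() if now is None else now
        try:
            recent = (now - last_human_input_time()) <= HUMAN_WINDOW_SECONDS
        except NotImplementedError:
            recent = False                      # no evidence of human agency available
        return +0.2 if recent else -0.3         # unattended I/O gets less default trust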