Auth Ideas
Glenn C. Everhart
Nov. 2, 2012

We can start by thinking of a program like VMS "Safety", but for current server OSs - I suppose Linux for a beginning - and designed for large networks. It should allow control of more system access, not just files (though in Linux, in a sense, everything's a file): access controls should cover all kinds of object accesses. It should be possible to control what a disallowed access sees - extending beyond just opening a different file, certainly to different IP addresses and URLs, probably with different identities (to make tracking easier and to fit in with DBMS access control), and fitting also with preprocessing tests of access to objects. This would need flexible patterns of what is looked for.

User identity is more than one process: it covers many processes, over significant time. Possibly start with everything in a uid, since we will want to track what access software is used, and will want to be able to make the system sensitive to what's accessed (and to size or data amount), so we can tell that some identity is accessing many objects that are marked with elevated sensitivity, or which are part of elevated-sensitivity tests (be able to see patterns of actions!). We want to be able to notice when significant amounts of data get sent over boundaries (such access is a pattern constituent), or when possibly large transactions get done across them, so the access can be questioned if need be.

The structure(s) tracking user identity and object characteristics need to be persistent, so that a typical access is mostly some memory checks. Given large cheap memory it's feasible to go much further than 20 years ago and keep procedural kinds of checks available. Auth strength and sensitivity should share one measure: strength = log(successes/failures) = log(correct/incorrect) = log(number of successes that must be present per failure). Keep it somewhat familiar to the users who have to calibrate this!
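A minimal sketch of that shared measure (the base-10 log and the function names are my assumptions, just for illustration):

```python
import math

def auth_strength(successes: int, failures: int) -> float:
    """Strength as log10 of successes per failure: a method that is
    right 1000 times for every wrong acceptance has strength 3."""
    if failures == 0:
        failures = 1  # avoid division by zero; read as "at least one"
    return math.log10(successes / failures)

def access_allowed(strength: float, sensitivity: float) -> bool:
    # First-cut gate; as noted below, "strength >= sensitivity"
    # is rough only, since channel, device, and method each
    # carry their own kinds of risk.
    return strength >= sensitivity

print(auth_strength(1000, 1))    # 3.0
print(access_allowed(3.0, 2.0))  # True
```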
However, note that channel, device, and method all have strength, and different kinds of risk contribute to sensitivity, so "strength >= sensitivity" is a rough rule only. It is important that the system check, across many accesses, that patterns of "suspicious behavior" are not triggered; system actions matter here in that they can alter probabilities. We don't look just for frequency of access to sensitive data but also for patterns of access. A process that looks at only sensitive data over time, though the rate may be slow, might be doing a stealth extract of it. Of course sensitivity markings need auto-downgrade (via a scanner plus human override rules) to compensate for objects inheriting sensitivity marks.

New info might feed into structures in memory (disk-backed) when evidence comes in bearing on sensitivity of access. An API is needed so daemons from elsewhere can feed this. Authenticate the daemons with remote queries about their internals, to be sure they don't get easily tampered with. There will be many low-level alerts: you want to aggregate them and let humans judge them when they persist or get more frequent. Transaction amounts might need to come in from daemons outside a system too.

You want access protection close to what is being protected, but some behavior evidence will come from outside (as will info about what software is run, and maybe what commands are given to it, though the system can't get too far into the weeds trying to track this). If access is coming via something like a daemon serving many users, it will be necessary to figure out which access comes from whom. That may need something watching the remote daemon to tell. If the remote system has our auth control software, the remote software can communicate with the local software. The "firewall" gets distributed across many systems that way (as in Safety, where the control on one disk could communicate with that for other disks). (Include all the stuff Safety has!)
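The aggregate-then-escalate idea for low-level alerts can be sketched as a per-key counter that only bothers a human when alerts recur within a window (names and thresholds here are mine, not from the design):

```python
from collections import defaultdict, deque

class AlertAggregator:
    """Hold low-level alerts; flag a key for human judgment only when
    it recurs more than `threshold` times within `window` seconds."""
    def __init__(self, threshold: int = 5, window: float = 600.0):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # key -> alert timestamps

    def record(self, key, now: float) -> bool:
        q = self.events[key]
        q.append(now)
        while q and q[0] < now - self.window:  # drop stale alerts
            q.popleft()
        return len(q) > self.threshold  # True => escalate to a human

agg = AlertAggregator(threshold=3, window=600)
for t in (0, 100, 200, 300):  # four alerts for one uid in 5 minutes
    escalate = agg.record(("uid1001", "sensitive-read"), now=t)
print(escalate)  # True: 4 alerts inside the 600 s window exceeds 3
```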
Greater care must be taken to be sure such communication is legitimate, though; there are more attacks now, and the system might be compromised on some nodes. Do we want to try looking for funny patterns in stacks of return addresses, so we can spot infected programs or odd syscalls? Initial thought is no - just get something working. Doing such in my own code, to protect it some: maybe.

Protecting the memory structures: yes, since they will be large and control access. I guess they need to be stored encrypted, accessed via some software that might morph, using implicit key storage (keys computed from, e.g., a code hash) with keys changing often. Bah - that's going to mean they need to get re-encrypted a lot (with a moving boundary, like a shadow disk copy) and have the key change even oftener, working like newer virtual disks where the encryption key gets encrypted again by a user key, so the user key can change oftener. The underlying store needs to be encrypted. The memory store might be too slow if encrypted; it may have to be kept clear. If so, it must be protected against kernel attacks somehow, if only by moving it around, maybe with a common, relatively fast "gate" routine that gets one to given data. Store it in a hash in memory and let the gate figure out where the bits are? A possible saving grace: the system is for servers, which are less exposed to attack. It might do to be able to detect when one gets infected...

There will need to be data areas per user, for longer-term stuff, and per object, for sensitivity, permitted-access limitations, what to do if access is denied, and the like. These would be inherited as objects get created and (I would expect) checked, and maybe lowered once created, by some catch-up scan process that checks sensitivity. The checks would also run when an object got written to (in case the sensitive info got cleansed). Obviously if you check once a day, you check only once, even if the object was written to many times; ditto if the cycle is slower. All this storage is to be on the servers, which had better be relatively safe.
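One way to read the "keys computed from e.g. code hash" idea - a sketch only; SHA-256 and the epoch mix-in are my assumptions, not a vetted key-derivation scheme:

```python
import hashlib

def implicit_key(code_bytes: bytes, epoch: int) -> bytes:
    """Derive a key from a hash of the guarding code itself, mixed
    with an epoch counter so the key can be rotated often. Tampered
    code hashes differently and so cannot derive the right key."""
    code_hash = hashlib.sha256(code_bytes).digest()
    return hashlib.sha256(code_hash + epoch.to_bytes(8, "big")).digest()

# Key the structure store off, say, this function's own bytecode;
# bump the epoch to force re-encryption under a fresh key.
key = implicit_key(implicit_key.__code__.co_code, epoch=7)
print(len(key))  # 32: a 256-bit key
```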
A signature of some kind over this data would be good to keep and check, so that random privileged processes on the servers could not just scribble on it. Remote checksums between servers would help detect other corruption if any appeared. User endpoint boxes should basically not be trusted: it is too hard to ensure they aren't compromised, and trusted users are only going to act trustworthy some of the time.

The interaction of these data areas (the ones representing users and those representing objects) needs to be that access decisions call some function(s) which take both as input and must generate: 1. a decision: allow access, disallow access, or demand more info; and 2. a decision: what to do if access is denied. The latter can be to return an error code, access something else instead, delay access (and tell how long), or possibly edit the access somehow. In practice the per-object data will need to contain the recipes for how to react, but in cases of rate limits there will need to be rate info in the per-user data.

The per-user data will need to persist too: it must be able to handle cases where a suspicious process might pop up every 10 minutes with no process at all present for that user between times. (Might as well treat this by group also, since per-user might be too hard to administer.) Some accesses that need more info might be handled as: allow now, but queue a request for more info, and when it arrives, deny further access. This would imply access decisions on every access, at least for some objects. (The usual is to decide at open only.) It would be most sensible to try to do this when bulk media access is done (in case of syscalls that might be reading files, e.g., a byte at a time). Thus the most sensible kind of limit might be a "clock tick of next check" item, so checks get rate-limited. (Care with clock resets here!) With such things, access could be allowed. Delayed access is just going to need to put the caller into a wait state and delay.
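The two-output decision function above might look like this - a sketch with hypothetical names, using the simple strength-vs-sensitivity compare and a once-a-second "clock tick of next check":

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MORE_INFO = "more info"

@dataclass
class UserRecord:                 # per-user data area, persisted
    uid: int
    auth_strength: float          # log-scale identity assurance
    next_check_tick: float = 0.0  # "clock tick of next check"
    last: tuple = (Decision.MORE_INFO, None)

@dataclass
class ObjectRecord:               # per-object data area, persisted
    sensitivity: float
    on_deny: str = "error"        # recipe: "error" | "alternate" | "delay"
    alternate_path: str = ""      # used when on_deny == "alternate"

def check_access(user: UserRecord, obj: ObjectRecord, now: float):
    """Both records in, two decisions out: (allow/deny/more-info,
    what to do on denial). Re-decisions are rate limited via
    next_check_tick so byte-at-a-time reads stay cheap."""
    if now < user.next_check_tick:
        return user.last                # reuse the recent result
    user.next_check_tick = now + 1.0    # re-decide at most once a second
    if user.auth_strength >= obj.sensitivity:
        user.last = (Decision.ALLOW, None)
    else:
        user.last = (Decision.DENY, obj.on_deny)
    return user.last

u = UserRecord(uid=1001, auth_strength=2.0)
obj = ObjectRecord(sensitivity=3.0, on_deny="alternate")
print(check_access(u, obj, now=10.0))  # denies: strength 2.0 < sensitivity 3.0
```

Caching the last result rather than skipping the check entirely means a denial sticks for the whole tick, which is the conservative choice.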
No help for that; the idea of such is to discourage too-frequent accesses to sensitive data where outright prohibition is too harsh (e.g., because the access can sometimes be legitimate). The access time of every access could be counted for this, but a count per (delta time) can be used so long as the rate can be a bit fuzzy. Say 10 accesses from 10:00-10:05, 20 from 10:05-10:10, 30 from 10:10-10:15, and 8 from 10:15-10:20. If I want 32 or fewer within 10 minutes, I need only the last 2 periods and can discard the older ones. A couple of these bins need to be active so it's possible to dump a whole one and not make the threshold vary too much; more bins give less fluctuation. Each object would need the time size of such bins. User roles might have a multiplier. Counts must be per user. Let one user multiplier scale time and another scale the limits (to allow some users lots of access).

We might need to go to a role structure as well at some point, so that limits on access by object per user could be considered. That would mean that while normally the object determines its sensitivity, we might decide we want the user factored in also. A not-very-trustworthy user might be permitted lower access rates to data in one file, but normal access rates to other files. Not sure it will happen (leave this for a V2), but keep it in mind.

Activity by users - things like frequency of access to different object sensitivity classifications - should be either logged or possible to log, so that patterns of access might be discerned. This will want to be by path, so analysis will be feasible; inode and device or the like would be too hard to track. It would be best if other "suspicion factors" could be recalled - things like the kind of software used for the access, perhaps - but that much detail bids fair to produce massive data that cannot be inspected. Better to attempt some classifications in broad swaths, with access software class, destination (inside/outside, for example), and counts of access to object sensitivity classes.
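The binned counting above can be sketched directly, using the worked numbers from the text (class and parameter names are mine):

```python
from collections import deque

class BinnedRateLimit:
    """Sliding-window counter built from fixed-size time bins. Only
    enough bins to cover the window are kept; older bins are
    discarded, as in the worked example in the text."""
    def __init__(self, bin_seconds: int, window_bins: int, limit: int):
        self.bin_seconds = bin_seconds
        self.window_bins = window_bins  # bins that make up the window
        self.limit = limit
        self.bins = deque()             # [bin_index, count], newest last

    def record(self, t: float):
        idx = int(t // self.bin_seconds)
        if not self.bins or self.bins[-1][0] != idx:
            self.bins.append([idx, 0])
        self.bins[-1][1] += 1
        while self.bins and self.bins[0][0] <= idx - self.window_bins:
            self.bins.popleft()         # dump whole stale bins

    def over_limit(self) -> bool:
        return sum(count for _, count in self.bins) > self.limit

# Worked example: 5-minute bins, 10-minute window (2 bins), limit 32.
rl = BinnedRateLimit(bin_seconds=300, window_bins=2, limit=32)
counts = [10, 20, 30, 8]  # accesses in 10:00-:05, :05-:10, :10-:15, :15-:20
for i, n in enumerate(counts):
    for _ in range(n):
        rl.record(i * 300)
print(rl.over_limit())  # True: the last two bins hold 30 + 8 = 38 > 32
```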
These would just be labels (in Unix, extended attributes) on files, so they travel with the files. Another record needs to be kept, to check for tampering and allow reset, but you want to avoid using it in real time. The ability to get more info for certain user classes is likely to be asked for (as when you see lots of access to sensitive stuff from somebody and want to be able to know just what's being peeked at).

There needs to be a structure defined which keeps around the identity (and behavior) evidence that someone is the person known, and is acting as expected. This would get seeded at first by login, with some (minimal, perhaps) strength. It can be queried by other access requests, and additional strength-gathering fired off at need. Behavior monitoring would be handled by monitors adjusting the saved identity strength, so that, for example, malware actions running in code supposedly the user's would be noted as acting unlike the user should act, leading to the conclusion that the actor was not the known user. Alerts about what was going on could be provided, and added to at need, once the framework is done.

To simplify setup I think I will use FUSE to get a kernel part which is maintained and working (much faster than writing from scratch, and it will make it easier for folks to rely on). There need to be:

. Structure that reflects user identity. This needs to initialize what to monitor from the groups the user is in (which can also tell what evidence daemons are needed for the user). The strength of evidence (probably both for and against identity; it's easier to set thresholds for positive and negative separately) should be kept, and recent behavior scores (initially re: access frequency of sensitive stuff) need to be kept around. This structure represents a person on a machine, so it is more than just groups; it needs to be per individual.

. Structure to represent file sensitivity.
I think this might be a set of categories of sensitivity and a level, plus info about what is permitted when the file is opened and what to do if the open is unauthorized. (This can be used for opening something else instead, bogus error returns, max priv levels (in Linux, sensitive group memberships), and so on.) I expect this to be on disk when inactive, so any ACL-type info would be here. Max access rates might be here too. Initial values can inherit from the creating user or directory permissions; let a scan for sensitive info then be fired off to reset too-high sensitivity levels.

. Structure to represent groups? This would correspond to roles, and gets used to initialize part of the individual-identity structures, but I am not sure it does anything else.

Initially it seems more sensible in Linux to use uid, not session ID, so that multiple logins by the same user get aggregated on a box. Unix sometimes allows users only 8 or 16 groups, so we need our own aggregation of roles; we don't want to disrupt what is already in use, and taking up groups would do that. Put our groups in with the per-uid structures and check those for per-file access.

There need to be daemons that run to gather behavior evidence about users, to flag suspicious behavior and update identity strength. Other daemons would scan files for sensitive info. Categories on files would be marked so that not every file need be examined; examining subsets of files should be done when possible. The scheme of having a bitmap of possibly sensitive file IDs (where a file ID might be a hash of pathname plus a generation number), so that most files could just go through, seems likely useful. Implement a lot like Safety, but we must make up hashes. Underlying filenames would all be generated hashes, binned by first N characters perhaps, so we can avoid huge directories if the underlying filesystem is inefficient there. That way the user must go through our abstraction. We might encrypt too, so that scanning the underlying system becomes difficult.
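The hash-of-pathname-plus-generation naming, binned by leading characters, can be sketched like so (SHA-256 and the separator byte are my choices, just for illustration):

```python
import hashlib

def underlying_name(path: str, generation: int, bin_chars: int = 2) -> str:
    """Map a user-visible path to a generated underlying filename:
    a hash of pathname + generation number, placed in a directory
    named by its first N hex characters so no one directory grows
    huge on an inefficient underlying filesystem."""
    h = hashlib.sha256(f"{path}\x00{generation}".encode()).hexdigest()
    return f"{h[:bin_chars]}/{h}"

print(underlying_name("/home/alice/report.txt", 1))
```

Since the mapping is a one-way hash, scanning the underlying store reveals nothing about user-visible names even before any encryption layer is added.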
Fortunately that is far less costly than it was back in 1979. We might make this one optional (for the benefit of the nervous). Initial auth picks some evidence and applies it; other daemons get other auth evidence when needed. Thus we get several intercommunicating processes per filesystem (or several threads, depending on what's easiest).

---------------------------

To build a system I envision building on a crypto-disk, since there needs to be some way to be sure access control and auth code isn't just bypassed. Also, such systems have interception partly worked out. Services to be provided:

1. Access allowed by identity evidence strength. This will be established at first access, for a given user ID, initially by some "prompt evidence" classes: is the user logged in, and the strength of that; where the user is physically, as far as we can tell (what machine is he on?); local time for the user; and what program is initially being used for the access. Once the user record is set up we keep it around, timestamped at last use. If it goes unused too long we drop it. Do this instead of session ID, to make it harder to evade.

2. Support file access passwords within this record. Need a utility to enter them. Files marked as needing passwords don't open unless the password has been entered.

3. Per file, need a spec of what to do on access failure: at least allow the error code to be chosen, or an alternate file to open (silently).

4. "Paranoid mode": some programs get treated as super-untrusted, so that while they run, user ID strength is made very low. Files they create get flagged as untrusted too. When a file is created, it inherits sensitivity tags: opening a sensitive file flags the program as sensitive; what is created by a sensitive program gets the sensitive flag; what is created by an untrusted program gets the untrusted flag. These can be fixed up by a batch downgrader program that scans for sensitive info, so we record which files are created, to allow such utilities to run.
A user needs rights to access sensitive or untrusted files (different rights for each). Anything created untrusted has its x bit turned off everywhere so it cannot be executed.

5. Access control: strength of ID assurance (auth) >= risk of the file is generally needed. Sensitive files will generally have greater risk, but the idea is that sensitivity gets inherited initially and fixed up, if too high, by the downgrader. Sensitivity can be added to by other daemons that flag things like regulatory risk. Ditto, auth strength can be altered (it is more of a "user trust" quantity) by daemons dealing with trust dimensions other than auth strength alone. Untrusted processes cannot create files with the x field (though there needs to be a marking that allows upgraders/downgraders, or things like fsck or defraggers, to run without interference).

6. Note that ID assurance ("auth strength") depends also on non-prompt info, which should get fed in by daemons.

7. When an initial access is done, auth failure can (configurably) request daemons to gather more auth evidence, whose strength may add to auth strength. The opening process in that case waits for the daemons to run. Timeout means auth-fail processing gets used.

8. The access rate (read rate) of sensitive files will be accumulated as a rate over some time period. File tags will have rate allowances; a nonzero allowance means the rate gets checked. Access by anything with the user's ID gets counted. Excess rate means I/O gets slowed down (put the process into a wait state, pull it out after a "jiffy" or a few such). Should this be per file or global? Per file seems good, but a global limit for nonzero allowances can be used to thwart sneaky reads of many different sources by one user. This all depends on user ID (note: uid, not euid).

9. It is expected that mostly files will get tagged sensitive or not (too hard to administer gradations in most cases), but the math will still be used.

10.
Read or write access should get the access check run also, either all the time or every so often (every second?), so that dynamic auth changes ("he starts acting weird") can be handled. This can mean access gets blocked for daemons (if this is allowed; I would have this function configurable per file, and a site would tend to have all markings default one way or the other), or stops after an open had succeeded. Report it as an I/O error, generally. I am suspicious of the idea of switching to some other file in these cases.

The per-uid structures would be backed to disk in one place; the per-file ones should be separate. It seems likely I might need to key per-file data on inode number and major/minor device. We want this stuff disk-based in case someone contrives to attack from many processes - run a process, attack a little, exit, run another process, etc. - we want the statistics to accumulate over time. We also want authN strength info not to be lost; it should mostly time out. File sensitivity should mostly be slowly varying, if it changes at all. Scan for sensitive info (look for many sets of regexes) to downgrade, but now and then scan unscanned files to avoid missing anything. Imports from other sites will need to allow some bulk setting of flags. Thus we will have a facility to set the markings of a process being used for such imports, so that, e.g., an sftp daemon might be set to have everything it pulls in get some set of markings.

The foregoing is an initial part only. We badly need to control net access, and that might not involve the VFS system; at most, a "file level" check might tell you someone is accessing the network device. We should probably integrate something like a firewall, so it can be sensitive to ports and IP addresses. I expect an initially useful distinction is "inside" vs. "outside", which can be done by IP ranges in IPv4 anyway. The ranges would have to be set up, but that would give an idea of when traffic is crossing. We want to be able to count bytes.
We want that in counts of access per time even for disk stuff, but with networks it becomes more obvious that byte count is likely all one will have. Encrypted traffic won't be readily analyzable, but the byte count will remain. The ability to control access to the network, and to treat outside access as tainting programs that get significant code from outside (so they get marked untrusted), could be used to limit malware access. What gets written, to anywhere, after reading the network would not be allowed to become executable. This won't handle all in-memory cases but will inhibit some methods of spreading past initial bootstraps. By the way, we want to make darn sure that untrusted programs cannot create suid or sgid files either. To get at program innards it will be necessary to go beyond this and perhaps start watching stacks.

The foregoing is a possibly useful set of capabilities to start with, though, and is suited to data-loss-prevention problems on servers. The philosophy is to keep sensitive data from getting to workstations in the first place - not putting it into places where a potentially hostile user might have physical access to it. (It would be useful to mention that on user systems where such data resides, it would be better to have it reside encrypted rather than in the open, so that only the known OS that decrypts it can get to the data without a lot of searching. In principle one could have the key reside on a network end, as long as it were accepted that the data could be available only when connected to a company network; but in practice there's sense in pulling anything sensitive onto a user desktop/laptop/etc. only if it needs to be usable when the network is down.)
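The inside/outside distinction by IPv4 ranges, plus byte counting across the boundary, can be sketched with the standard library (the ranges and function names here are placeholders a deployment would configure):

```python
import ipaddress

# Hypothetical "inside" ranges; a real site would configure its own.
INSIDE = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.0.0/16")]

def is_inside(addr: str) -> bool:
    """Classify an IPv4 address against the configured inside ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INSIDE)

def bytes_crossing(flows) -> int:
    """Sum byte counts for flows that cross the inside/outside
    boundary. `flows` is an iterable of (src, dst, byte_count);
    byte count is all we can rely on once traffic is encrypted."""
    return sum(n for src, dst, n in flows
               if is_inside(src) != is_inside(dst))

flows = [("10.1.2.3", "192.168.5.9", 1000),  # inside -> inside: no crossing
         ("10.1.2.3", "8.8.8.8", 4096)]      # inside -> outside: crosses
print(bytes_crossing(flows))  # 4096
```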