Auth Ideas
Glenn C. Everhart
Nov. 2, 2012

We can start by thinking of a program like VMS "Safety", but for current server OSs - I suppose Linux for a beginning - and designed for large networks. It should allow control of more system access, not just files (though in Linux, in a sense, everything's a file): access controls should cover all kinds of object accesses. It should be possible to control what a disallowed access sees - extending beyond just opening a different file, certainly to different IP addresses and URLs, probably with different identities (to make tracking easier and to fit in with DBMS access control), and fitting also with preprocessing tests of access to objects. This would need flexible patterns of what is looked for.

User identity is more than one process: it covers many processes, over significant time. Possibly start with everything in a uid, since we will want to track what access software is used, and will want to be able to make the system sensitive to what's accessed (and to size or data amount), so we can tell that some identity is accessing many objects that are marked with elevated sensitivity, or which are part of elevated-sensitivity tests (be able to see patterns of actions!). We want to be able to notice when significant amounts of data get sent over boundaries (such access is a pattern constituent), or when possibly large transactions get done across them, so the access can be questioned if need be.

The structure(s) tracking user identity and object characteristics need to be persistent, so that a typical access is mostly some memory checks. Given large cheap memory it's feasible to go much further than 20 years ago and keep procedural kinds of checks available. Auth strength and sensitivity should share one measure: strength = log(successes/failures) = log(correct/incorrect) = log(number of successes that must be present per failure). Keep it somewhat familiar to the users who have to calibrate this!
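A minimal sketch of that shared measure (the base-10 log and the function names are my assumptions, just for illustration):

```python
import math

def auth_strength(successes: int, failures: int) -> float:
    """Strength as log10 of successes per failure: a method that is
    right 1000 times for every wrong acceptance has strength 3."""
    if failures == 0:
        failures = 1  # avoid division by zero; read as "at least one"
    return math.log10(successes / failures)

def access_allowed(strength: float, sensitivity: float) -> bool:
    # First-cut gate; as noted below, "strength >= sensitivity"
    # is rough only, since channel, device, and method each
    # carry their own kinds of risk.
    return strength >= sensitivity

print(auth_strength(1000, 1))    # 3.0
print(access_allowed(3.0, 2.0))  # True
```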
However, note that channel, device, and method all have strength, and different kinds of risk contribute to sensitivity, so "strength >= sensitivity" is a rough rule only. It is important that the system check, across many accesses, that patterns of "suspicious behavior" are not triggered; system actions matter here in that they can alter probabilities. We don't look just for frequency of access to sensitive data but also for patterns of access. A process that looks at only sensitive data over time, though the rate may be slow, might be doing a stealth extract of it. Of course sensitivity markings need auto-downgrade (via a scanner plus human override rules) to compensate for objects inheriting sensitivity marks.

New info might feed into structures in memory (disk-backed) when evidence comes in bearing on sensitivity of access. An API is needed so daemons from elsewhere can feed this. Authenticate the daemons with remote queries about their internals, to be sure they don't get easily tampered with. There will be many low-level alerts: you want to aggregate them and let humans judge them when they persist or get more frequent. Transaction amounts might need to come in from daemons outside a system too.

You want access protection close to what is being protected, but some behavior evidence will come from outside (as will info about what software is run, and maybe what commands are given to it, though the system can't get too far into the weeds trying to track this). If access is coming via something like a daemon serving many users, it will be necessary to figure out which access comes from whom. That may need something watching the remote daemon to tell. If the remote system has our auth control software, the remote software can communicate with the local software. The "firewall" gets distributed across many systems that way (as in Safety, where the control on one disk could communicate with that for other disks). (Include all the stuff Safety has!)
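The aggregate-then-escalate idea for low-level alerts can be sketched as a per-key counter that only bothers a human when alerts recur within a window (names and thresholds here are mine, not from the design):

```python
from collections import defaultdict, deque

class AlertAggregator:
    """Hold low-level alerts; flag a key for human judgment only when
    it recurs more than `threshold` times within `window` seconds."""
    def __init__(self, threshold: int = 5, window: float = 600.0):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # key -> alert timestamps

    def record(self, key, now: float) -> bool:
        q = self.events[key]
        q.append(now)
        while q and q[0] < now - self.window:  # drop stale alerts
            q.popleft()
        return len(q) > self.threshold  # True => escalate to a human

agg = AlertAggregator(threshold=3, window=600)
for t in (0, 100, 200, 300):  # four alerts for one uid in 5 minutes
    escalate = agg.record(("uid1001", "sensitive-read"), now=t)
print(escalate)  # True: 4 alerts inside the 600 s window exceeds 3
```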
Greater care must be taken to be sure such communication is legitimate, though; there are more attacks now, and the system might be compromised on some nodes. Do we want to try looking for funny patterns in stacks of return addresses, so we can spot infected programs or odd syscalls? Initial thought is no - just get something working. Doing such in my own code, to protect it some: maybe.

Protecting the memory structures: yes, since they will be large and control access. I guess they need to be stored encrypted, accessed via some software that might morph, using implicit key storage (keys computed from, e.g., a code hash) with keys changing often. Bah - that's going to mean they need to get re-encrypted a lot (with a moving boundary, like a shadow disk copy) and have the key change even oftener, working like newer virtual disks where the encryption key gets encrypted again by a user key, so the user key can change oftener. The underlying store needs to be encrypted. The memory store might be too slow if encrypted; it may have to be kept clear. If so, it must be protected against kernel attacks somehow, if only by moving it around, maybe with a common, relatively fast "gate" routine that gets one to given data. Store it in a hash in memory and let the gate figure out where the bits are? A possible saving grace: the system is for servers, which are less exposed to attack. It might do to be able to detect when one gets infected...

There will need to be data areas per user, for longer-term stuff, and per object, for sensitivity, permitted-access limitations, what to do if access is denied, and the like. These would be inherited as objects get created and (I would expect) checked, and maybe lowered once created, by some catch-up scan process that checks sensitivity. The checks would also run when an object got written to (in case the sensitive info got cleansed). Obviously if you check once a day, you check only once, even if the object was written to many times; ditto if the cycle is slower. All this storage is to be on the servers, which had better be relatively safe.
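One way to read the "keys computed from e.g. code hash" idea - a sketch only; SHA-256 and the epoch mix-in are my assumptions, not a vetted key-derivation scheme:

```python
import hashlib

def implicit_key(code_bytes: bytes, epoch: int) -> bytes:
    """Derive a key from a hash of the guarding code itself, mixed
    with an epoch counter so the key can be rotated often. Tampered
    code hashes differently and so cannot derive the right key."""
    code_hash = hashlib.sha256(code_bytes).digest()
    return hashlib.sha256(code_hash + epoch.to_bytes(8, "big")).digest()

# Key the structure store off, say, this function's own bytecode;
# bump the epoch to force re-encryption under a fresh key.
key = implicit_key(implicit_key.__code__.co_code, epoch=7)
print(len(key))  # 32: a 256-bit key
```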
A signature of some kind over this data would be good to keep and check, so that random privileged processes on the servers could not just scribble on it. Remote checksums between servers would help detect other corruption if any appeared. User endpoint boxes should basically not be trusted: it is too hard to ensure they aren't compromised, and trusted users are only going to act trustworthy some of the time.

The interaction of these data areas (the ones representing users and those representing objects) needs to be that access decisions call some function(s) which take both as input and must generate: 1. a decision: allow access, disallow access, or demand more info; and 2. a decision: what to do if access is denied. The latter can be to return an error code, access something else instead, delay access (and tell how long), or possibly edit the access somehow. In practice the per-object data will need to contain the recipes for how to react, but in cases of rate limits there will need to be rate info in the per-user data.

The per-user data will need to persist too: it must be able to handle cases where a suspicious process might pop up every 10 minutes with no process at all present for that user between times. (Might as well treat this by group also, since per-user might be too hard to administer.) Some accesses that need more info might be handled as: allow now, but queue a request for more info, and when it arrives, deny further access. This would imply access decisions on every access, at least for some objects. (The usual is to decide at open only.) It would be most sensible to try to do this when bulk media access is done (in case of syscalls that might be reading files, e.g., a byte at a time). Thus the most sensible kind of limit might be a "clock tick of next check" item, so checks get rate-limited. (Care with clock resets here!) With such things, access could be allowed. Delayed access is just going to need to put the caller into a wait state and delay.
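The two-output decision function above might look like this - a sketch with hypothetical names, using the simple strength-vs-sensitivity compare and a once-a-second "clock tick of next check":

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MORE_INFO = "more info"

@dataclass
class UserRecord:                 # per-user data area, persisted
    uid: int
    auth_strength: float          # log-scale identity assurance
    next_check_tick: float = 0.0  # "clock tick of next check"
    last: tuple = (Decision.MORE_INFO, None)

@dataclass
class ObjectRecord:               # per-object data area, persisted
    sensitivity: float
    on_deny: str = "error"        # recipe: "error" | "alternate" | "delay"
    alternate_path: str = ""      # used when on_deny == "alternate"

def check_access(user: UserRecord, obj: ObjectRecord, now: float):
    """Both records in, two decisions out: (allow/deny/more-info,
    what to do on denial). Re-decisions are rate limited via
    next_check_tick so byte-at-a-time reads stay cheap."""
    if now < user.next_check_tick:
        return user.last                # reuse the recent result
    user.next_check_tick = now + 1.0    # re-decide at most once a second
    if user.auth_strength >= obj.sensitivity:
        user.last = (Decision.ALLOW, None)
    else:
        user.last = (Decision.DENY, obj.on_deny)
    return user.last

u = UserRecord(uid=1001, auth_strength=2.0)
obj = ObjectRecord(sensitivity=3.0, on_deny="alternate")
print(check_access(u, obj, now=10.0))  # denies: strength 2.0 < sensitivity 3.0
```

Caching the last result rather than skipping the check entirely means a denial sticks for the whole tick, which is the conservative choice.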
No help for that; the idea of such is to discourage too-frequent accesses to sensitive data where outright prohibition is too harsh (e.g., because the access can sometimes be legitimate). The access time of every access could be counted for this, but a count per (delta time) can be used so long as the rate can be a bit fuzzy. Say 10 accesses from 10:00-10:05, 20 from 10:05-10:10, 30 from 10:10-10:15, and 8 from 10:15-10:20. If I want 32 or fewer within 10 minutes, I need only the last 2 periods and can discard the older ones. A couple of these bins need to be active so it's possible to dump a whole one and not make the threshold vary too much; more bins give less fluctuation. Each object would need the time size of such bins. User roles might have a multiplier. Counts must be per user. Let one user multiplier scale time and another scale the limits (to allow some users lots of access).

We might need to go to a role structure as well at some point, so that limits on access by object per user could be considered. That would mean that while normally the object determines its sensitivity, we might decide we want the user factored in also. A not-very-trustworthy user might be permitted lower access rates to data in one file, but normal access rates to other files. Not sure it will happen (leave this for a V2), but keep it in mind.

Activity by users - things like frequency of access to different object sensitivity classifications - should be either logged or possible to log, so that patterns of access might be discerned. This will want to be by path, so analysis will be feasible; inode and device or the like would be too hard to track. It would be best if other "suspicion factors" could be recalled - things like the kind of software used for the access, perhaps - but that much detail bids fair to produce massive data that cannot be inspected. Better to attempt some classifications in broad swaths, with access software class, destination (inside/outside, for example), and counts of access to object sensitivity classes.
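The binned counting above can be sketched directly, using the worked numbers from the text (class and parameter names are mine):

```python
from collections import deque

class BinnedRateLimit:
    """Sliding-window counter built from fixed-size time bins. Only
    enough bins to cover the window are kept; older bins are
    discarded, as in the worked example in the text."""
    def __init__(self, bin_seconds: int, window_bins: int, limit: int):
        self.bin_seconds = bin_seconds
        self.window_bins = window_bins  # bins that make up the window
        self.limit = limit
        self.bins = deque()             # [bin_index, count], newest last

    def record(self, t: float):
        idx = int(t // self.bin_seconds)
        if not self.bins or self.bins[-1][0] != idx:
            self.bins.append([idx, 0])
        self.bins[-1][1] += 1
        while self.bins and self.bins[0][0] <= idx - self.window_bins:
            self.bins.popleft()         # dump whole stale bins

    def over_limit(self) -> bool:
        return sum(count for _, count in self.bins) > self.limit

# Worked example: 5-minute bins, 10-minute window (2 bins), limit 32.
rl = BinnedRateLimit(bin_seconds=300, window_bins=2, limit=32)
counts = [10, 20, 30, 8]  # accesses in 10:00-:05, :05-:10, :10-:15, :15-:20
for i, n in enumerate(counts):
    for _ in range(n):
        rl.record(i * 300)
print(rl.over_limit())  # True: the last two bins hold 30 + 8 = 38 > 32
```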
These would just be labels (in Unix, extended attributes) on files, so they travel with the files. Another record needs to be kept, to check for tampering and allow reset, but you want to avoid using it in real time. The ability to get more info for certain user classes is likely to be asked for (as when you see lots of access to sensitive stuff from somebody and want to be able to know just what's being peeked at).

There needs to be a structure defined which keeps around the identity (and behavior) evidence that someone is the person known, and is acting as expected. This would get seeded at first by login, with some (minimal, perhaps) strength. It can be queried by other access requests, and additional strength-gathering fired off at need. Behavior monitoring would be handled by monitors adjusting the saved identity strength, so that, for example, malware actions running in code supposedly the user's would be noted as acting unlike the user should act, leading to the conclusion that the actor was not the known user. Alerts about what was going on could be provided, and added to at need, once the framework is done.

To simplify setup I think I will use FUSE to get a kernel part which is maintained and working (much faster than writing from scratch, and it will make it easier for folks to rely on). There need to be:

. Structure that reflects user identity. This needs to initialize what to monitor from the groups the user is in (which can also tell what evidence daemons are needed for the user). The strength of evidence (probably both for and against identity; it's easier to set thresholds for positive and negative separately) should be kept, and recent behavior scores (initially re: access frequency of sensitive stuff) need to be kept around. This structure represents a person on a machine, so it is more than just groups; it needs to be per individual.

. Structure to represent file sensitivity.
I think this might be a set of categories of sensitivity and a level, plus info about what is permitted when the file is opened and what to do if the open is unauthorized. (This can be used for opening something else instead, bogus error returns, max priv levels (in Linux, sensitive group memberships), and so on.) I expect this to be on disk when inactive, so any ACL-type info would be here. Max access rates might be here too. Initial values can inherit from the creating user or directory permissions; let a scan for sensitive info then be fired off to reset too-high sensitivity levels.

. Structure to represent groups? This would correspond to roles, and gets used to initialize part of the individual-identity structures, but I am not sure it does anything else.

Initially it seems more sensible in Linux to use uid, not session ID, so that multiple logins by the same user get aggregated on a box. Unix sometimes allows users only 8 or 16 groups, so we need our own aggregation of roles; we don't want to disrupt what is already in use, and taking up groups would do that. Put our groups in with the per-uid structures and check those for per-file access.

There need to be daemons that run to gather behavior evidence about users, to flag suspicious behavior and update identity strength. Other daemons would scan files for sensitive info. Categories on files would be marked so that not every file need be examined; examining subsets of files should be done when possible. The scheme of having a bitmap of possibly sensitive file IDs (where a file ID might be a hash of pathname plus a generation number), so that most files could just go through, seems likely useful. Implement a lot like Safety, but we must make up hashes. Underlying filenames would all be generated hashes, binned by first N characters perhaps, so we can avoid huge directories if the underlying filesystem is inefficient there. That way the user must go through our abstraction. We might encrypt too, so that scanning the underlying system becomes difficult.
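The hash-of-pathname-plus-generation naming, binned by leading characters, can be sketched like so (SHA-256 and the separator byte are my choices, just for illustration):

```python
import hashlib

def underlying_name(path: str, generation: int, bin_chars: int = 2) -> str:
    """Map a user-visible path to a generated underlying filename:
    a hash of pathname + generation number, placed in a directory
    named by its first N hex characters so no one directory grows
    huge on an inefficient underlying filesystem."""
    h = hashlib.sha256(f"{path}\x00{generation}".encode()).hexdigest()
    return f"{h[:bin_chars]}/{h}"

print(underlying_name("/home/alice/report.txt", 1))
```

Since the mapping is a one-way hash, scanning the underlying store reveals nothing about user-visible names even before any encryption layer is added.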
Fortunately that is far less costly than it was back in 1979. We might make this one optional (for the benefit of the nervous). Initial auth picks some evidence and applies it; other daemons get other auth evidence when needed. Thus we get several intercommunicating processes per filesystem (or several threads, depending on what's easiest).

---------------------------

To build a system I envision building on a crypto-disk, since there needs to be some way to be sure access control and auth code isn't just bypassed. Also, such systems have interception partly worked out. Services to be provided:

1. Access allowed by identity evidence strength. This will be established at first access, for a given user ID, initially by some "prompt evidence" classes: is the user logged in, and the strength of that; where the user is physically, as far as we can tell (what machine is he on?); local time for the user; and what program is initially being used for the access. Once the user record is set up we keep it around, timestamped at last use. If it goes unused too long we drop it. Do this instead of session ID, to make it harder to evade.

2. Support file access passwords within this record. Need a utility to enter them. Files marked as needing passwords don't open unless the password has been entered.

3. Per file, need a spec of what to do on access failure: at least allow the error code to be chosen, or an alternate file to open (silently).

4. "Paranoid mode": some programs get treated as super-untrusted, so that while they run, user ID strength is made very low. Files they create get flagged as untrusted too. When a file is created, it inherits sensitivity tags: opening a sensitive file flags the program as sensitive; what is created by a sensitive program gets the sensitive flag; what is created by an untrusted program gets the untrusted flag. These can be fixed up by a batch downgrader program that scans for sensitive info, so we record which files are created, to allow such utilities to run.
A user needs rights to access sensitive or untrusted files (different rights for each). Anything created untrusted has its x bit turned off everywhere so it cannot be executed.

5. Access control: strength of ID assurance (auth) >= risk of the file is generally needed. Sensitive files will generally have greater risk, but the idea is that sensitivity gets inherited initially and fixed up, if too high, by the downgrader. Sensitivity can be added to by other daemons that flag things like regulatory risk. Ditto, auth strength can be altered (it is more of a "user trust" quantity) by daemons dealing with trust dimensions other than auth strength alone. Untrusted processes cannot create files with the x field (though there needs to be a marking that allows upgraders/downgraders, or things like fsck or defraggers, to run without interference).

6. Note that ID assurance ("auth strength") depends also on non-prompt info, which should get fed in by daemons.

7. When an initial access is done, auth failure can (configurably) request daemons to gather more auth evidence, whose strength may add to auth strength. The opening process in that case waits for the daemons to run. Timeout means auth-fail processing gets used.

8. The access rate (read rate) of sensitive files will be accumulated as a rate over some time period. File tags will have rate allowances; a nonzero allowance means the rate gets checked. Access by anything with the user's ID gets counted. Excess rate means I/O gets slowed down (put the process into a wait state, pull it out after a "jiffy" or a few such). Should this be per file or global? Per file seems good, but a global limit for nonzero allowances can be used to thwart sneaky reads of many different sources by one user. This all depends on user ID (note: uid, not euid).

9. It is expected that mostly files will get tagged sensitive or not (too hard to administer gradations in most cases), but the math will still be used.

10.
Read or write access should get the access check run also, either all the time or every so often (every second?), so that dynamic auth changes ("he starts acting weird") can be handled. This can mean access gets blocked for daemons (if this is allowed; I would have this function configurable per file, and a site would tend to have all markings default one way or the other), or stops after an open had succeeded. Report it as an I/O error, generally. I am suspicious of the idea of switching to some other file in these cases.

The per-uid structures would be backed to disk in one place; the per-file ones should be separate. It seems likely I might need to key per-file data on inode number and major/minor device. We want this stuff disk-based in case someone contrives to attack from many processes - run a process, attack a little, exit, run another process, etc. - we want the statistics to accumulate over time. We also want authN strength info not to be lost; it should mostly time out. File sensitivity should mostly be slowly varying, if it changes at all. Scan for sensitive info (look for many sets of regexes) to downgrade, but now and then scan unscanned files to avoid missing anything. Imports from other sites will need to allow some bulk setting of flags. Thus we will have a facility to set the markings of a process being used for such imports, so that, e.g., an sftp daemon might be set to have everything it pulls in get some set of markings.

The foregoing is an initial part only. We badly need to control net access, and that might not involve the VFS system; at most, a "file level" check might tell you someone is accessing the network device. We should probably integrate something like a firewall, so it can be sensitive to ports and IP addresses. I expect an initially useful distinction is "inside" vs. "outside", which can be done by IP ranges in IPv4 anyway. The ranges would have to be set up, but that would give an idea of when traffic is crossing. We want to be able to count bytes.
We want that in counts of access per time even for disk stuff, but with networks it becomes more obvious that byte count is likely all one will have. Encrypted traffic won't be readily analyzable, but the byte count will remain. The ability to control access to the network, and to treat outside access as tainting programs that get significant code from outside (so they get marked untrusted), could be used to limit malware access. What gets written, to anywhere, after reading the network would not be allowed to become executable. This won't handle all in-memory cases but will inhibit some methods of spreading past initial bootstraps. By the way, we want to make darn sure that untrusted programs cannot create suid or sgid files either. To get at program innards it will be necessary to go beyond this and perhaps start watching stacks.

The foregoing is a possibly useful set of capabilities to start with, though, and is suited to data-loss-prevention problems on servers. The philosophy is to keep sensitive data from getting to workstations in the first place - not putting it into places where a potentially hostile user might have physical access to it. (It would be useful to mention that on user systems where such data resides, it would be better to have it reside encrypted rather than in the open, so that only the known OS that decrypts it can get to the data without a lot of searching. In principle one could have the key reside on a network end, as long as it were accepted that the data could be available only when connected to a company network; but in practice there's sense in pulling anything sensitive onto a user desktop/laptop/etc. only if it needs to be usable when the network is down.)
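The inside/outside distinction by IPv4 ranges, plus byte counting across the boundary, can be sketched with the standard library (the ranges and function names here are placeholders a deployment would configure):

```python
import ipaddress

# Hypothetical "inside" ranges; a real site would configure its own.
INSIDE = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.0.0/16")]

def is_inside(addr: str) -> bool:
    """Classify an IPv4 address against the configured inside ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INSIDE)

def bytes_crossing(flows) -> int:
    """Sum byte counts for flows that cross the inside/outside
    boundary. `flows` is an iterable of (src, dst, byte_count);
    byte count is all we can rely on once traffic is encrypted."""
    return sum(n for src, dst, n in flows
               if is_inside(src) != is_inside(dst))

flows = [("10.1.2.3", "192.168.5.9", 1000),  # inside -> inside: no crossing
         ("10.1.2.3", "8.8.8.8", 4096)]      # inside -> outside: crosses
print(bytes_crossing(flows))  # 4096
```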