This is the first in a series of articles presenting my takeaways from Hadoop Summit 2013. It covers the talk on Hive authorization models by Thejas Nair of Hortonworks.
As you may know, Hadoop's security model is currently quite limited: authentication is done via Kerberos and authorization via ACLs.
Hive tries to carry similar settings over into its own authentication and authorization models. Unfortunately for Hive, there is the issue of the metastore as well. Currently the metastore is secured separately, using the RDBMS-style authentication of its backing database (MySQL/Postgres).
Existing models of authorization:
- Traditional RDBMS-style authorization: Hive is like an RDBMS, managing its own data.
  - Permissions are stored directly in the RDBMS metastore. HDFS authorization remains separate, so there are two sources of truth, and HDFS can override the RDBMS permissions.
- Storage-based authorization: Hadoop provides shared storage, and Hive is one of the tools that uses it.
  - HDFS-based authorization is the only source of truth. The problem here is that Hive concepts such as columns and views don’t map to files (coarse- vs. fine-grained authorization).
- No authorization: for the single-user case or dev/PoC environments.
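
To make the difference between the first two models concrete, here is a minimal toy sketch in Python. It is not Hive's or HDFS's actual API: the class names, the `/warehouse/sales` path, the users and the column names are all hypothetical, chosen only to illustrate the granularity mismatch and the "two sources of truth" problem described above.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model. None of these names come from Hive or HDFS.


@dataclass
class StorageAuthorizer:
    """Storage-based model: permissions live on the table's directory."""

    # directory path -> set of users allowed to read it
    readers: dict = field(default_factory=dict)

    def can_read(self, user, table_dir, columns=None):
        # Columns are ignored on purpose: access to a directory is
        # all-or-nothing, so column-level rules cannot be expressed here.
        return user in self.readers.get(table_dir, set())


@dataclass
class MetastoreAuthorizer:
    """RDBMS-style model: grants are stored in the metastore, per column."""

    # (user, table) -> set of columns the user may select
    grants: dict = field(default_factory=dict)

    def can_read(self, user, table, columns):
        # Every requested column must be covered by a grant.
        return set(columns) <= self.grants.get((user, table), set())


storage = StorageAuthorizer(readers={"/warehouse/sales": {"alice"}})
metastore = MetastoreAuthorizer(grants={("bob", "sales"): {"region", "total"}})

# Fine-grained: bob may read some columns of the table but not others.
bob_allowed = metastore.can_read("bob", "sales", ["region"])
bob_denied = metastore.can_read("bob", "sales", ["customer_email"])

# Two sources of truth: bob holds a metastore grant, but the storage layer
# still refuses him, so the storage decision effectively wins.
bob_storage = storage.can_read("bob", "/warehouse/sales")

# Coarse-grained: alice has directory access, so the storage layer cannot
# stop her from reading any particular column.
alice_storage = storage.can_read("alice", "/warehouse/sales", ["customer_email"])
```

In this sketch the metastore can say "these columns only", while the storage layer can only say yes or no for the whole directory. That gap is the coarse- vs. fine-grained problem from the list above.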
One potential solution is to combine the two models (RDBMS and shared storage), though this is the presenter’s personal opinion, and the details of how to do it still have to be worked out.
Hive Jira tickets HIVE-3705 and HIVE-3720 discuss this in more detail.