Snowflake is proud to announce the open source release of SansShell, a non-interactive local host management agent. Its purpose is to enable strong authentication, authorization, and auditing of the management of servers of any type. The source code is available on GitHub.
At its core, SansShell is designed to provide a way to define complex server management actions in code and selectively expose those actions to remote clients. Each action is subject to an authorization policy that can limit access based on the caller, the type of action, and the content of the request. It is built to be both modular and extensible, with the ability to easily define new action types, remove unneeded actions, or replace the authentication and policy engines.
It is implemented as a lightweight gRPC server, with reference implementations of common administrative actions (read/modify files, run commands, install packages, etc.), support for mTLS for authentication, and authorization policies written in Rego.
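To make the idea of a per-action authorization policy concrete, here is a minimal sketch in plain Go. It is not SansShell's policy engine, which evaluates policies written in Rego. The `Request` and `Rule` types, the `Authorize` function, the action name `file.read`, and the caller identity are all hypothetical names chosen for illustration. The sketch only shows the shape of a default-deny decision that looks at the caller, the type of action, and the content of the request.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Request is a hypothetical, simplified view of one call made to the agent.
type Request struct {
	Caller string // authenticated identity of the caller (e.g. taken from an mTLS certificate)
	Action string // type of action, e.g. "file.read" (illustrative name)
	Path   string // an example of request content that a policy may inspect
}

// Rule allows one caller to perform one action on paths below a prefix.
type Rule struct {
	Caller     string
	Action     string
	PathPrefix string
}

// Authorize reports whether any rule permits the request.
// Anything that is not explicitly allowed is denied.
func Authorize(rules []Rule, req Request) bool {
	// Normalize the path first so "/var/log/../../etc/shadow" cannot
	// slip past a prefix check.
	cleaned := path.Clean(req.Path)
	for _, r := range rules {
		if r.Caller == req.Caller &&
			r.Action == req.Action &&
			strings.HasPrefix(cleaned, r.PathPrefix) {
			return true
		}
	}
	return false
}

func main() {
	rules := []Rule{
		{Caller: "oncall@example.com", Action: "file.read", PathPrefix: "/var/log/"},
	}
	allowed := Request{Caller: "oncall@example.com", Action: "file.read", Path: "/var/log/syslog"}
	escaped := Request{Caller: "oncall@example.com", Action: "file.read", Path: "/var/log/../../etc/shadow"}

	fmt.Println(Authorize(rules, allowed)) // true
	fmt.Println(Authorize(rules, escaped)) // false
}
```

Because every request is a complete, self-describing message rather than a stream of keystrokes, a decision like this can be made once, before anything runs on the host.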
The non-interactive nature of SansShell allows us to evaluate maintenance actions as “safe” in advance, code review them, and encode them directly in the policy. It also allows us to permit “less safe” actions with a very strong guarantee of being able to audit the exact granular action later. The design also allows us to require local, multi-party approval of those actions in advance (this feature is coming very soon!). This will make it suitable for ultra low-dependency emergency access, while retaining excellent audit logs of the necessary actions.
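One way to picture advance, multi-party approval of an exact action is sketched below in Go. The approval feature is described above as not yet released, so nothing here reflects its actual design. The `Approval` type, the `Digest` and `Approved` functions, and the rule that a requester cannot approve their own request are all assumptions made for illustration. The point is only that a non-interactive request can be reduced to a digest that approvers sign off on before it runs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Approval records that one person approved one exact request in advance.
type Approval struct {
	Approver      string
	RequestDigest string // hex SHA-256 of the exact request that was approved
}

// Digest returns a hex SHA-256 over the requester, the action, and its arguments.
// Each field is length-prefixed so different field splits cannot collide.
func Digest(requester, action string, args []string) string {
	h := sha256.New()
	for _, field := range append([]string{requester, action}, args...) {
		fmt.Fprintf(h, "%d:%s", len(field), field)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Approved reports whether at least `needed` distinct people, other than the
// requester, approved this exact digest.
func Approved(requester, digest string, approvals []Approval, needed int) bool {
	seen := map[string]bool{}
	for _, a := range approvals {
		if a.RequestDigest == digest && a.Approver != requester {
			seen[a.Approver] = true
		}
	}
	return len(seen) >= needed
}

func main() {
	d := Digest("alice", "exec.run", []string{"systemctl", "restart", "nginx"})
	approvals := []Approval{
		{Approver: "alice", RequestDigest: d}, // self-approval, ignored
		{Approver: "bob", RequestDigest: d},
		{Approver: "bob", RequestDigest: d}, // duplicate, counted once
	}
	fmt.Println(Approved("alice", d, approvals, 2)) // false

	approvals = append(approvals, Approval{Approver: "carol", RequestDigest: d})
	fmt.Println(Approved("alice", d, approvals, 2)) // true
}
```

Binding approvals to a digest of the full request means that changing any argument invalidates the earlier approvals, which is only possible because the action is fully specified before it executes.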
The SansShell proxy
SansShell may be widely deployed across many machines in an environment, perhaps even all machines. Changing its authentication and authorization policy should be done slowly and with care, to avoid any risk of an unforeseen coordinated outage. This makes it inherently risky to rapidly evolve the local policy for SansShell, such as during active troubleshooting or incident response. Furthermore, some actions (such as calculating the checksum of a particular file) are often performed across many machines, and would be time-consuming and error-prone to perform serially from a user’s workstation.
To address both of these problems, SansShell includes a smart proxy server that can be deployed between users and machines running SansShell. This proxy provides a single administration point for fine-grained policy enforcement and request logging, while providing the ability to “fan out” a single request to multiple SansShell servers in parallel. It also permits users to make authorized requests to SansShell servers without needing to have direct network access to them, allowing each endpoint’s network policy to be more restrictive. Updates to the policy of the proxy layer can be converged more quickly, without risking changing all the endpoints that run the SansShell server.
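The fan-out behavior described above can be sketched with goroutines in Go. This is a minimal illustration, not the proxy's actual code: `FanOut`, `Result`, and `fakeChecksum` are hypothetical names, the host names are made up, and `fakeChecksum` stands in for a real RPC to a SansShell server.

```go
package main

import (
	"fmt"
	"sync"
)

// Result pairs one target with the outcome of the request sent to it.
type Result struct {
	Target string
	Output string
	Err    error
}

// FanOut sends the same request to every target concurrently and returns the
// results in the same order as targets. A failure on one target does not
// prevent the others from completing.
func FanOut(targets []string, call func(target string) (string, error)) []Result {
	results := make([]Result, len(targets))
	var wg sync.WaitGroup
	for i, target := range targets {
		wg.Add(1)
		go func(i int, target string) {
			defer wg.Done()
			out, err := call(target)
			// Each goroutine writes only its own slot, so no lock is needed.
			results[i] = Result{Target: target, Output: out, Err: err}
		}(i, target)
	}
	wg.Wait()
	return results
}

// fakeChecksum is a stand-in for a remote "checksum this file" request.
// It fails for one host to show per-target error handling.
func fakeChecksum(target string) (string, error) {
	if target == "host-b" {
		return "", fmt.Errorf("%s: unreachable", target)
	}
	return "checksum-from-" + target, nil
}

func main() {
	targets := []string{"host-a", "host-b", "host-c"}
	for _, r := range FanOut(targets, fakeChecksum) {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", r.Target, r.Err)
			continue
		}
		fmt.Printf("%s: %s\n", r.Target, r.Output)
	}
}
```

Returning one result per target, rather than stopping at the first error, matches the use case in the text: when checking a file across many machines, the caller usually wants to see which hosts answered and which did not.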
Why build SansShell?
Snowflake is committed to providing the strongest possible protections for our customers’ data. Our existing method of managing cloud VMs leverages industry best-practice tooling, based on OpenSSH with extra layers for access restriction and auditing. While most of our infrastructure is immutable, or updated in place by automation, we must maintain emergency “break glass” access to rapidly respond to unforeseen circumstances that our automation cannot handle, or that require a response faster than we can recycle all of our immutable infrastructure. Migrating even these rare cases from an interactive session to a non-interactive action eliminates an entire class of attacks. This gives us confidence that our audits will reliably detect even the most sophisticated threats we can imagine, while we retain the ability to rapidly respond to scenarios our automation is not prepared for.
To address these issues in our VM maintenance operations, we felt the need to move to a non-interactive API for modifying production hosts. This is true both for periodic maintenance that has not yet been incorporated into our immutable infrastructure model and for unavoidable emergency maintenance during active incident response. We considered several options, but did not find any solution that was well suited to our needs.
Why open source SansShell?
When considering our guiding principles of Choosing Open Wisely, it was clear that this was a case where we can help lead the industry toward the highest levels of security, benefit from setting a new standard for host management that future Snowflakes might be familiar with, and hopefully benefit from the increased security scrutiny that the open source model can provide. Like FoundationDB, we believe this instance of “choosing open” provides benefits to industry and to us.
While we are excited about how SansShell improves our infrastructure management, Snowflake intends to remain focused on our simple customer API. We believe it’s important that these infrastructure layers, such as SansShell and FoundationDB, remain internal implementation details of Snowflake. This allows our data security and governance story to continue to improve over time, transparently to our users. Our commitment to simple, elegant, and stable public APIs enables our customers to focus on extracting value from their data, rather than managing infrastructure.