Building a Sagemaker Instance from Scratch (Part 2)

Author: Robert Fehrmann

In part one of this four-part Snowflake/Sagemaker series, I outlined the benefits of machine learning (ML) and why it’s beneficial to store data for ML in Snowflake. Here, in part two, I’ll show you how to get started with Amazon Sagemaker.

Building a Sagemaker instance from scratch requires an AWS login that has a specific set of permissions. Whether or not you will be able to grant these permissions yourself depends upon you having access to the AWS root login. If you’ve created your own AWS account, you have access to the root login. If you are using a corporate sandbox, however, you probably don’t have access to the root login. In this case, ask your AWS security team for assistance.

If you already have an active AWS login, you can skip the “Creating a new user” section and go directly to the permissions section. Otherwise, just follow the steps below or ask your AWS security team to assist you.

You can review the entire blog series here: Part One > Part Two > Part Three > Part Four.  

Creating a new user

Creating an AWS login is handled through Identity and Access Management (IAM). After choosing a user name (see screenshot below), select the access type for your login. Here you want to select “AWS Management Console access”. You’ll need this type of access later to create the Sagemaker instance via the AWS web UI. (Note: selecting “Require password reset” is optional; however, I recommend using this feature in case you forget your password.)

Click the “Next” button to advance to the “Set user details” screen.

On the “Set user details” screen, shown below, select “Attach existing policy directly”, then locate the policy called “SagemakerFullAccess”. Select the checkbox next to that policy, then click the “Next” button.

The following “Review” screen appears so you can confirm your selections. Here you want to select “Create User.”

On the same confirmation screen, select “Send email”. This will trigger an automated email with instructions on how to connect to your AWS instance.
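
For readers who prefer scripting to clicking, the user-creation steps above can be sketched in Python. This is only an illustration, not part of the original walkthrough: the function name `create_console_user` is hypothetical, `iam` is assumed to be a boto3 IAM client (for example `boto3.client("iam")`), and the policy ARN in the usage note assumes the policy referred to above is the managed policy AWS publishes as AmazonSageMakerFullAccess.

```python
import secrets
import string


def create_console_user(iam, user_name, policy_arn):
    """Create an IAM user with console access and one attached policy.

    `iam` is assumed to be a boto3 IAM client; nothing here is AWS-specific
    beyond the three client calls.
    """
    # Generate a temporary password. The fixed suffix is an assumption meant
    # to satisfy common password policies (upper, lower, digit, symbol).
    alphabet = string.ascii_letters + string.digits
    temp_password = "".join(secrets.choice(alphabet) for _ in range(20)) + "!aA1"

    iam.create_user(UserName=user_name)
    # PasswordResetRequired mirrors the "Require password reset" checkbox.
    iam.create_login_profile(
        UserName=user_name,
        Password=temp_password,
        PasswordResetRequired=True,
    )
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    return temp_password
```

A call might look like `create_console_user(boto3.client("iam"), "ml-user", "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess")`, where the user name is a placeholder. Unlike the console flow, this sketch does not send an email, so the temporary password has to be passed on by other means.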

Follow the instructions in the email to connect to your AWS instance with your new login. When you log in for the first time, you will be asked to change your password. After changing your password, click the “Services” tab and locate Sagemaker via the search box.

This takes you to the Sagemaker UI. Here, click the “Create notebook instance” button.

Setting up permissions

When attempting to create a notebook instance, you will likely encounter an “AccessDenied” error message. This happens because, by default, the permissions needed to work with roles, policies, network interfaces and subnets haven’t been granted yet.

To resolve the “AccessDenied” issue, create an IAM policy with the permissions outlined below. Let’s call this “SagemakerAdditionalPolicy”.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeVpcs",
                "kms:ListAliases",
                "iam:ListRoles",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:CreateNetworkInterface",
                "iam:CreatePolicy",
                "ec2:DeleteNetworkInterface",
                "iam:CreateRole",
                "iam:AttachRolePolicy"
            ],
            "Resource": "*"
        }
    ]
}
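
If you would rather create this policy from a script than paste it into the console, the following is a minimal sketch. It assumes `iam` is a boto3 IAM client; the helper names `build_additional_policy` and `create_and_attach` are made up for illustration, and the action list is copied from the policy above.

```python
import json

# Actions copied from the SagemakerAdditionalPolicy document above.
ADDITIONAL_ACTIONS = [
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeVpcs",
    "kms:ListAliases",
    "iam:ListRoles",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
    "ec2:CreateNetworkInterface",
    "iam:CreatePolicy",
    "ec2:DeleteNetworkInterface",
    "iam:CreateRole",
    "iam:AttachRolePolicy",
]


def build_additional_policy():
    """Return the policy document as a plain dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": list(ADDITIONAL_ACTIONS),
                "Resource": "*",
            }
        ],
    }


def create_and_attach(iam, user_name, policy_name="SagemakerAdditionalPolicy"):
    """Create the policy and attach it to the user; returns the policy ARN.

    `iam` is assumed to be a boto3 IAM client.
    """
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_additional_policy()),
    )
    policy_arn = response["Policy"]["Arn"]
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    return policy_arn
```

Whoever runs this needs IAM permissions of their own to create and attach policies, which in a corporate account usually means the security team.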

After assigning the security policy to your login, the IAM permissions should look like the screen below. It includes the AWS standard password change policy, the SagemakerFullAccess policy and the custom policy you just created (SagemakerAdditionalPolicy).

Next, you need to create an S3 bucket. The S3 bucket acts as the location where you’ll store data for various ML processes, including training and test data passed to the ML algorithms, temporary data, and output from the ML algorithms (e.g. model files). Be sure to create the S3 bucket in the same region in which you intend to create the Sagemaker instance.

In relation to the processes we’re focusing on here, there is no need to open up access to your S3 bucket from the outside world. It’s also considered best practice to not allow external access to an S3 bucket unless it’s absolutely necessary.
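
As a rough scripted equivalent of the two points above (same region, no outside access), here is a sketch that assumes `s3` is a boto3 S3 client created for the target region, for example `boto3.client("s3", region_name=region)`. The function name `create_private_bucket` is hypothetical.

```python
def create_private_bucket(s3, bucket_name, region):
    """Create a bucket in `region` and block all public access to it.

    `s3` is assumed to be a boto3 S3 client for the same region.
    Returns the keyword arguments that were passed to create_bucket.
    """
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a
    # LocationConstraint; every other region has to be named explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3.create_bucket(**kwargs)

    # Turn on all four public access block settings for the new bucket.
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    return kwargs
```

Bucket names are global across AWS, so a placeholder such as "my-sagemaker-data" will most likely need a unique suffix.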

 

The next step is to create a Sagemaker execution role, the purpose of which is to provide the execution engine with the proper permissions. Make sure to specify the S3 bucket you just created. The execution role (including appropriate permissions) will be created automatically when you click the “create role” button.
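
To give a feel for what such a role contains, here is a minimal sketch of an execution role built by hand. It is not a reproduction of what the console button generates, which may grant more. It assumes `iam` is a boto3 IAM client; the helper names, the inline policy name, and the choice of S3 actions are assumptions made for illustration.

```python
import json


def sagemaker_trust_policy():
    """Trust policy that lets the Sagemaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_access_policy(bucket_name):
    """Read/write access limited to one bucket (action list is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


def create_execution_role(iam, role_name, bucket_name):
    """Create the role, add the inline bucket policy, return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="SagemakerBucketAccess",  # hypothetical policy name
        PolicyDocument=json.dumps(bucket_access_policy(bucket_name)),
    )
    return role["Role"]["Arn"]
```

Scoping the S3 statement to the single bucket keeps the role from reaching other data in the account.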

 

 

You are now ready to create your first Sagemaker instance. (Note: Selecting a VPC is optional).

 

Creating the new Sagemaker instance will take a couple of minutes. Once the process is completed, click the “Open” link in the Sagemaker console to view the Jupyter Notebook.
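
The same create-and-wait step can be sketched in Python. This assumes `sm` is a boto3 SageMaker client (`boto3.client("sagemaker")`); the function name, the default instance type, and the polling intervals are illustrative choices rather than recommendations from this series.

```python
import time


def create_notebook_and_wait(
    sm,
    name,
    role_arn,
    instance_type="ml.t2.medium",  # assumed default; pick what fits your budget
    poll_seconds=15,
    timeout_seconds=900,
    sleep=time.sleep,
):
    """Create a notebook instance and poll until it is in service.

    `sm` is assumed to be a boto3 SageMaker client. `sleep` is injectable
    so the polling loop can be exercised without real waiting.
    """
    sm.create_notebook_instance(
        NotebookInstanceName=name,
        InstanceType=instance_type,
        RoleArn=role_arn,
    )
    waited = 0
    while waited <= timeout_seconds:
        description = sm.describe_notebook_instance(NotebookInstanceName=name)
        status = description["NotebookInstanceStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Notebook instance {name} failed to start")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"Notebook instance {name} not ready after {timeout_seconds}s")
```

No VPC settings are passed here, which matches the note above that selecting a VPC is optional.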

Security requirements

If you are working in a corporate AWS account, your security team may classify the additional permissions as intrusive. Why? Because they allow your login to create AWS roles and network interfaces not only for Sagemaker, but for the AWS account in general. This may be acceptable in a sandbox account, but not in a managed environment like Dev, QA or Prod. If that is your situation, you can scale down the required permissions in two ways:

First, your login doesn’t require [“iam:CreatePolicy”, “iam:CreateRole”, “iam:AttachRolePolicy”] if your security team can create the Sagemaker execution role for you. The easiest way for your security team to do this is to follow the steps outlined above using their own credentials.

Second, the actions [“ec2:CreateNetworkInterface”, “ec2:DeleteNetworkInterface”] are only needed if you are creating a notebook instance in a VPC. That, in turn, is only necessary if you are consuming additional resources, such as data from an EMR cluster.
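
The two reductions can be expressed as a small filter over the action list of the additional policy. The sketch below is plain Python with no AWS calls; the function name `scoped_actions` and its two flags are hypothetical.

```python
# Full action list of the additional policy described earlier in this post.
ALL_ACTIONS = [
    "ec2:DescribeNetworkInterfaces",
    "ec2:DescribeVpcs",
    "kms:ListAliases",
    "iam:ListRoles",
    "ec2:DescribeSubnets",
    "ec2:DescribeSecurityGroups",
    "ec2:CreateNetworkInterface",
    "iam:CreatePolicy",
    "ec2:DeleteNetworkInterface",
    "iam:CreateRole",
    "iam:AttachRolePolicy",
]

# Only needed when you create the execution role yourself.
ROLE_ACTIONS = {"iam:CreatePolicy", "iam:CreateRole", "iam:AttachRolePolicy"}

# Only needed when the notebook instance is created in a VPC.
VPC_ACTIONS = {"ec2:CreateNetworkInterface", "ec2:DeleteNetworkInterface"}


def scoped_actions(creates_own_role, uses_vpc):
    """Return the actions still required after the two reductions."""
    excluded = set()
    if not creates_own_role:
        excluded |= ROLE_ACTIONS
    if not uses_vpc:
        excluded |= VPC_ACTIONS
    return [action for action in ALL_ACTIONS if action not in excluded]
```

With both reductions applied, `scoped_actions(False, False)` leaves only the read-only describe and list actions.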

Conclusion

In part two, we’ve stepped through the process of creating a Sagemaker instance from scratch, including creating a new AWS login, granting the necessary permissions, and creating the Sagemaker instance through the Sagemaker UI.

In part three, I will outline how to connect Sagemaker and Snowflake through the Snowflake Python connector.
