Amazon Redshift | Server Management Tips

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. A data warehouse architecture consists of three tiers. The bottom tier of the architecture is the database server, where data is loaded and stored. The middle tier consists of the analytics engine that is used to access and analyze the data. The top tier is the front-end client that presents results via reporting, analysis, and data-mining tools.

Many vendors now offer data warehouse as a service (DWaaS), including Actian, Amazon Web Services (AWS), Hewlett Packard Enterprise (HPE), IBM, Microsoft, and Oracle.
Amazon Redshift is designed to be easier to operate than many other enterprise data warehouse (EDW) offerings. It automates common administrative tasks so that you can manage, monitor, and scale your data warehouse with little manual effort. This removes much of the undifferentiated heavy lifting of running a data warehouse and frees you to focus on analytics and core business needs.
Compared to traditional data warehouses, Amazon Redshift combines a low entry cost with cost efficiency at scale. AWS has advertised pricing of about $1,000 per terabyte per year, with no limit on the number of users or queries. Check current AWS pricing before budgeting, because the actual figure depends on the node type and purchase options.
Amazon Redshift manages provisioning, configuration, and patching. It protects data durability and availability through automatic replication and backups to Amazon S3. You can scale by adding or removing nodes with a single API call or from the AWS Management Console.
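To illustrate the single API call used for scaling, here is a minimal sketch using boto3, the AWS SDK for Python, and its Redshift modify_cluster operation. The helper names, the cluster name examplecluster, the region, and the node counts are assumptions made for illustration. The actual call needs boto3 installed and AWS credentials configured.

```python
def build_resize_request(cluster_id, node_count, node_type="dc2.large"):
    """Build keyword arguments for a Redshift ModifyCluster resize.

    Hypothetical helper: it only assembles the request and calls nothing.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    request = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "ClusterType": "single-node" if node_count == 1 else "multi-node",
    }
    # NumberOfNodes only applies to multi-node clusters.
    if node_count > 1:
        request["NumberOfNodes"] = node_count
    return request


def resize_cluster(cluster_id, node_count, region="us-east-1"):
    """Send the resize request (assumes boto3 and valid credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("redshift", region_name=region)
    return client.modify_cluster(**build_resize_request(cluster_id, node_count))
```

Calling resize_cluster("examplecluster", 4) would ask for a four-node cluster. The cluster stays in a resizing state until the operation finishes.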

How to start using Amazon Redshift?

The major steps for getting started with AWS Redshift are outlined below.

Set up an AWS account at https://aws.amazon.com.
Install SQL client drivers and tools. (Amazon Redshift does not provide or install third-party tools or libraries, so you must install any third-party database tools that you want to use with your clusters.)
Determine your firewall rules. (Amazon Redshift uses port 5439 by default.)
Create an IAM role for Amazon Redshift.
Launch the Amazon Redshift cluster.

On the Amazon Redshift Dashboard, choose Launch Cluster.
On the Cluster Details page, enter the following values and then choose Continue:
Cluster Identifier: type examplecluster.
Database Name: leave this box blank. Amazon Redshift will create a default database named dev.
Database Port: enter the port number on which the database will accept connections. You should have determined this port in the prerequisite step of this tutorial. You cannot change the port after launching the cluster, so make sure the port is open in your firewall so that SQL client tools can connect to the database in the cluster.
Master User Name: type masteruser. You will use this username and password to connect to your database after the cluster is available.
Master User Password and Confirm Password: type a password for the master user account.
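Because the port cannot be changed after launch, it can help to confirm that your firewall allows traffic on it once the cluster is available. The sketch below is a generic TCP reachability check built on Python's standard socket module. The function name and the example endpoint are hypothetical. A successful connection only shows that the port is open, not that the database credentials are valid.

```python
import socket


def port_is_reachable(host, port=5439, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, port_is_reachable("examplecluster.example.us-east-1.redshift.amazonaws.com") would test the default port 5439 against a placeholder endpoint. Substitute the endpoint shown for your own cluster.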

On the Node Configuration page, select the following values and then choose Continue:

Node Type: dc2.large
Cluster Type: Single Node

On the Additional Configuration page, you will see different options depending on your AWS account, which determines the platform the cluster uses (EC2-Classic or EC2-VPC). To keep things simple for this tutorial, you do not need to understand the distinction between these two platforms.

Associate an IAM role with the cluster.

For AvailableRoles, choose myRedshiftRole, and then choose Continue.
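The role chosen here must be one that Amazon Redshift is allowed to assume. As a minimal sketch, assuming you create the role with boto3 rather than in the console, the trust policy below names the redshift.amazonaws.com service principal. The helper names are hypothetical. The role also needs permission policies (for example, read access to Amazon S3), which are not shown.

```python
import json


def redshift_trust_policy():
    """Trust policy that lets the Amazon Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_redshift_role(role_name="myRedshiftRole"):
    """Create the role (assumes boto3 and IAM permissions are available)."""
    import boto3  # imported lazily so the policy helper works without it

    iam = boto3.client("iam")
    # IAM expects the trust policy as a JSON string, not a dict.
    return iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(redshift_trust_policy()),
    )
```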

On the Review page, review the selections that you’ve made and then choose Launch Cluster.
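The console walkthrough above can also be expressed as a single SDK call. The following sketch assumes boto3 and collects the same values (examplecluster, dev, masteruser, dc2.large, single node, port 5439) into the arguments of the Redshift create_cluster operation. The helper names, the region, and the role ARN in the usage note are placeholders, not values from your account.

```python
def build_create_cluster_request(master_password, iam_role_arn, port=5439):
    """Collect the tutorial's console choices as CreateCluster arguments."""
    return {
        "ClusterIdentifier": "examplecluster",
        "DBName": "dev",  # the default database name used in the tutorial
        "Port": port,  # cannot be changed after the cluster is launched
        "MasterUsername": "masteruser",
        "MasterUserPassword": master_password,
        "NodeType": "dc2.large",
        "ClusterType": "single-node",
        "IamRoles": [iam_role_arn],
    }


def launch_cluster(master_password, iam_role_arn, region="us-east-1"):
    """Launch the cluster (assumes boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so the builder works without it

    client = boto3.client("redshift", region_name=region)
    request = build_create_cluster_request(master_password, iam_role_arn)
    return client.create_cluster(**request)
```

A call might look like launch_cluster(password, "arn:aws:iam::123456789012:role/myRedshiftRole"), where the account ID is a made-up example. Read the password from a secure source rather than hard-coding it.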

Managing the Clusters

There are several ways to manage clusters. If you prefer a more interactive approach, you can use the Amazon Redshift console or the AWS Command Line Interface (AWS CLI). If you are an application developer, you can use the Amazon Redshift Query API or the AWS Software Development Kit (SDK) libraries to manage clusters programmatically. If you use the Amazon Redshift Query API, you must authenticate every HTTP or HTTPS request to the API by signing it.
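The AWS CLI and the SDKs handle request signing for you. For readers who call the Query API directly, the sketch below shows the key-derivation step of AWS Signature Version 4, which is the signing scheme assumed here. The function names and the sample secret are hypothetical. A complete signed request also needs a canonical request and a string to sign, which this sketch leaves out.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="redshift"):
    """Derive a Signature Version 4 signing key.

    date_stamp is the request date in YYYYMMDD form, e.g. "20240101".
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key, string_to_sign):
    """Return the hex signature for an already-built string to sign."""
    return hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

The derived key is scoped to one date, region, and service, so a signature made for one of them cannot be reused for another.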

Amazon Redshift manages all the work of setting up, operating and scaling a data warehouse: provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.

You can determine the Amazon Redshift engine and database versions for your cluster in the Cluster Version field in the console. The first two sections of the number are the cluster version, and the last section is the specific revision number of the database in the cluster.
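As a small illustration of this numbering scheme, the sketch below splits a version string into its two parts. The function name and the sample value 1.0.12345 are made up for the example, and the code assumes the version always has exactly three dot-separated sections.

```python
def split_cluster_version(version):
    """Split 'X.Y.Z' into (cluster version 'X.Y', revision 'Z')."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated sections: %r" % version)
    # The first two sections form the cluster version; the last is the revision.
    return ".".join(parts[:2]), parts[2]


cluster_version, revision = split_cluster_version("1.0.12345")
print(cluster_version, revision)
```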

An Amazon Redshift cluster consists of nodes. Each cluster has a leader node and one or more compute nodes. The leader node receives queries from client applications, parses the queries, and develops query execution plans. It then coordinates the parallel execution of these plans with the compute nodes and aggregates the intermediate results from these nodes. Finally, it returns the results to the client applications.

Compute nodes execute the query execution plans and transmit data among themselves to serve these queries. The intermediate results are sent to the leader node for aggregation before being sent back to the client applications.
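The division of labor between the leader node and the compute nodes can be pictured with a toy scatter-gather example. This is only an illustration of the pattern, not how Amazon Redshift is implemented. The function names are invented, and threads stand in for compute nodes. The leader splits the rows, each worker returns a partial sum and count, and the leader combines them into an average.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_partial(rows):
    """Stand-in for a compute node: return a partial (sum, count)."""
    return sum(rows), len(rows)


def leader_average(rows, node_count=2):
    """Stand-in for the leader node: distribute work, then aggregate."""
    # Distribute rows round-robin across the simulated compute nodes.
    slices = [rows[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    # Aggregate the intermediate results before returning to the client.
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None
```

Note that each worker returns a sum and a count rather than its own average. Averaging the per-node averages would give a wrong answer whenever the slices differ in size, which is why intermediate results are combined on the leader.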

If you require help, contact SupportPRO Server Admin