AWS Simplified #2 – #Databases

S.G.N.

1) #DynamoDB :-

· NoSQL, schemaless, key-value, high-performance, highly available database

· Primary key is the partition key, optionally combined with a sort key. Secondary indexes: global secondary indexes span the whole table (max. 20 allowed); local secondary indexes reuse the partition key with an alternate sort key (max. 5)

· By default data is replicated across 3 AZs. Reads are eventually consistent by default; strongly consistent reads are available at higher cost

· Throughput (number of read capacity units (RCU) and write capacity units (WCU) per second) is configured at table creation when using provisioned mode

· Two throughput modes – provisioned (default 5 RCU & 5 WCU), recommended for predictable workloads; and on-demand for unpredictable workloads, which is costlier per request

· Pricing is based on configured throughput & storage used

· Performance depends on the configured throughput. The database stays low-latency irrespective of data size

· Item size limit – 400 KB. Queries are not very flexible: joins and complex SQL-style queries are not supported.

· Data at rest encryption

· Highly suitable for gaming, web, mobile & IoT workloads

· Point-in-time recovery is possible for up to the past 35 days
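The partition-key + sort-key model described above can be sketched in a few lines. This is a toy illustration in plain Python (`ToyTable` and the sample keys are made up for the example), not the real DynamoDB API, which is accessed through the AWS SDK:

```python
# Toy sketch of DynamoDB's composite primary key: items live under a
# partition key and are ordered by a sort key within each partition.
class ToyTable:
    def __init__(self):
        self.partitions = {}  # partition_key -> {sort_key: item}

    def put_item(self, pk, sk, item):
        if len(str(item).encode()) > 400 * 1024:  # DynamoDB's 400 KB item limit
            raise ValueError("item exceeds 400 KB limit")
        self.partitions.setdefault(pk, {})[sk] = item

    def get_item(self, pk, sk):
        return self.partitions.get(pk, {}).get(sk)

    def query(self, pk):
        """All items in one partition, ordered by sort key."""
        part = self.partitions.get(pk, {})
        return [part[sk] for sk in sorted(part)]

scores = ToyTable()
scores.put_item("game#1", "player#bob", {"score": 120})
scores.put_item("game#1", "player#alice", {"score": 200})
print(scores.query("game#1"))  # items for game#1, sorted by player sort key
```

A `get_item` needs both halves of the key; a `query` fetches a whole partition in sort-key order, which is why the choice of partition and sort keys drives the access patterns a table can support.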

2) DynamoDB #Accelerator (DAX) :-

· In-memory cache service that helps increase DynamoDB read performance up to 10 times.

· It also helps reduce the cost of read capacity units (read throughput).

· Cluster of 3 to 10 nodes: one primary, the rest read replicas.

· Placed inside a VPC, spread across multiple subnets. DynamoDB itself is outside the VPC, accessed via an endpoint.

· The DAX client is installed with the application (e.g., on EC2); it intercepts requests bound for DynamoDB and redirects them to DAX.

· All read requests go to DAX. On a read miss the request is directed to the database and the result is written back to DAX.

· All writes go to the database, and the data is also updated in DAX (write-through).

· Table operations – create / delete / update table etc. – are not performed through DAX.

· Data at rest encryption – KMS
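The read-miss and write-through behaviour described in the bullets above can be sketched as plain Python. This is a conceptual model only (the class name and sample data are invented), not the actual DAX client library:

```python
# Sketch of DAX-style caching: reads hit the cache first and fall through
# to the database on a miss; writes go to the database and the cache is
# refreshed as well (write-through).
class WriteThroughCache:
    def __init__(self, database: dict):
        self.db = database
        self.cache = {}
        self.misses = 0

    def read(self, key):
        if key in self.cache:      # cache hit: no database call
            return self.cache[key]
        self.misses += 1           # cache miss: fetch from the database ...
        value = self.db.get(key)
        self.cache[key] = value    # ... and populate the cache for next time
        return value

    def write(self, key, value):
        self.db[key] = value       # the write goes to the database
        self.cache[key] = value    # the cache is updated too

dax = WriteThroughCache({"user#1": "alice"})
dax.read("user#1")   # first read: miss, goes to the database
dax.read("user#1")   # second read: served from the cache
assert dax.misses == 1
```

Because repeat reads never reach the database, this is how DAX cuts read-capacity costs as well as latency.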

3) #ElastiCache :-

· In-memory database cache providing ultra-high read performance

· Mostly used in front of RDS & open-source databases (can be used with DynamoDB as well)

· Two engine types – Memcached (simple, e.g., for session stores) & Redis (more features: replication, persistence, sorted sets, etc.)

· Three components – node (a fixed chunk of memory/RAM), shard (up to 6 nodes: 1 primary + up to 5 replicas), and cluster (1–90 shards if cluster mode is enabled, else only one shard)

· Best used for gaming score boards, social media (session management), location-based lookups (e.g., finding the best nearby restaurant), and data analytics.

· Data at rest is encrypted.
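The gaming score-board use case maps directly onto Redis sorted sets. The stdlib sketch below emulates what the Redis commands ZADD and ZREVRANGE provide; with a real ElastiCache cluster you would issue these same operations through a Redis client library:

```python
# Toy emulation of a Redis sorted-set leaderboard (ZADD / ZREVRANGE).
class Leaderboard:
    def __init__(self):
        self.scores = {}  # member -> score (Redis keeps this sorted internally)

    def zadd(self, member, score):
        self.scores[member] = score

    def zrevrange(self, start, stop):
        """Members ranked by descending score; stop is inclusive, like Redis."""
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]

board = Leaderboard()
board.zadd("alice", 200)
board.zadd("bob", 120)
board.zadd("carol", 310)
print(board.zrevrange(0, 1))  # → ['carol', 'alice'] (top two players)
```

Keeping the ranking in memory is what makes "show the top N" queries fast, instead of re-sorting a relational table on every request.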

4) AWS Neptune Database Service :-

· Graph database (can run complex queries across the various relationships in the data)

· In a cluster, one node is the primary and the rest are read replicas (up to 15 replicas per cluster). If the primary fails, any read replica can be promoted to primary.

· All nodes see the same shared storage volume. If any segment of the volume has a defect, it self-heals using the other copies of the data.

· Database instances are accessed via endpoint URLs – cluster (points to the primary, read+write), reader, and instance (points to a specific instance). There is only one reader URL even with multiple read replicas; connections are rotated across replicas in round-robin fashion, which is not true load balancing.
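The single-reader-URL behaviour described above can be sketched as a simple rotation. The replica names below are invented for illustration; a real reader endpoint achieves this through DNS resolution rather than application code:

```python
# Sketch of a reader endpoint rotating across replicas in round-robin
# fashion. Note it ignores replica load entirely -- which is why this
# is rotation, not true load balancing.
from itertools import cycle

replicas = ["replica-1", "replica-2", "replica-3"]
reader_endpoint = cycle(replicas)  # each "resolution" yields the next replica

def resolve_reader():
    """One resolution of the cluster's single reader URL."""
    return next(reader_endpoint)

picks = [resolve_reader() for _ in range(6)]
print(picks)  # replicas chosen in strict rotation, regardless of their load
```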

5) Amazon #Redshift :-

· RDBMS-based data warehouse – holds huge data volumes, up to petabytes – for data analysis

· Data from various sources is stored here so analytics tools can run on it – activities such as data cleansing, ETL, etc. can be performed to generate the desired results and reports

· Each Redshift cluster has a database, the Redshift engine, compute nodes, and a leader node

· Compute nodes come in various types (with different amounts of memory and CPU). Each compute node is divided into node slices for parallel execution of tasks.

· The leader node builds an execution plan from requests made by external applications; tasks are then executed by the compute nodes and the data is consolidated by the leader node

· If the query result is already present in the leader node's cache, the query is not sent to the compute nodes.

· The leader node is the gateway between external applications and the compute nodes.

· IAM roles are needed if Redshift is to access a data lake on S3
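The leader/compute split described above can be sketched as fan-out and consolidation. This toy code (sample rows and function names invented) only illustrates the shape of the execution, not the Redshift engine itself:

```python
# Sketch of Redshift-style parallel execution: the leader node fans a query
# out to node slices, each slice scans its own portion of the data, and the
# leader consolidates the partial results.
from concurrent.futures import ThreadPoolExecutor

def scan_slice(rows):
    """One node slice computes a partial aggregate over its rows."""
    return sum(r["amount"] for r in rows)

def run_query(slices):
    """Leader node: fan out to slices in parallel, then consolidate."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(scan_slice, slices))
    return sum(partials)

# Data has already been distributed across three node slices
slices = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 5}],
    [{"amount": 7}, {"amount": 8}],
]
print(run_query(slices))  # → 50
```

Because each slice only touches its own portion of the data, adding compute nodes (and therefore slices) scales scans roughly linearly.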

6) #DocumentDB (MongoDB-compatible) :-

· Compatible with MongoDB; stores JSON-like documents. AWS Database Migration Service can be used to move MongoDB documents into DocumentDB.

· Indexing is used to search documents quickly.

· #DocumentDB is created inside a VPC

· Its design is the same as Neptune's – a cluster with a primary and read replicas.

· Each node sees the same storage volume. If the primary fails, a read replica is promoted to primary.

· Database access is via URLs – cluster, reader, and instance endpoints, same as Neptune

· DocumentDB provides daily automatic backups. The backup retention period can be set from 1 to 35 days; it must be at least 1 day, so automated backups cannot be fully disabled.

· Because backups are kept, point-in-time recovery is possible.
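The indexing point above is worth making concrete: without an index, finding documents means scanning the whole collection; an index on a field gives direct lookup. A stdlib sketch (the sample documents and field names are invented):

```python
# A tiny collection of JSON-like documents, MongoDB/DocumentDB style.
documents = [
    {"_id": 1, "name": "alice", "city": "Pune"},
    {"_id": 2, "name": "bob", "city": "Delhi"},
    {"_id": 3, "name": "carol", "city": "Pune"},
]

# Build an index on "city": field value -> list of matching _ids.
# This is the data structure an index maintains so queries on "city"
# never have to scan every document.
city_index = {}
for doc in documents:
    city_index.setdefault(doc["city"], []).append(doc["_id"])

print(city_index["Pune"])  # → [1, 3], found without a collection scan
```

The trade-off is the usual one: each index speeds up reads on its field but adds a little work to every write, since the index must be kept in sync.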

7) Amazon Keyspaces database service (compatible with Apache #Cassandra) :-

· A keyspace is a group of tables. To use the service, first create a keyspace, then add tables to it.

· It is serverless

· This is a NoSQL database. It uses Cassandra Query Language (CQL), which is similar to SQL

· Like DynamoDB, this database is used for very high throughput, huge data volumes, and high scalability and availability. It comes with two throughput modes – on-demand (default, for unpredictable workloads) and provisioned (for predictable workloads).

· A keyspace, as in Cassandra, maps onto a cluster of nodes; being serverless, the cluster is managed entirely by AWS.

· Use cases: route-optimization applications, trade monitoring (where low latency is required)

· Pricing is based on what you use – read / write operations
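The "cluster of nodes" model inherited from Cassandra works by hashing each partition key onto a token ring, so every key deterministically lands on one node. Keyspaces hides this behind the serverless service, but the model explains its scalability. A sketch under stated assumptions: MD5 stands in for Cassandra's Murmur3 partitioner, and the node names are invented:

```python
# Toy Cassandra-style token ring: partition keys hash to tokens, and each
# node owns several token ranges (virtual nodes, "vnodes").
import bisect
import hashlib

def token(key: str) -> int:
    """Hash a partition key onto the ring (MD5 here; Cassandra uses Murmur3)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class TokenRing:
    def __init__(self, nodes, vnodes=8):
        # Each node claims `vnodes` positions on the ring, kept sorted.
        self.ring = sorted((token(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, key):
        # The key belongs to the first vnode clockwise of its token,
        # wrapping around past the largest token.
        i = bisect.bisect(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user#42"))  # the same key always maps to the same node
```

Because placement is pure hashing, no central lookup is needed, and adding nodes only moves the token ranges adjacent to the new vnodes.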

8) Amazon Quantum Ledger Database (QLDB) :-

· Ledger database; stores immutable data in tables, in Amazon ION document form (a superset of JSON)

· Every change made to the data is tracked and recorded in the journal.

· Journal entries are chained together with SHA-256 hashes – cryptographic hashing that ensures data integrity, since the change history cannot be silently altered.

· This service is serverless

· Two types of storage: journal storage and indexed storage (documents are indexed for query purposes)

· Use cases: financial data storage, payroll, insurance claims

· This service integrates with Amazon Kinesis for data streaming: ledger data streams are sent to Kinesis for real-time analysis and action – to drive events. For example, a Lambda function can send an SNS notification when a user's account balance goes below some threshold value.
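The SHA-256 hash chaining that makes the journal verifiable can be sketched in a few lines of stdlib Python. This is the general hash-chain technique, not QLDB's exact internal format, and the class and payloads are invented for the example:

```python
# Sketch of a hash-chained, append-only journal: each entry's digest
# covers the previous digest, so tampering anywhere breaks verification.
import hashlib
import json

def entry_hash(prev_hash: bytes, payload: dict) -> bytes:
    """Digest of the previous entry's hash plus the new payload."""
    doc = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash + doc).digest()

class Journal:
    def __init__(self):
        self.entries = []          # append-only list of (payload, digest)
        self.tip = b"\x00" * 32    # genesis digest before any entries

    def append(self, payload: dict) -> bytes:
        self.tip = entry_hash(self.tip, payload)
        self.entries.append((payload, self.tip))
        return self.tip

    def verify(self) -> bool:
        """Recompute the chain from the start and compare stored digests."""
        h = b"\x00" * 32
        for payload, digest in self.entries:
            h = entry_hash(h, payload)
            if h != digest:
                return False
        return True

j = Journal()
j.append({"account": "A-1", "balance": 100})
j.append({"account": "A-1", "balance": 80})
assert j.verify()

# Rewriting history (while keeping the old digest) is detected:
j.entries[0] = ({"account": "A-1", "balance": 999}, j.entries[0][1])
assert not j.verify()
```

This is why the journal guarantees integrity without needing trust in whoever holds the data: anyone with the entries can recompute the chain.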

9) Data Lake :-

· The AWS #Lake Formation service is available to build a data lake

· A data lake is one place where all types of data from various sources can be dumped.

· The main difference between a data warehouse and a data lake: a data warehouse holds structured, normalized data – effectively a subset of the data lake – meant for a specific analysis purpose

· A data lake has both structured and unstructured data.

· S3 is used as the data store, since it can hold all forms of data

· A good data lake solution needs 5 things – storage, data movement, catalogue / data discovery (AWS Glue can be used), general analytics, and predictive analytics



More articles by Rupali Giri
