AWS Simplified #2 – #Databases

S.G.N.

1) #DynamoDB :-

· NoSQL, schemaless, key-value, high-performance, highly available database

· Primary key is the partition key, optionally combined with a sort key. Secondary indexes: global secondary indexes span the whole table (max. 20 allowed); local secondary indexes reuse the partition key with an alternate sort key (max. 5)

· By default data is replicated across 3 AZs. Reads are eventually consistent by default; strongly consistent reads are available at higher cost

· Throughput (number of read capacity units (RCU) and write capacity units (WCU) per second) is configured at table creation when using provisioned mode

· Two throughput modes – provisioned (default 5 RCU & 5 WCU), recommended for predictable workloads; and on-demand for unpredictable workloads, which is costlier per request

· Pricing is based on configured throughput & storage used

· Performance depends on the configured throughput. The database stays low-latency irrespective of data size

· Item size limit – 400 KB. Queries are not very flexible: joins and complex SQL-style queries are not supported.

· Data at rest encryption

· Highly suitable for gaming, web, mobile & IoT workloads

· Point-in-time recovery is possible for up to the past 35 days
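The partition-key + sort-key model described above can be sketched in a few lines. This is a toy illustration in plain Python (`ToyTable` and the sample keys are made up for the example), not the real DynamoDB API, which is accessed through the AWS SDK:

```python
# Toy sketch of DynamoDB's composite primary key: items live under a
# partition key and are ordered by a sort key within each partition.
class ToyTable:
    def __init__(self):
        self.partitions = {}  # partition_key -> {sort_key: item}

    def put_item(self, pk, sk, item):
        if len(str(item).encode()) > 400 * 1024:  # DynamoDB's 400 KB item limit
            raise ValueError("item exceeds 400 KB limit")
        self.partitions.setdefault(pk, {})[sk] = item

    def get_item(self, pk, sk):
        return self.partitions.get(pk, {}).get(sk)

    def query(self, pk):
        """All items in one partition, ordered by sort key."""
        part = self.partitions.get(pk, {})
        return [part[sk] for sk in sorted(part)]

scores = ToyTable()
scores.put_item("game#1", "player#bob", {"score": 120})
scores.put_item("game#1", "player#alice", {"score": 200})
print(scores.query("game#1"))  # items for game#1, sorted by player sort key
```

A `get_item` needs both halves of the key; a `query` fetches a whole partition in sort-key order, which is why the choice of partition and sort keys drives the access patterns a table can support.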

2) DynamoDB #Accelerator (DAX) :-

· In-memory cache service that helps increase DynamoDB read performance up to 10 times.

· It also helps reduce the cost of read capacity units (read throughput).

· Cluster of 3 to 10 nodes: one primary, the rest read replicas.

· Placed inside a VPC, spread across multiple subnets. DynamoDB itself is outside the VPC, accessed via an endpoint.

· The DAX client is installed with the application (e.g., on EC2); it intercepts requests bound for DynamoDB and redirects them to DAX.

· All read requests go to DAX. On a read miss the request is directed to the database and the result is written back to DAX.

· All writes go to the database, and the data is also updated in DAX (write-through).

· Table operations – create / delete / update table etc. – are not performed through DAX.

· Data at rest encryption – KMS
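The read-miss and write-through behaviour described in the bullets above can be sketched as plain Python. This is a conceptual model only (the class name and sample data are invented), not the actual DAX client library:

```python
# Sketch of DAX-style caching: reads hit the cache first and fall through
# to the database on a miss; writes go to the database and the cache is
# refreshed as well (write-through).
class WriteThroughCache:
    def __init__(self, database: dict):
        self.db = database
        self.cache = {}
        self.misses = 0

    def read(self, key):
        if key in self.cache:      # cache hit: no database call
            return self.cache[key]
        self.misses += 1           # cache miss: fetch from the database ...
        value = self.db.get(key)
        self.cache[key] = value    # ... and populate the cache for next time
        return value

    def write(self, key, value):
        self.db[key] = value       # the write goes to the database
        self.cache[key] = value    # the cache is updated too

dax = WriteThroughCache({"user#1": "alice"})
dax.read("user#1")   # first read: miss, goes to the database
dax.read("user#1")   # second read: served from the cache
assert dax.misses == 1
```

Because repeat reads never reach the database, this is how DAX cuts read-capacity costs as well as latency.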

3) #ElastiCache :-

· In-memory database cache providing ultra-high read performance

· Mostly used in front of RDS & open-source databases (can be used with DynamoDB as well)

· Two engine types – Memcached (simple, e.g., for session stores) & Redis (more features: replication, persistence, sorted sets, etc.)

· Three components – node (a fixed chunk of memory/RAM), shard (up to 6 nodes: 1 primary + up to 5 replicas), and cluster (1–90 shards if cluster mode is enabled, else only one shard)

· Best used for gaming score boards, social media (session management), location-based lookups (e.g., finding the best nearby restaurant), and data analytics.

· Data at rest is encrypted.
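The gaming score-board use case maps directly onto Redis sorted sets. The stdlib sketch below emulates what the Redis commands ZADD and ZREVRANGE provide; with a real ElastiCache cluster you would issue these same operations through a Redis client library:

```python
# Toy emulation of a Redis sorted-set leaderboard (ZADD / ZREVRANGE).
class Leaderboard:
    def __init__(self):
        self.scores = {}  # member -> score (Redis keeps this sorted internally)

    def zadd(self, member, score):
        self.scores[member] = score

    def zrevrange(self, start, stop):
        """Members ranked by descending score; stop is inclusive, like Redis."""
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]

board = Leaderboard()
board.zadd("alice", 200)
board.zadd("bob", 120)
board.zadd("carol", 310)
print(board.zrevrange(0, 1))  # → ['carol', 'alice'] (top two players)
```

Keeping the ranking in memory is what makes "show the top N" queries fast, instead of re-sorting a relational table on every request.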

4) AWS Neptune Database Service :-

· Graph database (can run complex queries across the various relationships in the data)

· In a cluster, one node is the primary and the rest are read replicas (up to 15 replicas per cluster). If the primary fails, any read replica can be promoted to primary.

· All nodes see the same shared storage volume. If any segment of the volume has a defect, it self-heals using the other copies of the data.

· Database instances are accessed via endpoint URLs – cluster (points to the primary, read+write), reader, and instance (points to a specific instance). There is only one reader URL even with multiple read replicas; connections are rotated across replicas in round-robin fashion, which is not true load balancing.
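The single-reader-URL behaviour described above can be sketched as a simple rotation. The replica names below are invented for illustration; a real reader endpoint achieves this through DNS resolution rather than application code:

```python
# Sketch of a reader endpoint rotating across replicas in round-robin
# fashion. Note it ignores replica load entirely -- which is why this
# is rotation, not true load balancing.
from itertools import cycle

replicas = ["replica-1", "replica-2", "replica-3"]
reader_endpoint = cycle(replicas)  # each "resolution" yields the next replica

def resolve_reader():
    """One resolution of the cluster's single reader URL."""
    return next(reader_endpoint)

picks = [resolve_reader() for _ in range(6)]
print(picks)  # replicas chosen in strict rotation, regardless of their load
```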

5) Amazon #Redshift :-

· RDBMS-based data warehouse – holds huge data volumes, up to petabytes – for data analysis

· Data from various sources is stored here so analytics tools can run on it – activities such as data cleansing, ETL, etc. can be performed to generate the desired results and reports

· Each Redshift cluster has a database, the Redshift engine, compute nodes, and a leader node

· Compute nodes come in various types (with different amounts of memory and CPU). Each compute node is divided into node slices for parallel execution of tasks.

· The leader node builds an execution plan from requests made by external applications; tasks are then executed by the compute nodes and the data is consolidated by the leader node

· If the query result is already present in the leader node's cache, the query is not sent to the compute nodes.

· The leader node is the gateway between external applications and the compute nodes.

· IAM roles are needed if Redshift is to access a data lake on S3
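The leader/compute split described above can be sketched as fan-out and consolidation. This toy code (sample rows and function names invented) only illustrates the shape of the execution, not the Redshift engine itself:

```python
# Sketch of Redshift-style parallel execution: the leader node fans a query
# out to node slices, each slice scans its own portion of the data, and the
# leader consolidates the partial results.
from concurrent.futures import ThreadPoolExecutor

def scan_slice(rows):
    """One node slice computes a partial aggregate over its rows."""
    return sum(r["amount"] for r in rows)

def run_query(slices):
    """Leader node: fan out to slices in parallel, then consolidate."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(scan_slice, slices))
    return sum(partials)

# Data has already been distributed across three node slices
slices = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 5}],
    [{"amount": 7}, {"amount": 8}],
]
print(run_query(slices))  # → 50
```

Because each slice only touches its own portion of the data, adding compute nodes (and therefore slices) scales scans roughly linearly.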

6) #DocumentDB (MongoDB-compatible) :-

· Compatible with MongoDB; stores JSON-like documents. AWS Database Migration Service can be used to move MongoDB documents into DocumentDB.

· Indexing is used to search documents quickly.

· #DocumentDB is created inside a VPC

· Its design is the same as Neptune's – a cluster with a primary and read replicas.

· Each node sees the same storage volume. If the primary fails, a read replica is promoted to primary.

· Database access is via URLs – cluster, reader, and instance endpoints, same as Neptune

· DocumentDB provides daily automatic backups. The backup retention period can be set from 1 to 35 days; it must be at least 1 day, so automated backups cannot be fully disabled.

· Because backups are kept, point-in-time recovery is possible.
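The indexing point above is worth making concrete: without an index, finding documents means scanning the whole collection; an index on a field gives direct lookup. A stdlib sketch (the sample documents and field names are invented):

```python
# A tiny collection of JSON-like documents, MongoDB/DocumentDB style.
documents = [
    {"_id": 1, "name": "alice", "city": "Pune"},
    {"_id": 2, "name": "bob", "city": "Delhi"},
    {"_id": 3, "name": "carol", "city": "Pune"},
]

# Build an index on "city": field value -> list of matching _ids.
# This is the data structure an index maintains so queries on "city"
# never have to scan every document.
city_index = {}
for doc in documents:
    city_index.setdefault(doc["city"], []).append(doc["_id"])

print(city_index["Pune"])  # → [1, 3], found without a collection scan
```

The trade-off is the usual one: each index speeds up reads on its field but adds a little work to every write, since the index must be kept in sync.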

7) Amazon Keyspaces database service (compatible with Apache #Cassandra) :-

· A keyspace is a group of tables. To use the service, first create a keyspace, then add tables to it.

· It is serverless

· This is a NoSQL database. It uses Cassandra Query Language (CQL), which is similar to SQL

· Like DynamoDB, this database is used for very high throughput, huge data volumes, and high scalability and availability. It comes with two throughput modes – on-demand (default, for unpredictable workloads) and provisioned (for predictable workloads).

· A keyspace, as in Cassandra, maps onto a cluster of nodes; being serverless, the cluster is managed entirely by AWS.

· Use cases: route-optimization applications, trade monitoring (where low latency is required)

· Pricing is based on what you use – read / write operations
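The "cluster of nodes" model inherited from Cassandra works by hashing each partition key onto a token ring, so every key deterministically lands on one node. Keyspaces hides this behind the serverless service, but the model explains its scalability. A sketch under stated assumptions: MD5 stands in for Cassandra's Murmur3 partitioner, and the node names are invented:

```python
# Toy Cassandra-style token ring: partition keys hash to tokens, and each
# node owns several token ranges (virtual nodes, "vnodes").
import bisect
import hashlib

def token(key: str) -> int:
    """Hash a partition key onto the ring (MD5 here; Cassandra uses Murmur3)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class TokenRing:
    def __init__(self, nodes, vnodes=8):
        # Each node claims `vnodes` positions on the ring, kept sorted.
        self.ring = sorted((token(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, key):
        # The key belongs to the first vnode clockwise of its token,
        # wrapping around past the largest token.
        i = bisect.bisect(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user#42"))  # the same key always maps to the same node
```

Because placement is pure hashing, no central lookup is needed, and adding nodes only moves the token ranges adjacent to the new vnodes.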

8) Amazon Quantum Ledger Database (QLDB) :-

· Ledger database; stores immutable data in tables, in Amazon ION document form (a superset of JSON)

· Every change made to the data is tracked and recorded in the journal.

· Journal entries are chained together with SHA-256 hashes – cryptographic hashing that ensures data integrity, since the change history cannot be silently altered.

· This service is serverless

· Two types of storage: journal storage and indexed storage (documents are indexed for query purposes)

· Use cases: financial data storage, payroll, insurance claims

· This service integrates with Amazon Kinesis for data streaming: ledger data streams are sent to Kinesis for real-time analysis and action – to drive events. For example, a Lambda function can send an SNS notification when a user's account balance goes below some threshold value.
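The SHA-256 hash chaining that makes the journal verifiable can be sketched in a few lines of stdlib Python. This is the general hash-chain technique, not QLDB's exact internal format, and the class and payloads are invented for the example:

```python
# Sketch of a hash-chained, append-only journal: each entry's digest
# covers the previous digest, so tampering anywhere breaks verification.
import hashlib
import json

def entry_hash(prev_hash: bytes, payload: dict) -> bytes:
    """Digest of the previous entry's hash plus the new payload."""
    doc = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(prev_hash + doc).digest()

class Journal:
    def __init__(self):
        self.entries = []          # append-only list of (payload, digest)
        self.tip = b"\x00" * 32    # genesis digest before any entries

    def append(self, payload: dict) -> bytes:
        self.tip = entry_hash(self.tip, payload)
        self.entries.append((payload, self.tip))
        return self.tip

    def verify(self) -> bool:
        """Recompute the chain from the start and compare stored digests."""
        h = b"\x00" * 32
        for payload, digest in self.entries:
            h = entry_hash(h, payload)
            if h != digest:
                return False
        return True

j = Journal()
j.append({"account": "A-1", "balance": 100})
j.append({"account": "A-1", "balance": 80})
assert j.verify()

# Rewriting history (while keeping the old digest) is detected:
j.entries[0] = ({"account": "A-1", "balance": 999}, j.entries[0][1])
assert not j.verify()
```

This is why the journal guarantees integrity without needing trust in whoever holds the data: anyone with the entries can recompute the chain.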

9) Data Lake :-

· The AWS #Lake Formation service is available to build a data lake

· A data lake is one place where all types of data from various sources can be dumped.

· The main difference between a data warehouse and a data lake: a data warehouse holds structured, normalized data – effectively a subset of the data lake – meant for a specific analysis purpose

· A data lake has both structured and unstructured data.

· S3 is used as the data store, since it can hold all forms of data

· A good data lake solution needs 5 things – storage, data movement, catalogue / data discovery (AWS Glue can be used), general analytics, and predictive analytics



More articles by Rupali Giri
