Data Engineering Certification Prep Series – Tip #11
Migrating Hive Metastore to AWS Glue Data Catalog for a Serverless Future
Problem
A company is planning to migrate its on-premises Apache Hadoop clusters to Amazon EMR. The company also needs to migrate its Apache Hive metastore, which currently runs on premises. The new catalog solution must be persistent and serverless. Which solution meets these requirements?
Options
A. Use AWS DMS to migrate the Hive metastore into Amazon S3. Configure AWS Glue Data Catalog to scan Amazon S3.
B. Configure a Hive metastore in Amazon EMR. Migrate the existing Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company’s external data catalog.
C. Configure an external Hive metastore in Amazon EMR. Migrate the existing Hive metastore into Amazon EMR. Use Amazon Aurora MySQL to store the catalog.
D. Configure a new Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use the new metastore as the company’s data catalog.
Options Analysis
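A (AWS DMS into Amazon S3, then scan with AWS Glue): Incorrect. AWS DMS copies rows out of the metastore's backing relational database. Landing those rows in Amazon S3 and scanning them would catalog the metastore's internal bookkeeping tables, not recreate the Hive databases and tables the company actually queries.
B (AWS Glue Data Catalog): Correct. The AWS Glue Data Catalog is a fully managed, serverless, Hive-compatible metastore that persists independently of any EMR cluster, and Hive on Amazon EMR can be configured to use it as its metastore.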
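C (external metastore on Amazon Aurora MySQL): Incorrect for this scenario. Aurora MySQL is a persistent external metastore option for EMR, but it is a database cluster the company must provision and manage, so it does not meet the serverless requirement.
D (new metastore on the EMR cluster): Incorrect. By default, Hive on Amazon EMR keeps its metastore in a MySQL database on the cluster's primary node, and that data is erased when the cluster terminates, so the catalog is not persistent.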
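Option B works because Hive on Amazon EMR can be pointed at the Glue Data Catalog through a configuration classification supplied at cluster launch. The sketch below builds that `Configurations` list in Python. The `hive.metastore.client.factory.class` property and the factory class name follow the Amazon EMR documentation; the helper name `emr_glue_configurations` and the choice to also set `spark-hive-site` (so Spark SQL shares the same catalog) are illustrative assumptions, not a complete cluster definition.

```python
import json

# Factory class documented by AWS for making Hive (and Spark SQL) on EMR
# use the AWS Glue Data Catalog instead of a cluster-local metastore.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def emr_glue_configurations(include_spark: bool = True) -> list:
    """Build an EMR 'Configurations' list that points Hive at Glue (hypothetical helper)."""
    configs = [
        {
            "Classification": "hive-site",
            "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
        }
    ]
    if include_spark:
        # Spark SQL reads its Hive metastore settings from spark-hive-site.
        configs.append(
            {
                "Classification": "spark-hive-site",
                "Properties": {"hive.metastore.client.factory.class": GLUE_FACTORY},
            }
        )
    return configs


if __name__ == "__main__":
    print(json.dumps(emr_glue_configurations(), indent=2))
```

The returned list would be passed as the `Configurations` parameter of the EMR `RunJobFlow` API (for example, boto3's `emr.run_job_flow(..., Configurations=...)`) together with the cluster's other settings. Because the catalog lives in Glue rather than on the primary node, table definitions survive cluster termination.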
Correct Answer: B
Use AWS Glue Data Catalog as the persistent, serverless solution for Hive metastore migration.
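The migration half of option B means recreating each on-premises table definition in the Glue Data Catalog. The following is a minimal sketch, assuming each Hive table has already been exported into a plain Python dict and its data already copied to Amazon S3. The export format and the function name `hive_table_to_glue_input` are hypothetical; the output keys follow the `TableInput` structure accepted by the Glue `CreateTable` API.

```python
def hive_table_to_glue_input(table: dict) -> dict:
    """Map a hypothetical exported Hive table definition to a Glue TableInput dict."""
    # Assumption: every migrated table is treated as external and its data
    # already lives in Amazon S3 (on-premises HDFS paths must be rewritten first).
    columns = [{"Name": name, "Type": col_type} for name, col_type in table["columns"]]
    partition_keys = [
        {"Name": name, "Type": col_type}
        for name, col_type in table.get("partition_columns", [])
    ]
    return {
        "Name": table["name"],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": dict(table.get("parameters", {})),
        "PartitionKeys": partition_keys,
        "StorageDescriptor": {
            "Columns": columns,
            "Location": table["location"],
            "InputFormat": table["input_format"],
            "OutputFormat": table["output_format"],
            "SerdeInfo": {"SerializationLibrary": table["serde"]},
        },
    }


if __name__ == "__main__":
    # Hypothetical Parquet table exported from the on-premises metastore.
    sales = {
        "name": "sales",
        "columns": [("id", "bigint"), ("amount", "double")],
        "partition_columns": [("dt", "string")],
        "location": "s3://example-bucket/warehouse/sales/",
        "input_format": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "output_format": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "serde": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    }
    print(hive_table_to_glue_input(sales))
```

Each result could then be sent with boto3's `glue.create_table(DatabaseName=..., TableInput=...)` after creating the target database with `glue.create_database`. Partitions are registered separately (for example, with `batch_create_partition`), which this sketch leaves out.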
Key Takeaways