Week 3: Storage and Databases

Week 3: study when I have time, skip when I don't

Instance Stores and Amazon Elastic Block Store (Amazon EBS)

Instance Store:

  • An Instance Store provides temporary block-level storage for an Amazon EC2 instance.
  • It is disk storage physically attached to the host computer for an EC2 instance, sharing the same lifespan as the instance.
  • When an EC2 instance is terminated, any data stored in the Instance Store is lost, making it suitable for temporary storage needs.

Amazon Elastic Block Store (Amazon EBS):

  • Amazon EBS is a service that offers block-level storage volumes for use with Amazon EC2 instances.
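  • An EBS volume persists independently of the instance: when you stop an Amazon EC2 instance, all data on the attached EBS volume remains intact, and a volume can also outlive termination. (By default the root volume is deleted on termination unless its DeleteOnTermination attribute is set to false.)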
  • To create an EBS volume, you define its configuration, including volume size and type, and then provision it. Once created, an EBS volume can be attached to an Amazon EC2 instance.
  • Since EBS volumes are meant for data that needs to persist, it’s crucial to back up the data. Incremental backups of EBS volumes can be achieved by creating Amazon EBS snapshots.

Amazon EBS Snapshots:

  • An EBS snapshot is an incremental backup method. The initial backup of a volume copies all the data, while subsequent backups save only the blocks of data that have changed since the most recent snapshot.
  • Incremental backups differ from full backups, where all data in a storage volume is copied during each backup. Full backups include data that hasn’t changed since the last backup.
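
The incremental idea above can be sketched in a few lines. This is a toy model, not how EBS is implemented: the volume is a plain Python list of blocks, and `take_snapshot` and `restore` are hypothetical helper names chosen for illustration.

```python
def take_snapshot(volume, previous_state):
    """Return only the blocks that differ from the previously captured state."""
    return {
        index: data
        for index, data in enumerate(volume)
        if previous_state.get(index) != data
    }


def restore(snapshots):
    """Rebuild the volume by replaying snapshots from oldest to newest."""
    state = {}
    for snapshot in snapshots:
        state.update(snapshot)
    return [state[index] for index in sorted(state)]


volume = ["aaaa", "bbbb", "cccc", "dddd"]

first = take_snapshot(volume, {})      # initial backup copies every block
volume[2] = "CCCC"                     # one block changes afterwards
second = take_snapshot(volume, first)  # only the changed block is saved

print(len(first), len(second))
print(restore([first, second]) == volume)
```

A full backup would have stored all four blocks again in the second pass; the incremental one stores a single block.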

In summary, Instance Store is used for temporary data and is volatile, whereas Amazon EBS provides persistent block-level storage, requiring regular backups through EBS snapshots to protect data.

Amazon Simple Storage Service (Amazon S3)

  1. Amazon S3 Standard:

    • Designed for frequently accessed data.
    • Provides high availability, storing data in a minimum of three Availability Zones.
    • Suitable for a wide range of use cases like websites, content distribution, and data analytics.
    • Higher cost compared to storage classes intended for infrequently accessed data.
  2. Amazon S3 Standard-Infrequent Access (S3 Standard-IA):

    • Ideal for infrequently accessed data that requires high availability.
    • Lower storage cost but higher retrieval cost compared to the standard class.
    • Data is stored in a minimum of three Availability Zones, offering the same availability as the standard class.
    • Suitable for data like backups.
  3. Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA):

    • Stores data in a single Availability Zone, reducing costs.
    • Suitable if data can be easily reproduced in case of an Availability Zone failure.
    • Lacks redundancy across multiple Availability Zones.
  4. Amazon S3 Intelligent-Tiering:

    • Ideal for data with unknown or changing access patterns.
    • Automatically moves objects to different storage tiers based on access patterns to reduce costs.
    • Requires a small monthly monitoring and automation fee per object.
  5. Amazon S3 Glacier Instant Retrieval:

    • Works well for archived data that requires immediate access.
    • Objects can be retrieved within milliseconds, with performance similar to the standard class.
  6. Amazon S3 Glacier Flexible Retrieval:

    • Low-cost storage class designed for data archiving.
    • Objects can be retrieved within minutes to hours.
    • Suitable for long-term data retention and archiving.
  7. Amazon S3 Glacier Deep Archive:

    • Lowest-cost object storage class ideal for archiving.
    • Objects can be retrieved within 12 hours.
    • All objects are replicated and stored across at least three geographically dispersed Availability Zones.
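
As a memory aid, the seven classes above can be folded into a small decision helper. This is only a sketch of the rules of thumb in these notes, not an AWS API: `suggest_storage_class` and its parameter values are hypothetical, and real choices also depend on pricing and minimum storage durations.

```python
def suggest_storage_class(access, reproducible=False, retrieval="milliseconds"):
    """Map a rough access pattern to one of the seven classes in the notes."""
    if access == "unknown":
        return "S3 Intelligent-Tiering"
    if access == "frequent":
        return "S3 Standard"
    if access == "infrequent":
        # One Zone-IA only makes sense when the data can be recreated.
        return "S3 One Zone-IA" if reproducible else "S3 Standard-IA"
    if access == "archive":
        if retrieval == "milliseconds":
            return "S3 Glacier Instant Retrieval"
        if retrieval == "minutes_to_hours":
            return "S3 Glacier Flexible Retrieval"
        return "S3 Glacier Deep Archive"
    raise ValueError(f"unknown access pattern: {access!r}")


print(suggest_storage_class("infrequent", reproducible=True))
print(suggest_storage_class("archive", retrieval="within_12_hours"))
```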

quiz

Comparing Amazon EBS and Amazon S3

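  1. Storage type comparison:

    • Block storage (Amazon Elastic Block Store): volumes of up to 16 tebibytes, can survive termination of the Amazon EC2 instance, and come in solid-state and spinning-disk types.
    • Regional object storage (Amazon Simple Storage Service): unlimited total storage, individual objects of up to 5,000 gigabytes, well suited to write-once/read-many workloads, and 99.999999999% durability.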
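  2. Use case 1 - S3 wins:

    • Suppose you run a photo analysis website: users upload photos of themselves, and your application finds photos of animals that look like them.
    • You need to store millions of animal photos, index them, and serve them to thousands of viewers at the same time.
    • S3 fits this use case because it is already web enabled: every object has a URL, and you control who can access it.
    • S3 is regionally distributed and highly durable, so you do not need to worry about a backup strategy. S3 also costs less and requires no Amazon EC2 instances.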
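  3. Use case 2 - EBS wins:

    • Suppose you have an 80-gigabyte video file that needs edits and corrections.
    • Object storage such as S3 suits uploading and consuming whole objects, but every change to an object requires re-uploading the entire file. There are no delta updates.
    • Block storage (EBS) breaks a file into small component parts, or blocks, so it suits workloads that make frequent small changes.
    • EBS wins here because it updates only the blocks that changed instead of re-uploading the whole 80-gigabyte file each time.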
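  4. Conclusion:

    • The right choice depends on your workload and requirements.
    • S3 suits complete objects and data that changes only occasionally.
    • EBS suits complex read, write, and change operations.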
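
The 80-gigabyte video case can be put into rough numbers. This is back-of-the-envelope arithmetic only: the 1 MB block size, the five blocks touched per edit, and the ten edits are assumptions made for illustration, and 1 GB is treated as 1024 MB for simplicity.

```python
FILE_SIZE_GB = 80
BLOCK_SIZE_MB = 1  # assumed block size, for illustration only


def object_storage_upload_mb(edits):
    """Object storage: every edit re-uploads the whole file."""
    return edits * FILE_SIZE_GB * 1024


def block_storage_write_mb(edits, blocks_touched_per_edit):
    """Block storage: every edit rewrites only the blocks it touched."""
    return edits * blocks_touched_per_edit * BLOCK_SIZE_MB


edits = 10
print(object_storage_upload_mb(edits))   # data moved with whole-object uploads
print(block_storage_write_mb(edits, 5))  # data moved with block-level writes
```

Under these assumptions, ten small edits move 819,200 MB through object storage but only 50 MB through block storage, which is why EBS wins the second use case.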

EFS & EBS

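Amazon Elastic File System (Amazon EFS) is a managed file system service from AWS for enterprise applications that need shared file storage. It scales automatically, stores data redundantly, and can be accessed by multiple EC2 instances at the same time.

Amazon Elastic Block Store (Amazon EBS) is a block-level storage service from AWS. A volume is attached to a specific EC2 instance, so it suits single-instance storage needs. It does not scale automatically, and it is confined to a single Availability Zone: the volume and the instance must be in the same one.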

Amazon RDS database engines

Amazon RDS is available on six database engines, which optimize for memory, performance, or input/output (I/O).

"Lift-and-shift" migration

Amazon Aurora

Amazon Aurora is an enterprise-grade relational database compatible with MySQL and PostgreSQL. It offers up to five times faster performance compared to standard MySQL databases and up to three times faster performance compared to standard PostgreSQL databases. Aurora reduces costs by optimizing input/output operations and ensures high availability by replicating data across three Availability Zones and backing up data to Amazon S3.

Amazon DynamoDB

Serverless, Automatic Scaling

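Amazon DynamoDB is a NoSQL database service that requires no management of the underlying infrastructure. It suits applications that need high availability and high performance. Apart from the primary key, DynamoDB does not require a predefined table schema, so data can be stored and queried flexibly. It is a highly scalable service that handles large workloads with very fast response times. However, it does not use the traditional SQL query language; it offers simpler queries instead, which suit certain purpose-built applications.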

In a key-value database, you can add or remove attributes from items in the table at any time. Additionally, not every item in the table has to have the same attributes.

Example of data in a nonrelational database:

Key   Value
1     Name: Mary Major; Address: 100 Main Street; Birthday: July 5, 1994
2     Name: John Doe; Address: 123 Any Street; Favorite drink: Medium latte
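
The same two items can be modeled with a plain Python dictionary to show the flexible-attribute idea. This is a sketch of the data model only, not the DynamoDB API; the variable name `table` and the added "Espresso" value are made up for illustration.

```python
# Each key maps to an item; items do not have to share the same attributes.
table = {
    1: {"Name": "Mary Major", "Address": "100 Main Street", "Birthday": "July 5, 1994"},
    2: {"Name": "John Doe", "Address": "123 Any Street", "Favorite drink": "Medium latte"},
}

# Attributes can be added to or removed from a single item at any time.
table[1]["Favorite drink"] = "Espresso"
del table[2]["Address"]

# Lookups go through the key.
print(sorted(table[1]))
print(sorted(table[2]))
```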

A brief introduction to DynamoDB core concepts

Comparing Amazon RDS and Amazon DynamoDB

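RDS suits business needs that involve complex relationships and joins, while DynamoDB suits use cases that need high throughput and fast lookups, especially when the data lives in a single table.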

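  1. Amazon RDS:

    • Suits use cases that need complex relationships and joins, such as business analytics.
    • Has rich relational features and handles data that spans multiple tables.
    • Its strength is business needs with complex data relationships, for example a sales supply chain management system.
  2. Amazon DynamoDB:

    • Suits use cases that do not need complex relationships, such as key-value data.
    • Offers high throughput and scales to potentially petabytes of data.
    • Suits building a fast database with no need for complex joins, for example an employee contact list.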
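
The contrast can be made concrete with local stand-ins. In this sketch, SQLite (from the Python standard library) stands in for a relational engine on RDS, and a dictionary stands in for a key-value table; the table names, employee IDs, and phone numbers are invented for illustration.

```python
import sqlite3

# Relational side: answering the question requires a join across two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Mary Major'), (2, 'John Doe');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(rows)

# Key-value side: a single table, and every read is a direct lookup by key.
contacts = {
    "emp-001": {"name": "Mary Major", "phone": "555-0100"},
    "emp-002": {"name": "John Doe", "phone": "555-0101"},
}
print(contacts["emp-002"]["phone"])
```

The first query needs the relationship between customers and orders; the second never looks beyond one item, which is the kind of access a key-value store is built for.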

Amazon Redshift

Amazon Redshift is a high-performance data warehouse service for analyzing historical data.

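  1. The need for historical analytics

    • Sometimes the business needs to look beyond current data and analyze past data.
    • Traditional relational databases suit real-time reads and writes but cannot easily handle large-scale historical analytics.
  2. Data warehouses

    • A data warehouse is a database designed specifically for big data, used for historical analytics rather than operational analysis.
    • A data warehouse can query large volumes of data, including data from separate data stores such as inventory, financial, and retail sales systems.
  3. Amazon Redshift

    • Amazon Redshift is data warehousing as a service and scales massively.
    • It works with Amazon Redshift Spectrum to query vast amounts of unstructured data in a data lake directly.
    • It delivers up to 10 times the performance of traditional databases, which makes it especially suited to business intelligence workloads.
  4. Benefits

    • Amazon Redshift reduces the work of the data warehouse team, so the team can focus on analyzing data rather than maintaining the engine.
    • It gets you started quickly: less time waiting for results and more time getting answers.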

AWS Database Migration Service (AWS DMS)

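  1. Overview

    • AWS DMS is the AWS database migration service, used to migrate existing databases from on-premises environments or other cloud platforms to AWS.
  2. Migration types

    • Homogeneous migration: the source and target databases are of the same type, for example MySQL to Amazon RDS for MySQL, Microsoft SQL Server to Amazon RDS for SQL Server, or Oracle to Amazon RDS for Oracle.
    • Heterogeneous migration: the source and target databases are of different types. You first use the AWS Schema Conversion Tool to convert the source schema and code into a format compatible with the target database, and then use DMS to migrate the data.
  3. Use cases

    • Beyond plain database migration, DMS serves several scenarios: development and test database migrations, database consolidation, and continuous database replication.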

    Development and test database migrations: enabling developers to test applications against production data without affecting production users.

    Database consolidation: combining several databases into one central database.

    Continuous replication: sending ongoing copies of your data to other targets instead of doing a one-time migration.

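  4. Key benefits

    • The source database remains fully operational during the migration, which minimizes application downtime.
    • DMS supports migrations between many source and target database types.
    • Migration tasks are easy to create; once started, the migration is managed automatically by DMS.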
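
The difference between the two migration types comes down to one extra step, which a small planning helper can make explicit. `plan_migration` is a hypothetical function written for these notes, not part of any AWS SDK, and it assumes engines are identified by simple names such as "mysql" or "oracle".

```python
def plan_migration(source_engine, target_engine):
    """Return the ordered steps for a migration, following the notes above."""
    steps = []
    if source_engine.lower() != target_engine.lower():
        # Heterogeneous: schema and code must be converted before data moves.
        steps.append("Convert schema and code with the AWS Schema Conversion Tool")
    steps.append("Migrate data with AWS DMS")
    return steps


print(plan_migration("MySQL", "mysql"))    # homogeneous: one step
print(plan_migration("oracle", "mysql"))   # heterogeneous: two steps
```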

Additional Database Services

  1. Databases

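    • Amazon DynamoDB

      • Suits use cases that need a fast, highly available, highly scalable key-value database.
      • Commonly used for application data, user settings, session stores, and other workloads that need fast reads and writes.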
    • Amazon DocumentDB

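      • Suits use cases that handle semi-structured document data, such as content management systems, catalogs, and user profiles.
      • Offers MongoDB compatibility, so MongoDB applications can move to Amazon DocumentDB with minimal changes.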
    • Amazon Neptune

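      • Suits use cases that build and query graph data structures, such as social networks, recommendation engines, and fraud detection.
      • Provides graph database features for working with highly connected data.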
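    • Amazon Quantum Ledger Database (Amazon QLDB)

      • Suits use cases that need immutable, tamper-proof records, such as supply chain tracking, financial records, and legal contracts.
      • Provides an immutable ledger in which every entry can be audited and none can be deleted or modified. You can use Amazon QLDB to review a complete history of all the changes that have been made to your application data. Although recorded entries cannot be altered, you can view the successive versions of the data to see how it evolved.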
  2. Accelerators: speed up database reads, especially for frequently requested data.

    • Amazon ElastiCache

      • a service that adds caching layers on top of your databases to help improve the read times of common requests.
      • It supports two types of data stores: Redis and Memcached.
    • Amazon DynamoDB Accelerator

      • an in-memory cache for DynamoDB.
      • It helps improve response times from single-digit milliseconds to microseconds.
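  3. Summary: choose the best tool for the specific workload rather than forcing your data to fit the database. Depending on business needs, you can pick from different database types and accelerators.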
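
To see why a caching layer such as ElastiCache or DAX speeds up common requests, here is a minimal in-process sketch of one common pattern, cache-aside. The notes do not say which caching pattern is used, so that choice is an assumption; `CachedReader` is a hypothetical class, and a dictionary stands in for both the cache and the database.

```python
class CachedReader:
    """Serve repeated reads from memory and fall back to the database on a miss."""

    def __init__(self, database):
        self.database = database   # stand-in for the slower backing store
        self.cache = {}            # stand-in for the in-memory cache
        self.database_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]         # cache hit: no database access
        self.database_reads += 1           # cache miss: read from the database
        value = self.database[key]
        self.cache[key] = value            # remember the value for next time
        return value


reader = CachedReader({"user:1": "Mary Major"})
reader.get("user:1")
reader.get("user:1")
print(reader.database_reads)
```

Only the first read reaches the database; the repeat is answered from memory, which is where the improvement in read times comes from.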


There is even Amazon Managed Blockchain: a distributed ledger system that lets multiple parties run transactions and share data without a central authority.