Learn the Kinesis, Glue, Athena, EMR, DataSync, Lake Formation, and transfer-pattern choices AWS tests for SAA-C03 ingestion and transformation scenarios.
This newer SAA-C03 task group is about building data paths that scale cleanly from ingestion through transformation and analytics. The exam is not testing you as a dedicated data engineer. It is testing whether you can choose the right AWS-managed path for transfer, streaming, transformation, and analysis requirements.
The exam guide points to analytics and visualization services, ingestion patterns, transfer services such as DataSync and Storage Gateway, transformation services such as Glue, secure access to ingestion points, streaming services such as Kinesis, and format transformation choices.
| Requirement | Strongest first fit | Why |
|---|---|---|
| Real-time streaming ingestion | Kinesis | Purpose-built for streaming pipelines |
| Managed ETL and cataloging | Glue | Strong fit for transformation workflows and data catalog integration |
| Query data in place in S3 | Athena | Fast analytical query pattern without managing clusters |
| Large-scale data processing cluster | EMR | Better fit for heavier distributed processing needs |
| Online or batch transfer into AWS storage | DataSync or Storage Gateway | Stronger than custom copy scripts for transfer patterns |

The same choices can also be read by pipeline stage:

| Stage | Typical service fit | What the exam is really asking |
|---|---|---|
| Transfer into AWS | DataSync or Storage Gateway | How the data gets there reliably |
| Real-time streaming | Kinesis | How events flow continuously |
| Transformation and cataloging | Glue | How raw data becomes usable |
| Query and visualization | Athena, QuickSight, or another analytics layer | How users and systems consume the result |
```mermaid
flowchart LR
    I["Ingestion"] --> T["Transform"]
    T --> L["Lake or storage layer"]
    L --> A["Analytics or visualization"]
```
The exam often asks which stage is the real decision point. If the problem is transfer, Glue is usually not the answer. If the problem is transformation, DataSync is usually not the answer.
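When the deciding stage is transfer, the expected answer is usually a managed transfer task rather than an ETL job. The following is a minimal CloudFormation sketch of a scheduled DataSync task. It assumes the source and destination DataSync locations already exist and are passed in as parameters; the parameter names, task name, and schedule are illustrative assumptions, not values from the exam guide.

```yaml
Parameters:
  # Hypothetical parameters: ARNs of DataSync locations created elsewhere
  SourceLocationArn:
    Type: String
  DestinationLocationArn:
    Type: String

Resources:
  NightlyTransferTask:
    Type: AWS::DataSync::Task
    Properties:
      Name: nightly-onprem-to-s3          # illustrative name
      SourceLocationArn: !Ref SourceLocationArn
      DestinationLocationArn: !Ref DestinationLocationArn
      Schedule:
        ScheduleExpression: cron(0 2 * * ? *)   # assumed schedule: daily at 02:00 UTC
      Options:
        VerifyMode: ONLY_FILES_TRANSFERRED      # verify only the files copied in this run
```

Nothing in this template transforms or catalogs data, which is the point: transfer and transformation are separate decisions.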
```yaml
Resources:
  AppEventsStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: app-events
      ShardCount: 2
      RetentionPeriodHours: 24
```
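What to notice: `ShardCount` is the capacity decision. Each provisioned shard accepts up to 1 MB/s or 1,000 records per second of writes and serves up to 2 MB/s of reads, so two shards roughly double both limits. `RetentionPeriodHours: 24` keeps records available for replay for one day, which is the Kinesis Data Streams default.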
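The template above fixes capacity at two shards. If traffic is hard to predict, one alternative is on-demand capacity mode, where no shard count is declared. This sketch also turns on server-side encryption, since the exam guide lists secure access to ingestion points. It assumes on-demand pricing is acceptable for the workload; the resource and stream names are hypothetical.

```yaml
Resources:
  AppEventsOnDemandStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: app-events-on-demand        # hypothetical stream name
      RetentionPeriodHours: 24
      StreamModeDetails:
        StreamMode: ON_DEMAND           # capacity scales with traffic; ShardCount is omitted
      StreamEncryption:
        EncryptionType: KMS
        KeyId: alias/aws/kinesis        # AWS managed key for Kinesis
```

Provisioned mode tends to fit steady, well-understood throughput. On-demand mode trades that predictability for less capacity planning.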
| Symptom | Strongest first check | Why |
|---|---|---|
| Data arrives slowly from on-premises systems | Transfer method and network path | This is usually a transfer problem before it is a Glue or Athena problem |
| Data is present in S3 but analysts cannot query it effectively | Catalog and format layer | Query services work best when the data is shaped and described correctly |
| The team is managing clusters for simple transformation work | EMR versus managed ETL fit | The exam often prefers managed transformation when cluster control is unnecessary |
| Streaming consumers fall behind | Stream throughput and consumer design | This is an ingestion-capacity and consumer-scaling question, not a pure storage question |
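For the second symptom in the table, data present in S3 that analysts cannot query effectively, the usual fix is at the catalog layer. The sketch below shows one possible arrangement: a Glue database, a crawler that registers table definitions for an S3 prefix, and an Athena workgroup with a query result location. The bucket names, database name, workgroup name, and the `CrawlerRoleArn` parameter are all assumptions made for illustration.

```yaml
Parameters:
  # Hypothetical parameter: an existing IAM role the crawler can assume
  CrawlerRoleArn:
    Type: String

Resources:
  AnalyticsDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: app_events_db                              # illustrative database name

  RawEventsCrawler:
    Type: AWS::Glue::Crawler
    DependsOn: AnalyticsDatabase
    Properties:
      Name: raw-events-crawler
      Role: !Ref CrawlerRoleArn
      DatabaseName: app_events_db
      Targets:
        S3Targets:
          - Path: s3://example-raw-events-bucket/events/   # assumed bucket and prefix

  AnalystWorkGroup:
    Type: AWS::Athena::WorkGroup
    Properties:
      Name: analysts
      WorkGroupConfiguration:
        ResultConfiguration:
          OutputLocation: s3://example-athena-results-bucket/   # assumed results bucket
```

After the crawler runs, Athena can query the cataloged tables in place, with no cluster to manage.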
Move next into 4. Cost-Optimized Architectures to study how the same storage, compute, database, and network choices change when cost becomes the deciding constraint.