Amazon Athena
A serverless interactive query service that lets you run standard SQL queries directly on data stored in S3.
Amazon S3 Select
A feature of S3 that lets you retrieve only a subset of data from a single object using simple SQL expressions.
Key Differences
| Feature | Athena | S3 Select |
|---|---|---|
| Scope | Query across multiple files | Query within a single object |
| SQL Support | ANSI SQL (joins, subqueries, window functions) | Limited SQL (simple SELECT, WHERE) |
| Use Case | Analytics, reporting, BI | Efficient object-level filtering |
| Performance | Scans the data each query references (reduced by partitioning and columnar formats) | Returns only the selected data from the object |
| Pricing | Per TB scanned | Per GB scanned + per GB returned |
| Schema | Requires table definition (AWS Glue Data Catalog) | No external catalog needed |
| Joins | Yes | No |
| Aggregations | Yes | Very limited (COUNT, SUM, AVG, MIN, MAX; no GROUP BY) |
When to Use Each
Use Athena when:
- You need to query large datasets across many files
- You need joins, aggregations, grouping
- You're connecting BI tools (e.g., QuickSight)
- You want serverless analytics without managing infrastructure
Example:
SELECT customer_id, SUM(amount) AS total_amount
FROM transactions
GROUP BY customer_id;
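From application code, Athena queries run asynchronously: you start the query, then poll until it reaches a final state. The sketch below shows one way to do that in Python with a boto3-style Athena client. The function name `run_athena_query`, the database `sales_db`, and the results bucket are placeholders for illustration, not something from this post's setup.

```python
import time

# The aggregation query from the example above, with a column alias added.
QUERY = """
SELECT customer_id, SUM(amount) AS total_amount
FROM transactions
GROUP BY customer_id
"""


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for it to finish, and return (query_id, final_state)."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   athena = boto3.client("athena")
#   query_id, state = run_athena_query(
#       athena, QUERY, "sales_db", "s3://example-athena-results/")
```

The results are written to the S3 output location, which is why Athena needs one even for small queries.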
Use S3 Select when:
- You need to fetch a small portion of a single large file
- You want to reduce network transfer
- You’re building an application that reads filtered object data
- You need low-latency object-level filtering
Example:
SELECT * FROM s3object s WHERE s.status = 'active'
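In an application, the same expression is sent with the S3 `SelectObjectContent` call, and the matching rows come back as a stream of events. Here is a minimal Python sketch using a boto3-style S3 client. It assumes the object is a CSV file whose header row includes a `status` column; the function name `select_active_rows` and the bucket and key in the usage comment are made up for illustration.

```python
def select_active_rows(client, bucket, key):
    """Return the rows matching the filter as decoded text."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression="SELECT * FROM s3object s WHERE s.status = 'active'",
        # Assumption: a CSV object whose first line holds the column names.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    chunks = []
    # The payload is an event stream; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])

    # Join the raw bytes before decoding so a multi-byte character that is
    # split across two events is not corrupted.
    return b"".join(chunks).decode("utf-8")


# Usage (needs boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   print(select_active_rows(s3, "example-bucket", "exports/users.csv"))
```

Only the filtered rows cross the network, which is the point of using S3 Select instead of downloading the whole object.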
Cost Consideration
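- Athena bills by data scanned, so queries over large unpartitioned datasets can become expensive. Partitioning and columnar formats such as Parquet reduce the amount scanned.
- S3 Select is often cheaper than downloading a whole object when you only need small pieces of it, since only the matching data is returned. The scan itself is still billed.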
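To make the pricing difference concrete, here is a rough back-of-the-envelope calculator in Python. The prices are assumptions based on commonly quoted list prices, not figures from this post, and they vary by region and change over time, so treat the numbers as placeholders and check the current AWS pricing pages.

```python
# Illustrative prices only (assumptions; check current AWS pricing).
ATHENA_USD_PER_TB_SCANNED = 5.00
S3_SELECT_USD_PER_GB_SCANNED = 0.002
S3_SELECT_USD_PER_GB_RETURNED = 0.0007


def athena_query_cost(tb_scanned):
    """Estimated cost of one Athena query that scans `tb_scanned` terabytes."""
    return tb_scanned * ATHENA_USD_PER_TB_SCANNED


def s3_select_cost(gb_scanned, gb_returned):
    """Estimated cost of one S3 Select request (scan charge plus return charge)."""
    return (gb_scanned * S3_SELECT_USD_PER_GB_SCANNED
            + gb_returned * S3_SELECT_USD_PER_GB_RETURNED)


# Same Athena query over a whole 2 TB table vs. a single 0.02 TB partition.
unpartitioned = athena_query_cost(2.0)
partitioned = athena_query_cost(0.02)

# S3 Select scanning a 10 GB object and returning 0.1 GB of matching rows.
object_filter = s3_select_cost(10.0, 0.1)

print(f"Athena, unpartitioned: ${unpartitioned:.2f}")
print(f"Athena, one partition: ${partitioned:.2f}")
print(f"S3 Select, one object: ${object_filter:.5f}")
```

The calculator ignores request charges, data transfer, and any per-query minimums, so it only shows the shape of the trade-off, not an exact bill.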
Simple Mental Model
- Athena = Data warehouse-style querying over S3
- S3 Select = Smart “grep” inside one S3 file