Tuesday, February 24, 2026

Athena Vs S3 Select

Amazon Athena

A serverless interactive query service that lets you run full SQL queries directly on data stored in S3.

Amazon S3 Select

A feature of S3 that lets you retrieve only a subset of data from a single object using simple SQL expressions.


Key Differences

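Feature       | Athena                                                                                         | S3 Select
Scope         | Queries across many files (a whole table)                                                      | Queries within a single object
SQL support   | Standard ANSI SQL (Trino/Presto-based engine)                                                  | Limited SQL (SELECT, FROM, WHERE, LIMIT, basic functions)
Use case      | Analytics, reporting, BI                                                                       | Efficient object-level filtering
Performance   | Reads every file in the table's location; partitioning and columnar formats (Parquet, ORC) reduce the data scanned | Scans the object server-side and returns only the matching data
Pricing       | Per TB of data scanned                                                                         | Per GB scanned + per GB returned (plus request charges)
Schema        | Requires a table definition (AWS Glue Data Catalog)                                            | No external catalog needed
Joins         | Yes                                                                                            | No
Aggregations  | Yes, including GROUP BY                                                                        | Basic aggregates only (COUNT, SUM, AVG, MIN, MAX); no GROUP BY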
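The Performance row comes down to how much data Athena has to read. As a rough sketch (not Athena's actual query planner), the hypothetical helpers below model partition pruning over Hive-style keys such as `dt=2026-02-24/`. Without a partition filter, every object under the table counts toward bytes scanned. With one, only the matching prefix does. The key layout, sizes and function names are invented for illustration, and the model ignores the column skipping Athena gets from Parquet or ORC.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class S3Object:
    key: str   # hypothetical key, e.g. "transactions/dt=2026-02-24/part-0000.csv"
    size: int  # object size in bytes


def partition_value(key: str, column: str) -> Optional[str]:
    """Return the value of a Hive-style `column=value` path segment, or None if absent."""
    prefix = column + "="
    for segment in key.split("/"):
        if segment.startswith(prefix):
            return segment[len(prefix):]
    return None


def bytes_scanned(objects: List[S3Object], column: Optional[str] = None,
                  value: Optional[str] = None) -> int:
    """Sum the sizes of the objects a query would have to read.

    No partition filter: every object is read (full scan).
    Filter on a partition column: only objects in the matching partition are read.
    """
    if column is None:
        return sum(o.size for o in objects)
    return sum(o.size for o in objects if partition_value(o.key, column) == value)


if __name__ == "__main__":
    table = [
        S3Object("transactions/dt=2026-02-23/part-0000.csv", 40_000_000),
        S3Object("transactions/dt=2026-02-24/part-0000.csv", 50_000_000),
        S3Object("transactions/dt=2026-02-24/part-0001.csv", 10_000_000),
    ]
    print(bytes_scanned(table))                      # 100000000 (full scan)
    print(bytes_scanned(table, "dt", "2026-02-24"))  # 60000000 (one partition)
```

In Athena itself, the equivalent is a table declared with `PARTITIONED BY (dt string)` and a query that filters with `WHERE dt = '2026-02-24'`.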

When to Use Each

Use Athena when:
  • You need to query large datasets across many files
  • You need joins, aggregations, or grouping
  • You're connecting BI tools (e.g., QuickSight)
  • You want serverless analytics without managing infrastructure

Example:

SELECT customer_id, SUM(amount)
FROM transactions
GROUP BY customer_id;
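One way to run the query above from an application is the AWS SDK for Python (boto3). This is a minimal sketch that assumes a Glue database named `sales_db` and a results location `s3://my-athena-query-results/` (both hypothetical). It starts the query, polls until it finishes, then reads the result rows. The request-building and row-parsing helpers are kept separate from the network calls so they can be checked without AWS credentials.

```python
import time
from typing import Any, Dict, List, Optional

# Hypothetical names: substitute your own Glue database and query-results location.
DATABASE = "sales_db"
OUTPUT_LOCATION = "s3://my-athena-query-results/"

QUERY = """
SELECT customer_id, SUM(amount) AS total_amount
FROM transactions
GROUP BY customer_id
"""


def start_query_kwargs(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build the arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def rows_to_dicts(result_set: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Convert an Athena ResultSet into dicts.

    For SELECT queries the first row holds the column names; cells without a
    VarCharValue key (as happens for NULLs) are mapped to None.
    """
    rows = [[cell.get("VarCharValue") for cell in row["Data"]] for row in result_set["Rows"]]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, values)) for values in body]


def run_query(sql: str) -> List[Dict[str, Optional[str]]]:
    import boto3  # imported here so the helpers above work without the AWS SDK installed

    athena = boto3.client("athena")
    started = athena.start_query_execution(**start_query_kwargs(sql, DATABASE, OUTPUT_LOCATION))
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    print("Bytes scanned:", execution.get("Statistics", {}).get("DataScannedInBytes"))
    # Only the first page of results is read; larger results need NextToken pagination.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return rows_to_dicts(results["ResultSet"])
```

`run_query(QUERY)` returns a list of dicts keyed by `customer_id` and `total_amount`. Athena returns every value as a string, so numeric columns need converting.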

Use S3 Select when:
  • You need to fetch a small portion of a single large file
  • You want to reduce network transfer
  • You’re building an application that reads filtered object data
  • You need low-latency object-level filtering

Example:

SELECT * FROM s3object s WHERE s.status = 'active'
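The same filter can be issued through boto3's `select_object_content`. This is a minimal sketch that assumes a hypothetical CSV object `s3://my-data-bucket/exports/customers.csv` with a header row containing a `status` column. Setting `FileHeaderInfo` to `USE` is what lets the expression refer to `s.status` by name. Results arrive as an event stream, so the records are reassembled from `Records` events and the byte counts are read from the `Stats` event.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical object: a CSV file with a header row that includes a "status" column.
BUCKET = "my-data-bucket"
KEY = "exports/customers.csv"
EXPRESSION = "SELECT * FROM s3object s WHERE s.status = 'active'"


def select_kwargs(bucket: str, key: str, expression: str) -> Dict[str, Any]:
    """Build the arguments for S3's SelectObjectContent API on an uncompressed CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # FileHeaderInfo=USE lets the expression refer to columns by header name (s.status).
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
        # Return matches as newline-delimited JSON (one JSON object per record).
        "OutputSerialization": {"JSON": {}},
    }


def collect_events(events: Iterable[Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Reassemble Records payloads from the event stream and capture the Stats details."""
    chunks: List[bytes] = []
    stats: Dict[str, Any] = {}
    for event in events:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            stats = event["Stats"]["Details"]  # BytesScanned, BytesProcessed, BytesReturned
    return b"".join(chunks).decode("utf-8"), stats


def select_active_rows() -> str:
    import boto3  # imported here so the helpers above work without the AWS SDK installed

    s3 = boto3.client("s3")
    response = s3.select_object_content(**select_kwargs(BUCKET, KEY, EXPRESSION))
    records, stats = collect_events(response["Payload"])
    print("Scanned:", stats.get("BytesScanned"), "bytes; returned:", stats.get("BytesReturned"), "bytes")
    return records
```

The code joins the raw byte chunks before decoding. This avoids splitting a multi-byte UTF-8 character that happens to straddle two `Records` events.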

Cost Consideration

  • Athena bills by data scanned, so queries over large unpartitioned datasets (especially in row formats such as CSV or JSON) can become expensive.
  • S3 Select is often cheaper when you only need a small slice of one large object.
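To make the trade-off concrete, here is back-of-the-envelope arithmetic for pulling about 50 MiB of matching rows out of a single 10 GiB CSV object. The rates are assumptions for illustration: Athena at $5 per TB scanned with a 10 MB per-query minimum, and S3 Select at $0.002 per GB scanned and $0.0007 per GB returned. Check the AWS pricing page for your region. The sketch treats 1 TB as 1024**4 bytes. It also ignores request fees and Athena's rounding to the nearest megabyte.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


def athena_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0,
                      min_bytes: int = 10 * MIB) -> float:
    """Approximate Athena SQL query cost: billed on data scanned, with a per-query minimum.

    Rate and minimum are assumptions; 1 TB is treated as 1024**4 bytes.
    """
    billed = max(bytes_scanned, min_bytes)
    return billed / TIB * usd_per_tb


def s3_select_cost(bytes_scanned: int, bytes_returned: int,
                   usd_per_gb_scanned: float, usd_per_gb_returned: float) -> float:
    """Approximate S3 Select data charges; request fees are ignored."""
    return (bytes_scanned / GIB * usd_per_gb_scanned
            + bytes_returned / GIB * usd_per_gb_returned)


if __name__ == "__main__":
    object_size = 10 * GIB  # one large CSV object (row format, so Athena reads all of it)
    matched = 50 * MIB      # size of the rows that pass the WHERE filter
    athena = athena_query_cost(object_size)
    select = s3_select_cost(object_size, matched, 0.002, 0.0007)  # assumed rates
    print(f"Athena:    ${athena:.4f}")  # Athena:    $0.0488
    print(f"S3 Select: ${select:.4f}")  # S3 Select: $0.0200
```

Both formulas are linear in bytes scanned, so the absolute difference grows with data volume. Partitioning and columnar formats lower Athena's bill by shrinking the bytes scanned, not the rate.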


Simple Mental Model

  • Athena = Data warehouse-style querying over S3
  • S3 Select = Smart “grep” inside one S3 file
