How intelligent is S3 Intelligent-Tiering?

Amazon S3 Intelligent-Tiering is the storage class you point your data at when you don’t want to think about lifecycle policies. S3 moves objects to cheaper tiers as access drops off, and there are no retrieval fees on the lower tiers.

But how intelligent is it, really? About as intelligent as it can be, given what S3 actually knows — which isn’t much.

An object stored in Intelligent-Tiering starts in the Frequent Access tier; if it hasn’t been accessed for 30 days it is automatically moved to the Infrequent Access tier, and after 90 days with no access it is moved to the Archive Instant Access tier. Two further tiers are opt-in: Archive Access (90+ days without access) and Deep Archive Access (180+ days). Objects in the opt-in archive tiers need to be restored before they can be read.

If an object is later accessed, S3 automatically promotes it back into the Frequent Access tier, and the cycle starts again.
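
For reference, the two opt-in tiers are enabled with a bucket-level configuration; objects still have to be written with the INTELLIGENT_TIERING storage class to participate at all. A minimal sketch with boto3, where the bucket name and configuration ID are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # Opt this bucket's Intelligent-Tiering objects into the deeper archive tiers.
    # Without a configuration like this, objects only move between the Frequent,
    # Infrequent, and Archive Instant Access tiers.
    s3.put_bucket_intelligent_tiering_configuration(
        Bucket="my-bucket",                 # placeholder
        Id="archive-after-inactivity",      # arbitrary configuration name
        IntelligentTieringConfiguration={
            "Id": "archive-after-inactivity",
            "Status": "Enabled",
            "Tierings": [
                {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
                {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
            ],
        },
    )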

Object size matters

Intelligent-Tiering charges a small per-object automation fee on top of the storage rate, currently $2.50 per million objects per month in us-east-1. Whether that’s expensive really depends on the average object size — for small objects, it can be significant relative to the storage fee itself. (The fee only applies to objects of 128 KB or larger; smaller objects aren’t monitored or tiered, and stay in Frequent Access.)

To illustrate, suppose you keep a petabyte of data in the Archive Instant Access tier. The storage cost for that would be approximately $4,000/month. If the average object size is 1 MB and you have 1 billion objects, then the Intelligent-Tiering automation charge is $2,500/month — that’s a hefty 62% increase on top of the raw storage. But if the average object size is 100 MB and you have just 10 million objects, then the automation charge is just $25/month — less than 1% overhead.
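
The arithmetic is easy to reproduce. A quick sketch using the us-east-1 prices quoted above (decimal units; the prices will drift over time):

    # Back-of-the-envelope check of the numbers above.
    ARCHIVE_INSTANT_PER_GB_MONTH = 0.004    # $ per GB-month, Archive Instant Access
    AUTOMATION_PER_MILLION_OBJECTS = 2.50   # $ per million monitored objects per month

    def monthly_cost(total_gb, avg_object_mb):
        objects = total_gb * 1_000 / avg_object_mb
        storage = total_gb * ARCHIVE_INSTANT_PER_GB_MONTH
        automation = (objects / 1_000_000) * AUTOMATION_PER_MILLION_OBJECTS
        return storage, automation, automation / storage

    print(monthly_cost(1_000_000, 1))    # 1 PB, 1 MB objects:   (4000.0, 2500.0, 0.625)
    print(monthly_cost(1_000_000, 100))  # 1 PB, 100 MB objects: (4000.0, 25.0, 0.00625)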

Beyond automation, Intelligent-Tiering avoids two things you’d run into with the standalone storage classes: retrieval fees on the lower tiers, and a minimum storage duration.

So Intelligent-Tiering isn’t just a storage-class choice. It’s also a bet that per-object monitoring is worth paying for.

Last access is doing all the work

Why not just do this yourself with a lifecycle rule, or an S3 Batch Operations job over S3 Inventory?

Last access is the only signal Intelligent-Tiering uses — and S3 doesn’t expose it natively. Intelligent-Tiering gets to use it internally; to reproduce the same behavior yourself, you’d have to analyze access data (either S3 Access Logs or CloudTrail Data Events) and aggregate the last access time per object — which is expensive to do at scale.
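
To make "aggregate the last access time per object" concrete, here is a rough sketch over S3 server access logs. It assumes the logs have already been collected locally and parses them naively (real log lines contain quoted fields with spaces, and at any interesting scale you would push this into Athena or a batch job rather than a script); the file name and field handling are illustrative only:

    import datetime

    READ_OPS = {"REST.GET.OBJECT", "REST.HEAD.OBJECT"}
    last_read = {}  # object key -> most recent read timestamp

    with open("access.log") as f:   # placeholder: downloaded server access logs
        for line in f:
            parts = line.split()
            # Per the documented format: owner, bucket, [time, +tz], ip,
            # requester, request-id, operation, key, ...
            operation, key = parts[7], parts[8]
            if operation not in READ_OPS:
                continue
            ts = datetime.datetime.strptime(parts[2].lstrip("["), "%d/%b/%Y:%H:%M:%S")
            if key not in last_read or ts > last_read[key]:
                last_read[key] = ts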

When access history is the right signal

Take a video editor working on a small project. The raw materials might be 5 TB of data. After doing the editing work locally, they upload the files to S3 for archival. If they use Intelligent-Tiering, those files will take 180 days to finally land in the Deep Archive Access tier. But if the editor knows the project is done and is unlikely to need those files anytime soon — or has a local copy and treats S3 only as a backup — then storing them in Intelligent-Tiering is wasteful. You end up overpaying for many months until the files land in the optimal storage class. The right signal here isn’t “has this object been read recently?” — it’s “the project is complete.”
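
Mechanically, acting on that knowledge is a single parameter at upload time. A hypothetical upload straight to Glacier Deep Archive, where the file, bucket, and key are placeholders:

    import boto3

    s3 = boto3.client("s3")

    # If the project is known to be done, the files can skip the 180-day
    # journey and go straight to the cheapest storage class.
    s3.upload_file(
        "raw/scene-042.braw",                          # hypothetical local file
        "my-archive-bucket",                           # placeholder bucket
        "projects/spring-promo/raw/scene-042.braw",    # placeholder key
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
    )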

Or a publication with an archive of newspaper pages. They have the raw, lossless files stored in S3 Glacier Deep Archive, and smaller derivatives — web-resolution copies of each page — stored in online tiers that power their website. They have a modernization project and want to run new OCR software on their entire archive, which requires pulling the lossless files. But each file needs to be processed just once. If those files were in Intelligent-Tiering instead, they’d remain in a higher tier for 180 days, which is wasteful.
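
Pulling a lossless master back out of Glacier Deep Archive for that one-time pass is a restore request followed, hours later, by an ordinary GET. A hypothetical restore with boto3 (bucket, key, and retention period are placeholders; Bulk is the slowest and cheapest retrieval tier):

    import boto3

    s3 = boto3.client("s3")

    # One-off restore so the OCR job can read the file; the restored copy
    # expires after the requested number of days.
    s3.restore_object(
        Bucket="newspaper-masters",               # placeholder
        Key="pages/1923/04/17/page-003.tiff",     # placeholder
        RestoreRequest={
            "Days": 7,
            "GlacierJobParameters": {"Tier": "Bulk"},
        },
    )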

In both cases, the context that matters lives outside S3. Intelligent-Tiering only has access patterns to work with, so that’s what it uses.

On the other hand, the online derivatives in the newspaper example are probably a great fit for Intelligent-Tiering. Most images in such a collection are accessed rarely if ever, and the ones that do get accessed tend to get accessed again. Here, the limitation isn’t that we could provide better context but haven’t — it’s that there genuinely isn’t any to provide. When prediction is impossible, an access-based heuristic is the best you can do.

Knowing is not enough

People still reach for Intelligent-Tiering when the access pattern is perfectly predictable. AWS’s own framing calls it “the ideal storage class for data with unknown, changing, or unpredictable access patterns” — meaning the predictable case isn’t what it’s for. People use it anyway, because it’s easy: set it once, stop thinking about it, and assume your storage costs are near-optimal.

Does doing the right thing have to be that much harder? If you know how an object is going to be used, why should it be hard to act on that?

The problem is often not a matter of knowledge — teams working with data usually know how it’s supposed to be accessed. The problem is that S3 gives them nowhere useful to put that knowledge. This ties back to my previous post on the lack of a dataset entity and the inability to specify metadata at that level. Take the video editor. The files they’re uploading are part of a specific project, probably stored under a project prefix in an S3 bucket. If they could attach metadata to that prefix — “this project is done, archive everything under it” — Intelligent-Tiering would be irrelevant. But S3 doesn’t support this.

For many use cases, teams use S3 directly with no system on top. In media, some do use a Media Asset Management system — one that might know when a project is complete and trigger archive transitions. But even there, the lifecycle integration is often optional or unused, and the MAM only covers the assets it manages. The same gap shows up in a data warehouse, in ML training data, in archived logs — and in every pile of backups and exports nobody quite remembers anymore.

The lazy choice is sometimes the right one

If “do the right thing” means a project that never gets prioritized, letting Intelligent-Tiering eat the cost is often better than nothing.

But the lack of context tends to bite outside of tiering too. Six months from now, when someone needs to find this data, they won’t know where to look. A year from now, when you want to clean up, you won’t know what’s safe to delete. Storage cost problems in S3 are often deletion problems in disguise.

Once you’ve decided you want context for those reasons — discoverability, retention, deletion — you’ve already done the important work. Using it for tiering is almost a side effect.

The heuristic is not the problem

The hard part isn’t making the heuristic smarter. It’s that the systems that understand the context aren’t connected to the storage, and the systems that understand the storage aren’t connected to the business context.

Pointing an AI agent at the bucket wouldn’t fix this on its own. The agent only sees what S3 exposes — keys, sizes, last-modified timestamps, access logs. You could of course wire it up to a project tracker, a pipeline, or any other system where context lives — but at that point you’re building a bespoke integration per team and per workflow. The agent’s intelligence isn’t the gap. The gap is that storage has no place where context can live.

Until something sits between them, storage decisions keep getting made from the handful of signals S3 has — instead of from what people actually know about their data. Tiering is one of those decisions. There are plenty of others.


