When CloudFront Works but Your Domain Doesn't

I recently migrated this blog from one CloudFront distribution to another. Nothing exotic: a static site fronted by CloudFront, Route 53 for DNS, and Terraform managing everything. The goal was simple: low-cost static hosting with global delivery. I had the apex, www, and blog hostnames all set up with redirects from apex to blog.

The new CloudFront distribution worked perfectly when accessed directly via its *.cloudfront.net domain. But accessing the site through the custom domain was inconsistent. Some requests worked, some didn't, and some machines behaved differently from others.

That's usually the point where people say "DNS propagation" and wait. This wasn't that.

The Symptom

From my laptop:

https://dxxxxxxxxx.cloudfront.net: works
https://www.edwardsmatt.com: sometimes works, sometimes doesn't
dig www.edwardsmatt.com A: returns nothing
dig www.edwardsmatt.com AAAA: returns CloudFront IPv6 addresses

From public resolvers:

dig @1.1.1.1 www.edwardsmatt.com A +short
dig @8.8.8.8 www.edwardsmatt.com A +short

Both returned valid IPv4 answers. Route 53 showed the records exactly as expected: A and AAAA records pointing to CloudFront, both targeting the same distribution. Terraform agreed.

At this point the system looked correct, but the behaviour clearly wasn't.
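The local-versus-public comparison above can be made repeatable with a short script. What follows is a minimal sketch rather than the commands actually used here (those were plain dig): it hand-builds a bare-bones DNS query with Python's standard library and reports only the response code and answer count. The function names (build_query, parse_header, ask, compare) are invented for illustration, and the sketch assumes UDP to port 53 is reachable and ignores truncation, retries, and timeouts.

```python
import socket
import struct

QTYPES = {"A": 1, "AAAA": 28}  # record type codes from RFC 1035 / RFC 3596


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet containing a single question."""
    # Header: id, flags (recursion desired), QDCOUNT=1, then three zero counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # The name ends with a zero-length label, followed by QTYPE and QCLASS=IN.
    question = labels + b"\x00" + struct.pack("!HH", QTYPES[qtype], 1)
    return header + question


def parse_header(response: bytes) -> tuple[int, int]:
    """Return (rcode, answer_count) from the 12-byte DNS response header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount


def ask(resolver: str, name: str, qtype: str, timeout: float = 2.0) -> tuple[int, int]:
    """Send one UDP query to an IPv4 resolver and summarise its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (resolver, 53))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)


def compare(resolvers: list[str], name: str) -> dict[tuple[str, str], tuple[int, int]]:
    """Ask every resolver for A and AAAA; maps (resolver, qtype) to (rcode, answers)."""
    return {(r, q): ask(r, name, q) for r in resolvers for q in QTYPES}
```

Calling compare with 1.1.1.1, 8.8.8.8, and whatever address your machine actually uses puts the three views side by side. An rcode of 0 with zero answers for A is the "returns nothing" case: the name exists, but that resolver believes it has no IPv4 record.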

Initial Hypotheses (All Wrong)

"It's ACM:" A common culprit, but HTTPS was working fine on the CloudFront domain. Certificates were issued, valid, and attached.

"It's CloudFront propagation:" CloudFront changes propagate quickly, and again, the distribution itself was fine.

"It's Route 53 TTLs:" Alias records don't behave like normal TTL-bound records, and public resolvers were already correct.

"Terraform must be out of sync:" Possible, but state matched reality and no drift was reported.

Each explanation was plausible, but none actually explained why IPv6 worked consistently while IPv4 worked from some resolvers but not others.

The Actual Problem

Two separate issues overlapped.

1. IPv4 and IPv6 had briefly diverged

During the migration, there was a window where the A record pointed at one CloudFront distribution and the AAAA record pointed at another. I corrected that fairly quickly, but not before some resolvers had cached what they saw during that window.
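This kind of drift is easy to check for mechanically. The sketch below assumes record sets shaped like the dictionaries Route 53's API returns (Name, Type, and an AliasTarget with a DNSName) and simple routing with one record per name and type. The function name find_alias_drift is hypothetical, and the hostnames in the usage note are placeholders.

```python
from collections import defaultdict


def find_alias_drift(record_sets: list[dict]) -> dict[str, dict[str, str]]:
    """Return hostnames whose A and AAAA alias targets are missing or disagree."""
    targets: dict[str, dict[str, str]] = defaultdict(dict)
    for record in record_sets:
        alias = record.get("AliasTarget")
        if alias and record.get("Type") in ("A", "AAAA"):
            # Normalise case and the trailing dot so equal targets compare equal.
            target = alias["DNSName"].rstrip(".").lower()
            targets[record["Name"]][record["Type"]] = target
    # A missing family shows up as None on one side, so it is flagged as well.
    return {
        name: dict(by_type)
        for name, by_type in targets.items()
        if by_type.get("A") != by_type.get("AAAA")
    }
```

Fed the ResourceRecordSets list from boto3's list_resource_record_sets, an empty result means every aliased hostname has matching A and AAAA targets. A name with A pointing at d1111.cloudfront.net and AAAA at d2222.cloudfront.net would be reported with both values.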

2. My laptop was using a local DNS forwarder

Running scutil --dns showed my Mac was using nameserver: 172.20.x.x. This turned out to be a DNS forwarder on the local network. That resolver had cached a negative A response (no IPv4 record) and continued serving it.

So, while public DNS resolvers, Route 53, CloudFront, and Terraform were all correct, my laptop was confidently wrong. dig without an explicit resolver was faithfully querying the wrong place.
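Finding out which resolver a Mac is really using can also be scripted. This is a rough sketch that pulls nameserver addresses out of scutil --dns output and flags the private ones, which are the likeliest local forwarders. The helper names are hypothetical, the regular expression is a guess that tolerates both "nameserver:" and "nameserver[0] :" spellings, and any address fed to it in a test or example is a stand-in, not the real resolver from this story.

```python
import ipaddress
import re

# Matches lines such as "  nameserver[0] : 10.0.0.1" or "nameserver: 10.0.0.1".
NAMESERVER_RE = re.compile(r"^\s*nameserver(?:\[\d+\])?\s*:\s*(\S+)", re.MULTILINE)


def parse_nameservers(scutil_output: str) -> list[str]:
    """Return nameserver addresses in order of first appearance, without repeats."""
    seen: dict[str, None] = {}
    for address in NAMESERVER_RE.findall(scutil_output):
        seen.setdefault(address, None)
    return list(seen)


def private_resolvers(scutil_output: str) -> list[str]:
    """Return only the nameservers that sit in private address space."""
    private = []
    for address in parse_nameservers(scutil_output):
        try:
            if ipaddress.ip_address(address).is_private:
                private.append(address)
        except ValueError:
            # Skip anything that is not a plain IP address.
            continue
    return private
```

The input would come from something like subprocess.run(["scutil", "--dns"], capture_output=True, text=True).stdout. Any address this flags is worth querying explicitly with dig @address and comparing against a public resolver.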

Why This Was Confusing

Negative DNS responses are cacheable. Resolvers are allowed to remember "this record does not exist" for a period of time.

That means:

Fixing DNS doesn't necessarily fix your machine.
IPv6 can appear "more reliable" than IPv4.
dig output can be misleading if you don't know which resolver you're asking.
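How long can a resolver hold on to "no such record"? Under RFC 2308, the negative-cache TTL is the smaller of the zone's SOA record TTL and the SOA MINIMUM field. The sketch below does that arithmetic. The function names are made up for illustration, and a forwarder is free to apply its own caps, so treat the result as an estimate rather than a guarantee.

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """Seconds a resolver may cache a negative answer (RFC 2308, section 5)."""
    return min(soa_ttl, soa_minimum)


def seconds_until_expiry(cached_at: float, now: float,
                         soa_ttl: int, soa_minimum: int) -> float:
    """Estimate how much longer a cached negative answer could be served."""
    expires_at = cached_at + negative_cache_ttl(soa_ttl, soa_minimum)
    # Never report a negative remaining time once the entry has expired.
    return max(0.0, expires_at - now)
```

With illustrative values of a 900-second SOA TTL and an 86400-second MINIMUM, negative_cache_ttl returns 900: a resolver that asked at the wrong moment could keep answering "no A record" for up to 15 minutes after the zone was fixed. Check your own zone's SOA record for the real numbers.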

The system wasn't broken. My understanding of what I was observing was.

The Fix

Nothing dramatic:

Ensure A and AAAA records both point to the same CloudFront distribution.
Let Terraform fully own the final state.
Verify behaviour using known public resolvers.
Change local DNS to bypass the stale resolver.

Once the local DNS was updated, everything worked immediately: the apex-to-blog redirect, www, blog, and both IPv4 and IPv6. No waiting required.

Takeaways

A few reminders I'll keep for next time:

Always check which resolver you're querying. dig without @resolver is only as good as your local DNS path.
IPv6 doubles your failure modes. A and AAAA drifting even briefly can create hard-to-see bugs.
"DNS propagation" is often a lazy explanation. If public resolvers are correct, propagation is not the issue.
Infrastructure-as-code doesn't eliminate debugging. It just moves the bugs into different layers.

This wasn't a CloudFront problem or a Terraform problem. It was a debugging problem. And like most debugging problems, the system was telling the truth, just not to me.