Part 1 of 3 in series
Measuring Without Surveillance

Privacy as a Design Constraint

Why privacy belongs in the architecture, not bolted on after.


When I decided to add analytics to this site, the natural starting point was a tracking script. Every service I evaluated worked the same way: add a snippet of JavaScript to the page, and the provider handles collection, aggregation, and dashboards. The appeal is obvious.

But the more I looked at what that involved, the less it made sense here. Not because the data would be uninteresting, but because the default approach requires accepting trade-offs that are difficult to justify when there is no commercial reason to accept them.

The problem with the default

Even privacy-conscious analytics services require trade-offs that are easy to overlook. When I examined what embedding a tracking script actually involves, three things stood out:

  1. Third-party code runs in the visitor's browser. The script has access to the page context. I would be trusting the provider not to change its behaviour in ways I had not anticipated.
  2. Data handling is delegated to an external party. Even services that minimise data collection still process request metadata on infrastructure I do not control. The privacy guarantees would be contractual, not architectural.
  3. Analytics become a dependency on an external service. Reliability and privacy are only as strong as the third party's infrastructure, policies, and business model.

For a personal site with no commercial objectives, none of these trade-offs are necessary. The question that clarified things for me was not "which analytics provider respects privacy the most?" but "what do I actually need to measure, and can I do it without any of this?"

Privacy as a constraint, not a feature

There is a meaningful difference between a system that has privacy features and a system that is constrained by privacy requirements.

A privacy feature is something added to an existing design: cookie consent banners, anonymisation layers, data retention policies bolted onto a pipeline that was built to collect everything. These features are necessary when the underlying architecture assumes surveillance. They are mitigations, not solutions.

A privacy constraint is a boundary condition that shapes the architecture from the start. Once I required that no individual-level data be collected in the browser, kept beyond short-lived server logs, or published, entire categories of design decisions disappeared. There is no consent banner because there is nothing to consent to. There is no data retention policy for user records because no user records exist.

For this site, the constraint set is concrete and testable:

  • No client-side tracking scripts. Nothing executes in the visitor's browser for the purpose of measurement.
  • No cookies. The site sets no cookies of any kind.
  • No individual-level data in any output. Published metrics are aggregated. No IP addresses, full user agents, or per-session data appear in any file served to the public.
  • Minimum count thresholds. Pages or referrers whose count falls below a configurable threshold are excluded from published data. The value can be tuned as traffic grows; as long as it stays above one, no single visit is ever surfaced on its own.
  • Short-lived raw data. Server-side access logs are deleted automatically after thirty days.
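
The aggregation and threshold constraints above might look like this in practice. This is a minimal sketch, not the site's actual code: the record field names (`path`, `referrer`), the function name `aggregate`, and the threshold value of 5 are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical threshold; the post only says the value is configurable.
MIN_COUNT = 5


def aggregate(records: Iterable[Mapping[str, str]], field: str,
              min_count: int = MIN_COUNT) -> Dict[str, int]:
    """Count the values of one field and drop any seen fewer than min_count times.

    Only the named field is read, so client IPs and user agents present in
    a raw record never reach the output.
    """
    counts = Counter(r[field] for r in records if r.get(field))
    return {value: n for value, n in counts.items() if n >= min_count}


# Example log records (field names are assumed, not taken from the site).
records = (
    [{"path": "/posts/privacy", "referrer": "example.org", "client_ip": "203.0.113.7"}] * 6
    + [{"path": "/about", "referrer": "example.net", "client_ip": "203.0.113.9"}] * 2
)

page_counts = aggregate(records, "path")
print(page_counts)  # {'/posts/privacy': 6}
```

The `/about` page has only two hits, so it falls below the threshold and is left out of the published counts entirely rather than being reported with a small number.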

Each constraint exists as code or configuration. The queries never select client_ip or user_agent. An S3 lifecycle policy deletes raw logs on a schedule. The aggregation process exposes no public endpoint, so there is nothing to leave open by accident. Weakening the privacy model would require deliberate infrastructure changes.
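
Two of these guards can be sketched in a few lines. The dictionary below follows the shape of an S3 lifecycle configuration with a thirty-day expiration, and the check scans query text for columns that must never be selected. The rule ID, the `raw-logs/` prefix, the sample query, and the helper name `forbidden_columns_in` are hypothetical; the site's real configuration is not shown in this post.

```python
import json
import re

# Shape of an S3 lifecycle configuration; the ID and prefix are assumed names.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "expire-raw-access-logs",
            "Filter": {"Prefix": "raw-logs/"},
            "Status": "Enabled",
            "Expiration": {"Days": 30},
        }
    ]
}

FORBIDDEN_COLUMNS = ("client_ip", "user_agent")


def forbidden_columns_in(query: str) -> list:
    """Return the forbidden column names that appear as whole words in the query."""
    return [
        column
        for column in FORBIDDEN_COLUMNS
        if re.search(rf"\b{re.escape(column)}\b", query, flags=re.IGNORECASE)
    ]


# Hypothetical aggregation query, used only to exercise the check.
QUERY = (
    "SELECT path, COUNT(*) AS views FROM access_logs "
    "GROUP BY path HAVING COUNT(*) >= 5"
)

print(json.dumps(LIFECYCLE, indent=2))
print(forbidden_columns_in(QUERY))  # []
print(forbidden_columns_in("SELECT client_ip FROM access_logs"))  # ['client_ip']
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument. Running a check like `forbidden_columns_in` over the stored queries in CI is one way to make any weakening of the model show up as a failing test instead of a silent change.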

What I gave up

Treating privacy as a hard constraint meant accepting that certain kinds of measurement were off the table. There is no real-time dashboard. There are no user journeys, session recordings, or conversion funnels. There is no way to know which individual pages a specific visitor read, because that data is never collected.

For a personal blog, this is not a meaningful loss. The questions that matter are simpler: which posts are people reading? Where is traffic coming from? Is the site healthy? These can be answered with aggregated, anonymous server-side data.

What I gained

The system is explainable. I can describe exactly what is measured, how it is measured, and what happens to the data, without caveats or qualifications. There is no "we collect this but we promise not to look at it." There is no data that exists in a form I would be uncomfortable publishing. The stats dashboard for this site is the direct output of this approach.

This clarity is not just an ethical position. It simplifies the architecture, reduces the operational surface area, and eliminates an entire class of compliance concerns. When the constraint is "never collect individual data," the resulting system is smaller, cheaper, and easier to reason about.