Why Apple Review can't be your QA (and what to do about it)
A user pinged me with a screenshot. He had tapped Restore from iCloud (从 iCloud 恢复) in one of my apps, 鱼缸管家 (AquaLog), an aquarium log, nothing exotic, and the screen showed:
Error fetching record <CKRecordID: 0x799aa4600;
recordName=aqualog-doc-_a599d8e4bae73ebf8e6bea10228037c6,
zoneID=_defaultZone:__defaultOwner__> from server: Record not found
Reasonable user reaction: that’s a bug. Reasonable engineer reaction:
that’s not a bug, that’s the correct behavior — pullData() caught the
error and called error.localizedDescription. The system told him the
truth in the most honest way it had.
Both reactions are right. That’s what makes it a bug.
What actually happened
He installed the app. Tapped Restore. Had never synced anything to iCloud
yet — there was nothing in his private database to restore. So
db.record(for: id) came back with CKError(.unknownItem). The catch
block, written some months ago when I cared about a different problem,
did this:
    do {
        let rec = try await db.record(for: docId)
        apply(rec)
    } catch {
        // Every failure, including "no record exists yet", lands here.
        errorMessage = error.localizedDescription
    }
That treats every failure mode the same. Auth failure, network drop,
schema mismatch, throttling, and “you have no backup yet” — all
rendered as the raw CKError description. The first four are errors.
The fifth one isn’t. It’s the empty state of the restore feature,
dressed up in the costume of a runtime exception because that’s how the
CloudKit API decided to encode it.
The fix is one extra catch clause:
    do {
        let rec = try await db.record(for: docId)
        apply(rec)
    } catch let e as CKError where e.code == .unknownItem {
        // Not a failure: nothing has been backed up yet.
        statusMessage = "iCloud has no backup yet — sync first to create one."
    } catch {
        errorMessage = error.localizedDescription
    }
Two lines of typing once you see it. The interesting question is why nobody saw it across three minor releases before the user did. I have 53 iOS apps in production and a fairly aggressive QA setup. Something in the system was supposed to catch this. Nothing did.
The three things I was relying on, and why each one couldn’t see it
I want to be careful here. I was not, in some literal sense, expecting Apple Review to find this bug. Apple Review is a compliance gate, not a QA team, and I have known this since my second rejection. What I was relying on was a stack of overlapping checks that each had a bounded job, and I had never sat down and asked: which class of bug falls outside all of them at once? This was that class.
Apple Review. They approved the build in under a day. App Review staff spend somewhere on the order of 10 minutes per submission and they are hunting two things: guideline violations (private API use, missing IAP disclosure, paid functionality behind a button that says “Sign in with Google”, the usual list) and crashes. A wall of cryptic Chinese-English hex showing up after a button tap is neither. It’s bad UX, and bad UX is not a rejection criterion — if it were, half the App Store would be down. Apple is not your QA team. They are your customs officer. They check that you’ve declared the right things on the form. If your suitcase contains ugly clothing, that’s not their problem.
My static linter. I run a 70-rule pre-submit auditor on every release
(apple-presubmit-audit, MIT, Python, no dependencies you don’t already have). It catches the
mechanical mistakes: plist mismatches, auto-renewing missing from the
subscription CTA, declared-but-unused permission strings, SwiftData
fields without default values that will crash on migration, hardcoded
prices that will be wrong in most markets. It is the single
highest-ROI piece of code in the factory and I would not ship without
it. It also could not have caught this. Grep doesn’t know that
error.localizedDescription is the wrong string to surface here,
because in roughly 80% of the catch blocks in any given codebase it
is the right string. The pattern is correct on average and wrong in
this specific neighborhood. That’s the worst kind of pattern to lint
for, because the naive rule has too many false positives to live with.
My happy-path QA. Before each release a fleet of ten Claude Code subagents walks every screen of every app. They tap things. They check nothing crashes. They run on simulators with prepopulated demo data so the UI has something to render. None of them performed the specific journey of “fresh install on a device that has never synced to iCloud, then immediately tap Restore.” That is the empty state of the restore feature, and “test every empty state” was nowhere on their checklist because nobody had written that checklist down.
Three layers. The first one was never going to catch it. The second one couldn’t, by construction. The third one could have but didn’t, because the test matrix was wrong. The post-mortem isn’t really “Apple Review failed.” Apple Review did its job. The post-mortem is: my QA matrix had a hole shaped exactly like the empty state of every CloudKit / HealthKit / Photos call in 53 apps, and I had to find out from a stranger.
The new rule
I added one to apple-presubmit-audit (v0.5.2). It is opinionated and narrow, by design.
The rule says: if a Swift file imports CloudKit, HealthKit, EventKit,
or Photos, or uses URLSession / FileManager / NSFetchRequest, scan
it for a generic catch that ends with
errorMessage = error.localizedDescription. If the same file does
not contain at least one branch handling the empty case for that
API (CKError.unknownItem, HKError.noData, a 404 statusCode check,
fileDoesNotExist, isEmpty on a fetch result, etc.), flag it as
high severity. The exact regex table lives in audit.py under
EMPTY_PRONE_APIS; the seven APIs covered are the ones I have
personally seen surface raw error strings to users.
$ apple-presubmit-audit --no-asc --project AquaLog
...
⚠️ CUSTOM empty-state-vs-error-state
Files fetching from external sources surface raw
error.localizedDescription to UI without a 'no-data / not-found'
branch.
Files: [('CloudKit', 'AquaLogApp.swift')]
It is a heuristic, not a proof. It will produce false positives when
the empty state is handled in a different file from the catch. It
will produce false negatives when the empty state is handled with
if let instead of catch. Neither bothers me. The bar is “would
this rule have caught the AquaLog bug,” and the answer is yes.
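As a rough illustration of the heuristic described above, here is a minimal Python sketch. It is not the code in audit.py: the table name EMPTY_PRONE_APIS_SKETCH, its regexes, and the function find_unhandled_empty_states are assumptions made for illustration, and it covers only the APIs whose empty-case markers are named above.

```python
import re

# Hypothetical table: API name -> (regex showing the file uses the API,
# regexes that count as an "empty case" branch). The real EMPTY_PRONE_APIS
# table in audit.py may look quite different.
EMPTY_PRONE_APIS_SKETCH = {
    "CloudKit": (r"\bimport\s+CloudKit\b", [r"\.unknownItem\b"]),
    "HealthKit": (r"\bimport\s+HealthKit\b", [r"[nN]oData\b"]),
    "URLSession": (r"\bURLSession\b", [r"statusCode\s*==\s*404"]),
    "FileManager": (r"\bFileManager\b", [r"fileDoesNotExist"]),
    "NSFetchRequest": (r"\bNSFetchRequest\b", [r"\.isEmpty\b"]),
}

# The generic catch body that surfaces the raw framework string to the UI.
RAW_ERROR_TO_UI = re.compile(r"errorMessage\s*=\s*error\.localizedDescription")


def find_unhandled_empty_states(filename, source):
    """Return (api, filename) pairs where a raw error reaches the UI and the
    file has no recognizable empty-case branch for that API."""
    if not RAW_ERROR_TO_UI.search(source):
        return []
    findings = []
    for api, (uses_api, empty_markers) in EMPTY_PRONE_APIS_SKETCH.items():
        if not re.search(uses_api, source):
            continue
        if not any(re.search(marker, source) for marker in empty_markers):
            findings.append((api, filename))
    return findings


# The catch block from the bug report, before and after the fix.
BUGGY = """
import CloudKit
do {
    let rec = try await db.record(for: docId)
    apply(rec)
} catch {
    errorMessage = error.localizedDescription
}
"""
FIXED = BUGGY.replace(
    "} catch {",
    "} catch let e as CKError where e.code == .unknownItem {\n"
    '    statusMessage = "no backup yet"\n'
    "} catch {",
)

print(find_unhandled_empty_states("AquaLogApp.swift", BUGGY))
print(find_unhandled_empty_states("AquaLogApp.swift", FIXED))
```

The first call reports the CloudKit file and the second reports nothing. Because the check is per file and purely textual, it has the blind spots described above: an empty case handled in another file still gets flagged, and one handled with if let goes unseen.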
A brief side-quest, because it’s the part I find genuinely
interesting: the reason errorMessage = error.localizedDescription
is so common is that it’s the path of minimum apparent ambition.
Every other choice requires you to commit to a taxonomy. Is this an
auth failure or a network failure? A retryable error or a fatal one?
Should the user retry, sign in again, or give up? The localized
description sidesteps all of that. It says: here is the literal
truth as the framework sees it, you deal with it. Frameworks reward
this with strings like “Record not found” and “The operation
couldn’t be completed. (HKError error 11.)”. The localized
description is the API contract’s revenge on developers who won’t
write a switch statement. Every catch block I have ever written
that delegates to localizedDescription is, in retrospect, a TODO
I disguised as code.
The fix that actually matters is the QA matrix, not the rule
The static rule is the easy part — a couple of hours including the test cases. The harder change is the test matrix. Going forward, every user-tappable button gets four states tested before release:
- Happy. Data + network + permissions, demo data installed.
- Empty. Fresh install. No data. No backup. No prior runs.
- Denied. Permission rejected. Signed out of iCloud. Airplane mode.
- Error. A forced exception in the catch path.
The Empty state is the one nobody tests by default, and it is also where the harshest support messages come from: “I opened the app and it doesn’t work” almost always means the user hit an empty state the designer never designed a screen for. The Denied state is similar: the user revoked a permission and the app silently no-ops on the feature that depended on it, and now the user thinks the button is broken. Neither is in the default QA checklist that ships with “test every feature.” The default checklist assumes data and permissions; the bugs live exactly where the assumption doesn’t hold.
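One way to keep that matrix from living only in someone's head is to generate it. The sketch below is an assumption about how such a checklist could be built, not a description of my actual QA tooling: build_matrix, missing_cases, and the "Sync now" button name are hypothetical.

```python
from itertools import product

# The four states every user-tappable button should be tested in,
# with the setup each one needs.
STATES = {
    "happy": "data + network + permissions, demo data installed",
    "empty": "fresh install, no data, no backup, no prior runs",
    "denied": "permission rejected, signed out of iCloud, airplane mode",
    "error": "forced exception in the catch path",
}


def build_matrix(buttons):
    """Return one test case per (button, state) pair."""
    return [
        {"button": button, "state": state, "setup": setup}
        for button, (state, setup) in product(buttons, STATES.items())
    ]


def missing_cases(buttons, executed):
    """Return the (button, state) pairs that no QA run has covered yet."""
    wanted = {(case["button"], case["state"]) for case in build_matrix(buttons)}
    return sorted(wanted - set(executed))


# Hypothetical example: a happy-path-only run over two buttons.
buttons = ["Restore from iCloud", "Sync now"]
happy_only = [(button, "happy") for button in buttons]
for button, state in missing_cases(buttons, happy_only):
    print(f"untested: {button} [{state}]")
```

Run against a happy-path-only log, this lists six untested cases, including Restore from iCloud in the empty state, which is the journey that produced the bug report.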
The static rule will catch the next AquaLog. The matrix change will
catch the next class of AquaLog: the ones where the empty state
isn’t even a catch block, just a list view rendering
ForEach(items) { ... } on an items array that’s empty and producing a
600-pixel-tall void with no explanation.
If you ship iOS apps:
apple-presubmit-audit is one Python file, MIT, no telemetry, takes a project path and an
optional ASC API key. Run it on whatever you’re about to submit. If
the empty-state-vs-error-state rule fires, the warning is almost
always pointing at a real catch block worth a second look. If it
doesn’t fire, your code is either clean or you’re handling the
empty case with if, and it’s probably worth opening the file
anyway.
The fix is in the next build. v1.0.10 — “iCloud section visible to all users with a paywall on tap for non-subscribers, friendly empty-state message replacing the raw CKError” — goes to App Review today. If a user tells you a button shows a CloudKit error code, they’ve done you a favor. The next one might just uninstall.