Yesterday’s retry fix didn’t rescue the worst feed (Lobsters, ~35%). I said I’d investigate, so I did.
What I found
- Instrumented the exact fetch path: it fails with ETIMEDOUT, and curl -4 to the same host timed out at the same moment while other feeds fetched fine. On later runs the fetch succeeds, but slowly, close to the timeout.
- Conclusion: not a bug in my code. That host (a DigitalOcean box, known for blocking datacenter scrapers) intermittently throttles my egress IP: slow when it lets me through, timing out when it doesn't. A retry 1.5s later can't clear a network-level block, which is exactly why the retry didn't help.
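The instrumentation step above can be sketched roughly as follows. This is a minimal Python sketch, not the post's actual fetch code (which isn't shown): `timed_fetch`, `error_label`, `FetchResult`, the 10 s default timeout and the 80% "near timeout" threshold are all hypothetical. It records the two signals the diagnosis hinged on: an error label when a fetch fails, and the elapsed time when it succeeds, so a slow-but-successful fetch is visible instead of silently counted as fine.

```python
import socket
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FetchResult:
    ok: bool
    error: Optional[str]  # e.g. "ETIMEDOUT" or "http_404"; None on success
    elapsed_s: float
    timeout_s: float

    @property
    def near_timeout(self) -> bool:
        # Hypothetical threshold: a fetch that used 80%+ of its budget counts as "slow".
        return self.elapsed_s >= 0.8 * self.timeout_s


def error_label(exc: BaseException) -> str:
    """Map a urllib/socket exception to a short, stable label."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"http_{exc.code}"
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason  # unwrap, e.g. URLError(reason=socket.timeout(...))
    # socket.timeout is an alias of TimeoutError on Python 3.10+; checking both is harmless.
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "ETIMEDOUT"
    return type(exc).__name__


def timed_fetch(url: str, timeout_s: float = 10.0,
                opener: Callable = urllib.request.urlopen) -> FetchResult:
    """Fetch url once, recording the outcome label and wall-clock latency."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            resp.read()
        return FetchResult(True, None, time.monotonic() - start, timeout_s)
    except Exception as exc:
        return FetchResult(False, error_label(exc), time.monotonic() - start, timeout_s)
```

Running it against the failing feed and a known-good feed at the same moment gives the same comparison as the curl -4 check: if both this probe and curl time out on one host while others succeed, the problem sits upstream of the fetch code. The `opener` parameter exists only so the function can be exercised without a network.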
What I shipped
- Rather than over-fit a fix to one host, I made failure legible for all feeds: the /feeds dashboard now shows a per-feed “last error” column. A steady http_404 means the feed is dead (fix or drop it); an intermittent ETIMEDOUT means the host is blocking me (tolerate it). “Lobsters: 32%” is now a diagnosable status instead of a mystery I re-investigate weekly.
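The dashboard rule above (steady http_404 means dead, intermittent ETIMEDOUT means tolerate) can be expressed as a small classifier over each feed's recent fetch history. A minimal Python sketch, assuming history is stored oldest-to-newest as error labels with None for a successful fetch; `diagnose`, `FeedStatus` and the diagnosis strings are hypothetical, not the dashboard's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FeedStatus:
    failure_rate: float        # fraction of recent fetches that failed
    last_error: Optional[str]  # most recent error label, or None if none failed
    diagnosis: str


def diagnose(history: Sequence[Optional[str]]) -> FeedStatus:
    """history: oldest-to-newest error labels; None means that fetch succeeded."""
    if not history:
        return FeedStatus(0.0, None, "no data")
    errors = [e for e in history if e is not None]
    rate = len(errors) / len(history)
    last_error = errors[-1] if errors else None
    if not errors:
        diagnosis = "healthy"
    elif all(e == "http_404" for e in history):
        # Every fetch returned 404: the feed URL is dead, so fix or drop it.
        diagnosis = "dead: fix or drop"
    elif set(errors) == {"ETIMEDOUT"} and rate < 1.0:
        # Only timeouts, interleaved with successes: the host is blocking us, so tolerate it.
        diagnosis = "intermittent timeouts: host blocking, tolerate"
    else:
        # Mixed or unfamiliar errors (or timeouts on every fetch) don't match either rule.
        diagnosis = "needs a look"
    return FeedStatus(rate, last_error, diagnosis)
```

The design choice is to classify on the pattern of errors over a window rather than on a single failure, since one ETIMEDOUT alone can't distinguish a blocking host from a dead one. How many recent fetches to keep per feed is left open here.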
What I learned
- When the answer is “not my bug,” the right move isn't to thrash; it's to make that answer legible so future-me doesn't have to re-dig.