Ask AI to Build a Networked Tool? Watch Out for the Invisible Pitfalls
Have AI build you a tool that "scrapes GitHub Trending and auto-generates articles." Sounds awesome, right?
I thought so too. Then I spent three days debugging.
The problem wasn't the code logic AI wrote; the logic was basically fine. The problem was what AI can't "see": network timeouts, API rate limits, anti-scraping mechanisms, environment differences. AI either ignores these entirely or handles them in an overly idealized way.
Three Blind Spots in AI-Written Network Code
Blind Spot 1: Networks are unstable
When AI writes code, it assumes the network is stable. The fetch request goes out, data comes back. No timeouts, no disconnections, no DNS failures.
Reality: GitHub Trending might take 5 seconds to load. Some APIs are directly unreachable from certain networks. On bad WiFi, TCP connections drop for no apparent reason.
A scraper script AI wrote worked perfectly locally but failed completely on GitHub Actions. The runner's network environment was totally different from local—some requests were blocked by firewalls. AI's code didn't account for this at all.
Blind Spot 2: APIs rate-limit you
Many APIs have rate limits. The GitHub REST API allows 60 unauthenticated requests per hour per IP address. Tencent Cloud's translation API has per-second call limits.
AI-written code rarely handles rate limiting proactively. It doesn't add delays between requests, doesn't check for 429 status codes, and doesn't retry with exponential backoff.
Result: script runs fine, then suddenly all requests fail simultaneously, and you're staring at confusing error logs with no idea what happened.
Blind Spot 3: Anti-scraping is real
Many websites have anti-scraping measures. Douyin's API requires dynamic cookies. Kuaishou's pages are client-rendered. Baidu Hot Search's HTML structure changes frequently.
AI-written scrapers typically use the simplest approach: fetch the URL directly, then parse the HTML. This works for static pages but fails against sites that render on the client or actively block scrapers.
Worse, AI doesn't know what it doesn't know. The code it gives you looks perfect but fetches zero data when actually run.
Real Pitfall Records
Pitfall 1: Missing timeout settings
A scraping script AI wrote used Node.js fetch but set no timeout. One day the target site responded extremely slowly. The script just hung there: no error, no progress, only waiting.
The job hung for 40 minutes. Nothing in the script would have stopped it short of GitHub Actions' default 6-hour job timeout.
The fix was simple: add an AbortController with a 15-second timeout. But AI's initial code didn't include it.
Pitfall 2: Overly simple retry logic
AI's retry logic went like this: fail, wait 1 second, retry. Fail, wait 1 second, retry. Fail, wait 1 second, retry. Three failures total, then error out.
The problem: if the failure is caused by server overload (502/503), retrying every second just adds more pressure, making things worse. The correct approach is exponential backoff—wait 1s, 2s, 4s, 8s, giving the server time to recover.
Pitfall 3: Environment variables and path issues
AI's script ran fine locally but broke in CI because:
- Local uses Windows, CI uses Linux—different path separators
- Local has .env file, CI uses secrets
- Local Node.js is version 24, CI defaults to 20
These environment differences never cross AI's mind when writing code. It has no idea what environment your code will run in.
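Each of the three differences above can be guarded against in a few lines. This is a minimal sketch, assuming a CommonJS script on Node.js; the helper names `requireEnv`, `outputPath`, and `nodeIsAtLeast`, and the `output` directory, are illustrative and not taken from the original script.

```javascript
const path = require('node:path');

// Read a required environment variable and fail fast with a clear message.
// The same code then works with a local .env loader or with CI secrets.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build paths with path.join instead of hardcoding "\\" or "/".
function outputPath(baseDir, fileName) {
  return path.join(baseDir, 'output', fileName);
}

// Check the runtime's major version against the one the script was written for.
function nodeIsAtLeast(minMajor, version = process.versions.node) {
  const major = Number(version.split('.')[0]);
  return major >= minMajor;
}
```

Calling `requireEnv` once at startup turns a missing secret into an immediate, readable failure instead of a confusing error deep inside a request.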
The Right Approach to Building Networked Tools
1. Always set timeouts
Regardless of language or framework, every network request must have a timeout. 15 seconds is a reasonable default—adjust per scenario.
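const controller = new AbortController();
// Abort the request if no response arrives within 15 seconds.
const timeoutId = setTimeout(() => controller.abort(), 15000);
// Clear the timer whether fetch succeeds or throws, so it never fires late.
const response = await fetch(url, { signal: controller.signal })
  .finally(() => clearTimeout(timeoutId));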
AI won't write this proactively. You must add it.
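One detail worth knowing: in the snippet above the timer is cleared as soon as the response headers arrive, so a body that trickles in slowly is not covered. A possible way to cover the whole operation is to keep the timer running until the body has been read. The sketch below assumes that approach; `fetchTextWithTimeout` and its `fetchImpl` parameter are hypothetical names, and `fetchImpl` exists only so the helper can be exercised without a real network.

```javascript
// Hypothetical helper: fetch a URL as text, aborting if the whole operation
// (connection, headers, and body) takes longer than timeoutMs.
async function fetchTextWithTimeout(url, timeoutMs = 15000, fetchImpl = fetch) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, { signal: controller.signal });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status} for ${url}`);
    }
    // The body read is still governed by the same abort signal.
    return await response.text();
  } finally {
    // Runs on success, on HTTP errors, and on abort.
    clearTimeout(timeoutId);
  }
}
```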
2. Implement exponential backoff retries
Don't retry linearly. Use exponential backoff:
- Retry 1: wait 1 second
- Retry 2: wait 2 seconds
- Retry 3: wait 4 seconds
- Retry 4: wait 8 seconds
- Retry 5: wait 16 seconds
Also check response status codes. If it's 429 (rate limited), wait longer. If it's 404 (not found), retrying is pointless—error out immediately.
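The schedule and status-code rules above could be sketched roughly as follows. The names `sleep`, `backoffDelayMs`, and `fetchWithRetry` are hypothetical, and treating every 5xx response as retryable is an assumption rather than a universal rule. For 429 responses the sketch honors a numeric `Retry-After` header when the server sends one.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// How long to wait after a failed attempt (attempt counts from 0).
function backoffDelayMs(attempt, response, baseMs = 1000) {
  const exponential = baseMs * 2 ** attempt; // 1s, 2s, 4s, 8s, 16s
  if (response && response.status === 429) {
    // Retry-After may be a number of seconds; use it when it is longer.
    const retryAfter = Number(response.headers.get('retry-after'));
    if (Number.isFinite(retryAfter) && retryAfter > 0) {
      return Math.max(exponential, retryAfter * 1000);
    }
  }
  return exponential;
}

// Retry on network errors, 429, and 5xx. Other statuses such as 404 fail at once.
async function fetchWithRetry(url, { maxRetries = 5, baseMs = 1000, fetchImpl = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    let response = null;
    let failure;
    try {
      response = await fetchImpl(url);
    } catch (err) {
      failure = err; // network-level error: worth retrying
    }
    if (response) {
      if (response.ok) return response;
      failure = new Error(`HTTP ${response.status} for ${url}`);
      const retryable = response.status === 429 || response.status >= 500;
      if (!retryable) throw failure; // retrying would be pointless
    }
    if (attempt >= maxRetries) throw failure; // retry budget exhausted
    await sleep(backoffDelayMs(attempt, response, baseMs));
  }
}
```

With the defaults this makes one initial attempt plus at most five retries, so the loop always terminates.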
3. Handle each data source independently
Don't put all network requests in one try/catch. Each data source should have its own error handling so that when one source fails, others keep working.
AI tends to pile all logic together. You need to manually separate them.
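One way to separate the sources is `Promise.allSettled`, which waits for every source and reports each outcome individually. A minimal sketch, assuming each source is an async function keyed by name; `collectFromSources` and the source names in the usage comment are hypothetical.

```javascript
// sources maps a source name to an async function that fetches that source.
async function collectFromSources(sources) {
  const names = Object.keys(sources);
  // allSettled never rejects: every source succeeds or fails on its own.
  // The async wrapper also turns synchronous throws into rejections.
  const settled = await Promise.allSettled(
    names.map(async (name) => sources[name]())
  );
  const results = {};
  const failures = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results[names[i]] = outcome.value;
    } else {
      failures[names[i]] = outcome.reason;
    }
  });
  return { results, failures };
}

// Usage (inside an async function):
//   const { results, failures } = await collectFromSources({
//     trending: () => fetchTrending(),   // hypothetical fetchers
//     hotSearch: () => fetchHotSearch(),
//   });
//   for (const [name, err] of Object.entries(failures)) {
//     console.warn(`Source ${name} failed: ${err.message}`);
//   }
```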
4. Test in the target environment
Code that runs locally is not guaranteed to work on a server. Always test at least once in the actual deployment environment.
CI environments like GitHub Actions have completely different network environments, file systems, and variable handling. Many pitfalls only surface in these environments.
What AI Can and Can't Do
What AI is good at for networked tools:
- Writing basic request logic
- Parsing JSON/XML/HTML data
- Organizing code structure
- Writing comments and documentation
What AI is bad at:
- Handling network exceptions and edge cases
- Dealing with anti-scraping mechanisms
- Adapting to different runtime environments
- Designing reliable retry and fallback strategies
So the right division of labor: let AI write the main logic, you add exception handling. Let AI write the first draft, you add timeouts, retries, and error handling.
A Checklist
Every time AI writes network-related code, run through this checklist:
- Do all network requests have timeout settings?
- Is there a retry mechanism, and does it use exponential backoff rather than a fixed delay?
- Does each data source have independent error handling?
- Is 429 rate limiting handled?
- Are file paths written in a cross-platform way?
- Do environment variables have defaults or checks?
- Has it been tested in the actual deployment environment?
If any answer is "no," that code shouldn't go live.
Wrapping Up
AI's coding ability is genuinely strong, but it writes code for an "ideal world"—where networks are always stable, APIs are always available, and environments are always consistent.
The real world doesn't work that way. Networks drop, APIs break, environments change. These "non-ideal" parts are AI's blind spots—and exactly where you need to take over.
Let AI help you write code, but don't let AI do your thinking. Especially not for networking.
Common Error Patterns in AI-Written Network Code
Beyond the major pitfalls already covered, several error patterns recur:
- Hardcoded URLs and configuration. AI often hardcodes URLs, port numbers, and configuration values directly into the code. The right approach is to extract these into environment variables or configuration files.
- Ignoring SSL certificate verification. In some environments, AI-generated code either disables certificate verification entirely or fails when the certificate chain changes.
- Missing connection pooling. AI typically creates a new connection for each request. In production, you want connection pooling to reuse existing connections and reduce latency.
- Infinite loops without guardrails. When implementing retry logic, AI sometimes creates loops that can run indefinitely. Always include a maximum retry count and a total timeout.
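The last guardrail, a maximum retry count combined with a total timeout, might look like the sketch below. `retryWithDeadline` and its option names are hypothetical; the point is only that the loop has two independent exits and cannot run forever.

```javascript
// Hypothetical helper: run an async task with bounded retries.
// It stops when maxAttempts is reached or when the next wait would
// overrun the total time budget, whichever comes first.
async function retryWithDeadline(task, { maxAttempts = 5, totalTimeoutMs = 60000, baseMs = 1000 } = {}) {
  const deadline = Date.now() + totalTimeoutMs;
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
    }
    const delayMs = baseMs * 2 ** (attempt - 1);
    // Give up instead of sleeping past the overall budget.
    if (attempt === maxAttempts || Date.now() + delayMs > deadline) break;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw lastError;
}
```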
Production Readiness Checklist
Before deploying any AI-written networked tool to production, walk through this expanded checklist:
- All network requests have configurable timeout settings with sensible defaults.
- Retry logic uses exponential backoff with jitter to prevent thundering herd problems.
- Each external data source has independent error handling and fallback paths.
- Rate limit detection with appropriate wait times and logging is implemented.
- File paths use cross-platform compatible functions.
- Environment variables have validation and sensible defaults.
- Logging is in place for debugging production issues.
- Configuration is externalized, not hardcoded.
- The code has been tested in the target deployment environment.
- Security considerations have been reviewed.
- Circuit breaker patterns are in place for services that could become unavailable.
Going through this checklist takes about fifteen minutes per tool but saves hours of debugging in production every time.
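Of the items in this checklist, the circuit breaker is probably the least familiar. The idea is to stop calling a service after repeated failures and to try again only after a cooldown. The class below is a deliberately small sketch, not a library API; the `CircuitBreaker` name, its option names, and the default values are assumptions.

```javascript
// Minimal circuit breaker sketch (hypothetical class).
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, mainly for tests
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  async call(task) {
    const open = this.openedAt !== null &&
      this.now() - this.openedAt < this.cooldownMs;
    // While open, fail fast without touching the remote service.
    if (open) throw new Error('Circuit open: call skipped');
    try {
      const result = await task();
      // A success closes the circuit and resets the failure count.
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      // Too many consecutive failures: open (or re-open) the circuit.
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

Using one breaker instance per external service keeps a failing source from slowing down the others.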
Testing Network Code
Unit tests should mock network responses. Integration tests should run in an environment that mirrors production. Always test server errors, timeouts, malformed JSON, rate limits, and slow responses. A fault-injection proxy such as Toxiproxy can simulate latency, timeouts, and dropped connections locally.
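For unit tests, one lightweight option is a scripted stand-in for `fetch` that replays the failure cases listed above. The sketch below assumes Node.js 18 or later, where `Response` is a global; `makeFakeFetch` and the scenario fields (`status`, `body`, `headers`, `delayMs`) are hypothetical names.

```javascript
// Hypothetical test double: returns a fetch-like function that replays one
// scripted scenario per call, so failures can be tested without a network.
function makeFakeFetch(scenarios) {
  let call = 0;
  return function fakeFetch(url, { signal } = {}) {
    // Once the script runs out, keep repeating the last scenario.
    const { status = 200, body = null, headers = {}, delayMs = 0 } =
      scenarios[Math.min(call++, scenarios.length - 1)];
    return new Promise((resolve, reject) => {
      if (signal && signal.aborted) {
        reject(signal.reason);
        return;
      }
      const timer = setTimeout(
        () => resolve(new Response(body, { status, headers })),
        delayMs
      );
      if (signal) {
        // Behave like fetch: reject as soon as the caller aborts.
        signal.addEventListener('abort', () => {
          clearTimeout(timer);
          reject(signal.reason);
        }, { once: true });
      }
    });
  };
}

// One scenario per failure mode worth covering:
const fakeFetch = makeFakeFetch([
  { status: 503, body: 'unavailable' },                // server error
  { status: 429, headers: { 'retry-after': '2' } },    // rate limit
  { status: 200, body: '{"items": [' },                // malformed JSON
  { status: 200, body: '{"ok": true}', delayMs: 200 }, // slow response
]);
```

Code under test that accepts a fetch function as a parameter can be handed `fakeFetch` directly, which keeps these tests fast and deterministic.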
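Potential Risks of AI Networked Tools
AI networked tools are powerful, but they carry many risks that users cannot see.
Data security risks: AI may send sensitive information to external servers, unencrypted data may be intercepted, and the security of external APIs is outside your control.
Privacy risks: a tool may collect more user data than it needs, user data may be shared with third parties, and users lose control over data once it is stored on external servers.
Content risks: information fetched online may be inaccurate, biases in training data may be amplified through networked tools, and harmful information may be retrieved or spread unintentionally.
How to Protect Yourself
Carefully review the permissions a tool requests, remove sensitive information before sending anything, make sure all data transfers are encrypted, review the tool's usage logs regularly, and use networked tools with a good reputation.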
