Ask AI to Build a Networked Tool? Watch Out for the Invisible Pitfalls

Have AI build you a tool that "scrapes GitHub Trending and auto-generates articles." Sounds awesome, right?

I thought so too. Then I spent three days debugging.

The problem wasn't the code logic AI wrote; the logic was basically fine. The problem was what AI can't "see": network timeouts, API rate limits, anti-scraping mechanisms, environment differences. When writing code, AI either ignores these completely or handles them as if everything always goes right.

Three Blind Spots in AI-Written Network Code

Blind Spot 1: Networks are unstable

When AI writes code, it assumes the network is stable. The fetch request goes out, data comes back. No timeouts, no disconnections, no DNS failures.

Reality: GitHub Trending might take 5 seconds to load. Some APIs are directly unreachable from certain networks. On bad WiFi, TCP connections drop for no apparent reason.

A scraper script AI wrote worked perfectly locally but failed completely on GitHub Actions. The runner's network environment was totally different from local—some requests were blocked by firewalls. AI's code didn't account for this at all.

Blind Spot 2: APIs rate-limit you

Many APIs have rate limits. The GitHub REST API allows 60 unauthenticated requests per hour. The Tencent Cloud translation API limits calls per second.

AI-written code rarely handles rate limiting proactively. It doesn't add delays between requests, doesn't check for 429 status codes, and doesn't retry with exponential backoff.

Result: script runs fine, then suddenly all requests fail simultaneously, and you're staring at confusing error logs with no idea what happened.
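
One way to make rate limiting visible instead of mysterious is to read the headers the server sends back. The sketch below assumes the standard Retry-After header (in its seconds form) and the x-ratelimit-remaining / x-ratelimit-reset headers that GitHub's REST API returns. The function name rateLimitWaitMs and the 60-second fallback are my own placeholders, not anything from the original script.

```javascript
// Hypothetical helper: given a fetch Response, decide how long to pause
// before sending the next request. Returns 0 when no pause is needed.
function rateLimitWaitMs(response, nowMs = Date.now()) {
  // Retry-After in seconds is the most direct signal when present.
  // (The HTTP-date form of Retry-After is ignored in this sketch.)
  const retryAfter = Number(response.headers.get("retry-after"));
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000;
  }

  // GitHub-style headers: remaining quota and the reset time in epoch seconds.
  const remaining = response.headers.get("x-ratelimit-remaining");
  const reset = response.headers.get("x-ratelimit-reset");
  if (remaining !== null && reset !== null && Number(remaining) === 0) {
    const resetMs = Number(reset) * 1000;
    if (Number.isFinite(resetMs)) {
      return Math.max(0, resetMs - nowMs);
    }
  }

  // A 429 with no usable headers: fall back to a fixed pause (assumed value).
  if (response.status === 429) {
    return 60000;
  }
  return 0;
}
```

Logging the returned wait time before sleeping turns "all requests suddenly fail" into a log line that says exactly why the script paused.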

Blind Spot 3: Anti-scraping is real

Many websites have anti-scraping measures. Douyin's API requires dynamic cookies. Kuaishou's pages are client-rendered. Baidu Hot Search's HTML structure changes frequently.

AI-written scrapers typically use the simplest approach: fetch the URL directly, then parse the HTML. This works for static pages but fails against sites that render on the client or actively block scrapers.

Worse, AI doesn't know what it doesn't know. The code it gives you looks perfect but fetches zero data when actually run.
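
A cheap defense against the "looks perfect, fetches nothing" case is to make an empty result fail loudly. This is only a sketch: requireItems and its option names are hypothetical, and it assumes your parser returns an array of items.

```javascript
// Hypothetical guard: call it right after parsing, before anything downstream runs.
// Throws when a scrape produced no items, so an empty result is never silent.
function requireItems(sourceName, items, { status, htmlLength } = {}) {
  if (Array.isArray(items) && items.length > 0) {
    return items;
  }
  throw new Error(
    `${sourceName}: parsed 0 items (HTTP ${status ?? "unknown"}, ` +
      `${htmlLength ?? "unknown"} characters of HTML). ` +
      "The page may be client-rendered, blocked, or its structure may have changed."
  );
}

// Usage (parseTrending is a placeholder for your own parser):
// const items = requireItems("github-trending", parseTrending(html), {
//   status: response.status,
//   htmlLength: html.length,
// });
```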

Real Pitfall Records

Pitfall 1: Missing timeout settings

A scraping script AI wrote used Node.js fetch with no timeout. One day the target site responded extremely slowly. The script just hung there: no error, no progress, just waiting.

It sat there for 40 minutes, and nothing short of GitHub Actions' 6-hour job timeout was going to kill it.

The fix was simple: add an AbortController with a 15-second timeout. But AI's initial code didn't include it.

Pitfall 2: Overly simple retry logic

AI's retry logic went like this: fail, wait 1 second, retry. Fail, wait 1 second, retry. Fail, wait 1 second, retry. Three failures total, then error out.

The problem: if the failure is caused by server overload (502/503), retrying every second just adds more pressure, making things worse. The correct approach is exponential backoff—wait 1s, 2s, 4s, 8s, giving the server time to recover.

Pitfall 3: Environment variables and path issues

AI's script ran fine locally but broke in CI because:

Local uses Windows, CI uses Linux—different path separators
Local has .env file, CI uses secrets
Local Node.js is version 24, CI defaults to 20

These environment differences never cross AI's mind when writing code. It has no idea what environment your code will run in.
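
The three differences above can each be checked in a few lines. This is a minimal sketch, not the script from the story: the function names, the "data" directory, and the file name trending.json are placeholders I made up for illustration.

```javascript
const path = require("node:path");

// Build paths with path.join instead of hardcoding "\\" or "/".
// "data" and "trending.json" are placeholder names.
function outputPath(baseDir) {
  return path.join(baseDir, "data", "trending.json");
}

// Read a required variable the same way whether it came from a local .env
// loader or from CI secrets, and fail fast with a clear message if missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Fail early when the runtime is older than the script expects.
function assertNodeMajor(minMajor, version = process.versions.node) {
  const major = Number(version.split(".")[0]);
  if (major < minMajor) {
    throw new Error(`Node.js ${minMajor}+ required, found ${version}`);
  }
  return major;
}
```

Running these checks at the top of the script means a CI failure says "missing variable" or "wrong Node.js version" instead of failing somewhere deep inside a request.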

The Right Approach to Building Networked Tools

1. Always set timeouts

Regardless of language or framework, every network request must have a timeout. 15 seconds is a reasonable default—adjust per scenario.

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 15000); // abort after 15 s
let response;
try {
  response = await fetch(url, { signal: controller.signal });
} finally {
  clearTimeout(timeoutId); // clear the timer even if fetch throws
}

AI won't write this proactively. You must add it.
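
One caveat with the manual timer: it only covers the wait for the response headers, because the timer is cleared as soon as fetch resolves. A slow response body can still stall. On Node.js 18 and later, AbortSignal.timeout() gives a signal that stays armed while the body is read. The wrapper below is a sketch under that assumption; fetchTextWithTimeout and the injectable fetchImpl parameter are names I made up so it can be tested without a network.

```javascript
// Hypothetical wrapper: fetch a URL as text with one deadline that covers
// both the response headers and the body.
async function fetchTextWithTimeout(url, timeoutMs = 15000, fetchImpl = fetch) {
  // AbortSignal.timeout() aborts by itself after timeoutMs,
  // so there is no timer to clean up manually.
  const signal = AbortSignal.timeout(timeoutMs);
  const response = await fetchImpl(url, { signal });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  // With the built-in fetch, the signal is still attached while the body
  // streams, so a stalled body is aborted too.
  return await response.text();
}

// Usage:
// const html = await fetchTextWithTimeout("https://github.com/trending", 15000);
```

When the deadline passes, the built-in fetch rejects with an error whose name is "TimeoutError", which makes timeouts easy to tell apart from other failures in logs.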

2. Implement exponential backoff retries

Don't retry at a fixed interval. Use exponential backoff:

Retry 1: wait 1 second
Retry 2: wait 2 seconds
Retry 3: wait 4 seconds
Retry 4: wait 8 seconds
Retry 5: wait 16 seconds

Also check response status codes. If it's 429 (rate limited), wait longer. If it's 404 (not found), retrying is pointless—error out immediately.
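
Put together, the schedule and the status-code rules could look like the sketch below. Treat it as one possible shape rather than the definitive version: the set of retryable status codes, the 25% jitter, the function names, and the injectable fetchImpl / sleepImpl options are all my assumptions.

```javascript
// Status codes treated as transient (an assumption; adjust per API).
const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `attempt` (1-based): 1s, 2s, 4s, 8s, 16s,
// plus up to 25% random jitter so parallel clients don't retry in lockstep.
function backoffDelayMs(attempt, baseMs = 1000, random = Math.random) {
  const exponential = baseMs * 2 ** (attempt - 1);
  return Math.round(exponential * (1 + 0.25 * random()));
}

async function fetchWithRetry(
  url,
  { maxRetries = 5, baseMs = 1000, fetchImpl = fetch, sleepImpl = sleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    let response = null;
    let error = null;
    try {
      response = await fetchImpl(url, { signal: AbortSignal.timeout(15000) });
    } catch (err) {
      error = err; // network failure or timeout: treated as retryable
    }

    if (response && response.ok) return response;

    // 404 and other non-transient statuses: retrying is pointless.
    if (response && !RETRYABLE_STATUS.has(response.status)) {
      throw new Error(`HTTP ${response.status} for ${url} (not retryable)`);
    }

    // Hard cap on retries so the loop can never run forever.
    if (attempt >= maxRetries) {
      throw error ?? new Error(`HTTP ${response.status} for ${url} after ${maxRetries} retries`);
    }

    let delay = backoffDelayMs(attempt + 1, baseMs);
    if (response && response.status === 429) {
      // Rate limited: wait at least as long as the server asks (seconds form only).
      const retryAfterMs = Number(response.headers.get("retry-after")) * 1000;
      if (Number.isFinite(retryAfterMs)) delay = Math.max(delay, retryAfterMs);
    }
    await sleepImpl(delay);
  }
}
```

With the defaults, a request that keeps failing is tried six times in total (one initial attempt plus five retries) and then throws, so the caller always gets an answer or an error.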

3. Handle each data source independently

Don't put all network requests in one try/catch. Each data source should have its own error handling so that when one source fails, others keep working.

AI tends to pile all logic together. You need to manually separate them.
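
One way to separate them is Promise.allSettled, which waits for every source and reports each outcome on its own. A minimal sketch, assuming each source is an async function that returns an array of items (collectFromSources and the source names in the usage comment are hypothetical):

```javascript
// Run every source independently; one failure does not stop the others.
// `sources` maps a source name to a function returning (a promise of) items.
async function collectFromSources(sources) {
  const names = Object.keys(sources);
  // The async wrapper turns a synchronous throw into a rejection as well.
  const settled = await Promise.allSettled(
    names.map(async (name) => sources[name]())
  );

  const results = {};
  const failures = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      results[names[i]] = outcome.value;
    } else {
      failures[names[i]] = String(outcome.reason?.message ?? outcome.reason);
    }
  });
  return { results, failures };
}

// Usage (fetchGithubTrending and fetchBaiduHot are placeholders):
// const { results, failures } = await collectFromSources({
//   github: fetchGithubTrending,
//   baidu: fetchBaiduHot,
// });
// for (const [name, reason] of Object.entries(failures)) {
//   console.warn(`source ${name} failed: ${reason}`);
// }
```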

4. Test in the target environment

Code that runs locally is not guaranteed to work on a server. Always test at least once in the actual deployment environment.

CI environments like GitHub Actions differ from a local machine in network access, file system, and how environment variables are supplied. Many pitfalls only surface there.

What AI Can and Can't Do

What AI is good at for networked tools:

Writing basic request logic
Parsing JSON/XML/HTML data
Organizing code structure
Writing comments and documentation

What AI is bad at:

Handling network exceptions and edge cases
Dealing with anti-scraping mechanisms
Adapting to different runtime environments
Designing reliable retry and fallback strategies

So the right division of labor: let AI write the main logic, you add exception handling. Let AI write the first draft, you add timeouts, retries, and error handling.

A Checklist

Every time AI writes network-related code, run through this checklist:

Do all network requests have timeout settings?
Is there a retry mechanism, and does it use exponential backoff rather than a fixed delay?
Does each data source have independent error handling?
Is 429 rate limiting handled?
Are file paths written in a cross-platform way?
Do environment variables have defaults or checks?
Has it been tested in the actual deployment environment?

If any answer is "no," that code shouldn't go live.

Wrapping Up

AI's coding ability is genuinely strong, but it writes code for an "ideal world"—where networks are always stable, APIs are always available, and environments are always consistent.

The real world doesn't work that way. Networks drop, APIs break, environments change. These "non-ideal" parts are AI's blind spots—and exactly where you need to take over.

Let AI help you write code, but don't let AI do your thinking. Especially not for networking.

Common Error Patterns in AI-Written Network Code

Beyond the major pitfalls already covered, several error patterns recur.

Hardcoded URLs and configuration: AI often hardcodes URLs, port numbers, and configuration values directly into the code. The right approach is to extract these into environment variables or configuration files.

Ignoring SSL certificate verification: In some environments, AI-generated code either disables certificate verification entirely or fails when the certificate chain changes.

Missing connection pooling: AI typically creates a new connection for each request. In production, you want connection pooling to reuse existing connections and reduce latency.

Infinite loops without guardrails: When implementing retry logic, AI sometimes creates loops that can run indefinitely. Always include a maximum retry count and a total timeout.

Production Readiness Checklist

Before deploying any AI-written networked tool to production, walk through this expanded checklist:

All network requests have configurable timeout settings with sensible defaults.
Retry logic uses exponential backoff with jitter to prevent thundering herd problems.
Each external data source has independent error handling and fallback paths.
Rate limit detection with appropriate wait times and logging is implemented.
File paths use cross-platform compatible functions.
Environment variables have validation and sensible defaults.
Logging is in place for debugging production issues.
Configuration is externalized, not hardcoded.
The code has been tested in the target deployment environment.
Security considerations have been reviewed.
Circuit breaker patterns are in place for services that could become unavailable.

Going through this checklist takes about fifteen minutes per tool but saves hours of debugging in production every time.
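
The circuit breaker item is the least familiar one on the list, so here is a minimal sketch of the idea: after a few consecutive failures, stop calling the service for a cooldown period instead of hammering it. The class, its option names, and the default values are my own assumptions, and this simplified version does not limit concurrent trial calls once the cooldown has passed.

```javascript
// Minimal circuit breaker sketch: closed -> open after N consecutive failures,
// then one trial call is allowed once the cooldown has elapsed.
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  async call(fn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      // Still cooling down: fail immediately without touching the service.
      throw new Error("Circuit open: skipping call");
    }
    try {
      const result = await fn();
      // Success closes the circuit and resets the failure count.
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      throw err;
    }
  }
}

// Usage (fetchTranslation is a placeholder for a call to a flaky service):
// const breaker = new CircuitBreaker({ failureThreshold: 3, cooldownMs: 30000 });
// const result = await breaker.call(() => fetchTranslation(text));
```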

Testing Network Code

Unit tests should mock network responses. Integration tests need an environment that mirrors production. Always test server errors, timeouts, malformed JSON, rate limits, and slow responses. Use a tool such as Toxiproxy to simulate latency, timeouts, and dropped connections locally.
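
Mocking does not need a framework. If the function under test accepts the fetch implementation as a parameter, each failure mode becomes a small fake. The sketch below assumes that design; fetchJson, the fakes, and runTests are hypothetical names, and the URL is never actually contacted.

```javascript
const { deepStrictEqual, rejects } = require("node:assert");

// Function under test (hypothetical): fetch a URL and parse JSON,
// failing with an explicit message instead of silently returning garbage.
async function fetchJson(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { signal: AbortSignal.timeout(15000) });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const text = await response.text();
  try {
    return JSON.parse(text);
  } catch {
    throw new Error(`Malformed JSON (first 40 chars): ${text.slice(0, 40)}`);
  }
}

// One fake fetch per failure mode; none of them touches the network.
const fakes = {
  ok: async () => new Response('{"stars": 42}', { status: 200 }),
  serverError: async () => new Response("bad gateway", { status: 502 }),
  rateLimited: async () => new Response("", { status: 429 }),
  malformed: async () => new Response("<html>not json</html>", { status: 200 }),
  timeout: async () => {
    throw Object.assign(new Error("timed out"), { name: "TimeoutError" });
  },
};

async function runTests() {
  const url = "https://example.invalid/api";
  deepStrictEqual(await fetchJson(url, fakes.ok), { stars: 42 });
  await rejects(fetchJson(url, fakes.serverError), /HTTP 502/);
  await rejects(fetchJson(url, fakes.rateLimited), /HTTP 429/);
  await rejects(fetchJson(url, fakes.malformed), /Malformed JSON/);
  await rejects(fetchJson(url, fakes.timeout), { name: "TimeoutError" });
  return "all passed";
}

// Usage:
// runTests().then(console.log);
```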

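Potential Risks of AI Networked Tools

AI networked tools are powerful, but they carry many risks that users cannot see.

Data security risks: AI may send sensitive information to external servers, unencrypted data can be intercepted, and you have no control over the security of external APIs.

Privacy risks: a tool may collect more user data than it needs, user data may be shared with third parties, and users lose control over data stored on external servers.

Content risks: information fetched online may be inaccurate, biases in training data may be amplified through networked tools, and harmful information may be retrieved or spread unintentionally.

How to Protect Yourself

Review the permissions a tool requests carefully, remove sensitive information before sending anything, make sure all data transfers are encrypted, review the tool's usage logs regularly, and use networked tools with a good reputation.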
