The hard part is not parsing HTML. It is getting a usable response from modern websites in the first place.
Collecting data from modern websites is often less of a parsing problem and more of an access problem. A basic Python requests call returns a 403 status or a block page before BeautifulSoup has anything useful to work with.
The usual response is to add proxies, rotate headers, or switch to Playwright. These approaches work, but they also add infrastructure the scraper now has to manage.
Bright Data's Web Unlocker API works as the access layer between the Python scraper and the target website. Instead of handling proxy rotation, anti-bot challenges, CAPTCHA solving, rendering issues, and blocked requests itself, the script sends the target URL to Web Unlocker in a single API call and receives an HTML or JSON response that can be parsed as usual.
In this article, I will use Web Unlocker as the fetching layer for a small Python scraper:
- first, I will configure a Web Unlocker API zone in the Bright Data UI;
- then, I will show why raw requests fails on a modern protected page;
- after that, I will send the same target through Web Unlocker API;
- finally, I will parse the returned HTML with BeautifulSoup and compare the results.
The point is not to replace Playwright everywhere. The point is to avoid running browser automation when the scraper only needs rendered HTML.
Why Web Unlocker Fits This Problem
When a scraping script cannot see the data, the usual advice is to use Playwright or Selenium.
That advice is often correct. Browser automation is the right tool when the workflow requires clicks, forms, login flows, scrolling, or controlled interaction with a page.
But many scraping jobs are simpler than that. They do not need browser control. They only need the page after JavaScript has finished producing the content.
With Playwright, the application owns the browser lifecycle. It has to launch or connect to a browser, navigate the page, wait for the right state, collect the content, and close the session cleanly.
With Web Unlocker, the scraper keeps a request-response shape:
target URL → Web Unlocker API → rendered response → parser
Web Unlocker combines website unlocking, automated proxy management, and built-in JavaScript rendering. For sites that require JavaScript execution, it can launch a headless browser behind the scenes and return the rendered output.
That makes Web Unlocker useful as a middle layer when raw requests is too limited, full browser automation is too heavy, and the scraper only needs parsable content through one HTTP call.
Step 1: Create a Web Unlocker API Zone in Bright Data UI
Before writing the Python scraper, create a Web Unlocker API zone in the Bright Data Control Panel.
From the left-hand navigation menu, go to Web Access APIs, then click Create API. In the API selection screen, choose Web Unlocker API and continue.
After selecting Web Unlocker API, give the zone a descriptive name. For this example:
python_js_rendered_html
A descriptive name matters if you manage multiple zones later. The name should tell you what this zone is for without opening the settings.
After configuring the zone, Bright Data provides a Test API screen with a ready-to-use cURL command. This shows the exact request shape: the endpoint, Authorization header, zone name, target URL, and response format.
Once the zone is created, open its Overview tab. This is the important screen for the code. The Overview tab gives you the values needed for direct API access:
- API key
- Zone name
- Ready-to-use request example
For the protected review page in this example, also enable Manual 'expect' elements in the zone settings.
This setting matters because we want Web Unlocker to wait until specific content (an element or, as in this example, a text string) appears in the rendered page before returning the response. Without this setting, the API can reject the manual expect parameter:
Manual expect is not enabled for this zone
The option is available under: Configuration → Advanced settings → Custom Unlocker API → Manual 'expect' elements
The Direct API endpoint is:
https://api.brightdata.com/request
The request is authenticated with a Bearer token:
Authorization: Bearer <BRIGHT_DATA_API_KEY>
The basic payload contains the Web Unlocker zone name, the target URL, and the response format:
{
  "zone": "your_web_unlocker_zone",
  "url": "https://target-url.com",
  "format": "raw"
}
I used format: "raw" because I wanted the response body as-is and planned to parse the HTML myself with BeautifulSoup. The other option, format: "json", wraps the response in a structured JSON object instead of returning the bare body.
Step 2: Store the Credentials Locally
The Python script needs two values from the UI:
BRIGHT_DATA_API_KEY
BRIGHT_DATA_ZONE
Keep them outside the code as environment variables:
export BRIGHT_DATA_API_KEY="your_api_key_here"
export BRIGHT_DATA_ZONE="your_web_unlocker_zone_here"
This keeps secrets out of the script and makes the same code easier to run locally, in CI, or inside a scheduled worker later.
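If one of the variables is missing, a bare os.environ lookup fails with a KeyError that does not say much. As a minimal sketch, the script could validate both values up front. The load_settings helper and its error message are illustrative names for this article, not part of any Bright Data tooling:

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("BRIGHT_DATA_API_KEY", "BRIGHT_DATA_ZONE")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the required settings, or raise one error naming every missing variable.

    Hypothetical helper for this article; `env` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name].strip() for name in REQUIRED_VARS}
```

Calling load_settings() once at startup gives a single readable failure in CI or a scheduled worker instead of a traceback halfway through a run.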
Step 3: Try the Normal Python Scraper First
For the smoke test, I used a real protected target:
https://www.g2.com/products/mongodb/reviews
The page is available in the browser, but a basic Python request receives a blocked response instead of useful review content. Here is the first version of the scraper:
import requests
from bs4 import BeautifulSoup

TARGET_URL = "https://www.g2.com/products/mongodb/reviews"

# Plain GET with a generic User-Agent: the baseline most scrapers start from.
response = requests.get(
    TARGET_URL,
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)

print("status_code:", response.status_code)
print("html_length:", len(response.text))

# Parse whatever came back and look for section headings.
soup = BeautifulSoup(response.text, "html.parser")
headings = [h.get_text(strip=True) for h in soup.find_all("h2")]

print("headings_count:", len(headings))
print("first_headings:", headings[:3])
The output confirms that the baseline request is not enough. Raw requests receives a 403 response from the G2 reviews page, so BeautifulSoup has no useful review content to parse.
At this point, the fetching layer is the problem.
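That check can be made explicit before any parsing happens. The sketch below is a rough heuristic, not a reliable detector: the looks_blocked name, the status codes, the marker strings, and the length threshold are all assumptions chosen for illustration and would need tuning per target site.

```python
# All three constants are illustrative assumptions, not values from G2 or Bright Data.
BLOCK_STATUS_CODES = {401, 403, 429, 503}
BLOCK_MARKERS = ("captcha", "access denied", "verify you are human")
SHORT_PAGE_LENGTH = 5_000


def looks_blocked(status_code: int, html: str) -> bool:
    """Guess whether a response is a block page rather than real content."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    # Only trust marker strings on short pages: a large, real page can
    # legitimately mention "captcha" somewhere in its scripts.
    if len(html) < SHORT_PAGE_LENGTH:
        lowered = html.lower()
        return any(marker in lowered for marker in BLOCK_MARKERS)
    return False
```

Running looks_blocked(response.status_code, response.text) right after the request separates "the fetch failed" from "the selector is wrong" before any time is spent on the parser.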
Step 4: Send the Target URL Through Web Unlocker API
Now send the same G2 reviews page through the Web Unlocker API zone created in the Bright Data UI.
Install the dependencies:
pip install requests beautifulsoup4
Then create a small wrapper around the Web Unlocker API call:
import os

import requests
from bs4 import BeautifulSoup

WEB_UNLOCKER_ENDPOINT = "https://api.brightdata.com/request"
BRIGHT_DATA_API_KEY = os.environ["BRIGHT_DATA_API_KEY"]
BRIGHT_DATA_ZONE = os.environ["BRIGHT_DATA_ZONE"]
TARGET_URL = "https://www.g2.com/products/mongodb/reviews"


def fetch_with_web_unlocker(url: str, timeout: int = 90) -> str:
    """Fetch a URL through the Web Unlocker zone and return the response body."""
    payload = {
        "zone": BRIGHT_DATA_ZONE,
        "url": url,
        "format": "raw",
        "method": "GET",
        # Requires Manual 'expect' elements to be enabled on the zone.
        "headers": {
            "x-unblock-expect": "{\"text\": \"reviews\"}"
        },
    }
    headers = {
        "Authorization": f"Bearer {BRIGHT_DATA_API_KEY}",
        "Content-Type": "application/json",
    }
    response = requests.post(
        WEB_UNLOCKER_ENDPOINT,
        json=payload,
        headers=headers,
        timeout=timeout,
    )
    # Surface API-level failures (bad key, wrong zone) as exceptions.
    response.raise_for_status()
    return response.text
The important part is that the Python code is not connecting to a proxy directly. It sends a normal HTTP request to the Web Unlocker API endpoint.
A successful run returns a usable page body that BeautifulSoup can parse, instead of the blocked response that raw requests received.
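The x-unblock-expect value in the wrapper is a JSON string escaped by hand, which is easy to get wrong once the expected text contains quotes. As a small sketch, the payload could be built by a pure function that serializes the value with json.dumps. The build_payload name and its expect_text parameter are hypothetical; the payload keys are the ones already used in the wrapper:

```python
import json
from typing import Optional


def build_payload(zone: str, url: str, expect_text: Optional[str] = None) -> dict:
    """Build the request body for the Web Unlocker call (illustrative helper)."""
    payload = {"zone": zone, "url": url, "format": "raw", "method": "GET"}
    if expect_text is not None:
        # json.dumps handles the escaping that was written by hand in the wrapper.
        payload["headers"] = {"x-unblock-expect": json.dumps({"text": expect_text})}
    return payload
```

Because the function does no I/O, it can be unit-tested without an API key, and the wrapper can simply pass json=build_payload(BRIGHT_DATA_ZONE, url, "reviews") to requests.post.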
Step 5: Extract Structured Data From the Web Unlocker Response
Once Web Unlocker returns a usable page body, the extraction code becomes ordinary BeautifulSoup again:
# Uses fetch_with_web_unlocker() and TARGET_URL from Step 4
import json

from bs4 import BeautifulSoup

html = fetch_with_web_unlocker(TARGET_URL)
soup = BeautifulSoup(html, "html.parser")

headings = [
    h.get_text(strip=True)
    for h in soup.find_all("h2")
]
page_text = soup.get_text(" ", strip=True)

# Simple signals that show the page body is real content, not a block page.
result = {
    "url": TARGET_URL,
    "html_length": len(html),
    "headings_count": len(headings),
    "first_headings": headings[:5],
    "contains_mongodb": "mongodb" in page_text.lower(),
    "contains_reviews": "reviews" in page_text.lower(),
}

print(json.dumps(result, indent=2))
This is the main benefit of the approach: the extraction layer stays simple. The parser does not need to know whether the original page was protected, blocked for raw requests, or difficult to access directly. It receives HTML and extracts data from it.
The complexity moves out of the scraper and into the access layer, which is exactly where Web Unlocker fits.
Step 6: Compare Raw Requests and Web Unlocker
To make the difference visible, here is a side-by-side comparison script. The goal: send the same URL through raw requests and through Web Unlocker API, then compare the response status, HTML size, and extracted headings.
import os

import requests
from bs4 import BeautifulSoup

WEB_UNLOCKER_ENDPOINT = "https://api.brightdata.com/request"
BRIGHT_DATA_API_KEY = os.environ["BRIGHT_DATA_API_KEY"]
BRIGHT_DATA_ZONE = os.environ["BRIGHT_DATA_ZONE"]
TARGET_URL = "https://www.g2.com/products/mongodb/reviews"


def fetch_with_requests(url: str) -> tuple[int, str]:
    """Baseline: direct GET with a generic User-Agent."""
    response = requests.get(
        url,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    return response.status_code, response.text


def fetch_with_web_unlocker(url: str, timeout: int = 90) -> tuple[int, str]:
    """Same URL, sent through the Web Unlocker zone."""
    payload = {
        "zone": BRIGHT_DATA_ZONE,
        "url": url,
        "format": "raw",
        "method": "GET",
        "headers": {
            "x-unblock-expect": "{\"text\": \"reviews\"}"
        },
    }
    headers = {
        "Authorization": f"Bearer {BRIGHT_DATA_API_KEY}",
        "Content-Type": "application/json",
    }
    response = requests.post(
        WEB_UNLOCKER_ENDPOINT,
        json=payload,
        headers=headers,
        timeout=timeout,
    )
    # No raise_for_status() here: the status code is part of the comparison.
    return response.status_code, response.text


# --- compare ---
raw_status, raw_html = fetch_with_requests(TARGET_URL)
unlocker_status, unlocker_html = fetch_with_web_unlocker(TARGET_URL)

for label, status, html in [
    ("raw requests", raw_status, raw_html),
    ("web unlocker", unlocker_status, unlocker_html),
]:
    soup = BeautifulSoup(html, "html.parser")
    headings = [h.get_text(strip=True) for h in soup.find_all("h2")]
    print(f"[{label}] status={status}, html_len={len(html)}, headings={headings[:3]}")
| Method | Status | HTML length | Headings found |
|---|---|---|---|
| Raw requests | 403 | ~2 KB | 0 |
| Web Unlocker | 200 | ~800 KB | 15+ |
Raw requests receives a blocked 403 response. Web Unlocker returns a 200 response with parseable headings from the same G2 page, including sections like "Value at a Glance", "Top-Rated Alternatives", and "MongoDB Integrations".
Step 7: Save the Output
For a small scraping job, JSON Lines is a convenient output format. Each record is written as a separate line, which makes the file easy to inspect, append, and process later.
# Uses unlocker_html and TARGET_URL from Step 6
import json
from pathlib import Path

from bs4 import BeautifulSoup

output_path = Path("g2_mongodb_reviews.jsonl")

unlocker_soup = BeautifulSoup(unlocker_html, "html.parser")
page_text = unlocker_soup.get_text(" ", strip=True)
unlocker_headings = [h.get_text(strip=True) for h in unlocker_soup.find_all("h2")]

record = {
    "url": TARGET_URL,
    "html_length": len(unlocker_html),
    "headings_count": len(unlocker_headings),
    "first_headings": unlocker_headings[:5],
    "contains_mongodb": "mongodb" in page_text.lower(),
    "contains_reviews": "reviews" in page_text.lower(),
}

# One JSON object per line; "w" overwrites the file on every run.
with output_path.open("w", encoding="utf-8") as file:
    file.write(json.dumps(record, ensure_ascii=False) + "\n")

print(f"saved: {output_path}")
At this point, the scraper has a clean shape: fetch the page body through Web Unlocker, parse the HTML with BeautifulSoup, extract structured signals, and save the result.
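The script above opens the file in "w" mode, so each run overwrites the previous record. If the job later covers several URLs or runs on a schedule, appending is usually what you want. A minimal sketch, with append_record and read_records as hypothetical helper names:

```python
import json
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single JSON line, creating the file if needed."""
    with path.open("a", encoding="utf-8") as file:
        file.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path: Path) -> list:
    """Read every non-empty line of a JSON Lines file back into dicts."""
    with path.open(encoding="utf-8") as file:
        return [json.loads(line) for line in file if line.strip()]
```

Reading the file back line by line is also a quick way to confirm that each record is valid JSON before another tool consumes it.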
What Actually Mattered in Practice
1. Check the response before debugging the parser
When a scraper returns empty results, the instinct is to change selectors. Before touching parsing logic, inspect the raw response. If the status code is 403 or the HTML is a block page, no selector fix will help.
2. Direct API access kept the code minimal
The scraper authenticated with a Bearer token to https://api.brightdata.com/request. No proxy credentials, no browser lifecycle, no rotation logic. The integration stayed close to a normal requests workflow.
3. The UI setup mattered
Bright Data has several products that solve different scraping problems. Browser API, Web Unlocker API, SERP API, and proxy zones are not the same thing. For this article, the required product was Web Unlocker API under Web Access APIs. I also enabled Manual 'expect' elements for the zone because the request uses x-unblock-expect. Without that setting, the API can reject the manual expect parameter.
4. Know when to upgrade to Browser API
If the workflow requires clicks, form fills, or multi-step navigation, Browser API with Playwright is the better tool. Web Unlocker covers the simpler case: one URL in, rendered HTML out.
Conclusion
The main value of Bright Data Web Unlocker API is that it keeps a certain class of scraping tasks simple.
Raw requests is still a good starting point for static pages. Browser API and Playwright are still the right tools when a workflow requires browser interaction. But many real scraping tasks sit between those two ends of the spectrum.
The page may be protected, blocked, or difficult to access directly. The workflow may still only need one thing: a usable page body that can be parsed with normal Python tools.
That is the gap Web Unlocker fills.
The scraper sent one Python request to Web Unlocker API, received a large parseable HTML response from a protected G2 page, extracted structured signals with BeautifulSoup, and saved the result as JSON Lines. The code stayed close to the original requests workflow, while Web Unlocker handled the access layer that usually makes modern scraping fragile.
That is the practical win: the scraper becomes a parser again.
Try Web Unlocker API when your scraper needs rendered HTML without managing proxies or browser sessions. For interactive workflows with clicks, logins, or forms, use Browser API with Playwright.


