How to Scrape Real Estate Listings in 2026 (API & Python)
Tony Wang · 5 min read
How to scrape real estate listings in 2026 — DIY Python, no-code tools, or a structured API for Zillow property data — with the legal basics and portal tips.
The fastest way to scrape real estate listings in 2026 is to call a structured API that returns normalized JSON — search results and home details like price, beds, baths, and address — instead of parsing JavaScript-heavy portal pages and fighting anti-bot defenses. You can build a DIY scraper, but the big portals actively block automation, and real estate carries copyright risks most guides skip. This guide covers all three approaches, the four major US portals, and the legal basics. For the portal-specific deep dive, see how to scrape Zillow.
Is it legal to scrape real estate listings?
Scraping public listing facts (price, address, beds/baths, status) is generally lower-risk public-web scraping. In the US, the Ninth Circuit held in hiQ v. LinkedIn that accessing publicly available data likely isn't a CFAA violation, and facts like prices aren't copyrightable. But real estate has its own twist worth taking seriously:
- Listing photos and descriptions are copyrighted, and that copyright is enforced. CoStar, which owns LoopNet, Apartments.com, and Homes.com, is the most litigious player in the space. It has sued Zillow over tens of thousands of allegedly copied listing photos, and it won against CREXi after that company accessed CoStar's password-protected data and copied photos and listings. The lesson: stick to public, factual fields; never copy listing photos or agent descriptions.
- Never bypass a login or an access block. The CREXi case turned on accessing password-protected content and ignoring blocking notices — that’s where liability spikes, separate from reading a public page.
- Fair-housing rules apply to how you use the data (e.g. lead targeting), not just how you collect it.
Use public, factual data, respect each portal's terms, and see our guide on whether web scraping is legal. Not legal advice.
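If you collect data programmatically, you can enforce "public, factual fields only" with an allowlist that runs before anything is stored. That way photo URLs and agent-written descriptions never reach your database. Below is a minimal Python sketch. The field names are hypothetical and loosely modeled on the illustrative API response later in this guide, so adjust them to the schema you actually receive.

```python
# Hypothetical allowlist of factual listing fields; adjust to your real schema.
FACTUAL_FIELDS = {
    "zpid", "address", "price", "bedrooms", "bathrooms",
    "living_area", "lot_size", "home_type", "status",
}


def keep_factual(record: dict) -> dict:
    """Return a copy of record containing only allowlisted factual fields.

    Everything else (photo URLs, agent descriptions, unknown keys) is dropped,
    so copyrighted media is never stored by accident.
    """
    return {k: v for k, v in record.items() if k in FACTUAL_FIELDS}


raw = {
    "zpid": "12345678",
    "price": 625000,
    "photos": ["https://example.com/1.jpg"],
    "description": "Charming bungalow...",
}
print(keep_factual(raw))  # {'zpid': '12345678', 'price': 625000}
```

An allowlist is safer than a blocklist here: a new media field added upstream is dropped by default instead of slipping through.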
The four major US portals at a glance
| Portal | Official API? | Anti-bot | Best data |
|---|---|---|---|
| Zillow | Unofficial only (Bridge API is partner-gated) | High — Imperva (Incapsula) | Zestimate, price history, tax assessment |
| Realtor.com | No public API | High — Akamai | MLS-accurate active listings, open houses |
| Redfin | Partial — offers data/CSV downloads | Medium — Cloudflare + rate limits | Sold data, Redfin Estimate, HOA, year built |
| Trulia | No (Zillow-owned) | Medium-High — shares Zillow’s Imperva stack | Neighborhood insights: crime, commute, noise |
None offers an open public listings API, which is why teams scrape — and why a structured or managed API is usually the path of least resistance.
Option 1: DIY in Python (and why it breaks)
Real estate portals render with heavy JavaScript and defend aggressively, so you reach for a headless browser:
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.zillow.com/austin-tx/")
    # From here you still have to parse the embedded JSON blob,
    # page through map regions, and get past the CAPTCHA...
    browser.close()
```
It works in a demo, then breaks. Zillow runs Imperva (Incapsula) with JavaScript challenges, fingerprinting, and behavioral analysis; Realtor.com adds Akamai sensor checks; Redfin layers Cloudflare and rate limiting. A naive requests.get() is blocked almost immediately, and even a stealth browser needs constant upkeep as those defenses update. (For why fetching, not parsing, is the real bottleneck, see AI vs traditional web scraping.)
Option 2: No-code tools
Visual (point-and-click) extractors export CSV or JSON and suit one-off pulls. On protected portals, though, they get expensive and brittle, and they aren't ideal for in-product pipelines that need predictable fields.
Option 3: A structured real estate API
For repeatable workflows, a real estate data API returns normalized JSON with no browser to run. Crawlora's supported portal today is Zillow. Search by location (for the most stable request shape, resolve the location with the autocomplete endpoint first):
```bash
curl -s "https://api.crawlora.net/api/v1/zillow/search?location=Austin,%20TX" \
  -H "x-api-key: $CRAWLORA_API_KEY"
```
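Here is the same search call from Python as a minimal, dependency-free sketch. It uses the curl request's endpoint, location query parameter, and x-api-key header, and it lets urllib encode the location rather than hand-writing %20. It assumes the response envelope looks like the illustrative one in this guide (code, msg, and a list of listings under data). Check the API docs for the real schema.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.crawlora.net/api/v1/zillow/search"


def build_search_url(location: str) -> str:
    """Percent-encode the location, e.g. "Austin, TX" -> "location=Austin%2C%20TX"."""
    query = urllib.parse.urlencode({"location": location}, quote_via=urllib.parse.quote)
    return f"{SEARCH_URL}?{query}"


def parse_search_response(payload: dict) -> list:
    """Return the listings under "data", or raise if the envelope reports an error.

    Assumes the code/msg/data envelope of the illustrative response (an assumption).
    """
    if payload.get("code") != 200:
        raise RuntimeError(f"API error {payload.get('code')}: {payload.get('msg')}")
    return payload.get("data") or []


def search_listings(location: str, api_key: str) -> list:
    """Call the search endpoint; urlopen raises HTTPError on 4xx/5xx responses."""
    req = urllib.request.Request(build_search_url(location), headers={"x-api-key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_search_response(json.load(resp))


# Usage (needs network access and a key):
#   import os
#   for home in search_listings("Austin, TX", os.environ["CRAWLORA_API_KEY"]):
#       print(home.get("zpid"), home.get("price"), home.get("address"))
```

Keeping URL building and response parsing as separate pure functions lets you test them without network access. The network call stays a thin wrapper.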
Fetch a single listing by ZPID in Python:
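```python
import os

import requests

resp = requests.get(
    "https://api.crawlora.net/api/v1/zillow/property/12345678",
    headers={"x-api-key": os.environ["CRAWLORA_API_KEY"]},  # same key as the curl example
    timeout=30,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
prop = resp.json()["data"]  # the property record sits under "data"
print(prop.get("address"), prop.get("price"), prop.get("bedrooms"))
```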
A response is normalized JSON you can store directly (fields are illustrative — check the docs):
```json
{
  "code": 200,
  "msg": "OK",
  "data": [
    {
      "zpid": "12345678",
      "address": "123 Example St, Austin, TX",
      "price": 625000,
      "bedrooms": 3,
      "bathrooms": 2,
      "status": "FOR_SALE"
    }
  ]
}
```
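Because the response is already normalized, you can store it with a single upsert keyed on the ZPID. Re-running a search then updates existing rows instead of duplicating them. Below is a minimal sketch using Python's built-in sqlite3. The column set mirrors the illustrative response above and is an assumption, and the ON CONFLICT upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

# Columns mirror the illustrative response; treat them as assumptions, not the documented schema.
COLUMNS = ("zpid", "address", "price", "bedrooms", "bathrooms", "status")

SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    zpid      TEXT PRIMARY KEY,
    address   TEXT,
    price     INTEGER,
    bedrooms  INTEGER,
    bathrooms REAL,
    status    TEXT
)
"""

UPSERT = """
INSERT INTO listings (zpid, address, price, bedrooms, bathrooms, status)
VALUES (:zpid, :address, :price, :bedrooms, :bathrooms, :status)
ON CONFLICT(zpid) DO UPDATE SET
    address = excluded.address,
    price = excluded.price,
    bedrooms = excluded.bedrooms,
    bathrooms = excluded.bathrooms,
    status = excluded.status
"""


def upsert_listings(conn: sqlite3.Connection, records: list) -> None:
    """Insert new listings and overwrite existing rows that share a zpid."""
    conn.execute(SCHEMA)
    # .get() stores a field the listing doesn't expose as NULL instead of raising.
    rows = [{col: r.get(col) for col in COLUMNS} for r in records]
    conn.executemany(UPSERT, rows)
    conn.commit()


payload = {"code": 200, "msg": "OK", "data": [{
    "zpid": "12345678", "address": "123 Example St, Austin, TX",
    "price": 625000, "bedrooms": 3, "bathrooms": 2, "status": "FOR_SALE",
}]}
conn = sqlite3.connect(":memory:")
upsert_listings(conn, payload["data"])
payload["data"][0]["price"] = 615000  # same listing, new price on a later run
upsert_listings(conn, payload["data"])
print(conn.execute("SELECT zpid, price FROM listings").fetchall())  # [('12345678', 615000)]
```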
What you can collect
Where the public listing exposes them: ZPID, address, price, beds, baths, living area, lot size, home type, status, and broker, plus the search or ZPID context you requested. Resolve locations first with the autocomplete endpoint for the most stable request shape.
Portal by portal: what’s there, and how hard
- Zillow is the most data-rich (Zestimate, full price history, tax assessment, schools) and the most protected. Roughly 40% of the useful data is in the page's JSON-LD; the rest lives in an embedded __NEXT_DATA__-style blob that changes shape. This is the portal Crawlora supports today; see the how to scrape Zillow deep dive.
- Redfin is the friendliest: it publishes downloadable data for search results and has lighter bot detection, so sold prices, HOA, lot size, year built, and the Redfin Estimate are the most accessible.
- Realtor.com pulls directly from MLS, making it the most accurate for active listings (MLS numbers, listing office, open houses) — but Akamai makes it one of the hardest to collect at scale.
- Trulia (Zillow-owned) shares the same data and stack; its differentiator is neighborhood data — crime, commute times, noise, and local reviews.
- LoopNet / CoStar (commercial real estate) is a special case: rich data, but the most aggressive legal enforcement in the industry. Treat it with extra caution.
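On the DIY route, the page's JSON-LD and the embedded __NEXT_DATA__-style blob are the parts worth extracting. Below is a minimal standard-library sketch. It assumes you already have the page HTML, which is the hard part. It collects every script tag with type "application/ld+json" and the script with id "__NEXT_DATA__" (the Next.js convention). Whether a portal keeps using these tags, and which keys they contain, is not guaranteed. The blob changes shape, so read it defensively.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONParser(HTMLParser):
    """Collect JSON-LD blocks and the __NEXT_DATA__ blob from a page's <script> tags."""

    def __init__(self):
        super().__init__()
        self.json_ld = []        # every parsed JSON-LD block on the page
        self.next_data = None    # the parsed __NEXT_DATA__ blob, if present
        self._target = None      # which bucket the current <script> feeds, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "application/ld+json":
            self._target = "ld"
        elif attrs.get("id") == "__NEXT_DATA__":
            self._target = "next"
        else:
            self._target = None
        self._buf = []

    def handle_data(self, data):
        # Script bodies can arrive in several chunks, so buffer them.
        if self._target:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._target:
            return
        try:
            parsed = json.loads("".join(self._buf))
        except json.JSONDecodeError:
            parsed = None  # malformed or templated block; skip it
        if parsed is not None:
            if self._target == "ld":
                self.json_ld.append(parsed)
            else:
                self.next_data = parsed
        self._target = None


html = (
    '<script type="application/ld+json">{"name": "123 Example St"}</script>'
    '<script id="__NEXT_DATA__" type="application/json">{"props": {"pageProps": {}}}</script>'
)
parser = EmbeddedJSONParser()
parser.feed(html)
parser.close()
print(parser.json_ld[0]["name"])  # 123 Example St
print(list(parser.next_data))     # ['props']
```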
For portals Crawlora doesn’t yet document, you’ll use a general scraping setup or another tool — and the same legal basics apply. Tell us which portals you need and we’ll prioritize coverage.
Anti-bot reality at scale
Running real estate scraping in production means accepting a few realities the demos skip:
- Residential proxies are mandatory. Datacenter IPs are burned within hours; you need US residential IPs, with sticky sessions for Zillow (which serves different data by location).
- Pace yourself. Space requests several seconds apart with jitter; without proxies, practitioners report hitting blocks after roughly 20–50 detail pages per IP per day.
- Bypasses rot. Imperva, Akamai, Cloudflare, and PerimeterX update continuously, so open-source workarounds last weeks, not months.
- Listings change daily. Prices, status, and photos shift constantly, so you re-scrape on a schedule — which multiplies every cost above.
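The pacing advice can be sketched as a small fetch loop. Each request is spaced by a base delay plus random jitter, and block-style responses trigger exponential backoff. Treating HTTP 403 and 429 as the block signal is an assumption; real block pages vary. fetch is a hypothetical callable you supply (your HTTP client behind a residential proxy). The 5-second base and 3-second jitter are placeholders, not tuned values.

```python
import random
import time

# Treating 403 and 429 as "blocked, slow down" is an assumption; real block pages vary.
BLOCK_STATUSES = {403, 429}


def polite_fetch_all(urls, fetch, base_delay=5.0, jitter=3.0, max_retries=3,
                     sleep=time.sleep, rng=random.random):
    """Fetch URLs sequentially with jittered spacing and exponential backoff.

    fetch(url) is a hypothetical callable you supply that returns
    (status_code, body). sleep and rng are injectable so the loop can be
    tested without real waiting.
    """
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status not in BLOCK_STATUSES:
                results[url] = body
                break
            if attempt < max_retries:
                # Back off base, 2*base, 4*base, ... plus jitter after each block signal.
                sleep(base_delay * 2 ** attempt + rng() * jitter)
        else:
            results[url] = None  # still blocked after every retry; give up on this URL
        # Space consecutive requests by base_delay plus random jitter.
        sleep(base_delay + rng() * jitter)
    return results
```

Injecting sleep and rng keeps the timing logic testable. In production, pass your real HTTP call as fetch and keep the defaults.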
This anti-bot, proxy, and re-scrape burden is exactly what a structured or managed API absorbs behind one key — so you spend time on the data, not the defenses.
Where this gets used
- Investor deal-finding — track listings, prices, and inventory by market and score deals.
- Comparables & market research — pull comparable listings for an area. See property market intelligence.
- Lead and territory mapping — combine listing context with local data for real-estate workflows.
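Because listings change daily, tracking deals usually means comparing today's pull with yesterday's. Below is a minimal sketch. It assumes each snapshot is a dict keyed by ZPID and uses the price and status field names from the illustrative response earlier in this guide.

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Compare two {zpid: listing} snapshots taken on different days.

    "removed" means missing from the newer pull: sold, delisted, or simply
    not returned this time, so confirm before acting on it.
    """
    changes = {}
    for zpid in old.keys() & new.keys():
        delta = {
            field: (old[zpid].get(field), new[zpid].get(field))
            for field in ("price", "status")
            if old[zpid].get(field) != new[zpid].get(field)
        }
        if delta:
            changes[zpid] = delta
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changes,
    }


yesterday = {"1": {"price": 625000, "status": "FOR_SALE"},
             "2": {"price": 410000, "status": "FOR_SALE"}}
today = {"1": {"price": 615000, "status": "FOR_SALE"},
         "3": {"price": 550000, "status": "FOR_SALE"}}
print(diff_snapshots(yesterday, today))
# {'added': ['3'], 'removed': ['2'], 'changed': {'1': {'price': (625000, 615000)}}}
```

Diffing keeps downstream work (alerts, deal scoring) proportional to what changed, not to the size of the whole market.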
Start collecting
Test the search endpoint in the Playground, check the schema in the API docs, and see the real estate data API page. See also how to scrape Zillow and our guide on whether web scraping is legal.
Frequently asked questions
What is the easiest way to scrape real estate listings?
Call a structured API that returns search results and home details as JSON, instead of running a headless browser against portal HTML. Crawlora's Zillow endpoints return normalized property records (price, beds, baths, address) from one API key.
Is it legal to scrape real estate listings?
Collecting public listing facts (price, address, beds/baths, status) is lower-risk, but portal terms usually prohibit automated access, and photos and descriptions can be copyrighted. Stick to public factual data, respect fair-housing rules and portal terms, and do not republish media. See our guide on whether web scraping is legal. Not legal advice.
Which real estate sites are hardest to scrape?
Zillow (and Trulia, which shares its stack) run Imperva, and Realtor.com runs Akamai, so they are the toughest at scale; Redfin is lighter (Cloudflare plus rate limits) and even publishes downloadable data. All require residential proxies and careful pacing — datacenter IPs get blocked within hours.
Is it legal to scrape real estate listing photos?
Treat photos and agent descriptions as copyrighted — don't copy or republish them. CoStar (LoopNet, Apartments.com, Homes.com) aggressively enforces listing-photo copyright and has sued over copied images. Stick to public factual fields like price, address, and beds/baths, and never bypass a login or access block.
Which real estate portals can Crawlora scrape?
Crawlora's documented real-estate endpoints today cover Zillow — search, property detail, and autocomplete. For other portals such as Redfin, Realtor.com, Trulia, and LoopNet you will use a general setup or another tool; tell us which portals to prioritize.
Can I scrape Zillow specifically?
Yes — resolve a location with autocomplete, call Zillow search by location, and fetch a listing by ZPID. See the dedicated how to scrape Zillow guide for portal-specific detail.