How to Scrape Shopify Stores in 2026 (API & Python)
Tony Wang, 4 min read
Three ways to scrape Shopify store products and collections in 2026 — DIY Python, no-code tools, or a structured API — what each returns and the legal basics.
The fastest way to scrape Shopify stores in 2026 is to call a structured API that returns normalized JSON — products, variants, prices, collections, and store metadata — instead of crawling each storefront and parsing it yourself. DIY in Python is possible because most Shopify stores expose a public products feed, but variant handling, pagination, and per-store quirks make a maintained endpoint the simpler path at scale.
Shopify's own Admin and Storefront APIs require the store owner's credentials — they're for stores you control. To research storefronts you don't own, you collect the public product surface, which is what a structured scraping API normalizes for you.
Is it legal to scrape Shopify stores?
Storefront product pages are public, and collecting public data is generally treated differently from accessing private accounts — with the usual conditions:
- Collect only public storefront data — no admin or checkout access.
- Respect each store's terms and robots directives, plus your local law.
- Don't reuse product imagery or copy beyond what your use case and law allow.
- You are responsible for lawful, good-faith use of what you collect.
This is not legal advice; see "Is web scraping legal in 2026?" for the full picture.
Option 1: DIY in Python (and why it breaks)
Many Shopify stores expose a public products.json feed, so a first pass looks easy:
import csv
import requests
resp = requests.get(
    "https://www.allbirds.com/products.json",
    params={"limit": 250, "page": 1},  # walk page=1,2,... until the list is empty
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)
resp.raise_for_status()  # surfaces HTTP 430 rate limits and other error statuses
rows = []
for p in resp.json()["products"]:
    for v in p["variants"]:  # flatten nested variants into rows
        rows.append({"title": p["title"], "handle": p["handle"],
                     "variant": v["title"], "price": v["price"], "available": v["available"]})
fields = ["title", "handle", "variant", "price", "available"]  # explicit, so an empty feed still writes a header
with open("shopify.csv", "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=fields)
    w.writeheader()
    w.writerows(rows)
# ...then handle HTTP 430 rate limits and stores that disable /products.json
Where it gets expensive:
- Inconsistent exposure: some stores disable the public feed, and Shopify returns HTTP 430 when you hit its rate limit, so you need backoff, proxies, and fallbacks.
- Variant flattening: each product has nested variants, options, and images to normalize into rows.
- Pagination: /products.json caps at 250 per page; large catalogs span many pages to walk and dedupe, and /collections.json is a second crawl.
- Per-store differences: currencies, availability, and metafields vary across themes.
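The pagination and rate-limit points above can be sketched as a small generator. This is a minimal sketch, not a hardened crawler: the helper names (fetch_page, iter_products), the retry count, and the delay schedule are assumptions made for illustration, and treating HTTP 429 the same as 430 is a precaution rather than documented Shopify behavior.

```python
import time
from typing import Callable, Iterator, List, Tuple

Page = Tuple[int, List[dict]]  # (HTTP status, products on that page)


def fetch_page(store: str, page: int, limit: int = 250) -> Page:
    """Fetch one page of a store's public products feed (hypothetical helper)."""
    import requests  # imported here so the paging logic can be exercised offline

    resp = requests.get(
        f"{store.rstrip('/')}/products.json",
        params={"limit": limit, "page": page},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )
    products = resp.json().get("products", []) if resp.ok else []
    return resp.status_code, products


def iter_products(
    store: str,
    fetch: Callable[[str, int], Page] = fetch_page,
    max_retries: int = 4,       # assumed value, tune per store
    base_delay: float = 1.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every product, walking pages until one comes back empty."""
    page = 1
    while True:
        for attempt in range(max_retries + 1):
            status, products = fetch(store, page)
            if status not in (429, 430):
                break
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        if status != 200:
            raise RuntimeError(f"page {page} failed with HTTP {status}")
        if not products:
            return
        yield from products
        page += 1
```

Calling list(iter_products("https://www.allbirds.com")) would walk the whole feed; passing your own fetch callable lets you swap in proxies or a fallback source without touching the paging loop.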
Option 2: No-code and ready-made tools
Point-and-click exporters can dump one store, but catalog and price monitoring means re-checking many stores on a schedule and storing history — a pipeline that an API serves better than a manual tool.
Option 3: A structured Shopify API
Crawlora's Shopify API wraps product, collection, and store endpoints behind one API key, returning normalized JSON. Point it at a store URL:
curl -G "https://api.crawlora.net/api/v1/shopify/products" \
  -H "x-api-key: $CRAWLORA_API_KEY" \
  --data-urlencode "url=https://www.allbirds.com" \
  --data-urlencode "limit=50"
import requests
resp = requests.get(
    "https://api.crawlora.net/api/v1/shopify/products",
    headers={"x-api-key": "YOUR_API_KEY"},
    params={"url": "https://www.allbirds.com", "limit": 50},
)
for product in resp.json()["data"]["products"]:
    print(product["title"], product["price"], product["handle"])
A response is normalized JSON you can store directly (fields are illustrative — confirm the schema in the docs):
{
  "code": 200,
  "msg": "OK",
  "data": {
    "products": [
      {
        "handle": "wool-runner",
        "title": "Wool Runner",
        "price": 98.0,
        "currency": "USD",
        "available": true,
        "variants": [{ "title": "US 9", "price": 98.0, "available": true }]
      }
    ]
  }
}
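To store a response like this in a table or CSV, the nested variants need to become one row each. The sketch below assumes the illustrative field names shown above (data.products, variants, price, and so on); the real schema may differ, and flatten_products and to_csv are hypothetical helper names.

```python
import csv
import io
from typing import List

# Column order for the flattened rows; field names follow the illustrative schema.
FIELDS = ["handle", "title", "currency", "variant", "price", "available"]


def flatten_products(payload: dict) -> List[dict]:
    """Turn a products response into one row per variant."""
    rows = []
    for product in payload.get("data", {}).get("products", []):
        for variant in product.get("variants", []):
            rows.append({
                "handle": product.get("handle"),
                "title": product.get("title"),
                "currency": product.get("currency"),
                "variant": variant.get("title"),
                "price": variant.get("price"),
                "available": variant.get("available"),
            })
    return rows


def to_csv(rows: List[dict]) -> str:
    """Serialize flattened rows to CSV text, header included."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Using .get() throughout means a missing field becomes an empty cell instead of raising, which is a deliberate choice for a schema that has not been confirmed against the docs.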
From there, map the catalog with collections, pull full product detail by handle, or read store metadata — all from the same key (every endpoint takes the store url):
import requests
h = {"x-api-key": "YOUR_API_KEY"}
base, store = "https://api.crawlora.net/api/v1/shopify", "https://www.allbirds.com"
collections = requests.get(f"{base}/collections", headers=h, params={"url": store}).json()["data"]
product = requests.get(f"{base}/products/wool-runner", headers=h, params={"url": store}).json()["data"]
meta = requests.get(f"{base}/store", headers=h, params={"url": store}).json()["data"]
Use /collections/{handle}/products to walk one collection, and the sitemap endpoints to discover every product and collection URL on the storefront — handy when /products.json is disabled.
What you can collect
- Products: title, handle, price, availability, images, and variants
- Collections and the products within a collection
- Store metadata and storefront sitemaps
- Search suggestions and product recommendations
Limitations and common challenges
- Not every store exposes the feed. Some disable /products.json or rate-limit it (HTTP 430); the store endpoint can fall back to a public *.myshopify.com domain, but coverage isn't guaranteed, so use the sitemap endpoints as a backup for discovery.
- Variants and metafields. Products nest variants, options, and images; flatten them into rows and expect theme-specific metafields to differ across stores.
- Pagination. Products and collections page at 250 max per page; walk and dedupe across pages.
- Imagery and copy are copyrighted. Prices and availability are facts you can collect; product photos and descriptions carry copyright — don't republish beyond what your use case and law allow.
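Two of the points above, deduping across pages and leaving copyrighted fields out of what you store, can be handled in one pass. This is a rough sketch: dedupe_products is a hypothetical helper, and body_html and images are assumed field names for the description and photos, so check them against the payload you actually receive.

```python
from typing import Iterable, List, Tuple

# Assumed names of the fields that carry copyrighted copy and imagery.
COPYRIGHTED_FIELDS: Tuple[str, ...] = ("body_html", "images")


def dedupe_products(
    products: Iterable[dict],
    key: str = "handle",
    drop: Tuple[str, ...] = COPYRIGHTED_FIELDS,
) -> List[dict]:
    """Keep the first product seen per key and strip the fields named in drop."""
    seen = {}
    for product in products:
        k = product[key]
        if k in seen:
            continue  # same product surfaced again on a later page
        seen[k] = {f: v for f, v in product.items() if f not in drop}
    return list(seen.values())
```

Keeping the first occurrence is an arbitrary choice; if a catalog changes while you are walking it, keeping the last occurrence may be the better fit.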
Where this fits
Try it first, free: run any public URL through the Free Web Scraper, or check whether a site blocks bots with the Anti-Bot Checker — no signup.
Shopify data powers catalog monitoring, competitor price tracking, and assortment research. Combine it with the Shop.app API for the consumer marketplace view and the Amazon scraping API for cross-channel pricing, all under the e-commerce product intelligence workflow. For the same playbook on other marketplaces, see how to scrape Amazon product data and how to scrape eBay, or how to choose a web scraping API.
Get started by testing the endpoint in the Playground, reading the request and response schema in the API docs, and reviewing credit costs on the pricing page.
Frequently asked questions
Can I scrape Shopify stores without getting blocked?
Crawlora handles proxy routing, pacing, retries, and fallbacks behind the API and returns normalized JSON, including for stores that rate-limit (HTTP 430) or disable the public products feed.
Does every Shopify store have a products.json feed?
Most do — /products.json is a credential-free endpoint returning up to 250 products per page — but some stores disable it or rate-limit it (HTTP 430). When the vanity domain blocks it, Crawlora can fall back to a public *.myshopify.com domain, and the sitemap endpoints discover product and collection URLs as a backup.
Doesn't Shopify already have an API?
Shopify's Admin and Storefront APIs require the store owner's credentials and are for stores you control. To research storefronts you don't own, Crawlora collects the public product surface as normalized JSON.
What data can I collect?
Products with variants, prices, availability, and images; collections and the products within them; store metadata; static pages; storefront sitemaps; search suggestions; and product recommendations. Prices and availability are facts; product imagery and copy are copyrighted.
How do I target a specific store?
Pass the store URL to the products endpoint. Use the product handle (with the store url) for full detail, the collections endpoint to map catalog structure, and the sitemap endpoints to enumerate URLs.
Can I monitor prices and catalog changes?
Yes. Re-run product and collection calls on a schedule and store the history to track price moves and assortment changes.
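Once two scheduled runs are stored, tracking changes comes down to comparing snapshots. The sketch below assumes each snapshot is a list of flattened rows with handle, variant, and price keys; that row shape and the diff_snapshots name are assumptions for illustration, not part of any API.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (product handle, variant title)


def index_rows(rows: List[dict]) -> Dict[Key, object]:
    """Map each (handle, variant) pair to its price."""
    return {(r["handle"], r["variant"]): r["price"] for r in rows}


def diff_snapshots(old_rows: List[dict], new_rows: List[dict]) -> dict:
    """Report variants added, removed, or repriced between two runs."""
    old, new = index_rows(old_rows), index_rows(new_rows)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "price_changes": {
            k: (old[k], new[k])
            for k in sorted(old.keys() & new.keys())
            if old[k] != new[k]
        },
    }


# Made-up sample data, only to show the shape of the result.
yesterday = [{"handle": "wool-runner", "variant": "US 9", "price": 98.0}]
today = [{"handle": "wool-runner", "variant": "US 9", "price": 88.0}]
changes = diff_snapshots(yesterday, today)
# changes["price_changes"] == {("wool-runner", "US 9"): (98.0, 88.0)}
```

Prices are compared as they arrive, so normalize strings and numbers to one type before diffing if your two sources differ.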