Tony Wang · June 19, 2026 · 4 min read

How to Scrape Shopify Stores in 2026 (API & Python)

Three ways to scrape Shopify store products and collections in 2026 — DIY Python, no-code tools, or a structured API — what each returns and the legal basics.

Shopify Guide Web Scraping API

The fastest way to scrape Shopify stores in 2026 is to call a structured API that returns normalized JSON — products, variants, prices, collections, and store metadata — instead of crawling each storefront and parsing it yourself. DIY in Python is possible because most Shopify stores expose a public products feed, but variant handling, pagination, and per-store quirks make a maintained endpoint the simpler path at scale.

Shopify's own Admin and Storefront APIs require the store owner's credentials — they're for stores you control. To research storefronts you don't own, you collect the public product surface, which is what a structured scraping API normalizes for you.

Is it legal to scrape Shopify stores?

Storefront product pages are public, and collecting public data is generally treated differently from accessing private accounts — with the usual conditions:

Collect only public storefront data — no admin or checkout access.
Respect each store's terms and robots directives, plus your local law.
Don't reuse product imagery or copy beyond what your use case and law allow.
You are responsible for lawful, good-faith use of what you collect.

Not legal advice — see Is web scraping legal in 2026? for the full picture.

Option 1: DIY in Python (and why it breaks)

Many Shopify stores expose a public products.json feed, so a first pass looks easy:

import csv
import requests

resp = requests.get(
    "https://www.allbirds.com/products.json",
    params={"limit": 250, "page": 1},  # walk page=1,2,... until the list is empty
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on HTTP 430 or a disabled feed

rows = []
for p in resp.json()["products"]:
    for v in p["variants"]:  # flatten nested variants into rows
        rows.append({
            "title": p["title"], "handle": p["handle"],
            "variant": v["title"], "price": v["price"], "available": v["available"],
        })

# Explicit field names so an empty catalog still writes a valid header row
fieldnames = ["title", "handle", "variant", "price", "available"]
with open("shopify.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=fieldnames)
    w.writeheader()
    w.writerows(rows)
# ...then handle HTTP 430 rate limits and stores that disable /products.json

Where it gets expensive:

Inconsistent exposure — some stores disable the public feed, and Shopify returns HTTP 430 when you hit its rate limit, so you need backoff, proxies, and fallbacks.
Variant flattening — each product has nested variants, options, and images to normalize into rows.
Pagination — /products.json caps at 250 per page; large catalogs span many pages to walk and dedupe, and /collections.json is a second crawl.
Per-store differences — currencies, availability, and metafields vary across themes.
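
The pagination and rate-limit points above can be sketched as a small loop. This is a minimal sketch, not a hardened crawler: `get_page` is a hypothetical callable you supply that returns an HTTP status and the parsed product list for one page, and treating 429 alongside 430 as retryable, the retry count, and the delay values are all assumptions.

```python
import time

RETRYABLE = {429, 430}  # assumption: statuses treated as "slow down and retry"


def walk_products(get_page, max_pages=200, max_retries=4, sleep=time.sleep):
    """Walk page=1,2,... until an empty page, deduping products by handle.

    get_page(page) is a caller-supplied (hypothetical) function returning
    (status_code, products_list) for one page of /products.json.
    """
    seen, products = set(), []
    for page in range(1, max_pages + 1):
        for attempt in range(max_retries + 1):
            status, batch = get_page(page)
            if status not in RETRYABLE or attempt == max_retries:
                break
            sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
        if status != 200:
            raise RuntimeError(f"page {page} failed with HTTP {status}")
        if not batch:  # an empty list marks the end of the catalog
            break
        for product in batch:
            if product["handle"] not in seen:  # skip repeats across pages
                seen.add(product["handle"])
                products.append(product)
    return products
```

With requests, `get_page` could wrap the same `/products.json` call shown earlier and return `resp.status_code` together with `resp.json()["products"]` on success, or an empty list otherwise. Keeping the HTTP call outside the loop makes the walk easy to test with canned pages.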

Option 2: No-code and ready-made tools

Point-and-click exporters can dump one store, but catalog and price monitoring means re-checking many stores on a schedule and storing history — a pipeline that an API serves better than a manual tool.

Option 3: A structured Shopify API

Crawlora's Shopify API wraps product, collection, and store endpoints behind one API key, returning normalized JSON. Point it at a store URL:

curl -G "https://api.crawlora.net/api/v1/shopify/products" \
  -H "x-api-key: $CRAWLORA_API_KEY" \
  --data-urlencode "url=https://www.allbirds.com" \
  --data-urlencode "limit=50"

import requests

resp = requests.get(
    "https://api.crawlora.net/api/v1/shopify/products",
    headers={"x-api-key": "YOUR_API_KEY"},
    params={"url": "https://www.allbirds.com", "limit": 50},
    timeout=30,
)
resp.raise_for_status()  # surface auth or quota errors instead of a KeyError below
for product in resp.json()["data"]["products"]:
    print(product["title"], product["price"], product["handle"])

A response is normalized JSON you can store directly (fields are illustrative — confirm the schema in the docs):

{
  "code": 200,
  "msg": "OK",
  "data": {
    "products": [
      {
        "handle": "wool-runner",
        "title": "Wool Runner",
        "price": 98.0,
        "currency": "USD",
        "available": true,
        "variants": [{ "title": "US 9", "price": 98.0, "available": true }]
      }
    ]
  }
}
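
Storing each response with a capture timestamp is what makes price history possible later. The following is a minimal sketch using Python's built-in sqlite3, assuming the illustrative envelope shown above (`code`, `msg`, `data.products`, and the per-variant fields). The table name `variant_snapshots` and the function name are hypothetical, and the real schema should be confirmed in the docs.

```python
import sqlite3
from datetime import datetime, timezone


def store_snapshot(conn, store_url, payload, captured_at=None):
    """Insert one row per variant from an illustrative products response."""
    if payload.get("code") != 200:
        raise ValueError(f"unexpected response: {payload.get('msg')}")
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        """CREATE TABLE IF NOT EXISTS variant_snapshots (
               captured_at TEXT, store TEXT, handle TEXT, variant TEXT,
               price REAL, currency TEXT, available INTEGER)"""
    )
    rows = [
        (captured_at, store_url, p["handle"], v["title"],
         v["price"], p.get("currency"), int(v["available"]))
        for p in payload["data"]["products"]
        for v in p.get("variants", [])
    ]
    conn.executemany(
        "INSERT INTO variant_snapshots VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)  # number of variant rows written
```

Appending rows rather than updating them in place keeps every observation, so a later query can compare any two capture times.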

From there, map the catalog with collections, pull full product detail by handle, or read store metadata — all from the same key (every endpoint takes the store url):

import requests

h = {"x-api-key": "YOUR_API_KEY"}
base, store = "https://api.crawlora.net/api/v1/shopify", "https://www.allbirds.com"

# Each call passes the store URL and reads the "data" payload from the response.
collections = requests.get(f"{base}/collections", headers=h, params={"url": store}).json()["data"]
product = requests.get(f"{base}/products/wool-runner", headers=h, params={"url": store}).json()["data"]
meta = requests.get(f"{base}/store", headers=h, params={"url": store}).json()["data"]

Use /collections/{handle}/products to walk one collection, and the sitemap endpoints to discover every product and collection URL on the storefront — handy when /products.json is disabled.
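
To illustrate the discovery idea, here is a rough sketch that pulls product handles out of a raw storefront sitemap document in the standard sitemaps.org XML format. It does not model the response of the sitemap endpoints mentioned above, whose schema is not shown here. The function name is hypothetical, and the assumption is that product URLs end in `/products/<handle>`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def product_handles(sitemap_xml):
    """Return product handles found in <loc> entries of a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    handles = []
    for loc in root.findall("sm:url/sm:loc", NS):
        parts = urlparse(loc.text.strip()).path.strip("/").split("/")
        # Keep only .../products/<handle> URLs; skip the home page, collections, etc.
        if len(parts) >= 2 and parts[-2] == "products" and parts[-1] not in handles:
            handles.append(parts[-1])
    return handles
```

The resulting handles can then be fed to a per-product detail call, which is useful when the bulk feed is unavailable.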

What you can collect

Products: title, handle, price, availability, images, and variants
Collections and the products within a collection
Store metadata and storefront sitemaps
Search suggestions and product recommendations

Limitations and common challenges

Not every store exposes the feed. Some disable /products.json or rate-limit it (HTTP 430); the store endpoint can fall back to a public *.myshopify.com domain, but coverage isn't guaranteed — use the sitemap endpoints as a backup for discovery.
Variants and metafields. Products nest variants, options, and images; flatten them into rows and expect theme-specific metafields to differ across stores.
Pagination. Product and collection listings return at most 250 items per page; walk every page and dedupe the results.
Imagery and copy are copyrighted. Prices and availability are facts you can collect; product photos and descriptions carry copyright — don't republish beyond what your use case and law allow.
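
The variant point above can be made concrete with a small flattening helper for raw feed products. This is a sketch under assumptions: it expects each product to carry an `options` list of names and each variant to carry `option1`, `option2`, `option3`, `sku`, `price`, and `available`, as commonly seen in the public feed. Check a real store's output before relying on these field names. The `option_` column prefix is an arbitrary choice to avoid clashing with product fields.

```python
from decimal import Decimal


def flatten_product(product):
    """Turn one raw feed product into one flat dict per variant."""
    option_names = [o["name"] for o in product.get("options", [])]
    rows = []
    for variant in product.get("variants", []):
        row = {
            "handle": product["handle"],
            "title": product["title"],
            "sku": variant.get("sku"),
            "price": Decimal(str(variant["price"])),  # exact decimal, not float
            "available": bool(variant.get("available")),
        }
        # Map option1/option2/option3 onto the product's own option names.
        for i, name in enumerate(option_names, start=1):
            row[f"option_{name.lower()}"] = variant.get(f"option{i}")
        rows.append(row)
    return rows
```

Because option names differ per store (Size, Color, Material, and so on), the set of `option_*` columns varies; collect the union of keys before writing a CSV header.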

Where this fits

Try it first, free: run any public URL through the Free Web Scraper, or check whether a site blocks bots with the Anti-Bot Checker — no signup.

Shopify data powers catalog monitoring, competitor price tracking, and assortment research. Combine it with the Shop.app API for the consumer marketplace view and the Amazon scraping API for cross-channel pricing, all under the e-commerce product intelligence workflow. For the same playbook on other marketplaces, see how to scrape Amazon product data and how to scrape eBay, or how to choose a web scraping API.

Get started by testing the endpoint in the Playground, reading the request and response schema in the API docs, and reviewing credit costs on the pricing page.

Frequently asked questions

Can I scrape Shopify stores without getting blocked?

Crawlora handles proxy routing, pacing, retries, and fallbacks behind the API and returns normalized JSON, including for stores that rate-limit (HTTP 430) or disable the public products feed.

Does every Shopify store have a products.json feed?

Most do — /products.json is a credential-free endpoint returning up to 250 products per page — but some stores disable it or rate-limit it (HTTP 430). When the vanity domain blocks it, Crawlora can fall back to a public *.myshopify.com domain, and the sitemap endpoints discover product and collection URLs as a backup.

Doesn't Shopify already have an API?

Shopify's Admin and Storefront APIs require the store owner's credentials and are for stores you control. To research storefronts you don't own, Crawlora collects the public product surface as normalized JSON.

What data can I collect?

Products with variants, prices, availability, and images; collections and the products within them; store metadata; static pages; storefront sitemaps; search suggestions; and product recommendations. Prices and availability are facts; product imagery and copy are copyrighted.

How do I target a specific store?

Pass the store URL to the products endpoint. Use the product handle (with the store url) for full detail, the collections endpoint to map catalog structure, and the sitemap endpoints to enumerate URLs.

Can I monitor prices and catalog changes?

Yes. Re-run product and collection calls on a schedule and store the history to track price moves and assortment changes.
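
As a rough illustration of that comparison step, the sketch below diffs two snapshots taken on consecutive runs. It assumes you have already reduced each run to a dict keyed by `(handle, variant)` with the price as the value. That shape and the function name are hypothetical choices, not part of any API.

```python
def diff_snapshots(previous, current):
    """Compare two {(handle, variant): price} snapshots from consecutive runs."""
    added = sorted(set(current) - set(previous))      # new to the assortment
    removed = sorted(set(previous) - set(current))    # dropped from the assortment
    price_changes = {
        key: (previous[key], current[key])            # (old price, new price)
        for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    }
    return {"added": added, "removed": removed, "price_changes": price_changes}
```

Running this after each scheduled collection gives a compact change log: assortment additions and removals plus old and new prices for anything that moved.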
