Datasets API endpoint
Use Crawlora's GitHub users dataset search API to search or inspect stored structured records as JSON. This page includes request parameters, cURL examples, response schema, validation behavior, credit cost, and a Playground link for testing before integration. Dataset endpoints read indexed records and do not apply proxy routing.
/datasets/github-users/search
Searches enriched public GitHub user profiles stored in a search index. influence_tier enum: `nano`, `micro`, `mid`, `macro`, `mega`. Sort enum: `relevance`, `rank_score_desc`, `followers_desc`, `account_age_desc`, `account_age_asc`, `distance_asc`. Developers commonly use this endpoint for repeatable dataset search, filtering, facets, profile enrichment, analytics, exports, and internal tools that need structured records. Authentication uses the x-api-key header, usage is metered with the credit cost shown on this page, and the request does not trigger live scraping or proxy routing.
Request parameters are generated from the active endpoint catalog. Dataset parameters filter, page, facet, or locate stored structured records; they do not configure a live scraper or proxy path.
| Parameter | Type | Required | Default | Description | Example |
|---|---|---|---|---|---|
| q | string | No | | Full-text query over login, name, company, bio and location, max 256 characters | |
| login | string | No | | Exact login filter, max 128 characters | |
| company | string | No | | Exact normalized-company filter, max 128 characters | |
| influence_tier | string | No | | Follower-tier enum: nano, micro, mid, macro, mega | |
| country | string | No | | Exact geocoded country filter, max 128 characters | |
| country_code | string | No | | Exact ISO country-code filter, max 128 characters | |
| state | string | No | | Exact geocoded state filter, max 128 characters | |
| city | string | No | | Exact geocoded city filter, max 128 characters | |
| domain | string | No | | Interest-domain tag filter (e.g. ml-ai, web, devops), max 128 characters | |
| has_email | boolean | No | | Filter by public email presence | |
| has_twitter | boolean | No | | Filter by public Twitter/X handle presence | |
| has_blog | boolean | No | | Filter by public blog/website presence | |
| reachable | boolean | No | | Filter by any public contact channel | |
| active_90d | boolean | No | | Filter by activity within the last 90 days | |
| hireable | boolean | No | | Filter by the GitHub available-for-hire flag | |
| is_org | boolean | No | | Organization filter (normally false; the crawl indexes individuals) | |
| is_bot | boolean | No | | Bot filter (normally false; the crawl skips bots) | |
| min_followers | integer | No | | Minimum follower count | |
| max_followers | integer | No | | Maximum follower count | |
| min_repos | integer | No | | Minimum public repository count | |
| min_rank_score | integer | No | | Minimum composite rank score | |
| min_account_age_years | number | No | | Minimum account age in years | |
| max_account_age_years | number | No | | Maximum account age in years | |
| lat | number | No | | Latitude for radius filtering or distance sort | |
| lon | number | No | | Longitude for radius filtering or distance sort | |
| radius_m | integer | No | | Radius in meters, 1 through 50000; requires lat and lon when supplied | |
| sort | string | No | | Sort enum: relevance, rank_score_desc, followers_desc, account_age_desc, account_age_asc, distance_asc | |
| page | integer | No | 1 | Page number, defaults to 1 | |
| page_size | integer | No | 20 | Page size, defaults to 20 and maxes at 100; page * page_size must be <= 10000 | |
| x-api-key (header) | string | Yes | | API key required | |
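As a rough illustration of how the optional filters in the table combine into one request, the Python sketch below assembles a query string and skips any filter left unset. The helper name `build_search_url` is hypothetical. The base URL is copied from the cURL example on this page, and serializing booleans as lowercase `true`/`false` follows that same example; neither detail is confirmed beyond that.

```python
from urllib.parse import urlencode

# Base URL as it appears in the cURL example on this page.
BASE_URL = "https://api.crawlora.net/api/v1/datasets/github-users/search"


def build_search_url(**filters):
    """Return the search URL containing only the filters that were set.

    Hypothetical helper: every parameter is optional, so None values
    are dropped instead of being sent as empty strings.
    """
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, bool):
            # Assumption: booleans are sent as "true"/"false", as in the cURL example.
            value = "true" if value else "false"
        params[name] = value
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url(
    q="rust compiler",
    country_code="DE",
    has_email=True,
    hireable=None,  # unset, so it is left out of the URL
    min_followers=100,
    page=1,
    page_size=50,
)
print(url)
```

Keeping unset filters out of the URL avoids sending empty values that the API might treat as invalid input.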
curl -X GET "https://api.crawlora.net/api/v1/datasets/github-users/search?q=coffee&country=us&has_email=true&has_twitter=true&has_blog=true&reachable=true&active_90d=true&hireable=true&is_org=true&is_bot=true&page=1" \
  -H "x-api-key: $CRAWLORA_API_KEY"
Send your scraping API key in the x-api-key header. Use the console API Keys page to rotate or select the active key.
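A minimal sketch of sending the key in the `x-api-key` header from Python, using only the standard library. The function names `make_request` and `fetch_json` are hypothetical, and reading the key from the `CRAWLORA_API_KEY` environment variable mirrors the cURL example rather than any documented client requirement.

```python
import json
import os
import urllib.request


def make_request(url, api_key=None):
    """Build a GET request carrying the API key in the x-api-key header."""
    key = api_key or os.environ.get("CRAWLORA_API_KEY")
    if not key:
        raise RuntimeError("Set CRAWLORA_API_KEY or pass api_key explicitly")
    return urllib.request.Request(url, headers={"x-api-key": key}, method="GET")


def fetch_json(url, api_key=None, timeout=30):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(make_request(url, api_key), timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network; only fetch_json does.
req = make_request(
    "https://api.crawlora.net/api/v1/datasets/github-users/search?q=test",
    api_key="example-key",
)
print(req.get_method(), req.full_url)
```

Separating request construction from sending makes it easy to inspect or log the outgoing request without exposing the key in the URL.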
Endpoint usage is metered in credits. The plan prices, included credits, limits, and overage rates below match the active backend billing configuration.
| Plan | Price | Included credits | Daily cap | Rate limit | Overage |
|---|---|---|---|---|---|
| Free | $0/mo | 2,000 | 500 daily credits | 5/min | No overage |
| Starter | $9/mo | 20,000 | 5,000 daily credits | 15/min | $0.75/1,000 overage credits when enabled |
| Growth | $29/mo | 100,000 | 25,000 daily credits | 45/min | $0.45/1,000 overage credits when enabled |
| Pro | $79/mo | 400,000 | No daily cap | 120/min | $0.30/1,000 overage credits |
| Business | $199/mo | 1,200,000 | No daily cap | 300/min | $0.20/1,000 overage credits |
| Enterprise | $499/mo | 5,000,000 | No daily cap | 1,000/min | $0.12/1,000 overage credits |
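To make the plan arithmetic concrete, the sketch below estimates a monthly bill from the prices, included credits, and overage rates in the table. It is an illustration only: the function name `monthly_cost` is hypothetical, and the model ignores daily caps and whether overage has been enabled on Starter or Growth.

```python
# (monthly price in USD, included credits, overage USD per 1,000 credits or None)
PLANS = {
    "Free": (0, 2_000, None),
    "Starter": (9, 20_000, 0.75),
    "Growth": (29, 100_000, 0.45),
    "Pro": (79, 400_000, 0.30),
    "Business": (199, 1_200_000, 0.20),
    "Enterprise": (499, 5_000_000, 0.12),
}


def monthly_cost(plan, credits_used):
    """Estimate the monthly bill in USD for a plan and a credit total.

    Simplification: assumes overage is enabled where the table says
    "when enabled" and does not model daily credit caps.
    """
    price, included, overage_rate = PLANS[plan]
    extra = max(0, credits_used - included)
    if extra and overage_rate is None:
        raise ValueError(f"{plan} has no overage; usage stops at {included} credits")
    return price + (extra / 1000) * (overage_rate or 0)


# Starter with 26,000 credits: 6,000 over the allowance at $0.75 per 1,000.
print(monthly_cost("Starter", 26_000))
```

For the Starter case shown, the overage is 6 × $0.75 = $4.50 on top of the $9 base price.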
This endpoint reads stored indexed dataset records. It does not execute a live upstream request, browser session, or proxy-routed scraping job.
- Defaults to `relevance` sort when `q` is supplied, otherwise `rank_score_desc`.
- `lat` and `lon` must be supplied together. `distance_asc` requires `lat` and `lon`, but does not require `radius_m`.
- `min_followers` must not exceed `max_followers`.
- The maximum result window is `10000`; `page * page_size` must not exceed `10000`.
- Invalid enum values return the standard invalid params envelope.
- Returns an empty `items` array (not an error) when the dataset has no matches yet.
- Does not trigger live scraping.

Example response:

```json
{
  "code": 200,
  "msg": "OK",
  "data": {
    "dataset": "github-users",
    "items": [
      {
        "login": "octodev",
        "name": "Octo Dev",
        "company_normalized": "google",
        "influence_tier": "mid",
        "geo": {
          "country": "Germany",
          "country_code": "DE",
          "city": "Berlin"
        },
        "followers": 1200,
        "reachable": true,
        "has_email": true,
        "domains": ["ml-ai"],
        "rank_score": 88
      }
    ],
    "page": 1,
    "page_size": 20,
    "total": 1,
    "sort": "rank_score_desc"
  }
}
```
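The validation rules above can be mirrored on the client so that obviously invalid requests are never sent. This is a sketch under assumptions: the function names are hypothetical, the error strings are invented for illustration, and the server remains the authority on what it accepts.

```python
INFLUENCE_TIERS = {"nano", "micro", "mid", "macro", "mega"}
SORT_VALUES = {
    "relevance", "rank_score_desc", "followers_desc",
    "account_age_desc", "account_age_asc", "distance_asc",
}
MAX_RESULT_WINDOW = 10_000
MAX_PAGE_SIZE = 100


def effective_sort(params):
    """Sort the API is described as applying when `sort` is omitted."""
    return params.get("sort") or ("relevance" if params.get("q") else "rank_score_desc")


def last_allowed_page(page_size):
    """Highest page for which page * page_size stays within the window."""
    return MAX_RESULT_WINDOW // page_size


def validate(params):
    """Return a list of problems; an empty list means no rule was violated."""
    errors = []
    has_coords = "lat" in params and "lon" in params
    if ("lat" in params) != ("lon" in params):
        errors.append("lat and lon must be supplied together")
    if params.get("sort") == "distance_asc" and not has_coords:
        errors.append("distance_asc requires lat and lon")
    if "radius_m" in params:
        if not has_coords:
            errors.append("radius_m requires lat and lon")
        if not 1 <= params["radius_m"] <= 50_000:
            errors.append("radius_m must be between 1 and 50000")
    low, high = params.get("min_followers"), params.get("max_followers")
    if low is not None and high is not None and low > high:
        errors.append("min_followers must not exceed max_followers")
    if params.get("sort", "relevance") not in SORT_VALUES:
        errors.append("invalid sort value")
    tier = params.get("influence_tier")
    if tier is not None and tier not in INFLUENCE_TIERS:
        errors.append("invalid influence_tier value")
    page, page_size = params.get("page", 1), params.get("page_size", 20)
    if page_size > MAX_PAGE_SIZE:
        errors.append("page_size must not exceed 100")
    if page * page_size > MAX_RESULT_WINDOW:
        errors.append("page * page_size must not exceed 10000")
    return errors


print(validate({"sort": "distance_asc", "lat": 52.5, "lon": 13.4}))
print(validate({"page": 101, "page_size": 100}))
print(last_allowed_page(20))
```

With the default page size of 20, the result window allows paging up to page 500; at the maximum page size of 100 it ends at page 100.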
Crawlora does not silently return invalid dataset search results when filters, pagination, coordinates, or stored record lookups cannot be satisfied.
| Status | Common failure case |
|---|---|
| 400 | Invalid input, missing required parameter, invalid enum, bad coordinate pair, or result window beyond the dataset limit |
| 404 | Requested stored dataset item is not present |
| 429 | Plan or endpoint rate limit exceeded |
| 500 | Internal dataset query or storage error |
When possible, Crawlora returns structured error context so your integration can adjust filters, page size, location inputs, or lookup identifiers.
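One way a client might act on these status codes is to retry only the failures that can succeed later and to fix the request for the rest. The sketch below is an assumption, not documented Crawlora behavior: treating 500 as retryable, the function names, and the backoff constants are all illustrative choices.

```python
import random

# Assumption: 429 (rate limit) and 500 (internal error) may succeed on retry,
# while 400 and 404 need a changed request and should not be retried as-is.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status):
    """Return True when repeating the same request could plausibly succeed."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff with full jitter, in seconds.

    attempt is 0 for the first retry; the upper bound doubles each
    attempt until it reaches cap.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))


for status in (400, 404, 429, 500):
    action = "retry after a delay" if should_retry(status) else "fix the request"
    print(status, action)
```

Jitter spreads retries out so that several workers hitting the same per-minute rate limit do not all retry at the same instant.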
| Status | Description | Schema |
|---|---|---|
| 400 | Bad Request | #/definitions/app.Response |
| 429 | Too Many Requests | #/definitions/app.Response |
| 500 | Internal Server Error | #/definitions/app.Response |
Request schema
No body schema
Response schema
#/definitions/datasets.githubUsersSearchResponseDoc
| Field | Type | Required | Enum | Bounds | Example | Description |
|---|---|---|---|---|---|---|
| code | integer | No | | | 200 | |
| data | datasets.GithubUserSearchResponse | No | ||||
| data.dataset | string | No | ||||
| data.items | array | No | ||||
| data.items[].account_age_years | number | No | ||||
| data.items[].active_90d | boolean | No | ||||
| data.items[].avatar_url | string | No | ||||
| data.items[].bio | string | No | ||||
| data.items[].blog | string | No | ||||
| data.items[].company | string | No | ||||
| data.items[].company_normalized | string | No | ||||
| data.items[].crawled_at | string | No | ||||
| data.items[].created_at | string | No | ||||
| data.items[].distance_m | number | No | ||||
| data.items[].domains | array | No | ||||
| data.items[].email | string | No | ||||
| data.items[].follower_following_ratio | number | No | ||||
| data.items[].followers | integer | No | ||||
| data.items[].following | integer | No | ||||
| data.items[].geo | es.GithubGeo | No | ||||
| data.items[].geo.city | string | No | ||||
| data.items[].geo.country | string | No | ||||
| data.items[].geo.country_code | string | No | ||||
| data.items[].geo.location | es.GithubGeoPoint | No | ||||
| data.items[].geo.location.lat | number | No | ||||
| data.items[].geo.location.lon | number | No | ||||
| data.items[].geo.state | string | No | ||||
| data.items[].has_blog | boolean | No | ||||
| data.items[].has_email | boolean | No | ||||
| data.items[].has_twitter | boolean | No | ||||
| data.items[].hireable | boolean | No | ||||
| data.items[].html_url | string | No | ||||
| data.items[].id | integer | No | ||||
| data.items[].influence_tier | string | No | ||||
| data.items[].is_bot | boolean | No | ||||
| data.items[].is_org | boolean | No | ||||
| data.items[].last_active_at | string | No | ||||
| data.items[].location_raw | string | No | ||||
| data.items[].login | string | No | ||||
| data.items[].name | string | No | ||||
| data.items[].prs_30d | integer | No | ||||
| data.items[].public_gists | integer | No | ||||
| data.items[].public_repos | integer | No | ||||
| data.items[].pushes_30d | integer | No | ||||
| data.items[].rank_score | integer | No | ||||
| data.items[].reachable | boolean | No | ||||
| data.items[].reviews_30d | integer | No | ||||
| data.items[].schema_version | integer | No | ||||
| data.items[].social_accounts | array | No | ||||
| data.items[].social_accounts[].provider | string | No | ||||
| data.items[].social_accounts[].url | string | No | ||||
| data.items[].social_count | integer | No | ||||
| data.items[].twitter_username | string | No | ||||
| data.items[].type | string | No | ||||
| data.page | integer | No | ||||
| data.page_size | integer | No | ||||
| data.sort | string | No | ||||
| data.total | integer | No | ||||
| msg | string | No | | | OK | |
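Because every field in the schema is marked as not required, a consumer should read each one defensively. The following sketch parses a response into a small dataclass; the `GithubUser` class and `parse_users` function are hypothetical, only a handful of schema fields are mapped, and the sample payload is a trimmed copy of the example response on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GithubUser:
    """A small subset of the item fields listed in the response schema."""
    login: Optional[str]
    name: Optional[str]
    followers: Optional[int]
    country_code: Optional[str]
    rank_score: Optional[int]


def parse_users(payload: dict) -> List[GithubUser]:
    """Extract users from a search response, tolerating missing fields."""
    data = payload.get("data") or {}
    users = []
    for item in data.get("items") or []:
        geo = item.get("geo") or {}  # geo is optional, like every other field
        users.append(
            GithubUser(
                login=item.get("login"),
                name=item.get("name"),
                followers=item.get("followers"),
                country_code=geo.get("country_code"),
                rank_score=item.get("rank_score"),
            )
        )
    return users


# Trimmed from the example response shown on this page.
sample = {
    "code": 200,
    "msg": "OK",
    "data": {
        "dataset": "github-users",
        "items": [
            {
                "login": "octodev",
                "name": "Octo Dev",
                "geo": {"country": "Germany", "country_code": "DE", "city": "Berlin"},
                "followers": 1200,
                "rank_score": 88,
            }
        ],
        "page": 1,
        "page_size": 20,
        "total": 1,
    },
}

for user in parse_users(sample):
    print(user.login, user.country_code, user.followers)
```

An empty `items` array yields an empty list, which matches the documented behavior for searches with no matches.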
Use environment variables for secrets and keep Crawlora API keys server-side.
curl -X GET "https://api.crawlora.net/api/v1/datasets/github-users/search?q=coffee&country=us&has_email=true&has_twitter=true&has_blog=true&reachable=true&active_90d=true&hireable=true&is_org=true&is_bot=true&page=1" \
  -H "x-api-key: $CRAWLORA_API_KEY"
Crawlora is designed for responsible structured public web data workflows. Customers are responsible for using Crawlora in compliance with applicable laws, third-party rights, target-platform rules, and Crawlora terms.
Read Crawlora terms