Geospatial acquisition for pickup points, lockers, and logistics nodes in the CIS
There is no single magic endpoint that returns a complete, clean, and durable list of Wildberries, Ozon, and CDEK pickup points. A useful system is built by combining public geodata, first-party APIs, seller-scoped endpoints, map providers, and scraping only where the business case justifies the cost and the legal review.
This page turns that collection problem into an engineering plan: where coordinates come from, what each source is good at, and how to normalize them into one usable registry.
On this page
Automation boundary
No scraping logic is placed inside `websites/koveh`. The route is descriptive; the collectors are operational.
Executive summary
The reliable answer is a stack, not a source.
For CIS pickup-point intelligence, the best starting layer is public and reproducible: OpenStreetMap via Overpass, plus the sources that expose operationally useful coordinates without forcing you into a closed seller workflow. CDEK is structurally friendlier because its network is meant to be embedded into checkout flows.

Ozon and Wildberries are harder: their seller APIs are designed around merchant operations, warehouse logic, and order lifecycles, not around publishing a public geospatial index of every active point. A serious implementation therefore collects what is public, keeps seller-specific collectors separate, records source provenance for every coordinate, and resolves duplicates later instead of pretending the upstream world is already normalized.
Start with open coverage
OpenStreetMap and Overpass provide the fastest way to build a first registry of branded points, especially when you need wide-area coverage without commercial lock-in.
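As a sketch of what that first pass looks like, the query below assembles a brand-oriented Overpass QL request in Python. The tag choices (`brand`, `name`) and the area-by-name selector are assumptions: actual coverage depends on how local mappers tagged each point, so production queries usually expand the brand list with local-language aliases.

```python
def overpass_brand_query(area_name, brands, timeout=120):
    """Build an Overpass QL query that finds branded nodes/ways/relations
    inside a named area, matching either the brand or name tag."""
    pattern = "|".join(brands)  # case-insensitive alternation over brand aliases
    return f"""
[out:json][timeout:{timeout}];
area["name"="{area_name}"]->.a;
(
  nwr["brand"~"{pattern}",i](area.a);
  nwr["name"~"{pattern}",i](area.a);
);
out center tags;
""".strip()
```

`out center tags;` makes ways and relations report a single centroid, which is usually what a pickup-point registry needs.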
Treat seller APIs as scoped operational feeds
Wildberries and Ozon APIs are valuable, but they are usually tied to account context, order flows, or fulfillment settings instead of exposing a universal public network dump.
Store confidence and source lineage
A coordinate without provenance becomes impossible to trust later. Keep the provider, method, timestamp, and matching evidence next to every point.
Collection stack
Use each source for the part of the problem it actually solves.
No single vendor covers every need. The practical approach is to assign each source a clear role: public baseline, authenticated operational feed, licensed directory, or fallback scraping layer. That keeps cost, legal exposure, and maintenance visible from day one.
OpenStreetMap / Overpass
Best public baseline for brand-based discovery, large-area exports, and repeatable GeoJSON generation. Good for first-pass coverage and geographic clustering.
CDEK first-party API
Useful when you need precise operational delivery-point metadata from a carrier whose architecture is built for third-party integration.
Ozon and Wildberries seller APIs
Useful for merchant-side workflows, warehouse logic, and order-linked pickup metadata. Not a substitute for a universal public PVZ directory.
ApiShip and similar aggregators
Useful when the business wants one normalized gateway across several carriers instead of building each connector separately.
Yandex Places / 2GIS Places
Useful when you need commercial directory breadth, brand search, and richer business metadata, with the caveat that licensing and caching rules matter.
Headless map scraping
Useful as the last fallback when the needed attributes are visible in consumer interfaces but blocked in official APIs or priced beyond the project budget.
Source matrix
Choose the source by access model, not by hype.
The matrix below is the engineering decision table. It keeps the team honest about what is public, what is credentialed, what returns native coordinates, and what will turn into a licensing or maintenance burden.
OpenStreetMap / Overpass
Public
Best for
Wide-area baseline, OSINT, rapid GeoJSON export
Constraints
Coverage depends on community and organized edits. Large national extracts are better done in regional chunks or from local PBF files.
CDEK API
Credentialed
Best for
Carrier-grade point lookup, integration into logistics flows
Constraints
Authentication and carrier-specific semantics still apply, so it should be modeled as one provider, not the whole market.
Ozon / Wildberries seller APIs
Seller-scoped
Best for
Merchant operations, warehouse routing, order-linked pickup metadata
Constraints
Not designed as a global public pickup-point index. Rate limits and account scope complicate broad extraction.
ApiShip
Credentialed aggregator
Best for
One connector for multiple logistics providers
Constraints
You inherit aggregator coverage rules and schema choices instead of raw carrier semantics.
Yandex Places / 2GIS Places
Commercial API
Best for
Directory breadth, ranking, enrichment, business metadata
Constraints
Pricing, caching, and data-retention rules must be reviewed before building a persistent internal registry from them.
Headless map scraping
Automation / browser layer
Best for
Filling the last gaps when official interfaces are blocked or impractical
Constraints
Highest maintenance cost and the highest legal-review requirement. Anti-bot changes can break the collector at any time.
Normalization pipeline
Collect first, deduplicate second, publish last.
Teams often fail by trying to make every upstream source perfect. The better pattern is a layered pipeline: ingest raw payloads, normalize fields, attach provenance, then resolve overlap with spatial and address logic. That is how you end up with a registry you can trust in routing, market analysis, or warehouse planning.
01
Collect
Pull raw points from public, seller, carrier, and licensed sources into a raw layer. Keep the original payload and the request context.
02
Normalize
Map every source to one schema: provider, point type, latitude, longitude, address fields, provider identifier, source timestamp, and source method.
03
Conflate
Merge duplicates with distance checks, brand matching, normalized addresses, and state identifiers such as FIAS where available.
04
Publish
Expose the mastered registry to maps, BI, route engines, and internal analytics only after every record has source lineage and confidence.
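The conflate step can be sketched with a plain haversine check: two records from different providers that share a brand and sit within a small radius are candidates for merging. The 75 m default radius is an illustrative assumption; real pipelines tune it per point type and back it up with address and FIAS matching.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 coordinates, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def likely_duplicate(a, b, radius_m=75.0):
    """Flag two normalized records as the same physical point when the
    brand matches and the coordinates fall within the merge radius."""
    return (
        a["brand"] == b["brand"]
        and haversine_m(a["latitude"], a["longitude"],
                        b["latitude"], b["longitude"]) <= radius_m
    )
```

Distance-only merging is a first filter, not a verdict: it shortlists pairs that the address and identifier logic then confirms or rejects.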
Unified schema
`provider`, `provider_point_id`, `brand`, `point_type`, `latitude`, `longitude`, `address`, `region`, `city`, `fias_guid`, `source_method`, `source_url`, `collected_at`, `confidence`.
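The schema above maps naturally onto a dataclass, with one normalizer per source. The sketch below shows the OSM/Overpass case; the `confidence` default and the field fallbacks are illustrative assumptions, and `fias_guid` is left empty because OSM does not carry FIAS identifiers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class PickupPoint:
    provider: str
    provider_point_id: str
    brand: str
    point_type: str
    latitude: float
    longitude: float
    address: str
    region: str
    city: str
    fias_guid: Optional[str]
    source_method: str
    source_url: str
    collected_at: str
    confidence: float

def from_osm(element, brand):
    """Normalize one Overpass element (a node, or a way/relation with
    'out center') into the unified registry schema."""
    tags = element.get("tags", {})
    return PickupPoint(
        provider="osm",
        provider_point_id=f"osm/{element['type']}/{element['id']}",
        brand=brand,
        point_type="pvz",
        latitude=element.get("lat") or element["center"]["lat"],
        longitude=element.get("lon") or element["center"]["lon"],
        address=tags.get("addr:street", ""),
        region=tags.get("addr:region", ""),
        city=tags.get("addr:city", ""),
        fias_guid=None,  # filled later by a geocoding/enrichment step
        source_method="overpass",
        source_url="https://overpass-api.de/api/interpreter",
        collected_at=datetime.now(timezone.utc).isoformat(),
        confidence=0.6,  # public baseline, not yet cross-verified
    )
```

Because every source gets its own `from_*` normalizer, provenance fields are set at ingest time and never reconstructed after the fact.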
Collector layout
The website stays descriptive. The collectors stay in /root/automation.
The `/pvz` route in `websites/koveh` is only the brief and the explanation layer. The executable collection logic is kept outside the app so scraping, credentials, and extraction workflows do not leak into the Next.js project. The new PVZ collector hub lives under `/root/automation/pvz-collectors` and also points to existing Yandex collector code already present in the workspace.
Example public baseline run
python /root/automation/pvz-collectors/collect_osm_overpass.py --area "Moscow" --brand ozon --brand wildberries --brand cdek --output /root/automation/pvz-collectors/output/moscow.geojson
/root/automation/pvz-collectors/collect_osm_overpass.py
Implemented public collector
A standard-library Python CLI that queries Overpass, expands PVZ brand aliases, and exports GeoJSON or raw JSON for the public baseline layer.
/root/automation/pvz-collectors/provider_matrix.json
Registry of sources
A machine-readable list of implemented, existing, and planned collectors so the team can track access model, coordinate fidelity, and current status.
/root/automation/yandex_maps_scraper/main.py
Existing Yandex Maps collector
The workspace already includes a dedicated Yandex Maps scraper that can be reused when the project needs browser-based directory extraction.
/root/automation/maps_rpa/09-rpa-selenium
Existing Selenium RPA variants
There is also a separate bank of Yandex-oriented Selenium collectors and experiments for heavier browser automation scenarios.
Reference set
Official docs and public references behind the current layout
This page was structured around the official or primary sources that are most relevant to the collection strategy. The goal is not to flatten them into one claim-heavy article, but to keep a traceable path from engineering decision to source material.
Operational details and commercial terms can change. For production collectors, the source-specific connector should always be checked against the live documentation before implementation or rollout.
Koveh
Use the website as the brief. Use automation as the execution layer.
That split keeps the product cleaner. The page explains what is possible and how the data should be modeled. The automation folder owns collectors, credentials, scraping logic, and source-specific maintenance.