Geospatial acquisition for pickup points, lockers, and logistics nodes in the CIS
There is no single magic endpoint that returns a complete, clean, and durable list of Wildberries, Ozon, and CDEK pickup points. A useful system is built by combining public geodata, first-party APIs, seller-scoped endpoints, map providers, and scraping only where the business case justifies the cost and the legal review.
This page turns that collection problem into an engineering plan: where coordinates come from, what each source is good at, and how to normalize them into one usable registry.
On this page
Automation boundary
No scraping logic is placed inside `websites/koveh`. The route is descriptive; the collectors are operational.
Executive summary
The reliable answer is a stack, not a source.
For CIS pickup-point intelligence, the best starting layer is public and reproducible: OpenStreetMap via Overpass, plus the sources that expose operationally useful coordinates without forcing you into a closed seller workflow. CDEK is structurally friendlier because its network is meant to be embedded into checkout flows.

Ozon and Wildberries are harder: their seller APIs are designed around merchant operations, warehouse logic, and order lifecycles, not around publishing a public geospatial index of every active point. A serious implementation therefore collects what is public, keeps seller-specific collectors separate, records source provenance for every coordinate, and resolves duplicates later instead of pretending the upstream world is already normalized.
Start with open coverage
OpenStreetMap and Overpass provide the fastest way to build a first registry of branded points, especially when you need wide-area coverage without commercial lock-in.
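As a sketch of what that first pass looks like, the query below assembles a brand-oriented Overpass QL request in Python. The tag choices (`brand`, `name`) and the area-by-name selector are assumptions: actual coverage depends on how local mappers tagged each point, so production queries usually expand the brand list with local-language aliases.

```python
def overpass_brand_query(area_name, brands, timeout=120):
    """Build an Overpass QL query that finds branded nodes/ways/relations
    inside a named area, matching either the brand or name tag."""
    pattern = "|".join(brands)  # case-insensitive alternation over brand aliases
    return f"""
[out:json][timeout:{timeout}];
area["name"="{area_name}"]->.a;
(
  nwr["brand"~"{pattern}",i](area.a);
  nwr["name"~"{pattern}",i](area.a);
);
out center tags;
""".strip()
```

`out center tags;` makes ways and relations report a single centroid, which is usually what a pickup-point registry needs.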
Treat seller APIs as scoped operational feeds
Wildberries and Ozon APIs are valuable, but they are usually tied to account context, order flows, or fulfillment settings instead of exposing a universal public network dump.
Store confidence and source lineage
A coordinate without provenance becomes impossible to trust later. Keep the provider, method, timestamp, and matching evidence next to every point.
Collection stack
Use each source for the part of the problem it actually solves.
No single vendor covers every need. The practical approach is to assign each source a clear role: public baseline, authenticated operational feed, licensed directory, or fallback scraping layer. That keeps cost, legal exposure, and maintenance visible from day one.
OpenStreetMap / Overpass
Best public baseline for brand-based discovery, large-area exports, and repeatable GeoJSON generation. Good for first-pass coverage and geographic clustering.
CDEK first-party API
Useful when you need precise operational delivery-point metadata from a carrier whose architecture is built for third-party integration.
Ozon and Wildberries seller APIs
Useful for merchant-side workflows, warehouse logic, and order-linked pickup metadata. Not a substitute for a universal public PVZ directory.
ApiShip and similar aggregators
Useful when the business wants one normalized gateway across several carriers instead of building each connector separately.
Yandex Places / 2GIS Places
Useful when you need commercial directory breadth, brand search, and richer business metadata, with the caveat that licensing and caching rules matter.
Headless map scraping
Useful as the last fallback when the needed attributes are visible in consumer interfaces but blocked in official APIs or priced beyond the project budget.
Source matrix
Choose the source by access model, not by hype.
The matrix below is the engineering decision table. It keeps the team honest about what is public, what is credentialed, what returns native coordinates, and what will turn into a licensing or maintenance burden.
OpenStreetMap / Overpass
Public
Best for
Wide-area baseline, OSINT, rapid GeoJSON export
Constraints
Coverage depends on community and organized edits. Large national extracts are better done in regional chunks or from local PBF files.
CDEK API
Credentialed
Best for
Carrier-grade point lookup, integration into logistics flows
Constraints
Authentication and carrier-specific semantics still apply, so it should be modeled as one provider, not the whole market.
Ozon / Wildberries seller APIs
Seller-scoped
Best for
Merchant operations, warehouse routing, order-linked pickup metadata
Constraints
Not designed as a global public pickup-point index. Rate limits and account scope complicate broad extraction.
ApiShip
Credentialed aggregator
Best for
One connector for multiple logistics providers
Constraints
You inherit aggregator coverage rules and schema choices instead of raw carrier semantics.
Yandex Places / 2GIS Places
Commercial API
Best for
Directory breadth, ranking, enrichment, business metadata
Constraints
Pricing, caching, and data-retention rules must be reviewed before building a persistent internal registry from them.
Headless map scraping
Automation / browser layer
Best for
Filling the last gaps when official interfaces are blocked or impractical
Constraints
Highest maintenance cost and the highest legal-review requirement. Anti-bot changes can break the collector at any time.
Normalization pipeline
Collect first, deduplicate second, publish last.
Teams often fail by trying to make every upstream source perfect. The better pattern is a layered pipeline: ingest raw payloads, normalize fields, attach provenance, then resolve overlap with spatial and address logic. That is how you end up with a registry you can trust in routing, market analysis, or warehouse planning.
01
Collect
Pull raw points from public, seller, carrier, and licensed sources into a raw layer. Keep the original payload and the request context.
02
Normalize
Map every source to one schema: provider, point type, latitude, longitude, address fields, provider identifier, source timestamp, and source method.
03
Conflate
Merge duplicates with distance checks, brand matching, normalized addresses, and state identifiers such as FIAS where available.
04
Publish
Expose the mastered registry to maps, BI, route engines, and internal analytics only after every record has source lineage and confidence.
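The conflate step can be sketched with a plain haversine check: two records from different providers that share a brand and sit within a small radius are candidates for merging. The 75 m default radius is an illustrative assumption; real pipelines tune it per point type and back it up with address and FIAS matching.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 coordinates, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def likely_duplicate(a, b, radius_m=75.0):
    """Flag two normalized records as the same physical point when the
    brand matches and the coordinates fall within the merge radius."""
    return (
        a["brand"] == b["brand"]
        and haversine_m(a["latitude"], a["longitude"],
                        b["latitude"], b["longitude"]) <= radius_m
    )
```

Distance-only merging is a first filter, not a verdict: it shortlists pairs that the address and identifier logic then confirms or rejects.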
Unified schema
`provider`, `provider_point_id`, `brand`, `point_type`, `latitude`, `longitude`, `address`, `region`, `city`, `fias_guid`, `source_method`, `source_url`, `collected_at`, `confidence`.
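The schema above maps naturally onto a dataclass, with one normalizer per source. The sketch below shows the OSM/Overpass case; the `confidence` default and the field fallbacks are illustrative assumptions, and `fias_guid` is left empty because OSM does not carry FIAS identifiers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class PickupPoint:
    provider: str
    provider_point_id: str
    brand: str
    point_type: str
    latitude: float
    longitude: float
    address: str
    region: str
    city: str
    fias_guid: Optional[str]
    source_method: str
    source_url: str
    collected_at: str
    confidence: float

def from_osm(element, brand):
    """Normalize one Overpass element (a node, or a way/relation with
    'out center') into the unified registry schema."""
    tags = element.get("tags", {})
    return PickupPoint(
        provider="osm",
        provider_point_id=f"osm/{element['type']}/{element['id']}",
        brand=brand,
        point_type="pvz",
        latitude=element.get("lat") or element["center"]["lat"],
        longitude=element.get("lon") or element["center"]["lon"],
        address=tags.get("addr:street", ""),
        region=tags.get("addr:region", ""),
        city=tags.get("addr:city", ""),
        fias_guid=None,  # filled later by a geocoding/enrichment step
        source_method="overpass",
        source_url="https://overpass-api.de/api/interpreter",
        collected_at=datetime.now(timezone.utc).isoformat(),
        confidence=0.6,  # public baseline, not yet cross-verified
    )
```

Because every source gets its own `from_*` normalizer, provenance fields are set at ingest time and never reconstructed after the fact.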
Collector layout
The website stays descriptive. The collectors stay in /root/automation.
The `/pvz` route in `websites/koveh` is only the brief and the explanation layer. The executable collection logic is kept outside the app so scraping, credentials, and extraction workflows do not leak into the Next.js project. The new PVZ collector hub lives under `/root/automation/pvz-collectors` and also points to existing Yandex collector code already present in the workspace.
Example public baseline run
python /root/automation/pvz-collectors/collect_osm_overpass.py --area "Moscow" --brand ozon --brand wildberries --brand cdek --output /root/automation/pvz-collectors/output/moscow.geojson
/root/automation/pvz-collectors/collect_osm_overpass.py
Implemented public collector
A standard-library Python CLI that queries Overpass, expands PVZ brand aliases, and exports GeoJSON or raw JSON for the public baseline layer.
/root/automation/pvz-collectors/provider_matrix.json
Registry of sources
A machine-readable list of implemented, existing, and planned collectors so the team can track access model, coordinate fidelity, and current status.
/root/automation/yandex_maps_scraper/main.py
Existing Yandex Maps collector
The workspace already includes a dedicated Yandex Maps scraper that can be reused when the project needs browser-based directory extraction.
/root/automation/maps_rpa/09-rpa-selenium
Existing Selenium RPA variants
There is also a separate bank of Yandex-oriented Selenium collectors and experiments for heavier browser automation scenarios.
Reference set
Official docs and public references behind the current layout
This page was structured around the official or primary sources that are most relevant to the collection strategy. The goal is not to flatten them into one claim-heavy article, but to keep a traceable path from engineering decision to source material.
Operational details and commercial terms can change. For production collectors, the source-specific connector should always be checked against the live documentation before implementation or rollout.
Koveh
Use the website as the brief. Use automation as the execution layer.
That split keeps the product cleaner. The page explains what is possible and how the data should be modeled. The automation folder owns collectors, credentials, scraping logic, and source-specific maintenance.