PVZ Intelligence

Geospatial acquisition for pickup points, lockers, and logistics nodes in the CIS

There is no single magical endpoint that gives a complete, clean, and durable list of Wildberries, Ozon, and CDEK pickup points. A useful system is built by combining public geodata, first-party APIs, seller-scoped endpoints, map providers, and scraping only where the business case justifies the cost and legal review.

This page turns that collection problem into an engineering plan: where coordinates come from, what each source is good at, and how to normalize them into one usable registry.

Public baseline first · Coordinates plus provenance · Collectors live in /root/automation


Automation boundary

No scraping logic is placed inside `websites/koveh`. The route is descriptive; the collectors are operational.

Executive summary

The reliable answer is a stack, not a source.

For CIS pickup-point intelligence, the best starting layer is public and reproducible: OpenStreetMap via Overpass, plus the sources that expose operationally useful coordinates without forcing you into a closed seller workflow. CDEK is structurally friendlier because its network is meant to be embedded into checkout flows. Ozon and Wildberries are harder because their seller APIs are designed around merchant operations, warehouse logic, and order lifecycles, not around publishing a public geospatial index of every active point. That means a serious implementation collects what is public, keeps seller-specific collectors separate, records source provenance for every coordinate, and resolves duplicates later instead of pretending the upstream world is already normalized.

Start with open coverage

OpenStreetMap and Overpass provide the fastest way to build a first registry of branded points, especially when you need wide-area coverage without commercial lock-in.

Treat seller APIs as scoped operational feeds

Wildberries and Ozon APIs are valuable, but they are usually tied to account context, order flows, or fulfillment settings instead of exposing a universal public network dump.

Store confidence and source lineage

A coordinate without provenance becomes impossible to trust later. Keep the provider, method, timestamp, and matching evidence next to every point.

Collection stack

Use each source for the part of the problem it actually solves.

No single vendor covers every need. The practical approach is to assign each source a clear role: public baseline, authenticated operational feed, licensed directory, or fallback scraping layer. That keeps cost, legal exposure, and maintenance visible from day one.

OpenStreetMap / Overpass

Best public baseline for brand-based discovery, large-area exports, and repeatable GeoJSON generation. Good for first-pass coverage and geographic clustering.
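A minimal sketch of this baseline layer, independent of the implemented collector: it builds an Overpass QL query for branded pickup points in a named area and can post it to a public Overpass instance. The tag filters are assumptions — CIS pickup points are commonly mapped as `shop=outpost` or `amenity=parcel_locker` with a `brand` tag, but tagging varies by region, so verify coverage for your area before relying on it.

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public instance; rate limits apply


def build_brand_query(area: str, brands: list[str], timeout: int = 60) -> str:
    """Build an Overpass QL query for branded pickup points inside a named area.

    Tag filters are assumptions: shop=outpost and amenity=parcel_locker are the
    common tags for CIS pickup points, matched case-insensitively on brand.
    """
    brand_regex = "|".join(brands)
    return f"""
[out:json][timeout:{timeout}];
area["name"="{area}"]->.a;
(
  node["shop"="outpost"]["brand"~"{brand_regex}",i](area.a);
  node["amenity"="parcel_locker"]["brand"~"{brand_regex}",i](area.a);
  way["shop"="outpost"]["brand"~"{brand_regex}",i](area.a);
);
out center tags;
"""


def run_query(query: str) -> dict:
    """POST the query to the Overpass endpoint and return parsed JSON."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)


query = build_brand_query("Москва", ["Ozon", "Wildberries", "СДЭК"])
```

`out center tags` returns native node coordinates and centroids for ways, which maps directly onto the "native point coordinates or centroids" row in the source matrix below.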

CDEK first-party API

Useful when you need precise operational delivery-point metadata from a carrier whose architecture is built for third-party integration.
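As a sketch of how "built for third-party integration" looks in practice: CDEK's v2 API uses an OAuth client-credentials token and a delivery-points listing endpoint. The snippet only builds the requests; endpoint paths and the `country_code`/`type` filters are taken from CDEK's public v2 documentation and should be checked against the live docs before implementation.

```python
import urllib.parse
import urllib.request

CDEK_BASE = "https://api.cdek.ru/v2"  # production contour; CDEK also offers a test contour


def token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth client-credentials request for the CDEK v2 API."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(f"{CDEK_BASE}/oauth/token", data=body, method="POST")


def deliverypoints_request(token: str, **filters: str) -> urllib.request.Request:
    """Build a GET /deliverypoints request; filters such as city_code,
    country_code, or type are passed through as query parameters."""
    qs = urllib.parse.urlencode(filters)
    return urllib.request.Request(
        f"{CDEK_BASE}/deliverypoints?{qs}",
        headers={"Authorization": f"Bearer {token}"},
    )


req = deliverypoints_request("TOKEN", country_code="RU", type="PVZ")
```

Responses carry native latitude and longitude per point, which is why CDEK is modeled here as one clean provider rather than a market-wide index.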

Ozon and Wildberries seller APIs

Useful for merchant-side workflows, warehouse logic, and order-linked pickup metadata. Not a substitute for a universal public PVZ directory.

ApiShip and similar aggregators

Useful when the business wants one normalized gateway across several carriers instead of building each connector separately.

Yandex Places / 2GIS Places

Useful when you need commercial directory breadth, brand search, and richer business metadata, with the caveat that licensing and caching rules matter.

Headless map scraping

Useful as the last fallback when the needed attributes are visible in consumer interfaces but blocked in official APIs or priced beyond the project budget.
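When scraping from network responses, the payload shape is undocumented and unstable, so a defensive pattern is to walk the captured JSON and flag anything that looks like a coordinate pair. The key names below are heuristics, not a spec for any particular provider:

```python
def find_coordinates(payload, path=()):
    """Recursively walk a scraped JSON payload and yield (path, lat, lon)
    candidates wherever sibling keys look like latitude/longitude and the
    values fall in plausible ranges. Key names are heuristics, not a spec."""
    LAT_KEYS = {"lat", "latitude"}
    LON_KEYS = {"lon", "lng", "longitude"}
    if isinstance(payload, dict):
        lat = next((payload[k] for k in payload if k.lower() in LAT_KEYS), None)
        lon = next((payload[k] for k in payload if k.lower() in LON_KEYS), None)
        if isinstance(lat, (int, float)) and isinstance(lon, (int, float)) \
                and -90 <= lat <= 90 and -180 <= lon <= 180:
            yield path, float(lat), float(lon)
        for key, value in payload.items():
            yield from find_coordinates(value, path + (key,))
    elif isinstance(payload, list):
        for i, item in enumerate(payload):
            yield from find_coordinates(item, path + (i,))


# Hypothetical captured response shape, for illustration only.
sample = {"points": [{"id": 1, "lat": 55.75, "lng": 37.62, "address": "sample"}]}
hits = list(find_coordinates(sample))
```

Because the upstream schema can change without notice, keeping the extraction tolerant like this is cheaper than hard-coding paths that break on every frontend release.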

Source matrix

Choose the source by access model, not by hype.

The matrix below is the engineering decision table. It keeps the team honest about what is public, what is credentialed, what returns native coordinates, and what will turn into a licensing or maintenance burden.

OpenStreetMap / Overpass

Public

Native point coordinates or centroids for mapped objects

Best for

Wide-area baseline, OSINT, rapid GeoJSON export

Constraints

Coverage depends on community and organized edits. Large national extracts are better done in regional chunks or from local PBF files.

CDEK API

Credentialed

Native latitude and longitude in delivery-point responses

Best for

Carrier-grade point lookup, integration into logistics flows

Constraints

Authentication and carrier-specific semantics still apply, so it should be modeled as one provider, not the whole market.

Ozon / Wildberries seller APIs

Seller-scoped

Sometimes direct, often indirect through addresses and operational entities

Best for

Merchant operations, warehouse routing, order-linked pickup metadata

Constraints

Not designed as a global public pickup-point index. Rate limits and account scope complicate broad extraction.

ApiShip

Credentialed aggregator

Normalized point payloads with geo fields

Best for

One connector for multiple logistics providers

Constraints

You inherit aggregator coverage rules and schema choices instead of raw carrier semantics.

Yandex Places / 2GIS Places

Commercial API

Business and place coordinates in licensed directory search

Best for

Directory breadth, ranking, enrichment, business metadata

Constraints

Pricing, caching, and data-retention rules must be reviewed before building a persistent internal registry from them.

Headless map scraping

Automation / browser layer

Usually extracted from network responses or rendered app data

Best for

Filling the last gaps when official interfaces are blocked or impractical

Constraints

Highest maintenance cost and the highest legal-review requirement. Anti-bot changes can break the collector at any time.

Normalization pipeline

Collect first, deduplicate second, publish last.

Teams often fail by trying to make every upstream source perfect. The better pattern is a layered pipeline: ingest raw payloads, normalize fields, attach provenance, then resolve overlap with spatial and address logic. That is how you end up with a registry you can trust in routing, market analysis, or warehouse planning.

01

Collect

Pull raw points from public, seller, carrier, and licensed sources into a raw layer. Keep the original payload and the request context.

02

Normalize

Map every source to one schema: provider, point type, latitude, longitude, address fields, provider identifier, source timestamp, and source method.

03

Conflate

Merge duplicates with distance checks, brand matching, normalized addresses, and state identifiers such as FIAS where available.

04

Publish

Expose the mastered registry to maps, BI, route engines, and internal analytics only after every record has source lineage and confidence.
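The conflate step can be sketched as a greedy spatial merge: two records are treated as duplicates when they share a brand and sit within a small radius of each other. The 50 m radius is an assumption to tune per city density; a production pipeline would add normalized addresses and FIAS matching on top of this core.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in meters."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def conflate(points, radius_m=50.0):
    """Greedy merge: a point joins the first cluster whose head shares its
    brand and lies within radius_m; otherwise it starts a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            head = c[0]
            if p["brand"] == head["brand"] and haversine_m(
                    p["latitude"], p["longitude"],
                    head["latitude"], head["longitude"]) <= radius_m:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


# Two nearby records of the same brand from different sources, plus one other brand.
pts = [
    {"provider": "osm", "brand": "Ozon", "latitude": 55.7558, "longitude": 37.6173},
    {"provider": "scrape", "brand": "Ozon", "latitude": 55.7559, "longitude": 37.6174},
    {"provider": "osm", "brand": "CDEK", "latitude": 55.7558, "longitude": 37.6173},
]
clusters = conflate(pts)
```

Keeping every merged record inside the cluster, rather than discarding losers, is what preserves the per-source lineage the publish step depends on.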

Unified schema

`provider`, `provider_point_id`, `brand`, `point_type`, `latitude`, `longitude`, `address`, `region`, `city`, `fias_guid`, `source_method`, `source_url`, `collected_at`, `confidence`.
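The schema above maps directly onto a record type; the sketch below is one possible encoding, with the sample values and the `confidence` scale (0 to 1, assigned at conflation) being assumptions of this sketch rather than part of the schema:

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PvzPoint:
    """One row of the unified registry; fields mirror the schema above."""
    provider: str
    provider_point_id: str
    brand: str
    point_type: str           # e.g. "pickup_point" or "parcel_locker"
    latitude: float
    longitude: float
    address: str
    region: str
    city: str
    fias_guid: Optional[str]  # state address identifier, when resolvable
    source_method: str        # e.g. "overpass", "carrier_api", "scrape"
    source_url: str
    collected_at: str         # ISO-8601 UTC timestamp
    confidence: float         # 0..1, assigned by the conflation stage


# Hypothetical sample row built from an OSM-sourced point.
row = PvzPoint(
    provider="osm", provider_point_id="node/123", brand="Wildberries",
    point_type="pickup_point", latitude=55.75, longitude=37.62,
    address="sample address", region="Moscow", city="Moscow",
    fias_guid=None, source_method="overpass",
    source_url="https://overpass-api.de",
    collected_at="2024-01-01T00:00:00Z", confidence=0.8,
)
```

Every field except the coordinates is about trust and lineage, which is the point: a bare latitude/longitude pair is the cheapest part of the record.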

Collector layout

The website stays descriptive. The collectors stay in /root/automation.

The `/pvz` route in `websites/koveh` is only the brief and the explanation layer. The executable collection logic is kept outside the app so scraping, credentials, and extraction workflows do not leak into the Next.js project. The new PVZ collector hub lives under `/root/automation/pvz-collectors` and also points to existing Yandex collector code already present in the workspace.

Example public baseline run

```shell
python /root/automation/pvz-collectors/collect_osm_overpass.py \
  --area "Moscow" \
  --brand ozon --brand wildberries --brand cdek \
  --output /root/automation/pvz-collectors/output/moscow.geojson
```

/root/automation/pvz-collectors/collect_osm_overpass.py

Implemented public collector

A standard-library Python CLI that queries Overpass, expands PVZ brand aliases, and exports GeoJSON or raw JSON for the public baseline layer.

/root/automation/pvz-collectors/provider_matrix.json

Registry of sources

A machine-readable list of implemented, existing, and planned collectors so the team can track access model, coordinate fidelity, and current status.

/root/automation/yandex_maps_scraper/main.py

Existing Yandex Maps collector

The workspace already includes a dedicated Yandex Maps scraper that can be reused when the project needs browser-based directory extraction.

/root/automation/maps_rpa/09-rpa-selenium

Existing Selenium RPA variants

There is also a separate bank of Yandex-oriented Selenium collectors and experiments for heavier browser automation scenarios.

Reference set

Official docs and public references behind the current layout

This page was structured around the official or primary sources that are most relevant to the collection strategy. The goal is not to flatten them into one claim-heavy article, but to keep a traceable path from engineering decision to source material.

Operational details and commercial terms can change. For production collectors, the source-specific connector should always be checked against the live documentation before implementation or rollout.

Koveh

Use the website as the brief. Use automation as the execution layer.

That split keeps the product cleaner. The page explains what is possible and how the data should be modeled. The automation folder owns collectors, credentials, scraping logic, and source-specific maintenance.