PyPI package metadata cache

For package collections such as the Python Package Index (PyPI), it's crucial to provide complete metadata on published packages in an easily accessible and easily processable format. For instance, Repology needs it to report outdated versions of Python modules packaged in distributions' native repositories.

Unfortunately, PyPI does not currently provide such data in a usable way. The FAQ and the PyPI API Reference describe several ways to access package metadata.

None of these meets basic usability requirements. A simple single-file package metadata dump would be sufficient, but so far PyPI developers have not provided such a file (related upstream issues: pypa/warehouse#347, pypa/warehouse#7403, pypa/warehouse#8802), so this service was set up to provide at least something: a metadata dump for recently changed packages only.

The file below is zstandard-compressed JSON containing an array of outputs of the Project endpoint of the PyPI JSON API. To reduce the size of the dump, each package entry is additionally processed to remove the description field and all releases older than the latest release (as specified by the version field).
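The pruning step described above could look roughly like the following sketch. It is not the service's actual code: the function name `prune_entry` is hypothetical, and as a simplification it keeps only the release whose key equals `info["version"]` instead of comparing version numbers.

```python
import copy


def prune_entry(entry):
    """Return a reduced copy of one Project JSON API response.

    Assumption: the entry has the usual "info" and "releases" keys.
    Simplification: only the release matching info["version"] is kept.
    """
    pruned = copy.deepcopy(entry)  # leave the caller's data untouched

    info = pruned.get("info", {})
    info.pop("description", None)  # usually the largest field

    if "releases" in pruned:
        latest = info.get("version")
        pruned["releases"] = {
            version: files
            for version, files in pruned["releases"].items()
            if version == latest
        }
    return pruned
```

Working on a deep copy keeps the function free of side effects, at the cost of some extra memory per entry.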

Download the latest dump

Format: JSON compressed with zstd
Size: 68.75 MiB
Generated at 2024-04-18 20:02 UTC
Contains 97496 packages
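
A minimal sketch of how a consumer might read the dump, assuming the third-party `zstandard` package is installed and the file was saved locally under the hypothetical name `pypicache.json.zst`. The helper names are illustrative and not part of this service.

```python
import json


def index_by_name(entries):
    """Map project name to its entry.

    Assumption: each entry carries info.name, as in the JSON API output.
    """
    return {entry["info"]["name"]: entry for entry in entries}


def load_dump(path):
    """Decompress the dump and parse the JSON array it contains."""
    import zstandard  # third-party: pip install zstandard

    with open(path, "rb") as compressed:
        with zstandard.ZstdDecompressor().stream_reader(compressed) as reader:
            return json.load(reader)


# Usage (hypothetical local file name):
#   packages = index_by_name(load_dump("pypicache.json.zst"))
#   print(packages["requests"]["info"]["version"])
```

This loads the whole array into memory, which is likely acceptable for a dump of this size. A streaming JSON parser would be an alternative if memory is tight.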

Details of operation

This service works by polling the XML-RPC changelog method to discover all package changes since the previous iteration, and then retrieving fresh metadata for each changed package from the JSON API. This information is stored in a database and periodically dumped into a single JSON file.
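
One polling iteration could be sketched as below. This is an illustration under stated assumptions, not the service's implementation: it uses PyPI's `changelog_since_serial` XML-RPC method and the `https://pypi.org/pypi/<name>/json` endpoint, a plain dict named `store` stands in for the database, and dropping projects that return HTTP 404 is a guess at reasonable behavior. All function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
import xmlrpc.client

PYPI_URL = "https://pypi.org/pypi"


def summarize_changes(rows, last_serial):
    """Return (sorted unique project names, highest serial seen).

    Each changelog row is (name, version, timestamp, action, serial).
    """
    names = set()
    for name, _version, _timestamp, _action, serial in rows:
        names.add(name)
        last_serial = max(last_serial, serial)
    return sorted(names), last_serial


def fetch_project(name):
    """Fetch fresh metadata for one project from the JSON API."""
    with urllib.request.urlopen(f"{PYPI_URL}/{name}/json") as response:
        return json.load(response)


def poll_once(last_serial, store):
    """Run one iteration and return the serial to resume from next time."""
    client = xmlrpc.client.ServerProxy(PYPI_URL)
    rows = client.changelog_since_serial(last_serial)
    names, new_serial = summarize_changes(rows, last_serial)
    for name in names:
        try:
            store[name] = fetch_project(name)
        except urllib.error.HTTPError as error:
            if error.code != 404:
                raise
            store.pop(name, None)  # assumption: project was removed
    return new_serial
```

Deduplicating names before fetching means a project with several changes in one interval is requested only once.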

Source code is located on GitHub.

Warranty

Note that this service provides incomplete data by design. No consistency guarantee is provided, and you use this data at your own risk. Additionally, the XML-RPC API of PyPI is deprecated. Its suggested replacement, the Latest Updates RSS feed, provides only the 40 latest changes and has no mechanism to request a longer history of updates, so it cannot be used in a way that guarantees no updates are lost. This service will therefore be discontinued as soon as the XML-RPC API is disabled.