Metadata-Version: 2.1
Name: libzim
Version: 3.6.0
Summary: A python-facing API for creating and interacting with ZIM files
Author-email: openZIM <dev@kiwix.org>
License: GPL-3.0-or-later
Project-URL: Homepage, https://github.com/openzim/python-libzim
Project-URL: Donate, https://www.kiwix.org/en/support-us/
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Utilities
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving
Classifier: Topic :: System :: Archiving :: Compression
Classifier: Topic :: System :: Archiving :: Mirroring
Classifier: Topic :: System :: Archiving :: Backup
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Stubs Only
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX
Requires-Python: <3.14,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: scripts
Requires-Dist: invoke==2.2.0; extra == "scripts"
Provides-Extra: lint
Requires-Dist: black==24.10.0; extra == "lint"
Requires-Dist: ruff==0.6.9; extra == "lint"
Requires-Dist: libzim; extra == "lint"
Requires-Dist: libzim[build]; extra == "lint"
Provides-Extra: check
Requires-Dist: pyright==1.1.384; extra == "check"
Requires-Dist: libzim; extra == "check"
Requires-Dist: libzim[build]; extra == "check"
Requires-Dist: libzim[test]; extra == "check"
Requires-Dist: types-setuptools; extra == "check"
Provides-Extra: test
Requires-Dist: pytest==8.3.3; extra == "test"
Requires-Dist: coverage==7.6.2; extra == "test"
Requires-Dist: libzim[build]; extra == "test"
Provides-Extra: build
Requires-Dist: setuptools==75.1.0; extra == "build"
Requires-Dist: wheel==0.44.0; extra == "build"
Requires-Dist: cython==3.0.11; extra == "build"
Requires-Dist: delocate==0.11.0; platform_system == "Windows" and extra == "build"
Provides-Extra: dev
Requires-Dist: pre-commit==4.0.1; extra == "dev"
Requires-Dist: ipython==8.28.0; extra == "dev"
Requires-Dist: types-setuptools; extra == "dev"
Requires-Dist: libzim[scripts]; extra == "dev"
Requires-Dist: libzim[lint]; extra == "dev"
Requires-Dist: libzim[test]; extra == "dev"
Requires-Dist: libzim[check]; extra == "dev"
Requires-Dist: libzim[build]; extra == "dev"

# python-libzim

The `libzim` module allows you to read and write [ZIM files](https://openzim.org)
in Python. It provides a shallow Python interface on top of the
[C++ `libzim` library](https://github.com/openzim/libzim).

It is primarily used in [openZIM](https://github.com/openzim/) scrapers like [`sotoki`](https://github.com/openzim/sotoki) or [`youtube2zim`](https://github.com/openzim/youtube).

[![Build Status](https://github.com/openzim/python-libzim/workflows/test/badge.svg?query=branch%3Amain)](https://github.com/openzim/python-libzim/actions?query=branch%3Amain)
[![CodeFactor](https://www.codefactor.io/repository/github/openzim/python-libzim/badge)](https://www.codefactor.io/repository/github/openzim/python-libzim)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/libzim.svg)](https://pypi.org/project/libzim/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/libzim.svg)](https://pypi.org/project/libzim)
[![codecov](https://codecov.io/gh/openzim/python-libzim/branch/main/graph/badge.svg)](https://codecov.io/gh/openzim/python-libzim)

## Installation

```sh
pip install libzim
```

Our [PyPI wheels](https://pypi.org/project/libzim/) bundle a [recent release](https://download.openzim.org/release/libzim/) of the C++ libzim and are available for the following platforms:

- macOS for `x86_64` and `arm64`
- GNU/Linux for `x86_64`, `armhf` and `aarch64`
- Linux+musl for `x86_64` and `aarch64`
- Windows for `x64`

Wheels are available for CPython only (but can be built for PyPy).

Users on other platforms can install the source distribution (see [Building](#Building) below). 


## Contributions

```sh
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# hatch run test:coverage
```

See [CONTRIBUTING.md](./CONTRIBUTING.md) for additional details, then [open a ticket](https://github.com/openzim/python-libzim/issues/new) or submit a Pull Request on GitHub 🤗!

## Usage

### Read a ZIM file

```python
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher

zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))

# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))

# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
```
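
The example above fetches every match in one call with `getResults(0, count)`. For a large index it may be preferable to page through the results instead. The following is a minimal sketch that relies only on the `getEstimatedMatches()` and `getResults(start, count)` methods shown above. `iter_results` and `FakeSearch` are hypothetical names that are not part of `libzim`; `FakeSearch` is a stand-in so the snippet runs without a ZIM file.

```python
from typing import Iterable, Iterator


class FakeSearch:
    """Stand-in (not part of libzim) mimicking the two search methods used above."""

    def __init__(self, results: list) -> None:
        self._results = results

    def getEstimatedMatches(self) -> int:
        return len(self._results)

    def getResults(self, start: int, count: int) -> Iterable:
        return iter(self._results[start : start + count])


def iter_results(search, page_size: int = 25) -> Iterator:
    """Yield results page by page instead of requesting them all at once."""
    total = search.getEstimatedMatches()
    for start in range(0, total, page_size):
        # Each call asks for at most `page_size` results starting at `start`.
        yield from search.getResults(start, page_size)


results = list(iter_results(FakeSearch(["home", "home/fr", "about"]), page_size=2))
print(results)
```

With a real archive, the object returned by `searcher.search(query)` or `suggestion_searcher.suggest(...)` would take the place of `FakeSearch`. Keep in mind that the match count is an estimate, so the last page may be shorter than expected.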

### Write a ZIM file

```py
import pathlib

from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint


class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}


content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    illustration = pathlib.Path("icon48x48.png").read_bytes()
    creator.add_illustration(48, illustration)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
        "language": "eng",
        "date": "2024-06-30"
    }.items():
        creator.add_metadata(name.title(), value)
```
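
The metadata loop above relies on `str.title()` to turn the lowercase dictionary keys into capitalized names such as `Title` and `Language` before passing them to `add_metadata`. A small stdlib-only sketch of that mapping (the `metadata` dict here is only sample data):

```python
metadata = {"creator": "python-libzim", "title": "Test ZIM", "language": "eng"}

# str.title() upper-cases the first letter of each word and lower-cases the rest.
names = [key.title() for key in metadata]
print(names)  # ['Creator', 'Title', 'Language']
```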

#### Thread safety

> The reading part of the libzim is most of the time thread safe. Searching and creating part are not. [libzim documentation](https://libzim.readthedocs.io/en/latest/usage.html#introduction)

`python-libzim` releases the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock) during most C++ libzim calls. You **must prevent concurrent access** yourself. This is easily done by wrapping all creator calls with a [`threading.Lock()`](https://docs.python.org/3/library/threading.html#lock-objects):

```py
import threading

# a single lock shared by every thread that touches the creator
lock = threading.Lock()
with Creator("test.zim") as creator:

    # Thread #1
    with lock:
        creator.add_item(item1)

    # Thread #2
    with lock:
        creator.add_item(item2)
```
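
If many call sites share one creator, acquiring the lock by hand everywhere is easy to forget. One possible approach is a small proxy that takes the lock around every method call. This is a sketch only: `LockedCreator` is a hypothetical helper that `libzim` does not provide, and `FakeCreator` is a stand-in so the snippet runs without the library.

```python
import threading


class FakeCreator:
    """Stand-in (not part of libzim) that simply records added items."""

    def __init__(self) -> None:
        self.items = []

    def add_item(self, item) -> None:
        self.items.append(item)


class LockedCreator:
    """Hypothetical wrapper: every method call on the creator holds one shared lock."""

    def __init__(self, creator) -> None:
        self._creator = creator
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself.
        attr = getattr(self._creator, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            # Only one thread at a time may run a creator method.
            with self._lock:
                return attr(*args, **kwargs)

        return locked


creator = LockedCreator(FakeCreator())
threads = [
    threading.Thread(target=creator.add_item, args=(f"item-{i}",)) for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
added = sorted(creator.items)
print(len(added))
```

The same wrapper could, in principle, be placed around a real `Creator` inside its `with` block, so that worker threads only ever see the locked proxy.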

#### Type hints

Because `libzim` is a binary extension, there is no Python source to provide type information, so we provide it as type stub files. When using `pyright`, you would normally receive a warning when importing from `libzim`, as there could be discrepancies between the actual sources and the (manually crafted) stub files.

You can disable the warning by setting `reportMissingModuleSource = "none"` in your pyright configuration (for example under `[tool.pyright]` in `pyproject.toml`).

## Building

Building the `libzim` package can be customized through the following environment variables:

| Variable                         | Example                                  | Use case |
| -------------------------------- | ---------------------------------------- | -------- |
| `LIBZIM_DL_VERSION`              | `8.1.1` or `2023-04-14`                  | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded. |
| `USE_SYSTEM_LIBZIM`              | `1`                                      | Uses `LDFLAGS` and `CFLAGS` to find the libzim to link against. The resulting wheel won't bundle C++ libzim. |
| `DONT_DOWNLOAD_LIBZIM`           | `1`                                      | Disable downloading of C++ libzim. Place headers in `include/` and the libzim dylib/so in `libzim/` if not using system libzim. It will be bundled in the wheel. |
| `PROFILE`                        | `0`                                      | Enable profile tracing in Cython extension. Required for Cython code coverage reporting. |
| `SIGN_APPLE`                     | `1`                                      | Set to sign and notarize the extension for macOS. Requires the variables listed below. |
| `APPLE_SIGNING_IDENTITY`         | `Developer ID Application: OrgName (ID)` | Required for signing on macOS |
| `APPLE_SIGNING_KEYCHAIN_PATH`    | `/tmp/build.keychain`                    | Path to the Keychain containing the certificate to sign for macOS with |
| `APPLE_SIGNING_KEYCHAIN_PROFILE` | `build`                                  | Name of the profile in the specified Keychain |
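
To illustrate the two forms `LIBZIM_DL_VERSION` accepts, here is a small sketch that tells a dated nightly apart from a release version string. `describe_libzim_dl_version` is a hypothetical helper written for this README; it is not the logic used by the package's build script, and the accepted patterns are an assumption based on the examples in the table.

```python
import re
from datetime import date


def describe_libzim_dl_version(value: str) -> str:
    """Classify a LIBZIM_DL_VERSION value as 'nightly' (date) or 'release' (version)."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        date.fromisoformat(value)  # raises ValueError for impossible dates
        return "nightly"
    if re.fullmatch(r"\d+(\.\d+)*", value):
        return "release"
    raise ValueError(f"unrecognized LIBZIM_DL_VERSION: {value!r}")


print(describe_libzim_dl_version("8.1.1"))       # release
print(describe_libzim_dl_version("2023-04-14"))  # nightly
```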


### Building on Windows

On Windows, built wheels need to be fixed post-build to move the bundled DLLs (libzim and libicu)
next to the wrapper (Windows does not support runtime paths).

After building your wheel, run:

```ps
python setup.py repair_win_wheel --wheel=dist/xxx.whl --destdir wheels\
```

Similarly, if you install as editable (`pip install -e .`), you need to place those DLLs at the root
of the repo:

```ps
Move-Item -Force -Path .\libzim\*.dll -Destination .\
```

### Examples

#### Default: downloading and bundling most appropriate libzim release binary

```sh
python3 -m build
```

#### Using system libzim (brew, debian or manually installed) - not bundled

```sh
# using system-installed C++ libzim
brew install libzim  # macOS
apt-get install libzim-dev  # debian
dnf install libzim-devel  # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel

# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel
```

#### Other platforms

On platforms for which there is no [official binary](https://download.openzim.org/release/libzim/) available, you'd have to [compile C++ libzim from source](https://github.com/openzim/libzim) first then either use `DONT_DOWNLOAD_LIBZIM` or `USE_SYSTEM_LIBZIM`.


## License

[GPLv3](https://www.gnu.org/licenses/gpl-3.0) or later, see
[LICENSE](LICENSE) for more details.
