medcat==1.8.0 doesn't install with latest Rust-1.73.0 #14

@mkorvas

Description

Following the commands from the MedCAT tutorial on my recently updated Arch Linux system, I started by pip-installing medcat==1.8.0:

TMPDIR=$(realpath tmp) pip install medcat==1.8.0

and it failed while building the transitive dependency tokenizers-0.12.1:

         Compiling tokenizers v0.12.1 (/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/tokenizers-lib)
           Running `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=6e744bd72fbca6b6 -C extra-filename=-6e744bd72fbca6b6 --out-dir /home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps -L dependency=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps --extern aho_corasick=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libaho_corasick-945b53c31d17d93a.rmeta --extern cached_path=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libcached_path-f08bff030f68babf.rmeta --extern clap=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libclap-e9d371f5e8d6a9a3.rmeta --extern derive_builder=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libderive_builder-fa11fc961fe52533.so --extern dirs=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libdirs-1a1d9e829264b7da.rmeta --extern esaxx_rs=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libesaxx_rs-85538497f74112a9.rmeta --extern indicatif=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libindicatif-d0d39a7cdd2548d8.rmeta --extern 
itertools=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libitertools-69eed52371d42a58.rmeta --extern lazy_static=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/liblazy_static-f66451aaeb61e431.rmeta --extern log=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/liblog-c574061a79b01b9c.rmeta --extern macro_rules_attribute=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libmacro_rules_attribute-fba70e287e0c3709.rmeta --extern onig=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libonig-e1ec9f287b0bb2a0.rmeta --extern paste=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libpaste-1e8a081fe8f77648.so --extern rand=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/librand-901f96c0508326da.rmeta --extern rayon=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/librayon-21e5476475f6123c.rmeta --extern rayon_cond=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/librayon_cond-abebb32de588b7d4.rmeta --extern regex=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libregex-e63a632912025278.rmeta --extern regex_syntax=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libregex_syntax-2ed5634723cf75a8.rmeta --extern reqwest=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libreqwest-31671eba5c38f195.rmeta --extern 
serde=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libserde-7eecf8cc84b5f85e.rmeta --extern serde_json=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libserde_json-486bdd6da639b7af.rmeta --extern spm_precompiled=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libspm_precompiled-38b82cabeec534fc.rmeta --extern thiserror=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libthiserror-b66f0526c1fb2f50.rmeta --extern unicode_normalization_alignments=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libunicode_normalization_alignments-2c588a19019b70cf.rmeta --extern unicode_segmentation=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libunicode_segmentation-76879f425c2b2d2d.rmeta --extern unicode_categories=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libunicode_categories-766047a35d8335eb.rmeta -L native=/usr/lib -L native=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/build/zstd-sys-39732ab2cbd6d2b3/out -L native=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/build/esaxx-rs-56e5ad34b63d614b/out -L native=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/build/onig_sys-ded1a183a7abd08d/out`
      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:265:21
          |
      265 |                 let mut target_node = &mut best_path_ends_at[key_pos];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`
          |
          = note: `#[warn(unused_mut)]` on by default

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:282:21
          |
      282 |                 let mut target_node = &mut best_path_ends_at[starts_at + mblen];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:200:59
          |
      200 |     encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
          |                                                           ----^^^^^^^
          |                                                           |
          |                                                           help: remove this `mut`

      error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happend here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |
          = note: `#[deny(invalid_reference_casting)]` on by default

      warning: `tokenizers` (lib) generated 3 warnings
      error: could not compile `tokenizers` (lib) due to previous error; 3 warnings emitted

      Caused by:
        process didn't exit successfully: `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=6e744bd72fbca6b6 -C extra-filename=-6e744bd72fbca6b6 --out-dir /home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps -L dependency=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps --extern aho_corasick=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libaho_corasick-945b53c31d17d93a.rmeta --extern cached_path=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libcached_path-f08bff030f68babf.rmeta --extern clap=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libclap-e9d371f5e8d6a9a3.rmeta --extern derive_builder=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libderive_builder-fa11fc961fe52533.so --extern dirs=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libdirs-1a1d9e829264b7da.rmeta --extern esaxx_rs=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libesaxx_rs-85538497f74112a9.rmeta --extern indicatif=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libindicatif-d0d39a7cdd2548d8.rmeta --extern 
itertools=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libitertools-69eed52371d42a58.rmeta --extern lazy_static=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/liblazy_static-f66451aaeb61e431.rmeta --extern log=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/liblog-c574061a79b01b9c.rmeta --extern macro_rules_attribute=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libmacro_rules_attribute-fba70e287e0c3709.rmeta --extern onig=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libonig-e1ec9f287b0bb2a0.rmeta --extern paste=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libpaste-1e8a081fe8f77648.so --extern rand=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/librand-901f96c0508326da.rmeta --extern rayon=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/librayon-21e5476475f6123c.rmeta --extern rayon_cond=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/librayon_cond-abebb32de588b7d4.rmeta --extern regex=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libregex-e63a632912025278.rmeta --extern regex_syntax=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libregex_syntax-2ed5634723cf75a8.rmeta --extern reqwest=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libreqwest-31671eba5c38f195.rmeta --extern 
serde=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libserde-7eecf8cc84b5f85e.rmeta --extern serde_json=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libserde_json-486bdd6da639b7af.rmeta --extern spm_precompiled=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libspm_precompiled-38b82cabeec534fc.rmeta --extern thiserror=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libthiserror-b66f0526c1fb2f50.rmeta --extern unicode_normalization_alignments=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libunicode_normalization_alignments-2c588a19019b70cf.rmeta --extern unicode_segmentation=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libunicode_segmentation-76879f425c2b2d2d.rmeta --extern unicode_categories=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/deps/libunicode_categories-766047a35d8335eb.rmeta -L native=/usr/lib -L native=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/build/zstd-sys-39732ab2cbd6d2b3/out -L native=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/build/esaxx-rs-56e5ad34b63d614b/out -L native=/home/matej/proj/medcat/tmp/pip-install-ke2936w0/tokenizers_6e6a038a805943288e0da324d69fa299/target/release/build/onig_sys-ded1a183a7abd08d/out` (exit status: 1)
      error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code 101
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip

This was with Rust 1.73.0 installed on the system; after downgrading to Rust 1.72.1, the build worked. This post in the discussion of the python-tokenizers package in the AUR suggests that requiring tokenizers==0.14.1 instead should make the build work (with Rust 1.70.0 or newer).

I am posting this issue here because it effectively breaks the tutorial's instructions, even though it is probably not something that can be fixed in the tutorial itself.
