An error occurred when running Quickstart for ESM3-open. replace() argument 2 must be str, not None

When I ran the "Quickstart for ESM3-open" script (open.py), the following error occurred (full log in slurm-1031673.txt).

from huggingface_hub import login
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Will instruct you how to get an API key from huggingface hub, make one with "Read" permission.
login()

# This will download the model weights and instantiate the model on your machine.
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cuda") # or "cpu"

# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "___________________________________________________DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP___________________________________________________________"
protein = ESMProtein(sequence=prompt)
# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")
# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.coordinates = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")
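A side note on the `login()` call: this script was submitted as a Slurm batch job, and the job log in this report shows `getpass` warning that it cannot control echo while prompting for the token. A minimal sketch of a non-interactive alternative is below. It assumes the token has been exported in an environment variable named `HF_TOKEN` before the job starts; `read_hf_token` and `login_from_env` are hypothetical helper names, not part of `huggingface_hub`. This only concerns the warning and is probably unrelated to the `TypeError`.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or empty."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before submitting the job.")
    return token


def login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub without an interactive prompt."""
    # Imported lazily so the helper above can be used without huggingface_hub.
    from huggingface_hub import login

    login(token=read_hf_token(env_var))
```

In the script, `login_from_env()` would take the place of the bare `login()` call.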



gcc-11.3.0 loaded successful
cuda-12.1 loaded successful

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

/home/bingxing2/home/scx9842/.conda/envs/esm3/lib/python3.10/getpass.py:91: GetPassWarning: Can not control echo on the terminal.
  passwd = fallback_getpass(prompt, stream)
Warning: Password input may be echoed.
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) 
Fetching 22 files:   0%|          | 0/22 [00:00<?, ?it/s]
Fetching 22 files: 100%|██████████| 22/22 [00:00<00:00, 2270.76it/s]
/home/bingxing2/home/scx9842/esm3/esm/pretrained.py:68: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(
Traceback (most recent call last):
  File "/home/bingxing2/home/scx9842/esm3/open.py", line 15, in <module>
    protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
  File "/home/bingxing2/home/scx9842/esm3/esm/models/esm3.py", line 397, in generate
    proteins = self.batch_generate([input], [config])
  File "/home/bingxing2/home/scx9842/esm3/esm/models/esm3.py", line 421, in batch_generate
    return iterative_sampling_raw(self, inputs, configs)  # type: ignore
  File "/home/bingxing2/home/scx9842/esm3/esm/utils/generation.py", line 105, in iterative_sampling_raw
    input_tokens = [client.encode(protein) for protein in proteins]
  File "/home/bingxing2/home/scx9842/esm3/esm/utils/generation.py", line 105, in <listcomp>
    input_tokens = [client.encode(protein) for protein in proteins]
  File "/home/bingxing2/home/scx9842/esm3/esm/models/esm3.py", line 445, in encode
    sequence_tokens = encoding.tokenize_sequence(
  File "/home/bingxing2/home/scx9842/esm3/esm/utils/encoding.py", line 53, in tokenize_sequence
    sequence = sequence.replace(C.MASK_STR_SHORT, sequence_tokenizer.mask_token)
TypeError: replace() argument 2 must be str, not None
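The last frame of the traceback shows `sequence_tokenizer.mask_token` being passed to `str.replace` while it is `None`. The sketch below reproduces that failure in plain Python, with no ESM code involved, and shows the kind of guard that would turn it into a clearer message. `FakeTokenizer` and `replace_mask` are hypothetical stand-ins, not part of the `esm` package, and the `"_"` mask character is an assumption based on the prompt in the script.

```python
from typing import Optional

MASK_STR_SHORT = "_"  # assumption: "_" marks masked positions, as in the prompt


class FakeTokenizer:
    """Hypothetical stand-in for the sequence tokenizer seen in the traceback."""

    def __init__(self, mask_token: Optional[str]) -> None:
        self.mask_token = mask_token


def replace_mask(sequence: str, tokenizer: FakeTokenizer) -> str:
    """Swap the short mask character for the tokenizer's mask token."""
    if tokenizer.mask_token is None:
        # Fail early with a message that names the real problem.
        raise ValueError("tokenizer.mask_token is None; the tokenizer has no mask token set")
    return sequence.replace(MASK_STR_SHORT, tokenizer.mask_token)


# Reproduce the TypeError from the log: str.replace rejects None as argument 2.
try:
    "__DQ".replace(MASK_STR_SHORT, None)  # type: ignore[arg-type]
except TypeError as exc:
    error_message = str(exc)
```

If this is what is happening, the prompt itself is fine and the thing to inspect is how the sequence tokenizer was constructed in this environment, since its `mask_token` attribute is unset by the time `tokenize_sequence` runs.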





[open.py](https://github.com/user-attachments/files/23176979/open.py)

[slurm-1031673.txt](https://github.com/user-attachments/files/23176992/slurm-1031673.txt)


An error occurred when running Quickstart for ESM3-open. replace() argument 2 must be str, not None #285
