An error occurred when running Quickstart for ESM3-open. replace() argument 2 must be str, not None #285

@pdxooo

When I ran the "Quickstart for ESM3-open" script (open.py), the following error occurred (full log: slurm-1031673.txt).

from huggingface_hub import login
from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Will instruct you how to get an API key from huggingface hub, make one with "Read" permission.
login()

# This will download the model weights and instantiate the model on your machine.
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-open").to("cuda") # or "cpu"

# Generate a completion for a partial Carbonic Anhydrase (2vvb)
prompt = "DQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPP________"
protein = ESMProtein(sequence=prompt)

# Generate the sequence, then the structure. This will iteratively unmask the sequence track.
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))

# We can show the predicted structure for the generated sequence.
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./generation.pdb")

# Then we can do a round trip design by inverse folding the sequence and recomputing the structure
protein.sequence = None
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
protein.coordinates = None
protein = model.generate(protein, GenerationConfig(track="structure", num_steps=8))
protein.to_pdb("./round_tripped.pdb")
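
The script calls `login()` with no arguments, which prompts for a token on the terminal. Under a batch scheduler there may be no terminal to prompt on. A minimal sketch of a non-interactive alternative follows, assuming the token has been exported in the job script as an environment variable named `HF_TOKEN`. The helper names `resolve_hf_token` and `login_noninteractive` are hypothetical and not part of the quickstart. This only addresses the login prompt and is not a fix for the `TypeError`.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Mapping[str, str] = os.environ,
                     var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `var`, or None if it is unset or blank."""
    token = env.get(var, "").strip()
    return token or None


def login_noninteractive(env: Mapping[str, str] = os.environ) -> bool:
    """Log in with a token from the environment; never prompt.

    Returns True if a token was found and passed to huggingface_hub.login,
    False if no token was available (in which case nothing is called).
    """
    token = resolve_hf_token(env)
    if token is None:
        return False
    # Imported lazily so the helper can be inspected without the package installed.
    from huggingface_hub import login
    login(token=token)
    return True
```

In a job script this could replace the bare `login()` call, for example `if not login_noninteractive(): raise SystemExit("HF_TOKEN is not set")`.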

gcc-11.3.0 loaded successful
cuda-12.1 loaded successful

_|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
_|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
_|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
_|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
_|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

/home/bingxing2/home/scx9842/.conda/envs/esm3/lib/python3.10/getpass.py:91: GetPassWarning: Can not control echo on the terminal.
passwd = fallback_getpass(prompt, stream)
Warning: Password input may be echoed.
Enter your token (input will not be visible):
Add token as git credential? (Y/n)
Fetching 22 files: 0%| | 0/22 [00:00<?, ?it/s]
Fetching 22 files: 100%|██████████| 22/22 [00:00<00:00, 2270.76it/s]
/home/bingxing2/home/scx9842/esm3/esm/pretrained.py:68: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state_dict = torch.load(
Traceback (most recent call last):
  File "/home/bingxing2/home/scx9842/esm3/open.py", line 15, in <module>
    protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8, temperature=0.7))
  File "/home/bingxing2/home/scx9842/esm3/esm/models/esm3.py", line 397, in generate
    proteins = self.batch_generate([input], [config])
  File "/home/bingxing2/home/scx9842/esm3/esm/models/esm3.py", line 421, in batch_generate
    return iterative_sampling_raw(self, inputs, configs) # type: ignore
  File "/home/bingxing2/home/scx9842/esm3/esm/utils/generation.py", line 105, in iterative_sampling_raw
    input_tokens = [client.encode(protein) for protein in proteins]
  File "/home/bingxing2/home/scx9842/esm3/esm/utils/generation.py", line 105, in <listcomp>
    input_tokens = [client.encode(protein) for protein in proteins]
  File "/home/bingxing2/home/scx9842/esm3/esm/models/esm3.py", line 445, in encode
    sequence_tokens = encoding.tokenize_sequence(
  File "/home/bingxing2/home/scx9842/esm3/esm/utils/encoding.py", line 53, in tokenize_sequence
    sequence = sequence.replace(C.MASK_STR_SHORT, sequence_tokenizer.mask_token)
TypeError: replace() argument 2 must be str, not None
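
The last frame suggests that `sequence_tokenizer.mask_token` was `None` when `str.replace` was called. The log does not show why the tokenizer had no mask token. The sketch below reproduces the same `TypeError` with a stand-in tokenizer and shows a guard that fails with a clearer message. `StubTokenizer`, `expand_mask`, and `reproduce_error` are hypothetical names for illustration, not part of the `esm` package. The value `"_"` for the short mask string is an assumption based on the underscores in the prompt.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value of C.MASK_STR_SHORT, inferred from the "_" characters in the prompt.
MASK_STR_SHORT = "_"


@dataclass
class StubTokenizer:
    """Stand-in for the real sequence tokenizer; only mask_token matters here."""
    mask_token: Optional[str]


def expand_mask(sequence: str, tokenizer: StubTokenizer) -> str:
    """Replace short mask characters with the tokenizer's mask token.

    Raises a descriptive ValueError instead of letting str.replace fail
    when the tokenizer has no mask token configured.
    """
    if tokenizer.mask_token is None:
        raise ValueError(
            "tokenizer.mask_token is None; check how the tokenizer was constructed"
        )
    return sequence.replace(MASK_STR_SHORT, tokenizer.mask_token)


def reproduce_error() -> str:
    """Trigger the same TypeError as in the traceback and return its message."""
    try:
        "TPP__".replace(MASK_STR_SHORT, None)  # type: ignore[arg-type]
    except TypeError as exc:
        return str(exc)
    return ""
```

To check the real object, printing `mask_token` on the tokenizer instance that `encode` passes to `tokenize_sequence` (for example from a debugger at `esm/utils/encoding.py`, line 53) would confirm whether it is `None` before the `replace` call.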

open.py

slurm-1031673.txt
