This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit ed695f4

Upstream merge
2 parents: 93177b0 + 668e385

37 files changed: 1513 additions and 566 deletions.

README.md

Lines changed: 36 additions & 10 deletions
@@ -23,8 +23,25 @@ send along a pull request to add your dataset or model.
 See [our contribution
 doc](CONTRIBUTING.md) for details and our [open
 issues](https://github.com/tensorflow/tensor2tensor/issues).
-And chat with us and other users on
-[Gitter](https://gitter.im/tensor2tensor/Lobby).
+You can chat with us and other users on
+[Gitter](https://gitter.im/tensor2tensor/Lobby) and please join our
+[Google Group](https://groups.google.com/forum/#!forum/tensor2tensor) to keep up
+with T2T announcements.
+
+Here is a one-command version that installs tensor2tensor, downloads the data,
+trains an English-German translation model, and lets you use it interactively:
+```
+pip install tensor2tensor && t2t-trainer \
+--generate_data \
+--data_dir=~/t2t_data \
+--problems=wmt_ende_tokens_32k \
+--model=transformer \
+--hparams_set=transformer_base_single_gpu \
+--output_dir=~/t2t_train/base \
+--decode_interactive
+```
+
+See the [Walkthrough](#walkthrough) below for more details on each step.

 ### Contents

@@ -72,8 +89,6 @@ t2t-datagen \
 --num_shards=100 \
 --problem=$PROBLEM

-cp $TMP_DIR/tokens.vocab.* $DATA_DIR
-
 # Train
 # * If you run out of memory, add --hparams='batch_size=2048' or even 1024.
 t2t-trainer \
@@ -153,7 +168,7 @@ python -c "from tensor2tensor.models.transformer import Transformer"
 specification.
 * Support for multi-GPU machines and synchronous (1 master, many workers) and
 asynchrounous (independent workers synchronizing through a parameter server)
-distributed training.
+[distributed training](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md).
 * Easily swap amongst datasets and models by command-line flag with the data
 generation script `t2t-datagen` and the training script `t2t-trainer`.

@@ -173,8 +188,10 @@ and many common sequence datasets are already available for generation and use.

 **Problems** define training-time hyperparameters for the dataset and task,
 mainly by setting input and output **modalities** (e.g. symbol, image, audio,
-label) and vocabularies, if applicable. All problems are defined in
-[`problem_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem_hparams.py).
+label) and vocabularies, if applicable. All problems are defined either in
+[`problem_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem_hparams.py)
+or are registered with `@registry.register_problem` (run `t2t-datagen` to see
+the list of all available problems).
 **Modalities**, defined in
 [`modality.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/modality.py),
 abstract away the input and output data types so that **models** may deal with
@@ -211,7 +228,7 @@ inference. Users can easily switch between problems, models, and hyperparameter
 sets by using the `--model`, `--problems`, and `--hparams_set` flags. Specific
 hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
 related flags control local and distributed training/evaluation
-([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/docs/distributed_training.md)).
+([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).

 ---

@@ -222,7 +239,7 @@ enables easily adding new ones and easily swapping amongst them by command-line
 flag. You can add your own components without editing the T2T codebase by
 specifying the `--t2t_usr_dir` flag in `t2t-trainer`.

-You can currently do so for models, hyperparameter sets, and modalities. Please
+You can do so for models, hyperparameter sets, modalities, and problems. Please
 do submit a pull request if your component might be useful to others.

 Here's an example with a new hyperparameter set:
@@ -253,9 +270,18 @@ You'll see under the registered HParams your
 `transformer_my_very_own_hparams_set`, which you can directly use on the command
 line with the `--hparams_set` flag.

+`t2t-datagen` also supports the `--t2t_usr_dir` flag for `Problem`
+registrations.
+
 ## Adding a dataset

-See the [data generators
+To add a new dataset, subclass
+[`Problem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py)
+and register it with `@registry.register_problem`. See
+[`WMTEnDeTokens8k`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wmt.py)
+for an example.
+
+Also see the [data generators
 README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).

 ---
File renamed without changes.

docs/index.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# T2T: Tensor2Tensor Transformers
+
+Check us out on
+<a href=https://github.com/tensorflow/tensor2tensor>
+GitHub
+<img src="https://github.com/favicon.ico" width="16">
+</a>
+.
+
+[![PyPI
+version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
+[![GitHub
+Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
+[![Contributions
+welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
+[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
+[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
+
+See our
+[README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/README.md)
+for documentation.
+
+More documentation and tutorials coming soon...

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@

 setup(
 name='tensor2tensor',
-version='1.0.14',
+version='1.1.1',
 description='Tensor2Tensor',
 author='Google Inc.',
 author_email='no-reply@google.com',
