@@ -23,8 +23,25 @@ send along a pull request to add your dataset or model.
 See [our contribution
 doc](CONTRIBUTING.md) for details and our [open
 issues](https://github.com/tensorflow/tensor2tensor/issues).
-And chat with us and other users on
-[Gitter](https://gitter.im/tensor2tensor/Lobby).
+You can chat with us and other users on
+[Gitter](https://gitter.im/tensor2tensor/Lobby), and please join our
+[Google Group](https://groups.google.com/forum/#!forum/tensor2tensor) to keep up
+with T2T announcements.
+
+Here is a one-command version that installs tensor2tensor, downloads the data,
+trains an English-German translation model, and lets you use it interactively:
+```
+pip install tensor2tensor && t2t-trainer \
+  --generate_data \
+  --data_dir=~/t2t_data \
+  --problems=wmt_ende_tokens_32k \
+  --model=transformer \
+  --hparams_set=transformer_base_single_gpu \
+  --output_dir=~/t2t_train/base \
+  --decode_interactive
+```
+
+See the [Walkthrough](#walkthrough) below for more details on each step.
 
 ### Contents
 
@@ -69,11 +86,8 @@ mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
 t2t-datagen \
   --data_dir=$DATA_DIR \
   --tmp_dir=$TMP_DIR \
-  --num_shards=100 \
   --problem=$PROBLEM
 
-cp $TMP_DIR/tokens.vocab.* $DATA_DIR
-
 # Train
 # * If you run out of memory, add --hparams='batch_size=2048' or even 1024.
 t2t-trainer \
@@ -153,7 +167,7 @@ python -c "from tensor2tensor.models.transformer import Transformer"
   specification.
 * Support for multi-GPU machines and synchronous (1 master, many workers) and
   asynchronous (independent workers synchronizing through a parameter server)
-  distributed training.
+  [distributed training](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md).
 * Easily swap amongst datasets and models by command-line flag with the data
   generation script `t2t-datagen` and the training script `t2t-trainer`.
 
@@ -173,8 +187,10 @@ and many common sequence datasets are already available for generation and use.
 
 **Problems** define training-time hyperparameters for the dataset and task,
 mainly by setting input and output **modalities** (e.g. symbol, image, audio,
-label) and vocabularies, if applicable. All problems are defined in
-[`problem_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem_hparams.py).
+label) and vocabularies, if applicable. All problems are either defined in
+[`problem_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem_hparams.py)
+or registered with `@registry.register_problem` (run `t2t-datagen` to see the
+list of all available problems).
 **Modalities**, defined in
 [`modality.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/modality.py),
 abstract away the input and output data types so that **models** may deal with
@@ -211,7 +227,7 @@ inference. Users can easily switch between problems, models, and hyperparameter
 sets by using the `--model`, `--problems`, and `--hparams_set` flags. Specific
 hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
 related flags control local and distributed training/evaluation
-([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/docs/distributed_training.md)).
+([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).
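+
+For instance, a hypothetical invocation combining these flags; the problem,
+model, hparams set, and the batch-size override are all reused from the
+walkthrough's out-of-memory tip above:
+
+```
+t2t-trainer \
+  --data_dir=~/t2t_data \
+  --problems=wmt_ende_tokens_32k \
+  --model=transformer \
+  --hparams_set=transformer_base_single_gpu \
+  --hparams='batch_size=1024' \
+  --output_dir=~/t2t_train/base
+```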
 
 ---
 
@@ -222,7 +238,7 @@ enables easily adding new ones and easily swapping amongst them by command-line
 flag. You can add your own components without editing the T2T codebase by
 specifying the `--t2t_usr_dir` flag in `t2t-trainer`.
 
-You can currently do so for models, hyperparameter sets, and modalities. Please
+You can do so for models, hyperparameter sets, modalities, and problems. Please
 do submit a pull request if your component might be useful to others.
 
 Here's an example with a new hyperparameter set:
@@ -242,7 +258,7 @@ def transformer_my_very_own_hparams_set():
 
 ```python
 # In ~/usr/t2t_usr/__init__.py
-import my_registrations
+from . import my_registrations
 ```
 
 ```
@@ -253,9 +269,18 @@ You'll see under the registered HParams your
 `transformer_my_very_own_hparams_set`, which you can directly use on the command
 line with the `--hparams_set` flag.
 
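+For instance, a hypothetical invocation; the user directory reuses the
+`~/usr/t2t_usr` path from the example above, and the other flag values come
+from the walkthrough:
+
+```
+t2t-trainer \
+  --t2t_usr_dir=~/usr/t2t_usr \
+  --hparams_set=transformer_my_very_own_hparams_set \
+  --model=transformer \
+  --problems=wmt_ende_tokens_32k \
+  --data_dir=~/t2t_data \
+  --output_dir=~/t2t_train/base
+```
+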
+`t2t-datagen` also supports the `--t2t_usr_dir` flag for `Problem`
+registrations.
+
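+For example, a hypothetical data-generation call for a problem registered this
+way; the directory paths are illustrative, and the problem name refers to the
+sketch in the next section:
+
+```
+t2t-datagen \
+  --t2t_usr_dir=~/usr/t2t_usr \
+  --data_dir=~/t2t_data \
+  --tmp_dir=/tmp/t2t_datagen \
+  --problem=my_text_problem
+```
+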
 ## Adding a dataset
 
-See the [data generators
+To add a new dataset, subclass
+[`Problem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py)
+and register it with `@registry.register_problem`. See
+[`WMTEnDeTokens8k`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/wmt.py)
+for an example.
+
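+A minimal sketch of the shape such a subclass takes; the module path, class
+name, and method bodies here are illustrative assumptions rather than a real
+T2T dataset (see `WMTEnDeTokens8k` for the genuine article):
+
+```python
+# In ~/usr/t2t_usr/my_problem.py (hypothetical; import it from __init__.py)
+from tensor2tensor.data_generators import problem
+from tensor2tensor.utils import registry
+
+
+@registry.register_problem
+class MyTextProblem(problem.Problem):
+  """Hypothetical problem, registered under the name `my_text_problem`."""
+
+  def generate_data(self, data_dir, tmp_dir, task_id=-1):
+    # Read raw data from tmp_dir and write TFRecord example files to data_dir.
+    raise NotImplementedError()
+
+  def hparams(self, defaults, model_hparams):
+    # Set the input/output modalities and vocabulary sizes on `defaults`.
+    pass
+```
+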
+Also see the [data generators
 README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).
 
 ---