@@ -12,87 +12,50 @@ welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CO
1212
1313[ Tensor2Tensor] ( https://github.com/tensorflow/tensor2tensor ) , or
1414[ T2T] ( https://github.com/tensorflow/tensor2tensor ) for short, is a library
15- of deep learning models and datasets. It has binaries to train the models and
16- to download and prepare the data for you. T2T is modular and extensible and can
17- be used in [ notebooks] ( https://goo.gl/wkHexj ) for prototyping your own models
18- or running existing ones on your data. It is actively used and maintained by
19- researchers and engineers within
20- the [ Google Brain team] ( https://research.google.com/teams/brain/ ) and was used
21- to develop state-of-the-art models for translation (see
22- [ Attention Is All You Need] ( https://arxiv.org/abs/1706.03762 ) ),
23- [ summarization] ( https://arxiv.org/abs/1801.10198 ) ,
24- image generation and other tasks. You can read
25- more about T2T in the [ Google Research Blog post introducing
26- it] ( https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html ) .
27-
28- We're eager to collaborate with you on extending T2T, so please feel
29- free to [ open an issue on
30- GitHub] ( https://github.com/tensorflow/tensor2tensor/issues ) or
31- send along a pull request to add your dataset or model.
32- See [ our contribution
33- doc] ( CONTRIBUTING.md ) for details and our [ open
34- issues] ( https://github.com/tensorflow/tensor2tensor/issues ) .
35- You can chat with us and other users on
36- [ Gitter] ( https://gitter.im/tensor2tensor/Lobby ) and please join our
37- [ Google Group] ( https://groups.google.com/forum/#!forum/tensor2tensor ) to keep up
38- with T2T announcements.
15+ of deep learning models and datasets designed to [ accelerate deep learning
16+ research] ( https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html ) and make it more accessible.
17+
18+ T2T is actively used and maintained by researchers and engineers within the
19+ [ Google Brain team] ( https://research.google.com/teams/brain/ ) and a community
20+ of users. We're eager to collaborate with you too, so feel free to
21+ [ open an issue on GitHub] ( https://github.com/tensorflow/tensor2tensor/issues )
22+ or send along a pull request (see [ our contribution doc] ( CONTRIBUTING.md ) ).
23+ You can chat with us on
24+ [ Gitter] ( https://gitter.im/tensor2tensor/Lobby ) and join the
25+ [ T2T Google Group] ( https://groups.google.com/forum/#!forum/tensor2tensor ) .
3926
4027### Quick Start
4128
4229[ This iPython notebook] ( https://goo.gl/wkHexj ) explains T2T and runs in your
4330browser using a free VM from Google, no installation needed.
44-
45- Alternatively, here is a one-command version that installs T2T, downloads data,
46- trains an MNIST model and evaluates it:
31+ Alternatively, here is a one-command version that installs T2T, downloads MNIST,
32+ trains a model and evaluates it:
4733
4834```
4935pip install tensor2tensor && t2t-trainer \
5036 --generate_data \
5137 --data_dir=~/t2t_data \
38+ --output_dir=~/t2t_train/mnist \
5239 --problems=image_mnist \
5340 --model=shake_shake \
5441 --hparams_set=shake_shake_quick \
55- --output_dir=~/t2t_train/mnist1
56- ```
57-
58- For a more demanding problem, here is how to train
59- an English-German translation model and evaluate it:
60-
61- ```
62- pip install tensor2tensor && t2t-trainer \
63- --generate_data \
64- --data_dir=~/t2t_data \
65- --problems=translate_ende_wmt32k \
66- --model=transformer \
67- --hparams_set=transformer_base_single_gpu \
68- --output_dir=~/t2t_train/base
42+ --train_steps=1000 \
43+ --eval_steps=100
6944```
7045
71- You can decode from the model interactively to get translations:
72-
73- ```
74- t2t-decoder \
75- --data_dir=~/t2t_data \
76- --problems=translate_ende_wmt32k \
77- --model=transformer \
78- --hparams_set=transformer_base_single_gpu \
79- --output_dir=~/t2t_train/base \
80- --decode_interactive
81- ```
82-
83- See the [ Walkthrough] ( #walkthrough ) below for more details on each step
84- and [ Suggested Models] ( #suggested-models ) for well performing models
85- on common tasks.
86-
8746### Contents
8847
89- * [ Walkthrough] ( #walkthrough )
90- * [ Suggested Models] ( #suggested-models )
91- * [ Translation] ( #translation )
92- * [ Summarization] ( #summarization )
48+ * [ Suggested Datasets and Models] ( #suggested-datasets-and-models )
9349 * [ Image Classification] ( #image-classification )
94- * [ Installation] ( #installation )
95- * [ Features] ( #features )
50+ * [ Language Modeling] ( #language-modeling )
51+ * [ Sentiment Analysis] ( #sentiment-analysis )
52+ * [ Speech Recognition] ( #speech-recognition )
53+ * [ Summarization] ( #summarization )
54+ * [ Translation] ( #translation )
55+ * [ Basics] ( #basics )
56+ * [ Walkthrough] ( #walkthrough )
57+ * [ Installation] ( #installation )
58+ * [ Features] ( #features )
9659* [ T2T Overview] ( #t2t-overview )
9760 * [ Datasets] ( #datasets )
9861 * [ Problems and Modalities] ( #problems-and-modalities )
@@ -101,10 +64,102 @@ on common tasks.
10164 * [ Trainer] ( #trainer )
10265* [ Adding your own components] ( #adding-your-own-components )
10366* [ Adding a dataset] ( #adding-a-dataset )
67+ * [ Papers] ( #papers )
68+
69+ ## Suggested Datasets and Models
70+
71+ Below we list a number of tasks that can be solved with T2T when
72+ you train the appropriate model on the appropriate problem.
73+ We give the problem and model below and we suggest a setting of
74+ hyperparameters that we know works well in our setup. We usually
75+ run either on Cloud TPUs or on 8-GPU machines; you might need
76+ to modify the hyperparameters if you run on a different setup.
77+
78+ ### Image Classification
79+
80+ For image classification, we have a number of standard data-sets:
81+ * ImageNet (a large data-set): ` --problems=image_imagenet ` , or one
82+ of the re-scaled versions (` image_imagenet224 ` , ` image_imagenet64 ` ,
83+ ` image_imagenet32 ` )
84+ * CIFAR-10: ` --problems=image_cifar10 ` (or
85+ ` --problems=image_cifar10_plain ` to turn off data augmentation)
86+ * CIFAR-100: ` --problems=image_cifar100 `
87+ * MNIST: ` --problems=image_mnist `
88+
89+ For ImageNet, we suggest using ResNet or Xception, i.e.,
90+ use ` --model=resnet --hparams_set=resnet_50 ` or
91+ ` --model=xception --hparams_set=xception_base ` .
92+ ResNet should reach above 76% top-1 accuracy on ImageNet.
93+
94+ For CIFAR and MNIST, we suggest trying the shake-shake model:
95+ ` --model=shake_shake --hparams_set=shakeshake_big ` .
96+ This setting trained for ` --train_steps=700000 ` should yield
97+ close to 97% accuracy on CIFAR-10.
98+
99+ ### Language Modeling
100+
101+ For language modeling, we have these data-sets in T2T:
102+ * PTB (a small data-set): ` --problems=languagemodel_ptb10k ` for
103+ word-level modeling and ` --problems=languagemodel_ptb_characters `
104+ for character-level modeling.
105+ * LM1B (a billion-word corpus): ` --problems=languagemodel_lm1b32k ` for
106+ subword-level modeling and ` --problems=languagemodel_lm1b_characters `
107+ for character-level modeling.
108+
109+ We suggest starting with ` --model=transformer ` on this task and using
110+ ` --hparams_set=transformer_small ` for PTB and
111+ ` --hparams_set=transformer_base ` for LM1B.
112+
113+ ### Sentiment Analysis
104114
105- ---
115+ For the task of recognizing the sentiment of a sentence, use
116+ * the IMDB data-set: ` --problems=sentiment_imdb `
106117
107- ## Walkthrough
118+ We suggest using ` --model=transformer_encoder ` here. Since it is
119+ a small data-set, try ` --hparams_set=transformer_tiny ` and train for
120+ only a few steps (e.g., ` --train_steps=2000 ` ).
121+
122+ ### Speech Recognition
123+
124+ For speech-to-text, we have these data-sets in T2T:
125+ * Librispeech (English speech to text): ` --problems=librispeech ` for
126+ the whole set and ` --problems=librispeech_clean ` for a smaller
127+ but nicely filtered part.
128+
129+ ### Summarization
130+
131+ For summarizing longer text into a shorter one, we have this data-set:
132+ * CNN/DailyMail articles summarized into a few sentences:
133+ ` --problems=summarize_cnn_dailymail32k `
134+
135+ We suggest using ` --model=transformer ` and
136+ ` --hparams_set=transformer_prepend ` for this task.
137+ This yields good ROUGE scores.
138+
139+ ### Translation
140+
141+ There are a number of translation data-sets in T2T:
142+ * English-German: ` --problems=translate_ende_wmt32k `
143+ * English-French: ` --problems=translate_enfr_wmt32k `
144+ * English-Czech: ` --problems=translate_encs_wmt32k `
145+ * English-Chinese: ` --problems=translate_enzh_wmt32k `
146+
147+ You can get translations in the other direction by appending ` _rev ` to
148+ the problem name, e.g., for German-English use
149+ ` --problems=translate_ende_wmt32k_rev ` .
150+
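The `_rev` naming convention can be illustrated with a small helper. This is only a sketch: `reverse_problem` is a hypothetical function, not part of T2T, and it does nothing more than build the problem name following the rule stated here.

```python
def reverse_problem(problem: str) -> str:
    """Return the problem name for the opposite translation direction.

    Hypothetical helper (not part of T2T): it only appends the `_rev`
    suffix, as the naming convention in the text describes.
    """
    return problem + "_rev"


# English-German becomes German-English.
print("--problems=" + reverse_problem("translate_ende_wmt32k"))
```

The printed flag can be passed to `t2t-trainer` in place of the forward-direction problem name.
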
151+ For all translation problems, we suggest trying the Transformer model:
152+ ` --model=transformer ` . At first it is best to try the base setting,
153+ ` --hparams_set=transformer_base ` . When trained on 8 GPUs for 300K steps
154+ this should reach a BLEU score of about 28 on the English-German data-set,
155+ which is close to state-of-the-art. If training on a single GPU, try the
156+ ` --hparams_set=transformer_base_single_gpu ` setting. For very good results
157+ or larger data-sets (e.g., for English-French), try the big model
158+ with ` --hparams_set=transformer_big ` .
159+
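As a rough illustration of the translation advice, the sketch below assembles a `t2t-trainer` command line from the flags mentioned in this README. The function `translation_command`, its parameters, and the directory paths are assumptions made for the example; the selection logic simply mirrors the text (base setting by default, the single-GPU setting on one GPU, the big model on request).

```python
def translation_command(problem, num_gpus=8, big=False):
    """Build an illustrative t2t-trainer argument list for translation.

    Hypothetical helper, not part of T2T. The paths are placeholders.
    """
    if big:
        # Suggested for very good results or larger data-sets.
        hparams_set = "transformer_big"
    elif num_gpus == 1:
        # Suggested when training on a single GPU.
        hparams_set = "transformer_base_single_gpu"
    else:
        # Suggested starting point.
        hparams_set = "transformer_base"
    return [
        "t2t-trainer",
        "--data_dir=~/t2t_data",
        "--output_dir=~/t2t_train/base",
        "--problems=" + problem,
        "--model=transformer",
        "--hparams_set=" + hparams_set,
    ]


# Print the command for English-German on a single GPU.
print(" ".join(translation_command("translate_ende_wmt32k", num_gpus=1)))
```

The helper only formats a string; whether a given hyperparameter set fits your hardware still has to be checked against your own setup.
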
160+ ## Basics
161+
162+ ### Walkthrough
108163
109164Here's a walkthrough training a good English-to-German translation
110165model using the Transformer model from [ * Attention Is All You
@@ -170,36 +225,8 @@ cat translation.en
170225t2t-bleu --translation=translation.en --reference=ref-translation.de
171226```
172227
173- ---
174-
175- ## Suggested Models
228+ ### Installation
176229
177- Here are some combinations of models, hparams and problems that we found
178- work well, so we suggest to use them if you're interested in that problem.
179-
180- ### Translation
181-
182- For translation, esp. English-German and English-French, we suggest to use
183- the Transformer model in base or big configurations, i.e.
184- for ` --problems=translate_ende_wmt32k ` use ` --model=transformer ` and
185- ` --hparams_set=transformer_base ` . When trained on 8 GPUs for 300K steps
186- this should reach a BLEU score of about 28.
187-
188- ### Summarization
189-
190- For summarization suggest to use the Transformer model in prepend mode, i.e.
191- for ` --problems=summarize_cnn_dailymail32k ` use ` --model=transformer ` and
192- ` --hparams_set=transformer_prepend ` .
193-
194- ### Image Classification
195-
196- For image classification suggest to use the ResNet or Xception, i.e.
197- for ` --problems=image_imagenet ` use ` --model=resnet50 ` and
198- ` --hparams_set=resnet_base ` or ` --model=xception ` and
199- ` --hparams_set=xception_base ` .
200-
201-
202- ## Installation
203230
204231```
205232# Assumes tensorflow or tensorflow-gpu installed
@@ -228,9 +255,7 @@ Library usage:
228255python -c "from tensor2tensor.models.transformer import Transformer"
229256```
230257
231- ---
232-
233- ## Features
258+ ### Features
234259
235260* Many state of the art and baseline models are built-in and new models can be
236261 added easily (open an issue or pull request!).
@@ -243,11 +268,10 @@ python -c "from tensor2tensor.models.transformer import Transformer"
243268 specification.
244269* Support for multi-GPU machines and synchronous (1 master, many workers) and
245270 asynchronous (independent workers synchronizing through a parameter server)
246- [ distributed training] ( https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md ) .
271+ [ distributed training] ( https://tensorflow.github.io/tensor2tensor/distributed_training.html ) .
247272* Easily swap amongst datasets and models by command-line flag with the data
248273 generation script ` t2t-datagen ` and the training script ` t2t-trainer ` .
249-
250- ---
274+ * Train on [ Google Cloud ML] ( https://tensorflow.github.io/tensor2tensor/cloud_mlengine.html ) and [ Cloud TPUs] ( https://tensorflow.github.io/tensor2tensor/cloud_tpu.html ) .
251275
252276## T2T overview
253277
@@ -303,9 +327,7 @@ inference. Users can easily switch between problems, models, and hyperparameter
303327sets by using the ` --model ` , ` --problems ` , and ` --hparams_set ` flags. Specific
304328hyperparameters can be overridden with the ` --hparams ` flag. ` --schedule ` and
305329related flags control local and distributed training/evaluation
306- ([ distributed training documentation] ( https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md ) ).
307-
308- ---
330+ ([ distributed training documentation] ( https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/g3doc/distributed_training.md ) ).
309331
310332## Adding your own components
311333
@@ -331,6 +353,21 @@ for an example.
331353Also see the [ data generators
332354README] ( https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md ) .
333355
334- ---
356+ ## Papers
357+
358+ Tensor2Tensor was used to develop a number of state-of-the-art models
359+ and deep learning methods. Here we list some papers that were based on T2T
360+ from the start and benefited from its features and architecture in ways
361+ described in the [ Google Research Blog post introducing
362+ T2T] ( https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html ) .
363+
364+ * [ Attention Is All You Need] ( https://arxiv.org/abs/1706.03762 )
365+ * [ Depthwise Separable Convolutions for Neural Machine
366+ Translation] ( https://arxiv.org/abs/1706.03059 )
367+ * [ One Model To Learn Them All] ( https://arxiv.org/abs/1706.05137 )
368+ * [ Discrete Autoencoders for Sequence Models] ( https://arxiv.org/abs/1801.09797 )
369+ * [ Generating Wikipedia by Summarizing Long
370+ Sequences] ( https://arxiv.org/abs/1801.10198 )
371+ * [ Image Transformer] ( https://openreview.net/forum?id=r16Vyf-0- )
335372
336373* Note: This is not an official Google product.*