
TensorFlow

TensorFlow is an open-source software library designed for high performance, scalable numerical computation, placing a particular emphasis on machine learning and deep neural networks. These pages provide a brief introduction to the use of TensorFlow through a series of increasingly complex examples.

The source code is available on GitHub, and additional resources for working with TensorFlow are available in the API documentation and on the TensorFlow YouTube channel.

All of the TensorFlow examples below are also available on GitHub.


Structuring Code

For simple models like the ones presented above, a straightforward imperative coding style is adequate. As models become more complex, however, it is better to define a dedicated class that represents the model in terms of attributes and methods.

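As a rough sketch of this style (the class layout and synthetic dataset here are illustrative, not the exact code from the repository), a simple regression model might be organized as follows:

import numpy as np
import tensorflow as tf

class Model(object):
    """Simple fully-connected regression model."""

    def __init__(self, learning_rate=0.001):
        self.learning_rate = learning_rate
        self.build_model()

    def network(self, inputs):
        # Two dense layers mapping each input to a single prediction
        h = tf.layers.dense(inputs, 20, activation=tf.nn.relu, name="dense_1")
        return tf.layers.dense(h, 1, name="dense_2")

    def build_model(self):
        # Placeholders for input features and target values
        self.x = tf.placeholder(tf.float32, [None, 1], name="x")
        self.y = tf.placeholder(tf.float32, [None, 1], name="y")
        self.pred = self.network(self.x)
        self.loss = tf.losses.mean_squared_error(self.y, self.pred)
        self.train_op = tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss)

    def train(self, steps=10000, batch_size=100):
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(steps):
                # Synthetic dataset: y = x^2 with Gaussian noise
                x_batch = np.random.normal(0.0, 1.0, [batch_size, 1])
                y_batch = x_batch**2 + np.random.normal(0.0, 0.1, [batch_size, 1])
                sess.run(self.train_op, feed_dict={self.x: x_batch, self.y: y_batch})

if __name__ == "__main__":
    model = Model()
    model.train()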

Monitored Training Sessions

It is also helpful to save checkpoints and log files in order to monitor the progress of the training process. This can be done using the built-in tf.train.MonitoredTrainingSession and tf.summary.FileWriter utilities. An example implementation of a monitored training session is provided below:

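A minimal sketch of such a session, reusing the simple regression setup from above (the summary hook used here manages the underlying tf.summary.FileWriter; the directory names match those described below):

import numpy as np
import tensorflow as tf

# Small regression model whose loss is recorded as a summary
x = tf.placeholder(tf.float32, [None, 1], name="x")
y = tf.placeholder(tf.float32, [None, 1], name="y")
h = tf.layers.dense(x, 20, activation=tf.nn.relu, name="dense_1")
pred = tf.layers.dense(h, 1, name="dense_2")
loss = tf.losses.mean_squared_error(y, pred)
tf.summary.scalar("loss", loss)

global_step = tf.train.get_or_create_global_step()
train_op = tf.train.AdamOptimizer(0.001).minimize(loss, global_step=global_step)

# Write a summary of the loss at each step to ./Model/logs/
summary_hook = tf.train.SummarySaverHook(save_steps=1, output_dir="./Model/logs/",
                                         summary_op=tf.summary.merge_all())

# MonitoredTrainingSession saves checkpoints automatically and, if previous
# checkpoints exist in checkpoint_dir, resumes training from the latest one.
with tf.train.MonitoredTrainingSession(
        checkpoint_dir="./Model/Checkpoints/",
        hooks=[tf.train.StopAtStepHook(last_step=20000), summary_hook],
        save_checkpoint_secs=60,
        save_summaries_steps=None) as sess:
    while not sess.should_stop():
        x_batch = np.random.normal(0.0, 1.0, [100, 1])
        y_batch = x_batch**2 + np.random.normal(0.0, 0.1, [100, 1])
        sess.run(train_op, feed_dict={x: x_batch, y: y_batch})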

The above code stores checkpoints in the ./Model/Checkpoints/ directory; these are used to restore the trained model parameters after training. The checkpoints can also be used to resume training where it last left off in the event that the process is interrupted (this happens automatically when the code above is run and previous checkpoints exist in the checkpoint folder). In addition, log files with summaries of the loss at each step are stored in the ./Model/logs/ directory.
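For example, the trained parameters can later be restored for inference by rebuilding the graph and pointing a tf.train.Saver at the latest checkpoint (the layer definitions must match those used during training):

import tensorflow as tf

# Rebuild the same graph; variable names must match the checkpoint
x = tf.placeholder(tf.float32, [None, 1], name="x")
h = tf.layers.dense(x, 20, activation=tf.nn.relu, name="dense_1")
pred = tf.layers.dense(h, 1, name="dense_2")

saver = tf.train.Saver()
with tf.Session() as sess:
    # Load the most recent checkpoint saved during training
    saver.restore(sess, tf.train.latest_checkpoint("./Model/Checkpoints/"))
    print(sess.run(pred, feed_dict={x: [[0.5]]}))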

The summary data can be visualized using TensorBoard by issuing the command:

$ tensorboard --logdir Model/logs

and navigating to localhost:6006 in a web browser (if localhost does not resolve, try 127.0.0.1:6006 instead). After training the model defined above, the TensorBoard summaries and graph should appear roughly as follows:

[Screenshot: TensorBoard summaries]

[Screenshot: TensorBoard graph]

Early Stopping

Looking at the TensorBoard loss plot from the previous section, we see that the performance of the network effectively levels off less than halfway through the training process. Moreover, some erratic behavior occurs after this peak performance is reached (i.e. the spikes which appear late in the training process). For this reason, it is often helpful to specify a performance tolerance beforehand and to stop training and save the model once this performance level has been reached.

This technique, referred to as early stopping, can be implemented in TensorFlow by checking the performance of the model at regular intervals throughout the training process. These periodic checks can be made using hooks passed to the monitored training session instance. Below is an example implementation of the previous model with early stopping enforced:

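A sketch of this approach is given below; the EarlyStoppingHook class, the tolerance value, and the check dataset are illustrative, but the mechanism (a custom tf.train.SessionRunHook that calls request_stop() once the monitored loss falls below the tolerance) is the standard one:

import numpy as np
import tensorflow as tf

def network(inputs, reuse=None):
    # reuse=True shares the weights created on the first call
    with tf.variable_scope("network", reuse=reuse):
        h = tf.layers.dense(inputs, 20, activation=tf.nn.relu, name="dense_1")
        return tf.layers.dense(h, 1, name="dense_2")

class EarlyStoppingHook(tf.train.SessionRunHook):
    """Requests a stop once a loss tensor falls below a fixed tolerance."""

    def __init__(self, loss_op, tolerance=0.05, check_steps=100):
        self._loss_op = loss_op
        self._tolerance = tolerance
        self._check_steps = check_steps
        self._count = 0

    def before_run(self, run_context):
        # Fetch the early stopping loss alongside each training step
        return tf.train.SessionRunArgs(self._loss_op)

    def after_run(self, run_context, run_values):
        self._count += 1
        if self._count % self._check_steps == 0:
            if run_values.results < self._tolerance:
                print("Loss below tolerance %g; stopping." % self._tolerance)
                run_context.request_stop()

# Training inputs arrive through placeholders...
x = tf.placeholder(tf.float32, [None, 1], name="x")
y = tf.placeholder(tf.float32, [None, 1], name="y")
loss = tf.losses.mean_squared_error(y, network(x))

# ...while the early stopping checks use a fixed, noise-free dataset
x_check = tf.constant(np.random.normal(0.0, 1.0, [500, 1]), dtype=tf.float32)
check_loss = tf.losses.mean_squared_error(tf.square(x_check),
                                          network(x_check, reuse=True))

global_step = tf.train.get_or_create_global_step()
train_op = tf.train.AdamOptimizer(0.001).minimize(loss, global_step=global_step)

# Use an empty checkpoint directory if the graph layout has changed
hooks = [EarlyStoppingHook(check_loss), tf.train.StopAtStepHook(last_step=20000)]
with tf.train.MonitoredTrainingSession(checkpoint_dir="./Model/Checkpoints/",
                                       hooks=hooks) as sess:
    while not sess.should_stop():
        x_batch = np.random.normal(0.0, 1.0, [100, 1])
        y_batch = x_batch**2 + np.random.normal(0.0, 0.1, [100, 1])
        sess.run(train_op, feed_dict={x: x_batch, y: y_batch})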

Since we will be evaluating the neural network on different input tensors for training and for early stopping checks, it is convenient to define the network as a function that allows us to specify the desired input. To ensure that the same network parameters are used for both evaluations (and that the network is not duplicated in the graph), we must set reuse=True when the network is called a second time while building the model. The individual network layers must also be assigned distinct names (e.g. dense_1) so that the appropriate weights and biases are restored to the correct layers.
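The sharing behavior can be checked directly by listing the trainable variables after the second call; only one copy of each kernel and bias should appear:

import tensorflow as tf

def network(inputs, reuse=None):
    with tf.variable_scope("network", reuse=reuse):
        h = tf.layers.dense(inputs, 20, activation=tf.nn.relu, name="dense_1")
        return tf.layers.dense(h, 1, name="dense_2")

a = tf.placeholder(tf.float32, [None, 1])
b = tf.placeholder(tf.float32, [None, 1])
network(a)              # creates network/dense_1 and network/dense_2
network(b, reuse=True)  # reuses the existing weights rather than adding new ones

# Prints four variables (two kernels and two biases), not eight
print([v.name for v in tf.trainable_variables()])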

Note: In this artificial example, the large spikes in the loss function are due to outliers in the dataset (since we have used np.random.normal instead of np.random.uniform). In practice it is almost inevitable that outliers will exist in the training dataset, and the early stopping technique, along with learning rate decay, helps to mitigate their effect on the overall training process.

[Screenshot: TensorBoard graph for Early_Stopping loss]
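The note above mentions learning rate decay; the exact schedule is not specified here, but one common choice is tf.train.exponential_decay, e.g.:

import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
# Shrink the learning rate by a factor of 0.96 every 1000 steps
learning_rate = tf.train.exponential_decay(0.001, global_step,
                                           decay_steps=1000, decay_rate=0.96,
                                           staircase=True)
# Pass the decayed rate to the optimizer in place of a fixed value:
# train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)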

Validation and Regularization

In order to assess the performance of a network more accurately, a dedicated validation dataset, held out from the training data, should be used during the training process.

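A sketch of this setup is shown below; the held-out dataset, the L2 weight penalty, and the twin log directories are illustrative choices, but writing training and validation summaries to separate subdirectories is the standard way to overlay the two curves in TensorBoard:

import numpy as np
import tensorflow as tf

# Simple regression network, as in the earlier examples
x = tf.placeholder(tf.float32, [None, 1], name="x")
y = tf.placeholder(tf.float32, [None, 1], name="y")
h = tf.layers.dense(x, 20, activation=tf.nn.relu, name="dense_1")
pred = tf.layers.dense(h, 1, name="dense_2")

mse = tf.losses.mean_squared_error(y, pred)
# L2 regularization: penalize large weights to reduce overfitting
l2_penalty = 1e-4 * tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
train_op = tf.train.AdamOptimizer(0.001).minimize(mse + l2_penalty)
loss_summary = tf.summary.scalar("loss", mse)

# Held-out validation set, generated once and never trained on
x_val = np.random.normal(0.0, 1.0, [500, 1])
y_val = x_val**2 + np.random.normal(0.0, 0.1, [500, 1])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Separate writers let TensorBoard overlay the two loss curves
    train_writer = tf.summary.FileWriter("./Model/logs/train", sess.graph)
    val_writer = tf.summary.FileWriter("./Model/logs/validation")
    for step in range(10000):
        x_batch = np.random.normal(0.0, 1.0, [100, 1])
        y_batch = x_batch**2 + np.random.normal(0.0, 0.1, [100, 1])
        _, summary = sess.run([train_op, loss_summary],
                              feed_dict={x: x_batch, y: y_batch})
        if step % 100 == 0:
            train_writer.add_summary(summary, step)
            val_summary = sess.run(loss_summary, feed_dict={x: x_val, y: y_val})
            val_writer.add_summary(val_summary, step)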

Tracking the loss on a held-out validation set alongside the training loss makes overfitting easy to detect: when the training loss continues to decrease while the validation loss levels off or begins to rise, the network has started to memorize the training data rather than learning the underlying trend. Regularization techniques, such as the L2 weight penalty used above, help to keep the two curves close together.

[Screenshot: TensorBoard graph for loss on training and validation dataset]

Freezing Models

Once training is complete, the model can be frozen for deployment. Freezing combines the graph definition and the trained weights into a single file by converting all of the model's variables into constants, producing a protocol buffer that can be loaded for inference without the original training code or checkpoint files.
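A minimal sketch of this procedure, assuming a trained model with checkpoints in ./Model/Checkpoints/ and a final output node named prediction (the file name frozen_model.pb is an arbitrary choice):

import tensorflow as tf

# Rebuild the trained graph; layer names must match the checkpoint
x = tf.placeholder(tf.float32, [None, 1], name="x")
h = tf.layers.dense(x, 20, activation=tf.nn.relu, name="dense_1")
pred = tf.layers.dense(h, 1, name="dense_2")
output = tf.identity(pred, name="prediction")

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("./Model/Checkpoints/"))
    # Replace every variable in the graph with a constant holding its value
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ["prediction"])
    with tf.gfile.GFile("./Model/frozen_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())

The frozen graph can later be reloaded with tf.import_graph_def and evaluated directly, with no variables left to initialize or restore.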