set_random_seed function
keras.utils.set_random_seed(seed)
Sets all random seeds (Python, NumPy, and backend framework, e.g. TF).
You can use this utility to make almost any Keras program fully deterministic. Some limitations apply in cases where network communications are involved (e.g. parameter server distribution), which creates additional sources of randomness, or when certain non-deterministic cuDNN ops are involved.
Calling this utility is equivalent to the following:
import random
random.seed(seed)
import numpy as np
np.random.seed(seed)
import tensorflow as tf # Only if TF is installed
tf.random.set_seed(seed)
import torch # Only if the backend is 'torch'
torch.manual_seed(seed)
Note that the TensorFlow seed is set even if you're not using TensorFlow as your backend framework, since many workflows leverage tf.data pipelines (which feature random shuffling). Likewise, many workflows might leverage NumPy APIs.
Arguments
seed: Integer, the random seed to use.
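A minimal reproducibility sketch, relying only on the NumPy seeding shown in the equivalence above: re-seeding with the same value reproduces the same draws.
import numpy as np
import keras

# Seed all generators, then draw some "random" numbers.
keras.utils.set_random_seed(42)
first = np.random.random((2, 3))

# Re-seeding with the same value reproduces exactly the same draws.
keras.utils.set_random_seed(42)
second = np.random.random((2, 3))

print(np.allclose(first, second))  # True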
split_dataset function
keras.utils.split_dataset(
    dataset, left_size=None, right_size=None, shuffle=False, seed=None
)
Splits a dataset into a left half and a right half (e.g. train / test).
Arguments
dataset: A tf.data.Dataset, a torch.utils.data.Dataset object, or a list/tuple of arrays with the same length.
left_size: If float (in the range [0, 1]), it signifies the fraction of the data to pack in the left dataset. If integer, it signifies the number of samples to pack in the left dataset. If None, defaults to the complement to right_size. Defaults to None.
right_size: If float (in the range [0, 1]), it signifies the fraction of the data to pack in the right dataset. If integer, it signifies the number of samples to pack in the right dataset. If None, defaults to the complement to left_size. Defaults to None.
shuffle: Boolean, whether to shuffle the data before splitting it.
seed: A random seed for shuffling.
Returns
A tuple of two tf.data.Dataset objects: the left and right splits.
Example
>>> data = np.random.random(size=(1000, 4))
>>> left_ds, right_ds = keras.utils.split_dataset(data, left_size=0.8)
>>> int(left_ds.cardinality())
800
>>> int(right_ds.cardinality())
200
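A further illustrative sketch (with made-up arrays): a (features, labels) tuple can be split the same way, and the two arrays stay paired in the resulting datasets.
import numpy as np
import keras

# Hypothetical features and labels with matching length.
x = np.random.random((100, 8))
y = np.random.randint(0, 2, size=(100,))

# An 80/20 split with shuffling; each resulting dataset yields (x, y) pairs.
train_ds, test_ds = keras.utils.split_dataset((x, y), left_size=0.8, shuffle=True, seed=1)
print(int(train_ds.cardinality()), int(test_ds.cardinality()))  # 80 20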
pack_x_y_sample_weight function
keras.utils.pack_x_y_sample_weight(x, y=None, sample_weight=None)
Packs user-provided data into a tuple.
This is a convenience utility for packing data into the tuple formats that Model.fit() uses.
Example
>>> x = ops.ones((10, 1))
>>> data = pack_x_y_sample_weight(x)
>>> isinstance(data, ops.Tensor)
True
>>> y = ops.ones((10, 1))
>>> data = pack_x_y_sample_weight(x, y)
>>> isinstance(data, tuple)
True
>>> x, y = data
Arguments
x: Features to pass to Model.
y: Ground-truth targets to pass to Model.
sample_weight: Sample weight for each element.
Returns
Tuple in the format used in Model.fit().
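An additional illustrative sketch: when sample_weight is provided, the result is a 3-tuple in the (x, y, sample_weight) format.
import keras
from keras import ops

x = ops.ones((10, 1))
y = ops.zeros((10, 1))
sample_weight = ops.ones((10,))

# Packs into the (x, y, sample_weight) tuple that Model.fit() consumes.
data = keras.utils.pack_x_y_sample_weight(x, y, sample_weight=sample_weight)
print(len(data))  # 3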
get_file function
keras.utils.get_file(
fname=None,
origin=None,
untar=False,
md5_hash=None,
file_hash=None,
cache_subdir="datasets",
hash_algorithm="auto",
extract=False,
archive_format="auto",
cache_dir=None,
force_download=False,
)
Downloads a file from a URL if it is not already in the cache.
By default, the file at the URL origin is downloaded to the cache_dir ~/.keras, placed in the cache_subdir datasets, and given the filename fname. The final location of a file example.txt would therefore be ~/.keras/datasets/example.txt.
Files in .tar, .tar.gz, .tar.bz, and .zip formats can also be extracted.
Passing a hash will verify the file after download. The command line programs shasum and sha256sum can compute the hash.
Example
path_to_downloaded_file = get_file(
    origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
    extract=True,
)
Arguments
fname: If the target is a single file, this is your desired local name for the file. If None, the name of the file at origin will be used. If downloading and extracting a directory archive, the provided fname will be used as extraction directory name (only if it doesn't have an extension).
origin: Original URL of the file.
untar: Deprecated in favor of the extract argument. Boolean, whether the file is a tar archive that should be extracted.
md5_hash: Deprecated in favor of the file_hash argument. md5 hash of the file for file integrity verification.
file_hash: The expected hash string of the file after download. The sha256 and md5 hash algorithms are both supported.
cache_subdir: Subdirectory under the Keras cache dir where the file is saved. If an absolute path, e.g. "/path/to/folder", is specified, the file will be saved at that location.
hash_algorithm: Select the hash algorithm to verify the file. Options are "md5", "sha256", and "auto". The default "auto" detects the hash algorithm in use.
extract: If True, extracts the archive. Only applicable to compressed archive files like tar or zip.
archive_format: Archive format to try for extracting the file. Options are "auto", "tar", "zip", and None. "tar" includes tar, tar.gz, and tar.bz files. The default "auto" corresponds to ["tar", "zip"]. None or an empty list will return no matches found.
cache_dir: Location to store cached files. When None, it defaults to $KERAS_HOME if the KERAS_HOME environment variable is set, or ~/.keras/ otherwise.
force_download: If True, the file will always be re-downloaded regardless of the cache state.
Returns
Path to the downloaded file.
⚠️ Warning on malicious downloads ⚠️
Downloading something from the Internet carries a risk. NEVER download a file/archive if you do not trust the source. We recommend that you specify the file_hash argument (if the hash of the source file is known) to make sure that the file you are getting is the one you expect.
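A hedged sketch of that workflow, reusing the flower_photos URL from the example above: download the file once from a trusted source, record its sha256, then pin that hash in later calls so a corrupted or tampered download is rejected.
import hashlib
import keras

# First download (a fairly large archive) and record its hash.
path = keras.utils.get_file(
    origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
)
sha256 = hashlib.sha256(open(path, "rb").read()).hexdigest()
print(sha256)

# In subsequent calls, pin the known-good hash so verification is enforced.
path = keras.utils.get_file(
    origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
    file_hash=sha256,  # substitute the value you trust in real code
)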
Progbar class
keras.utils.Progbar(
target, width=20, verbose=1, interval=0.05, stateful_metrics=None, unit_name="step"
)
Displays a progress bar.
Arguments
target: Total number of steps expected, None if unknown.
width: Progress bar width on screen.
verbose: Verbosity mode, 0 (silent), 1 (verbose), 2 (semi-verbose).
interval: Minimum visual progress update interval (in seconds).
stateful_metrics: Iterable of string names of metrics that should not be averaged over time. Metrics in this list will be displayed as-is. All others will be averaged by the progbar before display.
unit_name: Display name for step counts (usually "step" or "sample").
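A minimal usage sketch of a manually driven progress bar (the loop body is a stand-in for real work):
import time
import keras

# Drive the bar by hand over 10 steps, reporting a running "loss" value.
progbar = keras.utils.Progbar(target=10)
for step in range(10):
    time.sleep(0.01)  # stand-in for real work
    progbar.update(step + 1, values=[("loss", 1.0 / (step + 1))])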
PyDataset class
keras.utils.PyDataset(workers=1, use_multiprocessing=False, max_queue_size=10)
Base class for defining a parallel dataset using Python code.
Every PyDataset must implement the __getitem__() and the __len__() methods. If you want to modify your dataset between epochs, you may additionally implement on_epoch_end(), or on_epoch_begin() to be called at the start of each epoch.
The __getitem__() method should return a complete batch (not a single sample), and the __len__() method should return the number of batches in the dataset (rather than the number of samples).
Arguments
workers: Number of workers to use in multithreading or multiprocessing.
use_multiprocessing: Whether to use Python multiprocessing for parallelism. Setting this to True means that your dataset will be replicated in multiple forked processes. This is necessary to gain compute-level (rather than I/O level) benefits from parallelism. However, it can only be set to True if your dataset can be safely pickled.
max_queue_size: Maximum number of batches to keep in the queue when iterating over the dataset in a multithreaded or multiprocessed setting. Reduce this value to reduce the CPU memory consumption of your dataset. Defaults to 10.
Notes:
- PyDataset is a safer way to do multiprocessing. This structure guarantees that the model will only train once on each sample per epoch, which is not the case with Python generators.
- The arguments workers, use_multiprocessing, and max_queue_size exist to configure how fit() uses parallelism to iterate over the dataset. They are not being used by the PyDataset class directly. When you are manually iterating over a PyDataset, no parallelism is applied.
Example
import math
import numpy as np
import keras
from skimage.io import imread
from skimage.transform import resize

# Here, `x_set` is a list of paths to the images
# and `y_set` are the associated classes.

class CIFAR10PyDataset(keras.utils.PyDataset):

    def __init__(self, x_set, y_set, batch_size, **kwargs):
        super().__init__(**kwargs)
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Return number of batches.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Return x, y for batch idx.
        low = idx * self.batch_size
        # Cap upper bound at array length; the last batch may be smaller
        # if the total number of items is not a multiple of batch size.
        high = min(low + self.batch_size, len(self.x))
        batch_x = self.x[low:high]
        batch_y = self.y[low:high]
        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)
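For completeness, a minimal self-contained sketch (with a hypothetical ArrayPyDataset over in-memory arrays and a toy model) of how such a dataset is typically consumed: workers and use_multiprocessing are forwarded to the base class via **kwargs and only take effect when fit() iterates the dataset.
import math
import numpy as np
import keras

class ArrayPyDataset(keras.utils.PyDataset):
    # Hypothetical dataset over in-memory arrays, batched by __getitem__.
    def __init__(self, x, y, batch_size, **kwargs):
        super().__init__(**kwargs)
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        low = idx * self.batch_size
        high = min(low + self.batch_size, len(self.x))
        return self.x[low:high], self.y[low:high]

x = np.random.random((256, 8)).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

# Toy model; fit() handles the parallel iteration configured above.
model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(ArrayPyDataset(x, y, batch_size=32, workers=2), epochs=1, verbose=0)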
to_categorical function
keras.utils.to_categorical(x, num_classes=None)
Converts a class vector (integers) to binary class matrix.
E.g. for use with categorical_crossentropy.
Arguments
x: Array-like with class values to be converted into a matrix (integers from 0 to num_classes - 1).
num_classes: Total number of classes. If None, this would be inferred as max(x) + 1. Defaults to None.
Returns
A binary matrix representation of the input as a NumPy array. The class axis is placed last.
Example
>>> a = keras.utils.to_categorical([0, 1, 2, 3], num_classes=4)
>>> print(a)
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
>>> b = np.array([.9, .04, .03, .03,
...               .3, .45, .15, .13,
...               .04, .01, .94, .05,
...               .12, .21, .5, .17]).reshape(4, 4)
>>> loss = keras.ops.categorical_crossentropy(a, b)
>>> print(np.around(loss, 5))
[0.10536 0.82807 0.1011 1.77196]
>>> loss = keras.ops.categorical_crossentropy(a, a)
>>> print(np.around(loss, 5))
[0. 0. 0. 0.]
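As a complementary sketch, taking the argmax along the last (class) axis recovers the original integer labels from the one-hot matrix.
import numpy as np
import keras

# Round-trip: integer labels -> one-hot matrix -> integer labels.
labels = np.array([0, 2, 1, 3])
one_hot = keras.utils.to_categorical(labels, num_classes=4)
print(np.argmax(one_hot, axis=-1))  # [0 2 1 3]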
normalize function
keras.utils.normalize(x, axis=-1, order=2)
Normalizes an array.
If the input is a NumPy array, a NumPy array will be returned. If it's a backend tensor, a backend tensor will be returned.
Arguments
x: Array to normalize.
axis: axis along which to normalize.
order: Normalization order (e.g. order=2 for L2 norm).
Returns
A normalized copy of the array.
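A small usage sketch: with the default order=2 and axis=-1, each row of the result has unit Euclidean norm.
import numpy as np
import keras

# L2-normalize each row; the row norms of the result are all 1.
x = np.array([[3.0, 4.0], [1.0, 1.0]])
x_norm = keras.utils.normalize(x, axis=-1, order=2)
print(np.linalg.norm(x_norm, axis=-1))  # [1. 1.]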