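Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often simple things, such as choosing a different learning rate or changing a network layer size, can have a dramatic impact on your model's performance.
Fortunately, there are tools that help with finding the best combination of parameters. Ray Tune is an industry-standard tool for distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively supports distributed training through Ray's distributed machine learning engine.
In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend this tutorial from the PyTorch documentation for training a CIFAR10 image classifier.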
As you will see, we only need to add some slight modifications. In particular, we need to
wrap data loading and training in functions,
make some network parameters configurable,
add checkpointing (optional),
and define the search space for the model tuning.
To run this tutorial, please make sure the following packages are installed:
ray[tune]: Distributed hyperparameter tuning library
torchvision: For the data transformers
Setup / Imports
Let's start with the imports:

from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
Data loaders
We wrap the data loaders in their own function and pass a global data directory. This way we can share a data directory between different trials.

def load_data(data_dir="./data"):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset
Configurable neural network
We can only tune those parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:

class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
The train function
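Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.
We wrap the training script in a function train_cifar(config, checkpoint_dir=None, data_dir=None). The config parameter will receive the hyperparameters we would like to train with. The checkpoint_dir parameter is used to restore checkpoints. The data_dir parameter specifies the directory where we load and store the data, so that multiple runs can share the same data source.

net = Net(config["l1"], config["l2"])

if checkpoint_dir:
    model_state, optimizer_state = torch.load(
        os.path.join(checkpoint_dir, "checkpoint"))
    net.load_state_dict(model_state)
    optimizer.load_state_dict(optimizer_state)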
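The learning rate of the optimizer is made configurable, too:

optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

We also split the training data into a training and a validation subset. We thus train on 80% of the data and calculate the validation loss on the remaining 20%. The batch sizes with which we iterate through the training and validation sets are configurable as well.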
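To make the 80/20 split concrete, here is a minimal, framework-free sketch of the index arithmetic. The helper name split_indices and the fixed seed are assumptions for illustration only; the tutorial itself relies on torch.utils.data.random_split, which is not reimplemented here.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle the indices 0..n_samples-1 and cut them into two parts.

    Hypothetical helper: it only mirrors the size computation used in
    the tutorial (test_abs = int(len(trainset) * 0.8)).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# The CIFAR10 training set contains 50,000 images.
train_idx, val_idx = split_indices(50_000)
print(len(train_idx), len(val_idx))
```

Every index lands in exactly one of the two subsets, so the validation loss is computed on images the optimizer never sees during that trial.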
Adding (multi) GPU support with DataParallel
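Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data parallel training on multiple GPUs:

device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)

By using a device variable we make sure that training also works when we have no GPUs available. PyTorch requires us to send our data to the GPU memory explicitly, like this:

for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)

The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share GPUs among trials, as long as the model still fits in GPU memory. We'll come back to that later.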
Communicating with Ray Tune
The most interesting part is the communication with Ray Tune:
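with tune.checkpoint_dir(epoch) as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "checkpoint")
    torch.save((net.state_dict(), optimizer.state_dict()), path)

tune.report(loss=(val_loss / val_steps), accuracy=correct / total)

Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early, in order to avoid wasting resources on them.
Saving checkpoints is optional, but it is necessary if we want to use advanced schedulers such as Population Based Training. Also, by saving checkpoints we can later load the trained models and validate them on a test set.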
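The checkpoint is simply a (model_state, optimizer_state) tuple written to a file named "checkpoint" inside a directory, and read back the same way when a trial is restored. The round trip can be sketched without PyTorch or Ray as follows. Note the assumptions: pickle stands in for torch.save and torch.load, a temporary directory stands in for the directory Ray Tune would provide, and the two state dictionaries are made-up placeholders rather than real network weights.

```python
import os
import pickle
import tempfile


def save_checkpoint(checkpoint_dir, model_state, optimizer_state):
    """Write both states as one tuple, mirroring the tutorial's layout."""
    path = os.path.join(checkpoint_dir, "checkpoint")
    with open(path, "wb") as f:
        pickle.dump((model_state, optimizer_state), f)
    return path


def load_checkpoint(checkpoint_dir):
    """Read the tuple back; the caller unpacks it into the two states."""
    with open(os.path.join(checkpoint_dir, "checkpoint"), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder states, for illustration only.
    save_checkpoint(tmp_dir, {"fc1.weight": [0.1, 0.2]}, {"lr": 0.01, "momentum": 0.9})
    model_state, optimizer_state = load_checkpoint(tmp_dir)

print(model_state, optimizer_state)
```

Storing the optimizer state next to the model state matters because a resumed trial should continue with the same momentum buffers, not only the same weights.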
Full training function
The full code example looks like this:

def train_cifar(config, checkpoint_dir=None, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])

    trainloader = torch.utils.data.DataLoader(
        train_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)
    valloader = torch.utils.data.DataLoader(
        val_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                running_loss / epoch_steps))
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        with tune.checkpoint_dir(epoch) as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint")
            torch.save((net.state_dict(), optimizer.state_dict()), path)

        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
    print("Finished Training")

As you can see, most of the code is adapted directly from the original example.
Test set accuracy
Commonly the performance of a machine learning model is tested on a hold-out test set with data that has not been used for training the model. We also wrap this in a function:

def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total

The function also expects a device parameter, so we can do the test set validation on a GPU.
Configuring the search space
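Lastly, we need to define Ray Tune's search space. Here is an example:

config = {
    "l1": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "l2": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}

The tune.sample_from() function makes it possible to define your own sample methods to obtain hyperparameters. In this example, the l1 and l2 parameters should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) is sampled log-uniformly between 0.0001 and 0.1, since the search space uses tune.loguniform. Lastly, the batch size is a choice between 2, 4, 8, and 16.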
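To see what one draw from such a search space looks like, the sketch below reproduces the same four distributions with the Python standard library only. It is an illustration under assumptions, not Ray Tune's implementation: the helper names are hypothetical, random.Random replaces NumPy, and log-uniform sampling is written out as "uniform in log space".

```python
import math
import random


def sample_power_of_two(rng, low_exp=2, high_exp=9):
    """Mirror 2 ** np.random.randint(2, 9): exponents 2..8, values 4..256."""
    return 2 ** rng.randrange(low_exp, high_exp)


def sample_loguniform(rng, low=1e-4, high=1e-1):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one hyperparameter combination from the tutorial's search space."""
    return {
        "l1": sample_power_of_two(rng),
        "l2": sample_power_of_two(rng),
        "lr": sample_loguniform(rng),
        "batch_size": rng.choice([2, 4, 8, 16]),
    }


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(10)]
for cfg in configs:
    print(cfg)
```

Sampling the learning rate in log space gives each decade (1e-4 to 1e-3, 1e-3 to 1e-2, 1e-2 to 1e-1) roughly the same share of draws, whereas plain uniform sampling would place about 90% of the draws above 0.01.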
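At each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces. It will then train a number of models in parallel and find the best performing one among these. We also use the ASHAScheduler, which will terminate badly performing trials early.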
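The early-stopping idea can be sketched in a few lines. The code below is a simplified illustration of successive halving, not Ray Tune's ASHAScheduler code: the function names are hypothetical, and the keep rule (stay in the best 1/reduction_factor of the losses seen so far at a rung) is an assumption that only approximates the scheduler's actual cutoff computation. The parameter values are the ones passed to ASHAScheduler in the main function of this tutorial.

```python
def rung_milestones(grace_period, reduction_factor, max_t):
    """Iterations at which trials are compared: grace_period * factor**k."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def should_continue(loss, losses_at_rung, reduction_factor):
    """Keep a trial if its loss ranks in the best 1/reduction_factor.

    losses_at_rung holds the losses other trials already reported at
    this rung; lower is better (mode="min").
    """
    ranked = sorted(losses_at_rung + [loss])
    keep = max(1, len(ranked) // reduction_factor)
    return loss <= ranked[keep - 1]


milestones = rung_milestones(grace_period=1, reduction_factor=2, max_t=10)
print(milestones)

# Losses as other trials might have reported them at the first rung.
seen = [1.88, 1.96, 2.35]
print(should_continue(1.51, seen, reduction_factor=2))
print(should_continue(2.31, seen, reduction_factor=2))
```

The milestones 1, 2, 4, and 8 correspond to the "Iter 1.000", "Iter 2.000", "Iter 4.000", and "Iter 8.000" entries of the "Bracket" line in the status output.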
We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune what resources should be available for each trial:

gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
    progress_reporter=reporter,
    checkpoint_at_end=True)
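The role of functools.partial here is only to pre-fill data_dir, so that the callable handed to tune.run still accepts the config argument it is called with. A minimal sketch, where train_stub and the path "/tmp/data" are hypothetical stand-ins for train_cifar and the real data directory:

```python
from functools import partial


def train_stub(config, checkpoint_dir=None, data_dir=None):
    """Hypothetical stand-in for train_cifar; it only echoes its inputs."""
    return {"lr": config["lr"], "data_dir": data_dir}


# Bind the constant argument once; the result still takes `config`.
trainable = partial(train_stub, data_dir="/tmp/data")

result = trainable({"lr": 0.01})
print(result)
```

A lambda or a small wrapper function would work as well; partial keeps the binding explicit and leaves checkpoint_dir free to be supplied later.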
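You can specify the number of CPUs, which are then available, for example, to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that have not been requested for them, so you don't have to worry about two trials using the same set of resources.
Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share GPUs among each other. You just have to make sure that the models still fit in the GPU memory.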
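A quick way to reason about these settings is to estimate how many trials fit on a machine at once. The sketch below is a back-of-the-envelope calculation under the assumption that CPUs and GPUs are the only limits; the function name is hypothetical and this is not how Ray's scheduler is implemented. The machine size (32 CPUs, 2 GPUs) is taken from the "Resources requested" line of the output in this tutorial.

```python
import math


def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Estimate how many trials can hold their resources at the same time."""
    by_cpu = math.floor(total_cpus / cpus_per_trial)
    if gpus_per_trial == 0:
        # CPU-only trials are limited by CPUs alone.
        return by_cpu
    by_gpu = math.floor(total_gpus / gpus_per_trial)
    return min(by_cpu, by_gpu)


# 2 CPUs per trial, as in the main function of this tutorial.
print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0))
print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0.5))
print(max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=2))
```

With gpus_per_trial=0.5, four trials share the two GPUs, which is only useful if four copies of the model and their batches fit into GPU memory together.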
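After training the models, we will find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.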
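Picking the best trial amounts to comparing the last reported metric of every trial. The sketch below shows that selection on plain dictionaries. The helper is a hypothetical stand-in and not the Ray Tune API; the numbers are intermediate values copied from a status table in the output of this tutorial, so they illustrate the selection rule rather than the final result of the run.

```python
# Last reported metrics per trial, copied from one status table of the run.
last_results = {
    "DEFAULT_77a44_00001": {"loss": 1.52896, "accuracy": 0.4374},
    "DEFAULT_77a44_00002": {"loss": 1.8035, "accuracy": 0.3437},
    "DEFAULT_77a44_00006": {"loss": 1.66206, "accuracy": 0.3722},
    "DEFAULT_77a44_00007": {"loss": 1.39347, "accuracy": 0.5087},
}


def pick_best_trial(results, metric, mode):
    """Return the trial name whose last `metric` is lowest or highest."""
    chooser = min if mode == "min" else max
    return chooser(results, key=lambda name: results[name][metric])


best = pick_best_trial(last_results, "loss", "min")
print(best, last_results[best])
```

Selecting on the validation loss and only then measuring accuracy on the separate test set keeps the test set out of the tuning loop.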
The full main function looks like this:

def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)
    config = {
        "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2)
    reporter = CLIReporter(
        # parameter_columns=["l1", "l2", "lr", "batch_size"],
        metric_columns=["loss", "accuracy", "training_iteration"])
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter)

    best_trial = result.get_best_trial("loss", "min", "last")
    print("Best trial config: {}".format(best_trial.config))
    print("Best trial final validation loss: {}".format(
        best_trial.last_result["loss"]))
    print("Best trial final validation accuracy: {}".format(
        best_trial.last_result["accuracy"]))

    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint_dir = best_trial.checkpoint.value
    model_state, optimizer_state = torch.load(os.path.join(
        best_checkpoint_dir, "checkpoint"))
    best_trained_model.load_state_dict(model_state)

    test_acc = test_accuracy(best_trained_model, device)
    print("Best trial test set accuracy: {}".format(test_acc))


if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
Output:
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz
Extracting /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data
Files already downloaded and verified
== Status ==
Memory usage on this node: 4.0/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (9 PENDING, 1 RUNNING)
+---------------------+----------+-------+--------------+------+------+-------------+
| Trial name          | status   | loc   |   batch_size |   l1 |   l2 |          lr |
|---------------------+----------+-------+--------------+------+------+-------------|
| DEFAULT_77a44_00000 | RUNNING  |       |            4 |    8 |  128 | 0.0210161   |
| DEFAULT_77a44_00001 | PENDING  |       |            2 |  256 |  128 | 0.000461678 |
| DEFAULT_77a44_00002 | PENDING  |       |            8 |   32 |   16 | 0.0131231   |
| DEFAULT_77a44_00003 | PENDING  |       |            4 |    4 |  128 | 0.00551547  |
| DEFAULT_77a44_00004 | PENDING  |       |            2 |  256 |  256 | 0.0647615   |
| DEFAULT_77a44_00005 | PENDING  |       |            4 |    4 |  128 | 0.0421917   |
| DEFAULT_77a44_00006 | PENDING  |       |            2 |    8 |    8 | 0.000359613 |
| DEFAULT_77a44_00007 | PENDING  |       |            4 |  128 |   16 | 0.00202898  |
| DEFAULT_77a44_00008 | PENDING  |       |            2 |    4 |    8 | 0.000162963 |
| DEFAULT_77a44_00009 | PENDING  |       |            2 |   32 |  256 | 0.000134494 |
+---------------------+----------+-------+--------------+------+------+-------------+
(pid=1164) Files already downloaded and verified
(pid=1145) Files already downloaded and verified
(pid=1104) Files already downloaded and verified
(pid=1119) Files already downloaded and verified
(pid=1140) Files already downloaded and verified
(pid=1118) Files already downloaded and verified
(pid=1098) Files already downloaded and verified
(pid=1101) Files already downloaded and verified
(pid=1165) Files already downloaded and verified
(pid=1126) Files already downloaded and verified
(pid=1164) Files already downloaded and verified
(pid=1098) Files already downloaded and verified
(pid=1145) Files already downloaded and verified
(pid=1104) Files already downloaded and verified
(pid=1119) Files already downloaded and verified
(pid=1140) Files already downloaded and verified
(pid=1118) Files already downloaded and verified
(pid=1101) Files already downloaded and verified
(pid=1165) Files already downloaded and verified
(pid=1126) Files already downloaded and verified
(pid=1126) [1, 2000] loss: 2.295
(pid=1101) [1, 2000] loss: 2.310
(pid=1165) [1, 2000] loss: 2.193
(pid=1119) [1, 2000] loss: 2.302
(pid=1145) [1, 2000] loss: 2.296
(pid=1118) [1, 2000] loss: 2.326
(pid=1104) [1, 2000] loss: 2.303
(pid=1098) [1, 2000] loss: 2.083
(pid=1164) [1, 2000] loss: 1.995
(pid=1140) [1, 2000] loss: 2.377
(pid=1126) [1, 4000] loss: 1.078
(pid=1101) [1, 4000] loss: 1.149
(pid=1119) [1, 4000] loss: 1.149
(pid=1165) [1, 4000] loss: 1.020
(pid=1118) [1, 4000] loss: 1.161
(pid=1104) [1, 4000] loss: 1.157
(pid=1145) [1, 4000] loss: 1.052
(pid=1098) [1, 4000] loss: 0.883
(pid=1164) [1, 4000] loss: 0.927
(pid=1140) [1, 4000] loss: 1.186
(pid=1126) [1, 6000] loss: 0.684
(pid=1101) [1, 6000] loss: 0.760
(pid=1119) [1, 6000] loss: 0.758
(pid=1165) [1, 6000] loss: 0.660
(pid=1118) [1, 6000] loss: 0.775
(pid=1104) [1, 6000] loss: 0.770
(pid=1145) [1, 6000] loss: 0.624
(pid=1098) [1, 6000] loss: 0.542
Result for DEFAULT_77a44_00002:
  accuracy: 0.2841
  date: 2020-10-09_19-56-48
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.881975656604767
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 41.3854501247406
  time_this_iter_s: 41.3854501247406
  time_total_s: 41.3854501247406
  timestamp: 1602273408
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 8.8/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -1.881975656604767
Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (10 RUNNING)
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status   | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | RUNNING  |                 |            4 |    8 |  128 | 0.0210161   |         |            |                      |
| DEFAULT_77a44_00001 | RUNNING  |                 |            2 |  256 |  128 | 0.000461678 |         |            |                      |
| DEFAULT_77a44_00002 | RUNNING  | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.88198 |     0.2841 |                    1 |
| DEFAULT_77a44_00003 | RUNNING  |                 |            4 |    4 |  128 | 0.00551547  |         |            |                      |
| DEFAULT_77a44_00004 | RUNNING  |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | RUNNING  |                 |            4 |    4 |  128 | 0.0421917   |         |            |                      |
| DEFAULT_77a44_00006 | RUNNING  |                 |            2 |    8 |    8 | 0.000359613 |         |            |                      |
| DEFAULT_77a44_00007 | RUNNING  |                 |            4 |  128 |   16 | 0.00202898  |         |            |                      |
| DEFAULT_77a44_00008 | RUNNING  |                 |            2 |    4 |    8 | 0.000162963 |         |            |                      |
| DEFAULT_77a44_00009 | RUNNING  |                 |            2 |   32 |  256 | 0.000134494 |         |            |                      |
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1126) [1, 8000] loss: 0.499
(pid=1101) [1, 8000] loss: 0.559
(pid=1119) [1, 8000] loss: 0.552
(pid=1165) [1, 8000] loss: 0.488
(pid=1118) [1, 8000] loss: 0.581
(pid=1104) [1, 8000] loss: 0.579
(pid=1145) [1, 8000] loss: 0.448
(pid=1098) [1, 8000] loss: 0.389
(pid=1140) [1, 6000] loss: 0.793
(pid=1164) [2, 2000] loss: 1.870
(pid=1101) [1, 10000] loss: 0.435
(pid=1126) [1, 10000] loss: 0.386
(pid=1119) [1, 10000] loss: 0.427
(pid=1165) [1, 10000] loss: 0.390
(pid=1118) [1, 10000] loss: 0.465
(pid=1104) [1, 10000] loss: 0.462
(pid=1145) [1, 10000] loss: 0.341
(pid=1098) [1, 10000] loss: 0.302
(pid=1101) [1, 12000] loss: 0.353
(pid=1126) [1, 12000] loss: 0.311
(pid=1119) [1, 12000] loss: 0.345
(pid=1164) [2, 4000] loss: 0.938
(pid=1140) [1, 8000] loss: 0.594
Result for DEFAULT_77a44_00003:
  accuracy: 0.2563
  date: 2020-10-09_19-57-13
  done: true
  experiment_id: 5c01db6fb7974f6087f128418068ab25
  experiment_tag: 3_batch_size=4,l1=4,l2=128,lr=0.0055155
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.9565512576580049
  node_ip: 172.17.0.2
  pid: 1165
  should_checkpoint: true
  time_since_restore: 65.84106469154358
  time_this_iter_s: 65.84106469154358
  time_total_s: 65.84106469154358
  timestamp: 1602273433
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00003
== Status ==
Memory usage on this node: 8.9/240.1 GiB
Using AsyncHyperBand: num_stopped=1
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -1.919263457131386
Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (10 RUNNING)
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status   | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | RUNNING  |                 |            4 |    8 |  128 | 0.0210161   |         |            |                      |
| DEFAULT_77a44_00001 | RUNNING  |                 |            2 |  256 |  128 | 0.000461678 |         |            |                      |
| DEFAULT_77a44_00002 | RUNNING  | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.88198 |     0.2841 |                    1 |
| DEFAULT_77a44_00003 | RUNNING  | 172.17.0.2:1165 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | RUNNING  |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | RUNNING  |                 |            4 |    4 |  128 | 0.0421917   |         |            |                      |
| DEFAULT_77a44_00006 | RUNNING  |                 |            2 |    8 |    8 | 0.000359613 |         |            |                      |
| DEFAULT_77a44_00007 | RUNNING  |                 |            4 |  128 |   16 | 0.00202898  |         |            |                      |
| DEFAULT_77a44_00008 | RUNNING  |                 |            2 |    4 |    8 | 0.000162963 |         |            |                      |
| DEFAULT_77a44_00009 | RUNNING  |                 |            2 |   32 |  256 | 0.000134494 |         |            |                      |
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_77a44_00005:
  accuracy: 0.0986
  date: 2020-10-09_19-57-13
  done: true
  experiment_id: 8d41531f8ac84a2fa81eb0d04bb4809a
  experiment_tag: 5_batch_size=4,l1=4,l2=128,lr=0.042192
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 2.3523551787376404
  node_ip: 172.17.0.2
  pid: 1118
  should_checkpoint: true
  time_since_restore: 66.13440608978271
  time_this_iter_s: 66.13440608978271
  time_total_s: 66.13440608978271
  timestamp: 1602273433
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00005
Result for DEFAULT_77a44_00000:
  accuracy: 0.1073
  date: 2020-10-09_19-57-13
  done: true
  experiment_id: 71350ebb3b9b4c2ca892c43094b6e672
  experiment_tag: 0_batch_size=4,l1=8,l2=128,lr=0.021016
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 2.306087596511841
  node_ip: 172.17.0.2
  pid: 1104
  should_checkpoint: true
  time_since_restore: 66.43020415306091
  time_this_iter_s: 66.43020415306091
  time_total_s: 66.43020415306091
  timestamp: 1602273433
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00000
Result for DEFAULT_77a44_00007:
  accuracy: 0.4484
  date: 2020-10-09_19-57-14
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.505290996646881
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 67.45768523216248
  time_this_iter_s: 67.45768523216248
  time_total_s: 67.45768523216248
  timestamp: 1602273434
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00007
(pid=1145) [1, 12000] loss: 0.270
(pid=1126) [1, 14000] loss: 0.260
(pid=1101) [1, 14000] loss: 0.301
(pid=1119) [1, 14000] loss: 0.288
Result for DEFAULT_77a44_00002:
  accuracy: 0.2704
  date: 2020-10-09_19-57-21
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.9036258604049683
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 74.83478355407715
  time_this_iter_s: 33.44933342933655
  time_total_s: 74.83478355407715
  timestamp: 1602273441
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 7.3/240.1 GiB
Using AsyncHyperBand: num_stopped=3
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.9565512576580049
Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (7 RUNNING, 3 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    |                 |            2 |  256 |  128 | 0.000461678 |         |            |                      |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.90363 |     0.2704 |                    2 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | RUNNING    |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    |                 |            2 |    8 |    8 | 0.000359613 |         |            |                      |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.50529 |     0.4484 |                    1 |
| DEFAULT_77a44_00008 | RUNNING    |                 |            2 |    4 |    8 | 0.000162963 |         |            |                      |
| DEFAULT_77a44_00009 | RUNNING    |                 |            2 |   32 |  256 | 0.000134494 |         |            |                      |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1098) [2, 2000] loss: 1.427
(pid=1145) [1, 14000] loss: 0.227
(pid=1140) [1, 10000] loss: 0.476
(pid=1101) [1, 16000] loss: 0.260
(pid=1126) [1, 16000] loss: 0.223
(pid=1119) [1, 16000] loss: 0.245
(pid=1164) [3, 2000] loss: 1.876
(pid=1098) [2, 4000] loss: 0.711
(pid=1145) [1, 16000] loss: 0.196
(pid=1101) [1, 18000] loss: 0.226
(pid=1126) [1, 18000] loss: 0.194
(pid=1119) [1, 18000] loss: 0.216
(pid=1140) [1, 12000] loss: 0.396
(pid=1098) [2, 6000] loss: 0.462
(pid=1164) [3, 4000] loss: 0.927
(pid=1126) [1, 20000] loss: 0.171
(pid=1101) [1, 20000] loss: 0.200
(pid=1145) [1, 18000] loss: 0.170
(pid=1119) [1, 20000] loss: 0.188
(pid=1098) [2, 8000] loss: 0.345
Result for DEFAULT_77a44_00002:
  accuracy: 0.3206
  date: 2020-10-09_19-57-52
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 3
  loss: 1.9260577551841735
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 105.59961199760437
  time_this_iter_s: 30.76482844352722
  time_total_s: 105.59961199760437
  timestamp: 1602273472
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 7.3/240.1 GiB
Using AsyncHyperBand: num_stopped=3
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.9565512576580049
Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (7 RUNNING, 3 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    |                 |            2 |  256 |  128 | 0.000461678 |         |            |                      |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.92606 |     0.3206 |                    3 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | RUNNING    |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    |                 |            2 |    8 |    8 | 0.000359613 |         |            |                      |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.50529 |     0.4484 |                    1 |
| DEFAULT_77a44_00008 | RUNNING    |                 |            2 |    4 |    8 | 0.000162963 |         |            |                      |
| DEFAULT_77a44_00009 | RUNNING    |                 |            2 |   32 |  256 | 0.000134494 |         |            |                      |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1145) [1, 20000] loss: 0.148
(pid=1140) [1, 14000] loss: 0.339
Result for DEFAULT_77a44_00008:
  accuracy: 0.1883
  date: 2020-10-09_19-57-56
  done: true
  experiment_id: 528c728f0abd4dde8df53627aa7b3cc9
  experiment_tag: 8_batch_size=2,l1=4,l2=8,lr=0.00016296
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.984449322938919
  node_ip: 172.17.0.2
  pid: 1101
  should_checkpoint: true
  time_since_restore: 109.06154918670654
  time_this_iter_s: 109.06154918670654
  time_total_s: 109.06154918670654
  timestamp: 1602273476
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00008
Result for DEFAULT_77a44_00006:
  accuracy: 0.3722
  date: 2020-10-09_19-57-56
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.6620629720330238
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 109.24619793891907
  time_this_iter_s: 109.24619793891907
  time_total_s: 109.24619793891907
  timestamp: 1602273476
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00006
Result for DEFAULT_77a44_00009:
  accuracy: 0.3066
  date: 2020-10-09_19-57-58
  done: false
  experiment_id: 448a03d8183b48e4a732b9974760de96
  experiment_tag: 9_batch_size=2,l1=32,l2=256,lr=0.00013449
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.8606878761410712
  node_ip: 172.17.0.2
  pid: 1119
  should_checkpoint: true
  time_since_restore: 111.55251812934875
  time_this_iter_s: 111.55251812934875
  time_total_s: 111.55251812934875
  timestamp: 1602273478
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00009
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.919263457131386
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    |                 |            2 |  256 |  128 | 0.000461678 |         |            |                      |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.92606 |     0.3206 |                    3 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | RUNNING    |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.66206 |     0.3722 |                    1 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.50529 |     0.4484 |                    1 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | RUNNING    | 172.17.0.2:1119 |            2 |   32 |  256 | 0.000134494 | 1.86069 |     0.3066 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1098) [2, 10000] loss: 0.275
(pid=1164) [4, 2000] loss: 1.842
(pid=1126) [2, 2000] loss: 1.660
Result for DEFAULT_77a44_00001:
  accuracy: 0.4374
  date: 2020-10-09_19-58-05
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.5289554242562502
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 118.45757269859314
  time_this_iter_s: 118.45757269859314
  time_total_s: 118.45757269859314
  timestamp: 1602273485
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00001
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.881975656604767
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.52896 |     0.4374 |                    1 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.92606 |     0.3206 |                    3 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | RUNNING    |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.66206 |     0.3722 |                    1 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.50529 |     0.4484 |                    1 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | RUNNING    | 172.17.0.2:1119 |            2 |   32 |  256 | 0.000134494 | 1.86069 |     0.3066 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1119) [2, 2000] loss: 1.796
Result for DEFAULT_77a44_00007:
  accuracy: 0.5087
  date: 2020-10-09_19-58-08
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.3934748243197799
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 121.18621754646301
  time_this_iter_s: 53.72853231430054
  time_total_s: 121.18621754646301
  timestamp: 1602273488
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00007
(pid=1140) [1, 16000] loss: 0.298
(pid=1126) [2, 4000] loss: 0.801
(pid=1164) [4, 4000] loss: 0.914
(pid=1145) [2, 2000] loss: 1.454
(pid=1119) [2, 4000] loss: 0.886
(pid=1098) [3, 2000] loss: 1.292
(pid=1126) [2, 6000] loss: 0.528
(pid=1140) [1, 18000] loss: 0.264
Result for DEFAULT_77a44_00002:
  accuracy: 0.3437
  date: 2020-10-09_19-58-23
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 4
  loss: 1.8035019870758056
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 136.0801386833191
  time_this_iter_s: 30.48052668571472
  time_total_s: 136.0801386833191
  timestamp: 1602273503
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.881975656604767
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.52896 |     0.4374 |                    1 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   |  1.8035 |     0.3437 |                    4 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | RUNNING    |                 |            2 |  256 |  256 | 0.0647615   |         |            |                      |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.66206 |     0.3722 |                    1 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.39347 |     0.5087 |                    2 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | RUNNING    | 172.17.0.2:1119 |            2 |   32 |  256 | 0.000134494 | 1.86069 |     0.3066 |                    1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1145) [2, 4000] loss: 0.730
(pid=1119) [2, 6000] loss: 0.570
(pid=1098) [3, 4000] loss: 0.647
(pid=1126) [2, 8000] loss: 0.389
(pid=1119) [2, 8000] loss: 0.417
(pid=1145) [2, 6000] loss: 0.476
(pid=1164) [5, 2000] loss: 1.852
(pid=1098) [3, 6000]
loss: 0.428 [2m[36m(pid=1126)[0m [2, 10000] loss: 0.306 [2m[36m(pid=1140)[0m [1, 20000] loss: 0.237 [2m[36m(pid=1119)[0m [2, 10000] loss: 0.326 [2m[36m(pid=1145)[0m [2, 8000] loss: 0.349 [2m[36m(pid=1126)[0m [2, 12000] loss: 0.255 [2m[36m(pid=1164)[0m [5, 4000] loss: 0.934 [2m[36m(pid=1098)[0m [3, 8000] loss: 0.325 Result for DEFAULT_77a44_00004:accuracy: 0.1024date: 2020-10-09_19-58-49done: trueexperiment_id: 2ca91983c1654f39a11db9cdd1e47f10experiment_tag: 4_batch_size=2,l1=256,l2=256,lr=0.064762hostname: 234fef3cc6b0iterations_since_restore: 1loss: 2.346003741002083node_ip: 172.17.0.2pid: 1140should_checkpoint: truetime_since_restore: 161.9359531402588time_this_iter_s: 161.9359531402588time_total_s: 161.9359531402588timestamp: 1602273529timesteps_since_restore: 0training_iteration: 1trial_id: 77a44_00004== Status == Memory usage on this node: 6.8/240.1 GiB Using AsyncHyperBand: num_stopped=5 Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.919263457131386 Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (6 RUNNING, 4 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.8035 | 0.3437 | 4 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | 
DEFAULT_77a44_00004 | RUNNING | 172.17.0.2:1140 | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.39347 | 0.5087 | 2 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1119)[0m [2, 12000] loss: 0.271 [2m[36m(pid=1145)[0m [2, 10000] loss: 0.276 [2m[36m(pid=1126)[0m [2, 14000] loss: 0.213 Result for DEFAULT_77a44_00002:accuracy: 0.3035date: 2020-10-09_19-58-53done: falseexperiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123hostname: 234fef3cc6b0iterations_since_restore: 5loss: 1.8839821341514587node_ip: 172.17.0.2pid: 1164should_checkpoint: truetime_since_restore: 166.10145020484924time_this_iter_s: 30.02131152153015time_total_s: 166.10145020484924timestamp: 1602273533timesteps_since_restore: 0training_iteration: 5trial_id: 77a44_00002[2m[36m(pid=1098)[0m [3, 10000] loss: 0.254 [2m[36m(pid=1119)[0m [2, 14000] loss: 0.228 [2m[36m(pid=1145)[0m [2, 12000] loss: 0.230 [2m[36m(pid=1126)[0m [2, 16000] loss: 0.187 Result for DEFAULT_77a44_00007:accuracy: 0.5319date: 2020-10-09_19-59-00done: falseexperiment_id: 1e0a3b1304eb470898956b381db607e6experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029hostname: 234fef3cc6b0iterations_since_restore: 3loss: 1.3139552696928383node_ip: 172.17.0.2pid: 1098should_checkpoint: truetime_since_restore: 173.1586651802063time_this_iter_s: 51.972447633743286time_total_s: 173.1586651802063timestamp: 1602273540timesteps_since_restore: 
0training_iteration: 3trial_id: 77a44_00007== Status == Memory usage on this node: 6.3/240.1 GiB Using AsyncHyperBand: num_stopped=5 Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.919263457131386 Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (5 RUNNING, 5 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.88398 | 0.3035 | 5 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1164)[0m [6, 2000] loss: 1.907 [2m[36m(pid=1119)[0m [2, 
16000] loss: 0.198 [2m[36m(pid=1145)[0m [2, 14000] loss: 0.192 [2m[36m(pid=1126)[0m [2, 18000] loss: 0.166 [2m[36m(pid=1098)[0m [4, 2000] loss: 1.200 [2m[36m(pid=1119)[0m [2, 18000] loss: 0.177 [2m[36m(pid=1164)[0m [6, 4000] loss: 0.960 [2m[36m(pid=1126)[0m [2, 20000] loss: 0.148 [2m[36m(pid=1145)[0m [2, 16000] loss: 0.164 [2m[36m(pid=1098)[0m [4, 4000] loss: 0.599 [2m[36m(pid=1119)[0m [2, 20000] loss: 0.152 Result for DEFAULT_77a44_00002:accuracy: 0.2862date: 2020-10-09_19-59-22done: falseexperiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123hostname: 234fef3cc6b0iterations_since_restore: 6loss: 1.9193087907791138node_ip: 172.17.0.2pid: 1164should_checkpoint: truetime_since_restore: 195.79263925552368time_this_iter_s: 29.69118905067444time_total_s: 195.79263925552368timestamp: 1602273562timesteps_since_restore: 0training_iteration: 6trial_id: 77a44_00002== Status == Memory usage on this node: 6.3/240.1 GiB Using AsyncHyperBand: num_stopped=5 Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.919263457131386 Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (5 RUNNING, 5 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.91931 | 0.2862 | 6 | | 
DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1145)[0m [2, 18000] loss: 0.147 Result for DEFAULT_77a44_00006:accuracy: 0.4589date: 2020-10-09_19-59-27done: falseexperiment_id: 696157fc029f42e781f0779431a5902fexperiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961hostname: 234fef3cc6b0iterations_since_restore: 2loss: 1.448237135411054node_ip: 172.17.0.2pid: 1126should_checkpoint: truetime_since_restore: 199.99908256530762time_this_iter_s: 90.75288462638855time_total_s: 199.99908256530762timestamp: 1602273567timesteps_since_restore: 0training_iteration: 2trial_id: 77a44_00006[2m[36m(pid=1098)[0m [4, 6000] loss: 0.403 Result for DEFAULT_77a44_00009:accuracy: 0.4358date: 2020-10-09_19-59-33done: trueexperiment_id: 448a03d8183b48e4a732b9974760de96experiment_tag: 9_batch_size=2,l1=32,l2=256,lr=0.00013449hostname: 234fef3cc6b0iterations_since_restore: 2loss: 1.5461469007849693node_ip: 172.17.0.2pid: 1119should_checkpoint: truetime_since_restore: 206.13924598693848time_this_iter_s: 94.58672785758972time_total_s: 206.13924598693848timestamp: 1602273573timesteps_since_restore: 0training_iteration: 2trial_id: 77a44_00009== Status == Memory usage on this node: 6.3/240.1 GiB Using AsyncHyperBand: num_stopped=6 Bracket: Iter 
8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.4971920180980116 | Iter 1.000: -1.919263457131386 Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (5 RUNNING, 5 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.91931 | 0.2862 | 6 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1145)[0m [2, 20000] loss: 0.130 [2m[36m(pid=1164)[0m [7, 2000] loss: 1.967 [2m[36m(pid=1126)[0m [3, 2000] loss: 1.454 [2m[36m(pid=1098)[0m [4, 8000] loss: 0.310 [2m[36m(pid=1126)[0m [3, 4000] loss: 0.715 
(pid=1164) [7, 4000] loss: 0.997
(pid=1098) [4, 10000] loss: 0.248
Result for DEFAULT_77a44_00001:
  accuracy: 0.5459
  date: 2020-10-09_19-59-44
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.2801105223743245
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 217.948983669281
  time_this_iter_s: 99.49141097068787
  time_total_s: 217.948983669281
  timestamp: 1602273584
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 5.8/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.28011 |     0.5459 |                    2 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.91931 |     0.2862 |                    6 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.44824 |     0.4589 |                    2 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.31396 |     0.5319 |                    3 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1126) [3, 6000] loss: 0.488
Result for DEFAULT_77a44_00007:
  accuracy: 0.5309
  date: 2020-10-09_19-59-50
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 4
  loss: 1.3358730784237385
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 223.8010766506195
  time_this_iter_s: 50.64241147041321
  time_total_s: 223.8010766506195
  timestamp: 1602273590
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: 77a44_00007

== Status ==
Memory usage on this node: 5.8/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: None | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.28011 |     0.5459 |                    2 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 1.91931 |     0.2862 |                    6 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.44824 |     0.4589 |                    2 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.33587 |     0.5309 |                    4 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

Result for DEFAULT_77a44_00002:
  accuracy: 0.2505
  date: 2020-10-09_19-59-52
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 7
  loss: 2.00418664560318
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 225.23884892463684
  time_this_iter_s: 29.44620966911316
  time_total_s: 225.23884892463684
  timestamp: 1602273592
  timesteps_since_restore: 0
  training_iteration: 7
  trial_id: 77a44_00002

(pid=1145) [3, 2000] loss: 1.219
(pid=1126) [3, 8000] loss: 0.356
(pid=1098) [5, 2000] loss: 1.144
(pid=1145) [3, 4000] loss: 0.632
(pid=1164) [8, 2000] loss: 1.980
(pid=1126) [3, 10000] loss: 0.283
(pid=1098) [5, 4000] loss: 0.566
(pid=1145) [3, 6000] loss: 0.410
(pid=1164) [8, 4000] loss: 1.014
(pid=1126) [3, 12000] loss: 0.236
(pid=1098) [5, 6000] loss: 0.390
(pid=1145) [3, 8000] loss: 0.304
(pid=1126) [3, 14000] loss: 0.198
Result for DEFAULT_77a44_00002:
  accuracy: 0.2253
  date: 2020-10-09_20-00-21
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 8
  loss: 2.1314156931877135
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 254.41000890731812
  time_this_iter_s: 29.171159982681274
  time_total_s: 254.41000890731812
  timestamp: 1602273621
  timesteps_since_restore: 0
  training_iteration: 8
  trial_id: 77a44_00002

== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.28011 |     0.5459 |                    2 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 2.13142 |     0.2253 |                    8 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.44824 |     0.4589 |                    2 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.33587 |     0.5309 |                    4 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1098) [5, 8000] loss: 0.297
(pid=1145) [3, 10000] loss: 0.245
(pid=1126) [3, 16000] loss: 0.173
(pid=1164) [9, 2000] loss: 2.112
(pid=1098) [5, 10000] loss: 0.235
(pid=1145) [3, 12000] loss: 0.203
(pid=1126) [3, 18000] loss: 0.154
Result for DEFAULT_77a44_00007:
  accuracy: 0.5628
  date: 2020-10-09_20-00-40
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 5
  loss: 1.2729537689715624
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 273.7186484336853
  time_this_iter_s: 49.917571783065796
  time_total_s: 273.7186484336853
  timestamp: 1602273640
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: 77a44_00007

== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.28011 |     0.5459 |                    2 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 2.13142 |     0.2253 |                    8 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.44824 |     0.4589 |                    2 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.27295 |     0.5628 |                    5 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1164) [9, 4000] loss: 1.053
(pid=1126) [3, 20000] loss: 0.141
(pid=1145) [3, 14000] loss: 0.170
(pid=1098) [6, 2000] loss: 1.095
Result for DEFAULT_77a44_00002:
  accuracy: 0.17
  date: 2020-10-09_20-00-51
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 9
  loss: 2.1584741218566896
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 284.08941316604614
  time_this_iter_s: 29.679404258728027
  time_total_s: 284.08941316604614
  timestamp: 1602273651
  timesteps_since_restore: 0
  training_iteration: 9
  trial_id: 77a44_00002

== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.28011 |     0.5459 |                    2 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 2.15847 |     0.17   |                    9 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.44824 |     0.4589 |                    2 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.27295 |     0.5628 |                    5 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1145) [3, 16000] loss: 0.149
Result for DEFAULT_77a44_00006:
  accuracy: 0.4727
  date: 2020-10-09_20-00-55
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 3
  loss: 1.4226891365654766
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 287.9995017051697
  time_this_iter_s: 88.00041913986206
  time_total_s: 287.9995017051697
  timestamp: 1602273655
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: 77a44_00006

(pid=1098) [6, 4000] loss: 0.556
(pid=1145) [3, 18000] loss: 0.136
(pid=1164) [10, 2000] loss: 2.212
(pid=1126) [4, 2000] loss: 1.392
(pid=1098) [6, 6000] loss: 0.376
(pid=1145) [3, 20000] loss: 0.114
(pid=1126) [4, 4000] loss: 0.679
(pid=1164) [10, 4000] loss: 1.133
(pid=1098) [6, 8000] loss: 0.279
(pid=1126) [4, 6000] loss: 0.458
Result for DEFAULT_77a44_00001:
  accuracy: 0.5798
  date: 2020-10-09_20-01-21
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 3
  loss: 1.1820625860116911
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 314.0342721939087
  time_this_iter_s: 96.08528852462769
  time_total_s: 314.0342721939087
  timestamp: 1602273681
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.18206 |     0.5798 |                    3 |
| DEFAULT_77a44_00002 | RUNNING    | 172.17.0.2:1164 |            8 |   32 |   16 | 0.0131231   | 2.15847 |     0.17   |                    9 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.42269 |     0.4727 |                    3 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.27295 |     0.5628 |                    5 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

Result for DEFAULT_77a44_00002:
  accuracy: 0.1292
  date: 2020-10-09_20-01-21
  done: true
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 10
  loss: 2.2377114813804626
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 314.6153542995453
  time_this_iter_s: 30.525941133499146
  time_total_s: 314.6153542995453
  timestamp: 1602273681
  timesteps_since_restore: 0
  training_iteration: 10
  trial_id: 77a44_00002

(pid=1098) [6, 10000] loss: 0.232
(pid=1126) [4, 8000] loss: 0.342
(pid=1145) [4, 2000] loss: 1.100
Result for DEFAULT_77a44_00007:
  accuracy: 0.5459
  date: 2020-10-09_20-01-30
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 6
  loss: 1.3732997598737477
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 323.68818259239197
  time_this_iter_s: 49.969534158706665
  time_total_s: 323.68818259239197
  timestamp: 1602273690
  timesteps_since_restore: 0
  training_iteration: 6
  trial_id: 77a44_00007

== Status ==
Memory usage on this node: 5.2/240.1 GiB
Using AsyncHyperBand: num_stopped=7
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (3 RUNNING, 7 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.18206 |     0.5798 |                    3 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.42269 |     0.4727 |                    3 |
| DEFAULT_77a44_00007 | RUNNING    | 172.17.0.2:1098 |            4 |  128 |   16 | 0.00202898  | 1.3733  |     0.5459 |                    6 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1126)[0m [4, 10000] loss: 0.271 [2m[36m(pid=1145)[0m [4, 4000] loss: 0.556 [2m[36m(pid=1098)[0m [7, 2000] loss: 1.034 [2m[36m(pid=1126)[0m [4, 12000] loss: 0.229 [2m[36m(pid=1145)[0m [4, 6000] loss: 0.364 [2m[36m(pid=1126)[0m [4, 14000] loss: 0.196 [2m[36m(pid=1098)[0m [7, 4000] loss: 0.541 [2m[36m(pid=1145)[0m [4, 8000] loss: 0.274 [2m[36m(pid=1126)[0m [4, 16000] loss: 0.169 [2m[36m(pid=1098)[0m [7, 6000] loss: 0.368 [2m[36m(pid=1145)[0m [4, 10000] loss: 0.215 [2m[36m(pid=1126)[0m [4, 18000] loss: 0.150 [2m[36m(pid=1098)[0m [7, 8000] loss: 0.273 [2m[36m(pid=1126)[0m [4, 20000] loss: 0.135 [2m[36m(pid=1145)[0m [4, 12000] loss: 0.182 [2m[36m(pid=1098)[0m [7, 10000] loss: 0.217 [2m[36m(pid=1145)[0m [4, 14000] loss: 0.158 Result for DEFAULT_77a44_00007:accuracy: 0.576date: 2020-10-09_20-02-19done: falseexperiment_id: 1e0a3b1304eb470898956b381db607e6experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029hostname: 234fef3cc6b0iterations_since_restore: 7loss: 1.24756854121387node_ip: 172.17.0.2pid: 1098should_checkpoint: truetime_since_restore: 372.3224792480469time_this_iter_s: 48.63429665565491time_total_s: 372.3224792480469timestamp: 1602273739timesteps_since_restore: 0training_iteration: 7trial_id: 77a44_00007== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | 
loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.18206 | 0.5798 | 3 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.42269 | 0.4727 | 3 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.24757 | 0.576 | 7 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+Result for DEFAULT_77a44_00006:accuracy: 0.4961date: 2020-10-09_20-02-20done: falseexperiment_id: 696157fc029f42e781f0779431a5902fexperiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961hostname: 234fef3cc6b0iterations_since_restore: 4loss: 1.3667119354642927node_ip: 172.17.0.2pid: 1126should_checkpoint: truetime_since_restore: 373.32873916625977time_this_iter_s: 85.32923746109009time_total_s: 373.32873916625977timestamp: 1602273740timesteps_since_restore: 0training_iteration: 4trial_id: 77a44_00006[2m[36m(pid=1145)[0m [4, 16000] loss: 0.134 [2m[36m(pid=1126)[0m [5, 2000] loss: 1.317 [2m[36m(pid=1098)[0m [8, 2000] loss: 1.013 [2m[36m(pid=1145)[0m [4, 18000] loss: 0.120 [2m[36m(pid=1126)[0m [5, 4000] loss: 0.660 [2m[36m(pid=1098)[0m [8, 
4000] loss: 0.521 [2m[36m(pid=1126)[0m [5, 6000] loss: 0.438 [2m[36m(pid=1145)[0m [4, 20000] loss: 0.108 [2m[36m(pid=1098)[0m [8, 6000] loss: 0.350 [2m[36m(pid=1126)[0m [5, 8000] loss: 0.331 [2m[36m(pid=1098)[0m [8, 8000] loss: 0.267 Result for DEFAULT_77a44_00001:accuracy: 0.6009date: 2020-10-09_20-02-54done: falseexperiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168hostname: 234fef3cc6b0iterations_since_restore: 4loss: 1.1593985119301593node_ip: 172.17.0.2pid: 1145should_checkpoint: truetime_since_restore: 407.62501096725464time_this_iter_s: 93.59073877334595time_total_s: 407.62501096725464timestamp: 1602273774timesteps_since_restore: 0training_iteration: 4trial_id: 77a44_00001== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | 
DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.36671 | 0.4961 | 4 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.24757 | 0.576 | 7 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1126)[0m [5, 10000] loss: 0.271 [2m[36m(pid=1098)[0m [8, 10000] loss: 0.218 [2m[36m(pid=1145)[0m [5, 2000] loss: 0.967 [2m[36m(pid=1126)[0m [5, 12000] loss: 0.221 Result for DEFAULT_77a44_00007:accuracy: 0.5664date: 2020-10-09_20-03-08done: falseexperiment_id: 1e0a3b1304eb470898956b381db607e6experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029hostname: 234fef3cc6b0iterations_since_restore: 8loss: 1.3161735702279955node_ip: 172.17.0.2pid: 1098should_checkpoint: truetime_since_restore: 421.1367325782776time_this_iter_s: 48.81425333023071time_total_s: 421.1367325782776timestamp: 1602273788timesteps_since_restore: 0training_iteration: 8trial_id: 77a44_00007== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | 
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.36671 | 0.4961 | 4 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31617 | 0.5664 | 8 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1145)[0m [5, 4000] loss: 0.496 [2m[36m(pid=1126)[0m [5, 14000] loss: 0.186 [2m[36m(pid=1098)[0m [9, 2000] loss: 0.986 [2m[36m(pid=1145)[0m [5, 6000] loss: 0.332 [2m[36m(pid=1126)[0m [5, 16000] loss: 0.164 [2m[36m(pid=1098)[0m [9, 4000] loss: 0.503 [2m[36m(pid=1126)[0m [5, 18000] loss: 0.144 [2m[36m(pid=1145)[0m [5, 8000] loss: 0.243 [2m[36m(pid=1098)[0m [9, 6000] loss: 0.342 [2m[36m(pid=1126)[0m [5, 20000] loss: 0.129 [2m[36m(pid=1145)[0m [5, 10000] loss: 0.204 [2m[36m(pid=1098)[0m [9, 8000] loss: 0.266 [2m[36m(pid=1145)[0m [5, 12000] loss: 0.167 Result for DEFAULT_77a44_00006:accuracy: 0.5285date: 2020-10-09_20-03-45done: falseexperiment_id: 696157fc029f42e781f0779431a5902fexperiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961hostname: 
234fef3cc6b0iterations_since_restore: 5loss: 1.2945664445526899node_ip: 172.17.0.2pid: 1126should_checkpoint: truetime_since_restore: 458.353075504303time_this_iter_s: 85.02433633804321time_total_s: 458.353075504303timestamp: 1602273825timesteps_since_restore: 0training_iteration: 5trial_id: 77a44_00006== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31617 | 0.5664 | 8 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 
0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1098)[0m [9, 10000] loss: 0.213 [2m[36m(pid=1145)[0m [5, 14000] loss: 0.144 [2m[36m(pid=1126)[0m [6, 2000] loss: 1.270 Result for DEFAULT_77a44_00007:accuracy: 0.5803date: 2020-10-09_20-03-56done: falseexperiment_id: 1e0a3b1304eb470898956b381db607e6experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029hostname: 234fef3cc6b0iterations_since_restore: 9loss: 1.3147958470012993node_ip: 172.17.0.2pid: 1098should_checkpoint: truetime_since_restore: 469.72292470932007time_this_iter_s: 48.58619213104248time_total_s: 469.72292470932007timestamp: 1602273836timesteps_since_restore: 0training_iteration: 9trial_id: 77a44_00007== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | 
DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.3148 | 0.5803 | 9 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1126)[0m [6, 4000] loss: 0.624 [2m[36m(pid=1145)[0m [5, 16000] loss: 0.127 [2m[36m(pid=1098)[0m [10, 2000] loss: 0.949 [2m[36m(pid=1126)[0m [6, 6000] loss: 0.430 [2m[36m(pid=1145)[0m [5, 18000] loss: 0.112 [2m[36m(pid=1098)[0m [10, 4000] loss: 0.502 [2m[36m(pid=1126)[0m [6, 8000] loss: 0.323 [2m[36m(pid=1145)[0m [5, 20000] loss: 0.099 [2m[36m(pid=1098)[0m [10, 6000] loss: 0.346 [2m[36m(pid=1126)[0m [6, 10000] loss: 0.258 Result for DEFAULT_77a44_00001:accuracy: 0.6221date: 2020-10-09_20-04-28done: falseexperiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168hostname: 234fef3cc6b0iterations_since_restore: 5loss: 1.0875221006242093node_ip: 172.17.0.2pid: 1145should_checkpoint: truetime_since_restore: 501.5412850379944time_this_iter_s: 93.91627407073975time_total_s: 501.5412850379944timestamp: 1602273868timesteps_since_restore: 0training_iteration: 5trial_id: 77a44_00001== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: 
/var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.08752 | 0.6221 | 5 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.3148 | 0.5803 | 9 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1126)[0m [6, 12000] loss: 0.211 [2m[36m(pid=1098)[0m [10, 8000] loss: 0.253 [2m[36m(pid=1145)[0m [6, 2000] loss: 0.827 [2m[36m(pid=1126)[0m [6, 14000] loss: 0.177 [2m[36m(pid=1098)[0m [10, 10000] loss: 0.210 [2m[36m(pid=1145)[0m [6, 4000] loss: 0.448 [2m[36m(pid=1126)[0m [6, 16000] loss: 0.160 Result for DEFAULT_77a44_00007:accuracy: 0.5713date: 2020-10-09_20-04-45done: trueexperiment_id: 1e0a3b1304eb470898956b381db607e6experiment_tag: 
7_batch_size=4,l1=128,l2=16,lr=0.002029hostname: 234fef3cc6b0iterations_since_restore: 10loss: 1.2877456236266531node_ip: 172.17.0.2pid: 1098should_checkpoint: truetime_since_restore: 518.6297419071198time_this_iter_s: 48.90681719779968time_total_s: 518.6297419071198timestamp: 1602273885timesteps_since_restore: 0training_iteration: 10trial_id: 77a44_00007== Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.08752 | 0.6221 | 5 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | 
DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1126)[0m [6, 18000] loss: 0.143 [2m[36m(pid=1145)[0m [6, 6000] loss: 0.297 [2m[36m(pid=1126)[0m [6, 20000] loss: 0.127 [2m[36m(pid=1145)[0m [6, 8000] loss: 0.235 [2m[36m(pid=1145)[0m [6, 10000] loss: 0.184 Result for DEFAULT_77a44_00006:accuracy: 0.5484date: 2020-10-09_20-05-10done: falseexperiment_id: 696157fc029f42e781f0779431a5902fexperiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961hostname: 234fef3cc6b0iterations_since_restore: 6loss: 1.2631257870631292node_ip: 172.17.0.2pid: 1126should_checkpoint: truetime_since_restore: 543.5542225837708time_this_iter_s: 85.20114707946777time_total_s: 543.5542225837708timestamp: 1602273910timesteps_since_restore: 0training_iteration: 6trial_id: 77a44_00006== Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.08752 | 0.6221 | 5 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 
0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.26313 | 0.5484 | 6 | | DEFAULT_77a44_00007 | TERMINATED | | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1145)[0m [6, 12000] loss: 0.157 [2m[36m(pid=1126)[0m [7, 2000] loss: 1.256 [2m[36m(pid=1126)[0m [7, 4000] loss: 0.631 [2m[36m(pid=1145)[0m [6, 14000] loss: 0.131 [2m[36m(pid=1126)[0m [7, 6000] loss: 0.407 [2m[36m(pid=1145)[0m [6, 16000] loss: 0.121 [2m[36m(pid=1126)[0m [7, 8000] loss: 0.311 [2m[36m(pid=1145)[0m [6, 18000] loss: 0.101 [2m[36m(pid=1126)[0m [7, 10000] loss: 0.243 [2m[36m(pid=1145)[0m [6, 20000] loss: 0.094 [2m[36m(pid=1126)[0m [7, 12000] loss: 0.203 Result for DEFAULT_77a44_00001:accuracy: 0.61date: 2020-10-09_20-06-01done: falseexperiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168hostname: 234fef3cc6b0iterations_since_restore: 6loss: 1.1592615005358762node_ip: 172.17.0.2pid: 1145should_checkpoint: truetime_since_restore: 594.7056727409363time_this_iter_s: 93.1643877029419time_total_s: 594.7056727409363timestamp: 1602273961timesteps_since_restore: 0training_iteration: 6trial_id: 77a44_00001== Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | 
Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.15926 | 0.61 | 6 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.26313 | 0.5484 | 6 | | DEFAULT_77a44_00007 | TERMINATED | | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1126)[0m [7, 14000] loss: 0.176 [2m[36m(pid=1126)[0m [7, 16000] loss: 0.156 [2m[36m(pid=1145)[0m [7, 2000] loss: 0.802 [2m[36m(pid=1126)[0m [7, 18000] loss: 0.141 [2m[36m(pid=1145)[0m [7, 4000] loss: 0.393 [2m[36m(pid=1126)[0m [7, 20000] loss: 0.123 [2m[36m(pid=1145)[0m [7, 6000] loss: 0.282 Result for 
DEFAULT_77a44_00006:accuracy: 0.5369date: 2020-10-09_20-06-34done: falseexperiment_id: 696157fc029f42e781f0779431a5902fexperiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961hostname: 234fef3cc6b0iterations_since_restore: 7loss: 1.2813393794611097node_ip: 172.17.0.2pid: 1126should_checkpoint: truetime_since_restore: 627.4627993106842time_this_iter_s: 83.90857672691345time_total_s: 627.4627993106842timestamp: 1602273994timesteps_since_restore: 0training_iteration: 7trial_id: 77a44_00006== Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.15926 | 0.61 | 6 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.28134 | 0.5369 | 7 | | DEFAULT_77a44_00007 | TERMINATED | | 4 | 128 | 16 | 0.00202898 
| 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+[2m[36m(pid=1145)[0m [7, 8000] loss: 0.206 [2m[36m(pid=1126)[0m [8, 2000] loss: 1.200 [2m[36m(pid=1145)[0m [7, 10000] loss: 0.171 [2m[36m(pid=1126)[0m [8, 4000] loss: 0.602 [2m[36m(pid=1145)[0m [7, 12000] loss: 0.138 [2m[36m(pid=1126)[0m [8, 6000] loss: 0.407 [2m[36m(pid=1145)[0m [7, 14000] loss: 0.121 [2m[36m(pid=1126)[0m [8, 8000] loss: 0.296 [2m[36m(pid=1145)[0m [7, 16000] loss: 0.109 [2m[36m(pid=1126)[0m [8, 10000] loss: 0.247 [2m[36m(pid=1145)[0m [7, 18000] loss: 0.098 [2m[36m(pid=1126)[0m [8, 12000] loss: 0.205 [2m[36m(pid=1145)[0m [7, 20000] loss: 0.086 [2m[36m(pid=1126)[0m [8, 14000] loss: 0.175 [2m[36m(pid=1126)[0m [8, 16000] loss: 0.152 Result for DEFAULT_77a44_00001:accuracy: 0.6115date: 2020-10-09_20-07-35done: falseexperiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168hostname: 234fef3cc6b0iterations_since_restore: 7loss: 1.1567747425308288node_ip: 172.17.0.2pid: 1145should_checkpoint: truetime_since_restore: 687.9970579147339time_this_iter_s: 93.29138517379761time_total_s: 687.9970579147339timestamp: 1602274055timesteps_since_restore: 0training_iteration: 7trial_id: 77a44_00001== Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) 
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.15677 |     0.6115 |                    7 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.28134 |     0.5369 |                    7 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1126) [8, 18000] loss: 0.136
(pid=1145) [8, 2000] loss: 0.721
(pid=1126) [8, 20000] loss: 0.122
(pid=1145) [8, 4000] loss: 0.373
...
(pid=1145) [10, 18000] loss: 0.082
(pid=1145) [10, 20000] loss: 0.071

Result for DEFAULT_77a44_00001:
  accuracy: 0.6152
  date: 2020-10-09_20-12-10
  done: true
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 10
  loss: 1.3026221742785826
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 963.3746852874756
  time_this_iter_s: 90.32463026046753
  time_total_s: 963.3746852874756
  timestamp: 1602274330
  timesteps_since_restore: 0
  training_iteration: 10
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 0/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (10 TERMINATED)
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc   |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |       |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | TERMINATED |       |            2 |  256 |  128 | 0.000461678 | 1.30262 |     0.6152 |                   10 |
| DEFAULT_77a44_00002 | TERMINATED |       |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |       |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |       |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |       |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | TERMINATED |       |            2 |    8 |    8 | 0.000359613 | 1.29022 |     0.5454 |                   10 |
| DEFAULT_77a44_00007 | TERMINATED |       |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |       |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |       |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+

Best trial config: {'l1': 128, 'l2': 16, 'lr': 0.0020289809406172947, 'batch_size': 4}
Best trial final validation loss: 1.2877456236266531
Best trial final validation accuracy:
If you run this code, the output may look like the example shown above.
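The `Bracket` line in the status output appears to list one cutoff per milestone (iterations 1, 2, 4 and 8), with the loss negated so that higher is better. A trial whose score falls below the cutoff at a milestone is stopped. The sketch below illustrates that idea in plain Python. It is a simplified illustration and not Ray Tune's implementation: the function names, the reduction factor of 2 (suggested by the milestones in the log) and the sample losses are assumptions made for this example.

```python
def percentile(values, q):
    """Linear-interpolation percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values recorded")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def rung_cutoff(recorded_scores, reduction_factor=2):
    """Score a trial must reach at a milestone to keep running.

    Returns None while nothing has been recorded at this milestone yet.
    With reduction_factor=2 the cutoff is the median of the recorded scores.
    """
    if not recorded_scores:
        return None
    return percentile(recorded_scores, (1 - 1 / reduction_factor) * 100)


def should_stop(loss, recorded_losses, reduction_factor=2):
    """Decide whether a trial reporting `loss` at a milestone is stopped."""
    # The loss is minimized, so negate it: higher scores are better.
    scores = [-value for value in recorded_losses]
    cutoff = rung_cutoff(scores, reduction_factor)
    return cutoff is not None and -loss < cutoff


# Hypothetical losses already recorded by other trials at one milestone.
recorded = [2.3, 1.9, 1.5, 1.2]
print("cutoff:", rung_cutoff([-value for value in recorded]))
print("loss 2.0 stopped:", should_stop(2.0, recorded))
print("loss 1.4 stopped:", should_stop(1.4, recorded))
```

As a check against the log, trial DEFAULT_77a44_00003 reported a loss of 1.95655 after its first iteration. Negated, that is below the `Iter 1.000` cutoff of about -1.919, which is consistent with the trial ending after a single iteration.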
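To avoid wasting resources, most trials were stopped early by the scheduler. The best-performing trial reached a validation accuracy of about 57% (0.5713 for DEFAULT_77a44_00007 in the final status table), which can be confirmed on the test set.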
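To make these two observations concrete, the following sketch recomputes them from the final status table. The numbers are copied from that table. The tuple layout, the helper names and the assumption that 10 iterations is the maximum a trial can run are illustrative choices, not part of the tutorial's code.

```python
# (trial id suffix, final loss, final accuracy, training_iteration),
# copied from the final status table in the example output.
TRIALS = [
    ("00000", 2.30609, 0.1073, 1),
    ("00001", 1.30262, 0.6152, 10),
    ("00002", 2.23771, 0.1292, 10),
    ("00003", 1.95655, 0.2563, 1),
    ("00004", 2.346, 0.1024, 1),
    ("00005", 2.35236, 0.0986, 1),
    ("00006", 1.29022, 0.5454, 10),
    ("00007", 1.28775, 0.5713, 10),
    ("00008", 1.98445, 0.1883, 1),
    ("00009", 1.54615, 0.4358, 2),
]

# Assumption: the longest trials in the table ran 10 iterations,
# so anything shorter was stopped early.
MAX_ITERATIONS = 10


def best_by_loss(trials):
    """Return the trial with the lowest final validation loss."""
    return min(trials, key=lambda trial: trial[1])


def stopped_early(trials, max_iterations):
    """Return the trials that ended before reaching max_iterations."""
    return [trial for trial in trials if trial[3] < max_iterations]


best = best_by_loss(TRIALS)
early = stopped_early(TRIALS, MAX_ITERATIONS)

print(f"best trial: DEFAULT_77a44_{best[0]} "
      f"(loss={best[1]}, accuracy={best[2]})")
print(f"stopped early: {len(early)} of {len(TRIALS)} trials")
```

Selecting by lowest loss picks trial 00007, which matches the `Best trial config` line in the output, even though trial 00001 ends with a higher accuracy. Six of the ten trials finish before iteration 10.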
That's it! You can now tune the hyperparameters of your PyTorch models.
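Finally, a note on renting GPUs for experiments: we rented ours from 智星云 and had a good experience. For details, see the 智星云 website: http://www.ai-galaxy.cn/, the Taobao shop: https://shop36573300.taobao.com/, and the official account: 智星AI.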