The above is from: Deep learning-based job placement in distributed machine learning clusters
Dorm [33] is a utilization-fairness optimizer for scheduling ML jobs.
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach
OASiS [34] is an online scheduling algorithm for ML jobs.
Online Job Scheduling in Distributed Machine Learning Clusters
SLAQ [35] and Optimus [18] build a performance model for each ML job and dynamically adjust resource allocation based on it.
SLAQ: Quality-Driven Scheduling for Distributed Machine Learning
Optimus: an efficient dynamic resource scheduler for deep learning clusters
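
A minimal sketch of the general idea behind model-driven allocation as described for SLAQ and Optimus: fit a per-job performance model, then hand resources to whichever job is predicted to benefit most. This is not the actual algorithm of either system. The model form `a * w / (1 + b * w)`, its parameters `a` and `b`, and the function names `predicted_speed` and `allocate` are all hypothetical and chosen only for illustration.

```python
import heapq


def predicted_speed(workers: int, a: float, b: float) -> float:
    """Hypothetical performance model: throughput with diminishing returns."""
    return a * workers / (1.0 + b * workers)


def allocate(jobs: dict[str, tuple[float, float]], total_workers: int) -> dict[str, int]:
    """Give out workers one at a time to the job with the largest predicted marginal gain.

    `jobs` maps a job name to its (assumed, already fitted) model parameters (a, b).
    """
    alloc = {name: 0 for name in jobs}
    heap = []  # max-heap on marginal gain, stored as negated values
    for name, (a, b) in jobs.items():
        gain = predicted_speed(1, a, b) - predicted_speed(0, a, b)
        heapq.heappush(heap, (-gain, name))

    for _ in range(total_workers):
        if not heap:
            break  # no jobs to schedule
        _, name = heapq.heappop(heap)
        alloc[name] += 1
        a, b = jobs[name]
        n = alloc[name]
        # Re-evaluate this job's gain from one more worker and re-queue it.
        gain = predicted_speed(n + 1, a, b) - predicted_speed(n, a, b)
        heapq.heappush(heap, (-gain, name))
    return alloc
```

In a dynamic setting, a scheduler of this kind would refit each job's parameters as training progresses and rerun the allocation periodically, so that resources shift toward the jobs that currently gain the most from them.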
DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters