DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud
Wang Q, Lan T, Tang Y, et al. DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud[J].
Published in VLDB, 2024
DLRover makes the distributed training of large AI models easy, stable, fast and green. It automatically trains deep learning models on a distributed cluster, so model developers can focus on model architecture rather than on engineering concerns such as hardware acceleration and distributed execution.