MLPerf: ML benchmark suite
A broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms.
Image Classification
Dataset: Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. S.; Berg, A. C. & Fei-Fei, L. (2015), ‘ImageNet Large Scale Visual Recognition Challenge’, International Journal of Computer Vision (IJCV).
Model: He, K.; Zhang, X.; Ren, S. & Sun, J. (2015), ‘Deep Residual Learning for Image Recognition’, CoRR abs/1512.03385.
Object Identification
Dataset: Lin, T.-Y.; Maire, M.; Belongie, S. J.; Bourdev, L. D.; Girshick, R. B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P. & Zitnick, C. L. (2014), ‘Microsoft COCO: Common Objects in Context’, CoRR abs/1405.0312.
Model: He, K.; Gkioxari, G.; Dollár, P. & Girshick, R. B. (2017), ‘Mask R-CNN’, CoRR abs/1703.06870.
Translation
Dataset: WMT English-German from Bojar, O.; Buck, C.; Federmann, C.; Haddow, B.; Koehn, P.; Monz, C.; Post, M. & Specia, L., ed. (2014), Proceedings of the Ninth Workshop on Statistical Machine Translation, Association for Computational Linguistics, Baltimore, Maryland, USA.
Model: Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L. & Polosukhin, I. (2017), ‘Attention Is All You Need’, CoRR abs/1706.03762.
Speech-to-Text
Dataset: Panayotov, V.; Chen, G.; Povey, D. & Khudanpur, S. (2015), Librispeech: An ASR corpus based on public domain audio books, in ‘2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)’, pp. 5206–5210.
Model: Amodei, D.; Anubhai, R.; Battenberg, E.; Case, C.; Casper, J.; Catanzaro, B.; Chen, J.; Chrzanowski, M.; Coates, A.; Diamos, G.; Elsen, E.; Engel, J.; Fan, L.; Fougner, C.; Han, T.; Hannun, A. Y.; Jun, B.; LeGresley, P.; Lin, L.; Narang, S.; Ng, A. Y.; Ozair, S.; Prenger, R.; Raiman, J.; Satheesh, S.; Seetapun, D.; Sengupta, S.; Wang, Y.; Wang, Z.; Wang, C.; Xiao, B.; Yogatama, D.; Zhan, J. & Zhu, Z. (2015), ‘Deep Speech 2: End-to-End Speech Recognition in English and Mandarin’, CoRR abs/1512.02595.
Recommendation
Dataset: Harper, F. M. & Konstan, J. A. (2015), ‘The MovieLens Datasets: History and Context’, ACM Trans. Interact. Intell. Syst. 5(4), 19:1–19:19.
Model: He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X. & Chua, T.-S. (2017), ‘Neural Collaborative Filtering’, CoRR abs/1708.05031.
Sentiment Analysis
Dataset: Maas, A. L.; Daly, R. E.; Pham, P. T.; Huang, D.; Ng, A. Y. & Potts, C. (2011), Learning Word Vectors for Sentiment Analysis, in ‘Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies’, Association for Computational Linguistics, Portland, Oregon, USA, pp. 142–150.
Model: Johnson, R. & Zhang, T. (2014), ‘Effective Use of Word Order for Text Categorization with Convolutional Neural Networks’, CoRR abs/1412.1058.
Reinforcement Learning
Dataset: Games from Iyama Yuta 6 Title Celebration, between contestants Murakawa Daisuke, Sakai Hideyuki, Yamada Kimio, Hyakuta Naoki, Yuki Satoshi, and Iyama Yuta.
Model: Tensorflow/minigo implementation by Andrew Jackson.