Posts by Collection

portfolio

ExtraTrees

Random Forests are often the textbook example of highly parallel algorithms that are unsuitable for GPUs. In this project, we propose and evaluate a CUDA implementation of Extremely Randomized Trees (or ExtraTrees), a variant of Random Forests that is more amenable to GPU-driven parallel programming.
[Code] [Report]

sndict

Structured Nested Dictionaries. This module extends the dicts in the Python standard library, enabling fast and clean manipulation of nested dictionary structures.
[Code]

publications

Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening

[Paper] [arXiv] [Code] [Data Report] [Medium Post]
Citation: Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, Esther Hwang, Naziya Samreen, S. Gene Kim, Laura Heacock, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras. Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening. IEEE Transactions on Medical Imaging, 2019.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

[Paper]
Citation: Sidney Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, Usvsn Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Preprint.

What Language Model to Train if You Have One Million GPU Hours?

[Paper]
Citation: Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M Saiful Bari, Stella Biderman, Hady Elsahar, Niklas Muennighoff, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy. What Language Model to Train if You Have One Million GPU Hours? Findings of EMNLP 2022.

Tool Learning with Foundation Models

[Paper]
Citation: Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun. Tool Learning with Foundation Models. Preprint.

teaching