Experience

Research Engineer @ InstaDeep

Full-time, Sep. 2019 - present.

I am working on the development of our reinforcement-learning-based decision-making product.

Part-time, Oct. 2018 - Sep. 2019.

I participated in the development of our reinforcement-learning-based decision-making product. I worked on the neural network design and experimental comparisons with mainstream reinforcement learning algorithms (PPO, AlphaZero, etc.). Previously, I developed a novel model-free algorithm which outperformed plain Monte-Carlo tree search (MCTS).

During 2018, my principal project was to extend the research results of Ranked Reward to the glass cutting optimization problem, an industrial combinatorial optimization problem. Among all participants, our solution was the only one that did not depend on traditional optimization techniques, and we ranked 16th out of 60 teams.

Internship, Feb. 2018 - Sep. 2018.

My main project was to use reinforcement learning to solve the bin packing problem, which is an NP-hard combinatorial problem. The objective of bin packing is to place boxes in containers efficiently, i.e. minimizing the waste of space, while respecting multiple physical constraints.

Based on AlphaZero, a reinforcement learning algorithm designed for two-player games, we proposed Ranked Reward to enable self-play training in single-player games. Ranked Reward reshapes the rewards based on the agent’s previous performance to create a relative metric. Without problem-specific human knowledge, the proposed method achieves super-human performance and surpasses MCTS, a supervised agent, a heuristic algorithm, and a commercial integer programming solver (Gurobi) on both 2D and 3D bin packing problems.
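As a rough illustration of the reward reshaping described above, here is a minimal Python sketch. It is a simplified reading of the idea, not the published implementation: the function name `ranked_reward`, the 75th-percentile threshold, the window of 250 episodes, and the random tie-breaking are all assumptions chosen for illustration.

```python
import random
from collections import deque


def ranked_reward(reward, history, percentile=75.0, rng=random):
    """Map a raw episode reward to +1 or -1 relative to recent performance.

    `history` holds the raw rewards of recent episodes; `percentile` sets how
    hard the agent must beat its own past results. Both are illustrative.
    """
    if not history:
        # No baseline yet: assign a random sign.
        return rng.choice([-1.0, 1.0])
    ordered = sorted(history)
    # Nearest-rank percentile of the recent rewards.
    index = min(len(ordered) - 1, int(len(ordered) * percentile / 100.0))
    threshold = ordered[index]
    if reward > threshold:
        return 1.0
    if reward < threshold:
        return -1.0
    # Ties are broken at random (an assumption of this sketch).
    return rng.choice([-1.0, 1.0])


history = deque(maxlen=250)  # sliding window of recent raw rewards (assumed size)
for raw in [0.60, 0.62, 0.70, 0.65, 0.80]:
    z = ranked_reward(raw, history)  # binary signal used in place of the raw reward
    history.append(raw)
```

The binary signal plays the role of a win or loss in a two-player game: the agent "wins" only when it beats a threshold derived from its own recent episodes, so the bar rises as it improves.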

This work has been published (arXiv) as a workshop paper at both NeurIPS and AAAI. I also received the research internship award from École Polytechnique, France.

Besides the bin packing project, I also participated in other research projects, including evolutionary algorithms and automated machine learning. In particular, we presented our latest results on neural network architecture search at Deep Learning Indaba 2018.

Operations Research Analyst @ SNCF Réseau

Internship, Jun. 2017 - Sep. 2017.

Publications

More unlabelled data or label more data? A study on semi-supervised laparoscopic image segmentation

Yunguan Fu, Maria R. Robu, Bongjin Koo, Crispin Schneider, Stijn van Laarhoven, Danail Stoyanov, Brian Davidson, Matthew J. Clarkson, Yipeng Hu

Accepted to MICCAI 2019 Medical Image Learning with Less Labels and Imperfect Data Workshop, arXiv

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Alexandre Laterre, Yunguan Fu, Mohamed Khalil Jabri, Alain-Sam Cohen, David Kas, Karl Hajjar, Hui Chen, Torbjorn S. Dahl, Amine Kerkeni, Karim Beguir

Accepted to NeurIPS 2018 Deep RL Workshop and AAAI 2019 RL in Games Workshop, arXiv.