China Mechanical Engineering ›› 2022, Vol. 33 ›› Issue (03): 329-338. DOI: 10.3969/j.issn.1004-132X.2022.03.009

• Intelligent Manufacturing •

Related Parallel Machine Online Scheduling Method Based on LSTM-PPO Reinforcement Learning

HE Junjie1; ZHANG Jie1; ZHANG Peng1; WANG Junliang1; ZHENG Peng2; WANG Ming1

  1. School of Mechanical Engineering, Donghua University, Shanghai, 201620
  2. School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, 200240
  • Online: 2022-02-10 Published: 2022-02-23
  • Corresponding author: ZHANG Jie, female, born in 1963, professor and doctoral supervisor. Research interests: intelligent manufacturing systems, industrial big data, etc. E-mail: mezhangjie@dhu.edu.cn.
  • About the author: HE Junjie, male, born in 1995, master's student. Research interests: intelligent optimization scheduling, reinforcement learning, etc. E-mail: hejunjie@mail.dhu.edu.cn.
  • Supported by:
    National Key Research and Development Program of China (2019YFB1706300);
    Donghua University Research Startup Fund for Young Teachers

Abstract: For the online scheduling problem of related parallel machines, with the total weighted completion time as the objective, an online scheduling method based on long short-term memory proximal policy optimization (LSTM-PPO) reinforcement learning is proposed. An LSTM-integrated agent is designed to record the historical changes of workshop states and of the scheduling policy, and online scheduling decisions are then made according to this state information. A workshop state matrix is designed to describe the problem constraints and the optimization objective, an additional machine-waiting instruction is introduced into the scheduling decisions to expand the solution space, and a reward function is designed that decomposes the optimization objective into step-by-step rewards so that each scheduling decision can be evaluated. Finally, the model is updated and its parameters are globally optimized with the PPO algorithm. Experimental results show that the proposed method outperforms several existing heuristic rules. Applied to production scheduling in an actual workshop, the proposed algorithm effectively reduces the total weighted completion time.

Key words: related parallel machine, online scheduling, reinforcement learning, proximal policy optimization with long short-term memory (LSTM-PPO)
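As a rough illustration of the objective named in the abstract, the following Python sketch computes the total weighted completion time of a dispatch sequence on related parallel machines and splits it into per-decision rewards. It is not the authors' state matrix or reward function, which the abstract does not specify. It assumes the standard related-machine model (processing time equals job workload divided by machine speed) and an assumed reward of minus the weighted completion time of the job just dispatched. The names `Job` and `evaluate_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    weight: float          # priority weight w_j
    work: float            # workload p_j (time on a unit-speed machine)
    release: float = 0.0   # arrival time r_j in the online setting


def evaluate_schedule(jobs, speeds, dispatch):
    """Return (total weighted completion time, per-step rewards).

    dispatch is a list of (job_index, machine_index) pairs in decision order.
    Assumption: each step's reward is -w_j * C_j for the dispatched job, so
    the rewards sum to minus the objective.
    """
    free_at = [0.0] * len(speeds)   # time at which each machine becomes idle
    total = 0.0
    rewards = []
    for j, m in dispatch:
        job = jobs[j]
        # A machine cannot start a job before the job has arrived.
        start = max(free_at[m], job.release)
        # Related machines: processing time scales inversely with speed.
        finish = start + job.work / speeds[m]
        free_at[m] = finish
        total += job.weight * finish
        rewards.append(-job.weight * finish)
    return total, rewards


jobs = [Job(weight=2, work=4), Job(weight=1, work=6), Job(weight=3, work=2, release=1)]
speeds = [1.0, 2.0]
total, rewards = evaluate_schedule(jobs, speeds, [(0, 1), (1, 0), (2, 1)])
print(total, rewards)
```

Under this assumed decomposition, maximizing the undiscounted return is equivalent to minimizing the total weighted completion time, which is the kind of step-by-step evaluation the abstract describes.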

CLC Number: