China Mechanical Engineering ›› 2026, Vol. 37 ›› Issue (4): 977-986. DOI: 10.3969/j.issn.1004-132X.2026.04.022

• Remanufacturing and Resource Recovery Technologies for Retired Products •

• About the authors: GUO Hongfei, male, born in 1980, professor and doctoral supervisor. His research interests include intelligent manufacturing, the industrial Internet, and digital twins. E-mail: ghf-2005@163.com
    REN Yaping* (corresponding author), male, born in 1995, associate professor and doctoral supervisor. His research interests include sustainable design and manufacturing, product disassembly decision theory and methods, and optimization algorithm design and application. E-mail: renyp1@163.com
• Funding:
    National Natural Science Foundation of China (52465061, 52205526); Major Science and Technology Innovation Demonstration "Open Competition" Project of Inner Mongolia Autonomous Region (2024JBGS0035); Key Project of the Natural Science Foundation of Inner Mongolia Autonomous Region (2024ZD26); Key R&D and Achievement Transformation Program of Inner Mongolia Autonomous Region (2023YFJM0007); Guangzhou Science and Technology Program (202201010284); Fundamental Research Funds for the Central Universities (21623219)

Selective Disassembly Sequence Planning for Retired Electromechanical Products Based on Heterogeneous Graph with Improved Proximal Policy Optimization

GUO Hongfei1, FU Wenjie1, REN Yaping2

  1. College of Intelligent Science and Technology (College of Cyberspace Security), Inner Mongolia University of Technology, Hohhot, 010080, China
    2. Research Center of Intelligent Manufacturing Technology, Beijing Institute of Technology, Zhuhai, Guangdong, 519088, China
  • Received: 2025-07-28  Online: 2026-04-25  Published: 2026-05-11
  • Contact: REN Yaping


Abstract:

To address the complex physical modeling, poor adaptability, and insufficient algorithm generalization that afflict current selective disassembly sequence planning, an efficient disassembly sequence optimization method was proposed that combined structured heterogeneous graph modeling with an adaptive proximal policy optimization algorithm. The structured heterogeneous graph unified the multi-constraint relationships among product components, providing a more expressive state representation for subsequent optimization. In the optimization algorithm, advantage function normalization and an entropy regularization mechanism were introduced: the former corrected the distribution inconsistency caused by dimensional differences in the data across training stages, while the latter adaptively adjusted the exploration intensity during training, improving the model's training stability and generalization ability. Experimental results show that advantage function normalization significantly improves the algorithm's convergence speed and training stability, while the entropy regularization mechanism enhances its exploration ability. Compared with traditional deep reinforcement learning algorithms, the adaptive proximal policy optimization algorithm performs better in both convergence and the quality of the optimal policy.
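The two adaptive modifications named in the abstract — per-batch advantage normalization and an entropy bonus on the PPO clipped objective whose strength is annealed over training — might be sketched as below. This is an illustrative reconstruction, not the authors' implementation; every function name, coefficient, and decay schedule here is an assumption.

```python
# Sketch of advantage normalization and a decaying entropy bonus for PPO.
# All names and hyperparameter values are illustrative.
import numpy as np

def normalized_advantages(returns, values, eps=1e-8):
    """Standardize advantages to zero mean and unit variance per batch,
    so their scale stays comparable across training stages."""
    adv = np.asarray(returns, dtype=np.float64) - np.asarray(values, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)

def ppo_objective(ratio, adv, new_probs, step, clip=0.2,
                  ent_coef0=0.01, decay=0.999):
    """Clipped surrogate objective plus an annealed entropy bonus.

    ratio     : pi_new(a|s) / pi_old(a|s) per sample
    adv       : normalized advantages
    new_probs : action-distribution rows of the current policy
    step      : training step, used to anneal the entropy coefficient
    """
    ratio = np.asarray(ratio, dtype=np.float64)
    # Standard PPO clipping: take the pessimistic of the two surrogates.
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - clip, 1 + clip) * adv).mean()
    p = np.asarray(new_probs, dtype=np.float64)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1).mean()
    # Exploration strength shrinks geometrically as training proceeds.
    ent_coef = ent_coef0 * decay ** step
    return surrogate + ent_coef * entropy
```

A real training loop would maximize this objective by gradient ascent through an autodiff framework; the NumPy version above only makes the arithmetic of the two mechanisms explicit.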

Key words: retired product, disassembly planning, deep reinforcement learning, heterogeneous graph
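As a toy illustration of the heterogeneous-graph idea in the abstract — one part set carrying several typed constraint relations, from which the currently removable parts (the feasible actions of selective disassembly) can be read off — consider the following minimal sketch. It is not the paper's model; the class, the relation names, and all example parts are invented.

```python
# Hypothetical heterogeneous graph of disassembly constraints:
# parts are nodes, and each constraint type ("contact", "precedence", ...)
# is a separate directed edge set over the same node set.
from collections import defaultdict

class DisassemblyGraph:
    def __init__(self):
        self.parts = set()
        self.edges = defaultdict(set)  # relation type -> set of (u, v)

    def add_edge(self, relation, u, v):
        """Directed edge u -> v of the given relation type.
        For "precedence", u must be removed before v."""
        self.parts.update((u, v))
        self.edges[relation].add((u, v))

    def removable(self, removed):
        """Parts not yet removed whose precedence predecessors
        have all been removed already."""
        removed = set(removed)
        blocked = {v for (u, v) in self.edges["precedence"] if u not in removed}
        return self.parts - removed - blocked
```

For example, with edges `("precedence", "screws", "cover")` and `("precedence", "cover", "motor")`, only `screws` (and any unconstrained part) is removable at the start, and `motor` becomes removable once `screws` and `cover` are gone. A learned policy would score the parts returned by `removable` at each step instead of enumerating sequences.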

CLC number: