China Mechanical Engineering ›› 2026, Vol. 37 ›› Issue (4): 977-986. DOI: 10.3969/j.issn.1004-132X.2026.04.022

• Remanufacturing and Resource Recovery Technologies for Retired Products •

• About the authors: GUO Hongfei, male, born in 1980, professor and doctoral supervisor. His research interests include intelligent manufacturing, the industrial Internet, and digital twins. E-mail: ghf-2005@163.com
    REN Yaping* (corresponding author), male, born in 1995, associate professor and doctoral supervisor. His research interests include sustainable design and manufacturing, product disassembly decision theory and methods, and optimization algorithm design and application. E-mail: renyp1@163.com
• Funding:
    National Natural Science Foundation of China (52465061, 52205526); Major Science and Technology Innovation Demonstration "Open Competition" Project of Inner Mongolia Autonomous Region (2024JBGS0035); Key Project of the Natural Science Foundation of Inner Mongolia Autonomous Region (2024ZD26); Key R&D and Achievement Transformation Program of Inner Mongolia Autonomous Region (2023YFJM0007); Guangzhou Science and Technology Program (202201010284); Fundamental Research Funds for the Central Universities (21623219)

Selective Disassembly Sequence Planning for Retired Electromechanical Products Based on Heterogeneous Graph with Improved Proximal Policy Optimization

GUO Hongfei1, FU Wenjie1, REN Yaping2

  1. College of Intelligent Science and Technology (College of Cyberspace Security), Inner Mongolia University of Technology, Hohhot, 010080, China
    2. Research Center of Intelligent Manufacturing Technology, Beijing Institute of Technology, Zhuhai, Guangdong, 519088, China
  • Received: 2025-07-28  Online: 2026-04-25  Published: 2026-05-11
  • Contact: REN Yaping


Abstract:

To address the complex physical modeling, poor adaptability, and insufficient algorithm generalization that afflict current selective disassembly sequence planning, an efficient disassembly sequence optimization method was proposed that combined structured heterogeneous graph modeling with an adaptive proximal policy optimization algorithm. The structured heterogeneous graph unified the multi-constraint relationships among product components, providing a more expressive state representation for subsequent optimization. In the optimization algorithm, advantage function normalization and an entropy regularization mechanism were introduced: the former corrected the distribution inconsistency caused by dimensional differences in the data across training stages, while the latter adaptively adjusted the exploration intensity during training, improving the model's training stability and generalization ability. Experimental results show that advantage function normalization significantly improves the algorithm's convergence speed and training stability, while the entropy regularization mechanism enhances its exploration ability. Compared with traditional deep reinforcement learning algorithms, the adaptive proximal policy optimization algorithm performs better in both convergence and the quality of the optimal policy.
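The two adaptive modifications named in the abstract — per-batch advantage normalization and an entropy bonus on the PPO clipped objective whose strength is annealed over training — might be sketched as below. This is an illustrative reconstruction, not the authors' implementation; every function name, coefficient, and decay schedule here is an assumption.

```python
# Sketch of advantage normalization and a decaying entropy bonus for PPO.
# All names and hyperparameter values are illustrative.
import numpy as np

def normalized_advantages(returns, values, eps=1e-8):
    """Standardize advantages to zero mean and unit variance per batch,
    so their scale stays comparable across training stages."""
    adv = np.asarray(returns, dtype=np.float64) - np.asarray(values, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)

def ppo_objective(ratio, adv, new_probs, step, clip=0.2,
                  ent_coef0=0.01, decay=0.999):
    """Clipped surrogate objective plus an annealed entropy bonus.

    ratio     : pi_new(a|s) / pi_old(a|s) per sample
    adv       : normalized advantages
    new_probs : action-distribution rows of the current policy
    step      : training step, used to anneal the entropy coefficient
    """
    ratio = np.asarray(ratio, dtype=np.float64)
    # Standard PPO clipping: take the pessimistic of the two surrogates.
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - clip, 1 + clip) * adv).mean()
    p = np.asarray(new_probs, dtype=np.float64)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1).mean()
    # Exploration strength shrinks geometrically as training proceeds.
    ent_coef = ent_coef0 * decay ** step
    return surrogate + ent_coef * entropy
```

A real training loop would maximize this objective by gradient ascent through an autodiff framework; the NumPy version above only makes the arithmetic of the two mechanisms explicit.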

Key words: retired product, disassembly planning, deep reinforcement learning, heterogeneous graph
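As a toy illustration of the heterogeneous-graph idea in the abstract — one part set carrying several typed constraint relations, from which the currently removable parts (the feasible actions of selective disassembly) can be read off — consider the following minimal sketch. It is not the paper's model; the class, the relation names, and all example parts are invented.

```python
# Hypothetical heterogeneous graph of disassembly constraints:
# parts are nodes, and each constraint type ("contact", "precedence", ...)
# is a separate directed edge set over the same node set.
from collections import defaultdict

class DisassemblyGraph:
    def __init__(self):
        self.parts = set()
        self.edges = defaultdict(set)  # relation type -> set of (u, v)

    def add_edge(self, relation, u, v):
        """Directed edge u -> v of the given relation type.
        For "precedence", u must be removed before v."""
        self.parts.update((u, v))
        self.edges[relation].add((u, v))

    def removable(self, removed):
        """Parts not yet removed whose precedence predecessors
        have all been removed already."""
        removed = set(removed)
        blocked = {v for (u, v) in self.edges["precedence"] if u not in removed}
        return self.parts - removed - blocked
```

For example, with edges `("precedence", "screws", "cover")` and `("precedence", "cover", "motor")`, only `screws` (and any unconstrained part) is removable at the start, and `motor` becomes removable once `screws` and `cover` are gone. A learned policy would score the parts returned by `removable` at each step instead of enumerating sequences.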

CLC number: