China Mechanical Engineering ›› 2023, Vol. 34 ›› Issue (08): 976-981, 992. DOI: 10.3969/j.issn.1004-132X.2023.08.012


Semantic Extraction Method of Multi-scale Nuclear Power Quality Text Fault Information

WU Tingwei¹; WANG Mengling¹; YI Shuping²; GUO Jingren³

  1. Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, 200237
  2. College of Mechanical Engineering, Chongqing University, Chongqing, 400044
  3. China Nuclear Power Engineering Co., Ltd., Shenzhen, Guangdong, 518000
  • Online:2023-04-25 Published:2023-05-17

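  • Corresponding author: WANG Mengling, female, born 1980, associate professor. Research interests: data mining and artificial intelligence algorithms. Author of more than 30 papers. E-mail: wml_ling@ecust.edu.cn
  • About the author: WU Tingwei, male, born 1998, master's student. Research interests: text classification and information extraction. E-mail: y30200997@mail.ecust.edu.cn
  • Supported by:
    National Key Research and Development Program of China (2020YFB1711700)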

Abstract: A multi-scale semantic extraction method for fault information in nuclear power quality texts was proposed to identify the equipment with quality defects and the stage to which each fault belongs. Quality texts mention faulty and normal equipment side by side, and they do not state which stage of the whole value chain (design, procurement, construction, or commissioning) a fault belongs to; a multi-scale fault information extraction strategy was proposed to address both problems. First, a pre-trained language model based on bidirectional Transformer encoding was used to convert nuclear power quality text into text vectors. A bidirectional gated recurrent unit network with an attention mechanism was then introduced to mine the key semantic features of quality defects. On this basis, a conditional random field performed entity prediction on the key semantic features and output the faulty equipment. A multi-layer perceptron fine-tuned and reasoned over the extracted key semantic features to infer the stage to which the faulty equipment belongs. Finally, the method was verified on real nuclear power quality text datasets, where the F1 value reached 94.3%. The results indicate that the proposed method is feasible and effective.
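
The entity-prediction step can be illustrated with a minimal sketch of linear-chain conditional random field decoding (the Viterbi algorithm) over a BIO tag scheme. This is not the authors' implementation: the tag names, the emission and transition scores, the English example tokens, and the helper names `viterbi` and `extract_spans` are all assumptions made for illustration. In the pipeline the abstract describes, the emission scores would presumably come from the attention-based recurrent layer rather than being written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set: only faulty equipment is marked as an entity.
TAGS = ["O", "B-FAULT", "I-FAULT"]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF."""
    if not emissions:
        return []
    # best[t][tag]: best score of any tag path that ends in `tag` at step t
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            # Pick the predecessor tag that maximises path + transition score.
            prev = max(TAGS, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda k: best[-1][k])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


def extract_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Join B-/I- tagged tokens into entity strings."""
    spans: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B-FAULT":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-FAULT" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Made-up scores: "pump seal" is the faulty item, "valve" is normal equipment.
tokens = ["pump", "seal", "leaked", "near", "valve"]
emissions = [
    {"O": 0.1, "B-FAULT": 2.0, "I-FAULT": 0.0},
    {"O": 0.5, "B-FAULT": 0.2, "I-FAULT": 1.5},
    {"O": 2.0, "B-FAULT": 0.0, "I-FAULT": 0.3},
    {"O": 2.0, "B-FAULT": 0.0, "I-FAULT": 0.0},
    {"O": 1.5, "B-FAULT": 1.0, "I-FAULT": 0.2},
]
# Penalise the invalid move O -> I-FAULT; all other transitions score 0.
transitions = {("O", "I-FAULT"): -10.0}

tags = viterbi(emissions, transitions)
print(extract_spans(tokens, tags))  # ['pump seal']
```

In this toy example "valve" is tagged `O` even though it is equipment, which mirrors the problem the abstract raises: faulty and normal equipment appear in the same text, and only the faulty item should be output.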

Key words: multi-scale, nuclear power quality text, semantic extraction, pre-trained language model, conditional random field

