China Mechanical Engineering ›› 2007, Vol. 18 ›› Issue (19): 2326-2329.


Extraction Approach of Patent Information Based on Regular Expression

Qiu Qingying;Zheng Guomin;Feng Pei’en;Wu Jianwei   

  • Received:1900-01-01 Revised:1900-01-01 Online:2007-10-10 Published:2007-10-10

Research on a Patent Information Extraction Method Based on Regular Expressions

邱清盈;郑国民;冯培恩;武建伟   

Abstract:

Patent documents are currently stored in image-based formats such as .TIF and .PDF, which makes full-text search and further analysis difficult. Based on the structural features of patent documents, an approach to patent digitization and information extraction is proposed that combines an optical character recognition (OCR) tool with fault-tolerant regular expressions. A software system was developed to support batch extraction of patent information, providing the data resources for subsequent automatic patent analysis and knowledge mining.
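The abstract does not reproduce the expressions used in the paper, so the following Python sketch only illustrates what a fault-tolerant regular expression for OCR output might look like. The field label ("Application No."), the list of character confusions, and the function name extract_application_number are all assumptions made for illustration, not the authors' patterns.

```python
import re

# Assumed OCR confusions: letters that are commonly misread in place of digits.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Tolerant pattern for a hypothetical "Application No." field:
# - letters in the label may be misread ("App1ication", "Applicati0n", "N0")
# - the period is optional; the separator may be ":", ";" or a full-width colon
# - the value is 8 to 12 digit-like characters plus an optional check character
APP_NO = re.compile(
    r"App[l1I]icat[i1l][o0]n\s*N[o0]\s*\.?\s*[:：;]?\s*"
    r"(?P<number>[0-9OolISB]{8,12}(?:\s*\.\s*[0-9XxOolISB])?)"
)


def extract_application_number(ocr_text):
    """Return the normalized application number, or None if no match."""
    match = APP_NO.search(ocr_text)
    if match is None:
        return None
    # Drop stray whitespace inserted by OCR, then map look-alike letters to digits.
    raw = re.sub(r"\s+", "", match.group("number"))
    return raw.translate(OCR_DIGIT_FIXES)
```

The tolerance is split into two stages here: the pattern accepts known misreadings so the field is still located, and a separate normalization step repairs the captured value. Batch extraction would then amount to applying such functions, one per field, to the OCR text of each document.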

Key words: patent analysis, information extraction, regular expression, design knowledge

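Abstract:

To address the difficulty of performing full-text retrieval and in-depth analysis on patent documents stored in image formats, a method for digitizing patent documents and extracting their information is proposed. Based on the structural characteristics of patent documents, the method integrates an optical character recognition tool and builds fault-tolerant regular expressions for patent information extraction. A corresponding software system was developed that extracts patent information in batches, providing a data foundation for subsequent efficient automatic analysis of patent documents and knowledge mining.

Key words: patent analysis, information extraction, regular expression, design knowledge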
