1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2020)

at the ACM/IEEE Joint Conference on Digital Libraries 2020 (JCDL2020), Wuhan, China


News: The workshop proceedings of EEKE 2020 have been published and are available at http://ceur-ws.org/Vol-2658/

News: Since EEKE is hosted by JCDL, at least one author per paper must register; see the instructions at https://2020.jcdl.org/Registration.html. The deadline for regular registration is July 30.

News: EEKE2020 will be an all-virtual workshop as JCDL will be online only. Deadline for submission: June 22, 2020 (extended)

Keynote by Min Song (Yonsei University, South Korea): Entitymetrics 2.0: Measuring the Impact of Entities and Relations Extracted from Scientific Documents.

Keynote by Markus Stocker (TIB – Leibniz Information Centre for Science and Technology and University Library, Germany): Building Scholarly Knowledge Bases with Crowdsourcing and Text Mining.

Accepted Papers

The following papers have been accepted and will be presented at EEKE2020.

Long Papers

  • Mengjia Wu and Yi Zhang. Intelligent Bibliometrics for Discovering the Associations between Genes and Diseases: Methodology and Case study.
  • Jennifer D'Souza and Sören Auer. NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature.
  • Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li and Gaihong Yu. Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling.

Short Papers

  • Jin Mao, Shiyun Wang and Xianli Shang. Investigating interdisciplinary knowledge flow through citances.
  • Xin An, Jinghong Li, Shuo Xu, Liang Chen and Sainan Pi. A Novel Approach for Patent Similarity Measurement Based on Sequence Alignment.
  • Fang Tan, Siting Yang, Xiaoyan Wu and Jian Xu. Exploring the Relation between Biomedical Entities and Government Funding.
  • Sahand Vahidnia, Alireza Abbasi and Hussein A. Abbass. Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping.
  • Liang Chen, Shuo Xu, Weijiao Shang, Zheng Wang, Chao Wei and Haiyun Xu. What is Special about Patent Information Extraction?
  • Yu Li, Tao Yue and Wu Zhenxin. IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in Multi-Domains.

Poster Papers

  • Qikai Liu, Pengcheng Li, Wei Lu and Qikai Cheng. Long-tail dataset entity recognition based on Data Augmentation.
  • Xiaole Li and Yuzhuo Wang. Assessing Impact of Method Entities in a Special Task.
  • Chong Chen, Jingying Zhang, Xiaoyu Chu and Jinglin Zheng. Study on the Difference between Summary Peer Reviews and Abstracts of Scientific Papers.
  • Wei Shao and Hua Bolin. A Unsupervised Method for Terminology Extraction from Scientific Text.

Demo Paper

  • Zi Xiong, Yue Qi, Wei Lu and Qikai Cheng. Design and Implementation of an Academic Search System Based on a General Query Language and Automatic Question Answering.

  

Aim of the Workshop

In the era of big data, massive amounts of information and data have dramatically changed human civilization. The broad availability of information provides more opportunities for people, but it also poses a new challenge: how can we obtain useful knowledge from numerous information sources? A knowledge entity is a relatively independent and integral knowledge module in a specific discipline or research domain [1]. As a crucial medium for knowledge transmission, scientific documents, which contain a large number of knowledge entities, attract the attention of scholars [2]. In scientific documents, knowledge entities refer to the knowledge mentioned or cited by authors, such as algorithms, models, theories, datasets and software, which reflect the various resources used by the authors in solving problems. Extracting knowledge entities from scientific documents accurately and comprehensively has therefore become a significant topic. For example, documents related to a given knowledge entity (e.g. the LSTM model) could be recommended to scholars, especially beginners in a research field. DARPA has recently launched the ASKE (Automating Scientific Knowledge Extraction) project [3], which aims to develop next-generation applications of artificial intelligence.
Therefore, the goal of this workshop is to engage the related communities in open problems in the extraction and evaluation of knowledge entities from scientific documents. At present, scholars have used knowledge entities to construct general knowledge graphs [4] and domain knowledge graphs [5]. Data sources for these studies include text (news, policy files, email, etc.) and multimedia (video, image, etc.) data. Compared to existing research and workshops such as the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL) [6] or the Workshop on Mining Scientific Publications (WOSP) [7], this workshop aims to extract knowledge entities from scientific documents and to explore the features of these entities for practical applications. The results of this workshop are expected to provide scholars, especially early career researchers, with knowledge recommendations and other knowledge entity-based services.

See more details and related work about the workshop in the JCDL 2020 overview paper.

 

Workshop Topics

This workshop will be relevant to scholars in computer and information science who specialize in Information Extraction, Text Mining, NLP, IR and Digital Libraries. It will also be of importance for all stakeholders in the publication pipeline: implementers, publishers and policymakers. The workshop names this cutting-edge, cross-disciplinary direction "Extraction and Evaluation of Knowledge Entity", highlighting the development of intelligent methods for identifying knowledge claims in scientific documents and promoting the application of knowledge entities. We invite stimulating research on topics including, but not limited to, methods of knowledge entity extraction and applications of knowledge entities. Specific examples of fields of interest include:

  • Task and methodology extraction from scientific documents
  • Model and algorithm extraction from scientific documents
  • Dataset and evaluation metrics extraction from scientific documents
  • Software and tool extraction from scientific documents [8]
  • Construction of a knowledge entity graph and roadmap [9]
  • Knowledge entity summarization
  • Relation extraction of knowledge entity
  • Construction of a knowledge base of knowledge entities
  • Modeling function of knowledge entity citation
  • Bibliometrics of knowledge entity
  • Application of knowledge entity extraction

 

Programme

1st Keynote

Entitymetrics 2.0: Measuring the Impact of Entities and Relations Extracted from Scientific Documents [slides]

Abstract: Since the concept of entitymetrics was first introduced in 2013, entitymetrics has been applied to measure the impact of entities as well as to gauge the knowledge usage and transfer anchored on entities for knowledge discovery. This concept extends informetrics by quantifying the importance of various types of entities such as concept, dataset, and domain entities buried in a large amount of full-text collections. Entitymetrics uses entities for knowledge usage as well as discovery. We claim that it is the next generation of content-based citation analysis in that it aims to utilize entities to create a knowledge graph for scientific discovery where entities are connected to each other either by citation or predicate relation. In this talk, the previous studies employing entitymetrics are summarized and the limitations of the current approaches are discussed. In addition, the future directions of entitymetrics are suggested.

Min Song is an Underwood Distinguished Professor in the Department of Library and Information Science at Yonsei University. Prior to Yonsei, Min was an Associate Professor of the Department of Information Systems and co-director of the Informatics Research Laboratory at New Jersey Institute of Technology, where the goal of his research is discovery of knowledge from large natural language data such as blogs, doctor’s notes, and scientific publications. His research interests are in biomedical text mining, social media mining, and informetrics. He has published more than 150 journal and conference papers. He is a section chief editor of Frontiers in Text Mining and Literature-based Discovery and is an editorial board member of Information Processing & Management and Data & Knowledge Engineering.

 

2nd Keynote

Building Scholarly Knowledge Bases with Crowdsourcing and Text Mining [slides]

Abstract: For centuries, scholarly knowledge has been buried in documents. While articles are great to convey the story of scientific work to peers, they make it hard for machines to process scholarly knowledge. The recent proliferation of the scholarly literature and the increasing inability of researchers to digest, reproduce and reuse its content are constant reminders that we urgently need a transformative digitalization of the scholarly literature. Building on the Open Research Knowledge Graph (http://orkg.org) as a concrete research infrastructure, in this talk we present how, using crowdsourcing and text mining, humans and machines can collaboratively build scholarly knowledge bases, i.e. systems that acquire, curate and publish data, information and knowledge published in the scholarly literature in structured and semantic form. We discuss some key challenges that human and technical infrastructures face as well as the possibilities scholarly knowledge bases enable.

Markus Stocker is Head of the Knowledge Infrastructures research group at the TIB Leibniz Information Centre for Science and Technology and co-lead of the Open Research Knowledge Graph project. He holds a PhD in Environmental Informatics from the University of Eastern Finland; an MSc in Environmental Science from the University of Eastern Finland; and a Diploma (MSc) in Informatics from the University of Zurich, Switzerland. His research interests lie at the intersection between research infrastructures and research communities, and how such knowledge infrastructures acquire, maintain, and share scholarly knowledge about human and natural worlds.

 

Sessions

The workshop will be held on August 1, 2020 (Beijing Time), and specific activities include keynotes, paper presentations and a poster & demonstration session.

13:10-13:30 Connection setup: we will provide details    
13:30-13:40 Introduction [slides] Co-Chairs of EEKE2020 (Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang)
 
13:40-14:20 Keynote 1: Entitymetrics 2.0: Measuring the Impact of Entities and Relations Extracted from Scientific Documents [slides] Min Song Chair: Chengzhi Zhang
 
14:20-15:00 Session 1: Knowledge Entity Extraction and Application Chair: Shuo Xu
14:20-14:40 NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature [slides] Jennifer D'Souza and Sören Auer  
14:40-15:00 Intelligent Bibliometrics for Discovering the Associations between Genes and Diseases: Methodology and Case study [slides] Mengjia Wu and Yi Zhang  
 
15:00-15:30 Coffee break    
 
15:30-16:35 Session 2: Entity Extraction from Scientific Documents Chair: Yingyi Zhang
15:30-15:50 Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling [slides] Liangping Ding, Zhixiong Zhang, Huan Liu, Jie Li and Gaihong Yu  
15:50-16:05 Investigating interdisciplinary knowledge flow through citances [slides] Jin Mao, Shiyun Wang and Xianli Shang  
16:05-16:20 IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in Multi-Domains [slides] Yu Li, Tao Yue (Speaker) and Wu Zhenxin  
16:20-16:35 What is Special about Patent Information Extraction? [slides] Liang Chen, Shuo Xu, Weijiao Shang, Zheng Wang, Chao Wei and Haiyun Xu  
 
16:35-17:00 Session 3: Interactive demos Chair: Chong Chen
16:35-17:00 Design and Implementation of an Academic Search System Based on a General Query Language and Automatic Question Answering [slides] Zi Xiong, Yue Qi, Wei Lu and Qikai Cheng  
 
17:00-17:30 Coffee break    
 
17:30-18:10 Keynote 2: Building Scholarly Knowledge Bases with Crowdsourcing and Text Mining [slides] Markus Stocker Chair: Philipp Mayr
 
18:10-18:55 Session 4: Entity Relation Extraction and Application Chair: Yi Zhang
18:10-18:25 A Novel Approach for Patent Similarity Measurement Based on Sequence Alignment [slides] Xin An, Jinghong Li, Shuo Xu, Liang Chen and Sainan Pi  
18:25-18:40 Exploring the Relation between Biomedical Entities and Government Funding [slides] Fang Tan, Siting Yang, Xiaoyan Wu and Jian Xu  
18:40-18:55 Document Clustering and Labeling for Research Trend Extraction and Evolution Mapping [slides] Sahand Vahidnia, Alireza Abbasi and Hussein A. Abbass  
 
18:55-19:50 Session 5: Poster/ Greeting Notes of EEKE2020 Chair: Jin Mao
18:55-19:05 Long-tail dataset entity recognition based on Data Augmentation [slides] Qikai Liu, Pengcheng Li, Wei Lu and Qikai Cheng  
19:05-19:15 Assessing Impact of Method Entities in a Special Task [slides] Xiaole Li and Yuzhuo Wang  
19:15-19:25 Study on the Difference between Summary Peer Reviews and Abstracts of Scientific Papers [slides] Chong Chen, Jingying Zhang, Xiaoyu Chu and Jinglin Zheng  
19:25-19:35 A Unsupervised Method for Terminology Extraction from Scientific Text [slides] Wei Shao and Hua Bolin  
19:35-19:50 Greeting Notes of EEKE2020 Co-Chairs of EEKE2020 (Chengzhi Zhang, Philipp Mayr, Wei Lu, Yi Zhang)
19:50 End of workshop    

 

Call for Papers

You are invited to participate in the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2020), to be held as part of the ACM/IEEE Joint Conference on Digital Libraries 2020 in Wuhan, China on August 1, 2020.

https://eeke2020.github.io/

Submission Information

Regular papers: All submissions must be written in English, following the ACM Proceedings template (10 pages for full papers and 4 pages for short papers, excluding references, which are not page-limited), and should be submitted as PDF files to EasyChair.

Poster & demonstration: We welcome submissions detailing original, early findings, works in progress and industrial applications of knowledge entity extraction and evaluation for a special poster session, possibly with a 2-minute presentation in the main session. Some research track papers will also be invited to the poster track instead, although there will be no difference in the final proceedings between poster and research track submissions. These papers should follow the same format as the research track papers but can be shorter (2 pages for poster and demo papers).

Submit a paper

All submissions will be reviewed by at least two independent reviewers. Please note that at least one author per paper must register for the workshop and attend it to present the work. In light of the recent events regarding the Coronavirus, EEKE2020 will be an all-virtual workshop as JCDL will be online only.

Workshop proceedings will be deposited online in the CEUR workshop proceedings publication service. This way the proceedings will be permanently available and citable (digital persistent identifiers and long-term preservation).

Important Dates

All dates are Anywhere on Earth (AoE).

Deadline for submission: June 22, 2020 (extended)
Notification of acceptance: July 14, 2020
Camera ready: July 27, 2020
Workshop: August 1, 2020

Organising Committee

Chengzhi Zhang is a professor in the Department of Information Management, Nanjing University of Science and Technology, China. He received his PhD degree in Information Science from Nanjing University, China. He has published more than 100 publications, including in JASIST, Aslib JIM, JOI, OIR, SCIM, ACL, NAACL, etc. His current research interests include scientific text mining, knowledge entity extraction and evaluation, and social media mining. He serves as Editorial Board Member and Managing Guest Editor for 7 international journals (Patterns, OIR, TEL, IDD, NLE, DI, etc.) and as a PC member of several international conferences in the fields of natural language processing and scientometrics. (zhangcz@njust.edu.cn)

 

Philipp Mayr is a team leader at the GESIS - Leibniz-Institute for the Social Sciences, in the department Knowledge Technologies for the Social Sciences (WTS). He received his PhD in applied informetrics and information retrieval from the Berlin School of Library and Information Science at Humboldt University Berlin. He has published in top conferences and prestigious journals in the areas of informetrics, information retrieval and digital libraries. His research group focuses on methods and techniques for interactive information retrieval and data set search. He was the main organizer of the BIR workshops at ECIR 2014-2020 and the BIRNDL workshops at JCDL 2016 and SIGIR 2017-2019. (philipp.mayr@gesis.org)

 

Wei Lu is a professor of the School of Information Management and director of the Information Retrieval and Knowledge Mining Center. He received his PhD degree in Information Science from Wuhan University, China. His current research interests include information retrieval, text mining, QA, etc. He has published papers in SIGIR, Information Sciences, JASIST, Journal of Information Science, etc. He serves in diverse roles (e.g., Associate Editor, Editorial Board Member, and Managing Guest Editor) for several journals. (weilu@whu.edu.cn)

 

Yi Zhang is a Lecturer at the Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney (UTS), Australia. He received dual PhD degrees, one from Beijing Institute of Technology, China and the other from UTS. He has authored more than 50 publications. His current research interests align with bibliometrics, text analytics, and information systems. He serves in diverse roles (e.g., Associate Editor, Editorial Board Member, and Managing Guest Editor) for one IEEE Transactions journal and four other international journals. He is also a PC Member of several international conferences. (Yi.Zhang@uts.edu.au)

 

Programme Committee

  • Alireza Abbasi, University of New South Wales (Canberra)
  • Katarina Boland, GESIS - Leibniz Institute for the Social Sciences
  • Gaohui Cao, Central China Normal University
  • Chong Chen, Beijing Normal University
  • Gong Cheng, Nanjing University
  • Ed Fox, Virginia Tech
  • Saeed-Ul Hassan, Information Technology University, Pakistan
  • Zhigang Hu, Dalian University of Technology
  • Chenliang Li, Wuhan University
  • Jing Li, The Hong Kong Polytechnic University
  • Munan Li, South China University of Technology
  • Hongfei Lin, Dalian University of Technology
  • Wolfgang Otto, GESIS - Leibniz-Institute for the Social Sciences
  • Dwaipayan Roy, GESIS - Leibniz-Institute for the Social Sciences
  • Mayank Singh, Indian Institute of Technology Gandhinagar
  • Arho Suominen, VTT Technical Research Centre of Finland
  • Suppawong Tuarob, Mahidol University, Thailand
  • Xuefeng Wang, Beijing Institute of Technology
  • Yuzhuo Wang, Nanjing University of Science and Technology
  • Yanghua Xiao, Fudan University
  • Shuo Xu, Beijing University of Technology
  • Erjia Yan, Drexel University
  • Xiaojuan Zhang, Southwest University
  • Yingyi Zhang, Nanjing University of Science and Technology
  • Zhixiong Zhang, National Science Library, Chinese Academy of Sciences

References

  1. Chang, X., & Zheng, Q. (2007). Knowledge element extraction for knowledge-based learning resources organization. In International Conference on Web-Based Learning (pp. 102-113). Springer, Berlin, Heidelberg.
  2. Ding, Y., Song, M., Han, J., Yu, Q., Yan, E., Lin, L., & Chambers, T. (2013). Entitymetrics: Measuring the impact of entities. PLoS ONE, 8(8), e71416.
  3. https://www.darpa.mil/program/automating-scientific-knowledge-extraction
  4. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. In The semantic web (pp. 722-735). Springer, Berlin, Heidelberg.
  5. http://www.geonames.org/
  6. Cabanac, G., Chandrasekaran, M. K., Frommholz, I., Jaidka, K., Kan, M. Y., Mayr, P., & Wolfram, D. (2017). Report on the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016). In ACM SIGIR Forum (Vol. 50, No. 2, pp. 36-43). New York, NY, USA: ACM.
  7. https://wosp.core.ac.uk/lrec2018/
  8. Boland, K., & Krüger, F. (2019). Distant supervision for silver label generation of software mentions in social scientific publications. In Proceedings of the 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (pp. 15-27).
  9. Zha, H., Chen, W., Li, K., & Yan, X. (2019). Mining Algorithm Roadmap in Scientific Publications. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1083-1092).

Links

Related Workshops
BIRNDL 2019: The 4th Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries
Venue: SIGIR 2019 in Paris, France
Organizing Committee: Muthu Kumar Chandrasekaran, Philipp Mayr, Dayne Freitag, Min-Yen Kan.
Proceedings: http://ceur-ws.org/Vol-2414/ 


SDP 2020: First Workshop on Scholarly Document Processing
Venue: 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Website: https://ornlcda.github.io/SDProc/