Data Discovery for the SDGs: A Systematic Rule-based Approach

In 2015, the United Nations put forward 17 Sustainable Development Goals (SDGs) to be achieved by 2030, with data promoted both as a focus for innovation in sustainable development and as a means of measuring progress towards achieving the SDGs. In this study, we propose a systematic approach to discovering data types and sources that can be used for SDG research. The proposed method integrates a systematic mapping approach using manual qualitative coding over a corpus of SDG-related research literature, followed by an automated process that applies rules to perform data entity extraction computationally. This approach is exemplified by an analysis of literature relating to SDG 7, the results of which are also presented in this paper. The paper concludes with a discussion of the approach and suggests future work to extend the method with more advanced NLP and machine learning techniques.


INTRODUCTION
In 2015, the UN adopted a new agenda for Sustainable Development, including 17 Sustainable Development Goals (SDGs) and 169 related targets to be achieved by each member state by 2030 [31]. To monitor the achievement of the SDGs, it is essential to obtain data and then conduct analysis. Surveys and interviews are typical data collection methods in sustainable development. The advantage of surveys is that it is possible to solicit a large amount of data from target populations, for example, via a census. Interviews give researchers more in-depth information and allow for discussions between interviewers and interviewees to dig deeper into specific issues. However, they are time-consuming and cannot be generalised, especially on a large scale [26]. The typical limitation of these traditional data collection methods is thus that they are often expensive and time-consuming [7].
With the rapid development of the Internet and technology, new data sources have emerged, such as data from social media and data gathered through different types of sensors. There are also various data-sharing resources, such as open databases to which the public has free access. Such online resources provide a new perspective on how we can monitor the progress of the SDGs. The diversity of data sources and recent advances in technologies like the Cloud and the Internet of Things (IoT) have made the speed of data generation reach an unprecedented level [4]. We are currently in the Big Data era [17], where the abundance of data brings opportunities to many fields, including sustainable development.
Furthermore, data-driven technology such as artificial intelligence (AI) can contribute to the SDGs by enabling more accurate predictions, efficient resource allocation, and targeted interventions for achieving sustainable development goals [34]. While traditional methods of data collection, such as household surveys, play a vital role in monitoring the development of society from a long-term perspective, other kinds of data may supplement traditional methods to provide real-time statistics [35]. In some cases, data can be used as proxies that are better than traditional methods for gathering information relating to sustainable development. For example, when natural disasters like earthquakes occur, emergency services may need to make decisions based on real-time data, such as the number of casualties and estimated economic loss; something that surveys and interviews evidently cannot provide in real time [18].
There is a wide range of research exploring how to use various data sources to monitor the progress of the SDGs.
These data are of many different types, including satellite imagery, sensing data, health records, social media data, and more [16].
Open databases such as UNdata and World Bank Open Data also store and share much SDG-related data. A report commissioned by the UN Economic and Social Commission for Asia and the Pacific (ESCAP) systematically analysed and presented around 140 projects to show what types of data and innovative data-driven approaches could support tracking the fulfilment of the SDGs [7]. However, beyond occasionally published analyses such as the UN ESCAP report mentioned above, there are no dedicated resources for SDG stakeholders that describe what data sources may be helpful for monitoring or addressing the SDGs. We therefore ask the following research question: "What methodological approach can enable the discovery and analysis of data relevant to the SDGs?" In the remainder of this paper, we describe related work reported in the literature about infrastructures enabling data discovery and their relation to the SDGs, and about systematic approaches to knowledge discovery. We then follow with a description of our methodology for data source discovery. In the results section, we describe the application of our method to discovering data sources relevant to SDG 7. Finally, we discuss the implications of our approach and future lines of inquiry.

Data infrastructures and the SDGs
Data infrastructures are defined by [13] as "...the institutional, physical and digital means for storing, sharing and consuming data across networked technologies". Gray et al. [10] state that data infrastructures contain data sets and describe how data can be used and shared. "Raw" data makes no sense unless people understand its context: how it was generated and what it can be used for. Gray et al. interpret data infrastructures not only as data resources but as a set of relations in terms of datasets, standards, policies, and more. Kitchin [14] points out that data infrastructures can be likened to recipes, where chefs describe how to use the ingredients.
When considering these perspectives, we can think of data infrastructures as a means to make data available to others interested in it, such as data analysts, data scientists and software developers, and thus to those who are likely to make the best use of the shared data. UN ESCAP's report describes how Big Data, IoT, and AI already have great power in their own fields, and how a combination of these three technologies would create more value, especially in the area of sustainable development [7]. IoT serves to gather large amounts of data from many different sources beyond traditional methods; data of such varied types and sources is said to characterise Big Data approaches, which are analysed through AI methods to explore patterns and support decision-making. MacFeely [16] states that if such methods were correctly applied, they would reduce the time spent on calculating SDG indicators and improve the accuracy of results. Not only have scholars noted that different types of data, and combinations thereof, have the potential for monitoring the SDGs, but the UN has also recognised the opportunities brought by the widespread abundance of data. There are various generalised research data sharing platforms such as Harvard's Dataverse [12], as well as platforms for specific domains such as COPO for the life sciences [30], and data catalogues such as FAIRsharing [29], which aim to facilitate finding relevant data within the data diaspora. Within sustainable development, the UN Global Platform was set up to support worldwide collaboration on data sharing for the SDGs [32].
While efforts such as the UN Global Platform aim to provide a central resource for data sharing relating to the SDGs, it is still challenging to discover the relevant data for use in SDG activities. Even with these available data infrastructures, there is still a gap in organisations using data analytics to contribute towards the SDGs [6]. This paper aims to provide an approach to discovering data used in SDG research.

Systematic approaches to knowledge discovery
While data infrastructures may provide facilities to find data sources, they often rely on explicit contributions from a community or dedicated individuals to build and maintain such resources. For example, the infrastructures mentioned earlier (Dataverse, COPO and FAIRsharing) depend on researchers' contributions to deposit or index their data, and the UN Global Platform relies on maintenance by its own staff. Research is often reported in the literature, published in journals and conferences, and indexed by bibliographic databases (e.g., PubMed, Web of Science, JSTOR). Such publications report the research outputs rather than the specifics of the data they may have used. Thus, some work is required to keep data infrastructures up-to-date and curated so that they report on new data types and sources (for brevity, we will refer to text mentioning data types and sources as "data entities").
Automated data discovery has frequently been proposed in the literature [8,9,19], but its adoption in scientific research has yet to be established. There exist general-purpose search engines (e.g., Google, Bing), knowledge engines (e.g., Wolfram Alpha, Watson) and, more recently, large language model-based tools (e.g., ChatGPT, Bard) that can assist in data discovery. There also exists a dedicated dataset search engine, Google's Dataset Search. While these are powerful tools, they are opaque in how they function and are designed to be general-purpose. When considering specific tasks within a specific domain, such as discovering data for particular SDGs, we often prefer a systematic and transparent approach to maintain trust in a given discovery. Thus, systematic research approaches remain relevant despite the availability of these powerful automated tools.
The Systematic Literature Review (SLR) is a method that typically aims to identify, evaluate, and synthesise a corpus of research literature in order to address a research question that may or may not be related to the aims of the constituent research reported [36]. SLRs are characterised by defining a reproducible method of selecting, appraising, and combining research evidence relevant to that question. This method aims to minimise bias by performing an exhaustive search and quality appraisal of the literature and by making explicit the methods used for data extraction and synthesis.
The Systematic Quantitative Literature Review (SQLR) is similar to the SLR in its approach to providing a reproducible method while being comprehensive in review and minimising bias. However, SQLRs, as the name suggests, are quantitative, aiming to quantitatively summarise and analyse the literature [25]. This is done by counting features of interest in the literature, such as the categories of studies published, geographical and temporal trends, types of methods used and characteristics of the results. In contrast to traditional SLRs, SQLRs do not aim to synthesise the published results but rather the studies' characteristics.
Similar to SQLRs, but differing in aims, the Systematic Mapping Study (SMS) is a method used to provide an overview of a broad field of interest [24]. It helps to identify gaps in research, trends, and the type and quantity of evidence available, usually by quantitatively summarising and visualising a body of literature. The main aim of an SMS is to provide an overview and create a 'map' of the research done in a particular field. SMSs do not usually delve into the relationships between study characteristics and outcomes, while SQLRs go further by quantitatively analysing these characteristics and their outcomes.
Finally, an emerging trend is the automation of such systematic approaches, wholly or in part, with notable examples being the Computational Literature Review [20] and the Smart Literature Review [1]. However, these are designed to automate SLR-like reviews on substantial bodies of literature for which a traditional human-performed review would be infeasible, rather than to extract specific features or entities from a corpus. Typically, these use methods such as topic modelling [2] and bibliographic analyses [5].

Summary
While there exist various systematic approaches to discovering knowledge in literature, our research question focuses on extracting knowledge about data entities relevant to the SDGs. We propose that this knowledge about data entities, which would typically be discoverable using data infrastructures, can be extracted from the literature systematically.
We describe our method for achieving this in the next section.

METHODOLOGY
To address the research question, we develop a methodology to discover what SDG data are available based on the analysis of published literature. We based our approach on SMSs, which aim to provide an overview of a particular research area by classifying research studies to understand the covered topics and contributions of these studies [24].
To scale the approach, we propose a computational pipeline based on rule-based entity extraction to analyse larger corpora to find data sources.Entity extraction is a simple but powerful technique for finding specific entities in text [21], whereby one can define a set of rules to extract data entities.
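As a minimal sketch of this idea, a set of extraction rules can be expressed as regular expressions mapped to entity labels. The specific patterns and labels below are illustrative assumptions, not the actual rules developed in our study:

```python
import re

# Hypothetical example rules: each maps a data-entity label to a
# regular expression matching mentions of that entity in text.
RULES = {
    "satellite imagery": re.compile(r"\bsatellite imag(?:e|ery|es)\b", re.I),
    "survey": re.compile(r"\bsurveys?\b", re.I),
    "World Bank": re.compile(r"\bWorld Bank\b"),
}

def extract_entities(text):
    """Return the set of data-entity labels whose rule matches the text."""
    return {label for label, pattern in RULES.items() if pattern.search(text)}
```

Running `extract_entities("We combined World Bank indicators with household surveys.")` returns every label whose pattern matches, here `{"survey", "World Bank"}`; this matching mechanism is the core of the rule-based pipeline described below.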
SMSs have been used to map topics relating to IT; for example, Petersen et al. [24] investigated the field of software engineering, and Pedreira et al. [22] explored how gamification has been applied to software engineering. Both of these studies adopted similar research processes to obtain the mappings, using four main steps:
(1) Define the keywords for searching for literature from selected bibliographic databases.
(2) Filter out the results according to a defined selection and quality criteria considering the study field, language, accessibility, etc.
(3) Qualitatively analyse the remaining papers by reading them and extracting necessary features.
(4) Validate the process by evaluating the final mapping that is produced.
Our methodology is inspired by this SMS process, where we aim to identify data entities as reported in SDG-related research literature, which are our features of interest, and create a descriptive mapping of those results. The value in taking this approach is that we can delimit the literature searches to relate to specific SDGs within the SDG framework - that is, to each of the 17 constituent SDGs or even to a subset of the 169 individual SDG Targets - and then discover the relevant data entities that may have nuances in different areas of sustainable development.
In the following subsections, we describe each of the following broad steps:
(1) Construct a corpus of literature within an SDG or SDG Target of interest (steps M1 and M2).
(2) Manually analyse the literature to extract data entities (steps M3 and M4), combining similar entities to map into higher-level data categories.
(3) Develop entity extraction rules based on the data entities found in (2) to bootstrap an automated pipeline.
(4) Construct an expanded corpus of literature for the same SDG or SDG Target and apply the automated entity extraction pipeline to it to produce the final SDG data mapping (steps A1-A4).

Corpus construction
First, we must decide on the focus of our data entity discovery by choosing a particular SDG or SDG Target to constrain the document search. While it may be desirable to perform our data entity extraction for all SDGs collectively, we assume that a more valuable structure for the SDG data mapping is to place entities within the constraints of each of the 17 SDGs or 169 SDG Targets. This also places a practical constraint that makes the analysis more feasible by limiting the scope of documents to analyse. One can then decide on a literature source (or sources) on which to perform the document search, as one would in a regular SMS, for example, using one or more bibliographic search engines such as the ACM Digital Library, Scopus or Web of Science. Two corpora are constructed: a first that is manually analysed to develop the SDG data taxonomy, and a second, larger corpus to which the data taxonomy is applied in the automated entity extraction pipeline to develop the final SDG data mapping.
3.1.1 SDG-related keyword search. Next, we must define the search with specific keywords relating to the SDG of interest. There is no standard method for this kind of search, as each SDG can be characterised with keywords in multiple ways. In one approach, in a study by Brugman et al. [3], the authors identified sustainability-related courses and explored ways to enhance students' involvement in these courses. To build an inventory of sustainability-related courses, a list of keywords was defined for the 17 SDGs by analysing the contents of each SDG and any possible topics. For example, where SDG 7 is about affordable and clean energy, the keywords might include "energy", "renewable", "wind", "solar", "geothermal", and "hydroelectric".
Since 2017, the Aurora Universities Network, in collaboration with Scopus, has provided comprehensive peer-reviewed Scopus queries for each of the SDGs to search for SDG-relevant documents [33]. However, each SDG query returns a vast number of papers, on the order of 10×10³ results for some SDGs up to 10×10⁶ for others, making it challenging to construct and analyse the corresponding corpora.
A highly restrictive strategy to define the search is to search for mentions of the relevant SDG; for example, with SDG 7, we would search for documents containing the keywords "SDG7" and "SDG 7".This has the advantage of focusing the search specifically on documents that mention the SDG of interest, ensuring a very high relevance in the search results.On the other hand, this strategy inevitably leads to missing documents of relevance to the SDG area that are not explicit in their relevance to the SDG.

3.1.2 SDG-document selection.
After retrieving a list of documents via the search, we apply inclusion and exclusion criteria to filter the initial search results further. Regarding the inclusion criteria, firstly, the document's full text must be accessible, because the text is needed to extract what data was used or referenced in the document. Not all documents are open access, and many journals and conference proceedings require paid subscriptions to access. Therefore, this criterion is highly dependent on the resources available to the researcher. The second criterion is that the paper should be SDG-related, which is initially checked manually by reading the document's title and abstract. For example, if we used the abbreviation "SDG" in the initial search, there is no guarantee that each search result is about the SDG; "SDG" may sometimes refer to other terms than expected. It could also be that a document mentions the relevant SDG but has a different focus. In such cases, the papers would be excluded from further analysis. A third criterion is to include only papers published after 2015. Because the SDGs were put forward in 2015, excluding anything published before 2016 helps ensure unrelated documents are excluded. Further criteria that could be applied include document type (e.g., conference or journal paper), language, and publication stage (e.g., accepted, published, pre-print, etc.).

Fig. 1. Illustration of the filtering applied using the inclusion and exclusion criteria, exemplified with the search results on SDG 7 using the search described in Table 1.
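As an illustration, some of these criteria can be applied programmatically to search-result metadata once accessibility has been checked. The record fields below (`accessible`, `year`, `language`) are hypothetical names, not fields of any particular bibliographic export:

```python
def apply_criteria(records):
    """Filter search results on hypothetical inclusion criteria:
    full text accessible, published after 2015, written in English.
    SDG-relevance is still checked manually via title and abstract."""
    return [
        r for r in records
        if r.get("accessible")
        and r.get("year", 0) > 2015
        and r.get("language") == "English"
    ]
```

Under these assumptions, a paper from 2014 or one behind an inaccessible paywall is dropped before the manual relevance check.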

Corpus analysis for data source extraction
After setting the search strategy and document selection criteria, and consequently compiling a corpus of documents to analyse, we perform a two-stage process to extract the data entities reported as being used in the published research.
The first stage consists of manual qualitative coding on a literature sample to inductively build a taxonomy of data entity terms and their groupings. This is done through open coding of the literature followed by axial coding [11] to develop the SDG data taxonomy. This taxonomy is then used to bootstrap a set of rules, implemented in Python, to label documents according to the taxonomy automatically. These rules are used in an automated entity extraction pipeline to label a more extensive corpus with the data entities relating to the same SDG, developing the final SDG data mapping.
Bootstrapping refers to a semi-supervised learning approach [27] where a small set of manually annotated data (in our case the manually coded literature) is used to train an initial model (the resulting data taxonomy from which we develop rules), which is then used to annotate a larger set of data (the application of the rules to the expanded dataset).
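The bootstrapping step can be sketched in miniature: seed labels from the manual coding stage are turned into keyword rules, which are then applied to unlabelled documents. The seed examples and the label-as-keyword heuristic here are illustrative simplifications of our actual taxonomy-derived rules:

```python
# Hypothetical seed annotations from the manually coded sample:
# a document snippet mapped to its manually assigned data-entity code.
seed = {
    "We used census data from Eurostat.": "Eurostat",
    "Hourly smart meter readings were analysed.": "electricity data",
}

# Derive a naive keyword rule from each code (the code itself, lowercased).
rules = {code: code.lower() for code in set(seed.values())}

def annotate(corpus):
    """Label each unlabelled document using the bootstrapped rules."""
    return {
        doc: {code for code, kw in rules.items() if kw in doc.lower()}
        for doc in corpus
    }
```

Applying `annotate` to a larger unlabelled corpus then propagates the manually derived codes, which is the essence of the semi-supervised step described above.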

3.2.1 Manual Qualitative Coding of Data Sources. The first stage of the analysis consists of a manual qualitative coding process divided into four steps, shown as steps M1-M4 in Figure 2. Steps M1 and M2 implement the search strategy and apply the inclusion/exclusion criteria discussed above.
Step M3 requires the researcher to read each document to understand the data used or referenced by the document. The data described is usually collected by the document authors or sourced from other organisations or public databases. No matter where the data is from, it should be recorded, because any data could be of potential use for the SDG of interest. Traditional data sources like interviews and surveys may be mentioned, as well as sources from databases or other digital systems. For some documents, the authors might not describe the data source but instead illustrate what the data relates to (e.g., electricity) or the data type (e.g., hourly household smart electricity meter readings). Therefore, data types should be considered as another kind of data entity. A key consideration during this process is that the researcher should inductively develop the categorisation without any preconceived expectation of the resulting taxonomy. Finally, in step M4, these results are summarised as a taxonomy and a labelling of the papers.

Fig. 2. An overview of the coding process applied to an initial corpus of documents relating to a specific SDG or SDG Target. Manual coding (steps M1-M4: bibliographic search, inclusion/exclusion criteria, open coding of data types and sources, taxonomy development) first develops an initial SDG data taxonomy that is then used as the basis for rules developed for the automated stage (steps A1-A4: obtaining PDFs, converting PDFs to text via Poppler and Tesseract-OCR, rule-based extraction of sentences mentioning "data", "dataset" and related terms, and mapping/visualisation). The automated entity extraction is then applied to an expanded document corpus to produce a final SDG data mapping.

3.2.2 Automated Entity Extraction of Data Sources. The second stage of the analysis consists of an automated entity extraction process divided into four steps, shown as steps A1-A4 in Figure 2, implemented as a data processing pipeline in Python. The main idea behind this stage is to take the SDG data taxonomy developed from the manual qualitative coding by the human researcher and implement entity extraction rules as a decision list [28]. We can then run an automated analysis on an expanded document corpus to label each document with the data entities reported.
Step A1 involves obtaining PDF files of each document from the list of documents determined in the initial stage. Next, the PDFs are converted into raw text in step A2. This is not trivial, as PDF documents vary in formatting and layout, so our pipeline transforms the pages of each PDF into images using the Poppler software package, from which the text is then extracted into text files using the Tesseract-OCR (Optical Character Recognition) engine. Not only does this help us capture the text of a document more comprehensively, but it also captures text embedded in figures, such as charts and images that may contain text referencing data. In step A3, our pipeline performs some data cleaning, finds sentences that mention variations on "data" or "data set", and applies the decision list rules to label each document with its data entities. Finally, in step A4, we produce the final mapping as a table of documents and labels and plot the corresponding visualisations of the distributions as sunburst charts [37] using the Plotly Python package.
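The sentence-filtering part of step A3 might be sketched as follows, assuming step A2 has already produced plain text; the naive sentence-splitting heuristic and the term list are simplified illustrations of the pipeline's actual cleaning rules:

```python
import re

# Terms marking a sentence as potentially describing data
# (an illustrative subset of the variations searched for).
DATA_TERMS = re.compile(r"\bdata\s?sets?\b|\bdatabases?\b|\bdata\b", re.I)

def data_sentences(text):
    """Naively split OCR'd text into sentences and keep those
    mentioning data-related terms, ready for the decision-list rules."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if DATA_TERMS.search(s)]
```

Only the sentences this filter retains are passed to the decision-list rules, which keeps the rule matching focused on data-related context.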

EXEMPLAR ANALYSIS ON SDG 7
We applied the methodology described in the previous section to analyse literature drawn from SDG 7 (Affordable and Clean Energy) to demonstrate our approach to discovering SDG data.

Constructing an SDG 7 Data Taxonomy
Scopus was used as the bibliographic database, and the keywords "SDG 7", "SDG7", and "data" were applied to the title, abstract, and keywords search fields, resulting in 87 candidate documents for inclusion. We then applied each selection criterion to focus the document corpus further, arriving at a final corpus of 53 documents to use in the first stage of our methodology, manual qualitative coding. A summary of the document search and selection that constitute steps M1 and M2 is shown in Table 1, and the effect of applying each selection criterion is shown in Figure 1.
The final list of documents was exported from the Scopus database to an Excel spreadsheet containing the document title, abstract, keywords, and Web links to the Scopus record. The researcher then read each document online while performing open coding, adding to the spreadsheet a short summary description of each document along with any data entities of interest noted (step M3). Once each document had been analysed, the open codes were reviewed via axial coding and grouped into higher-level classifications, thus building up an initial SDG 7 data taxonomy (step M4).
This taxonomy organised data entities of interest into two broad groups: data sources and data types. These were then further organised into hierarchical classification levels based on shared traits to better understand the taxonomy.
4.1.1 Data sources. The data sources were initially divided into traditional statistics and organisational databases.
Traditional statistics included data collected from surveys, censuses, interviews, focus groups, and questionnaires.
Organisational databases provided an easier way for researchers to find where to obtain the needed data. Under this category, two subgroups were defined according to the scope of an organisation or database, namely international or national.
As the largest international organisation in the world, the United Nations serves to maintain world peace, keep friendly relations among countries, and provide better lives for human beings. Apart from the UN statistical databases, four more UN agencies were referred to as data sources in the papers. Mineral data was also mentioned, because minerals meet some requirements of technologies that generate clean energy, such as the magnets in wind turbines. Electricity data was another element under the category of resource data. An indicator of SDG 7 is the "proportion of population with access to electricity", so electricity data has a close connection with SDG 7 and needs to be considered when measuring its progress. Energy use can also affect other resources, so water and land use data were discussed in the selected papers. Weather data was mentioned in the papers as well, since weather affects the generation of wind, water, and bioenergy. The geographic data included satellite imagery, Geographic Information System (GIS) data, Global Positioning System (GPS) data, and OpenStreetMap data.
These kinds of data helped in understanding the distribution of energy consumption. Data collected from sensors was another typical data type. As a critical component of the IoT, sensors can be applied in various fields; for example, they can collect data on energy consumption and waste to measure energy efficiency. All the data types described above were obtained manually from the selected 53 papers.

Data Entity Extraction for SDG 7
Next, we use the SDG 7 data taxonomy from the first stage to prepare for the second stage of analysis, automated entity extraction, by developing a decision list that classifies each document based on the taxonomy. For example, in the manual qualitative coding stage, it was found that the International Energy Agency was a relevant data entity (and data source), identified by noting the words "International Energy Agency" and its abbreviation "IEA" within the context of a sentence about data or data sets. Based on this observation from the open coding, we model this classification computationally as a Boolean decision function in Python code. We build the decision list with similar functions for each code developed in the open and axial coding steps. The first step in the entity extraction pipeline, A1, is to obtain the PDF file for each of the documents. This was done by manually downloading the PDFs from the relevant publishers and storing them in a local folder. Next, steps A2 to A4 are run, where Python code loads each document from disk and converts the PDFs to text (step A2), applies the decision list classifier (step A3), and compiles the final mapping and corresponding visualisation (step A4).
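To illustrate, the decision list can be modelled as an ordered sequence of label/function pairs, where classification returns the first rule that fires on a data-related sentence. The IEA rule mirrors the example above; the second rule and the exact function shapes are hypothetical sketches rather than our full rule set:

```python
def mentions_iea(sentence):
    # Fires on the full name or the abbreviation noted during open coding.
    return "International Energy Agency" in sentence or "IEA" in sentence

def mentions_world_bank(sentence):
    return "World Bank" in sentence

# Decision list: rules are tried in order; the first match wins.
DECISION_LIST = [
    ("International Energy Agency", mentions_iea),
    ("World Bank", mentions_world_bank),
]

def classify(sentence):
    """Return the data-entity label of the first matching rule, or None."""
    for label, rule in DECISION_LIST:
        if rule(sentence):
            return label
    return None
```

Ordering the rules makes the classifier's behaviour transparent and reproducible, which is the property that motivated the decision-list formulation [28].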

Data discovered for SDG 7
In the results obtained from the automated entity extraction, data sources and data types each accounted for nearly half of the data entities, at 55% and 45% respectively. A visualisation of the summary distribution of data entities and categories for the SDG 7 data mapping is shown in Figure 3.

Fig. 3. A sunburst chart showing the results of an application of the entity extraction approach for SDG 7. The centre of the chart shows the highest level of the taxonomy hierarchy, dividing the categories found into data sources and data types, followed by more specific subcategories, down to the lowest-level data entities.
In the category of data sources, the number of organisations and databases referred to was more than twice that of traditional statistics. The survey was still the most popular conventional means of data collection among these studies. International organisations and databases alone accounted for more than one-third of the total data entities. The World Bank, the databases of the UN, and Eurostat were the three most frequently referred-to data sources, meaning they were the primary sources among the SDG 7 related papers.
Regarding the data types, resource and geographic data were discussed more often than sensor and weather data.
The proportion of diverse resource and geographic data chosen in the research was around 95% of the data types. Data on electricity, land use, solar, and water use were the most popular for monitoring the progress of SDG 7.

DISCUSSION AND FUTURE WORK

One promising direction for future work is to extend the rule-based approach to capture more diverse and nuanced data entities more effectively. This could be achieved through more sophisticated NLP or machine learning techniques.
Another promising direction is to improve the handling of non-traditional or emerging data sources reported in the SDG literature. The current mapping only provides a quantitative summary distribution of data entities; however, it may be interesting to investigate how one can quantify the novelty of data uses in SDG research, such as in [15], to determine innovative uses of data. Finally, it may also be interesting to see researchers apply our method across the range of SDGs to review the landscape of data used in sustainable development research.

Table 1. Summary of the literature search strategy, exemplified with the highly restrictive strategy on SDG 7.
The Food and Agriculture Organisation (FAO) cares about ending hunger and food security issues; the Sustainable Development Solutions Network (SDSN) seeks better solutions for achieving the SDGs through cooperation among worldwide experts; the United Nations Educational, Scientific and Cultural Organisation (UNESCO) is responsible for promoting world peace through education, science, and culture; and the World Health Organisation (WHO) works to promote public health worldwide. Four European Union (EU) organisations were also among the international organisations. Eurostat provides statistics on various topics covering different social themes, like energy, agriculture, and more, for the member states. The Emissions Database for Global Atmospheric Research (EDGAR) records statistics for emissions of greenhouse gases like carbon dioxide. As an EU programme focusing on the environment, Copernicus provides data and services concerning the atmosphere, marine environment, land, climate change, and more. The European Soil Data Centre (ESDAC) delivers soil-related data at different European scales. International organisations that belong to neither the UN nor the EU were categorised into a group of their own. The World Bank is a global partnership aiming to end world poverty and reach shared prosperity, containing indicators as measurements and series data on diverse topics like energy. The International Energy Agency (IEA) is a platform delivering comprehensive, high-quality global energy statistics, including indicators, energy demand, prices, and more. Apart from the World Bank and the IEA, other organisations under this category pay attention to energy and economics, country spatial data, emissions, and population.

4.1.2 Data types. The data types were divided into resource, weather, geographic, and sensor data. Different kinds of resource data were analysed in the obtained papers. Renewable energy was one of the core contents of SDG 7.
Biomass energy, solar energy, and heat, typical examples of renewable energy, were discussed and referred to in the articles.