Construction and Analysis of Collaborative Educational Networks based on Student Concept Maps

Network Analysis has traditionally been applied to analyzing interactions among learners in online learning platforms such as discussion boards. However, there are opportunities to bring Network Analysis to bear on networks representing learners' mental models of course material, rather than learner interactions. This paper describes the construction and analysis of collaborative educational networks based on concept maps created by undergraduates. Concept mapping activities were deployed throughout two separate quarters of a large General Education (GE) course about sustainability and technology at a large university on the West Coast of the United States. A variety of Network Analysis metrics are evaluated on their ability to predict an individual learner's understanding based on that learner's contributions to a network representing the collective understanding of all learners in the course. Several of the metrics significantly correlated with learner performance, especially those that compare an individual learner's conformity to the larger group's consensus. The novel network metrics based on collective networks of learner concept maps are shown to produce stronger and more reproducible correlations with learner performance than metrics traditionally used in the literature to evaluate concept maps. This paper thus demonstrates that Network Analysis in conjunction with collective networks of concept maps can provide insights into learners' conceptual understanding of course material.


INTRODUCTION
The Computer-Supported Collaborative Learning (CSCL) community has frequently turned to Network Analysis (often referred to as Social Network Analysis) to supplement traditional pedagogical assessments in characterizing learner understanding [33,49,56,66]. Network Analysis in CSCL is typically performed over webs of nodes (usually representing individual learners), and edges (usually representing interactions or relationships between learners). The idea behind this kind of analysis is that individual actors are embedded in webs of relationships with other actors, and that these webs can be analyzed to gain insights about a particular actor based on that actor's ties to others [7]. CSCL seeks to understand the relationships between learning, learner interaction, and digital technologies; as such, Network Analysis is a natural fit for CSCL due to its ability to characterize these relationships in a quantifiable way [32,47]. Various studies have provided evidence that analyzing learner interaction with Network Analysis techniques can provide insights such as identifying learner roles [55,65], assessing learner problem-solving abilities [74], and understanding how learners make sense of complex systems [16]. It has been argued that using Network Analysis over such a network of learners and their interactions is key to understanding how learning occurs [18,73].
However, CSCL has not fully taken advantage of the power of Network Analysis for investigating learner cognition. Cognitive Science research indicates that the cognitive processes in our brains form complex systems that help us solve problems, and that Network Analysis has enormous potential to model and investigate these processes [60,61]. Unfortunately, current CSCL Network Analysis methodologies do not typically seek to construct or analyze networks of the cognitive models of learners. The current work takes a step towards advancing the subfield of Network Analysis in CSCL by taking advantage of insights gleaned from the application of Network Analysis in the field of Cognitive Science to model learner conceptions themselves as a network.
This study moves the conversation away from Social Network Analysis and towards a form of Epistemic Network Analysis [58] in which, rather than analyzing online interactions between learners in a discussion board setting, the focus is on analyzing learners' mental models about the central themes of a course. Learner concept map submissions are merged together to form a collaboratively-constructed collective network that can be seen as the entire class's consensus about the course material. Each individual learner map is then assessed based on its conformity to the course-wide consensus. This methodology provides several benefits, including 1) measuring what a learner knows at a given moment and how their knowledge changes over time, 2) understanding what the learners collectively know and how this collective understanding changes over time, and 3) understanding how an individual learner's mental model compares to the consensus of the larger group.
The work in this paper is novel in that it uses a group consensus-based approach to evaluate individual learners, based on the relative positioning of the learners' contributions within the collective network. The motivation for doing so is based on past results indicating that a group's collective mental model can approximate that of an expert [4,26,38]. The specific contributions of this work are 1) a novel methodology for collecting, merging, and analyzing concept maps generated by learners, and 2) an empirical comparison of a variety of novel and traditional metrics for evaluating learners' concept maps using data from two offerings of an undergraduate course.

BACKGROUND AND RELATED WORK

Network Analysis in CSCL
Network Analysis research in CSCL settings has examined whether network metrics can be indicative of a learner's performance. Traditionally, this has been done by studying unimodal networks of learner interactions, often generated automatically from collaborative online learning platforms such as discussion boards [18,56]. Unimodal networks are those which have only one type of network actor, such as learners. These networks are often constructed using digital trace data such as log files documenting online interactions amongst learners or between learners and teachers [2,39,53]. These works often argue that learners who engage more with other learners are likely to perform better, and this is tested by comparing the centrality of a learner's position within a network of learners to some external performance metric [8]. The majority of studies found strong correlations between the centrality of a learner in a network of interactions and learner performance [13,51,62], though at least one found little to no correlation [47].
Although creating and analyzing social interaction networks is convenient and accessible, focusing solely on this type of data limits the understanding that can be gleaned about learning patterns in CSCL. These types of learner interaction networks are generally, though not always, unimodal. A unimodal network of learners does not contain information on what content was discussed, only who discussed it; thus, it can be difficult to track learning about specific topics using this approach.
A small amount of past work has focused on networks connecting learners with course topics (i.e., bimodal networks). Agarwal and Ahmed [3] parse learner editing of Wiki pages to create a bimodal network of learners and pages, in order to assess learner collaboration and engagement. Likewise, Kim et al. [35] construct a bimodal network representing both which learners interacted with each other and which course topics they discussed. This approach has the advantage of providing information about more than just with whom learners interacted, additionally including information about which specific topics and themes from the course learners focused on.
A pair of literature surveys from the past decade have pointed out the lack of diversity of network actors in CSCL Network Analysis research. These surveys have made calls to expand the breadth of actors and types of relational ties used, as well as to increase the diversity of the metrics used to analyze the networks and correlate the results with learner performance or learning outcomes [11,18].

Concept Mapping
The phrase "concept mapping" is generally used to refer to Novakian concept mapping [9]. Novakian concept mapping is an educational activity in which learners create network-like representations of the relationships between related concepts [46]. For example, learners might map out their understanding of the Solar System by including concepts for planets, moons, and the Sun, and utilizing the relationship "revolves around" to connect the concepts in a logical manner.
Novakian concept mapping is often used to elicit an individual learner's mental model within a particular domain and/or to track individual learning [19,45,48,52]. Novakian concept maps benefit both learners and instructors. For instance, they help learners focus on the connections between course topics [15,20,63] as well as integrate knowledge across modules of a given course [14]. Additionally, they help instructors understand learner conceptions of course material, which is critical to improving teaching and achieving learning objectives [21,40,69].
Another technique commonly referred to as "concept mapping" is Group concept mapping. In contrast to Novakian concept mapping, Group concept mapping is a mixed-methods technique in which participants (often domain experts) brainstorm concepts related to a specific domain and sort these concepts into categories. Multidimensional Scaling (MDS) is applied to the participants' submissions in order to automatically generate a map of concepts displayed in two-dimensional conceptual space [54]. Unlike Novakian concept mapping, Group concept mapping does not seek to elicit the mental model of an individual nor generate relationships between concepts. It is instead concerned with the emergent properties of the algorithmically-generated group map, which may be used as a base of knowledge about the domain [54].
In this paper, learners constructed Novakian concept maps that are merged together to form a collective network. For clarity, future references in this paper to "concept maps" or "concept mapping" without a qualifier can be assumed to refer to Novakian concept maps.
Concept maps can be analyzed based on both content and structure. Often, the structural analysis of concept maps occurs in a qualitative manner; for example, maps can be visually classified using various structural templates such as "network", "chain", "tree", etc. [5,36]. It is also common to analyze maps via simple quantitative metrics such as the number of concepts or the number of relationships present in the map. More intricate concept map evaluation methods also exist; for example, Biswas et al. [6] deploy a teachable automated agent that takes a pre-programmed quiz based on student concept map submissions; this allows students to receive real-time feedback on their maps.

Network Analysis and Concept Mapping
Both Group and Novakian concept maps are highly compatible with Network Analysis methodologies, due to their inherent network structure as well as their emphasis on the importance of relationships between entities. McLinden [44] notes that, while the goals of concept mapping and Network Analysis differ, the underlying data structure is the same. Although there are many studies exploring Network Analysis on Group concept maps [29,44,68,71], there are far fewer exploring Network Analysis on Novakian concept maps. However, Network Analysis has the potential to quantitatively characterize individual Novakian concept map structure, which can inform the educator if the course goals are being met [60].
A common assumption underlying this claim is that the structure of a learner's concept map approximates the learner's level of understanding about a topic. Specifically, certain network structures are thought to be more indicative of expert-level knowledge, while others are more indicative of novice-level knowledge [38,59]. Intuitively, those with more knowledge about a domain have a richer, more interconnected mental model than those with less knowledge about the domain.
In spite of the informative value of using Network Analysis with learner concept maps, only a small quantity of work has applied such techniques. In one such case, Siew et al. [59] analyze individual learner concept maps about Psychology using network metrics such as Average Shortest Path Length (ASPL) and Clustering Coefficient (CC), finding that these metrics were able to predict learner performance on quizzes. In another, Koponen and Nousiainen [38] create a collective network by merging 12 individual learner concept maps together, and use centrality metrics to identify key nodes in the network; however, they did not use the collective network to estimate individual learner performance. Schwendimann [57] uses a centrality metric to track changes in learner understanding of certain expert-determined "indicator concepts" via iterative concept mapping. The findings of Markham [43] show that maps about biological knowledge of mammals created by novices exhibited fewer hierarchical levels and fewer edges between concepts than maps created by experts.

Collaborative Concept Mapping
The subfield of Computer-Supported Collaborative Concept Mapping (CSCCM) defines collaborative concept mapping as two or more individuals using an online system to work together on constructing one or more concept maps as a tool to facilitate shared understanding and construct knowledge [24,27,37,42]. Studies have shown the effectiveness of this approach in facilitating problem solving [22,64].
An alternative, less common approach to collaborative concept mapping involves learners working on concept maps individually, after which the individual maps are merged together to create a representation of the collective mental model of the participants [12,17,38]. A slight variation on this approach allows learners to optionally "share" specific elements from their individual map to the collective representation [50]. One benefit of this approach is that the collective representation is an aggregation of each individual's mental model on the topic. Thus, the contribution of each individual can be identified and analyzed.

Concept Mapping as Consensus Forming
Concept mapping activities with multiple participants can be viewed as a consensus forming activity, in which the prior ideas of individual learners are built upon via group communication [28] and the final map reflects the agreement of multiple mental models [25]. In some cases, a group consensus on concept map content and structure can be formed through social phenomena such as negotiation [41] or mediated via online learning tools [12]. In other cases, consensus is formed via merging individual learner maps without direct social interaction between the participants. For instance, Koponen and Nousiainen [38] find that knowledge of physics concepts was highly dispersed among learners, but a collective network aggregating each individual learner's map nearly perfectly matched a similar map created by an expert.
In this paper, we hypothesize that learners whose maps align more closely with the group consensus will have better understanding of the course material, and therefore higher performance. In order to test this hypothesis, several novel centrality and consensus-based metrics are compared against traditionally-used metrics, based on their ability to evaluate individual learner performance.

Deployment of Assignments
Concept mapping assignments were deployed in two separate iterations of the same undergraduate course at a large US research university over the course of two academic quarters, enrolling a total of 679 participants. The course is a general education course about the intersection of sustainability and technology, and enrolls students from a wide variety of schools, majors, and academic standing within the university. Throughout both iterations, learners were asked to make concept maps by creating nodes (representing concepts) and linking them with edges (representing relationships). For example, learners might propose that "biodiversity increases sustainability." By doing this repeatedly, learners created networks of related concepts and their relationships.
For both of the quarters, a dataset was generated containing the statements in each individual learner concept map along with a score for that map. Scores were assigned on a per-statement basis, either by a member of the research team or a team of Teaching Assistants (TAs). Statements were awarded a score of 1 if they were correct and relevant to the course material, and a score of 0 if they were incorrect or irrelevant to the course material. The learner's final score was the percentage of statements they included in their concept map that received a score of 1.
In Spring '21, learners used the freely available software CmapTools [10] to construct their concept maps, and in Fall '21, learners used a custom concept mapping tool created by the research team. Figure 1 shows examples of two concept maps, one being constructed in CmapTools and the other in the custom tool.
The differences between the assignments across both quarters are summarized in Table 1.

Spring '21.
The first deployment of the concept mapping assignment occurred in the Spring '21 iteration of the course. To ensure consistency and avoid the use of synonyms between maps, the assignment restricted learners to using concepts from the list of all English Wikipedia article titles and relationships from a list of 20 provided by the instruction staff. Learners were also required to include the two central themes of the course, "sustainability" and "technology", as nodes in their maps. This quarter focused on learners improving their maps iteratively. In the first assignment, learners created an initial map, while in the second and third assignments, learners revised their maps based on course material introduced since the previous revision. The data analyzed for this study was taken from the third and final concept mapping submission. A member of the research team assigned scores to each statement, which were used to calculate the final per-learner scores.
In order to better focus the analysis on learners' original contributions, the two required nodes ("sustainability" and "technology") were removed from the collective network before the Network Analysis metrics were calculated.

Fall '21.
In the second deployment, learners were again required to use Wikipedia articles as concepts. However, as the instructor wanted to examine the ways that learners constructed causality networks, the only allowed relationship was "causes". Learners were not required to include any specific nodes in their submissions; thus, this quarter's assignment was slightly less restrictive in terms of possible concepts but much more restrictive in terms of permitted relationships.
In this quarter, there was no revise-and-resubmit process as in the previous quarter; the data analyzed is from the learners' first and only submissions, which were submitted midway through the quarter. Each individual statement was assigned a score of 0 or 1 via a TA review process, in which the TAs first participated in a consensus forming activity to establish shared criteria and then were assigned an anonymized spreadsheet containing a portion of the learner statements to review and score.

Merging of Individual Concept Maps into a Collective Network
By merging individual learner concept maps together, one can form a network that exhibits the class's collective understanding of the material. However, in order to create a collective network, a merge strategy must first be selected.
Such a strategy must define how duplicate edges between individual maps will be handled (i.e., whether to weight an edge by the number of learners reporting it or represent all edges as unweighted). A weighted strategy places higher value on edges added by more students, while an unweighted strategy allows for the possibility that individual students made unique but valuable edges not included by other students.
A merge strategy must also define whether the directionality of the relationships will be preserved. For example, say that a learner argues that Climate Change -> causes -> Sea Level Rise. A directed merge strategy would preserve the fact that Climate Change "points" at Sea Level Rise. An undirected merge strategy would omit this information, and merely note that these two nodes were connected by at least one learner.
A directed merge strategy preserves the most information from the original concept map; however, an undirected merge strategy acknowledges that the semantics of the relationship label affect the directionality in an arbitrary way. For example, if the relationship label "causes" were to be replaced by "caused by", the directionality of every relationship using this label would be reversed, but this says nothing about the actual relationship between the two nodes, only the semantics. By using an undirected strategy, we remove the bias introduced by the semantics of the relationship label.
When creating the collective networks, only the presence or non-presence of an edge between two nodes was included; the specific label chosen by learners for each relationship was ignored. This was done in order to allow for more overlap between individual learner maps in the collective network, and because, from a network perspective, the structure of the connections learners made between concepts was more interesting than the specific semantics they used to make the connections.
For example, if one learner included the statement Climate Change -> synonym -> Global Warming, and another learner included the statement Climate Change -> causes -> Global Warming in their respective maps, the collective map would contain one of four possible edges based on the merge strategy selected: 1) Climate Change -> Global Warming with a weight of 2 for the directed/weighted strategy, 2) Climate Change -> Global Warming with no weight for the directed/unweighted strategy, 3) Climate Change - Global Warming with a weight of 2 for the undirected/weighted strategy, and 4) Climate Change - Global Warming with no weight for the undirected/unweighted strategy. Here, "->" implies a directed edge and "-" implies an undirected edge. Figure 2 shows the four types of merge strategies used in the analysis.

Due to differences in assignment requirements and scoring between the two quarters, combining both quarters of data into a single collective network was problematic. Instead, individual collective networks were created for each quarter.
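The merge strategies described above can be sketched in a few lines with NetworkX, the library used elsewhere in the analysis. The `merge_maps` helper and the two toy learner maps below are hypothetical illustrations, not the authors' implementation:

```python
import networkx as nx

# Two hypothetical learner maps, each a list of (source, target) statements;
# relationship labels have already been dropped, as described above.
learner_maps = [
    [("Climate Change", "Global Warming")],
    [("Climate Change", "Global Warming"),
     ("Climate Change", "Sea Level Rise")],
]

def merge_maps(maps, directed=False, weighted=False):
    """Merge individual concept maps into a single collective network."""
    G = nx.DiGraph() if directed else nx.Graph()
    for concept_map in maps:
        for u, v in concept_map:
            if weighted and G.has_edge(u, v):
                G[u][v]["weight"] += 1  # duplicate edge: increment weight
            else:
                G.add_edge(u, v, weight=1)
    return G

collective = merge_maps(learner_maps, directed=False, weighted=True)
print(collective["Climate Change"]["Global Warming"]["weight"])  # 2
```

Toggling the `directed` and `weighted` flags yields the four strategy combinations from the example above.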

Analysis of Maps
Analysis of the learner concept maps is broken down into several categories. Categories annotated with a single asterisk (*) are traditional Network Analysis metrics. Categories annotated with a double asterisk (**) are novel metrics using the methodology of merging individual learner concept maps into a collective network. The NetworkX Python library [31] was used to carry out the analyses, aside from the qualitative metrics. Note that only Comprehensiveness (in the qualitative metrics category) considers the actual content of the concept map; all of the other metrics are based either on individual map structure (traditional metrics) or the positioning of individual learner maps within the collective network (novel metrics).

Qualitative Metrics.
Qualitative metrics (also referred to as holistic metrics) are one of the most common forms of concept map evaluation. Evaluation of such metrics is typically carried out by an instructor or expert looking at a learner's concept map and assessing its merit based on content and/or structure, but without performing any counting or computation. Besterfield-Sacre et al. [5] define a rubric for several qualitative concept map metrics, including Comprehensiveness (the degree to which a map's content covers the relevant material) and Organization (orderliness of the arrangement of nodes and edges). Additionally, Yin et al. [72] define five common structural templates (Structural Form) that can be used to classify the structures of learner concept maps: "linear", "circular", "hub-spoke", "tree", and "network". The "network" structural template is characterized by a web of interconnected concepts, and is considered more indicative of meaningful learning than the other structural templates [36].

For the qualitative metrics, one member of the research team reviewed every individual learner concept map from both academic quarters and assigned each a score for Comprehensiveness, Organization, and Structural Form. Table 2 describes the qualitative metrics used in this analysis. One issue with qualitative analysis of concept maps is that such analysis is subjective in nature; for instance, many maps share characteristics of two or more of the structural templates, leaving it up to the scorer to make a judgment call.

Individual concept map metrics.
The quantitative analysis of the individual learner concept maps includes both simple counting-based metrics such as number of concepts and number of relationships, as well as traditional network metrics for concept map analysis: Average Shortest Path Length (ASPL), Clustering Coefficient (CC), Network Density, and Complexity. ASPL and CC are metrics that can be used to identify Small-World Networks (SWNs). SWNs are characterized by highly clustered groups and short path lengths from any given node to another. These types of networks are representative of many types of real-world systems [70]. Past research has hypothesized that learners who build concept maps that conform to the characteristics of SWNs have a more complete understanding of the course material. Specifically, Siew [61] found that learners with lower ASPL scores and higher CC scores performed better on quizzes.
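As a minimal sketch of how these individual-map metrics are obtained, the following computes ASPL, CC, and Density with NetworkX; the five-edge map is invented for illustration:

```python
import networkx as nx

# Hypothetical individual learner map with an interconnected structure
G = nx.Graph([
    ("sustainability", "technology"),
    ("sustainability", "biodiversity"),
    ("sustainability", "solar power"),
    ("technology", "solar power"),
    ("biodiversity", "solar power"),
])

aspl = nx.average_shortest_path_length(G)  # lower in small-world-like maps
cc = nx.average_clustering(G)              # higher in small-world-like maps
density = nx.density(G)                    # fraction of possible edges present

print(round(aspl, 3), round(cc, 3), round(density, 3))
```

In this densely connected example, ASPL is low (7/6) and clustering is high (5/6), the pattern associated above with more complete understanding.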
Figure 3 shows the structure of the individual learner concept maps with the highest and lowest scores for the metrics ASPL, CC, and Density from the Fall '21 quarter. This figure demonstrates that such metrics can be used as a quantitative alternative to the typically qualitative task of characterizing concept map structure. For example, both high ASPL and low CC scores are associated with what would qualitatively be labeled as a "chain" structure, whereas low ASPL and high Density scores reflect more of a "network" structure in the qualitative coding.

Another category of individual concept map metric is based on identifying hierarchy within a concept map. Besterfield-Sacre et al. [5] define a hierarchy as a shortest path from a root concept (a concept with no parents) to a leaf concept (a concept with no children). They define three hierarchy-based metrics that are used for quantifying concept map structure: number of hierarchies in the map, length of the deepest hierarchy, and number of cross-links between hierarchies. A higher total number of hierarchies, deeper hierarchies, and a higher number of cross-links are associated with more complex concept map structure.
Table 3 contains descriptions of each of the individual metrics.

Centrality metrics.
The centrality-based metrics measure whether learners are able to identify key nodes, as determined by their peers. Individual learner scores are calculated based on the average centrality of the nodes in the collective network that were part of their individual concept map. Three centrality metrics were calculated: Betweenness, Degree, and Closeness (described in Table 4). Figure 4 shows the positioning of two individual learner networks (red edges) within the collective network of all the learner concept maps from the Fall '21 quarter (blue edges). While one learner's individual concept map is positioned in the center of the collective network (left), the other learner's map passes through the center but mostly lies on the periphery of the collective network (right). We hypothesized that learners who made maps that are positioned more centrally within the collective network would have higher scores on the concept map assignment.
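This scoring scheme can be sketched as follows; the collective network and learner nodeset below are hypothetical, and the averaging of per-node centralities follows the description above:

```python
import networkx as nx

# Hypothetical collective network and one learner's nodeset
collective = nx.Graph([
    ("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E"),
])
learner_nodes = ["B", "D"]  # concepts this learner included in their map

def learner_score(G, nodes, centrality_fn):
    """Average centrality, in the collective network, of a learner's nodes."""
    centrality = centrality_fn(G)
    return sum(centrality[n] for n in nodes) / len(nodes)

degree_score = learner_score(collective, learner_nodes, nx.degree_centrality)
closeness_score = learner_score(collective, learner_nodes,
                                nx.closeness_centrality)
betweenness_score = learner_score(collective, learner_nodes,
                                  nx.betweenness_centrality)
print(degree_score)  # 0.75: B and D each touch 3 of the 4 other nodes
```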

Consensus metrics.
While the centrality metrics measure learner identification of key nodes, this section explores four metrics that instead measure learner identification of the correct edge structure between nodes. Specifically, these metrics compare an individual learner contribution to the local region of the collective network pertaining to the same nodeset (i.e., the set of concepts included in the individual learner's concept map). Thus, they measure the "edge overlap" that an individual learner concept map shares with the maps of other learners.
These metrics are important because global centrality metrics that consider the entire collective network do not account for how a learner's choice of nodeset may limit their centrality scores. Consider a learner who makes a high quality concept map but leaves out a single key node that was referenced by the majority of the other learners. This would substantially reduce their centrality score, which is based on which nodes a learner included. The consensus-based metrics introduced in this section do not directly penalize a learner for omitting popular nodes. Instead, they assess the ways that other learners collectively connected nodes within the same local network region, allowing for learners who chose to include a more diverse nodeset to still potentially receive high scores. Table 5 shows descriptions of each of the four consensus metrics.

Edge Consensus
Edge Consensus assesses how well a learner conforms with the general agreement of the class based on whether pairs of nodes are connected via edges. Each edge in the concept map of an individual learner is assigned a score based on the number of other learners that also included that edge in their map. The final score is calculated from the summed score of all of the edges in their map, divided by the total number of edges they included. The simplicity of this metric makes it easy to calculate, even on large networks.
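A minimal sketch of the Edge Consensus calculation, using hypothetical learner maps; edges are treated as undirected, matching the merge strategy used for the collective networks:

```python
def edge_consensus(learner_edges, other_maps):
    """Average, over a learner's edges, of how many other learners
    also included each edge (direction ignored)."""
    other_edge_sets = [{frozenset(e) for e in m} for m in other_maps]
    total = sum(
        sum(frozenset(edge) in edges for edges in other_edge_sets)
        for edge in learner_edges
    )
    return total / len(learner_edges)

learner = [("Climate Change", "Sea Level Rise"),
           ("CO2", "Ocean Acidification")]
others = [
    [("Climate Change", "Sea Level Rise")],
    [("Sea Level Rise", "Climate Change"), ("CO2", "Climate Change")],
]
print(edge_consensus(learner, others))  # 1.0: (2 + 0) / 2
```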
We hypothesized that Edge Consensus would positively correlate with individual learner map quality, as it rewards learners who share a common understanding of the course material with their classmates. While the metric does not directly reward learners for including popular nodes, it does reward learners for including popular edges, which will indirectly lead to learners who included popular nodes receiving higher Edge Consensus scores.

Subgraph Coverage
Calculating Subgraph Coverage involves creating a subgraph of the collective network defined as the set of edges added by all of the learners between the nodes referenced in an individual learner's map. The number of edges in the individual map is then compared with the total number of edges in the subgraph. The intuition behind this metric is to evaluate the edge coverage that an individual learner has made over their selected nodeset compared to the rest of the class.
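A sketch of the Subgraph Coverage calculation; the collective network and learner contribution below are invented for illustration:

```python
import networkx as nx

# Hypothetical collective network and one learner's contribution
collective = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
learner_edges = [("A", "B"), ("B", "C")]
learner_nodes = {"A", "B", "C"}

def subgraph_coverage(G, nodes, edges):
    """Fraction of the class-wide edges over the learner's chosen
    nodeset that the learner included themselves."""
    subgraph = G.subgraph(nodes)  # all edges anyone added between these nodes
    return len(edges) / subgraph.number_of_edges()

coverage = subgraph_coverage(collective, learner_nodes, learner_edges)
# 2/3: the class also connected A-C, which this learner did not include
```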
It is hypothesized that higher Subgraph Coverage will be associated with higher individual concept map quality. In theory, learners who were able to cover more of their selected subgraph will be demonstrating a more thorough understanding of the concepts in the subgraph than learners who covered less of the subgraph.

Collective Shortest Path
The Collective Shortest Path metric assesses the ways that other learners collectively represented the relationships present in each individual learner's concept map. For each individual concept map, a modified collective network is created based on all of the other learners' concept maps but excluding the current learner's map. Then, for each edge connecting a pair of nodes in the learner's map, the shortest path between this pair of nodes in the modified collective network is calculated. If another learner added the same edge, then the shortest path is 1; if a path does not exist, then the longest shortest path in the collective network is assigned as the score for the edge in question.
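A sketch of this calculation on hypothetical data; averaging the per-edge path lengths into a single learner score is an assumed aggregation, since the per-edge scoring above does not specify the exact formula:

```python
import networkx as nx

# Modified collective network built from the *other* learners' maps
others = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])

def collective_shortest_path(G, learner_edges):
    """Average shortest-path distance, in the others-only network,
    between the endpoints of each learner edge."""
    fallback = nx.diameter(G)  # longest shortest path, for unreachable pairs
    total = 0
    for u, v in learner_edges:
        if u in G and v in G and nx.has_path(G, u, v):
            total += nx.shortest_path_length(G, u, v)
        else:
            total += fallback
    return total / len(learner_edges)

score = collective_shortest_path(others, [("A", "B"), ("A", "C")])
# (1 + 2) / 2 = 1.5; lower scores indicate closer alignment with peers
```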
Like Edge Consensus, the Collective Shortest Path metric rewards learners who included edges also included by other learners. However, Collective Shortest Path additionally rewards learners who made unique connections between nodes in adjacent regions of the collective network, even if no other learner directly connected those nodes. Conversely, it penalizes learners who connected nodes that would otherwise have been far apart in the collective network.
The guiding principle behind this metric is that learners may include unique edges that are still of high quality; in fact, unique edges may provide an important bridge between concepts that would otherwise not have been connected. However, we argue that unique edges of high quality are more likely to occur between two nodes that are already close together, whereas unique edges of low quality may bridge two nodes that otherwise would have been far apart. Following from this, we hypothesized that learners who receive a lower Collective Shortest Path score will have higher quality concept maps.
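A minimal leave-one-out sketch with NetworkX follows; the aggregation of per-edge scores into a single average, and the function names, are our assumptions rather than the paper's exact implementation:

```python
import networkx as nx

def collective_shortest_path(learner_map, other_maps):
    # Leave-one-out collective network built from everyone else's maps.
    collective = nx.compose_all(other_maps)
    # Fallback for unreachable pairs: the longest shortest path in the network.
    fallback = max(
        d for lengths in dict(nx.all_pairs_shortest_path_length(collective)).values()
        for d in lengths.values()
    )
    scores = []
    for u, v in learner_map.edges():
        if collective.has_node(u) and collective.has_node(v) \
                and nx.has_path(collective, u, v):
            scores.append(nx.shortest_path_length(collective, u, v))
        else:
            scores.append(fallback)
    return sum(scores) / len(scores)

# Toy example: edge A-B is shared (score 1); A-C is reachable via B (score 2).
others = [nx.Graph([("A", "B"), ("B", "C")]), nx.Graph([("A", "B")])]
learner = nx.Graph([("A", "B"), ("A", "C")])
score = collective_shortest_path(learner, others)  # (1 + 2) / 2
```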

Communicability
Unlike the other three metrics in this section, Communicability is borrowed from previous literature [23]; however, its application to assigning individual learner scores based on their contributions to a collective network is novel. It uses the same modified collective network as Collective Shortest Path; however, in contrast to Collective Shortest Path, which considers only the shortest alternative path between each pair of nodes, Communicability applies a weighted sum over all alternative paths in the collective network between every pair of nodes in each individual student network, such that longer paths are weighted less heavily than shorter paths. It rewards learners who connected nodes that were reachable by a wide variety of other paths, indicating that such learners made connections in important parts of the collective network. We therefore hypothesized that a higher Communicability score will be associated with higher quality concept maps. Figure 5 visualizes a simple calculation of each of the three consensus metrics introduced in this paper (excluding Communicability, which is detailed in [23]).
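NetworkX provides graph communicability directly (a walk-based measure in which longer walks contribute less); the sketch below sums it over the endpoints of the learner's edges on the leave-one-out collective network. The aggregation by summation is our assumption:

```python
import math

import networkx as nx

def communicability_score(learner_map, other_maps):
    # Unweighted, undirected leave-one-out collective network, matching the
    # merge strategy the NetworkX implementation supports.
    collective = nx.compose_all(other_maps)
    comm = nx.communicability(collective)  # dict of dicts: comm[u][v]
    total = 0.0
    for u, v in learner_map.edges():
        if u in comm and v in comm.get(u, {}):
            total += comm[u][v]
    return total

# Toy example: a single shared edge A-B yields communicability sinh(1) ≈ 1.175.
others = [nx.Graph([("A", "B")])]
learner = nx.Graph([("A", "B")])
score = communicability_score(learner, others)
```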

Summary Statistics
Summary statistics for the collective networks are shown in Table 6. Learners in the Spring '21 quarter made larger networks than in the Fall '21 quarter, in terms of both nodes and edges. This is likely due to differences in the assignment instructions; learners had one edge label to choose from in the fall, vs. 20 in the spring. However, Network Density and Average Node Degree are similar between the networks, indicating that the structure of the two collective networks is similar. The average learner score differs greatly between quarters due to different scoring methodologies; the TA grading team assigned much higher map scores than the research team member, leading to a much higher average score for the Fall '21 quarter than the Spring '21 quarter.
Table 7 shows summary statistics for the qualitative metrics. Learner maps in the Spring '21 quarter scored higher for both Comprehensiveness and Organization than maps from the Fall '21 quarter. The Spring '21 data also contained far more learner maps coded as a "network" structure than Fall '21 (88.5% of all maps vs. 27.9%). Summary statistics for the quantitative metrics are shown in Table 8; the mean, standard deviation, and range of each metric output are reported.

Correlation Results
To evaluate the performance of each metric, the statistical correlation between each network metric and the concept map score is calculated on a learner-by-learner basis. The Pearson correlation coefficient was used, except in the case of the qualitative network structure metric, which is a categorical variable. To handle the statistical challenge of correlating a categorical variable with a continuous one, this metric was broken out into five separate Boolean-valued metrics representing each of the five structural templates (i.e., "tree", "chain", etc.), and the point-biserial correlation was used.
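Both correlation types can be computed with SciPy as below; the variable names and toy data are purely illustrative, not the study's data:

```python
from scipy.stats import pearsonr, pointbiserialr

# Hypothetical learner-by-learner data: one continuous metric, one Boolean
# structure indicator (e.g., "map was coded as a tree"), and map scores.
metric_values = [0.1, 0.2, 0.4, 0.6, 0.7]
is_tree = [1, 1, 1, 0, 0]
scores = [60, 70, 75, 85, 90]

r, p = pearsonr(metric_values, scores)        # continuous vs. continuous
r_pb, p_pb = pointbiserialr(is_tree, scores)  # Boolean vs. continuous
```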
In the tables below, each cell contains the Pearson or point-biserial correlation coefficient representing the correlation between the network metric and the learner concept map scores, as well as the p-value of the correlation. Due to the large number of statistical tests run in this analysis, we apply the Sidak-Holm adjustment to control the Family-Wise Error Rate [30]. Under this adjustment, we consider p < .0006 to be a statistically significant result; such results are shown in bold. Results meeting the conventional threshold of p < .05 are marked with an asterisk (*).
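For intuition, the single-step Šidák form of the per-test cutoff at family-wise α = .05 can be computed as below (the Holm variant applies this stepwise). The number of tests m is an illustrative assumption on our part, chosen so the cutoff lands near .0006; the paper's exact count is not restated here:

```python
def sidak_threshold(alpha, m):
    # Per-test cutoff keeping the family-wise error rate at alpha
    # across m independent tests: 1 - (1 - alpha)^(1/m).
    return 1 - (1 - alpha) ** (1 / m)

# With roughly 85 tests, the per-test cutoff drops to about .0006.
cutoff = sidak_threshold(0.05, 85)
```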

Traditional Metrics from the Literature.
Qualitative Metrics: The qualitative results in Table 9 show that, in spite of their frequent use in past literature [34], the qualitative metrics are not highly correlated with learner performance on the concept mapping assignment, with the exception of Comprehensiveness for the Fall '21 quarter, which addresses the scope of the content contained within the map rather than the structure of the map.
Individual, Non-Hierarchy-based Metrics: The results of the quantitative metrics in Table 10 show that the individual, non-hierarchy-based metrics correlated with learner performance for the Spring '21 quarter but not for the Fall '21 quarter. Unexpectedly, CC and Density both correlated negatively with learner performance, where positive correlations were hypothesized in both cases. This can be explained by the observation that, for the dataset in question, CC and Density are positively correlated with each other (p < .0001) and both are negatively correlated with the number of concepts (p < .0001), while the number of concepts was strongly positively correlated with learner performance. The general thinking behind using CC and Density to analyze concept maps is that they provide an easy-to-compute characterization of the complexity of the network; however, a confounding factor is that a map with fewer nodes can yield higher CC and Density scores while also being indicative of lower learner effort.
Individual, Hierarchy-based Metrics: The individual, hierarchy-based results in Table 11 show that, while some of these metrics correlate with learner performance, none of the results are reproduced across quarters. The number of hierarchical levels was a useful predictor of performance in the Fall '21 quarter, and the number of hierarchies was highly correlated with performance for Spring '21.
In general, the traditional metrics from the literature do not excel at predicting learner performance across the two datasets used in this study. While there are some isolated statistically significant correlations, none of the results for any of the traditional metric categories were reproduced across both quarters of data. These results motivate the investigation of metrics that move beyond assessing individual learner conceptions in isolation to assessing them in relation to the conceptions of their peers.

Metrics based on Collective Network.
For the metrics in this subsection, there are four rows for each academic quarter, representing the results of the various merge strategies: unweighted/directed (UD), weighted/directed (WD), unweighted/undirected (UU), and weighted/undirected (WU).
Centrality Metrics: Table 12 shows the results of the centrality metrics. In general, the centrality metrics produced stronger correlations for the Fall '21 quarter, where the majority of the results were statistically significant.
Consensus Metrics: Table 13 shows the results of the consensus metrics. These metrics produced a large number of statistically significant correlations with learner performance. As hypothesized, Edge Consensus and Communicability produced positive correlations with student score across both datasets, while Collective Shortest Path produced negative correlations. Interestingly, Subgraph Coverage produced strong positive correlations, despite negative correlations being hypothesized. In this metric, a lower score indicates higher coverage, and higher coverage implies more overlap with other learners; the positive correlation therefore means that concept maps with higher coverage contained lower quality content, an unexpected result.
However, an important downside of the Subgraph Coverage metric is that it has the potential to penalize learners who included popular nodes. Popular nodes are more likely to have many edges between them, so it is more difficult for an individual learner to cover all of these edges. Conversely, in the extreme case where a learner creates a concept map with a nodeset completely distinct from the nodes included by all other learners, that learner would achieve a perfect coverage score. This confounding factor indicates that the metric needs refinement before it can be considered broadly useful.
On the dataset used in this paper, most of the novel metrics were better indicators of student performance than the traditional metrics. The results from the novel metrics were also more consistent across both academic quarters than the results from the traditional metrics. The consensus-based metrics were overall the best predictors of student map quality, although the centrality metrics were also relatively useful predictors. In contrast, the qualitative metrics and individual concept map metrics were less reliable predictors.
The simplicity of the qualitative and individual map metrics may partially explain their past use for concept map analysis tasks; however, these metrics consider each student's concept map in isolation and do not capture the interplay between an individual student's conceptions and the course's collective conceptions of the material. Creating a collective network of merged student concept maps enables a variety of novel metrics that treat the entire course's collective understanding as a proxy for expert understanding, providing a knowledge base against which individual student contributions can be compared.
An additional distinction between the traditional and novel metrics is that the traditional metrics typically assume that a learner's performance is tied to the structural shape of their concept map, irrespective of the selected topics. In contrast, the novel metrics assign a score based on the positioning within the collective network of the concepts and relationships that a student included, without considering the structure of the individual student map. The results from the novel metrics indicate that a learner's correct selection of key nodes (centrality), as well as correct selection of specific edges between them (consensus), should be considered valuable indicators of learner performance.
While the novel metrics in this paper introduce a new framework for concept map evaluation, they are not perfect indicators of student performance and are not intended to replace traditional metrics. Past research has shown that many of the traditional metrics also provide predictive value. In particular, it is widely agreed that concept maps demonstrating expert-level understanding more commonly assume a "network"-like structure with high interconnectivity between concepts, which can be assessed via qualitative evaluation or traditional network metrics such as ASPL. This type of evaluation is not enabled by the novel metrics, which assess an individual map only in terms of its relation to the collective network.
Rather than replacing traditional metrics, the novel metrics introduced in this paper should be viewed as viable accompaniments to previous methodology. In many cases, the best results may be obtained from an ensemble of different metrics. As our own results show, the efficacy of various metrics depends largely on the data itself; however, practically no studies have applied these metrics to a diverse set of concept map data from various educational contexts. Such a study is needed to better determine the generalizability of these metrics.

Future Work
This paper is in part a response to calls from the CSCL community to explore more diverse forms of network actors, relational ties, and metrics within the Network Analysis space. This call is answered via a Network Analysis of learner concept map data, a nascent but intriguing area of investigation. While initial results are promising, this work can be improved or extended in a few key areas.
First, the two datasets used for this study had different characteristics, largely because the instructor varied the assignment parameters between quarters. These differences caused changes in the effectiveness of the metrics, especially the individual concept map metrics. This observation points to the importance of aligning one's metrics with the particulars of the concept mapping assignment. As concept mapping assignments vary widely, further research is needed to pinpoint the most appropriate set of metrics for each variation or category of assignment.
An open question is how well the findings of this paper translate to other classroom settings. For instance, courses on technical subjects such as programming or math contain more abstract concepts than the sustainability course used in this study; it is unclear how well the metrics introduced here could aid instructors in predicting learner understanding in such courses. It will be important to collect data across various domains and levels of education in order to properly address concerns about the generalizability of these metrics beyond the single course used in this analysis.
Our findings indicate that the novel metrics introduced in this paper have potential as predictive indicators of learner understanding of course material. Such information could be useful to instructors in a variety of ways. For instance, instructors could use the results of these metrics to identify potentially struggling learners without having to assess each learner's concept map qualitatively. Additionally, characterizing what learners know at a given moment could help instructors assemble project or study groups based on overlaps (or non-overlaps) in knowledge or interest.
There is also potential to use the collective network of merged learner concept maps as a pedagogical tool. For example, learners could research key nodes and/or edges that they overlooked but that other learners included, and write up a description of the missed content. Another possibility is to allow learners to browse an online visualization of the collective network to promote open-ended reflection, letting them consider how their individual conceptual understanding connects to that of their peers. Having learners actively participate in the process of network construction, with each learner or group of learners taking responsibility for one piece of the final network, is in line with past CSCW research that emphasizes the human role in knowledge management [1], and can also be viewed as a form of Collaborative Sensemaking [67]. While the concept mapping assignments themselves are performed individually, by collecting mental models from learners in a standardized format, these assignments enable downstream collaboration and interplay with existing resources [26].
Finally, the current work does not address potential issues introduced by varying levels of concept specificity, a generally challenging problem in concept mapping. Future steps to address this issue may take the form of enhancing network visualizations to emphasize chains of intermediary nodes between two broad concepts, or using external resources such as Wikipedia to suggest potential alternatives for concepts that are too broad to be informative in a given context.

CONCLUSION
Network Analysis is shown to be a promising technique for analyzing the concepts and relationships in collaboratively-constructed educational networks. Typically, such networks have been composed of learners and their interactions, and it has been shown that learners occupying central positions within them tend to have better learning outcomes. This paper answers the call for analyses of more diverse types of networks within the CSCL space, and at the same time contributes to the relatively unexplored area of Network Analysis over Novakian concept map data. In particular, this paper introduces the novel methodology of merging learner concept maps into a collective network and using this collective network to calculate centrality- and consensus-based metrics for individual learners. When evaluated on two academic quarters of concept mapping data, these novel metrics are shown to be more significantly correlated with learner performance, and more reproducible across datasets, than the metrics traditionally used to evaluate concept maps in the literature.
While it is clear that network metrics can, in some sense, predict learner understanding of the material, it is also clear that the complexity of concept map-based networks will require more work to understand which metrics (or combinations of metrics) are the best predictors of learner conceptual understanding in a given context.Moving forward, it will be important to perform such studies across larger and more diverse datasets, in order to further evaluate the suitability of these metrics, and Network Analysis as a whole, for predicting learner understanding.

ACKNOWLEDGMENTS
This work was supported by the National Science Foundation under award number 2121572.

Fig. 1 .
Fig. 1. The upper screenshot shows part of one learner's concept map submission from the Spring '21 quarter in the CmapTools interface. The lower screenshot shows an example concept map created by a research team member using the custom concept mapping interface used for the Fall '21 quarter.

Fig. 2 .
Fig. 2. Two example learner concept maps merged into a collective network using each of the 4 merge strategies used to conduct the analysis. The networks with blue (top left) and red (bottom left) nodes represent the two learner maps, and the networks with purple nodes (four most rightward) represent the 4 possible collective networks.

Fig. 3 .
Fig. 3. The learner networks with the highest and lowest scores for three of the individual network metrics from the Fall '21 quarter, showing a diversity of network structures in individual learner submissions.

Fig. 4 .
Fig. 4. Two individual learner concept maps from Fall 2021 layered in red over the course-wide collective network in blue. The learner map on the left exhibits a relatively high Node Betweenness score, whereas the map on the right exhibits a relatively low Node Betweenness score.

Fig. 5 .
Fig. 5. An example individual learner map (left, blue), alongside a collective network that includes the individual learner map (second from left, red), and the calculation of three novel consensus metrics (three most rightward, purple). In the three purple graphs showing the calculations, red arrows indicate edges in the individual learner map, black arrows indicate edges in the collective network not present in the individual learner map, and the dotted red arrow indicates an edge in the individual learner map that was not made by any other learner. These examples use the weighted/directed merge strategy. For Edge Consensus, the weights of the relevant edges in the collective network are summed. For Subgraph Coverage, the individual learner map contains 3 of the 7 edges (highlighted in red) that the collective network contains between the nodeset referenced in the individual learner map. For Collective Shortest Path, 2 of the 3 edges in the individual learner map were referenced by at least one other learner, leading to scores of 1 for those edges. The final edge is assigned a score of 2, as the algorithm found a shortest path from A->D through node C.

Table 1 .
Quarter-by-Quarter Concept Map Assignment Details

Table 2 .
Summary Descriptions of Qualitative Metrics

Table 3 .
Summary Descriptions of Individual Network Metrics

Table 4 .
Summary Descriptions of Node Centrality Network Metrics

Table 5 .
Summary Descriptions of the Consensus Network Metrics

Table 6 .
Summary Statistics for Quarter-by-Quarter Collective Networks. Here, the "Number of Statements" column is the total number of statements in all of the individual learner submissions, whereas the two "Number of Edges" columns show the number of edges remaining in the collective network once directed or undirected merge strategies have been applied.

Table 7 .
Summary Statistics for Qualitative Metrics

Table 8 .
Summary Statistics for Quantitative Metrics. The Mean, Standard Deviation (SD), and Range (RNG) of all metric outputs are shown, broken down by quarter. The Mean is the first number in each cell, while the SD and RNG are shown in parentheses.

Table 10 .
Individual, Non-Hierarchy-based Metrics Results. Note that Number of Concepts is not listed for Fall '21 because learners were required to include a fixed number of nodes in that quarter.

Table 13 .
Consensus Metrics Results. Note that only weighted merge strategies were applied to Edge Consensus, as this metric can only function in the presence of an edge weight. Also note that only the unweighted, undirected strategy was applied to Communicability, due to the NetworkX implementation of this metric.