Context

This website provides materials related to a study investigating the development of instrumental orchestration over time and the way it is used and adapted in mathematics education research practices. The study is based on a literature review using bibliometric clustering techniques, whose results were interpreted through the experience-based perspective of two experts in the field.

If you want more information about this study, you may refer to our paper: Drijvers P., Grauwin S., Trouche L. (2020). When bibliometrics met mathematics education research: the case of instrumental orchestration. [Link to be provided later]

Methodology

Corpus selection

The publications' metadata were extracted from Scopus on August 28, 2019. Since our goal was to map the scientific fields in which Instrumental Orchestration may have had an impact, we built a succession of corpora by iteratively including the publications we could find in Scopus citing at least one of the previously selected publications.
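
To illustrate the process, here is a minimal sketch of this iterative expansion in Python. The helper find_citing_publications() is hypothetical and stands for whatever Scopus query or export is used to retrieve the citing publications; it is not the actual extraction procedure used in the study.

```python
def find_citing_publications(publication_id):
    """Hypothetical wrapper around a Scopus query returning the IDs of
    all publications citing `publication_id`."""
    raise NotImplementedError  # the study relied on Scopus extractions

def build_layers(seed_ids, n_layers):
    """Start from a seed corpus ("layer 0") and iteratively add every
    publication citing at least one previously selected publication."""
    layers = [set(seed_ids)]
    corpus = set(seed_ids)
    for _ in range(n_layers):
        new_layer = set()
        for pub_id in layers[-1]:
            new_layer.update(find_citing_publications(pub_id))
        new_layer -= corpus          # keep only publications not already selected
        layers.append(new_layer)
        corpus |= new_layer
    return layers
```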

Data cleaning

A semi-automatic algorithm was used to detect different variants of the same reference: author names with one or two initials, variants in the reference sources, errors in volume numbers, etc. In total, about 500 variant names of 150 sources were cleaned.

This cleaning process is performed in a semi-automatic way. It is NOT exhaustive (and does not pretend to be), since it is based on the detection of frequently observed patterns (replacing "and" by "&" or "Jl." by "Journal" in journal names, looking for sources differing by only one character, etc.), and can always be improved upon.
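
To give an idea of what this looks like in practice, here is a minimal sketch in Python of such pattern-based cleaning. The normalisation rules and the one-character comparison shown here are illustrative, not the exact ones used in the study.

```python
import re

# Illustrative normalisation rules, in the spirit of the patterns mentioned above.
RULES = [(r"\s+and\s+", " & "), (r"\bjl\.", "journal"), (r"\s+", " ")]

def normalise(source):
    """Normalise a source name before comparison."""
    s = source.strip().lower()
    for pattern, repl in RULES:
        s = re.sub(pattern, repl, s)
    return s

def edit_distance(a, b):
    """Plain Levenshtein distance (sufficient for short source names)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def merge_candidates(sources):
    """Pairs of distinct (normalised) source names differing by a single
    character, to be reviewed by hand before merging."""
    names = sorted({normalise(s) for s in sources})
    return [(a, b) for idx, a in enumerate(names) for b in names[idx + 1:]
            if edit_distance(a, b) == 1]
```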

Bibliographic Coupling Networks & Clusters

Figure: cluster detection on a BC network of ∼600 publications (panels a, b and c referenced below).

Network construction: Bibliographic Coupling (BC) is based on the degree of overlap between the references of each pair of publications. Specifically, BC is performed by computing Kessler's similarity between publications: ω_ij = R_ij / √(R_i R_j), where R_ij is the number of references shared by publications i and j, and R_i is the number of references of publication i. If two publications do not share any reference, they are not linked; if they have identical reference lists, the strength of their connection is maximal. On Fig. a, each node represents a publication, and the thickness of a link is proportional to the similarity between the two publications it connects. On this figure and the next, the layouts are determined by a force-based spatialisation algorithm (ensuring that strongly linked nodes are closer to each other).
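
As an illustration, here is a minimal sketch of this network construction using networkx, assuming the cleaned reference lists are available as sets of strings (variable and function names are ours).

```python
from itertools import combinations
from math import sqrt

import networkx as nx

def bc_network(references):
    """Build the bibliographic coupling network.

    `references` maps each publication id to its (cleaned) set of references.
    Edge weights are the Kessler similarity w_ij = R_ij / sqrt(R_i * R_j).
    """
    G = nx.Graph()
    G.add_nodes_from(references)
    for i, j in combinations(references, 2):
        shared = len(references[i] & references[j])
        if shared:  # publications sharing no reference stay unlinked
            w = shared / sqrt(len(references[i]) * len(references[j]))
            G.add_edge(i, j, weight=w)
    return G
```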

Cluster detection: a community detection algorithm based on modularity optimization (we use an implementation of the Louvain algorithm) is applied to partition the publications into clusters. Basically, the algorithm groups publications belonging to the same "dense" region of the BC network (dense in terms of links), cf Fig. b. The quality of the partitioning can be quantified by the modularity Q, a measure between -1 and 1: the higher it is, the more meaningful the partitioning.
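
For illustration, here is how such a partition and its modularity could be obtained with networkx (version 2.8 or later, which ships a Louvain implementation); this is one possible implementation, not necessarily the one used in the study. It continues from the bc_network() sketch above.

```python
from networkx.algorithms import community

def detect_clusters(G):
    """Partition the weighted BC network with the Louvain algorithm and
    return the clusters together with the modularity Q of the partition."""
    clusters = community.louvain_communities(G, weight="weight", seed=0)
    Q = community.modularity(G, clusters, weight="weight")
    return clusters, Q
```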

Cluster representation: publications belonging to the same cluster are gathered into a single node, or circle, whose size is proportional to the number of publications it contains, cf Fig. c. A standard frequency analysis is then performed to characterise each cluster by its most frequent / significant items (keywords, references, authors, etc.), which can then be used as automatic labels.
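
A minimal sketch of such a frequency analysis, here on keywords, assuming the relevant metadata are available per publication (names are ours):

```python
from collections import Counter

def cluster_labels(clusters, keywords, top=5):
    """Most frequent keywords of each cluster, usable as automatic labels.

    `clusters` is a list of sets of publication ids (e.g. the Louvain output);
    `keywords` maps each publication id to its list of keywords.
    """
    return [
        Counter(kw for pub in cluster for kw in keywords.get(pub, [])).most_common(top)
        for cluster in clusters
    ]
```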

Hierarchical clustering: the exact same methodology can be applied to the subset of publications belonging to each detected cluster, splitting it into sub-clusters.
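
In code terms, and continuing from the sketches above, this simply amounts to re-running the clustering step on the BC subnetwork induced by each cluster:

```python
def detect_subclusters(G, clusters):
    """Re-apply the Louvain step to the BC subnetwork of each cluster."""
    return [detect_clusters(G.subgraph(cluster))[0] for cluster in clusters]
```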

What is the goal of BC analysis? Assuming that publications sharing (more) references are thematically close(r), the heuristic of BC clustering is to partition a corpus of publications into groups corresponding to scientific topics.
What are the advantages of BC analysis? Compared to co-citation analysis (the other main citation-based clustering technique, which links publications that are cited together in other publications), the membership of a given publication in this or that cluster is immediate: it is determined by the references used by the authors and does not depend on how the publication will be cited later. In that respect, BC is, among other things, a relevant technique for detecting emerging communities.

Visualisation tools

Interactive list of publications

The interactive list of publications can be filtered and sorted by different criteria.

The Corpus Description dashboard

The Corpus Description dashboard allows you to explore the frequency distributions of different metadata within the studied corpora. The dashboard is separated into three parts: controls, list view and graph view. You may choose which field you want to explore (keywords, references, journals, authors, etc.) and which type of graph you want to display.

Be aware that there is a difference between the "Authors' Keywords", given by the publications' authors, and the "Keywords", which are built by Scopus based on an analysis of the authors' keywords, the title and the abstract. In most cases, Scopus's keywords are more consistent, normalized and exhaustive. You should also be careful not to lean too much on the keywords to build your representation of the nature of a corpus, since keywords are not available for all the publications. For example, in the "layer 1" corpus, there are no (Scopus) keywords in the metadata of about 81% of the publications, and no authors' keywords in the metadata of about 30% of them.
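
Such coverage figures are easy to check directly from the metadata; here is a minimal sketch, assuming each publication record is a dict with (possibly empty) keyword fields (field names are ours).

```python
def keyword_coverage(records, field="keywords"):
    """Fraction of publications whose metadata contain at least one value
    for the given keyword field."""
    if not records:
        return 0.0
    return sum(1 for rec in records if rec.get(field)) / len(records)

# e.g. 1 - keyword_coverage(layer1_records, "keywords") gives the share of
# publications without Scopus keywords (about 81% in the "layer 1" corpus).
```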

Thematic maps

The Thematic Maps tool allows you to visualize and explore the results of the BC analysis performed on each corpus. Note that the partitions yielded by the community detection algorithm are usually considered meaningful for modularity values above Q = 0.3-0.4, which is only the case for the "Layer 2" and "Layer 3" corpora. You may refer to our paper for more details on the interpretation of these results.

Team

Paul Drijvers: Lead researcher, Utrecht University
Sébastian Grauwin: Data scientist, Université de Lyon, ENS de Lyon
Luc Trouche: Lead researcher, Université de Lyon, ENS de Lyon