How to use biblionetwork

Aurélien Goutsmedt

This vignette introduces the main functions of the package, using the data sets bundled with it.

The basic coupling angle (or cosine) function

The biblio_coupling() function is the most general function of the package. It takes as input a direct citation data frame (entities such as articles, authors or institutions citing references) and produces an edge list for a bibliographic coupling network, with the number of references that two articles share as well as the coupling angle value of each edge (Sen and Gan 1983). This is a standard way to build a bibliographic coupling network using Salton’s cosine measure: it divides the number of references that two articles share by the square root of the product of the two articles’ bibliography lengths. This avoids giving too much weight to articles with a large bibliography. The measure is:

\[ \frac{R(A) \bullet R(B)}{\sqrt{L(A).L(B)}} \]

with \(R(A)\) and \(R(B)\) the references of documents A and B, \(R(A) \bullet R(B)\) being the number of shared references by A and B, and \(L(A)\) and \(L(B)\) the length of the bibliographies of documents A and B.
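To make the formula concrete, here is a minimal base R sketch that computes the coupling angle by hand. The toy_citations data frame and the coupling_angle() helper are hypothetical names used for illustration only; they are not part of the package and only mirror the formula above.

```r
# Toy direct-citation table (hypothetical data): one row per citing/cited pair.
toy_citations <- data.frame(
  citing = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  cited  = c("r1", "r2", "r3", "r4", "r1", "r2", "r5", "r5", "r6"),
  stringsAsFactors = FALSE
)

# Bibliography of each citing document, as a named list of reference vectors.
refs <- split(toy_citations$cited, toy_citations$citing)

# Coupling angle of two documents: number of shared references divided by the
# square root of the product of the two bibliography lengths.
coupling_angle <- function(a, b) {
  shared <- length(intersect(refs[[a]], refs[[b]]))
  shared / sqrt(length(refs[[a]]) * length(refs[[b]]))
}

coupling_angle("A", "B")  # 2 shared references, L(A) = 4, L(B) = 3: 2 / sqrt(12)
coupling_angle("B", "C")  # 1 shared reference,  L(B) = 3, L(C) = 2: 1 / sqrt(6)
coupling_angle("A", "C")  # no shared reference, so the weight is 0
```

Documents A and B share more references than B and C, but A's longer bibliography tempers the gap between the two weights.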

The output of biblio_coupling() is an edge list linking nodes together (see the from and to columns), where the weight of each edge is the coupling angle measure. If normalized_weight_only is set to FALSE, an additional column displays the number of references shared by the two nodes.

This example uses the Ref_stagflation data frame.

library(biblionetwork)
biblio_coupling(Ref_stagflation, 
                source = "Citing_ItemID_Ref", 
                ref = "ItemID_Ref", 
                normalized_weight_only = FALSE, 
                weight_threshold = 1)
#>             from         to     weight nb_shared_references     Source
#>    1:     214927    2207578 0.14605935                    4     214927
#>    2:     214927    5982867 0.04082483                    1     214927
#>    3:     214927    8456979 0.09733285                    3     214927
#>    4:     214927   10729971 0.29848100                    7     214927
#>    5:     214927   16008556 0.04714045                    1     214927
#>   ---                                                                 
#> 2712: 1111111161 1111111172 0.03434014                    1 1111111161
#> 2713: 1111111161 1111111180 0.02003610                    1 1111111161
#> 2714: 1111111161 1111111183 0.04050542                    2 1111111161
#> 2715: 1111111172 1111111180 0.03646625                    1 1111111172
#> 2716: 1111111182 1111111183 0.27060404                    8 1111111182
#>           Target
#>    1:    2207578
#>    2:    5982867
#>    3:    8456979
#>    4:   10729971
#>    5:   16008556
#>   ---           
#> 2712: 1111111172
#> 2713: 1111111180
#> 2714: 1111111183
#> 2715: 1111111180
#> 2716: 1111111183

This function is fairly general and can also be used:

  1. for co-citation, simply by swapping the source and ref columns, although you should rather use [biblio_cocitation()];
  2. for title co-occurrence networks (the coupling angle measure takes the length of the titles into account);
  3. for co-authorship networks (taking into account the number of co-authors an author has collaborated with over a period), although you should rather use [coauth_network()].

The function keeps only the edges whose non-normalized weight (the number of shared references) is at least weight_threshold: with weight_threshold = 3, for instance, edges sharing exactly three references are kept. In a large bibliographic coupling network, you may consider that sharing a single reference is not enough for two articles to be linked together. Raising this parameter also helps to avoid intractable networks with too many edges.

biblio_coupling(Ref_stagflation, 
                source = "Citing_ItemID_Ref", 
                ref = "ItemID_Ref", 
                weight_threshold = 3)
#>            from         to     weight     Source     Target
#>   1:     214927    2207578 0.14605935     214927    2207578
#>   2:     214927    8456979 0.09733285     214927    8456979
#>   3:     214927   10729971 0.29848100     214927   10729971
#>   4:     214927   19627977 0.11202241     214927   19627977
#>   5:    1021902   12824456 0.06537205    1021902   12824456
#>  ---                                                       
#> 958: 1111111147 1111111156 0.17325923 1111111147 1111111156
#> 959: 1111111147 1111111161 0.13333938 1111111147 1111111161
#> 960: 1111111156 1111111161 0.08580846 1111111156 1111111161
#> 961: 1111111159 1111111171 0.24333213 1111111159 1111111171
#> 962: 1111111182 1111111183 0.27060404 1111111182 1111111183

As explained above, you can use the biblio_coupling() function to create a co-citation network: you just have to put the references in the source column (they will be the nodes of your network) and the citing articles in ref. As this is likely to create some confusion, the package also provides a biblio_cocitation() function, which has a similar structure to biblio_coupling() but is explicitly designed for co-citation: citing articles stay in source and references stay in ref. The next example shows that both calls produce the same results:

biblio_coupling(Ref_stagflation, 
                source = "ItemID_Ref", 
                ref = "Citing_ItemID_Ref")
#>              from         to    weight     Source     Target
#>     1:      49248     180162 1.0000000      49248     180162
#>     2:      49248     804988 0.3162278      49248     804988
#>     3:      49248    1999903 1.0000000      49248    1999903
#>     4:      49248    2031010 1.0000000      49248    2031010
#>     5:      49248    3580645 0.7071068      49248    3580645
#>    ---                                                      
#> 87664: 1111112223 1111112225 1.0000000 1111112223 1111112225
#> 87665: 1111112223 1111112227 1.0000000 1111112223 1111112227
#> 87666: 1111112224 1111112225 1.0000000 1111112224 1111112225
#> 87667: 1111112224 1111112227 1.0000000 1111112224 1111112227
#> 87668: 1111112225 1111112227 1.0000000 1111112225 1111112227

biblio_cocitation(Ref_stagflation, 
                  source = "Citing_ItemID_Ref", 
                  ref = "ItemID_Ref")
#>              from         to    weight     Source     Target
#>     1:      49248     180162 1.0000000      49248     180162
#>     2:      49248     804988 0.3162278      49248     804988
#>     3:      49248    1999903 1.0000000      49248    1999903
#>     4:      49248    2031010 1.0000000      49248    2031010
#>     5:      49248    3580645 0.7071068      49248    3580645
#>    ---                                                      
#> 87664: 1111112223 1111112225 1.0000000 1111112223 1111112225
#> 87665: 1111112223 1111112227 1.0000000 1111112223 1111112227
#> 87666: 1111112224 1111112225 1.0000000 1111112224 1111112225
#> 87667: 1111112224 1111112227 1.0000000 1111112224 1111112227
#> 87668: 1111112225 1111112227 1.0000000 1111112225 1111112227
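The symmetry between the two networks can be sketched in base R on a toy data set. The cosine_links() helper below is a hypothetical illustration, not the package's implementation: it computes cosine-normalised links between the values of one column, counted over the values of the other column, so that swapping the two columns switches from coupling to co-citation.

```r
# Toy direct-citation table (hypothetical data): one row per citing/cited pair.
toy_citations <- data.frame(
  citing = c("A", "A", "A", "B", "B", "C", "C"),
  cited  = c("r1", "r2", "r3", "r1", "r2", "r2", "r3"),
  stringsAsFactors = FALSE
)

# Cosine-normalised links between the values of `node`, based on the values
# of `link` that they have in common.
cosine_links <- function(data, node, link) {
  # link x node incidence matrix with 0/1 entries
  incidence <- 1 * (unclass(table(data[[link]], data[[node]])) > 0)
  # node x node matrix of shared counts; the diagonal holds each node's length
  shared <- crossprod(incidence)
  shared / sqrt(outer(diag(shared), diag(shared)))
}

# Bibliographic coupling: citing documents are the nodes.
coupling <- cosine_links(toy_citations, node = "citing", link = "cited")
# Co-citation: same computation with the two columns swapped.
cocitation <- cosine_links(toy_citations, node = "cited", link = "citing")

coupling["A", "B"]      # A and B share r1 and r2: 2 / sqrt(3 * 2)
cocitation["r1", "r3"]  # r1 and r3 are cited together by A only: 1 / sqrt(2 * 2)
```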

Testing another method: the coupling_strength() function

The coupling_strength() function calculates the coupling strength measure (Shen et al. 2019) from a direct citation data frame. It is a refinement of biblio_coupling(): it takes into account how frequently a reference shared by two articles has been cited in the whole corpus. In other words, the most cited references matter less in the link between two articles than references that have rarely been cited. To a certain extent, it is similar to the tf-idf measure. The measure is:

\[ \frac{1}{L(A)}.\frac{1}{L(B)}\sum_{j}(log({\frac{N}{freq(R_{j})}})) \]

with \(N\) the number of articles in the whole dataset and \(freq(R_{j})\) the number of times the reference \(j\) (which is shared by documents A and B) is cited in the whole corpus.
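As a rough illustration, the measure can be computed by hand in base R on a toy data set. The names toy_citations and coupling_strength_by_hand() are hypothetical, the sketch assumes a natural logarithm, and it normalises by the bibliography lengths of both documents; it is not the package's implementation.

```r
# Toy direct-citation table (hypothetical data): one row per citing/cited pair.
toy_citations <- data.frame(
  citing = c("A", "A", "A", "B", "B", "C", "C", "D"),
  cited  = c("r1", "r2", "r3", "r1", "r2", "r1", "r4", "r1"),
  stringsAsFactors = FALSE
)

refs     <- split(toy_citations$cited, toy_citations$citing)  # bibliographies
n_docs   <- length(refs)                                      # N = 4 citing documents
ref_freq <- table(toy_citations$cited)                        # r1: 4, r2: 2, r3: 1, r4: 1

# Sum of log(N / freq) over the shared references, divided by the product of
# the two bibliography lengths (natural logarithm assumed).
coupling_strength_by_hand <- function(a, b) {
  shared <- intersect(refs[[a]], refs[[b]])
  rarity <- log(n_docs / as.numeric(ref_freq[shared]))
  sum(rarity) / (length(refs[[a]]) * length(refs[[b]]))
}

coupling_strength_by_hand("A", "B")  # shares r1 and r2: (log(4/4) + log(4/2)) / (3 * 2)
coupling_strength_by_hand("A", "C")  # shares only r1, cited by every document: weight 0
```

In this toy corpus, r1 is cited by all four documents, so sharing it adds nothing to the weight, whereas the rarer r2 does.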

coupling_strength(Ref_stagflation, 
                  source = "Citing_ItemID_Ref", 
                  ref = "ItemID_Ref", 
                  weight_threshold = 1)
#>             from         to      weight     Source     Target
#>    1:     214927    2207578 0.019691698     214927    2207578
#>    2:     214927    5982867 0.005331122     214927    5982867
#>    3:     214927    8456979 0.011752248     214927    8456979
#>    4:     214927   10729971 0.046511251     214927   10729971
#>    5:     214927   16008556 0.008648490     214927   16008556
#>   ---                                                        
#> 2712: 1111111161 1111111172 0.005067554 1111111161 1111111172
#> 2713: 1111111161 1111111180 0.001168603 1111111161 1111111180
#> 2714: 1111111161 1111111183 0.002580798 1111111161 1111111183
#> 2715: 1111111172 1111111180 0.003870999 1111111172 1111111180
#> 2716: 1111111182 1111111183 0.037748271 1111111182 1111111183

Aggregating at the “entity” level

Rather than focusing on documents, you may want to study the relationships between authors, institutions/affiliations or journals. The coupling_entity() function allows you to do that. Coupling links are calculated using the coupling angle measure (like biblio_coupling()) or the coupling strength measure (like coupling_strength()). The weight of a link depends on the number of references two entities share, taking into account the minimum number of times the two entities cite each reference. For instance, if two entities share a reference that the first one cites twice (in other words, in two different articles) and the second one three times, the function takes two as the value for that reference. On top of the features of the coupling strength or coupling angle measure, this means that if two entities share two references, the first one cited at least four times by each entity and the second one cited at least once, the first reference contributes more to the edge weight than the second. This use of the minimum number of shared citations for entity coupling comes from Zhao and Strotmann (2008). With the coupling strength measure, it looks like:

\[ \frac{1}{L(A)}.\frac{1}{L(B)}\sum_{j} Min(C_{Aj},C_{Bj}).(log({\frac{N}{freq(R_{j})}})) \]

with \(C_{Aj}\) and \(C_{Bj}\) the number of times entities A and B cite the reference \(j\).
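The minimum step alone can be sketched in base R, leaving the normalisation aside. The toy_entity_citations data frame is hypothetical and assumes one row per author, article and cited reference; it reproduces the case described above, where one entity cites a reference twice and the other three times.

```r
# Hypothetical author-level citations: one row per (author, article, reference).
toy_entity_citations <- data.frame(
  author  = c("X", "X", "X", "Y", "Y", "Y", "Y"),
  article = c("a1", "a2", "a1", "b1", "b2", "b3", "b1"),
  cited   = c("r1", "r1", "r2", "r1", "r1", "r1", "r2"),
  stringsAsFactors = FALSE
)

# counts[entity, reference]: number of articles in which the entity cites
# the reference (X cites r1 in two articles, Y cites r1 in three).
counts <- unclass(table(toy_entity_citations$author, toy_entity_citations$cited))

# For each reference, keep the smaller of the two entities' counts.
min_shared <- pmin(counts["X", ], counts["Y", ])
min_shared       # r1 counts twice, r2 counts once
sum(min_shared)  # non-normalized number of shared citations between X and Y
```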

This example uses the Ref_stagflation and the Authors_stagflation data frames.

# merging the references data with the citing author information in Authors_stagflation
entity_citations <- merge(Ref_stagflation, 
                          Authors_stagflation, 
                          by.x = "Citing_ItemID_Ref", 
                          by.y = "ItemID_Ref",
                          allow.cartesian = TRUE) 
# allow.cartesian is needed because articles can have several authors, so the merged 
# result has more rows than either of the two input data frames

coupling_entity(entity_citations, 
                source = "Citing_ItemID_Ref", 
                ref = "ItemID_Ref", 
                entity = "Author.y", 
                method = "coupling_angle")
#>             from           to      weight     Source       Target
#>    1: ALBANESI-S      CHARI-V 0.032897585 ALBANESI-S      CHARI-V
#>    2: ALBANESI-S CHRISTIANO-L 0.025302270 ALBANESI-S CHRISTIANO-L
#>    3: ALBANESI-S       BALL-L 0.024296477 ALBANESI-S       BALL-L
#>    4: ALBANESI-S     MANKIW-G 0.038924947 ALBANESI-S     MANKIW-G
#>    5: ALBANESI-S  ROTEMBERG-J 0.030457245 ALBANESI-S  ROTEMBERG-J
#>   ---                                                            
#> 3461: WILLIAMS-J      YOUNG-W 0.008684168 WILLIAMS-J      YOUNG-W
#> 3462: WILLIAMS-J   WILLIAMS-N 0.014002801 WILLIAMS-J   WILLIAMS-N
#> 3463: WILLIAMS-J        ZHA-T 0.014002801 WILLIAMS-J        ZHA-T
#> 3464: WILLIAMS-N        ZHA-T 0.040000000 WILLIAMS-N        ZHA-T
#> 3465: WOODFORD-M      YOUNG-W 0.020672456 WOODFORD-M      YOUNG-W
#>       Weighting_method
#>    1:   coupling_angle
#>    2:   coupling_angle
#>    3:   coupling_angle
#>    4:   coupling_angle
#>    5:   coupling_angle
#>   ---                 
#> 3461:   coupling_angle
#> 3462:   coupling_angle
#> 3463:   coupling_angle
#> 3464:   coupling_angle
#> 3465:   coupling_angle

A different world: building co-authorship network

Even if the weights of co-authorship links can be calculated using the biblio_coupling() function with authors as source and articles as ref, the method used is not necessarily the most appropriate for co-authorship networks. The coauth_network() function implements different types of methods for calculating the weights linking different authors[1]:

  1. a “full counting” method;
  2. a “fractional counting” method (see Perianes-Rodriguez, Waltman, and Van Eck 2016 for an interesting comparison between full counting and fractional counting results);
  3. a “fractional counting refined” method, inspired by Leydesdorff and Park (2017).

In addition, it is possible to take into account the total number of collaborations of two linked authors, by setting cosine_normalized to TRUE.

This example uses the Authors_stagflation data frame.


full_counting <- coauth_network(Authors_stagflation, 
                                authors = "Author", 
                                articles = "ItemID_Ref", 
                                method = "full_counting")
head(full_counting[order(Source)],10)
#>              from           to weight        Source       Target
#>  1:       CHARI-V   ALBANESI-S      1       CHARI-V   ALBANESI-S
#>  2:  CHRISTIANO-L      CHARI-V      2  CHRISTIANO-L      CHARI-V
#>  3:  CHRISTIANO-L   ALBANESI-S      1  CHRISTIANO-L   ALBANESI-S
#>  4:   CUKIERMAN-A    BRUNNER-K      1   CUKIERMAN-A    BRUNNER-K
#>  5:  EICHENBAUM-M      CHARI-V      1  EICHENBAUM-M      CHARI-V
#>  6:  EICHENBAUM-M CHRISTIANO-L      1  EICHENBAUM-M CHRISTIANO-L
#>  7: EICHENGREEN-B      BORDO-M      1 EICHENGREEN-B      BORDO-M
#>  8:      EUSEPI-S    BULLARD-J      1      EUSEPI-S    BULLARD-J
#>  9:      FARMER-R      BEYER-A      1      FARMER-R      BEYER-A
#> 10:  FITZGERALD-T CHRISTIANO-L      1  FITZGERALD-T CHRISTIANO-L

fractional_counting <- coauth_network(Authors_stagflation, 
                                      authors = "Author", 
                                      articles = "ItemID_Ref", 
                                      method = "fractional_counting")
head(fractional_counting[order(Source)],10)
#>              from           to weight        Source       Target
#>  1:       CHARI-V   ALBANESI-S    0.5       CHARI-V   ALBANESI-S
#>  2:  CHRISTIANO-L      CHARI-V    1.0  CHRISTIANO-L      CHARI-V
#>  3:  CHRISTIANO-L   ALBANESI-S    0.5  CHRISTIANO-L   ALBANESI-S
#>  4:   CUKIERMAN-A    BRUNNER-K    0.5   CUKIERMAN-A    BRUNNER-K
#>  5:  EICHENBAUM-M      CHARI-V    0.5  EICHENBAUM-M      CHARI-V
#>  6:  EICHENBAUM-M CHRISTIANO-L    0.5  EICHENBAUM-M CHRISTIANO-L
#>  7: EICHENGREEN-B      BORDO-M    1.0 EICHENGREEN-B      BORDO-M
#>  8:      EUSEPI-S    BULLARD-J    1.0      EUSEPI-S    BULLARD-J
#>  9:      FARMER-R      BEYER-A    1.0      FARMER-R      BEYER-A
#> 10:  FITZGERALD-T CHRISTIANO-L    1.0  FITZGERALD-T CHRISTIANO-L

fractional_counting_cosine <- coauth_network(Authors_stagflation,
                                             authors = "Author", 
                                             articles = "ItemID_Ref", 
                                             method = "fractional_counting", 
                                             cosine_normalized = TRUE)
head(fractional_counting_cosine[order(Source)],10)
#>              from           to    weight        Source       Target
#>  1:       CHARI-V   ALBANESI-S 0.3535534       CHARI-V   ALBANESI-S
#>  2:  CHRISTIANO-L   ALBANESI-S 0.2500000  CHRISTIANO-L   ALBANESI-S
#>  3:  CHRISTIANO-L      CHARI-V 0.3535534  CHRISTIANO-L      CHARI-V
#>  4:   CUKIERMAN-A    BRUNNER-K 0.3535534   CUKIERMAN-A    BRUNNER-K
#>  5:  EICHENBAUM-M      CHARI-V 0.3535534  EICHENBAUM-M      CHARI-V
#>  6:  EICHENBAUM-M CHRISTIANO-L 0.2500000  EICHENBAUM-M CHRISTIANO-L
#>  7: EICHENGREEN-B      BORDO-M 1.0000000 EICHENGREEN-B      BORDO-M
#>  8:      EUSEPI-S    BULLARD-J 1.0000000      EUSEPI-S    BULLARD-J
#>  9:      FARMER-R      BEYER-A 1.0000000      FARMER-R      BEYER-A
#> 10:  FITZGERALD-T CHRISTIANO-L 0.5000000  FITZGERALD-T CHRISTIANO-L
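The difference between full and fractional counting can be illustrated by hand in base R. This is a sketch under stated assumptions rather than the package's code: toy_authors is hypothetical data, and fractional counting is assumed to give each link of an article with n authors a weight of 1/(n - 1), following Perianes-Rodriguez, Waltman, and Van Eck (2016).

```r
# Hypothetical authorship table: article p1 has three authors, p2 has two.
toy_authors <- data.frame(
  author  = c("ANN", "BOB", "CAT", "ANN", "BOB"),
  article = c("p1", "p1", "p1", "p2", "p2"),
  stringsAsFactors = FALSE
)

# All unordered author pairs of each article, with the article's author count
# (every toy article has at least two authors, as combn() requires).
pairs_per_article <- lapply(
  split(toy_authors$author, toy_authors$article),
  function(a) {
    p <- t(combn(sort(a), 2))
    data.frame(from = p[, 1], to = p[, 2], n_authors = length(a),
               stringsAsFactors = FALSE)
  }
)
pairs <- do.call(rbind, pairs_per_article)

pairs$full       <- 1                          # full counting: 1 per co-authored article
pairs$fractional <- 1 / (pairs$n_authors - 1)  # assumed fractional weight: 1 / (n - 1)

# Sum the contributions of all articles for each pair of authors.
weights <- aggregate(cbind(full, fractional) ~ from + to, data = pairs, FUN = sum)
weights
```

ANN and BOB co-author both articles, so their full-counting weight is 2, while their fractional weight is 0.5 (from p1) plus 1 (from p2).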

Incorporated data

The biblionetwork package contains bibliometric data built by Goutsmedt (2021). These data gather the academic articles and books that endeavoured to explain the United States stagflation of the 1970s, published between 1975 and 2013. They also gather all the references cited by these articles and books on stagflation. The Nodes_stagflation file contains information about the academic articles and books on stagflation (the stagflation documents), as well as about the references cited by at least two of these stagflation documents. Ref_stagflation is a data frame of direct citations, with the identifiers of citing documents and the identifiers of cited documents. Authors_stagflation is a data frame listing the documents explaining the US stagflation and all the authors of these documents (Nodes_stagflation only keeps the first author of each document).

References

Goutsmedt, Aurélien. 2021. “From the Stagflation to the Great Inflation: Explaining the US Economy of the 1970s.” Revue d’Economie Politique Forthcoming. https://mega.nz/file/zfJ2QBbb#3OqXBIQRYmuQzptMyfvwW92IXhN-pWApKpILSs_w-pg.
Leydesdorff, Loet, and Han Woo Park. 2017. “Full and Fractional Counting in Bibliometric Networks.” Journal of Informetrics 11 (1): 117–20. https://linkinghub.elsevier.com/retrieve/pii/S1751157716303133.
Perianes-Rodriguez, Antonio, Ludo Waltman, and Nees Jan Van Eck. 2016. “Constructing Bibliometric Networks: A Comparison Between Full and Fractional Counting.” Journal of Informetrics 10 (4): 1178–95. https://www.sciencedirect.com/science/article/pii/S1751157716302036.
Sen, Subir K., and Shymal K. Gan. 1983. “A Mathematical Extension of the Idea of Bibliographic Coupling and Its Applications.” Annals of Library Science and Documentation 30 (2). http://nopr.niscair.res.in/bitstream/123456789/28008/1/ALIS%2030(2)%2078-82.pdf.
Shen, Si, Danhao Zhu, Ronald Rousseau, Xinning Su, and Dongbo Wang. 2019. “A Refined Method for Computing Bibliographic Coupling Strengths.” Journal of Informetrics 13 (2): 605–15. https://linkinghub.elsevier.com/retrieve/pii/S1751157716300244.
Vladutz, George, and James Cook. 1984. “Bibliographic Coupling and Subject Relatedness.” Proceedings of the American Society for Information Science 21: 204–7.
Zhao, Dangzhi, and Andreas Strotmann. 2008. “Author Bibliographic Coupling: Another Approach to Citation-Based Author Knowledge Network Analysis.” Proceedings of the American Society for Information Science and Technology 45 (1): 1–10. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/meet.2008.1450450292.

  [1] I take authors as an example here, but the function could also be used for calculating a co-authorship network with institutions or countries as nodes.