This package has two main categories of functionality:

  • Extracting R package author information: Parse the DESCRIPTION file of R packages, or the equivalent data provided as a data.frame by the pkgsearch package, to extract authors from the Author or Authors@R fields. These features are only useful in the context of the R package ecosystem.
  • Cleaning and deduplicating names: Clean and deduplicate a list of names. This can be applied to the R package author names extracted in the previous step, but it applies equally to any other task that requires cleaning and deduplicating a list of names.

Extracting R package author information

R package authors can be specified in two ways in the DESCRIPTION file.

Extracting R package author information from the Author field

The authors can be listed in the Author field, as free text. However, this method is now actively discouraged by CRAN and by many R user communities, such as rOpenSci.

For example:

Author: Ada Lovelace and Charles Babbage

Because this is free text, it can be formatted in many different ways, which makes it hard to extract the names programmatically. The parse_authors() function provided by the package is designed to split the text at common delimiters and strip common extra words.

parse_authors("Ada Lovelace and Charles Babbage")
#> [1] "Ada Lovelace"    "Charles Babbage"
parse_authors("Ada Lovelace, Charles Babbage")
#> [1] "Ada Lovelace"    "Charles Babbage"
parse_authors("Ada Lovelace with contributions from Charles Babbage")
#> [1] "Ada Lovelace"    "Charles Babbage"
parse_authors("Ada Lovelace, Charles Babbage, et al.")
#> [1] "Ada Lovelace"    "Charles Babbage"
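
To illustrate why free-text parsing is delicate, here is a minimal base R sketch of delimiter-based splitting. The helper naive_split_authors() is hypothetical and is not part of the package. It only handles the four patterns shown above, and parse_authors() may well work differently and cover more cases.

```r
# Hypothetical helper (not the package's implementation): a naive splitter
# that handles only commas, " and ", " with contributions from ", and a
# trailing "et al." marker.
naive_split_authors <- function(x) {
  # Drop a trailing "et al." marker, together with the comma before it
  x <- sub(",?\\s*et al\\.?\\s*$", "", x, perl = TRUE)
  # Split on the delimiters listed above
  delimiters <- "\\s*,\\s*|\\s+and\\s+|\\s+with contributions from\\s+"
  parts <- strsplit(x, delimiters, perl = TRUE)[[1]]
  # Trim whitespace and drop empty pieces
  parts <- trimws(parts)
  parts[nzchar(parts)]
}

naive_split_authors("Ada Lovelace, Charles Babbage, et al.")
#> [1] "Ada Lovelace"    "Charles Babbage"
```

A sketch like this breaks as soon as a new phrasing appears in the field, which is the main argument for preferring the structured Authors@R field.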

Extracting R package author information from the Authors@R field

The authors can also be listed in the Authors@R field, as a string containing R code that generates a vector of person objects. This is the modern, recommended way to specify authors in the DESCRIPTION file.

Authors@R: c(
  person("Ada Lovelace", role = c("aut", "cre"), email = "ada@email.com"),
  person("Charles Babbage", role = "aut")
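)

The content of this field can be passed as a string to parse_authors_r(), which returns a vector of person objects:

auts <- parse_authors_r("c(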
  person('Ada Lovelace', role = c('aut', 'cre'), email = 'ada@email.com'),
  person('Charles Babbage', role = 'aut')
)")

class(auts)
#> [1] "person"

str(auts)
#> List of 2
#>  $ :Class 'person'  hidden list of 1
#>   ..$ :List of 5
#>   .. ..$ given  : chr "Ada Lovelace"
#>   .. ..$ family : NULL
#>   .. ..$ role   : chr [1:2] "aut" "cre"
#>   .. ..$ email  : chr "ada@email.com"
#>   .. ..$ comment: NULL
#>  $ :Class 'person'  hidden list of 1
#>   ..$ :List of 5
#>   .. ..$ given  : chr "Charles Babbage"
#>   .. ..$ family : NULL
#>   .. ..$ role   : chr "aut"
#>   .. ..$ email  : NULL
#>   .. ..$ comment: NULL
#>  - attr(*, "class")= chr "person"

print(auts)
#> [1] "Ada Lovelace <ada@email.com> [aut, cre]"
#> [2] "Charles Babbage [aut]"

If we only want the names, we can use the format() method for person objects (format.person(), from the utils package shipped with R):

format(auts, include = c("given", "family"))
#> [1] "Ada Lovelace"    "Charles Babbage"
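
When starting from a DESCRIPTION file on disk rather than from a string, the relevant field first has to be read. The following is a sketch using base R only: the temporary file and its content are made up for illustration, and eval(parse()) merely stands in for parse_authors_r(), whose internals are not described here. Evaluating code from a DESCRIPTION file is unsafe for untrusted input.

```r
# Write a small, made-up DESCRIPTION file to a temporary location
desc_file <- tempfile()
writeLines(c(
  "Package: examplepkg",
  "Authors@R: c(",
  "    person('Ada Lovelace', role = c('aut', 'cre'), email = 'ada@email.com'),",
  "    person('Charles Babbage', role = 'aut')",
  "  )"
), desc_file)

# read.dcf() returns a character matrix with one column per requested field;
# fields that are absent from the file are NA
fields <- read.dcf(desc_file, fields = c("Author", "Authors@R"))
authors_at_r <- fields[1, "Authors@R"]

# Stand-in for parse_authors_r(): evaluate the R code stored in the field.
# Only do this with files you trust.
auts_from_file <- eval(parse(text = authors_at_r))

format(auts_from_file, include = c("given", "family"))
#> [1] "Ada Lovelace"    "Charles Babbage"
```

If the Authors@R column is NA, the Author column holds the free-text alternative described earlier.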

Cleaning and deduplicating author names

We can now take the list of authors extracted from the previous step, or an independently gathered list of names, and clean and deduplicate it.

Harmonize differently abbreviated names

The expand_names() function can be used to expand differently abbreviated names to a common form, passed in the expanded argument:

expand_names(c("Ada Lovelace", "A Lovelace"), expanded = "Ada Lovelace")
#> [1] "Ada Lovelace" "Ada Lovelace"

A common pattern, however, is to pass the vector being cleaned as its own expanded argument. This harmonizes each name to the longest matching form found in the vector, even if you do not know the full names of all authors in advance:

my_names <- c("Ada Lovelace", "A Lovelace", "Charles Babbage")
expand_names(my_names, my_names)
#> [1] "Ada Lovelace"    "Ada Lovelace"    "Charles Babbage"
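
To make the idea of "longest form in the vector" concrete, here is a rough base R sketch. The helper naive_expand_names() is hypothetical, and its matching rule (same first initial and same last word) is an assumption made for illustration. The rules actually used by expand_names() are not described above and may differ.

```r
# Hypothetical helper (not the package's implementation): names are treated
# as the same person when they share a first initial and a last word.
# This would wrongly merge, say, "Ada Lovelace" and "Anna Lovelace".
naive_expand_names <- function(x, expanded = x) {
  # Build a matching key such as "A Lovelace" from a full or abbreviated name
  initial_key <- function(name) {
    parts <- strsplit(trimws(name), "\\s+", perl = TRUE)[[1]]
    paste(toupper(substr(parts[1], 1, 1)), parts[length(parts)])
  }
  x_keys <- vapply(x, initial_key, character(1), USE.NAMES = FALSE)
  expanded_keys <- vapply(expanded, initial_key, character(1), USE.NAMES = FALSE)

  vapply(seq_along(x), function(i) {
    candidates <- expanded[expanded_keys == x_keys[i]]
    # Leave the name untouched when nothing matches
    if (length(candidates) == 0) return(x[i])
    # Otherwise keep the longest matching form
    candidates[which.max(nchar(candidates))]
  }, character(1))
}

my_names <- c("Ada Lovelace", "A Lovelace", "Charles Babbage")
naive_expand_names(my_names)
#> [1] "Ada Lovelace"    "Ada Lovelace"    "Charles Babbage"
```

The risk of merging distinct people who share an initial and a surname is inherent to this kind of harmonization, so the result is worth checking on real data.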