This package has two main categories of functionality:
- Extracting R package author information: Parse the DESCRIPTION file of R packages, or the equivalent data provided as a data.frame by the pkgsearch package, to extract authors from the Author or Authors@R fields. These features are only useful in the context of the R package ecosystem.
- Cleaning and deduplicating names: Clean and deduplicate a list of names. It can be applied to the list of R package author names extracted in the previous step, but it also applies directly to other, more generic tasks where one needs to clean and deduplicate a list of names.
Extracting R package author information
R package authors can be specified in two ways in the DESCRIPTION file.
Extracting R package author information from the Author field
The authors can be listed in the Author field as free text. However, this method is now actively discouraged by CRAN and by many R user communities, such as rOpenSci.
For example, the DESCRIPTION file could contain:
Author: Ada Lovelace and Charles Babbage
Because this is free text, it can be formatted in many different ways, and it is hard to extract the names programmatically. The parse_authors() function provided by the package is designed to split the field at common delimiters and remove common extra words.
parse_authors("Ada Lovelace and Charles Babbage")
#> [1] "Ada Lovelace" "Charles Babbage"
parse_authors("Ada Lovelace, Charles Babbage")
#> [1] "Ada Lovelace" "Charles Babbage"
parse_authors("Ada Lovelace with contributions from Charles Babbage")
#> [1] "Ada Lovelace" "Charles Babbage"
parse_authors("Ada Lovelace, Charles Babbage, et al.")
#> [1] "Ada Lovelace" "Charles Babbage"
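To make the idea of "splitting at common delimiters" concrete, here is a minimal base R sketch. It is not the package's implementation: split_author_field() is a hypothetical helper, and the delimiters it handles (commas, "&", "and", "with contributions from", a trailing "et al.") are assumptions chosen to cover the examples above.

```r
# Hypothetical helper for illustration only; parse_authors() itself may
# work differently and handle many more cases.
split_author_field <- function(x) {
  # Remove an "et al." marker (assumed to carry no author name)
  x <- gsub(",?\\s*\\bet al\\.?", "", x, perl = TRUE)
  # Split at commas, "&", "and", and "with contribution(s) from"
  delimiters <- "\\s*(,|&|\\band\\b|\\bwith contributions? from\\b)\\s*"
  parts <- strsplit(x, delimiters, perl = TRUE)[[1]]
  # Trim whitespace and drop empty pieces left by adjacent delimiters
  parts <- trimws(parts)
  parts[nzchar(parts)]
}

split_author_field("Ada Lovelace with contributions from Charles Babbage")
split_author_field("Ada Lovelace, Charles Babbage, et al.")
```

A regular expression like this breaks down quickly on real-world Author fields (affiliations, role annotations, names containing commas), which is why a dedicated function is useful.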
Extracting R package author information from the Authors@R field
The authors can also be listed in the Authors@R field, as a string containing R code that generates a vector of person objects. This is the modern, recommended way to specify authors in the DESCRIPTION file.
Authors@R: c(
person("Ada Lovelace", role = c("aut", "cre"), email = "ada@email.com"),
person("Charles Babbage", role = "aut")
)
auts <- parse_authors_r("c(
person('Ada Lovelace', role = c('aut', 'cre'), email = 'ada@email.com'),
person('Charles Babbage', role = 'aut')
)")
class(auts)
#> [1] "person"
str(auts)
#> List of 2
#> $ :Class 'person' hidden list of 1
#> ..$ :List of 5
#> .. ..$ given : chr "Ada Lovelace"
#> .. ..$ family : NULL
#> .. ..$ role : chr [1:2] "aut" "cre"
#> .. ..$ email : chr "ada@email.com"
#> .. ..$ comment: NULL
#> $ :Class 'person' hidden list of 1
#> ..$ :List of 5
#> .. ..$ given : chr "Charles Babbage"
#> .. ..$ family : NULL
#> .. ..$ role : chr "aut"
#> .. ..$ email : NULL
#> .. ..$ comment: NULL
#> - attr(*, "class")= chr "person"
print(auts)
#> [1] "Ada Lovelace <ada@email.com> [aut, cre]"
#> [2] "Charles Babbage [aut]"
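For readers curious about what happens to such a field, the following sketch reproduces the general idea with base R only: read the field from a DESCRIPTION file with read.dcf(), then evaluate the R code it contains. The temporary file, the package name examplepkg, and the variable names are made up for illustration, and this is not necessarily how parse_authors_r() is implemented.

```r
# Write a small, made-up DESCRIPTION file to a temporary location
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "Authors@R: c(",
  "    person('Ada Lovelace', role = c('aut', 'cre'), email = 'ada@email.com'),",
  "    person('Charles Babbage', role = 'aut')",
  "  )"
), desc_path)

# read.dcf() returns a character matrix with one column per requested field
authors_field <- read.dcf(desc_path, fields = "Authors@R")[1, "Authors@R"]

# The field holds R code as text, so it has to be parsed and evaluated.
# Caution: eval() runs arbitrary code; only do this with trusted input.
auts_manual <- eval(parse(text = authors_field))

class(auts_manual)
length(auts_manual)
```

Because the field is evaluated as code, a dedicated parsing function is a sensible place to centralize error handling for malformed or untrusted input.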
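If we only want the names, we can call format(), which dispatches to the format.person() method that ships with R (in the utils package), and restrict its include argument to the name fields.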
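A short sketch of this step, building the person vector directly with utils::person() so that it runs on its own (the variable names are illustrative):

```r
# Same authors as in the example above, created directly as person objects
auts <- c(
  person("Ada Lovelace", role = c("aut", "cre"), email = "ada@email.com"),
  person("Charles Babbage", role = "aut")
)

# Keep only the name fields; email, role and comment are left out
author_names <- format(auts, include = c("given", "family"))
author_names
```

The result is a plain character vector, which is a convenient input for the cleaning and deduplication functions described next.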
Cleaning and deduplicating author names
We can now take the list of authors extracted from the previous step, or an independently gathered list of names, and clean and deduplicate it.
Harmonize differently abbreviated names
The expand_names() function can be used to expand differently abbreviated names to a common form, passed in the expanded argument:
expand_names(c("Ada Lovelace", "A Lovelace"), expanded = "Ada Lovelace")
#> [1] "Ada Lovelace" "Ada Lovelace"
However, a common pattern is to pass the vector being cleaned as the expanded argument as well. This way, you can harmonize names to their longest form in the vector, even if you do not know the full name of all authors in advance:
my_names <- c("Ada Lovelace", "A Lovelace", "Charles Babbage")
expand_names(my_names, my_names)
#> [1] "Ada Lovelace" "Ada Lovelace" "Charles Babbage"
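To illustrate what "harmonizing to the longest form" can mean, here is a deliberately naive base R sketch. It is not the algorithm used by expand_names(): expand_names_sketch() is a hypothetical function, and its matching rule (same first initial and same last word, ignoring case) is an assumption made for this example.

```r
# Hypothetical, simplified stand-in for illustration only
expand_names_sketch <- function(x, expanded) {
  first_initial <- function(n) toupper(substr(n, 1, 1))
  # Everything after the last whitespace character, lower-cased
  last_word <- function(n) tolower(sub(".*\\s", "", n))

  vapply(x, function(name) {
    # Candidates share the first initial and the last word with `name`
    candidates <- expanded[
      first_initial(expanded) == first_initial(name) &
        last_word(expanded) == last_word(name)
    ]
    # No candidate: keep the name unchanged
    if (length(candidates) == 0) return(name)
    # Otherwise pick the longest candidate
    candidates[which.max(nchar(candidates))]
  }, character(1), USE.NAMES = FALSE)
}

my_names <- c("Ada Lovelace", "A Lovelace", "Charles Babbage")
expand_names_sketch(my_names, my_names)
```

A rule this simple would wrongly merge different people who share an initial and a surname, so it only conveys the general idea.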