Formatting species names in a column in R



I am working with quite a large database containing a column called 'Species_name'. This is a factor column that includes the names of around 40 different species. As R is case sensitive (which matters when plotting graphs, for example), I was wondering whether it is possible to write a line of code that formats all the species names in this column as a capital letter followed by lower case, e.g. Brown crab, Blonde ray, etc.

Apologies for my ignorance - I am new to R!

Many thanks!


Posted 2014-04-25T09:38:21.107




You first need to define a function that converts character values to the case you want. R has the built-in functions tolower and toupper, but no built-in function that capitalizes strings the way you want.

capitalize <- function(x){
  first <- toupper(substr(x, start=1, stop=1)) ## capitalize first letter
  rest <- tolower(substr(x, start=2, stop=nchar(x)))   ## everything else lowercase
  paste0(first, rest)
}

Then apply the function only to the levels of your factor variable. That's one advantage of factors: you transform the handful of distinct levels rather than every row of the column.

levels(data$Species_name) <- capitalize(levels(data$Species_name))
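
As a rough illustration of the two steps above, here is a minimal sketch on a toy data frame. The data values are made up for the example; only the column name Species_name comes from the question. It also shows that spellings differing only in case collapse into a single level once the levels are rewritten.

```r
# Hypothetical toy data; only the column name comes from the question.
capitalize <- function(x) {
  first <- toupper(substr(x, start = 1, stop = 1))         # first letter upper case
  rest  <- tolower(substr(x, start = 2, stop = nchar(x)))  # remainder lower case
  paste0(first, rest)
}

data <- data.frame(
  Species_name = factor(c("brown crab", "BLONDE RAY", "Brown Crab", "blonde ray"))
)

n_before <- nlevels(data$Species_name)  # four distinct spellings

# Rewrite the levels only; duplicated results are merged into one level.
levels(data$Species_name) <- capitalize(levels(data$Species_name))

n_after <- nlevels(data$Species_name)   # two levels remain
print(levels(data$Species_name))
```

Because the assignment touches only the levels, the row order and the number of rows in the data frame are unchanged.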




Brilliant - Works perfectly! Thank you :) – user3489562 – 2014-04-25T10:48:19.653


Use functions from the stringi package:

x <- "alA Ma KOTA 123"
## [1] "Ala ma kota 123"

I think it is worth mentioning that there is also a function that transforms a string to Title Case, though not in the way you are looking for:

## [1] "Ala Ma Kota 123"
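
The function calls that produced the two outputs above are not shown in this answer. The sketch below is an assumption about what they may have been: it uses stringi's stri_trans_totitle, once with type = "sentence" and once with its default word-by-word behaviour. Treat it as one plausible way to get those results, not as the answerer's exact code. The requireNamespace guard is only there so the snippet does nothing if stringi is not installed.

```r
# Sketch only: these calls are a guess at what produced the outputs shown above.
if (requireNamespace("stringi", quietly = TRUE)) {
  x <- "alA Ma KOTA 123"

  # Capitalize the first letter of each sentence, lower-case the rest.
  sentence_case <- stringi::stri_trans_totitle(x, type = "sentence")

  # Default behaviour: capitalize the first letter of every word.
  title_case <- stringi::stri_trans_totitle(x)

  print(sentence_case)
  print(title_case)
}
```

For the factor column in the question, the same call could be applied to levels(data$Species_name) instead of x.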





levels(df$Species_name) <- gsub("^([a-z])", "\\U\\1", tolower(levels(df$Species_name)), perl = TRUE)


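First, make all names lower case using tolower, then capitalize the first letter using gsub.

^([a-z]) matches a lowercase letter at the very start of the string and captures it, while \\U\\1 replaces that captured letter with its upper-case form. \\U is Perl-style replacement syntax, hence perl = TRUE.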
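
To make the two steps concrete, here is a small sketch on a plain character vector. The example strings are hypothetical. It uses sub rather than gsub, which is enough here because the pattern is anchored to the start of the string and can match at most once.

```r
# Hypothetical example strings, not taken from the question's data.
x <- c("BROWN CRAB", "blonde ray", "Thornback Ray", "3-spined stickleback")

# Step 1: lower-case everything.
lowered <- tolower(x)

# Step 2: upper-case the leading a-z letter, if there is one.
result <- sub("^([a-z])", "\\U\\1", lowered, perl = TRUE)

print(result)
```

Strings that do not start with a letter in a-z, such as the last one, come back unchanged apart from the lower-casing, because the pattern finds nothing to replace.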

David Arenburg

