The primary R functions for working with regular expressions are grep(), grepl(), regexpr(), gregexpr(), sub(), and gsub().
- grep() and grepl() search a character vector for matches of a regular expression (pattern). grep() returns the indices of the elements that contain a match or, with value = TRUE, the matching strings themselves. grepl() returns a TRUE/FALSE vector indicating which elements of the character vector contain a match.
- regexpr() and gregexpr() search a character vector for regular expression matches and return, for each element, the position within the string where the match begins and the length of the match. regexpr() reports only the first match in each string; gregexpr() reports all of them.
- sub() and gsub() search a character vector for regular expression matches and replace the matched text with another string. sub() replaces only the first match in each string; gsub() replaces every match.
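As a minimal sketch of how these return values differ, the following applies each function to a small character vector. The vector x and its contents are made up for illustration and are not part of the homicides data.

```r
# Hypothetical example vector (not from the homicides data)
x <- c("apple pie", "banana", "grape and apple", "cherry")

grep("apple", x)                 # indices of matching elements: 1 3
grep("apple", x, value = TRUE)   # the matching strings themselves
grepl("apple", x)                # TRUE FALSE TRUE FALSE

regexpr("a", x)                  # start and length of the first match in each element
gregexpr("a", x)                 # list giving the start and length of every match

sub("a", "A", x)                 # replaces only the first match in each element
gsub("a", "A", x)                # replaces every match in each element
```

Elements with no match are reported as -1 by regexpr() and gregexpr(), and are returned unchanged by sub() and gsub().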
For example, the following code uses grep() with value = TRUE to return the elements of homicides that contain "2011", that is, the homicides that occurred in 2011:
> homicides_2011 <- grep("2011", homicides, value = TRUE)
> length(homicides_2011)
[1] 233
This code uses regexpr() to find, for each string in homicides, the position at which the word "shooting" first appears (-1 if it does not appear):
> indices <- regexpr("shooting", homicides)
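The result of regexpr() can be read as follows. This is a small sketch on made-up strings; the vector records is hypothetical and only stands in for elements of homicides.

```r
# Hypothetical records standing in for elements of homicides
records <- c("Cause: shooting", "Cause: stabbing", "A shooting; second shooting")

m <- regexpr("shooting", records)
as.integer(m)               # 8 -1 3: start of the first match, -1 where there is none
attr(m, "match.length")     # 8 -1 8: length of each match
which(m != -1)              # 1 3: the elements that contain a match
regmatches(records, m)      # the matched substrings, for matching elements only
```

If only the elements containing a match are needed, grep() or grepl() gives them directly; regexpr() is more useful when the location of the match within each string matters.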
This code uses gsub() to replace all of the HTML tags in homicides with the empty string:
> homicides <- gsub("<[^>]*>", "", homicides)
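The choice of gsub() rather than sub() matters when the goal is to remove every tag. A brief sketch on a made-up string (the variable html is hypothetical, not taken from the homicides data):

```r
# Hypothetical string with HTML markup (not from the homicides data)
html <- "<dd>Cause: <b>shooting</b></dd>"

first_only <- sub("<[^>]*>", "", html)    # removes only the first tag
all_tags   <- gsub("<[^>]*>", "", html)   # removes every tag

first_only   # "Cause: <b>shooting</b></dd>"
all_tags     # "Cause: shooting"
```

The pattern <[^>]*> matches a "<", then any run of characters other than ">", then the closing ">", so each tag is matched separately.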
These are only a few of the R functions that work with regular expressions. For details on the pattern syntax and on the functions above, see the R help pages ?regex and ?grep.