Here are the common dplyr function properties:
- The first argument is a data frame. This is the data that the function will be operating on.
- The subsequent arguments describe what to do with the data frame. These arguments can be used to filter rows, select columns, mutate columns, group data, and summarize data.
- The result of a function is a new data frame; the input data frame is not modified. This new data frame contains the results of the function's operations.
- Data frames must be properly formatted and annotated for all of this to be useful. In particular, the data should be tidy: each row represents a single observation and each column represents a single feature or characteristic of that observation.
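
As a minimal sketch of these properties, the snippet below calls `filter()` on a small made-up data frame. The `people` data frame and its values are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical example data: one row per person, one column per characteristic
people <- data.frame(
  name = c("Ana", "Ben", "Cara"),
  age = c(17, 25, 42)
)

# First argument: the data frame. Second argument: what to do with it.
adults <- filter(people, age > 18)

# The result is a new data frame; the input is left unchanged
is.data.frame(adults)
nrow(people)
nrow(adults)
```

Here `people` still has all three rows after the call, while `adults` holds only the rows that passed the condition.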
Examples of dplyr functions:
- `filter()`: This function keeps the rows of a data frame that satisfy a condition. For example, you could use `filter()` to select all rows where the `age` column is greater than 18.
- `select()`: This function is used to select columns from a data frame. For example, you could use `select()` to select the `name`, `age`, and `gender` columns from a data frame.
- `mutate()`: This function is used to create new columns in a data frame or to modify existing ones. For example, you could use `mutate()` to create a new column called `height_in_cm` that is the height of each person in centimeters.
- `group_by()`: This function is used to group data by the values of one or more columns, so that later operations are carried out separately for each group. For example, you could use `group_by()` to group data by the `gender` column.
- `summarize()`: This function is used to summarize data, reducing each group (or the whole data frame, if it is ungrouped) to a single row. For example, you could use `summarize()` to calculate the average height of all people in a data frame.
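
The examples above can be sketched in code roughly as follows. The `people` data frame, its values, and the `height_in_m` column (used here as the source for `height_in_cm`) are assumptions made up for illustration; they are not part of any real data set.

```r
library(dplyr)

# Hypothetical example data; height_in_m is an assumed source column
people <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dev"),
  age = c(17, 25, 42, 31),
  gender = c("F", "M", "F", "M"),
  height_in_m = c(1.60, 1.80, 1.70, 1.75)
)

# filter(): keep rows where age is greater than 18
adults <- filter(people, age > 18)

# select(): keep only the name, age, and gender columns
basics <- select(people, name, age, gender)

# mutate(): add a height_in_cm column computed from height_in_m
with_cm <- mutate(people, height_in_cm = height_in_m * 100)

# group_by(): group rows by gender
by_gender <- group_by(with_cm, gender)

# summarize(): average height within each gender group
avg_height <- summarize(by_gender, mean_height_cm = mean(height_in_cm))
```

Because `by_gender` is grouped, `avg_height` contains one row per value of `gender` rather than a single overall average. Calling `summarize()` on the ungrouped `with_cm` would instead give one row for the whole data frame.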