Getting the total of all rows but excluding certain columns Rstudio - r

I need to get the total of each row within my table however I want to exclude certain columns as these contain numeric data such as plot numbers or treatments that I don't want to be counted.
I have tried using mutate and rowsums for this but it doesn't seem to work and I get this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
mutate(total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Following a comment here is my updated code:
df<- mutate(total = rowSums(select(Flower, !c(Ranunculus.repens, Ranunculus.acris, Ranunculus.ficaria, Trifolium.repens, Geranium.molle, Cardamine.flexuosa, Bellis.perennis, Taraxacum.officinalis, Lamium.purpureum, Glechoma.hederacea, Cardamine.pratensis, Medicago.lupulina, Medicago.arabica, Cerastium.fontanum, Prunella.vulgaris, Sonchus.arvensis, Veronica.persica, Veronica.chamaedrys, Viola.riviniana)), na.rm = TRUE))
I am now getting an error message saying that X must be numeric however after checking all of the columns entered are numeric.

The issue is that the first argument of mutate has to be a dataframe while you try to apply mutate on total which is a numeric. To make your code work you have to do:
library(dplyr)
mutate(Flower, total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Using mtcars as example data:
library(dplyr)
mtcars |>
mutate(total = rowSums(select(mtcars, !c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
Another option would be to use dplyr::across like so:
mtcars |>
mutate(total = rowSums(across(!c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11

Related

How to make a list with a variable name?

I'm using the mtcars dataset for this example.
I have a function which creates a named list using a variable:
make_list <- function(df, variable_name) {
a <- df %>%
list(variable_name = .)
return(a)
}
When I use this function:
mylist <- make_list(mtcars, "car_info")
head(mylist)
$variable_name
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
The list name is called variable_name, rather than car_info.
How do I change the function (but still use a pipe format) so that the correct name is returned?
If you want to continue using the pipe, you can use setNames:
make_list <- function(df, variable_name) {
df %>%
list%>%
setNames(variable_name)
}
make_list(mtcars, "car_info")
Output:
$car_info
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
make_list <- function(df, variable_name) {
a <- df %>% list
names(a) <- variable_name
return(a)
}
Try this:
make_list <- function(df, variable_name) {
a <- df %>%
list()
names(a) <- variable_name
return(a)
}
mylist <- make_list(mtcars, "car_info")
Output (Some rows):
mylist
$car_info
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
rlang has a list2 function that does that
make_list <- function(df, variable_name) {
rlang::list2(!! variable_name := df)
}
make_list(mtcars, "car_info")
#> $car_info
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Or tibble::lst works the same: make_list <- function(df, variable_name) tibble::lst(!! variable_name := df)

Using select=-c() in subset function gives error: invalid argument to unary operator

I have a matrix and I would like to eliminate two columns by their names.
My code was :
trn_data = subset(trn_data, select = -c("Rye flour","Barley products"))
but R gave me an error message like this:
Error in -c("Rye flour", "Barley products") :
invalid argument to unary operator
I tried this
trn_data = subset(trn_data, select = -c(Rye flour,Barley products))
Also returning an error:
Error: unexpected symbol in "trn_data=subset(trn_data,select =-c(Rye flour"
How can I fix this? Is there any other method that can eliminate two columns by their names?
You should not provide the names as characters to subset. This works:
trn_data_subset <- subset(trn_data, select = -c(`Rye flour`,`Barley products`))
If you have spaces in the name of columns, you should use Grave Accent.
Here's an example using mtcars dataset:
mtexapmple <- mtcars[1:4,]
names(mtexapmple)[1] <- "mpg with space"
mtexapmple
#> mpg with space cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
subset(mtexapmple, select = -c(`mpg with space`, cyl))
#> disp hp drat wt qsec vs am gear carb
#> Mazda RX4 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 258 110 3.08 3.215 19.44 1 0 3 1
You can also do it like these:
within(trn_data, rm(`Rye flour`,`Barley products`))
or
trn_data[, !(colnames(trn_data) %in% c("Rye flour","Barley products"))]
With dplyr, we can still use - with double quote
library(dplyr)
mtexample %>%
select(-"mpg with space")
# cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
data
mtexample <- mtcars[1:4,]
names(mtexample)[1] <- "mpg with space"

Why using get to access variables in Global env without specify environment explicitly works for dplyr 0.5.0 but not for 0.6.0?

I asked a question about accessing variable in global environment with the same name as one of the column name in dplyr functions. One solution I received was using get. However, new matter arises. Without specify environment explicitly, the result is different (see the following), Could someone explain why?
# dplyr 0.6.0
mpg <- 21
mtcars %>% filter(mpg == get("mpg"))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> ...
# dplyr 0.5.0
mtcars %>% filter(mpg == get("mpg"))
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160 110 3.9 2.875 17.02 0 1 4 4

How to dplyr rename a column, by column index?

The following code renames first column in the data set:
require(dplyr)
mtcars %>%
setNames(c("RenamedColumn", names(.)[2:length(names(.))]))
Desired results:
RenamedColumn cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Would it be possible to arrive at the same result using rename and column index?
This:
mtcars %>%
rename(1 = "ChangedNameAgain")
will fail:
Error in source("~/.active-rstudio-document", echo = TRUE) :
~/.active-rstudio-document:7:14: unexpected '='
6: mtcars %>%
7: rename(1 =
^
Similarly trying to use rename_ or .[[1]] as column reference will return an error.
As of dplyr 0.7.5, rlang 0.2.1, tidyselect 0.2.4, this simply works:
library(dplyr)
rename(mtcars, ChangedNameAgain = 1)
# ChangedNameAgain cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# ...
Original answer and edits now obsolete:
The logic of rename() is new_name = old_name, so ChangedNameAgain = 1 would make more sense than 1 = ChangedNameAgain.
I would suggest:
mtcars %>% rename_(ChangedNameAgain = names(.)[1])
# ChangedNameAgain cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Edit
I have yet to wrap my head around the new dplyr programming system based on rlang, since versions 0.6/0.7 of dplyr.
The underscore-suffixed version of rename used in my initial answer is now deprecated, and per #jzadra's comment, it didn't work anyway with syntactically problematic names like "foo bar".
Here is my attempt with the new rlang-based Non Standard Evaluation system. Do not hesitate to tell me what I've done wrong, in the comments:
df <- tibble("foo" = 1:2, "bar baz" = letters[1:2])
# # A tibble: 2 x 2
# foo `bar baz`
# <int> <chr>
# 1 1 a
# 2 2 b
First I try directly with rename() but unfortunately I've got an error. It seems to be a FIXME (or is this FIXME unrelated?) in the source code (I'm using dplyr 0.7.4), so it could work in the future:
df %>% rename(qux = !! quo(names(.)[[2]]))
# Error: Expressions are currently not supported in `rename()`
(Edit: the error message now (dplyr 0.7.5) reads Error in UseMethod("rename_") : no applicable method for 'rename_' applied to an object of class "function")
(Update 2018-06-14: df %>% rename(qux = !! quo(names(.)[[2]])) now seems to work, still with dplyr 0.7.5, not sure if an underlying package changed).
Here is a workaround with select that works. It doesn't preserve column order like rename though:
df %>% select(qux = !! quo(names(.)[[2]]), everything())
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
And if we want to put it in a function, we'd have to slightly modify it with := to allow unquoting on the left hand side. If we want to be robust to inputs like strings and bare variable names, we have to use the "dark magic" (or so says the vignette) of enquo() and quo_name() (honestly I don't fully understand what it does):
rename_col_by_position <- function(df, position, new_name) {
new_name <- enquo(new_name)
new_name <- quo_name(new_name)
select(df, !! new_name := !! quo(names(df)[[position]]), everything())
}
This works with new name as a string:
rename_col_by_position(df, 2, "qux")
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
This works with new name as a quosure:
rename_col_by_position(df, 2, quo(qux))
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
This works with new name as a bare name:
rename_col_by_position(df, 2, qux)
# # A tibble: 2 x 2
# qux foo
# <chr> <int>
# 1 a 1
# 2 b 2
And even this works:
rename_col_by_position(df, 2, `qux quux`)
# # A tibble: 2 x 2
# `qux quux` foo
# <chr> <int>
# 1 a 1
# 2 b 2
Here's a couple of alternative solutions that are arguably easier to read because they are not focused around the . reference. select understands column indices, so if you're renaming the first column, you can simply do
mtcars %>% select( RenamedColumn = 1, everything() )
However, the issue with using select is that it will reorder columns if you're renaming a column in the middle. To get around the issue, you have to pre-select the columns to the left of the one you're renaming:
## This will rename the 7th column without changing column order
mtcars %>% select( 1:6, RenamedColumn = 7, everything() )
Another option is to use the new rename_at, which also understand column indices:
## This will also rename the 7th column without changing the order
## Credit for simplifying the second argument: Moody_Mudskipper
mtcars %>% rename_at( 7, ~"RenamedColumn" )
The ~ is needed because rename_at is quite flexible and can accept functions as its second argument. For example, mtcars %>% rename_at( c(2,4), toupper ) will make the names of the second and fourth columns uppercase.
dplyr has superceded rename_at() with rename_with(). You can rename a column by index like this:
library(tidyverse)
mtcars %>%
rename_with(.cols = 1, ~"renamed_column")
#> renamed_column cyl disp hp drat wt qsec vs am gear
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
#> ...
Be sure to include the tilde (~)* before the new column name.
Also note that if you introduce the glue package, you can modify existing column names like this:
library(glue)
mtcars %>%
rename_with(.cols = 1, ~glue::glue("renamed_{.}"))
#> renamed_mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> ...
Applying the above approach to multiple columns is just a matter of passing in the column index number range using a colon (:) or multiple indices in a vector using c(); here's a combination of both:
mtcars %>%
rename_with(.cols = c(1:3, 5), ~glue::glue("renamed_{.}"))
#> renamed_mpg renamed_cyl renamed_disp hp renamed_drat wt
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875
#> Datsun 710 22.8 4 108.0 93 3.85 2.320
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440
#> ...
And keep in mind that since the . represents the current column name, you can apply string modification functions to it like this:
mtcars %>%
rename_with(.cols = c(1:3),
~glue::glue("renamed_{str_replace(.,'mpg','miles_per_gallon')}"))
#> renamed_miles_per_gallon renamed_cyl renamed_disp hp
#> Mazda RX4 21.0 6 160.0 110
#> Mazda RX4 Wag 21.0 6 160.0 110
#> Datsun 710 22.8 4 108.0 93
#> Hornet 4 Drive 21.4 6 258.0 110
#> Hornet Sportabout 18.7 8 360.0 175
#> ...
*You can learn more about the ~ and . NSE function shorthand here.
Imho rlang as suggested by #Aurele is too much here.
Solution 1: Use a curly bracket pipe pipe context:
bcMatrix %>% {colnames(.)[1] = "foo"; .}
Solution 2: Or (ab)use the tee operator %>% from magrittr package (installed anyway if dplyr is used) to perform the renaming as a side-effect:
bcMatrix %T>% {colnames(.)[1] = "foo"}
Solution 3: using a simple helper function:
rename_by_pos = function(df, index, new_name){
colnames(df)[index] = new_name
df
}
iris %>% rename_by_pos(2,"foo")

Counting number of entries by ID in R?

So I'm trying to count the number of entries by ID in R, I'll use a modified version of mtcars to get my point across. Here's the data:
car type mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Datsun 710 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
I want to end up with a table that counts the number of entries for each ID, so that my results are:
Mazda RX4 2
Datsun 710 2
Should be a fairly simple and straightforward solution, but I'm new to R and can't quite figure it out. Should I use "Aggregate"?
You can use either table or count
as.data.frame(table(rownames(mtcars)))
Or
library(plyr)
count(rownames(mtcars))
If you need the count for one of the column,
as.data.frame(table(yourdf$id))
Using dplyr on a data frame named df with an ID variable called id:
library(dplyr)
tally(group_by(df, id))

Resources