How do I get subgroups by using pipes? I don't understand why what I wrote doesn't work. Can someone explain how these work? Reading and seeing examples online hasn't helped me, because I am not sure what it is that I am not understanding.
mean(mtcars$qsec)
mtcars %>%
select(qsec) %>%
mean()
Warning message:
In mean.default(.) : argument is not numeric or logical: returning NA
mean(mtcars$qsec[mtcars$cyl==8])
mtcars %>%
group-by(qsec) %>%
filter(cyl==8)
mean()
Error in mean.default() : argument "x" is missing, with no default
mean(mtcars$mpg[mtcars$hp > median(mtcars$hp)])
mtcars %>%
group_by(mpg) %>%
filter(hp>median(hp))
mean
The reason is that select still returns a data.frame with one column, whereas mean expects a vector. From ?mean:
x - An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.
We can use pull to extract the column as a vector and apply the mean on it
library(dplyr)
mtcars %>%
pull(qsec) %>%
mean
#[1] 17.84875
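A quick way to see the difference is to check the class of each intermediate result. This is only a small sketch on mtcars; the names selected and pulled are illustrative.

```r
library(dplyr)

# select() keeps the data.frame wrapper, even for a single column
selected <- mtcars %>% select(qsec)
class(selected)

# pull() drops the wrapper and returns the bare numeric vector
pulled <- mtcars %>% pull(qsec)
class(pulled)

# only the vector is something mean() can average
mean(pulled)
```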
In the second case, we are getting the mean of 'qsec' where 'cyl' is 8
mtcars %>%
select(qsec, cyl) %>%
filter(cyl == 8) %>%
pull(qsec) %>%
mean
#[1] 16.77214
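The third case in the question (mean 'mpg' for cars with above-median 'hp') follows the same pattern. If a mean is wanted for every subgroup at once, group_by() with summarise() is the usual route. A sketch, where the names mean_mpg_high_hp, qsec_by_cyl and mean_qsec are just illustrative:

```r
library(dplyr)

# third case: filter first, then pull the column and average it
mean_mpg_high_hp <- mtcars %>%
  filter(hp > median(hp)) %>%
  pull(mpg) %>%
  mean()

# one mean per subgroup: group on the grouping variable (cyl),
# not on the column being averaged
qsec_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_qsec = mean(qsec))

qsec_by_cyl
```

Note that group_by() goes on the variable that defines the subgroups; grouping on qsec or mpg themselves, as in the question, makes nearly every row its own group.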
I'm working on an assignment for one of my classes. We have to use pipe operators to get the mean Height for trees with Volume greater than 13.
So initially, I tried:
df <- trees
df %>% filter(Volume > 13) %>% mean(Height)
The problem is, then I get a warning message
Warning message:
In mean.default(., Height) :
argument is not numeric or logical: returning NA
I can't figure out how Height is not numeric (it pretty clearly looks like a list of numbers to me), and so I can't complete this question.
Could someone help me out? I've been testing different variations, to no avail.
We can get the mean within summarise
library(dplyr)
df %>%
filter(Volume > 13) %>%
summarise(Mean = mean(Height))
mean expects a vector, so if we need to do this outside summarise, pull 'Height' out as a vector first
df %>%
filter(Volume > 13) %>%
pull(Height) %>%
mean
Or use .$Height
df %>%
filter(Volume > 13) %>%
.$Height %>%
mean
The warning can be reproduced with iris data
data(iris)
iris %>%
mean(.$Sepal.Length)
#[1] NA
Warning message:
In mean.default(., .$Sepal.Length) :
argument is not numeric or logical: returning NA
This is not related to the pipe. mean returns NA whenever its input is a data.frame, because it expects a vector
mean(iris['Sepal.Length'])
#[1] NA
Warning message:
In mean.default(iris["Sepal.Length"]) :
argument is not numeric or logical: returning NA
iris %>%
.$Sepal.Length %>%
mean
#[1] 5.843333
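It may help to write out what the pipe does: x %>% f(y) is f(x, y), so the original attempt calls mean() with the whole data frame as its first argument. A sketch of that equivalence follows, with with() shown as one more way to evaluate the column name inside the filtered data. The with() variant is an alternative added here for illustration, and the names big, piped, direct and mean_height are made up.

```r
library(dplyr)

df <- trees
big <- df %>% filter(Volume > 13)

# the pipe inserts its left-hand side as the first argument, so these
# two calls are the same and both return NA with a warning
piped  <- suppressWarnings(big %>% mean(Height))
direct <- suppressWarnings(mean(big, Height))

# with() evaluates mean(Height) using the columns of the piped data frame
mean_height <- big %>% with(mean(Height))
mean_height
```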
I obviously get an error with the below but I was hoping to summarise the same column with regards to mean and median, and also how many points are in the polygon. But within the same pipe. Any help would be great.
Nin_Sep_points_sf_joined <-
st_join(merged_ten_seven_shp, Nin_Sep_sf_3011) %>%
filter(!is.na(Employment_diff)) %>%
group_by(Kod) %>%
summarise(Count=mean(as.numeric(as.character(price)))), summarise(Count_tot=n()), summarise(Count=median(as.numeric(as.character(price))))
You can supply multiple arguments to summarize which you separate with a ,:
library(dplyr)
Nin_Sep_points_sf_joined <-
st_join(merged_ten_seven_shp, Nin_Sep_sf_3011) %>%
filter(!is.na(Employment_diff)) %>%
group_by(Kod) %>%
# each summary needs its own name; two columns both called Count would collide
summarise(Mean=mean(as.numeric(as.character(price))),
Count_tot=n(),
Median=median(as.numeric(as.character(price))))
Note that you can even refer to the results of previous arguments in the next argument. So you could calculate SD based on Count_tot.
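As a sketch of referring to earlier results, here is the same idea on mtcars, since the spatial data from the question is not available. The column names n_cars, mean_mpg, sd_mpg and se_mpg are illustrative, and the standard error is just one example of a statistic built from a count.

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_cars   = n(),                     # rows per group
    mean_mpg = mean(mpg),
    sd_mpg   = sd(mpg),
    se_mpg   = sd_mpg / sqrt(n_cars)    # reuses the two columns defined above
  )

by_cyl
```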
This is probably a simple question, but I'm having trouble getting the mean function to work using dplyr.
Using the mtcars dataset as an example, if I type:
data(mtcars)
mtcars %>%
select (mpg) %>%
mean()
I get the "Warning message:
In mean.default(.) : argument is not numeric or logical: returning NA" error message.
For some reason though if I repeat the same code but just ask for a "summary", or "range" or several other statistical calculations, they work fine:
data(mtcars)
mtcars %>%
select (mpg) %>%
summary()
Similarly, if I run the mean function in base R notation, that works fine too:
mean(mtcars$mpg)
Can anyone point out what I've done wrong?
Use pull to pull out the vector.
mtcars %>%
pull(mpg) %>%
mean()
# [1] 20.09062
Or use pluck from the purrr package.
mtcars %>%
purrr::pluck("mpg") %>%
mean()
# [1] 20.09062
Or summarize first and then pull out the mean.
mtcars %>%
summarize(mean = mean(mpg)) %>%
pull(mean)
# [1] 20.09062
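As for why summary() and range() work on the one-column data frame while mean() does not: as far as I know, base R provides data frame methods for summary() and for the range/min/max family, but none for mean(), so mean() falls through to mean.default(), which wants a vector. A small sketch of the contrast (one_col is an illustrative name):

```r
library(dplyr)

one_col <- mtcars %>% select(mpg)   # still a data.frame

# these accept a data.frame directly
summary(one_col)
range(one_col)

# mean() does not: it warns and returns NA
suppressWarnings(mean(one_col))

# once the column is a vector, mean() is fine
mean(one_col$mpg)
```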
In dplyr, you can use summarise() whenever you're not changing your original dataframe (reordering it, filtering it, adding to it, etc), but instead are creating a new dataframe that has summary statistics for the first dataframe.
mtcars %>%
summarise(mean_mpg = mean(mpg))
gives the output:
mean_mpg
1 20.09062
PS. If you're learning dplyr, learning these five verbs will take you a long way: select(), filter(), group_by(), summarise(), arrange().
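To make that concrete, here is a sketch of one pipeline that uses all five verbs on mtcars. The threshold of 100 and the name mean_mpg are arbitrary choices for illustration.

```r
library(dplyr)

result <- mtcars %>%
  select(mpg, cyl, hp) %>%             # keep three columns
  filter(hp > 100) %>%                 # keep rows with hp above 100
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(mean_mpg = mean(mpg)) %>%  # one row per group
  arrange(desc(mean_mpg))              # highest mean first

result
```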
I want to select all numeric columns from a dataframe, and then to select all the non-numeric columns. An obvious way to do this is the following :-
mtcars %>%
select_if(is.numeric) %>%
head()
This works exactly as I expect.
mtcars %>%
select_if(!is.numeric) %>%
head()
This doesn't, and produces the error message Error in !is.numeric : invalid argument type
Looking at another way to do the same thing :-
mtcars %>%
select_if(sapply(., is.numeric)) %>%
head()
works perfectly, but
mtcars %>%
select_if(sapply(., !is.numeric)) %>%
head()
fails with the same error message. (purrr::keep behaves exactly the same way).
In both cases using - to drop the undesired columns fails too, with the same error as above for the is.numeric version, and this error message for the sapply version Error: Can't convert an integer vector to function.
The help page for is.numeric says
is.numeric is an internal generic primitive function: you can write methods to handle specific classes of objects, see InternalMethods. ... Methods for is.numeric should only return true if the base type of the class is double or integer and values can reasonably be regarded as numeric (e.g., arithmetic on them makes sense, and comparison should be done via the base type).
The help page for ! says
Value
For !, a logical or raw vector(for raw x) of the same length as x: names, dims and dimnames are copied from x, and all other attributes (including class) if no coercion is done.
Looking at the useful question Negation ! in a dplyr pipeline %>% I can see some of the reasons why this doesn't work, but neither of the solutions suggested there works.
mtcars %>%
select_if(not(is.numeric())) %>%
head()
gives the reasonable error Error in is.numeric() : 0 arguments passed to 'is.numeric' which requires 1.
mtcars %>%
select_if(not(is.numeric(.))) %>%
head()
Fails with this error :-
Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE) : length(.p) == length(tibble_vars) is not TRUE.
This behaviour definitely violates the principle of least surprise. It's not of great consequence to me now, but it suggests I am failing to understand some more fundamental point.
Any thoughts?
Negating a predicate function can be done with the dedicated Negate() or purrr::negate() functions (rather than the ! operator, which negates a vector):
library(dplyr)
mtcars %>%
mutate(foo = "bar") %>%
select_if(Negate(is.numeric)) %>%
head()
# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
Or with purrr::negate() (lower-case; it has slightly different behavior, see the respective help pages):
library(purrr)
library(dplyr)
mtcars %>%
mutate(foo = "bar") %>%
select_if(negate(is.numeric)) %>%
head()
# foo
# 1 bar
# 2 bar
# 3 bar
# 4 bar
# 5 bar
# 6 bar
You could define your own "is not numeric" function and then use that instead:
is_not_num <- function(x) !is.numeric(x)
mtcars %>%
select_if(is_not_num) %>%
head()
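mtcars %>%
select_if(~ !is.numeric(.)) %>%
head()
does the same with a formula. Older code wraps the predicate as funs(!is.numeric(.)), but funs() is deprecated as of dplyr 0.8.0.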
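In dplyr 1.0.0 and later the select_if() family is superseded, and the same selection can be written with select() and where(). A sketch, assuming a dplyr version recent enough to export where(); the names cars, non_numeric and same_thing are illustrative:

```r
library(dplyr)

cars <- mtcars %>% mutate(foo = "bar")

# negate the selection itself ...
non_numeric <- cars %>% select(!where(is.numeric))

# ... or negate the predicate inside where()
same_thing <- cars %>% select(where(~ !is.numeric(.x)))

names(non_numeric)
```

Here ! is applied to a selection (or to the logical result inside the lambda), not to the function is.numeric itself, which is why it does not hit the "invalid argument type" error.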
I have the following df where df <- data.frame(V1=c(0,0,1),V2=c(0,0,2),V3=c(-2,0,2))
If I do filter(df,rowSums!=0) I get the following error:
Error in filter_impl(.data, quo) :
Evaluation error: comparison (6) is possible only for atomic and list types.
Does anybody know why that is?
Thanks for your help
PS: Plain rowSums(df)!=0 works just fine and gives me the expected logical
A more tidyverse-style approach to the problem is to make your data tidy, i.e., with only one data value per row.
Sample data
library(dplyr)
library(tidyr)  # for gather()
my_mat <- matrix(sample(c(1, 0), 60, replace = TRUE), nrow = 30) %>% as.data.frame()
Tidy data and form implicit row sums using group_by
my_mat %>%
mutate(row = row_number()) %>%
gather(col, val, -row) %>%
group_by(row) %>%
filter(sum(val) == 0)
This tidy approach is not always as fast as base R, and it isn't always appropriate for all data types.
OK, I got it.
filter(df,rowSums(df)!=0)
Not the most difficult one...
Thanks.
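Spelled out: rowSums on its own is the function object, and comparing a function with 0 is what triggers the "comparison is possible only for atomic and list types" error; it has to be called on the data first. A sketch with the data frame from the question, plus a piped variant using the magrittr dot (kept and kept_piped are illustrative names):

```r
library(dplyr)

df <- data.frame(V1 = c(0, 0, 1), V2 = c(0, 0, 2), V3 = c(-2, 0, 2))

rowSums(df)   # one sum per row

# call rowSums() on the data, then compare the resulting vector with 0
kept <- filter(df, rowSums(df) != 0)

# piped form: the dot stands for the data frame on the left of %>%
kept_piped <- df %>% filter(rowSums(.) != 0)

kept
```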