I used this question and answer to get what I want (how to compute rowsums using tidyverse), but I was wondering if there was a way to do named subsetting with rowSums. I can imagine an instance where I have a lot of variables where this would be desirable.
What I mean is something like this:
rowSums(iris, Sepal.Length, Sepal.Width)
Instead of:
rowSums(iris[1:2])
Thanks for any help in advance!
Using dplyr, you could simply do:
iris %>% mutate(Sepal.Sum = Sepal.Length + Sepal.Width)
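For more than a couple of columns, adding them by hand gets tedious. A minimal sketch of named subsetting with rowSums, assuming dplyr 1.0 or later is installed; the object names and the column name Sepal.Sum are chosen only for this illustration:

```r
library(dplyr)

# Base R: subset by column name instead of position
base_sums <- rowSums(iris[, c("Sepal.Length", "Sepal.Width")])

# dplyr: select columns by bare name, then sum across each row
dplyr_sums <- iris %>%
  select(Sepal.Length, Sepal.Width) %>%
  rowSums()

# Keep the result as a new column (Sepal.Sum is a hypothetical name)
iris_with_sum <- iris %>%
  mutate(Sepal.Sum = rowSums(across(c(Sepal.Length, Sepal.Width))))
```

The select() and across() calls also accept helpers such as starts_with("Sepal"), which may be more convenient when there are many variables.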
May I know what is wrong with the code below? I am trying to extract the distinct values of Species in iris, but it is not working. I am trying to write this without %>%.
iris[,c(distinct("Species"))]
I am guessing you want to do this:
library(dplyr)
distinct(iris, Species)
You do not need %>% to begin with, but if you mean that you don't want to use the dplyr package, you can try what @sm925 suggested in a comment: as.character(unique(iris$Species))
This will give you a vector with all unique species:
unique(iris$Species)
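The variants above differ mainly in the type of object they return. A short base R sketch (the object names are chosen for illustration only):

```r
# Factor holding the unique values (the levels are kept)
species_factor <- unique(iris$Species)

# Plain character vector
species_chr <- as.character(species_factor)

# One-column data frame, similar in shape to distinct(iris, Species)
species_df <- unique(iris["Species"])
```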
I am trying to rewrite this expression to magrittr’s pipe operator:
print(mean(pull(df, height), na.rm=TRUE))
which returns 175.4 for my dataset.
I know that I have to start with the data frame and write it as df %>%, but I'm confused about how to unwrap the nested calls from the inside out. For example, should the na.rm=TRUE go inside mean(), pull() or print()?
UPDATE: I actually figured it out by trial and error...
> df %>%
+   pull(height) %>%
+   mean(na.rm=TRUE) %>%
+   print()
returns 175.4
It would be good practice to make a reproducible example, with dummy data like this:
height <- 1:30
weight <- 1:30
df <- data.frame(height, weight)
The pipe operator works with most of the tidyverse, not just magrittr. The pull() function you are using actually comes from dplyr. Many summary functions such as mean, sd, min and max need na.rm=TRUE when the data contain missing values; without it they return NA as soon as a single NA is present.
df %>% pull(height) %>% mean(na.rm=T) %>% print()
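To see why na.rm belongs to mean() rather than to pull() or print(), here is a small base R sketch with made-up heights (the values are hypothetical, not the asker's data):

```r
# Hypothetical heights with one missing value
height <- c(170, NA, 180)

mean(height)                # NA: the missing value propagates
mean(height, na.rm = TRUE)  # 175: the NA is dropped before averaging
```

The argument stays with the function that uses it, whether the call is nested or piped.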
Unless your data is nested, you may not even need to use pull:
df %>% summarise(mean = mean(height,na.rm=T))
Also, with summarise you can store the results in another data frame rather than just printing them, and extract them from that data frame whenever you want.
df %>% summarise(meanHt = mean(height,na.rm=T), sdHt = sd(height,na.rm=T)) -> summary
summary[1]
summary[2]
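Note that single-bracket indexing on a data frame returns a one-column data frame, not a bare number. A sketch of the difference, assuming dplyr is installed; the name height_stats is chosen here to avoid reusing the name of the base summary() function, and the NA is added only for illustration:

```r
library(dplyr)

# Dummy data with one NA added for illustration
df <- data.frame(height = c(1:29, NA), weight = 1:30)

height_stats <- df %>%
  summarise(meanHt = mean(height, na.rm = TRUE),
            sdHt   = sd(height, na.rm = TRUE))

height_stats[1]           # one-column data frame
height_stats[["meanHt"]]  # plain numeric value
height_stats$sdHt         # plain numeric value
```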
I really love the apply-family in R, but I think I still do not get the best of it.
with(mtcars, tapply(mpg, cyl, mean))
sapply(mtcars, mean)
These two functions for example are really nice, but how can I combine them to get the mean for each variable for every category of the variable cyl?
With dplyr it is quite easy I guess:
mtcars %>%
group_by(cyl) %>%
summarise_all(mean)
For dplyr it seems to be quite easy. So maybe another question might be why it is useful to even learn all these apply functions, when dplyr makes it easy to solve the problem? :-)
If you're looking for a base R solution, then you can use split to separate your data frame by cyl, then use sapply as before:
S <- split( mtcars, mtcars$cyl )
lapply( S, function(x) sapply(x, mean) )
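Two other base R options may be worth knowing here. This is only a sketch of alternatives, not a claim that they are better than split plus lapply; the object names are illustrative:

```r
# One row per cyl group, one column of means per remaining variable
means_df <- aggregate(. ~ cyl, data = mtcars, FUN = mean)

# Matrix version: variables in rows, cyl groups in columns
means_mat <- sapply(split(mtcars, mtcars$cyl), colMeans)
```

aggregate() returns a data frame much like the dplyr result, while the sapply() version simplifies the list from split() into a single matrix.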
Your second question is primarily opinion-based, so I'll give mine: tidyverse packages, like dplyr, build on top of base R functionality to provide a convenient and consistent interface for common data manipulation operations. For this reason, dplyr is generally preferable, but it may not always be available in a particular development environment. In that case, it is helpful to know how to fall back on base R functionality.
I have a question about dplyr.
Let's say I want to update certain values in a dataframe, can I do this?:
mtcars %>% filter(mpg>20) %>% select(hp)=1000
(The example is nonsensical where all cars with MPGs greater than 20 have HP set to 1000)
I get an error, so I am guessing the answer is no: I can't use %>% and the dplyr verbs on the left of an assignment. Still, the dplyr syntax is a lot cleaner than:
mtcars[mtcars$mpg>20,"hp"]=1000
Especially when you are dealing with more complex cases, so I wanted to ask if there is any way to use the dplyr syntax in this case?
edit: It looks like mutate is the verb I want, so now my question is, can I dynamically change the name of the var in the mutate statement like so:
for (i in c("hp","wt")) {mtcars<-mtcars %>% filter(mpg>20) %>% mutate(i=1000) }
This example just creates a column named "i" with value 1000, which isn't what I want.
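One possible approach, sketched under the assumption that dplyr 1.0 or later is available: mutate() with ifelse() updates matching rows without dropping the others (filter() would remove them), and across() with all_of() accepts column names held in a character vector. The object names below are chosen for illustration only.

```r
library(dplyr)

# Single column: ifelse() keeps every row and only replaces the matches
cars_hp <- mtcars %>%
  mutate(hp = ifelse(mpg > 20, 1000, hp))

# Several columns named in a character vector
cols <- c("hp", "wt")
cars_multi <- mtcars %>%
  mutate(across(all_of(cols), ~ ifelse(mpg > 20, 1000, .x)))

# Base R equivalent, done on a copy so mtcars itself is untouched
cars_base <- mtcars
for (i in cols) cars_base[cars_base$mpg > 20, i] <- 1000
```

In the loop from the question, mutate(i=1000) treats i as a literal column name, which is why a column called "i" appears.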
I have a simple syntax question for an absolute beginner. I have been searching and experimenting and I can't figure it out. I need to only plot values from the variable SIZE that are greater than 0.8, but less than seven. I am using the with() expression along with plot(). Can someone tell me how I should write this?
with(dat[SIZE <7 | SIZE > 0.8 ,], plot(SP.RICH~SIZE))
Thank You.
Selecting only certain rows is called filtering.
One way is to use dplyr, which offers a nicer idiom:
require(dplyr)
dat %>% filter(SIZE>0.8 & SIZE<7) %>%
plot(SP.RICH~SIZE, data = .)
Another is the data.table package.
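A base R version is also possible without extra packages. The sketch below uses a made-up stand-in for dat, since the real data set is not shown in the question:

```r
# Hypothetical stand-in for dat
dat <- data.frame(SIZE    = c(0.5, 1, 3, 6.5, 9),
                  SP.RICH = c(2, 4, 7, 9, 12))

# & keeps rows meeting both conditions; | would keep every row here,
# because any number is either below 7 or above 0.8
kept <- subset(dat, SIZE > 0.8 & SIZE < 7)

with(kept, plot(SP.RICH ~ SIZE))

# Equivalent bracket form: columns must be referenced through dat$
kept2 <- dat[dat$SIZE > 0.8 & dat$SIZE < 7, ]
```

This also points at the two likely problems in the original attempt: the | operator where & was needed, and SIZE being referenced inside dat[...] without the dat$ prefix.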