I can find the maximum values of the columns disp and hp in the mtcars dataset using sapply, which gives 472 and 335 respectively:
sapply(list(mtcars$disp,mtcars$hp), max, na.rm=TRUE)
Now I want the cyl values for these maxima, i.e. the cyl of the cars where each maximum returned by sapply(list(mtcars$disp,mtcars$hp),max,na.rm=TRUE) is found.
Which function should I use? I tried which, rownames and colnames without success:
mtcars(which(sapply(list(mtcars$disp,mtcars$hp),max,na.rm=TRUE)))
rownames(which(sapply(list(mtcars$disp,mtcars$hp),max,na.rm=TRUE))))
mtcars$cyl(sapply(list(mtcars$disp,mtcars$hp),max,na.rm=TRUE))
With dplyr, keep the rows where either maximum occurs and extract cyl:
library(dplyr)
filter(mtcars, hp==max(hp) | disp == max(disp))$cyl
And the data.table solution is:
require(data.table)
mtcars <- as.data.table(mtcars)
mtcars[hp==max(hp) | disp==max(disp)]
mpg cyl disp hp drat wt qsec vs am gear carb
1: 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4
2: 15.0 8 301 335 3.54 3.57 14.60 0 1 5 8
# if you want to get one column, e.g. 'cyl'
mtcars[hp==max(hp) | disp == max(disp), cyl]
[1] 8 8
# if you want to get several columns, do either of:
mtcars[hp==max(hp) | disp == max(disp), .(cyl,qsec)]
mtcars[hp==max(hp) | disp == max(disp), list(cyl,qsec)]
cyl qsec
1: 8 17.98
2: 8 14.60
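For comparison, the same lookup can be sketched in base R without extra packages. This is only an illustration: it uses which.max(), which returns the position of the first maximum, so unlike the filters above it reports a single row per column even if several cars tie for the maximum. The names cars and idx are introduced here for the example.

```r
# Work on a fresh copy, since mtcars was converted to a data.table above.
cars <- datasets::mtcars

# Row position of the (first) maximum in each column of interest.
idx <- sapply(cars[c("disp", "hp")], which.max)

# cyl values and car names at those positions.
cars$cyl[idx]
rownames(cars)[idx]
```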
For the following data frame:
df <- data.frame(identifier = c("A","B","C"),
age = c(33, 65, 42),
sex=c("male", "female","male"))
I want to count the number of males over 60, but this does not work:
summary <- df %>% summarise(males_over_60 = sum(age>60[sex=="male"]))
Any help would be greatly appreciated.
As mentioned in the comment by @neilfws, no rows in your example data satisfy both conditions.
However, using mtcars as an example, you can filter on two conditions using &:
mtcars %>% filter(cyl == 4 & hp >100)
mpg cyl disp hp drat wt qsec vs am gear carb
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
and you can summarise this as you like.
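Applied to the data frame from the question, the count can be sketched by combining both conditions with & inside sum(), since summing a logical vector counts its TRUE values. The base R version below is a minimal sketch; the same expression should also work inside summarise(). The names males_over_60 and females_over_60 are chosen here for illustration.

```r
df <- data.frame(identifier = c("A", "B", "C"),
                 age = c(33, 65, 42),
                 sex = c("male", "female", "male"))

# TRUE only where both conditions hold; sum() counts the TRUEs.
males_over_60 <- sum(df$age > 60 & df$sex == "male")
males_over_60

# For contrast, the same pattern with a different condition.
females_over_60 <- sum(df$age > 60 & df$sex == "female")
females_over_60
```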
I am trying to find the range of the variable lat for each of the other columns, which contain occurrence records (e.g. 0, 1, 2, 3), using only the rows where the record is non-zero (the range of lat where occurrence > 0). I've tried to subset the data for each column to drop the rows with 0 individuals recorded, but I can't get it to work.
I tried to extract the minimum and maximum of lat for each species column where the occurrence was > 0 using which.min/which.max:
allfreq$lat[which.min(allfreq$lat[allfreq$Fem.mad !=0])]
However, the results made no sense: the values were nowhere near the minimum and maximum I could see in the data.
Using the mtcars dataset:
> sapply(mtcars,function(x){range(x[x!=0])})
mpg cyl disp hp drat wt qsec vs am gear carb
[1,] 10.4 4 71.1 52 2.76 1.513 14.5 1 1 3 1
[2,] 33.9 8 472.0 335 4.93 5.424 22.9 1 1 5 8
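The odd results in the question most likely come from mixing two index spaces: which.min() is applied to the filtered vector, so it returns a position within that subset, which is then used to index the full lat column. A minimal sketch with made-up data follows; the values in allfreq below are hypothetical, and only the column names lat and Fem.mad come from the question.

```r
# Hypothetical data: lat plus one species column of occurrence counts.
allfreq <- data.frame(lat     = c(50, 10, 30, 20, 40),
                      Fem.mad = c(0,  0,  2,  5,  1))

# The original approach: position 2 of the subset is used on the full column.
wrong <- allfreq$lat[which.min(allfreq$lat[allfreq$Fem.mad != 0])]

# Filter lat first, then take the range of what is left.
present <- allfreq$Fem.mad > 0
right <- range(allfreq$lat[present])

# The same idea for every species column at once.
species <- setdiff(names(allfreq), "lat")
sapply(allfreq[species], function(n) range(allfreq$lat[n > 0]))
```

Here wrong picks up a latitude from a row with zero occurrences, while right is restricted to rows where the species was recorded.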
For example, if I want to keep only those rows of mtcars where the variable qsec has the decimal part .50, following the solutions given here, I use:
mtcars_stringed<-mtcars%>%filter(str_detect(qsec, ".50"))
mtcars_stringed<-mtcars[mtcars$qsec %like% ".50", ]
mtcars_stringed <- mtcars[grep(".50", mtcars$qsec), ]
View(mtcars_stringed)
Surprisingly, all of these strategies fail and return nothing, although mtcars$qsec does contain values with .50, such as 14.50 and 15.50.
Is there an alternative solution, or is there something I am missing? Thanks in advance.
When you treat a numeric as a string, it is converted with as.character(mtcars$qsec). If you look at the result, you'll see that trailing 0s are dropped in the conversion, so we get, e.g., "14.5" and "15.5".
It will work if you use the regex pattern "\\.5$": the \\ makes the . a literal dot rather than "any character", and $ matches the end of the string.
mtcars %>% filter(str_detect(qsec, "\\.5$"))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
# 2 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
However, in general, treating decimals as strings can be risky. A better approach might be to get rid of the integer part with %% 1 and then test for nearness to 0.5 within some tolerance, which avoids precision issues.
mtcars %>% filter(abs(qsec %% 1 - 0.5) < 1e-10)
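To see the conversion issue directly, and as one further option, the numbers can be formatted to a fixed number of decimals before matching. This is a sketch, assuming two decimals is the precision you care about; qsec_txt and keep are names introduced here.

```r
# Trailing zeros are dropped when a number is converted to character.
as.character(14.50)

# Format to exactly two decimals, then match a literal ".50" at the end.
qsec_txt <- sprintf("%.2f", datasets::mtcars$qsec)
keep <- grepl("\\.50$", qsec_txt)
datasets::mtcars[keep, c("mpg", "cyl", "qsec")]
```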
You are probably looking for:
mtcars %>%
filter(qsec %% 0.50 == 0 & qsec %% 1 != 0)
mpg cyl disp hp drat wt qsec vs am gear carb
1 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4
2 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
I am a beginner in R, so this may be a basic question.
I am trying to convert data in matrix format to panel data format when A, B or C = 0. For example:
set.seed(0); mat <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4, nrow = 4)
colnames(mat) <- c("A", "B", "C", "D")
rownames(mat) <- c("1", "2", "3", "4")
to a panel format like:
A 1
A 2
A 3
A 4
B 1
B 2
B 3
B 4
for every letter where the variables "1" to "4" are 0.
I tried using the apply functions from the plyr package. Can someone provide the right code and arguments to tell R that it should extract A, B, C or D if "1" = 0, repeat the same process for "2", "3" and "4", and put each result underneath the previous one in a new data frame?
I realized the question above is not clear enough, so I will clarify it using the mtcars dataset.
cars <- mtcars
In case of this dataset, the format I would like is:
Mazda RX4 | mpg | 21.0
Mazda RX4 | cyl | 6
Mazda RX4 | disp | 160.0
...
Mazda RX4 Wag | mpg | 21.0
Mazda RX4 Wag | cyl | 6
...
and so on.
A note: you keep referring to the rows as variables. Having your variables in rows is at the very least confusing, if not outright dangerous, because people expect variables to be in columns!
If your variables are called "1",...,"4", then I assume A,...,D refer to your observations? This would be even more confusing...
If you are interested in what makes data tidy, you should read Hadley Wickham's revealing article on tidy data.
EDIT:
Regarding your question:
Using the mtcars dataset and functions from the tidyr and dplyr packages:
require(tidyr)
require(dplyr)
mtcars %>%
add_rownames() %>%
gather("id", "value", mpg:carb) %>%
arrange(rowname)
Source: local data frame [352 x 3]
rowname id value
(chr) (chr) (dbl)
1 AMC Javelin mpg 15.200
2 AMC Javelin cyl 8.000
3 AMC Javelin disp 304.000
4 AMC Javelin hp 150.000
5 AMC Javelin drat 3.150
6 AMC Javelin wt 3.435
7 AMC Javelin qsec 17.300
8 AMC Javelin vs 0.000
9 AMC Javelin am 0.000
10 AMC Javelin gear 3.000
.. ... ... ...
If you don't know the %>% operator (called the pipe operator), just read it as "and then".
For the mtcars example, this piece of code
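In more recent versions of these packages, add_rownames() is deprecated and gather() is superseded. A sketch of the same reshaping with their replacements, tibble::rownames_to_column() and tidyr::pivot_longer(), might look like this. It assumes reasonably current versions of dplyr, tidyr and tibble are installed; long is a name introduced here.

```r
library(dplyr)
library(tidyr)
library(tibble)

long <- mtcars %>%
  rownames_to_column("rowname") %>%             # row names become a column
  pivot_longer(cols = -rowname,                 # stack every other column
               names_to = "id",
               values_to = "value") %>%
  arrange(rowname)

dim(long)
```

As with the original pipeline, each car ends up with one row per variable.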
library(data.table)
cars <- as.data.table(mtcars, keep.rownames = TRUE)
melt(cars, id.vars = "rn")[order(rn)]
will give
rn variable value
1: AMC Javelin mpg 15.20
2: AMC Javelin cyl 8.00
3: AMC Javelin disp 304.00
4: AMC Javelin hp 150.00
5: AMC Javelin drat 3.15
---
348: Volvo 142E qsec 18.60
349: Volvo 142E vs 1.00
350: Volvo 142E am 1.00
351: Volvo 142E gear 4.00
352: Volvo 142E carb 2.00
Note that mtcars is a data.frame not a matrix.
The solution for the matrix mat given in the question is
melt(as.data.table(mat, keep.rownames = TRUE), id.vars = "rn")[value == 0][
order(variable, rn), .(variable, rn)]
which will return
variable rn
1: A 2
2: A 3
3: B 2
4: C 3
5: C 4
6: D 1
7: D 3
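The same extraction can also be sketched in base R with which(..., arr.ind = TRUE), which returns the row and column position of every matching cell. The matrix below is hard-coded to match the output shown above rather than generated with sample(), because the values drawn for a given seed can differ between R versions; zeros and panel are names introduced for this example.

```r
mat <- matrix(c(1, 0, 0, 1,   # column A
                1, 0, 1, 1,   # column B
                1, 1, 0, 0,   # column C
                0, 1, 0, 1),  # column D
              nrow = 4, ncol = 4,
              dimnames = list(c("1", "2", "3", "4"),
                              c("A", "B", "C", "D")))

# One row per zero cell, in column-major order, with columns "row" and "col".
zeros <- which(mat == 0, arr.ind = TRUE)

# Translate positions back into the matrix's own labels.
panel <- data.frame(variable = colnames(mat)[zeros[, "col"]],
                    rn       = rownames(mat)[zeros[, "row"]])
panel
```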
I can sample 10 rows from a data.frame like this:
mtcars[sample(1:32, 10),]
What is the syntax for doing this with dplyr? This is what I tried:
library(dplyr)
filter(mtcars, sample(1:32, 10))
I believe you aren't really "filtering" in your example; you are just sampling rows.
In Hadley's words, here is the purpose of the function:
filter() works similarly to subset() except that you can give it any number of filtering conditions which are joined together with & (not && which is easy to do accidentally!)
Here is an example with the mtcars dataset, as used in the introductory vignette:
library(dplyr)
filter(mtcars, cyl == 8, wt < 3.5)
mpg cyl disp hp drat wt qsec vs am gear carb
1 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
2 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
3 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4
In conclusion: filter() is equivalent to subset(), not sample().
Figured out how to do it (although Josh O'Brien beat me to it):
filter(mtcars, rownames(mtcars) %in% sample(rownames(mtcars), 10, replace = F))
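filter() expects a logical condition for each row, which is why passing it row numbers from sample() does not do what was intended. dplyr also has dedicated helpers for this. A short sketch, assuming dplyr 1.0.0 or later for slice_sample() (the older sample_n() does the same job); ten_rows and ten_more are names introduced here.

```r
library(dplyr)

set.seed(1)  # only to make the draw reproducible

# Draw 10 rows at random, without replacement.
ten_rows <- slice_sample(mtcars, n = 10)
nrow(ten_rows)

# Same idea with explicit row positions, closest to the base R idiom.
ten_more <- slice(mtcars, sample(n(), 10))
nrow(ten_more)
```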