tapply with categorical variable - r

I am trying to use tapply() for some descriptive analysis, with the mtcars dataset in R.
So the problem is:
> table(mtcars$carb)
1 2 3 4 6 8
7 10 3 10 1 1
> tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x){length(x)})
0 1
0 12 6
1 7 7
The above line worked, but the line below didn't:
> tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x){table(x)})
0 1
0 Integer,3 Integer,4
1 Integer,3 Integer,2
By using tapply on mtcars$carb, I expect to get the table for each of the four combinations from vs and am. Any idea what went wrong? Thank you very much.

The calculation is already done by tapply(), but the result is not displayed in an easy-to-read form. You can wrap the output of table() in a list:
tapply(mtcars$carb,list(mtcars$vs,mtcars$am),function(x) list(table(x)))
#[[1]]
#x
#2 3 4
#4 3 5
#[[2]]
#x
#1 2 4
#3 2 2
#[[3]]
#x
#2 4 6 8
#1 3 1 1
#[[4]]
#x
#1 2
#4 3
Or using lapply:
temp <- tapply(mtcars$carb,list(mtcars$vs,mtcars$am),table)
lapply(temp, I)
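As a small sketch of why the original call printed "Integer,3": the result is a 2 x 2 matrix whose cells each hold a whole table, so individual tables can be pulled out by the vs and am labels. The names `vs` and `am` given to the grouping list below are added only for readability and are not in the original call.

```r
# tapply() with FUN = table returns a 2 x 2 list-matrix:
# one cell per (vs, am) combination, each cell holding a table of carb counts
temp <- tapply(mtcars$carb, list(vs = mtcars$vs, am = mtcars$am), table)

dim(temp)          # two levels of vs by two levels of am

# Extract the table for a single combination by its dimnames
temp[["0", "1"]]   # carb counts for vs == 0 and am == 1
```

Indexing with `[[` returns the table itself, whereas `[` would return a one-element list containing it.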

We can do this with ftable:
ftable(mtcars[c('carb', 'vs', 'am')])

Related

Add dynamic subset conditions as variables into the data.frame

I have a data frame like
> x = data.frame(A=c(1,2,3),B=c(2,3,4))
> x
A B
1 1 2
2 2 3
3 3 4
and subsetting conditions in a data frame like
> cond = data.frame(condition=c('A>1','B>2 & B<4'))
> cond
condition
1 A>1
2 B>2 & B<4
which I then apply dynamically
> eval(parse(text=paste0("subset(x,",cond[1,'condition'],")")))
A B
2 2 3
3 3 4
> eval(parse(text=paste0("subset(x,",cond[2,'condition'],")")))
A B
2 2 3
Now, instead of subsetting, I would like to add the subsetting conditions as variables into the data. The end result would look like
A B condition1 condition2
1 1 2 0 0
2 2 3 1 1
3 3 4 1 0
How could I derive the above table using the dynamic conditions?
Before using eval parse, I hope you have gone through some readings like
What specifically are the dangers of eval(parse(…))?
and many others which are available.
However, to answer your question, we can continue your flow and use eval parse in sapply
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
To add it to the dataframe,
x[paste0("condition", 1:nrow(cond))] <-
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
x
# A B condition1 condition2
#1 1 2 0 0
#2 2 3 1 1
#3 3 4 1 0
Simplifying it a bit (using @jogo's comment)
+(sapply(cond$condition, function(i) with(x, eval(parse(text=as.character(i))))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
Here is an option using tidyverse
library(tidyverse)
x %>%
mutate(!!! rlang::parse_exprs(str_c(cond$condition, collapse=";"))) %>%
rename_at(3:4, ~ paste0("condition", 1:2))
# A B condition1 condition2
#1 1 2 FALSE FALSE
#2 2 3 TRUE TRUE
#3 3 4 TRUE FALSE
If needed, the logical columns can easily be converted to binary with as.integer.
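If the conditions do not have to arrive as text, one way to sidestep the eval(parse()) concerns mentioned above is to store them as functions instead of strings. This is only a sketch under that assumption; the list `conds` and its element names are hypothetical and not part of the original question.

```r
x <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4))

# Hypothetical: each condition is a function of the data frame, not a string
conds <- list(
  condition1 = function(d) d$A > 1,
  condition2 = function(d) d$B > 2 & d$B < 4
)

# Evaluate every condition on x and convert TRUE/FALSE to 1/0
flags <- sapply(conds, function(f) as.integer(f(x)))

# Append the indicator columns; column names come from the list names
x <- cbind(x, flags)
x
```

When the conditions really are user-supplied strings, parsing cannot be avoided, and the answers above remain the way to go.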

Reduce columns of a matrix by a function in R

I have a matrix sort of like:
data <- round(runif(30)*10)
dimnames <- list(c("1","2","3","4","5"),c("1","2","3","2","3","2"))
values <- matrix(data, ncol=6, dimnames=dimnames)
# 1 2 3 2 3 2
# 1 5 4 9 6 7 8
# 2 6 9 9 1 2 5
# 3 1 2 5 3 10 1
# 4 6 5 1 8 6 4
# 5 6 4 5 9 4 4
Some of the column names are the same. I want to essentially reduce the columns in this matrix by taking the min of all values in the same row where the columns have the same name. For this particular matrix, the result would look like this:
# 1 2 3
# 1 5 4 7
# 2 6 1 2
# 3 1 1 5
# 4 6 4 1
# 5 6 4 4
The actual data set I'm using here has around 50,000 columns and 4,500 rows. None of the values are missing and the result will have around 40,000 columns. The way I tried to solve this was by melting the data then using group_by from dplyr before reshaping back to a matrix. The problem is that it takes forever to generate the data frame from the melt and I'd like to be able to iterate faster.
We can use rowMins from library(matrixStats)
library(matrixStats)
res <- vapply(split(1:ncol(values), colnames(values)),
function(i) rowMins(values[,i,drop=FALSE]), rep(0, nrow(values)))
res
# 1 2 3
#[1,] 5 4 7
#[2,] 6 1 2
#[3,] 1 1 5
#[4,] 6 4 1
#[5,] 6 4 4
row.names(res) <- row.names(values)
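For readers without matrixStats installed, the same grouping idea can be sketched in base R with apply(). This is an assumption-labelled variant rather than the method above, and it is likely to be noticeably slower than rowMins on a matrix with tens of thousands of columns. The matrix is hard-coded to the values printed in the question so the result is reproducible.

```r
# The example matrix from the question, filled column by column
values <- matrix(
  c(5, 6, 1, 6, 6,   4, 9, 2, 5, 4,   9, 9, 5, 1, 5,
    6, 1, 3, 8, 9,   7, 2, 10, 6, 4,  8, 5, 1, 4, 4),
  ncol = 6,
  dimnames = list(as.character(1:5), c("1", "2", "3", "2", "3", "2"))
)

# Column indices grouped by column name
groups <- split(seq_len(ncol(values)), colnames(values))

# Row-wise minimum within each group of columns; drop = FALSE keeps
# single-column groups as matrices so apply() still works
res_base <- vapply(
  groups,
  function(i) apply(values[, i, drop = FALSE], 1, min),
  numeric(nrow(values))
)
res_base
```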

recode values into one column

I have a dataframe with one value per row, potentially in one of several columns. How can I create a single column that contains the number of the column the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which seem very un-R-like.
df<-data.frame(
a=c(1,0,0,0),
b=c(0,1,1,0),
c=c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3
There is no need for dplyr here. This is what max.col() is for. Since all the other values in the row will be zero, then max.col() will give us the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
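One caveat worth sketching: max.col() breaks ties at random by default, so a row with no 1 at all (all zeros) would get an arbitrary column. Assuming such rows could occur in real data, a defensive variant fixes the tie rule and marks those rows as NA. The variable name `idx` is just illustrative.

```r
df <- data.frame(
  a = c(1, 0, 0, 0),
  b = c(0, 1, 1, 0),
  c = c(0, 0, 0, 1)
)

# ties.method = "first" makes the result deterministic
idx <- max.col(df, ties.method = "first")

# Rows without any 1 have no meaningful column, so flag them as NA
idx[rowSums(df) == 0] <- NA
idx
```

With the example data every row contains exactly one 1, so the result matches the plain max.col(df) call.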
We could also do
as.matrix(df) %*%seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3
which(df==1, arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3

Create index for contiguous runs of values

I have a vector:
test <-c(1,1,0,2,2,3,4,1,1,0)
test
# [1] 1 1 0 2 2 3 4 1 1 0
I want to construct a grouping variable which indicates when values change:
# [1] 1 1 2 3 3 4 5 6 6 7
What is the best way to do this?
Use run length encoding (rle), seq_along and rep
r <- rle(test)
changes <- rep(seq_along(r$lengths), r$lengths)
changes
## [1] 1 1 2 3 3 4 5 6 6 7
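Because rle() works on any atomic vector, the same idea can be wrapped in a small helper and reused for character data too. The function name `run_id` is hypothetical, chosen here only for illustration.

```r
# Hypothetical helper: label each contiguous run of equal values 1, 2, 3, ...
run_id <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

run_id(c(1, 1, 0, 2, 2, 3, 4, 1, 1, 0))
run_id(c("a", "a", "b", "a"))
```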
Alternative option, which will admittedly only work for numeric data.
test <-c(1,1,0,2,2,3,4,1,1,0)
cumsum(c(1L, diff(test) != 0))
# [1] 1 1 2 3 3 4 5 6 6 7
And a convoluted variation that will work for any data types:
head(cumsum(c(TRUE, c(tail(test, -1), NA) != test)), -1)
# [1] 1 1 2 3 3 4 5 6 6 7

subtract first value from each subset of dataframe

I want to subtract the smallest value in each subset of a data frame from each value in that subset i.e.
A <- c(1,3,5,6,4,5,6,7,10)
B <- rep(1:4, length.out=length(A))
df <- data.frame(A, B)
df <- df[order(B),]
Subtracting would give me:
A B
1 0 1
2 3 1
3 9 1
4 0 2
5 2 2
6 0 3
7 1 3
8 0 4
9 1 4
I think the output you show is not correct. In any case, from what you explain, I think this is what you want. This uses the base function ave:
within(df, { A <- ave(A, B, FUN=function(x) x-min(x))})
A B
1 0 1
5 3 1
9 9 1
2 0 2
6 2 2
3 0 3
7 1 3
4 0 4
8 1 4
Of course there are other alternatives such as plyr and data.table.
Echoing Arun's comment above, I think your expected output might be off. In any event, you should be able to use tapply to calculate the minimum of each subset and then use match to line those minima up with the original values:
subs <- tapply(df$A, df$B, min)
df$A <- df$A - subs[match(df$B, names(subs))]
df
A B
1 0 1
5 3 1
9 9 1
2 0 2
6 2 2
3 0 3
7 1 3
4 0 4
8 1 4
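A slightly shorter sketch of the same idea: ave() with FUN = min returns each group's minimum repeated for every row of that group, so the subtraction can be done in one step. The column name `A_shifted` is hypothetical and is used so the original A is left untouched.

```r
A <- c(1, 3, 5, 6, 4, 5, 6, 7, 10)
B <- rep(1:4, length.out = length(A))
df <- data.frame(A, B)
df <- df[order(df$B), ]

# ave() broadcasts the per-group minimum back to every row of the group
df$A_shifted <- df$A - ave(df$A, df$B, FUN = min)
df
```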
