purrr map instead of apply - r

I'm trying to incorporate more pipes in my code. Oftentimes, I have to break up pipes to use the apply function. Then I found purrr. However, it's not clear to me how exactly it works. Here is what I want, and what I've tried. The main problem is that I want a rowwise computation.
want:
apply(mtcars,1,function(x) which.max(x))
have:
mtcars %>% map_dbl(which.max)

If we need rowwise, then use pmap. According to ?pmap
... Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row. map_dfr(), pmap_dfr() and map2_dfc(), pmap_dfc() return data frames created by row-binding and column-binding respectively. ...
pmap_int(mtcars, ~ which.max(c(...)))
#[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 4 4 3
Also, in base R, this can be done easily and efficiently with max.col:
max.col(mtcars, "first")
#[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 4 4 3
map works like lapply/sapply: it loops over each column and applies the function to that column. So the map call in the question is similar to
apply(mtcars, 2, which.max)
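To make the row-wise versus column-wise distinction concrete, here is a minimal base R sketch on a small made-up data frame (d and the result names are illustrative, not from the question). It assumes every column is numeric, since apply and max.col convert the data frame to a matrix first.

```r
# Small hypothetical data frame, all columns numeric
d <- data.frame(a = c(1, 9, 3), b = c(5, 2, 3), c = c(4, 7, 8))

# Row-wise: position of the largest value within each row
by_row <- unname(apply(d, 1, which.max))

# Same idea with max.col, taking the first maximum when there are ties
by_row_fast <- max.col(d, ties.method = "first")

# Column-wise: this is the direction map()/sapply() iterate over a data frame
by_col <- vapply(d, which.max, integer(1))

print(by_row)
print(by_row_fast)
print(by_col)
```

The column-wise result has one element per column, while the row-wise results have one element per row, which is why map_dbl(which.max) does not reproduce apply(mtcars, 1, ...).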

Related

How to assign values in one column to other columns in wide data using R

There is a wide data set, a simple example is
df<-data.frame("id"=c(1:6),
"ax"=c(1,2,2,3,4,4),
"bx"=c(7,8,8,9,10,10),
"cx"=c(11,12,12,13,14,14))
I'm looking for a way to assign the values in "ax" to columns "bx" and "cx". Imagine we have thousands of columns we intend to replace with "ax", so I want this to be done in an automated way using R. The expected output looks like
df<-data.frame("id"=c(1:6),
"ax"=c(1,2,2,3,4,4),
"bx"=c(1,2,2,3,4,4),
"cx"=c(1,2,2,3,4,4))
I've thought of, and tried, using mutate_at and ends_with, but this has not worked for me. For example, I tried
df %>%
mutate_at(vars(ends_with("x")), labels = "ax")
and this prints an error. Not sure what's wrong or what's to be added to get this working, so I would like to request your help on this. Thank you very much!
A simple way using base R would be :
change_cols <- grep('x$', names(df))
df[change_cols] <- df$ax
df
# id ax bx cx
#1 1 1 1 1
#2 2 2 2 2
#3 3 2 2 2
#4 4 3 3 3
#5 5 4 4 4
#6 6 4 4 4
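Note that the pattern 'x$' also matches ax itself, which is harmless here because ax is overwritten with its own values. If the source column should be left out of the target set explicitly, a minimal base R sketch might look like this (the name targets is illustrative):

```r
df <- data.frame(id = 1:6,
                 ax = c(1, 2, 2, 3, 4, 4),
                 bx = c(7, 8, 8, 9, 10, 10),
                 cx = c(11, 12, 12, 13, 14, 14))

# Columns ending in "x", excluding the source column itself
targets <- setdiff(grep("x$", names(df), value = TRUE), "ax")

# Assign one copy of ax per target column
df[targets] <- lapply(targets, function(nm) df$ax)

print(df)
```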
I would suggest this tidyverse approach using across() to select the range of variables you want:
library(tidyverse)
#Data
df<-data.frame("id"=c(1:6),
"ax"=c(1,2,2,3,4,4),
"bx"=c(7,8,8,9,10,10),
"cx"=c(11,12,12,13,14,14))
#Mutate
df %>% mutate(across(c(bx:cx), ~ ax))
Output:
id ax bx cx
1 1 1 1 1
2 2 2 2 2
3 3 2 2 2
4 4 3 3 3
5 5 4 4 4
6 6 4 4 4
Another option with mutate_at()
df %>%
mutate_at(vars(matches("x$")), ~ax)
# id ax bx cx
# 1 1 1 1 1
# 2 2 2 2 2
# 3 3 2 2 2
# 4 4 3 3 3
# 5 5 4 4 4
# 6 6 4 4 4

Order/Sort/Rank a table

I have a table like this
table(mtcars$gear, mtcars$cyl)
I want to order the rows by the number of observations in the 4-cylinder column, largest first. E.g.
4 6 8
4 8 4 0
5 2 1 2
3 1 2 12
I have been playing with order/sort/rank without much success. How could I order tables output?
We can convert table to data.frame and then order by the column.
sort_col <- "4"
tab <- as.data.frame.matrix(table(mtcars$gear, mtcars$cyl))
tab[order(-tab[[sort_col]]), ]
# OR tab[order(tab[[sort_col]], decreasing = TRUE), ]
# 4 6 8
#4 8 4 0
#5 2 1 2
#3 1 2 12
If we don't want to convert it into data frame and want to maintain the table structure we can do
tab <- table(mtcars$gear, mtcars$cyl)
tab[order(-tab[,dimnames(tab)[[2]] == sort_col]),]
# 4 6 8
# 4 8 4 0
# 5 2 1 2
# 3 1 2 12
You could try this. Use sort on the relevant column with decreasing=TRUE, take the names of the sorted rows, and subset using those.
table(mtcars$gear, mtcars$cyl)[names(sort(table(mtcars$gear, mtcars$cyl)[,1], decreasing=TRUE)), ]
4 6 8
4 8 4 0
5 2 1 2
3 1 2 12
In the same vein as Milan's answer, but using order() directly instead of looking up names() in a sort()-ed vector.
The [,1] selects the first column as the ordering key.
table(mtcars$gear, mtcars$cyl)[order(table(mtcars$gear, mtcars$cyl)[,1], decreasing=TRUE),]
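The one-liners build the table twice. A sketch that stores it once and selects the ordering column by its label rather than its position could look like this (it assumes the column label is "4", as in the output shown in the question):

```r
# Build the contingency table once
tab <- table(mtcars$gear, mtcars$cyl)

# Order rows by the counts in the column labelled "4", largest first
ordered_tab <- tab[order(tab[, "4"], decreasing = TRUE), ]

print(ordered_tab)
```

Selecting by label keeps the code correct if the column order of the table ever changes.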

Pasting as object names

I am trying to use paste0 with merge, so that I can merge a bunch of stuff in a loop. However, I'm having trouble with calling specific columns from data.frames
To illustrate, I'll use head
Example:
df <- data.frame(x=1:10,y=1:10)
head(df)
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(get("df"))
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
head(df$x)
[1] 1 2 3 4 5 6
head(get("df$x"))
Error in get("df$x") : object 'df$x' not found
Is there a way to get a specific column?
The function get looks up an object by name in an environment. If you do not specify one, the search starts in the environment get is called from (the global workspace when called at top level). "df$x" is not the name of an object, so the lookup fails.
To get a column this way, coerce df into an environment using as.environment, and then call get on that environment, e.g.:
get("x", as.environment(get("df")))
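A simpler alternative is to fetch the data frame by name and then index the column with [[. The sketch below assumes data frames named df1 and df2 exist (hypothetical names built with paste0, as in the loop described in the question) and collects their x columns.

```r
# Hypothetical data frames whose names follow a pattern
df1 <- data.frame(x = 1:3, y = 4:6)
df2 <- data.frame(x = 7:9, y = 10:12)

# Fetch one data frame by name, then take a column with [[
one_col <- get(paste0("df", 1))[["x"]]

# Fetch several at once with mget(), then extract the same column from each
frames <- mget(paste0("df", 1:2))
x_cols <- lapply(frames, `[[`, "x")

print(one_col)
print(x_cols)
```

Keeping the data frames in a named list from the start would avoid get and mget altogether, and often makes looping over them easier.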

Replicating vector elements by index

I have an integer vector:
a <- c(1,1,3,1,4)
where each element in a indicates how many times its index should be replicated in a new vector.
So the resulting vector should be:
b <- c(1,2,3,3,3,4,5,5,5,5)
What would be the most efficient way to do this?
For example using rep:
rep(seq_along(a),a)
[1] 1 2 3 3 3 4 5 5 5 5
Another less efficient option is to use inverse.rle :
inverse.rle(list(lengths=a,values=seq_along(a)))
[1] 1 2 3 3 3 4 5 5 5 5
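As a quick sanity check on the rep approach, the counts can be recovered from the result with tabulate. This is only an illustrative sketch; the name counts is not from the original.

```r
a <- c(1, 1, 3, 1, 4)

# Repeat each index i exactly a[i] times
b <- rep(seq_along(a), times = a)

# Recover how often each index occurs; nbins keeps indices with zero count
counts <- tabulate(b, nbins = length(a))

print(b)
print(counts)
```

An index whose count is 0 simply does not appear in b, and nbins makes tabulate still report it as 0.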

Merge data frames for Cohen's kappa

I'm trying to analyze some data using R, but I'm not very familiar with R (yet) and therefore I'm totally stuck.
What I try to do is manipulate my input data so I can use it to calculate Cohen's Kappa.
The problem is that for rater_1 I have several ratings for some of the items, and I need to select one. If rater_1 gave an item the same rating as rater_2, that rating should be chosen; if not, any rating from the list can be used.
I tried
unique(merge(rater_1, rater_2, all.x=TRUE))
which brings me close, but if the ratings between the two raters diverge, only one is kept.
So, my question is, how do I get from
item rating_1
1 3
2 5
3 4
item rating_2
1 2
1 3
2 4
2 1
2 2
3 4
3 2
to
item rating_1 rating_2
1 3 3
2 5 4
3 4 4
?
There are some fancy ways to do this, but I thought it might be helpful to combine a few basic techniques to accomplish this task. Usually, in your question, you should include some easy way to generate your data, like this:
# Create some sample data
set.seed(1)
id<-rep(1:50)
rater_1<-sample(1:5,50,replace=TRUE)
df1<-data.frame(id,rater_1)
id<-rep(1:50,each=2)
rater_2<-sample(1:5,100,replace=TRUE)
df2<-data.frame(id,rater_2)
Now, here is one simple technique for doing this.
# Merge together the data frames.
all.merged<-merge(df1,df2)
# id rater_1 rater_2
# 1 1 2 3
# 2 1 2 5
# 3 2 2 3
# 4 2 2 2
# 5 3 3 1
# 6 3 3 1
# Find the ones that are equal.
same.rating<-all.merged[all.merged$rater_2==all.merged$rater_1,]
# Consider id 44, sometimes they match twice.
# So remove duplicates.
same.rating<-same.rating[!duplicated(same.rating),]
# Find the ones that never matched.
not.same.rating<-all.merged[!(all.merged$id %in% same.rating$id),]
# Pick one. I chose to pick the maximum.
picked.rating<-aggregate(rater_2~id+rater_1,not.same.rating,max)
# Stick the two together.
result<-rbind(same.rating,picked.rating)
result<-result[order(result$id),] # Sort
# id rater_1 rater_2
# 27 1 2 5
# 4 2 2 2
# 33 3 3 1
# 44 4 5 3
# 281 5 2 4
# 11 6 5 5
A fancy way to do this would be like this:
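same.or.random<-function(x) {
# Positions of the rows where both raters agree.
matched<-which(x$rater_1==x$rater_2)
if(length(matched)>0) x[matched[1],]
else x[sample(nrow(x),1),]
}
# Group by the id column of the merged data, not a global variable.
merged<-merge(df1,df2)
do.call(rbind,by(merged,merged$id,same.or.random))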
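As a self-contained illustration, the same match-or-pick rule can be run on the small tables from the question. This is only a sketch: the names r1, r2 and pick_one are illustrative, and the ratings are entered exactly as printed in the question.

```r
# Tables as printed in the question
r1 <- data.frame(item = 1:3, rating_1 = c(3, 5, 4))
r2 <- data.frame(item = c(1, 1, 2, 2, 2, 3, 3),
                 rating_2 = c(2, 3, 4, 1, 2, 4, 2))

# One row per (item, candidate rating) pair
merged <- merge(r1, r2)

# Keep the first agreeing row if there is one, otherwise a random row
pick_one <- function(g) {
  hit <- which(g$rating_1 == g$rating_2)
  if (length(hit) > 0) g[hit[1], ] else g[sample(nrow(g), 1), ]
}

picked <- do.call(rbind, lapply(split(merged, merged$item), pick_one))
print(picked)
```

Items 1 and 3 have an agreeing rating, so those rows are fixed; for item 2 there is no agreement, so the chosen rating_2 varies between runs unless set.seed is called first.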
