dplyr calculate new columns in batch - r

I would like to add new columns to a data.frame using dplyr. One by one it is easy using mutate. However, I have a situation where I have a function that calculates several parameters based on some other column and I would like to add them to the table in one go. Suppose I have a function
f = function(x) {data.frame(A = x + 1, B = x + 2, C = x + 3)}
And I want to run this function against a column in a data.frame and add the results to the same data.frame, so
df = data.frame(x = 1:10)
df %>% XXX(f(x))
would result in data.frame like this:
x A B C
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
8 9 10 11
9 10 11 12
10 11 12 13
I know I have read about a function like XXX in the example above, but I'm unable to find it right now. Does anybody have hints?

We can use do
library(dplyr)
df %>%
do(data.frame(., f(.$x)))
# x A B C
#1 1 2 3 4
#2 2 3 4 5
#3 3 4 5 6
#4 4 5 6 7
#5 5 6 7 8
#6 6 7 8 9
#7 7 8 9 10
#8 8 9 10 11
#9 9 10 11 12
#10 10 11 12 13
Or
library(purrr)
df %>%
map_df(f) %>%
bind_cols(df, .)
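A base R alternative, sketched here on the assumption that f returns a data.frame with one row per element of x: the result can simply be bound onto the original with cbind, so no extra package is needed.

```r
# f and df as defined in the question
f <- function(x) data.frame(A = x + 1, B = x + 2, C = x + 3)
df <- data.frame(x = 1:10)

# Bind the columns returned by f() onto the original data.frame
res <- cbind(df, f(df$x))
head(res, 3)
```

With dplyr 1.0.0 or later, an unnamed data.frame result inside mutate is unpacked into separate columns, so df %>% mutate(f(x)) should give the same table.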

Related

Aggregate single column in R

I have some data with only one variable, as following:
dd <- data.frame (
x=sample (1:10, 100, T)
)
I want to aggregate it, for example to count each occurrence, but using only functions from the base packages:
dd |>
transform(y=1) |>
do(aggregate(y~x, data=., FUN = \(x) length(x)))
Is there a better solution?
Will this work:
as.data.frame(table(dd))
dd Freq
1 1 11
2 2 10
3 3 13
4 4 8
5 5 13
6 6 9
7 7 6
8 8 10
9 9 10
10 10 10
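Two other base-only variants, as a sketch. The seed and the helper column name n are assumptions added for illustration; they are not part of the question.

```r
set.seed(1)  # seed added only to make the sketch reproducible
dd <- data.frame(x = sample(1:10, 100, TRUE))

# Option 1: name the table dimension explicitly, then convert to a data.frame
counts1 <- as.data.frame(table(x = dd$x))

# Option 2: aggregate() over a helper column of ones (n is a hypothetical name)
counts2 <- aggregate(n ~ x, data = transform(dd, n = 1), FUN = sum)
```

Note that counts1$x is a factor while counts2$x keeps the original integer type, which may matter if the counts are joined back to other data.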

Binning a discrete variable (preferably in dplyr)

I would like to "bin" a large discrete variable by combining two consecutive rows into one bin. I would also like to call the bin by the first row value.
As an example:
x<-data.frame(x=c(1,2,3,4,5,6,7,8,9,10,11,12),
y=c(1,1,3,3,5,5,7,7,9,9,11,11))
x
We may use gl to create the grouping bin
library(dplyr)
x %>%
mutate(grp = as.integer(gl(n(), 2, n())))
x y grp
1 1 1 1
2 2 1 1
3 3 3 2
4 4 3 2
5 5 5 3
6 6 5 3
7 7 7 4
8 8 7 4
9 9 9 5
10 10 9 5
11 11 11 6
12 12 11 6
Performing the steps exactly as you outlined them would look like this:
library(dplyr)
x %>%
mutate(bins = rep(seq_len(n() / 2), each = 2)) %>%  # n() is the row count; assumes an even number of rows
group_by(bins) %>%
filter(row_number() == 1) %>%
ungroup()
However, this gives exactly the same result (without the bins column) in one line of code:
x[seq(1, nrow(x), by = 2), ]
Another way using seq and ceiling.
x$bin <- ceiling(seq(nrow(x))/2)
x
# x y bin
#1 1 1 1
#2 2 1 1
#3 3 3 2
#4 4 3 2
#5 5 5 3
#6 6 5 3
#7 7 7 4
#8 8 7 4
#9 9 9 5
#10 10 9 5
#11 11 11 6
#12 12 11 6
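Since the question also asks for each bin to be called by its first row value, the ceiling idea can be extended as in the sketch below. The name first_row is a hypothetical helper, and the approach assumes the rows are already in the order in which they should be paired.

```r
x <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
                y = c(1, 1, 3, 3, 5, 5, 7, 7, 9, 9, 11, 11))

idx <- ceiling(seq_len(nrow(x)) / 2)  # bin index: 1, 1, 2, 2, ...
first_row <- 2 * idx - 1              # row number of the first row in each bin
x$bin <- x$x[first_row]               # label each bin by that row's x value
```

Because first_row never exceeds nrow(x), the same lines also work when the number of rows is odd; the last bin then holds a single row.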

How to collect outputs of vector-valued function into a dataframe?

I have a function f1 that takes a number k as input and returns 3 numbers k, k+1, k+2. I would like to ask how to concatenate these results into a dataframe for k from 1 to 10. In this way, the line k corresponds to the output f1(k).
f1 <- function(k){
return (c(k, k+1, k+2))
}
f1(1)
f1(2)
One option is to Vectorize the function 'f1' and pass it the values 1 to 10, which returns a matrix with one column per input, and then convert that to a data.frame with as.data.frame
as.data.frame(Vectorize(f1)(1:10))
If it needs to be vertical, then transpose the output and apply as.data.frame
as.data.frame(t(Vectorize(f1)(1:10)))
Output:
# V1 V2 V3
#1 1 2 3
#2 2 3 4
#3 3 4 5
#4 4 5 6
#5 5 6 7
#6 6 7 8
#7 7 8 9
#8 8 9 10
#9 9 10 11
#10 10 11 12
Or we can use outer
as.data.frame(outer(1:10, 0:2, `+`))
You can also use:
as.data.frame(do.call(rbind,lapply(1:10,f1)))
Output:
as.data.frame(do.call(rbind,lapply(1:10,f1)))
V1 V2 V3
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
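A variant with vapply, sketched under the assumption that f1 always returns exactly three numbers. The column names are hypothetical and only replace the default V1, V2, V3.

```r
f1 <- function(k) c(k, k + 1, k + 2)

# vapply checks that every call returns exactly three numeric values
m <- vapply(1:10, f1, numeric(3))   # 3 x 10 matrix, one column per k
res <- as.data.frame(t(m))          # transpose so that row k holds f1(k)
names(res) <- c("k", "k_plus_1", "k_plus_2")  # hypothetical column names
```

Compared with Vectorize or lapply, vapply fails with an error if some call returns a result of a different length or type, instead of silently producing a list.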

selecting common columns from different elements of a list

I have a data set in list format. The list is divided into 20 elements, and each element contains 12 rows and some columns. I want to extract the columns that are common to every element of the list and build a new data set from them. I have tried to make a reproducible example; please see the code:
a<-data.frame(x=(1:10),y=(1:10),z=(1:10))
b<-data.frame(x=(1:10),y=(1:10),n=(1:10))
c<-data.frame(x=(1:10),y=(1:10),q=(1:10))
data<-list(a,b,c)
library(plyr)  # ldply() comes from plyr
data1<-ldply(data)
required_data<-data1[,-3:-5]
Find the common columns using Reduce, subset them from each element of the list, and bind the results together
cols <- Reduce(intersect, lapply(data, colnames))
do.call(rbind, lapply(data, `[`, cols))
# x y
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
#8 8 8
#9 9 9
#10 10 10
#11 1 1
#...
The last step can also be performed using
purrr::map_df(data, `[`, cols)
With base R, you can first find the names that the elements have in common
commonName <- names((r<-table(unlist(Map(names,data))))[r==length(data)])
then retrieve those columns from each list element and combine them (similar to the second step in the solution by @Ronak Shah)
res <- Reduce(rbind,lapply(data, '[',commonName))
which gives:
> res
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 1 1
12 2 2
13 3 3
14 4 4
15 5 5
16 6 6
17 7 7
18 8 8
19 9 9
20 10 10
21 1 1
22 2 2
23 3 3
24 4 4
25 5 5
26 6 6
27 7 7
28 8 8
29 9 9
30 10 10
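When a column appears in some but not all elements, the way the names are counted matters. The sketch below uses hypothetical three-row data in which z occurs in two of the three elements, and compares a count against length(data) with Reduce(intersect, ...). It assumes column names are unique within each data.frame.

```r
a  <- data.frame(x = 1:3, y = 1:3, z = 1:3)
b  <- data.frame(x = 1:3, y = 1:3, z = 1:3, n = 1:3)
c3 <- data.frame(x = 1:3, y = 1:3, q = 1:3)
data <- list(a, b, c3)

counts <- table(unlist(lapply(data, names)))   # how often each name occurs

in_all  <- names(counts)[counts == length(data)]  # present in every element
in_some <- names(counts)[counts > 1]              # present in at least two elements
via_intersect <- Reduce(intersect, lapply(data, names))
```

Here in_all and via_intersect both contain only x and y, while in_some also picks up z, which is missing from the third element and would make the subsetting step fail.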

How to generate an uneven sequence of numbers in R

Here's an example data frame:
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
I want to generate a sequence of numbers according to the number of observations of y per x group (e.g. there are 2 observations of y for x=1). I want the sequence to increase continuously and to skip 2 numbers after each x group (e.g. 2 is followed by 5).
The desired output for this example would be:
1,2,5,6,7,10,11,14,17,20,21,22,25,26
How can I do this simply in R?
To expand on my comment: the group values can be arbitrary, you simply need to recode them as consecutive integers in order of appearance. There are a few ways to do this. @akrun has shown that it can be done with the match function, or you can apply as.numeric to a factor if that is easier for you to follow.
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
# these are equivalent
df$newx <- as.numeric(factor(df$x, levels=unique(df$x)))
df$newx <- match(df$x, unique(df$x))
Since you now have a "new" releveling which is sequential, we can use the logic that was discussed in the comments.
df$newNumber <- 1:nrow(df) + (df$newx-1)*2
For this example, this will result in the following dataframe:
x y newx newNumber
1 1 1 1
1 2 1 2
2 3 2 5
2 4 2 6
2 6 2 7
3 3 3 10
3 7 3 11
4 8 4 14
5 6 5 17
6 4 6 20
6 3 6 21
6 7 6 22
9 3 7 25
9 2 7 26
where df$newNumber is the output you wanted.
To create the sequence 0,0,4,4,4,9,..., you take the minimum of newNumber within each group and subtract 1. The easiest way to do this is with dplyr.
library(dplyr)
df %>%
group_by(x) %>%
mutate(newNumber2 = min(newNumber) -1)
Which will have the output:
Source: local data frame [14 x 5]
Groups: x
x y newx newNumber newNumber2
1 1 1 1 1 0
2 1 2 1 2 0
3 2 3 2 5 4
4 2 4 2 6 4
5 2 6 2 7 4
6 3 3 3 10 9
7 3 7 3 11 9
8 4 8 4 14 13
9 5 6 5 17 16
10 6 4 6 20 19
11 6 3 6 21 19
12 6 7 6 22 19
13 9 3 7 25 24
14 9 2 7 26 24
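The same two columns can also be computed without dplyr. The following is a base R sketch that swaps group_by and mutate for ave; the intermediate name grp is a hypothetical helper.

```r
df <- data.frame(x = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 6, 9, 9),
                 y = c(1, 2, 3, 4, 6, 3, 7, 8, 6, 4, 3, 7, 3, 2))

grp <- match(df$x, unique(df$x))                    # 1, 1, 2, 2, 2, 3, ...
df$newNumber <- seq_len(nrow(df)) + (grp - 1) * 2   # row number plus 2 per earlier group

# ave() returns the group-wise minimum repeated for every row of the group
df$newNumber2 <- ave(df$newNumber, df$x, FUN = min) - 1
```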
