Split a dataset into a list of dataframes with equal number of columns - r

I have a data set with 36 columns and a single observation. I want to split it into a list of data frames with 3 columns each and then rbind them into a single data frame.
I have been using the following code:
m=12
nc<-ncol(df)
df1<-lapply(split(as.list(df), cut(1:nc, m, labels = FALSE)), as.data.frame)
df1<-do.call("rbind",df1)
This code works on its own, but I run into problems when I try to run it in a Shiny app.
Can someone suggest a replacement for the code above?

We can split the one-row data frame by generating a grouping sequence:
do.call("rbind", split(c(t(df)), rep(seq(1, ncol(df)/3), each = 3)))
where
rep(seq(1, ncol(df)/3), each = 3)
would generate
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8
9 9 9 10 10 10 11 11 11 12 12 12
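
As a rough, self-contained illustration of the approach above, the sketch below builds a stand-in one-row data frame (the real `df` is not shown in the question, so the values 1 to 36 are an assumption) and checks the shape of the result. It also shows a `matrix(..., byrow = TRUE)` variant, which is an alternative added here and not part of the original answer.

```r
# Hypothetical stand-in for the asker's data: 1 row, 36 columns
df <- as.data.frame(matrix(1:36, nrow = 1))

# Group index: 1 1 1 2 2 2 ... 12 12 12
grp <- rep(seq_len(ncol(df) / 3), each = 3)

# Flatten the single row, split it into chunks of 3, and stack the chunks
res <- do.call("rbind", split(c(t(df)), grp))
dim(res)   # 12 rows, 3 columns

# Alternative (not from the original answer): fill a 3-column matrix row by row
alt <- matrix(unlist(df), ncol = 3, byrow = TRUE)
```

Both variants return a matrix, so wrap the result in `as.data.frame()` if a data frame is required.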

Related

cbind a column containing lists maintaining column of lists

I am trying to cbind a dataframe to a column, where some of the rows of that column are made up of lists. For example:
df1 <- tibble(x1=c(1,2,3))
df2 <- tibble(x2=c(4,5,6),
x.list=list(list(7,8),9,list(10,11,12)))
But when I try to cbind just the column of lists:
df3 <- cbind(df1,df2$x.list)
I get an unnested(?) version of x.list:
> df3
x1 7 8 9 10 11 12
1 1 7 8 9 10 11 12
2 2 7 8 9 10 11 12
3 3 7 8 9 10 11 12
How can I cbind the column of lists and maintain it as a single column of lists?
You can try this, based on the difference between $ and [: $ extracts the contents of the column, while [ returns a one-column data frame that keeps the list column intact
df3 <- cbind(df1,df2[2])
output
x1 x.list
1 1 7, 8
2 2 9
3 3 10, 11, 12
We can see that class(df3$x.list) is "list".
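
To make the difference concrete, here is a minimal sketch that uses base data frames instead of tibble (an assumption made only to avoid a package dependency; the list column is attached with `$<-`). It checks that subsetting with `[` keeps the list column as a single column.

```r
df1 <- data.frame(x1 = c(1, 2, 3))
df2 <- data.frame(x2 = c(4, 5, 6))
# Attach a list column: one list element per row
df2$x.list <- list(list(7, 8), 9, list(10, 11, 12))

# `[` returns a one-column data frame, so cbind keeps the list column whole
df3 <- cbind(df1, df2["x.list"])

ncol(df3)            # x1 and x.list only
class(df3$x.list)    # the column is still a list
```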

How to split one vector in to multiple vectors with pattern in R

I have a vector and I want to split it into multiple vectors following a pattern. For example,
a table x with a vector of 14 numbers like:
x
1
2
3
4
5
6
7
8
9
10
11
12
13
14
I want to create a new table with multiple vectors based on the vector above, with
n=2, m1=3, m2=4
There are 2 columns: column 1 has n*m1 rows and column 2 has n*m2 rows (here, the numbers could be variables)
1  7
2  8
3  9
4 10
5 11
6 12
  13
  14
Many thanks
markus's solution is correct. Many thanks.
n <- 2; split(1:14, rep(1:2, n*3:4))
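
The one-liner above hard-codes the values. A sketch of the same idea with the sizes held in variables follows; the names `m`, `parts`, and `tab` are illustrative, and padding the shorter column with `NA` is an assumption about how the uneven table should be represented.

```r
x <- 1:14
n <- 2
m <- c(3, 4)              # m1 and m2 from the question

# Column k receives n * m[k] consecutive values
parts <- split(x, rep(seq_along(m), n * m))

# Pad the shorter pieces with NA so they fit in one rectangular table
len <- max(lengths(parts))
tab <- sapply(parts, function(p) c(p, rep(NA, len - length(p))))
tab
```

The group sizes `n * m` must add up to `length(x)`; otherwise `rep()` produces a grouping vector of the wrong length.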

Select the variables in one dataframe from a list in another dataframe

I have a big data set with a large number of columns and rows. I want to subset a few columns of df1 using a list of variables (the names of columns in df1) stored in df2. For example, I have
df1 <- data.frame(A=sample(1:10, 10), B=sample(1:10, 10), C=sample(1:10,10), D=sample(1:10,10))
var <- c('A','C')
ratio <- c(0.5,0.6)
df2 <- data.frame(var,ratio)
New dataframe should look like this:
A C
1 9 2
2 1 3
3 4 5
4 2 8
5 10 7
6 5 1
7 7 9
8 3 4
9 8 10
10 6 6
If 'var' is stored as a factor (the default for strings in data.frame() before R 4.0.0), we need to convert it to character class before using it to subset the first dataset
df1[as.character(df2$var)]
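
A small runnable sketch of this subsetting is shown below. The `set.seed()` call and the `intersect()` guard are additions for illustration: the guard skips names in df2 that do not exist in df1, which would otherwise raise an "undefined columns selected" error.

```r
set.seed(1)  # only so the example is reproducible
df1 <- data.frame(A = sample(1:10, 10), B = sample(1:10, 10),
                  C = sample(1:10, 10), D = sample(1:10, 10))
df2 <- data.frame(var = c("A", "C"), ratio = c(0.5, 0.6))

# as.character() is harmless if var is already character
sub <- df1[as.character(df2$var)]

# Optional guard: keep only the names that really are columns of df1
vars <- intersect(as.character(df2$var), names(df1))
sub_safe <- df1[vars]
```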

R Looking up closest value in data.frame less than equal to another value

I have two data.frames, lookup_df and values_df. For each row in lookup_df I want to lookup the closest value in the values_df that is less than or equal to an index value.
Here's my code so far:
lookup_df <- data.frame(ids = 1:10)
values_df <- data.frame(idx = c(1,3,7), values = c(6,2,8))
What I want for result_df is the following:
> result_df
ids values
1 1 6
2 2 6
3 3 2
4 4 2
5 5 2
6 6 2
7 7 8
8 8 8
9 9 8
10 10 8
I know how to do this with SQL fairly easily, but I'm curious whether there is a straightforward way in R. I could iterate over the rows of lookup_df and then loop through the rows of values_df, but that is not computationally efficient. I'm open to using the dplyr library if someone knows how to use it to solve the problem.
If values_df is sorted by idx ascending, then findInterval will work:
lookup_df <- data.frame(ids = 1:10)
values_df <- data.frame(idx = c(1,3,7), values = c(6,2,8))
lookup_df$values <- values_df$values[findInterval(lookup_df$ids,values_df$idx)]
lookup_df
ids values
1 1 6
2 2 6
3 3 2
4 4 2
5 5 2
6 6 2
7 7 8
8 8 8
9 9 8
10 10 8
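
Two edge cases are worth a sketch here: values_df not being sorted, and lookup ids smaller than every idx. In the second case `findInterval` returns 0, and indexing with 0 silently drops elements, so the assignment would fail with a length mismatch. The example below uses modified data (an unsorted values_df and an extra id of 0) to show one way to handle both, mapping unmatched ids to `NA`.

```r
lookup_df <- data.frame(ids = 0:10)                    # 0 has no idx <= it
values_df <- data.frame(idx = c(7, 1, 3), values = c(8, 6, 2))

# findInterval needs the breakpoints in ascending order
values_df <- values_df[order(values_df$idx), ]

pos <- findInterval(lookup_df$ids, values_df$idx)
pos[pos == 0] <- NA                                    # no match: yield NA

lookup_df$values <- values_df$values[pos]
lookup_df
```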

Excel OFFSET function in r

I am trying to simulate the OFFSET function from Excel. I understand that this can be done for a single value but I would like to return a range. I'd like to return a group of values with an offset of 1 and a group size of 2. For example, on row 4, I would like to have a group with values of column a, rows 3 & 2. Sorry but I am stumped.
Is it possible to add this result to the data frame as another column using cbind or similar? Alternatively, could I use this in a vectorized function so I could sum or mean the result?
Mockup Example:
> df <- data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> #PROCESS
> df
a b
1 1 NA
2 2 (1)
3 3 (1,2)
4 4 (2,3)
5 5 (3,4)
6 6 (4,5)
7 7 (5,6)
8 8 (6,7)
9 9 (7,8)
10 10 (8,9)
This should do the trick:
df$b1 <- c(rep(NA, 1), head(df$a, -1))
df$b2 <- c(rep(NA, 2), head(df$a, -2))
Note that the result will have to live in two columns, as columns in data frames normally hold one simple value per row. (Unless you want to resort to complex numbers.) head with a negative argument drops that many elements from the tail; try head(1:10, -2). rep is repetition, c is concatenation. The <- assignment adds a new column if it's not there yet.
What Excel calls OFFSET is sometimes also referred to as lag.
EDIT: Following Greg Snow's comment, here's a version that's more elegant, but also more difficult to understand:
df <- cbind(df, as.data.frame((embed(c(NA, NA, df$a), 3))[,c(3,2)]))
Try it component by component to see how it works.
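
For readers who want the component-by-component view spelled out, here is a short sketch of what `embed` does with the padded vector, followed by row-wise sums and means of the lagged pair (the question asked about summing or averaging the group). The variable names are illustrative.

```r
a <- 1:10
padded <- c(NA, NA, a)     # two NAs so the first rows have something to lag into

e <- embed(padded, 3)      # 10 x 3 matrix: column 1 = a, column 2 = lag 1, column 3 = lag 2
lagged <- e[, c(3, 2)]     # reorder to (lag 2, lag 1), e.g. row 4 is 2, 3

# Vectorized summaries over each group
grp_sum  <- rowSums(lagged)                  # NA wherever a lag is missing
grp_mean <- rowMeans(lagged, na.rm = TRUE)   # ignores missing lags; row 1 is NaN
```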
Do you want something like this?
> df <- data.frame(a=1:10)
> b=t(sapply(1:10, function(i) c(df$a[(i+2)%%10+1], df$a[(i+4)%%10+1])))
> s = sapply(1:10, function(i) sum(b[i,]))
> df = data.frame(df, b, s)
> df
a X1 X2 s
1 1 4 6 10
2 2 5 7 12
3 3 6 8 14
4 4 7 9 16
5 5 8 10 18
6 6 9 1 10
7 7 10 2 12
8 8 1 3 4
9 9 2 4 6
10 10 3 5 8
