I'm looking to do the following simple task.
Imagine a dataframe with 3 rows and 5 columns:
  X1 X2 X3 X4 X5
1  1  2  3  4  5
2  3  4  5  6  7
3  2  3  4  5  6
I want to add another constant row that fills only the last 4 columns:
  X1 X2 X3 X4 X5
1  1  2  3  4  5
2  3  4  5  6  7
3  2  3  4  5  6
4     5  5  5  5
How can I accomplish this? Thank you in advance!
Assuming your original dataframe is called df and you have data of length ncol(df)-1 in a vector called data.to.add:
rbind(df, c(NA, data.to.add))
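As a minimal sketch of the one-liner above, here it is run end to end. The data frame is rebuilt by hand from the question, and data.to.add is assumed to hold the constant 5 for the last four columns.

```r
# Rebuild the question's data frame (values copied from the post)
df <- data.frame(X1 = c(1, 3, 2), X2 = c(2, 4, 3), X3 = c(3, 5, 4),
                 X4 = c(4, 6, 5), X5 = c(5, 7, 6))

# Assumed new values: one constant per column except the first
data.to.add <- rep(5, ncol(df) - 1)

# Pad the first position with NA so the vector matches the number of columns
out <- rbind(df, c(NA, data.to.add))
print(out)
```

The NA is needed because rbind() on a data frame expects the new row to supply one value per column.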
Here is a function that pads the shorter input with leading NAs before binding.
Your input data:
library(data.table)
df <- fread(" X1 X2 X3 X4 X5
1 1 2 3 4 5
2 3 4 5 6 7
3 2 3 4 5 6")
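Example data:
a <- df; b <- rep(5, 4)
Function:
rbind.filling <- function(a, b) {
  n_col <- max(ncol(a), ncol(b))  # ncol() of a plain vector is NULL, which max() ignores
  if (is.null(dim(a))) {  # a is the vector: left-pad with NA, borrow b's names
    a <- matrix(c(rep(NA, n_col - length(a)), a), nrow = 1)
    colnames(a) <- names(b)
  } else {  # b is the vector: left-pad with NA, borrow a's names
    b <- matrix(c(rep(NA, n_col - length(b)), b), nrow = 1)
    colnames(b) <- names(a)
  }
  return(rbind(a, b))
}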
Call the function:
rbind.filling(a,b)
# V1 V2 V3 V4 V5 V6
# 1: 1 1 2 3 4 5
# 2: 2 3 4 5 6 7
# 3: 3 2 3 4 5 6
# 4: NA NA 5 5 5 5
rbind.filling(b,a)
# V1 V2 V3 V4 V5 V6
# 1: NA NA 5 5 5 5
# 2: 1 1 2 3 4 5
# 3: 2 3 4 5 6 7
# 4: 3 2 3 4 5 6
I would like to make a loop that extracts every 3 columns from my dataframe. The number of columns is not fixed, so it could be 30 columns or 12. The only thing that is fixed is that there will always be 3 consecutive columns that I want to extract and put into a list of dataframes.
For example:
KO5_1 KO5_2 KO5_3 KO9_1
1 1 3 3
2 0 0 3
2 2 3 0
0 0 1 2
I want KO5_1, KO5_2 and KO5_3 to be extracted and put together in a separate df. KO9_1 and the other 2 of its kind will go into another df. The names change; this is just an example.
How can I make this loop?
Thank you!
Use split.default with gl(n, 3) to split the columns into consecutive groups of three. Or use gl(3, 1) if you want three groups, each taking every third column.
df <- as.data.frame(replicate(10, sample(5)))
names(df) <- paste0("K0", 1:10)
n = ceiling(ncol(df) / 3)
split.default(df, gl(n, 3))
$`1`
K01 K02 K03
1 2 3 3
2 4 1 5
3 3 2 2
4 1 5 1
5 5 4 4
$`2`
K04 K05 K06
1 5 4 4
2 4 3 2
3 2 5 1
4 3 2 3
5 1 1 5
$`3`
K07 K08 K09
1 3 4 1
2 5 5 3
3 1 1 2
4 4 3 5
5 2 2 4
$`4`
K010
1 3
2 5
3 1
4 4
5 2
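When the number of columns is not a multiple of 3, the factor from gl(n, 3) is longer than the number of columns (12 entries for 10 columns here), and split.default may emit a warning about mismatched lengths. A minimal sketch that sidesteps this builds the grouping vector directly from the column positions. The name grp is only illustrative, and the columns are the default V1..V10 rather than the K0* names used above.

```r
set.seed(1)
df <- as.data.frame(replicate(10, sample(5)))

# One group id per column: 1 1 1 2 2 2 3 3 3 4
grp <- ceiling(seq_along(df) / 3)

# Split the columns (not the rows) by that id
chunks <- split.default(df, grp)

# Number of columns in each chunk
print(lengths(chunks))
```

Because grp always has exactly ncol(df) entries, the last chunk simply ends up with whatever columns are left over.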
Let's use a sample data frame. The names aren't important.
set.seed(1)
df <- as.data.frame(replicate(10, sample(5)))
df
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> 1 1 5 3 2 1 1 2 1 2 4
#> 2 4 3 5 5 3 2 5 4 5 2
#> 3 3 4 1 4 5 5 1 3 3 1
#> 4 5 2 4 3 4 3 4 2 4 3
#> 5 2 1 2 1 2 4 3 5 1 5
To get this into a list of 3 data frames comprising every 3rd column, we can do:
n <- 3
lapply(seq(n) %% n, function(i) df[seq_along(df) %% n == i])
#> [[1]]
#> V1 V4 V7 V10
#> 1 1 2 2 4
#> 2 4 5 5 2
#> 3 3 4 1 1
#> 4 5 3 4 3
#> 5 2 1 3 5
#>
#> [[2]]
#> V2 V5 V8
#> 1 5 1 1
#> 2 3 3 4
#> 3 4 5 3
#> 4 2 4 2
#> 5 1 2 5
#>
#> [[3]]
#> V3 V6 V9
#> 1 3 1 2
#> 2 5 2 5
#> 3 1 5 3
#> 4 4 3 4
#> 5 2 4 1
You can change n to get every 4th column, 5th column, etc.
I think the title says it all
Let's jump to the example
Imagine I have a vector (the contents of which are not relevant for this example)
aux<-c(1:5)
I need to create a data frame that has the same vector repeating itself n times (n can vary, sometimes it is 8 times, sometimes it is 7)
I did it like this for repeating itself 8 times:
aux.df<-data.frame(aux,aux,aux,aux,aux,aux,aux,aux)
This got me the result I wanted but you can see why it's not an ideal way...
Is there a package or function that tells R to repeat the vector 'aux' 8 times?
I also tried creating a matrix and then converting it into a data frame, but that didn't work: I got a weird data frame with vectors inside each cell.
What I tried that didn't work:
aux.df<- as.data.frame(matrix(aux, nrows=5, ncol=8))
Using replicate().
as.data.frame(replicate(8, aux))
# V1 V2 V3 V4 V5 V6 V7 V8
# 1 1 1 1 1 1 1 1 1
# 2 2 2 2 2 2 2 2 2
# 3 3 3 3 3 3 3 3 3
# 4 4 4 4 4 4 4 4 4
# 5 5 5 5 5 5 5 5 5
Parameters:
aux <- c(1:5)
n <- 8
Vector aux repeated as columns:
aux.df <- as.data.frame(matrix(rep(aux, n), ncol = n, byrow = FALSE))
Vector aux repeated as rows:
aux.df <- as.data.frame(matrix(rep(aux, n), nrow = n, byrow = TRUE))
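For comparison, the matrix attempt from the question passes nrows, whereas matrix() names that argument nrow. The sketch below assumes the goal is one copy of aux per column. It relies on matrix() recycling its data to fill the requested shape, so the explicit rep() is optional.

```r
aux <- 1:5
n <- 8

# matrix() fills column by column and recycles aux until all n columns are full
aux.df <- as.data.frame(matrix(aux, nrow = length(aux), ncol = n))

print(dim(aux.df))
```

Using length(aux) for nrow keeps the call correct if aux later changes size.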
Here are some possible options:
> data.frame(aux = aux)[rep(1, 8)]
aux aux.1 aux.2 aux.3 aux.4 aux.5 aux.6 aux.7
1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
> data.frame(kronecker(t(rep(1, 8)), aux))
X1 X2 X3 X4 X5 X6 X7 X8
1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
> data.frame(outer(aux, rep(1, 8)))
X1 X2 X3 X4 X5 X6 X7 X8
1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
> list2DF(rep(list(aux), 8))
1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
I would like to lag multiple specific columns of a data frame in R.
Let's take this generic example. Let's assume I have defined which columns of my dataframe I need to lag:
Lag <- c(0, 1, 0, 1)
Lag.Index <- is.element(Lag, 1)
df <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)
My initial dataframe:
x1 x2 x3 x4
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
I would like to compute the following dataframe:
x1 x2 x3 x4
1 1 NA 1 NA
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
I know how to do it for a single lagged column, as shown here, but I cannot find an elegant way to do it for multiple lagged columns. Any help is very much appreciated.
You can use purrr's map2_dfc to lag different values by column.
purrr::map2_dfc(df, Lag, dplyr::lag)
# x1 x2 x3 x4
# <int> <int> <int> <int>
#1 1 NA 1 NA
#2 2 1 2 1
#3 3 2 3 2
#4 4 3 4 3
#5 5 4 5 4
#6 6 5 6 5
#7 7 6 7 6
#8 8 7 8 7
Or with data.table :
library(data.table)
setDT(df)[, names(df) := Map(shift, .SD, Lag)]
A data.table option using shift along with Vectorize
> setDT(df)[, Vectorize(shift)(.SD, Lag)]
x1 x2 x3 x4
[1,] 1 NA 1 NA
[2,] 2 1 2 1
[3,] 3 2 3 2
[4,] 4 3 4 3
[5,] 5 4 5 4
[6,] 6 5 6 5
[7,] 7 6 7 6
[8,] 8 7 8 7
Not sure whether this is elegant enough, but I would use dplyr's mutate_at function to modify the selected columns:
library(dplyr)
df %>% mutate_at(.vars = vars(x2, x4), .funs = ~lag(., default = NA))
We convert the lag to logical class, get the corresponding names and use across from dplyr
library(dplyr)
df %>%
mutate(across(names(.)[as.logical(Lag)], lag))
# x1 x2 x3 x4
#1 1 NA 1 NA
#2 2 1 2 1
#3 3 2 3 2
#4 4 3 4 3
#5 5 4 5 4
#6 6 5 6 5
#7 7 6 7 6
#8 8 7 8 7
Or we can do this in base R
df[as.logical(Lag)] <- rbind(NA, df[-nrow(df), as.logical(Lag)])
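The base R line above shifts the flagged columns by exactly one row. If Lag could hold other non-negative shifts, one possible sketch is to apply a small helper column by column. lag_by is a hypothetical name introduced here, not a function from any package, and it assumes each shift is no longer than the column.

```r
Lag <- c(0, 1, 0, 1)
df <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)

# Hypothetical helper: shift x down by k rows, padding the top with NA
lag_by <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

# Map() pairs each column with its own shift; df[] keeps the data frame structure
df[] <- Map(lag_by, df, Lag)
print(df)
```

With the Lag vector from the question, x1 and x3 are left untouched and x2 and x4 start with one NA.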
I have this dataframe:
> df <- data.frame(Semester = sample(1:4, 20, replace=TRUE),
X1 = sample(c(1:7,NA), 20, replace =TRUE),
X2 = sample(c(1:7,NA), 20, replace =TRUE),
X3 = sample(c(1:7,NA), 20, replace =TRUE),
X4 = sample(c(1:7,NA), 20, replace =TRUE),
X5 = sample(c(1:7,NA), 20, replace =TRUE),
X6 = sample(c(1:7,NA), 20, replace =TRUE),
X7 = sample(c(1:7,NA), 20, replace =TRUE),
stringsAsFactors = FALSE)
> df
Semester X1 X2 X3 X4 X5 X6 X7
1 4 3 7 NA NA 1 2 7
2 3 NA 3 NA 4 3 2 6
3 1 2 5 3 4 7 NA 2
4 3 1 1 6 1 3 2 4
5 1 1 2 1 3 2 6 5
6 2 1 7 1 5 2 2 6
7 4 7 6 5 2 7 1 2
8 1 5 5 7 4 5 1 5
9 1 3 1 1 5 6 3 7
10 3 6 NA 1 1 5 NA 2
11 1 1 6 6 6 3 5 7
12 3 1 5 1 2 3 1 NA
13 4 1 4 1 1 5 6 1
14 1 5 4 4 NA 5 3 3
15 2 2 NA 4 1 1 5 4
16 3 6 7 6 7 3 3 7
17 1 1 2 4 5 4 5 3
18 4 4 7 7 6 NA 4 NA
19 3 4 2 3 4 4 3 5
20 2 1 NA 3 5 7 NA 6
I'm trying to get the output below, where n_* is the number of times the value * appears across all X* variables within each Semester. For example, n_7 for Semester==1 counts the X* values equal to 7 in the rows where Semester is 1. (This output is only illustrative; the values are made up.)
Semester n_7 n_6 n_5 n_4 n_3 n_2 n_1
1 5 7 1 5 7 7 7
2 4 10 1 3 6 3 4
3 5 5 2 5 3 3 2
4 3 9 10 5 7 0 0
I tried by(), but it also counts the values of Semester. Is there another way to do this?
by(df, df$Semester,function(df){
count_if(eq(7), df)
count_if(eq(6), df)
count_if(eq(5), df)
count_if(eq(4), df)
count_if(eq(3), df)
count_if(eq(2), df)
count_if(eq(1), df)})
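You could melt() the data to long format and then dcast() it back, counting with length. The trailing [-9] drops the ninth column, which holds the NA counts.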
library(data.table)
dcast(melt(df, "Semester"), Semester ~ value, fun=length)[-9]
# Semester 1 2 3 4 5 6 7
# 1 1 5 8 10 2 7 8 4
# 2 2 8 6 7 2 5 2 5
# 3 3 2 1 4 3 2 4 5
# 4 4 1 1 3 4 7 2 8
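The same counting idea can be sketched in base R with table(), which may be handy if data.table is not available. The tiny data frame below is made up for illustration (it is not the question's data), and the names x_cols, long and counts are assumptions of this sketch.

```r
# Made-up data: two semesters, two X columns, a few NAs
df <- data.frame(Semester = c(1, 1, 2, 2),
                 X1 = c(7, 3, NA, 7),
                 X2 = c(7, NA, 2, 1))

x_cols <- setdiff(names(df), "Semester")

# Stack the X* columns into one vector, repeating Semester once per column
long <- data.frame(Semester = rep(df$Semester, times = length(x_cols)),
                   value = unlist(df[x_cols], use.names = FALSE))

# Cross-tabulate; fixing the levels orders the columns 7..1, and NAs are dropped by default
counts <- table(long$Semester, factor(long$value, levels = 7:1))
print(counts)
```

Since Semester is kept out of the stacked values, it is never counted, which was the problem with the by() attempt.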
I have a dataframe like this (NA marks an empty cell):
participant v1 v2 v3 v4 v5 v6
          1  4  2 NA  9  7  2
          2 NA  6  8 NA NA  1
          3 NA NA  5  4 NA  5
          4  1  1 NA  2  3 NA
Every two consecutive variables (v1 and v2, v3 and v4, v5 and v6) belong together (each such pair is what I call "count" later).
I am desperately searching for a way to get the following:
participant count v(odd numbers) v(even numbers)
          1     1              4               2
          1     2             NA               9
          1     3              7               2
          2     1             NA               6
          2     2              8              NA
          2     3             NA               1
          3     1             NA              NA
          3     2              5               4
          3     3             NA               5
          4     1              1               1
          4     2             NA               2
          4     3              3              NA
As this is my first question on stackoverflow ever, I hope you understand my request. I searched a lot for similar problems (and solutions to them) but found nothing. I would very much appreciate your support.
We can use melt
library(data.table)
melt(setDT(d1), measure = list(paste0("v", seq(1, 6, by= 2)),
paste0("v", seq(2,6, by = 2))))[order(participant)]
# participant variable value1 value2
# 1: 1 1 4 2
# 2: 1 2 NA 9
# 3: 1 3 7 2
# 4: 2 1 NA 6
# 5: 2 2 8 NA
# 6: 2 3 NA 1
# 7: 3 1 NA NA
# 8: 3 2 5 4
# 9: 3 3 NA 5
#10: 4 1 1 1
#11: 4 2 NA 2
#12: 4 3 3 NA
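For readers without data.table, the same reshaping can be sketched in base R. The data frame d1 below is an assumption: it is rebuilt from the values in the question, with NA in the empty cells as implied by the melt output above. The column names count, odd and even are chosen here for illustration.

```r
# Assumed reconstruction of the question's data (NA for empty cells)
d1 <- data.frame(participant = 1:4,
                 v1 = c(4, NA, NA, 1), v2 = c(2, 6, NA, 1),
                 v3 = c(NA, 8, 5, NA), v4 = c(9, NA, 4, 2),
                 v5 = c(7, NA, NA, 3), v6 = c(2, 1, 5, NA))

odd  <- paste0("v", seq(1, 6, by = 2))   # v1, v3, v5
even <- paste0("v", seq(2, 6, by = 2))   # v2, v4, v6

# Stack the odd and even columns side by side; count is the pair index
long <- data.frame(
  participant = rep(d1$participant, times = length(odd)),
  count       = rep(seq_along(odd), each = nrow(d1)),
  odd         = unlist(d1[odd], use.names = FALSE),
  even        = unlist(d1[even], use.names = FALSE)
)

long <- long[order(long$participant, long$count), ]
print(long)
```

The stacking works because unlist() concatenates the columns in the order they are listed, so row i of each pair stays aligned with participant i.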