repeat data N times in R

The data I have contain three variables. There are three unique IDs, and each ID can have multiple records.
ID <- c(rep(1,2), rep(2,1), rep(3,2))
y0 <- c(rep(5,2), rep(3,1), rep(1,2))
z0 <- c(rep(1,2), rep(13,1), rep(4,2))
dat1 <- data.frame(ID, y0,z0)
What I am trying to do is repeat the whole data set N times (N needs to be a parameter) and add a new column with the repetition number.
So if N = 2, the new data should look like:
rep <- c(rep(1,2), rep(2,2), rep(1,1), rep(2,1), rep(1,2), rep(2,2))
ID <- c(rep(1,4), rep(2,2), rep(3,4))
y0 <- c(rep(5,4), rep(3,2), rep(1,4))
z0 <- c(rep(1,4), rep(13,2), rep(4,4))
dat2 <- data.frame(rep, ID, y0,z0)

We replicate the sequence of row indices, add the 'rep' column, and then order by ID to get the expected output:
res <- cbind(rep = rep(seq_len(2), each = nrow(dat1)), dat1[rep(seq_len(nrow(dat1)), 2),])
resN <- res[order(res$ID),]
row.names(resN) <- NULL
all.equal(dat2, resN, check.attributes = FALSE)
#[1] TRUE
Another option is to replicate the data into a list, use Map to create the 'rep' column, and rbind the list elements. (It is not recommended to use function names such as rep as column names, object names, etc.)
res1 <- do.call(rbind, Map(cbind, rep = seq_len(2), replicate(2, dat1, simplify = FALSE)))
res2 <- res1[order(res1$ID),]
row.names(res2) <- NULL
all.equal(dat2, res2, check.attributes = FALSE)
#[1] TRUE
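Both answers hard-code N = 2. A minimal sketch that turns the first approach into a function of N, assuming the data has an ID column as in dat1; repeat_data is a hypothetical name, not an existing function:

```r
# Example data from the question
dat1 <- data.frame(ID = c(1, 1, 2, 3, 3),
                   y0 = c(5, 5, 3, 1, 1),
                   z0 = c(1, 1, 13, 4, 4))

# Hypothetical helper: stack N copies of `dat`, label each copy with its
# repetition number in a 'rep' column, then group the rows by ID
repeat_data <- function(dat, N) {
  stopifnot(N >= 1)
  res <- cbind(rep = rep(seq_len(N), each = nrow(dat)),
               dat[rep(seq_len(nrow(dat)), times = N), , drop = FALSE])
  # order() is stable, so within an ID the rows of each copy keep their order
  res <- res[order(res$ID, res$rep), ]
  row.names(res) <- NULL
  res
}

repeat_data(dat1, 3)
```

Ordering by both ID and rep keeps all copies of an ID together with copy 1 first, which is the layout of dat2 in the question.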

Related

How to write a loop for the following code in R?

For my script, only the final loop is missing. It would be great if someone could help. The example dataset is below.
library(dplyr)
library(tibble) # add_column(), as_tibble()
set.seed(94756)
mat1 <- matrix(sample(seq(-1,100, 0.11),70, replace = TRUE),ncol = 5)
mat1 <- as_tibble(mat1)
mat2 <- matrix(sample(seq(-1,100, 0.11),70, replace = TRUE),ncol = 5)
mat2 <- as_tibble(mat2)
mat2[3,1] <- NA
mat2[6,1] <- NA
mat3 <- matrix(sample(seq(-1,100, 0.11), 70,replace = TRUE),ncol = 5)
mat3 <- as_tibble(mat3)
mat3[4,1] <- NA
data <- list(mat1, mat2, mat3)
library(purrr)
data1 <- map(data, ~add_column(., V1_logical = between(.$V1, 20, 80), .after = 'V1'))
r_pre <- lapply(data1, "[", 2)
data2 <- lapply(data1, function(x) {x$V1_logical[x$V1_logical== TRUE] <- 1; x})
data3 <- lapply(data2, function(x) {x$V1_logical[x$V1_logical== FALSE] <- 0; x})
data4 <- map(data3, ~add_column(., ind = data.table::rleid(.$V1_logical), .after = "V1_logical"))
rfun <- function(x) with(rle(x$V1_logical), tibble(lengths, values))
rfun1 <- purrr::map_dfr(data4, rfun)
I would then like to put the following inside a loop:
marker <- as.numeric(min(which(rfun1$values == 1 & rfun1$lengths >= 3)))
rfun1 <- add_column(rfun1, marker = rfun1$values == 1 & rfun1$lengths >= 3, .after = "values")
data_drop <- rfun1[c(1:marker),]
data_drop_c <- as.numeric(sum(data_drop$lengths))
Then I would like to create the final data frame by removing the dropped rows, something like this for every data frame in the list:
final_df <- data4[[i]][-c(1:data_drop_c), 4] # for each data frame i in the list
Because rfun1 puts all the data frames together into one, I would like the loop to tell me the position of the marker, cut away the data before it, and count 4 rows after the marker. Then it should move on to the next data frame in the list (starting with the ID of list element two and counting until the marker position).
So it would probably help to add an ID for each data frame to rfun1 as well, something like this (this is for data4; I want the same for rfun1):
data5 <- bind_rows(data4, .id = "i") %>% group_by(i) %>% count(ind)
In data5, I can't tell whether a number in "ind" stands for TRUE or FALSE. I am only looking for runs of TRUE with length >= 3. So if I could add an ID column to rfun1 for every data frame in the list and run the loop described above, it should work.
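For the ID column, purrr::map_dfr() has an .id argument, so purrr::map_dfr(data4, rfun, .id = "i") should label each run with the position of its data frame in the list. A base-R sketch of the same idea on a small hypothetical list standing in for data4 (only the V1_logical column is used), which also finds the marker run separately for each data frame:

```r
# Hypothetical stand-in for data4: only the 0/1 column V1_logical matters here
dfs <- list(data.frame(V1_logical = c(0, 1, 1, 1, 0)),
            data.frame(V1_logical = c(1, 1, 0, 1, 1, 1)))

# Run-length table for one data frame (same idea as rfun in the question)
run_table <- function(x) {
  r <- rle(x$V1_logical)
  data.frame(lengths = r$lengths, values = r$values)
}

# Stack the run tables, adding the list position as column i
runs <- do.call(rbind, Map(function(df, i) cbind(i = i, run_table(df)),
                           dfs, seq_along(dfs)))

# Marker per data frame: index of the first run of 1s with length >= 3 (NA if none)
marker <- tapply(runs$values == 1 & runs$lengths >= 3, runs$i,
                 function(v) which(v)[1])
marker
```

Because the marker is computed within each value of i, it counts runs from the start of that data frame rather than from the start of the combined table.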
Thanks in advance!
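The loop itself can work on each data frame directly instead of on the combined rfun1. A minimal sketch, assuming the goal is to drop every row up to and including the first run of at least three V1 values inside [20, 80]; trim_before_run and the example list are hypothetical, and returning zero rows when no such run exists is also an assumption:

```r
# Hypothetical helper: drop every row up to and including the first run of at
# least `min_len` consecutive V1 values inside [lower, upper]; NA counts as outside
trim_before_run <- function(df, lower = 20, upper = 80, min_len = 3) {
  inside <- !is.na(df$V1) & df$V1 >= lower & df$V1 <= upper
  r <- rle(inside)
  marker <- which(r$values & r$lengths >= min_len)[1]  # first qualifying run
  if (is.na(marker)) return(df[0, , drop = FALSE])     # assumption: no run, keep nothing
  n_drop <- sum(r$lengths[seq_len(marker)])            # rows up to the end of that run
  df[-seq_len(n_drop), , drop = FALSE]
}

# Small hypothetical list standing in for the question's data
dfs <- list(data.frame(V1 = c(10, 25, 30, 40, 90, 50, 60)),
            data.frame(V1 = c(NA, 30, 35, 45, 55, 5)))

final <- vector("list", length(dfs))
for (i in seq_along(dfs)) {
  final[[i]] <- trim_before_run(dfs[[i]])
}
```

The bounds are inclusive, like between() in the question. If only the four rows after the marker are wanted, head(trim_before_run(df), 4) would give those.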

pairwise subset a data frame and save the subsets in a list

# Example data
dat <- matrix(runif(2*20), ncol = 2, nrow = 20)
group <- rep_len(LETTERS[1:3], 20)
df <- cbind.data.frame(dat, Group = group)
# Create subset groups
n <- levels(as.factor(group))
mylist <- combn(n, 2, simplify = FALSE)
I would like to subset my data according to pairwise combinations of the group attribute, and then save the result in mylist.
How can I do it?
Thanks a lot.
We can loop over mylist with lapply and use subset with %in%:
mylist2 <- lapply(mylist, function(x) subset(df, Group %in% x))
It can also be done within combn by using its FUN argument:
combn(n, 2, FUN = function(x) subset(df, Group %in% x), simplify = FALSE)
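A small extension, assuming you also want each subset labelled with its pair of groups: paste each combination together and use the result as the list names. The column names X1 and X2 in the example data are illustrative only:

```r
set.seed(1)
# Example data in the shape of the question's df
df <- data.frame(X1 = runif(20), X2 = runif(20),
                 Group = rep_len(LETTERS[1:3], 20))

pairs <- combn(sort(unique(df$Group)), 2, simplify = FALSE)

# One subset per pair, named after the pair, e.g. "A_B"
mylist2 <- lapply(pairs, function(g) df[df$Group %in% g, , drop = FALSE])
names(mylist2) <- vapply(pairs, paste, character(1), collapse = "_")

sapply(mylist2, nrow)  # 14, 13 and 13 rows for A_B, A_C and B_C
```

Named elements make it possible to pick a pair directly, e.g. mylist2$A_C.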

list a number of data.frames for the output of an R function

I was wondering how I could get my function below to output a list of length(n.x) data.frames, each with n.x[i] rows.
x is a vector, n.x determines the number of elements to be extracted from x, and length(n.x) determines the number of separate sets that become data.frames.
For example, if x = 1:5 and n.x = c(2, 3), I want the first 2 elements of x (i.e., 1 and 2) to become one data.frame and the last 3 elements of x (i.e., 3, 4, 5) to become another data.frame. Then I want these two data.frames to be returned as a list.
Update: Some xs can have long = T and some long = F. After saving the call as an object (e.g., a), can the user use a$study1$long to extract the xs for which long = T and a$study1$short to extract the xs for which long = F?
foo <- function(x, n.x, long) {
a <- x
data.frame(a)
}
a <- foo(1:4, c(1, 2, 1), long = c(T, F, T, T) )
a$study1$short
a$study1$long
We can use 'n.x' to create a grouping vector and split the data.frame into a list of data.frames:
foo <- function(x, n.x, long) {
d1 <- data.frame(a = x, long)
lst1 <- split(d1, list(rep(seq_along(n.x), n.x), long), drop = TRUE)
names(lst1) <- paste0("Study", seq_along(lst1))
lst1 <- lapply(lst1, `row.names<-`, NULL)
lapply(lst1, function(x) setNames(x, c("a", c("short", "long")[x$long[1] +1])))
}
foo(1:4, c(1, 2, 1), c(TRUE, FALSE, TRUE, TRUE))
#$Study1
# a short
#1 2 FALSE
#$Study2
# a long
#1 1 TRUE
#$Study3
# a long
#1 3 TRUE
#$Study4
# a long
#1 4 TRUE
If we need to pass one more vector ('nn') and assign it as row names
foo <- function(x, n.x, long, nn, rowName = "character") {
nn <- if(rowName == "character") {
nn
} else as.integer(factor(nn))
d1 <- data.frame(a = x, long)
row.names(d1) <- nn
lst1 <- split(d1, list(rep(seq_along(n.x), n.x), long), drop = TRUE)
names(lst1) <- paste0("Study", seq_along(lst1))
#lst1 <- lapply(lst1, `row.names<-`, NULL)
lapply(lst1, function(x)
setNames(x, c("a", c("short", "long")[x$long[1] +1])))
}
nn <- c("bigi, gigi, cigi", "fifi")
nn1 <- unlist(strsplit(nn, ", "))
foo(1:4, c(1, 2, 1), c(TRUE, FALSE, TRUE, TRUE), nn1, rowName = "integer")
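Neither version returns the a$study1$long / a$study1$short structure asked for in the update. A sketch that does, assuming "study" means one n.x group and that long and short hold that group's elements with long = TRUE and long = FALSE respectively; foo_nested is a hypothetical name:

```r
foo_nested <- function(x, n.x, long) {
  stopifnot(all(n.x >= 1), sum(n.x) == length(x), length(long) == length(x))
  grp <- rep(seq_along(n.x), n.x)             # study index for each element of x
  studies <- lapply(split(seq_along(x), grp), function(i) {
    list(long  = data.frame(a = x[i][long[i]]),
         short = data.frame(a = x[i][!long[i]]))
  })
  names(studies) <- paste0("study", seq_along(n.x))
  studies
}

a <- foo_nested(1:4, c(1, 2, 1), long = c(TRUE, FALSE, TRUE, TRUE))
a$study2$long    # data.frame with a = 3
a$study2$short   # data.frame with a = 2
```

A study with no elements of one kind gets a zero-row data.frame, e.g. a$study1$short here.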

rename multiple datasets after applying a function in R

I am trying to apply a function to several data frames. Afterwards, I want to save the resulting data frames under their original names plus a suffix that distinguishes the new ones.
This is what I've tried, which is obviously not working.
# Creating dummy data
N <- 8
df1 <- data.frame(x1 = rnorm(N), x2 = sample(1:10, size = N, replace = TRUE), x3 = 1*(runif(n = N) < .75))
df2 <- data.frame(y1 = rnorm(N), y2 = sample(100:200, size = N, replace = TRUE), y3 = runif(N))
df3 <- data.frame(z1 =rnorm(N), z2 = sample(8:80, size = N,replace = TRUE), Z3 = runif(N))
# Making a list of the three data frames
mydata <- list(df1=df1, df2=df2, df3= df3)
#Applying a function to mydata list
mydata2 <- lapply(mydata, function(x) mean(unlist(x)))
# Renaming each dataset
n <- 1:length(mydata2)
noms <- names(mydata2)
for (i in 1:n){
mynewlist <- lapply(mydata2, function(x) {names(x) <-("_mean", sep ="");
return(x))}
Any help would be deeply appreciated.
We can use list2env if we need to create multiple objects in the global environment (though not recommended as most of the operations can be done within the list itself).
We change the names of the list by pasting a suffix onto them and then call list2env:
list2env(setNames(mydata2, paste0(names(mydata2),
"_newname")), envir=.GlobalEnv)

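Since the answer notes that working inside the list is usually preferable, here is a minimal sketch of that route: rename the list elements with a paste0 suffix and access the results by name. The "_mean" suffix and the two small data frames are assumptions for illustration:

```r
set.seed(1)
N <- 8
mydata <- list(df1 = data.frame(x1 = rnorm(N), x2 = sample(1:10, N, replace = TRUE)),
               df2 = data.frame(y1 = rnorm(N), y2 = runif(N)))

# Apply the function, then rename the results with a suffix
mydata2 <- lapply(mydata, function(x) mean(unlist(x)))
names(mydata2) <- paste0(names(mydata2), "_mean")

mydata2$df1_mean
```

The original data frames stay untouched in mydata, and nothing is written to the global environment.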
find the same times in n data frames

Consider the following example:
Date1 = seq(from = as.POSIXct("2010-05-03 00:00"),
to = as.POSIXct("2010-06-20 23:00"), by = 120)
Dat1 <- data.frame(DateTime = Date1,
x1 = rnorm(length(Date1)))
Date2 <- seq(from = as.POSIXct("2010-05-01 03:30"),
to = as.POSIXct("2010-07-03 22:00"), by = 120)
Dat2 <- data.frame(DateTime = Date2,
x1 = rnorm(length(Date2)))
Date3 <- seq(from = as.POSIXct("2010-06-08 01:30"),
to = as.POSIXct("2010-07-13 11:00"), by = 120)
Dat3Matrix <- matrix(data = rnorm(length(Date3)*3), ncol = 3)
Dat3 <- data.frame(DateTime = Date3,
x1 = Dat3Matrix)
list1 <- list(Dat1,Dat2,Dat3)
Here I built three data.frames as an example and placed them all into a list. From here, I would like to write a routine that returns the three data frames, keeping only the times present in all of them, i.e. all three data frames should be reduced to the times common to every data frame. How can this be done?
zoo has a multi-way merge. The code below uses lapply to apply read.zoo to each component of list1, converting each to a zoo object; tz = "" tells read.zoo to use POSIXct for the resulting date/times. The converted components are then merged with all = FALSE so that only the intersecting times are kept.
library(zoo)
z <- do.call("merge", c(lapply(setNames(list1, 1:3), read.zoo, tz = ""), all = FALSE))
If we later wish to convert z to a data.frame, try dd <- data.frame(Time = time(z), coredata(z)) (cbind() here would return a plain matrix and drop the POSIXct class). It might be better, though, to keep it as a zoo object (or convert it to an xts object) so that further processing is simpler.
One approach is to find the respective indices and then subset accordingly:
idx1 <- (Dat1[,1] %in% Dat2[,1]) & (Dat1[,1] %in% Dat3[,1])
idx2 <- (Dat2[,1] %in% Dat1[,1]) & (Dat2[,1] %in% Dat3[,1])
idx3 <- (Dat3[,1] %in% Dat1[,1]) & (Dat3[,1] %in% Dat2[,1])
Now Dat1[idx1,], Dat2[idx2,], Dat3[idx3,] should give the desired result.
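The same %in% idea extends to any number of data frames: Reduce() narrows the time vectors down pairwise, and each data frame is then filtered against the common times. A base-R sketch on small made-up data (the times and values are illustrative only):

```r
# Small example with fixed times (UTC avoids daylight-saving surprises)
t0 <- as.POSIXct("2010-05-03 00:00", tz = "UTC")
list1 <- list(
  data.frame(DateTime = t0 + 120 * 0:9,  x1 = 1:10),
  data.frame(DateTime = t0 + 120 * 5:14, x1 = 1:10),
  data.frame(DateTime = t0 + 120 * c(3, 6, 7, 20), x1 = 1:4)
)

# Times present in every data frame (subsetting keeps the POSIXct class)
times  <- lapply(list1, `[[`, "DateTime")
common <- Reduce(function(a, b) a[a %in% b], times)

# Keep only those times in each data frame
list_common <- lapply(list1, function(d) d[d$DateTime %in% common, , drop = FALSE])
```

Unlike the idx1/idx2/idx3 version, this needs no change when a fourth data frame is added to list1.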
You could use merge:
res <- NULL
for (i in seq_along(list1)) {  # start at 1 so the first data frame is included
  dat <- list1[[i]]
  names(dat)[2] <- paste0(names(dat)[2], "_", i)
  dat[[paste0("id_", i)]] <- seq_len(nrow(dat))
  if (is.null(res)) {
    res <- dat
  } else {
    res <- merge(res, dat, by = "DateTime")  # inner join keeps only common times
  }
}
I added ID columns; you could use these to index the records in the original data.frames.
