Save datasets with different names in a for loop in R

I am trying to implement the following:
dataset_id_1 = subset(data, id == 1)
dataset_id_2 = subset(data, id == 2)
dataset_id_3 = subset(data, id == 3)
However, I need to do this for more than 100 IDs. I encounter a problem in generating the name of the dataset on the left. I tried the following:
for (i in 1:120) {
dataset_id_[[i]] = subset(data, id == i)
}
Do you know how to save the name of the dataset according to the specified id?
Thank you so much

Try split + list2env as shown below:
lst <- split(data, data$id)
list2env(setNames(lst, paste0("dataset_id_", names(lst))), .GlobalEnv)
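To see what split + list2env produces, here is a minimal sketch on toy data. The column names (id, value) and the separate environment env are assumptions made for illustration; the question does not show the real data.

```r
# Toy data standing in for the asker's `data` (assumed columns: id, value)
data <- data.frame(id = rep(1:3, each = 2), value = 1:6)

# One data frame per id, stored in a named list
lst <- split(data, data$id)
names(lst) <- paste0("dataset_id_", names(lst))

# Copy the list elements into an environment as separate objects.
# A fresh environment is used here; pass .GlobalEnv to create globals instead.
env <- new.env()
list2env(lst, envir = env)

sort(ls(env))
env$dataset_id_2
```

Keeping the named list lst around is often enough on its own, since lst$dataset_id_2 or lst[["dataset_id_2"]] gives the same data frame without creating over 100 separate objects.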

Try this:
#List
List <- list()
#Loop
for (i in 1:120) {
List[[i]] = subset(data, id == i)
}
#Names
names(List) <- paste0('dataset_id_',1:length(List))
#Set to envir
list2env(List,envir = .GlobalEnv)

Try this
for (i in 1:120) {
# paste0() avoids the space that paste() would insert into the name
assign(paste0("dataset_id_", i), subset(data, id == i))
}

Related

Filtering values in a list using R

I am working on a project in which I need to filter my list if it has certain values for each of my IDs but unfortunately, it isn't working.
So I have one list with 12 different matrixes with the same columns but diff
library(tidyverse)
trajectory_C <- list()
trajectory_D <- list()
file_list_C <- list.files(pattern=".trajectory_C.csv")
file_list_D <- list.files(pattern=".trajectory_D.csv")
for (i in 1:length(file_list_C)) {
trajectory_C[[i]] <- read.csv(file_list_C[i])
}
for (i in 1:length(file_list_D)) {
trajectory_D[[i]] <- read.csv(file_list_D[i])
}
So my two lists are trajectory_D and trajectory_C. I then created two other lists and I saved the unique values of a certain column called "ID" and added a validation column to it.
unique_ID_C <- list()
unique_ID_D <- list()
for (i in 1:12) {
unique_ID_C[[i]] <- unique(trajectory_C[[i]]["ID"])
unique_ID_D[[i]] <- unique(trajectory_D[[i]]["ID"])
}
for (i in 1:12) {
Turning <- matrix(data=0,nrow = length(unique_ID_C[[i]]), ncol = 1)
unique_ID_C[[i]] <- cbind(unique_ID_C[[i]],Turning)
names(unique_ID_C[[i]]) <- c("ID","Validation")
}
What I want to do now is check whether each of my unique IDs has certain values (28 and 29) in the variable "Segment", for all twelve elements of my list.
for (i in 1:12) {
for (ID in unique_ID_C[[1]]) {
c <- unique(trajectory_C[[i]][trajectory_C[[i]]["ID"] == ID,"Segment"])
unique_ID_C[[i]][unique_ID_C[[i]]["ID"] == ID,2] <- ifelse(any(28 == c) == TRUE & any(29 == c) == TRUE,1,0)
}
}
I am new to programming and this is the first time I am using lists, so this might be my problem.
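One possible way to approach this check, sketched on toy data: apply one helper function to every element of the list and let tapply do the per-ID test. The helper name check_segments and the toy values are hypothetical; only the ID and Segment column names and the values 28 and 29 come from the question.

```r
# Toy stand-in for trajectory_C: a list of data frames with ID and Segment columns
trajectory_C <- list(
  data.frame(ID = c(1, 1, 2, 2, 3), Segment = c(28, 29, 28, 30, 29)),
  data.frame(ID = c(4, 4, 4, 5),    Segment = c(29, 28, 27, 28))
)

# For one data frame, flag the IDs whose Segment values include both 28 and 29
check_segments <- function(df, wanted = c(28, 29)) {
  has_all <- tapply(df$Segment, df$ID, function(s) all(wanted %in% s))
  data.frame(ID = as.numeric(names(has_all)),
             Validation = as.integer(has_all))
}

# Same structure as the question: one ID/Validation table per list element
unique_ID_C <- lapply(trajectory_C, check_segments)
unique_ID_C[[1]]
```

This avoids the nested for loops entirely, so there is no loop index that can accidentally point at the wrong list element.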

How do I make dataframes in a for loop in R?

I want to create dataframes in a for loop where every dataframe gets a value specified in a vector. It seems very simple but for some reason I cannot find the answer.
So what I want is something like this:
x <- c(1,2,3)
for (i in x) {
df_{{i}} <- ""
return df_i
}
The result I want is:
df_1
df_2
df_3
So df_{{i}} should be something else but I don't know what.
EDIT: I have solved my problem by creating a list of lists like this:
function_that_creates_model_output <- function(var) {
output_function <- list()
output_function$a <- df_a %>% something(var)
output_function$b <- df_b %>% something(var)
return(output_function)
}
meta_output <- list()
for (i in x) {
meta_output[[i]] <- function_that_creates_model_output(var = i)
}
One solution would be to use the function assign:
x <- c(1,2,3)
for (i in x) {
assign(x = paste0("df_",i),value = NULL)
}
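As a small illustration of how objects created with assign can be read back, the sketch below stores placeholder data frames and collects them again with mget. The environment env and the placeholder contents are assumptions for the example only.

```r
x <- c(1, 2, 3)
env <- new.env()  # illustrative; assign() defaults to the current environment

for (i in x) {
  # Each object gets a name built from i and a small placeholder data frame
  assign(paste0("df_", i), data.frame(value = i), envir = env)
}

# Retrieve them again by name, as a named list
dfs <- mget(paste0("df_", x), envir = env)
names(dfs)
```

Since the objects usually end up in a list again when they are used, storing them in a list from the start (as in the EDIT above) tends to be simpler.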

Organizing the files under the group name

The data I have is a list of files like this:
G085_1.csv, G085_2.csv, G085_3.csv, .. G100_1.csv, G100_2.csv, .. G173_1.csv, G173_2.csv, G173_3.csv
where G stands for the group followed by the identification of each group member (1, 2, or 3). Notably, some groups do not have all three members.
What I'm trying to do is create a loop that runs the following code (an example for one group) for all groups.
i1 <- fread("sample/G085_1.csv")
i2 <- fread("sample/G085_2.csv")
i3 <- fread("sample/G085_3.csv")
What I have been doing is:
Groups <- c()
for(g in 85:173){
# index from 1 so that Groups has no leading NA entries
Groups[g - 84] <- ifelse(g<100,
paste0("G0", g),
paste0("G", g))
}
Members <- c("i1", "i2", "i3")
for(g in 1:length(Groups)){
for(m in 1:3) {
filename<- paste0("i",m)
wd <- paste0("sample/", Groups[g],"_",
m, ".csv")
ifelse(file.exists(wd),assign(filename,fread(wd)),
function(){})
}
assign(Groups[g],...
)
}
The place where I'm stuck is the last part (assign(Groups[g]...). I'm not sure how to collect the i1, i2, i3 data frames of each group under the group name. Is there a better way than using the assign function here?
This code does not assign to i1, i2, i3 exactly, but it gives you a list whose items are named after the groups. Each group item is a list containing three data frames read from the files. If a file does not exist, the corresponding item will be NULL.
Using foreach approach
library(foreach)
library(data.table) # for fread
list_data <- foreach(g = Groups, .final = function(x) { setNames(x, Groups) }) %do% {
# g is already the group name, so use it directly
current_group <- foreach(m = 1:3) %do% {
wd <- paste0("sample/", g, "_",
m, ".csv")
# ifelse() is vectorised and cannot return NULL; use if/else instead
if (file.exists(wd)) fread(wd) else NULL
}
current_group
}
Using purrr map
library(purrr)
library(data.table) # for fread
item_index <- c(1:3)
all_group_data <- map(.x = Groups, .f = function(g) {
list_files <- paste0("sample/", g,"_", item_index, ".csv")
group_data <- map(.x = list_files, .f = function(file) {
if (file.exists(file)) {
data <- fread(file)
} else {
data <- NULL
}
data
})
group_data
})
names(all_group_data) <- Groups
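The same idea can also be sketched in base R only. The example below is self-contained: it writes a few throwaway files to a temporary folder (the folder name sample_demo, the two groups, and the helper read_group are made up for the demonstration) and uses read.csv in place of fread.

```r
# Set up a throwaway folder with a few files; G100_2.csv is deliberately missing
dir <- file.path(tempdir(), "sample_demo")
dir.create(dir, showWarnings = FALSE)
for (f in c("G085_1.csv", "G085_2.csv", "G100_1.csv")) {
  write.csv(data.frame(file = f), file.path(dir, f), row.names = FALSE)
}

groups  <- sprintf("G%03d", c(85, 100))  # zero-padded group names
members <- 1:2

read_group <- function(g) {
  paths <- file.path(dir, paste0(g, "_", members, ".csv"))
  # lapply keeps NULL entries, so missing members stay visible in the result
  out <- lapply(paths, function(p) if (file.exists(p)) read.csv(p) else NULL)
  setNames(out, paste0("i", members))
}

all_groups <- setNames(lapply(groups, read_group), groups)
str(all_groups, max.level = 2)
```

Here sprintf("G%03d", ...) builds the zero-padded names in one call, and each member is then reachable as all_groups$G085$i1 without any use of assign.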

How to rewrite as a loop when I have identical frames for different years and the year is in the name?

I am new, so this question is a bit basic, but it might help others get a good start as well...
How to rewrite the below as a loop and have it include the years in the new names, as below...
DFNUM2011 = DF2011[,!(names(DF2011) %in% mydummies)]
DFNUM2012 = DF2012[,!(names(DF2012) %in% mydummies)]
DFNUM2013 = DF2013[,!(names(DF2013) %in% mydummies)]
I tried:
df.list<-list("2011","2012","2013")
for (i in df.list){
DFNUM[[i]] = DF[[i]][,!(names(DF2011) %in% mydummies)]
}
Error in DF : object 'DF' not found
This can work:
#List
List <- list(DF2011,DF2012,DF2013)
#Loop
for (i in seq_along(List))
{
List[[i]] = List[[i]][,!(names(List[[i]]) %in% mydummies)]
}
A working example can be:
#Example
List <- list(iris,mtcars)
mydummies <- c('Species','mpg')
#Loop
for (i in seq_along(List))
{
List[[i]] = List[[i]][,!(names(List[[i]]) %in% mydummies)]
}
And a more compact way without loops:
#Code
List <- lapply(List, function(x) x[, !names(x) %in% mydummies])
You can use:
library(purrr)
n <- 2011:2013
result <- map(mget(paste0('DF', n)), ~keep(.x, !(names(.x) %in% mydummies)))
If you want to create new dataframes with different names in your global environment, you can then do:
names(result) <- paste0('DFNUM', n)
list2env(result, .GlobalEnv)
This should create DFNUM2011, DFNUM2012 and DFNUM2013 dataframes.
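For comparison, a base R sketch of the same mget idea without purrr. The three toy data frames and the contents of mydummies are invented for the example; only the naming pattern (DF2011, DFNUM2011, and so on) comes from the question.

```r
# Toy inputs standing in for DF2011..DF2013 and mydummies
DF2011 <- data.frame(a = 1:2, d1 = 0:1, b = 3:4)
DF2012 <- data.frame(a = 5:6, d1 = 1:0, b = 7:8)
DF2013 <- data.frame(a = 9:10, d1 = 0:1, b = 11:12)
mydummies <- c("d1", "d2")

years <- 2011:2013
# Fetch the data frames by name, then drop the dummy columns from each
dfs <- mget(paste0("DF", years))
result <- lapply(dfs, function(df) df[, !(names(df) %in% mydummies), drop = FALSE])
names(result) <- paste0("DFNUM", years)

names(result$DFNUM2012)
```

The drop = FALSE argument keeps each result a data frame even if only one column survives the filter.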

R loop to create data frames with 2 counters

What I want is to create 60 data frames with 500 rows in each. I tried the below code and, while I get no errors, I am not getting the data frames. However, when I do a View on the as.data.frame, I get the view, but no data frame in my environment. I've been trying for three days with various versions of this code:
getDS <- function(x){
for(i in 1:3){
for(j in 1:30000){
ID_i <- data.table(x$ID[j: (j+500)])
}
}
as.data.frame(ID_i)
}
getDS(DATASETNAME)
We can use outer (on a small example)
out1 <- c(outer(1:3, 1:3, Vectorize(function(i, j) list(x$ID[j:(j + 5)]))))
lapply(out1, as.data.table)
--
The issue in the OP's function is that ID_i is overwritten on every iteration of the loop, i.e. the earlier results are not stored. To keep them, we can initialize a list and store each result in it:
getDS <- function(x) {
ID_i <- vector('list', 3)
for(i in 1:3) {
for(j in 1:3) {
ID_i[[i]][[j]] <- data.table(x$ID[j:(j + 5)])
}
}
ID_i
}
do.call(c, getDS(x))
data
x <- data.table(ID = 1:50)
I'm not sure the description matches the code, so I'm a little unsure what the desired result is. That said, it is usually not helpful to split a data.table because the built-in by-processing makes it unnecessary. If for some reason you do want to split into a list of data.tables you might consider something along the lines of
getDS <- function(x, n=5, size = nrow(x)/n, column = "ID", reps = 3) {
x <- x[1:(n*size), ..column]
index <- rep(1:n, each = size)
replicate(reps, split(x, index),
simplify = FALSE)
}
getDS(data.table(ID = 1:20), n = 5)
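If the goal really is 60 separate pieces of 500 rows each, one reading of the question is that the rows should be cut into consecutive, non-overlapping blocks. That is an assumption (the original loop uses overlapping windows j:(j+500)), but under it a base R sketch could look like this:

```r
# Toy data; the question describes 60 data frames of 500 rows each
x <- data.frame(ID = 1:30000)
chunk_size <- 500

# Row i belongs to chunk ceiling(i / chunk_size)
chunk_id <- ceiling(seq_len(nrow(x)) / chunk_size)
chunks <- split(x, chunk_id)
names(chunks) <- paste0("ID_", names(chunks))

length(chunks)
nrow(chunks[[1]])
```

The result is a named list, so chunks$ID_2 holds rows 501 to 1000, and the whole set can be processed with lapply instead of 60 separate objects.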
