How to create a loop inside a loop in R - r

I am new to R and need some help with loops. I need to produce a large number of tables from one data set, and I think a loop inside a loop will solve the problem, but I am having trouble getting the right result.
Let's say I have the following data set:
var1 <- c("A","A","A","A","B","B","B","B")
var2 <- c(1,2,1,2,1,2,1,2)
df <- data.frame(var1,var2)
And I want to extract the data in 4 tables:
Result of "A" & 1
Result of "A" & 2
Result of "B" & 1
Result of "B" & 2
I have this loop, but I cannot get the 4 tables. Can anyone help?
for (i in df$var1) {
dummy<- df%>%filter(var1 == i)
for (j in dummy$var2) {
nTab <- paste0("tab_", j, sep ="")
assign(nTab, dummy%>%filter (var2 == j))
}
}

Expanding on @Gregor's comment and on the question "Save all data frames in list to separate .csv files", you can use Map() with the split() function to write the newly created data frames to individual CSV files:
Code:
s <- split(df, f = paste(df$var1, df$var2, sep = "_"))
Map(write.csv, s, paste0("table_", names(s), ".csv"), row.names = FALSE)
which will write the CSV files to your current working directory with the names "table_A_1.csv", etc., based on the values of var1 and var2.
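To try this without cluttering the working directory, the same call can be pointed at a temporary folder. The following is a minimal sketch rather than part of the original answer; the names out_dir, paths and check, and the use of tempfile(), are assumptions made for illustration.

```r
# Example data from the question
var1 <- c("A", "A", "A", "A", "B", "B", "B", "B")
var2 <- c(1, 2, 1, 2, 1, 2, 1, 2)
df <- data.frame(var1, var2)

# One data frame per var1/var2 combination, named "A_1", "A_2", ...
s <- split(df, f = paste(df$var1, df$var2, sep = "_"))

# Hypothetical output folder: a fresh temporary directory
out_dir <- tempfile("tables_")
dir.create(out_dir)

# Write each piece to its own file; invisible() hides Map()'s list of NULLs
paths <- file.path(out_dir, paste0("table_", names(s), ".csv"))
invisible(Map(write.csv, s, paths, row.names = FALSE))

# Read the first file back to check the round trip
check <- read.csv(paths[1])
```

Reading a file back is only a sanity check; in real use the paths would point wherever the tables should live.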

We can split the data frame into several data frames based on two columns and store them in a list.
df_list <- split(df, f = list(df$var1, df$var2))
df_list
# $A.1
# var1 var2
# 1 A 1
# 3 A 1
#
# $B.1
# var1 var2
# 5 B 1
# 7 B 1
#
# $A.2
# var1 var2
# 2 A 2
# 4 A 2
#
# $B.2
# var1 var2
# 6 B 2
# 8 B 2
To save the data frames in the list, we can further use the lapply function.
lapply(names(df_list), function(x) write.csv(df_list[[x]], paste0(x, ".csv"), row.names = FALSE))
Here df_list[[x]] accesses an individual data frame by its name, and paste0(x, ".csv") constructs the file name.
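If an explicit loop inside a loop is still wanted, it helps to iterate over the unique values of each variable and to put both values in the table name, so that later tables do not overwrite earlier ones. A minimal sketch in base R, assuming the tables are collected in a list (the names tables and key are illustrative and not from the question):

```r
# Example data from the question
var1 <- c("A", "A", "A", "A", "B", "B", "B", "B")
var2 <- c(1, 2, 1, 2, 1, 2, 1, 2)
df <- data.frame(var1, var2)

tables <- list()
for (i in unique(df$var1)) {                    # each var1 value once
  for (j in unique(df$var2[df$var1 == i])) {    # each var2 value seen with i
    key <- paste("tab", i, j, sep = "_")        # e.g. "tab_A_1"
    tables[[key]] <- df[df$var1 == i & df$var2 == j, ]
  }
}

names(tables)
```

Keeping the results in one list, instead of calling assign() for each table, means they can be processed later with lapply() or Map().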

Related

Changing dataframes in Loop r

I have many dataframes that I would like to run though a code. Is there a way to change the dataframe name in a loop?
df01$x = rnorm(100)
df02$x = rnorm(100)+2
df03$x = rnorm(100)*2
dflist <- c("df01",
"df02",
"df03")
for (i in 1:length(dflist){
{
#complete tasks by changing df name in existing code
ifelse([[i]]$x > 0,1,[[i]]$x)
}
#I want to do this for a number of different functions, so it is best to change the df name before "$"
df[[i]]$Varible = aggregate(df$Varible, .. ,..)}
Consider keeping the data frames in a list rather than passing around string literals of their names. Inside lapply() you can run your operations on each data frame, and afterwards you can even convert the list items back into individual data frame objects:
df01 <- data.frame(x=rnorm(100))
df02 <- data.frame(x=rnorm(100)+2)
df03 <- data.frame(x=rnorm(100)*2)
# LIST OF DATAFRAMES (NAMED WITH LITERALS)
dfList <- list(df01=df01, df02=df02, df03=df03)
head(dfList[[1]])
# x
# 1 1.3440091
# 2 0.5838570
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
# ITERATIVELY RUN COLUMN OPERATION
dfList <- lapply(dfList, function(df) {
df$x <- ifelse(df$x > 0, 1, df$x)
return(df)
})
# CONVERT LIST ITEMS INTO INDIVIDUAL ENVIRON OBJECTS
list2env(dfList, envir=.GlobalEnv)
head(df01)
# x
# 1 1.0000000
# 2 1.0000000
# 3 -0.2377477
# 4 -1.5059733
# 5 -1.0605490
# 6 -0.8122972
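If the data frames already exist as separate objects and only their names are available as strings, as in the question, mget() can build the named list. This is a minimal sketch under that assumption; set.seed(), dfNames and the explicit envir argument are additions for illustration.

```r
set.seed(1)  # only to make the example reproducible
df01 <- data.frame(x = rnorm(100))
df02 <- data.frame(x = rnorm(100) + 2)
df03 <- data.frame(x = rnorm(100) * 2)

# Character vector of object names, as in the question
dfNames <- c("df01", "df02", "df03")

# mget() looks the names up and returns a list named after them
dfList <- mget(dfNames, envir = environment())

# Apply the same column operation to every data frame in the list
dfList <- lapply(dfList, function(df) {
  df$x <- ifelse(df$x > 0, 1, df$x)
  df
})

names(dfList)
```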

rename column in dataframe using variable name R

I have a number of data frames. Each with the same format.
Like this:
A B C
1 -0.02299388 0.71404158 0.8492423
2 -1.43027866 -1.96420767 -1.2886368
3 -1.01827712 -0.94141194 -2.0234436
I would like to change the name of the third column, C, so that it includes part of the name of the variable associated with the data frame.
For the variable df_elephant the data frame should look like this:
A B C.elephant
1 -0.02299388 0.71404158 0.8492423
2 -1.43027866 -1.96420767 -1.2886368
3 -1.01827712 -0.94141194 -2.0234436
I have a function which will change the column name:
rename_columns <- function(x) {
colnames(x)[colnames(x)=='C'] <-
paste( 'C',
strsplit (deparse (substitute(x)), '_')[[1]][2], sep='.' )
return(x)
}
This works with my data frames. However, I would like to provide a list of data frames so that I do not have to call the function multiple times by hand. If I use lapply like so:
lapply( list (df_elephant, df_horse), rename_columns )
The function renames the column with NA rather than the relevant portion of the variable name.
[[1]]
A B C.NA
1 -0.02299388 0.71404158 0.8492423
2 -1.43027866 -1.96420767 -1.2886368
3 -1.01827712 -0.94141194 -2.02344361
[[2]]
A B C.NA
1 0.45387054 0.02279488 1.6746280
2 -1.47271378 0.68660595 -0.2505752
3 1.26475917 -1.51739927 -1.3050531
Is there some way that I can provide a list of data frames to my function and produce the desired result?
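Inside lapply(), deparse(substitute(x)) no longer sees the variable name df_elephant; it sees the expression lapply uses internally to pick each element, which contains no underscore, so the second piece of the split is NA. This is why it's not working. Work from the names of a named list instead.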
# Generating random data
n = 3
item1 = data.frame(A = runif(n), B = runif(n), C = runif(n))
item2 = data.frame(A = runif(n), B = runif(n), C = runif(n))
myList = list(df_elephant = item1, df_horse = item2)
# 1- Why your code doesn't work: ---------------
names(myList) # This returns the names that you actually want to use: [1] "df_elephant" "df_horse"
lapply(myList, names) # This returns each data frame's column names, which never contain the list names
# 2- How to make it work: ---------------
lapply(seq_along(myList), # seq_along() returns a vector of indices
function(i){
dfName = names(myList)[i] # Get the list name
dfName.animal = unlist(strsplit(dfName, "_"))[2] # Split on underscore and take the second element
df = myList[[i]] # Copy the actual Data frame
colnames(df)[colnames(df) == "C"] = paste("C", dfName.animal, sep = ".") # Change column names
return(df) # Return the new df
})
# [[1]]
# A B C.elephant
# 1 0.8289368 0.06589051 0.2929881
# 2 0.2362753 0.55689663 0.4854670
# 3 0.7264990 0.68069346 0.2940342
#
# [[2]]
# A B C.horse
# 1 0.08032856 0.4137106 0.6378605
# 2 0.35671556 0.8112511 0.4321704
# 3 0.07306260 0.6850093 0.2510791
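The reason the original function produces NA can be seen by inspecting what deparse(substitute(x)) returns when the function is called directly and when it is called through lapply(). A minimal sketch; show_arg is a hypothetical helper written only for this demonstration, and the exact string seen inside lapply() is an implementation detail of R (typically X[[i]] in current versions):

```r
# Hypothetical helper: report the expression that was passed as the argument
show_arg <- function(x) deparse(substitute(x))

df_elephant <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# Called directly, the helper sees the variable name
direct <- show_arg(df_elephant)

# Called through lapply(), it sees lapply's internal indexing expression
via_lapply <- lapply(list(df_elephant), show_arg)[[1]]

# Splitting on "_" works only when an underscore is present
animal_direct <- strsplit(direct, "_")[[1]][2]      # "elephant"
animal_lapply <- strsplit(via_lapply, "_")[[1]][2]  # NA: nothing after a "_"
```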
You can also try the following, which is somewhat similar to akrun's answer and also uses Map at the end:
# Your data
d <- read.table("clipboard")
# create a list with names A and B
d_list <- list(A=d, B=d)
# function
foo <- function(x, y){
gr <- which(colnames(x) == "C") # get index of colnames C
tmp <- colnames(x) #new colnames vector
tmp[gr] <- paste(tmp[gr], y, sep=".") # replace the old with the new colnames.
setNames(x, tmp) # set the new names
}
# Result
Map(foo, d_list, names(d_list))
$A
A B C.A
1 -0.02299388 0.7140416 0.8492423
2 -1.43027866 -1.9642077 -1.2886368
3 -1.01827712 -0.9414119 -2.0234436
$B
A B C.B
1 -0.02299388 0.7140416 0.8492423
2 -1.43027866 -1.9642077 -1.2886368
3 -1.01827712 -0.9414119 -2.0234436
We can try Map. Collect the data sets in a list (here mget() returns the values of the named objects as a list), then use Map to append the matching part of each object name to the name of the third column.
Map(function(x, y) {names(x)[3] <- paste(names(x)[3], sub(".*_", "", y), sep="."); x},
mget(c("df_elephant", "df_horse")), c("df_elephant", "df_horse"))
#$df_elephant
# A B C.elephant
#1 -0.02299388 0.7140416 0.8492423
#2 -1.43027866 -1.9642077 -1.2886368
#3 -1.01827712 -0.9414119 -2.0234436
#$df_horse
# A B C.horse
#1 0.4538705 0.02279488 1.6746280
#2 -1.4727138 0.68660595 -0.2505752
#3 1.2647592 -1.51739927 -1.3050531

Loop rename variables in R with assign()

I am trying to rename a variable across several data frames, but assign won't work. Here is the code I am trying:
assign(colnames(eval(as.name(DataFrameX)))[[3]], "<- NewName")
# The idea is, go through every dataset, and change the name of column 3 to
# "NewName" in all of them
This won't return any error (All other versions I could think of returned some kind of error), but it doesn't change the variable name either.
I am using a loop to create several data frames with different variables in each, and now I need to rename some of those variables so that the data frames can be merged into one at a later stage. All of that works except for the renaming. If I type the names of the data frame and variable myself in a regular call such as colnames(DF)[[3]] <- "NewName", it works, but when I try to use assign so that it is done in a loop, it doesn't do anything.
Here is what you can do with a loop over all data frames in your environment. Because the loop only picks up objects that are data frames, there is no risk of touching any other variable. The key point is that you must assign the modified copy back to each data frame's name inside the loop.
df1 <- data.frame(q=1,w=2,e=3)
df2 <- data.frame(q=1,w=2,e=3)
df3 <- data.frame(q=1,w=2,e=3)
# > df1
# q w e
# 1 1 2 3
# > df2
# q w e
# 1 1 2 3
# > df3
# q w e
# 1 1 2 3
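DFs <- names(which(sapply(.GlobalEnv, is.data.frame)))  # names of all data frames in the global environment
for (i in seq_along(DFs)) {
  df <- get(DFs[i])              # fetch the data frame by name
  colnames(df)[3] <- "newName"   # rename its third column
  assign(DFs[i], df)             # write the modified copy back under the same name
}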
# > df1
# q w newName
# 1 1 2 3
# > df2
# q w newName
# 1 1 2 3
# > df3
# q w newName
# 1 1 2 3
We could use eapply() (see ?eapply) to apply setnames() from the data.table package to every data frame in your global environment. setnames() changes the names by reference, so no reassignment is needed.
library(data.table)
eapply(.GlobalEnv, function(x) if (is.data.frame(x)) setnames(x, 3, "NewName"))
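To see why the assign() call from the question appears to do nothing, it can be run in a separate environment: its first argument is just the current column name, so it creates a new object with that name and leaves the data frame untouched. A minimal base R sketch; the environment env and the object names are assumptions made for illustration.

```r
# Hypothetical container so the example does not touch the global environment
env <- new.env()
assign("df1", data.frame(q = 1, w = 2, e = 3), envir = env)
assign("df2", data.frame(q = 1, w = 2, e = 3), envir = env)

# The pattern from the question: this creates an object called "e"
# holding the string "<- NewName"; df1 itself is not modified
assign(colnames(get("df1", envir = env))[[3]], "<- NewName", envir = env)
stray <- get("e", envir = env)
rm("e", envir = env)

# Working pattern: get the data frame, modify the copy, assign it back
for (nm in ls(env)) {
  d <- get(nm, envir = env)
  colnames(d)[3] <- "NewName"
  assign(nm, d, envir = env)
}

result <- colnames(get("df1", envir = env))
```

The replacement form colnames(d)[3] <- value needs a real variable on its left-hand side, which is why the copy d is modified first and then stored with assign().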

How do you delete the header in a dataframe?

I want to delete the header from a dataframe that I have. I read in the data from a csv file then I transposed it, but it created a new header that is the name of the file and the row that the data is from in the file.
Here's an example for a dataframe df:
a.csv.1 a.csv.2 a.csv.3 ...
x 5 6 1 ...
y 2 3 2 ...
I want to delete the a.csv.n row, but when I try df <- df[-1,] it deletes row x and not the top.
If you really, really, really don't like column names, you may convert your data frame to a matrix (keeping possible coercion of variables of different class in mind), and then remove the dimnames.
dd <- data.frame(x1 = 1:5, x2 = 11:15)
mm1 <- as.matrix(dd)
mm2 <- matrix(mm1, ncol = ncol(dd), dimnames = NULL)
I add my previous comment here as well:
?data.frame: "The column names should be non-empty, and attempts to use empty names will have unsupported results.".
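In the situation from the question, t() applied to a data frame already returns a matrix, so the column names can be dropped from the transposed result directly. A minimal sketch with made-up data that mirrors the question's example (the values and row names are assumptions):

```r
# Stand-in for the data read from the csv file, before transposing
df <- data.frame(x = c(5, 6, 1), y = c(2, 3, 2),
                 row.names = c("a.csv.1", "a.csv.2", "a.csv.3"))

m <- t(df)            # matrix with rows x, y and columns a.csv.1, a.csv.2, a.csv.3
colnames(m) <- NULL   # drop only the column names; the row names x and y stay

m
```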
Set names to NULL
names(df) <- NULL
You can also use the header option in read.csv
You can use names(df) to change the header (the column names). If newnames is a vector of names such as newnames <- c("col1", "col2", "col3"), then names(df) <- newnames will give you a data frame with the column names col1, col2, col3.
As @Henrik said, the column names should be non-empty. Setting names(df) <- NULL will give NA in the column names.
If your data is in a CSV file and you read it with header=TRUE, the data frame will have the same column names as the CSV file. If you set header=FALSE, R will assign the column names V1, V2, ... and the column names from the original CSV file will appear as the first row.
anydata.csv
a b c d
1 1 2 3 13
2 2 3 1 21
read.csv("anydata.csv",header=TRUE)
a b c d
1 1 2 3 13
2 2 3 1 21
read.csv("anydata.csv",header=FALSE)
V1 V2 V3 V4
1 a b c d
2 1 2 3 13
3 2 3 1 21
You could use
setNames(dat, rep(" ", length(dat)))
where dat is the name of the data frame. Then all columns will have the name " " and hence will be 'invisible'.
It comes some years late, but you can simply assign a vector of empty strings to the column names:
## if you want to delete all column names:
colnames(df)[] <- ""
## if you want to delete let's say column 1:
colnames(df)[1] <- ""
## if you want to delete 1 to 3 and 7:
colnames(df)[c(1:3,7)] <- ""
As already mentioned, a data frame without column names just isn't something that is going to happen. I'm guessing, though, that you don't care much whether they are there; you just don't want to see them when you print your data frame. If so, you can write a new print function to get around that, like so:
> dat <- data.frame(var1=c("A","B","C"),var2=rnorm(3),var3=rnorm(3))
> print(dat)
var1 var2 var3
1 A 1.2771777 -0.5726623
2 B -1.5000047 1.3249348
3 C 0.1989117 -1.4016253
> ncol.print <- function(dat) print(matrix(as.matrix(dat),ncol=ncol(dat),dimnames=NULL),quote=F)
> ncol.print(dat)
[,1] [,2] [,3]
[1,] A 1.2771777 -0.5726623
[2,] B -1.5000047 1.3249348
[3,] C 0.1989117 -1.4016253
Your other option is to set your variable names to unique amounts of whitespace, for example:
> names(dat) <- c(" ", "  ", "   ")
> dat
1 A 1.2771777 -0.5726623
2 B -1.5000047 1.3249348
3 C 0.1989117 -1.4016253
You can also write a function to do this:
> blank.names <- function(dat){
+ for(i in 1:ncol(dat)){
+ names(dat)[i] <- paste(rep(" ",i),collapse="")
+ }
+ return(dat)
+ }
> dat <- data.frame(var1=c("A","B","C"),var2=rnorm(3),var3=rnorm(3))
> dat
var1 var2 var3
1 A -1.01230289 1.2740237
2 B -0.13855777 0.4689117
3 C -0.09703034 -0.4321877
> blank.names(dat)
1 A -1.01230289 1.2740237
2 B -0.13855777 0.4689117
3 C -0.09703034 -0.4321877
But generally I don't think any of this should be done.
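If the goal is only to display the values without a header, one option that leaves the data frame and its names untouched is write.table(), which prints to the console when no file is given. This is a sketch of an alternative, not something proposed in the answer above; the example values are made up.

```r
dat <- data.frame(var1 = c("A", "B", "C"),
                  var2 = c(1.5, -2, 0.25),
                  var3 = c(10, 20, 30))

# With the default file = "", write.table() writes to the console;
# col.names = FALSE suppresses the header line
write.table(dat, col.names = FALSE, quote = FALSE)

# Capture the same output as text to inspect it programmatically
printed <- capture.output(write.table(dat, col.names = FALSE, quote = FALSE))
```

Adding row.names = FALSE would also drop the leading row numbers.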
A function that I use in one of my R scripts:
read_matrix <- function (csvfile) {
a <- read.csv(csvfile, header=FALSE)
matrix(as.matrix(a), ncol=ncol(a), dimnames=NULL)
}
How to call this:
iops_even <- read_matrix('even_iops_Jan15.csv')
iops_odd <- read_matrix('odd_iops_Jan15.csv')
If the data frame is a pandas DataFrame in Python rather than an R data frame, you can simply do:
print(df.to_string(header=False))
if you want to remove the line indexes as well, you can do:
print(df.to_string(index=False,header=False))

Conditional lapply

So I have a bunch of data frames in a list object. Frames are organised such as
ID Category Value
2323 Friend 23.40
3434 Foe -4.00
And I got them into a list by following this topic. I can also run simple functions on them as shown in this topic.
Now I am trying to run a conditional function with lapply, and I'm running into trouble. In some tables the 'ID' column has a different name (say, 'recnum'), and I need to tell lapply to go through each data frame, check if there is a column named 'recnum', and change its name to 'ID', as in
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
But I'm running into trouble with local scope and who knows what. Any ideas?
Use the rename function from plyr; it renames by name, not position:
x <- data.frame(ID = 1:2,z=1:2)
y <- data.frame('recnum' = 1:2,z=3:4)
.list <- list(x,y)
library(plyr)
lapply(.list, rename, replace = c('recnum' = 'ID'))
[[1]]
ID z
1 1 1
2 2 2
[[2]]
ID z
1 1 3
2 2 4
Your original code works fine:
foo <- function(x){
colnr <- which(names(x) == "recnum")
if (length(colnr > 0)) {names(x)[colnr] <- "ID"}
x
}
.list <- list(x,y)
lapply(.list, foo)
Not sure what your problem was.
If you look at the second part of mnel's answer, you can see that the function foo evaluates x as its last expression. Without that, if you try to change the names of the data.frames in your list directly from within the anonymous function passed to lapply, it will likely not work.
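The effect of the final expression can be checked directly. In the sketch below (the names bad and good are illustrative), the first anonymous function ends with the conditional assignment, so lapply() collects the value of that expression instead of the data frame; the second returns the modified data frame explicitly.

```r
x <- data.frame(ID = 1:2, z = 1:2)
y <- data.frame(recnum = 1:2, z = 3:4)
.list <- list(x, y)

# Last expression is the if/assignment: returns NULL or the string "ID"
bad <- lapply(.list, function(d) {
  colnr <- which(names(d) == "recnum")
  if (length(colnr) > 0) names(d)[colnr] <- "ID"
})

# Last expression is the data frame itself: returns the renamed copy
good <- lapply(.list, function(d) {
  colnr <- which(names(d) == "recnum")
  if (length(colnr) > 0) names(d)[colnr] <- "ID"
  d
})
```

Note that neither version changes x or y themselves; the renamed copies exist only in the list returned by lapply().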
Just as an alternative, you could use gsub and avoid loading an additional package (although plyr is a nice package):
xx <- list(data.frame("recnum" = 1:3, "recnum2" = 1:3),
data.frame("ID" = 4:6, "hat" = 4:6))
lapply(xx, function(x){
names(x) <- gsub("^recnum$", "ID", names(x))
return(x)
})
# [[1]]
# ID recnum2
# 1 1 1
# 2 2 2
# 3 3 3
# [[2]]
# ID hat
# 1 4 4
# 2 5 5
# 3 6 6
