R: how to manipulate several data frames at a time

I have 5 data frames in the global environment, say a, b, c, d, and e.
I want data frame a to be compared with e, and if R finds any common elements in a and e, delete those elements from a. Then I want data frame b to be compared with e and its common elements deleted, and so on.
Actually, I have 20 tables that need to be compared with e.
Can anyone suggest an elegant way to handle this problem? I'm thinking of a loop or functions but can't work out the details.
Thanks everybody and have a nice day!

The easiest would be to put all the dataframes you want to compare in a list, then use lapply to loop over this list:
# create list of data.frames
dlist <- list(df1 = data.frame(var1 = 1:10), df2 = data.frame(var1 = 11:20),
df3 = data.frame(var1 = 21:30), df4 = data.frame(var1 = 31:40))
# create master-data.frame
set.seed(1)
df <- data.frame(var1 = sample(1:100, 30))
# use lapply() to loop over the data and exclude all elements that are in the master-data.frame
dlist <- lapply(dlist, function(x) {
  x[!x$var1 %in% df$var1, , drop = FALSE]
})
Result:
> dlist
$df1
var1
2 2
3 3
4 4
5 5
7 7
8 8
9 9
$df2
var1
1 11
2 12
3 13
4 14
5 15
8 18
$df3
var1
2 22
3 23
4 24
6 26
10 30
$df4
var1
1 31
3 33
5 35
6 36
8 38
9 39
10 40
If you absolutely need the data frames in your global environment, you could use list2env:
list2env(dlist, envir = .GlobalEnv)
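If the data frames already exist as separate objects (as with the 20 tables in the question), mget() can collect them into a named list by name. A minimal sketch, assuming toy data frames a, b and e with a single column var1 (the names and values are made up for illustration):

```r
# Toy data: 'a' and 'b' are compared against the master data frame 'e'
a <- data.frame(var1 = 1:5)
b <- data.frame(var1 = 4:8)
e <- data.frame(var1 = c(2, 5, 7))

# Collect the data frames by name into a named list
# (environment() is the current environment; at top level that is .GlobalEnv)
dlist <- mget(c("a", "b"), envir = environment())

# Drop every row whose var1 value also occurs in e
cleaned <- lapply(dlist, function(x) x[!x$var1 %in% e$var1, , drop = FALSE])

cleaned$a$var1  # 1 3 4
cleaned$b$var1  # 4 6 8
```

Because the list is named, list2env(cleaned, envir = .GlobalEnv) would then overwrite a and b with the filtered versions.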

Related

R | Create a cross matrix

I don't know how or where to start, but I hope someone can help. It's the first time I've used R like this, so even a keyword or a recommendation on where to look it up would be helpful.
My dataframe looks like this:
set.seed(1)
df <- data.frame(
X = sample(c(1, 2, 3), 50, replace = TRUE),
Y = sample(c(1, 2, 3), 50, replace = TRUE))
And I would like to get a cross table like this:
using
length(which(df$X == & df$Y == ))
I could calculate the data with R and fill it in my Excel-sheet but there has to be a better option.
Thank you in advance.
Try this base R solution:
#Data
set.seed(1)
df <- data.frame(
X = sample(c(1, 2, 3), 50, replace = TRUE),
Y = sample(c(1, 2, 3), 50, replace = TRUE))
#Code
addmargins(table(df$X,df$Y))
Output:
       1  2  3 Sum
  1    6  7  5  18
  2    4  6  9  19
  3    5  5  3  13
  Sum 15 18 17  50
You can also change the order of your variables like this:
#Code2
addmargins(table(df$Y,df$X))
Output:
       1  2  3 Sum
  1    6  4  5  15
  2    7  6  5  18
  3    5  9  3  17
  Sum 18 19 13  50
In order to export to MS Excel, you use this code:
library(xlsx)
#Transform to dataframe
d1 <- as.data.frame.matrix(addmargins(table(df$X,df$Y)))
#Export
write.xlsx(d1,file='myexample.xlsx','Sheet1')
If the data have only two columns, just pass the data.frame object to table.
addmargins(table(df))
If the data include more than two columns, you can subset its variables before passing it to table().
addmargins(table(df[c("X", "Y")]))
You can also pass a formula to xtabs().
addmargins(xtabs( ~ X + Y, df))
All of the above give
     Y
X      1  2  3 Sum
  1    5  6  3  14
  2    2  6  6  14
  3   13  4  5  22
  Sum 20 16 14  50
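To connect this with the manual length(which(...)) approach from the question, the sketch below checks one cell of the table against a hand-computed count. The cell chosen (X == 1, Y == 2) is arbitrary, and the actual counts depend on the random draw, so no specific numbers are asserted here:

```r
set.seed(1)
df <- data.frame(
  X = sample(c(1, 2, 3), 50, replace = TRUE),
  Y = sample(c(1, 2, 3), 50, replace = TRUE))

# Naming the arguments labels the dimensions of the table
tab <- table(X = df$X, Y = df$Y)

# Manual count for a single cell, as attempted in the question
manual <- length(which(df$X == 1 & df$Y == 2))

# The dimnames are character strings, so index the table with "1" and "2"
tab["1", "2"] == manual   # TRUE
sum(tab) == nrow(df)      # TRUE: every row falls into exactly one cell
```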
To export the table to an Excel file, you can use write.xlsx() from openxlsx.
library(openxlsx)
tab <- addmargins(xtabs( ~ X + Y, df))
write.xlsx(tab, "foo.xlsx")

Use a row as colname

I want to take a row and use it to set the column names, as in the example below:
df1a = data.frame(Customer = c("A", "a",1:8), Product = c("B", "b",11:18))
colnames(df1a)<-df1a[2,]
Expected output
a b
1 A B
2 a b
3 1 11
4 2 12
5 3 13
6 4 14
7 5 15
8 6 16
9 7 17
10 8 18
I think the problem is that df1a[2,] is a data frame
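Here the columns are of class factor, because stringsAsFactors = TRUE was the default in the data.frame call (this default changed to FALSE in R 4.0.0). So the values that got assigned are the integer storage codes of the factors rather than the actual values. Create the data frame with stringsAsFactors = FALSE: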
df1a <- data.frame(Customer = c("A", "a",1:8),
Product = c("B", "b",11:18), stringsAsFactors = FALSE)
and then do the assignment
names(df1a) <- unlist(df1a[2,])
Or, as @Ryan mentioned, unlist is not needed:
names(df1a) <- df1a[2,]
You can change the names without generating a new data.frame:
names(df1a) <- lapply(df1a[2,], as.character)
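Putting the pieces together, a minimal sketch using the data from the question. Passing stringsAsFactors = FALSE explicitly makes the result the same regardless of the R version's default; wrapping the row in as.character(unlist(...)) is one conservative way to get a plain character vector:

```r
# Character columns, not factors, whatever the R version's default is
df1a <- data.frame(Customer = c("A", "a", 1:8),
                   Product  = c("B", "b", 11:18),
                   stringsAsFactors = FALSE)

# Row 2 as a plain character vector: "a" "b"
new_names <- as.character(unlist(df1a[2, ]))

names(df1a) <- new_names
names(df1a)  # "a" "b"
```

The row used for the names is still present in the data; whether to drop it afterwards (for example with df1a[-2, ]) depends on the use case.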

R: Splitting one list into two lists under a condition

I've one large list (assume L) with 20 dataframes. The 20 data frames have only two different forms. They have 13 or 5 rows.
$foo1
a value
1 12 321.12
2 11 231.12
3 10 211.15
4 9 ...
5 8 ...
6 7 ...
7 6
8 5
9 4
10 3
11 2
12 1
13 0
$foo2
a value
1 4 19.52
2 3 98.91
3 2 97.67
4 1 ...
5 0 ...
I want to split the list into two lists with the following condition:
All data frames with the same row length should be stored in one list. As a result, I want a list of all data frames that have 5 rows and the other one should include all data frames with 13 rows.
We can do this by splitting on the number of rows. Create a grouping variable by looping through the list to get the number of rows ('grp')
grp <- sapply(L, nrow)
Then split the list 'L' by the grp
L1 <- split(L, grp)
If we need the list names to be 'month', 'quarter'
L1 <- split(L, setNames(c("month", "quarter"), c("13", "5"))[as.character(grp)])
data
set.seed(24)
L <- list(foo1 = data.frame(a = 1:13, value = rnorm(13)),
foo2 = data.frame(a = 1:5, value = rnorm(5)),
foo3 = data.frame(a = 1:13, value = rnorm(13)),
foo4 = data.frame(a = 1:5, value = rnorm(5)))
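As a quick check of what split() returns, the sketch below rebuilds the sample list and inspects the result. It uses vapply() instead of sapply() only to make the integer return type explicit; that swap is a choice made here, not part of the original answer:

```r
set.seed(24)
L <- list(foo1 = data.frame(a = 1:13, value = rnorm(13)),
          foo2 = data.frame(a = 1:5,  value = rnorm(5)),
          foo3 = data.frame(a = 1:13, value = rnorm(13)),
          foo4 = data.frame(a = 1:5,  value = rnorm(5)))

# Number of rows of each data frame, as a named integer vector
grp <- vapply(L, nrow, integer(1))

# One sub-list per distinct row count; the element names are kept
L1 <- split(L, grp)

names(L1)          # "5" "13"
names(L1[["13"]])  # "foo1" "foo3"
names(L1[["5"]])   # "foo2" "foo4"
```

Note that the sub-lists are named by the row count as a character string, so they are accessed with L1[["13"]] rather than L1[[13]].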

returning from list to data.frame after lapply

I have a very simple question about lapply. I am transitioning from STATA to R and I think there is some very basic concept that I am not getting about looping in R. But I have been reading about it all afternoon and can't figure out a reasonable way to do this very simple thing.
I have three data frames df1, df2, and df3 that all have the same column names, in the same order, etc.
I want to rename their columns all at once.
I put the data frames in a list:
dflist <- list(df1, df2, df3)
What I want the new names to be:
varlist <- c("newname1", "newname2", "newname3")
Write a function that replaces names with those in varlist, and lapply it over the data frames
ChangeNames <- function(x) {
names(x) <- varlist
return(x)
}
dflist <- lapply(dflist, ChangeNames)
So, as far as I understand, R has changed the names of the copies of the data frames that I put in the list, but not the original data frames themselves. I want the data frames themselves to be renamed, not the elements of the list (which are trapped in a list).
Now, I can go
df1 <- as.data.frame(dflist[1])
df2 <- as.data.frame(dflist[2])
df3 <- as.data.frame(dflist[3])
But that seems weird. You need a loop to get back the elements of a loop?
Basically: once you've put some data frames in a list and run your function on them via lapply, how do you get them back out of the list, without starting back at square one?
If you just want to change the names, that isn't too hard in R. Bear in mind that the assignment operator, <-, can be applied in sequence. Hence:
names(df1) <- names(df2) <- names(df3) <- c("newname1", "newname2", "newname3")
I am not sure I understand correctly, do you want to rename the columns of the data frames or the components of the list that contain the data frames?
If it is the first, please always search before asking, the question has been asked here.
So what you can easily do in case you have even more data frames in the list is:
# Creating some sample data first
> dflist <- list(df1 = data.frame(a = 1:3, b = 2:4, c = 3:5),
+ df2 = data.frame(a = 4:6, b = 5:7, c = 6:8),
+ df3 = data.frame(a = 7:9, b = 8:10, c = 9:11))
# See what it looks like
> dflist
$df1
a b c
1 1 2 3
2 2 3 4
3 3 4 5
$df2
a b c
1 4 5 6
2 5 6 7
3 6 7 8
$df3
a b c
1 7 8 9
2 8 9 10
3 9 10 11
# And do the trick
> dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))
# See how it looks now
> dflist
$df1
newname1 newname2 newname3
1 1 2 3
2 2 3 4
3 3 4 5
$df2
newname1 newname2 newname3
1 4 5 6
2 5 6 7
3 6 7 8
$df3
newname1 newname2 newname3
1 7 8 9
2 8 9 10
3 9 10 11
So the names were changed from a, b and c to newname1, newname2 and newname3 for each data frame in the list.
If it is the second, you can do this:
> names(dflist) <- c("newname1", "newname2", "newname3")
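On the original question of getting the data frames back out of the list: a single element can be extracted with [[ ]] (which returns the data frame itself, whereas [ ] returns a one-element list), and a named list can be written back to an environment with list2env(). A minimal sketch, assuming the list elements are named after the original objects df1 and df2:

```r
df1 <- data.frame(a = 1:3, b = 2:4, c = 3:5)
df2 <- data.frame(a = 4:6, b = 5:7, c = 6:8)

# A *named* list, so each element remembers which object it came from
dflist <- list(df1 = df1, df2 = df2)
dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))

# [[ ]] returns the data frame; [ ] would return a list of length one
first <- dflist[[1]]
is.data.frame(first)      # TRUE
is.data.frame(dflist[1])  # FALSE

# Overwrite df1 and df2 in the current environment with the renamed versions
invisible(list2env(dflist, envir = environment()))
names(df1)  # "newname1" "newname2" "newname3"
```

In many workflows it is simpler to keep working with the list and skip the last step entirely.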

Using lapply to change column names of a list of data frames

I'm trying to use lapply on a list of data frames; but failing at passing the parameters correctly (I think).
List of data frames:
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2,df3) #multiple data frames w. way less columns than the length of vector todos
Vector with columns names:
todos <-c('col1','col2', ......'colN')
I'd like to change the column names using lapply:
lapply (listDF, function(x) { colnames(x)[2:length(x)] <-todos[1:length(x)-1] } )
but this doesn't change the names at all. Am I not passing the data frames themselves, but something else? I just want to change names, not to return the result to a new object.
Thanks in advance, p.
You can also use setNames if you want to replace all columns
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2)
new_col_name <- c("C", "D")
lapply(listDF, setNames, nm = new_col_name)
## [[1]]
## C D
## 1 1 11
## 2 2 12
## 3 3 13
## 4 4 14
## 5 5 15
## 6 6 16
## 7 7 17
## 8 8 18
## 9 9 19
## 10 10 20
## [[2]]
## C D
## 1 21 31
## 2 22 32
## 3 23 33
## 4 24 34
## 5 25 35
## 6 26 36
## 7 27 37
## 8 28 38
## 9 29 39
## 10 30 40
If you need to replace only a subset of column names, then you can use the solution of @Jogo
lapply(listDF, function(df) {
names(df)[-1] <- new_col_name[-ncol(df)]
df
})
A last point, in R there is a difference between a:b - 1 and a:(b - 1)
1:10 - 1
## [1] 0 1 2 3 4 5 6 7 8 9
1:(10 - 1)
## [1] 1 2 3 4 5 6 7 8 9
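This matters for the indexing in the question. A short sketch, assuming a hypothetical three-element todos vector and a data frame with n columns: 1:n - 1 produces an index starting at 0, which R silently drops when subsetting, while 1:(n - 1) counts backwards when n is 1. seq_len(n - 1) avoids both surprises.

```r
todos <- c("col1", "col2", "col3")  # hypothetical names for illustration
n <- 2                              # number of columns in a data frame

1:n - 1            # 0 1   (the colon binds tighter than the minus)
todos[1:n - 1]     # "col1", because the index 0 is silently dropped
todos[1:(n - 1)]   # "col1"

# Edge case: a data frame with a single column
1:(1 - 1)          # 1 0   (counts down, not an empty sequence)
seq_len(1 - 1)     # integer(0), so nothing is selected
todos[seq_len(1 - 1)]  # character(0)
```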
EDIT
If you want to change the column names of the data frames in the global environment from a list, you can use list2env, but I'm not sure it is the best way to achieve what you want. You also need to make the list a named list, where each name is the same as the name of the data frame it should replace.
listDF <- list(df1 = df1, df2 = df2)
new_col_name <- c("C", "D")
listDF <- lapply(listDF, function(df) {
names(df)[-1] <- new_col_name[-ncol(df)]
df
})
list2env(listDF, envir = .GlobalEnv)
str(df1)
## 'data.frame': 10 obs. of 2 variables:
## $ A: int 1 2 3 4 5 6 7 8 9 10
## $ C: int 11 12 13 14 15 16 17 18 19 20
Try this:
lapply(listDF, function(x) {
  # todos is longer than the number of columns, so take only as many names as needed
  names(x)[-1] <- todos[seq_len(length(x) - 1)]
  x
})
You will get a new list with the changed data frames. If you want to manipulate listDF directly:
for (i in seq_along(listDF)) names(listDF[[i]])[-1] <- todos[seq_len(length(listDF[[i]]) - 1)]
I was not able to get the code used in these answers to work. I found some code from another forum which did work. This will assign the new column names into each dataframe, the other methods created a copy of the dataframes. For anyone else here is the code.
# Create some dataframes
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- c("df1", "df2") #Notice this is NOT a list
new_col_name <- c("C", "D") #What do you want the new columns to be named?
# Assign the new column names to each dataframe in "listDF"
for(df in listDF) {
df.tmp <- get(df)
names(df.tmp) <- new_col_name
assign(df, df.tmp)
}
