Use a row as colname - r

I want to take a row and use it to set the column names. Example below:
df1a = data.frame(Customer = c("A", "a",1:8), Product = c("B", "b",11:18))
colnames(df1a)<-df1a[2,]
Expected output
a b
1 A B
2 a b
3 1 11
4 2 12
5 3 13
6 4 14
7 5 15
8 6 16
9 7 17
10 8 18
I think the problem is that df1a[2,] is a data frame

Here the columns are factors because, in R versions before 4.0.0, stringsAsFactors = TRUE was the default in the data.frame call. So the values used as names are the integer storage codes of the factors rather than the actual values. Create the data frame with character columns instead:
df1a <- data.frame(Customer = c("A", "a",1:8),
Product = c("B", "b",11:18), stringsAsFactors = FALSE)
and then do the assignment:
names(df1a) <- unlist(df1a[2,])
Or, as @Ryan mentioned, unlist is not needed:
names(df1a) <- df1a[2,]

You can change the names without generating a new data.frame:
names(df1a) <- lapply(df1a[2,], as.character)
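A minimal sketch of why the as.character route is the safer one: it gives the same names whether the columns are factors (the default before R 4.0.0) or character vectors. The helper row_as_names is a hypothetical name introduced here for illustration, not part of base R.

```r
# Build the example both ways: factor columns (the pre-R 4.0.0 default)
# and character columns (the default since R 4.0.0).
df_fac <- data.frame(Customer = c("A", "a", 1:8), Product = c("B", "b", 11:18),
                     stringsAsFactors = TRUE)
df_chr <- data.frame(Customer = c("A", "a", 1:8), Product = c("B", "b", 11:18),
                     stringsAsFactors = FALSE)

# Hypothetical helper: take the values in row i as strings and use them as
# column names. as.character() on a factor returns its labels, not its codes.
row_as_names <- function(df, i) {
  names(df) <- vapply(df[i, ], as.character, character(1))
  df
}

names(row_as_names(df_fac, 2))  # "a" "b"
names(row_as_names(df_chr, 2))  # "a" "b"
```

The row used for the names stays in the data frame, matching the expected output in the question.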

Related

How to create a parameter using $ to select from data frame

I have three different data frames that are similar in their columns such:
df1
Class 1 2 3
A 5 3 2
B 9 1 4
C 7 9 8
df2
Class 1 2 3
A 7 3 10
B 2 6 2
C 4 7 1
df3
Class 1 2 3
A 5 4 1
A 2 6 2
A 12 3 8
I would like to iterate through the three files and select the data from the columns with the same name. In other words, I want to iterate three times, each time selecting the data of column 1, then column 2, then column 3, and merge them into one data frame.
To do that, I did the following:
df1 <- read.csv(R1)
df2 <- read.csv(R2)
df3 <- read.csv(R3)
df <- data.frame(Class=character(), B1_1=integer(), B1_2=integer(), B1_3=integer(), stringsAsFactors=FALSE)
for(i in 1:3){
nam <- paste("X", i, sep = "") #here I want to call the column name such as X1, X2, and X3
df[seq_along(df1[nam]), ]$B1_1 <- df1[nam]
df[seq_along(df2[nam]), ]$B1_2 <- df2[nam]
df[seq_along(df3[nam]), ]$B1_3 <- df3[nam]
df$Class <- df1$Class
}
For the line df[seq_along(df1[nam]), ]$B1_1 <- df1[nam], I followed the solution from this, but it produces the following error:
Error in `$<-.data.frame`(`*tmp*`, "B1_1", value = list(X1 = c(5L, 7L, :
replacement has 10 rows, data has 1
Do you have any idea how to solve it?
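A minimal sketch of one possible approach, assuming the goal is one data frame per column index (X1, X2, X3). Two points matter: df1[nam] returns a one-column data frame, whereas df1[[nam]] returns the plain vector, and building each result with data.frame() avoids assigning into the empty, zero-row df. The sample values come from the tables above; the name result is hypothetical.

```r
# Stand-ins for read.csv(R1), read.csv(R2), read.csv(R3); read.csv() turns
# headers 1, 2, 3 into X1, X2, X3 because check.names = TRUE by default.
df1 <- data.frame(Class = c("A", "B", "C"), X1 = c(5, 9, 7), X2 = c(3, 1, 9), X3 = c(2, 4, 8))
df2 <- data.frame(Class = c("A", "B", "C"), X1 = c(7, 2, 4), X2 = c(3, 6, 7), X3 = c(10, 2, 1))
df3 <- data.frame(Class = c("A", "A", "A"), X1 = c(5, 2, 12), X2 = c(4, 6, 3), X3 = c(1, 2, 8))

# One data frame per column index; [[nam]] extracts each column as a vector.
result <- lapply(1:3, function(i) {
  nam <- paste0("X", i)
  data.frame(Class = df1$Class,
             B1_1  = df1[[nam]],
             B1_2  = df2[[nam]],
             B1_3  = df3[[nam]])
})
names(result) <- paste0("X", 1:3)

result$X2
```

If a single stacked data frame is wanted instead, do.call(rbind, result) combines the three pieces.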

R: how to manipulate several data frames at a time

Suppose I have 5 data frames in the global environment: a, b, c, d, and e.
I want data frame a to be compared with e, and if R finds any common elements in a and e, to delete those elements from a. Then I want data frame b to be compared with e and its common elements deleted, and so on.
Actually, I have 20 tables that need to be compared with e.
Can anyone suggest an elegant way to handle this problem? I'm thinking of a loop or functions but can't work out the details.
Thanks everybody and have a nice day!
The easiest approach is to put all the data frames you want to compare in a list, then use lapply to loop over this list:
# create list of data.frames
dlist <- list(df1 = data.frame(var1 = 1:10), df2 = data.frame(var1 = 11:20),
df3 = data.frame(var1 = 21:30), df4 = data.frame(var1 = 31:40))
# create master-data.frame
set.seed(1)
df <- data.frame(var1 = sample(1:100, 30))
# use lapply() to loop over the data and exclude all elements that are in the master-data.frame
dlist <- lapply(dlist, function(x){
x <- x[!x$var1 %in% df$var1, , drop = FALSE]
})
Result:
> dlist
$df1
var1
2 2
3 3
4 4
5 5
7 7
8 8
9 9
$df2
var1
1 11
2 12
3 13
4 14
5 15
8 18
$df3
var1
2 22
3 23
4 24
6 26
10 30
$df4
var1
1 31
3 33
5 35
6 36
8 38
9 39
10 40
If you absolutely need the data frames in your global environment, you could use list2env:
list2env(dlist, envir = .GlobalEnv)
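To apply this to data frames that already live in the global environment under their own names, as in the question, one option is to gather them into a named list with mget(), filter them, and write them back with list2env(). A minimal sketch with small made-up data, assuming each data frame shares a key column var1 (an assumption; the question does not say which column holds the common elements):

```r
# Made-up data frames named as in the question; e is the reference table.
# Assumption: the common elements live in a column called var1.
a <- data.frame(var1 = 1:5)
b <- data.frame(var1 = 4:8)
e <- data.frame(var1 = c(2L, 4L, 6L))

# Environment holding the data frames (the global environment at top level).
env <- environment()

# Collect the data frames to be cleaned, by name, into a named list.
targets <- c("a", "b")
dlist <- mget(targets, envir = env)

# Drop every row whose var1 value also appears in e.
dlist <- lapply(dlist, function(x) x[!x$var1 %in% e$var1, , drop = FALSE])

# Overwrite a and b with their filtered versions.
invisible(list2env(dlist, envir = env))
```

With 20 tables, only the targets vector needs to list more names.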

R: Splitting one list into two lists under a condition

I have one large list (call it L) with 20 data frames. The 20 data frames have only two different forms: they have either 13 or 5 rows.
$foo1
a value
1 12 321.12
2 11 231.12
3 10 211.15
4 9 ...
5 8 ...
6 7 ...
7 6
8 5
9 4
10 3
11 2
12 1
13 0
$foo2
a value
1 4 19.52
2 3 98.91
3 2 97.67
4 1 ...
5 0 ...
I want to split the list into two lists with the following condition:
All data frames with the same number of rows should be stored in one list. As a result, I want one list of all data frames that have 5 rows and another with all data frames that have 13 rows.
We can do this by splitting on the number of rows. Create a grouping variable ('grp') by looping through the list to get the number of rows:
grp <- sapply(L, nrow)
Then split the list 'L' by the grp
L1 <- split(L, grp)
If we need the list names to be 'month' (for the 13-row data frames) and 'quarter' (for the 5-row ones):
L1 <- split(L, setNames(c("month", "quarter"), c("13", "5"))[as.character(grp)])
Data:
set.seed(24)
L <- list(foo1 = data.frame(a = 1:13, value = rnorm(13)),
foo2 = data.frame(a = 1:5, value = rnorm(5)),
foo3 = data.frame(a = 1:13, value = rnorm(13)),
foo4 = data.frame(a = 1:5, value = rnorm(5)))
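If two separately named lists are wanted rather than one list holding two groups, base R's Filter() is an alternative to split(): it keeps the elements for which a predicate returns TRUE. A minimal sketch using the same sample data; the names L13 and L5 are hypothetical:

```r
set.seed(24)
L <- list(foo1 = data.frame(a = 1:13, value = rnorm(13)),
          foo2 = data.frame(a = 1:5, value = rnorm(5)),
          foo3 = data.frame(a = 1:13, value = rnorm(13)),
          foo4 = data.frame(a = 1:5, value = rnorm(5)))

# Keep the 13-row data frames in one list and the 5-row ones in another;
# Filter() preserves the element names.
L13 <- Filter(function(d) nrow(d) == 13, L)
L5  <- Filter(function(d) nrow(d) == 5, L)

names(L13)  # "foo1" "foo3"
names(L5)   # "foo2" "foo4"
```

Unlike split(), this scans the list once per group, which is negligible for 20 data frames.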

Using lapply to change column names of a list of data frames

I'm trying to use lapply on a list of data frames, but I'm failing to pass the parameters correctly (I think).
List of data frames:
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2,df3) #multiple data frames with far fewer columns than the length of vector todos
Vector with column names:
todos <-c('col1','col2', ......'colN')
I'd like to change the column names using lapply:
lapply (listDF, function(x) { colnames(x)[2:length(x)] <-todos[1:length(x)-1] } )
but this doesn't change the names at all. Am I not passing the data frames themselves, but something else? I just want to change names, not to return the result to a new object.
Thanks in advance, p.
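A minimal sketch of why the original attempt leaves listDF unchanged, using a short made-up todos: modifying x inside the function changes only a local copy, and a function returns its last expression, which here is the value of the assignment rather than the modified data frame.

```r
df1 <- data.frame(A = 1:10, B = 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2)
todos <- c("col1", "col2", "col3")  # short made-up stand-in for the question's vector

# The original attempt. 1:length(x)-1 is (1:2) - 1, i.e. c(0, 1), so it selects
# "col1". Each call returns the assigned value, and listDF is never touched.
out <- lapply(listDF, function(x) { colnames(x)[2:length(x)] <- todos[1:length(x)-1] })
out[[1]]            # "col1"
names(listDF[[1]])  # "A" "B"

# Working version: return the modified copy and assign the result back.
listDF <- lapply(listDF, function(x) {
  names(x)[-1] <- todos[seq_len(ncol(x) - 1)]
  x
})
names(listDF[[1]])  # "A" "col1"
```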
You can also use setNames if you want to replace all column names:
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- list(df1, df2)
new_col_name <- c("C", "D")
lapply(listDF, setNames, nm = new_col_name)
## [[1]]
## C D
## 1 1 11
## 2 2 12
## 3 3 13
## 4 4 14
## 5 5 15
## 6 6 16
## 7 7 17
## 8 8 18
## 9 9 19
## 10 10 20
## [[2]]
## C D
## 1 21 31
## 2 22 32
## 3 23 33
## 4 24 34
## 5 25 35
## 6 26 36
## 7 27 37
## 8 28 38
## 9 29 39
## 10 30 40
If you need to replace only a subset of column names, then you can use the solution of @Jogo:
lapply(listDF, function(df) {
names(df)[-1] <- new_col_name[-ncol(df)]
df
})
A last point: in R there is a difference between a:b - 1 and a:(b - 1), because : has higher precedence than binary -.
1:10 - 1
## [1] 0 1 2 3 4 5 6 7 8 9
1:(10 - 1)
## [1] 1 2 3 4 5 6 7 8 9
EDIT
If you want to change the column names of the data.frames in the global environment from a list, you can use list2env, but I'm not sure it is the best way to achieve what you want. You also need to turn your list into a named list, where each name matches the name of the data.frame you want to replace.
listDF <- list(df1 = df1, df2 = df2)
new_col_name <- c("C", "D")
listDF <- lapply(listDF, function(df) {
names(df)[-1] <- new_col_name[-ncol(df)]
df
})
list2env(listDF, envir = .GlobalEnv)
str(df1)
## 'data.frame': 10 obs. of 2 variables:
## $ A: int 1 2 3 4 5 6 7 8 9 10
## $ C: int 11 12 13 14 15 16 17 18 19 20
Try this:
lapply (listDF, function(x) {
names(x)[-1] <- todos[seq_len(length(x) - 1)]
x
})
You will get a new list with the changed data frames. If you want to modify listDF directly:
for (i in seq_along(listDF)) names(listDF[[i]])[-1] <- todos[seq_len(length(listDF[[i]]) - 1)]
I was not able to get the code in these answers to work. I found some code on another forum that did work. It assigns the new column names into each data frame, whereas the other methods created copies of the data frames. For anyone else, here is the code.
# Create some dataframes
df1 <- data.frame(A = 1:10, B= 11:20)
df2 <- data.frame(A = 21:30, B = 31:40)
listDF <- c("df1", "df2") #Notice this is NOT a list
new_col_name <- c("C", "D") #What do you want the new columns to be named?
# Assign the new column names to each dataframe in "listDF"
for(df in listDF) {
df.tmp <- get(df)
names(df.tmp) <- new_col_name
assign(df, df.tmp)
}

Converting row names to data frame column

I want to be able to access b0.e7, c0.14, ..., f8.d4, but right now these are not in a column; they are the row names. How can I make the row names 1, 2, 3, ... and put b0.e7, c0.14, ..., f8.d4 in their own column? Thanks in advance for the help.
df=as.data.frame(c)
df = subset(df, c>7)
df
c
b0.e7 11
c0.14 8
f8.d1 10
f8.d2 9
f8.d3 11
f8.d4 12
Try this. The first line assigns a new column that is just the current row names of the data frame. The second line resets the row names to NULL, resulting in a sequence.
> df$new <- rownames(df)
> rownames(df) <- NULL
This should result in:
> df
# c new
# 1 11 b0.e7
# 2 8 c0.14
# 3 10 f8.d1
# 4 9 f8.d2
# 5 11 f8.d3
# 6 12 f8.d4
And you can reverse the column order if needed with df[, c(2, 1)]
You can make use of the fact that cbind.data.frame passes its arguments on to data.frame, one of which is row.names. Setting that argument to NULL gives a slightly more direct approach than the one proposed by Richard:
cbind(rn = rownames(df), df, row.names = NULL)
# rn c
# 1 b0.e7 11
# 2 c0.14 8
# 3 f8.d1 10
# 4 f8.d2 9
# 5 f8.d3 11
# 6 f8.d4 12
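A self-contained sketch of the cbind() approach, rebuilding the question's data from the printed values. The vector name counts is made up; it replaces the question's c so that the c() function is not masked.

```r
# Named vector reconstructed from the printed output in the question.
counts <- c(b0.e7 = 11, c0.14 = 8, f8.d1 = 10, f8.d2 = 9, f8.d3 = 11, f8.d4 = 12)

# data.frame() turns the vector's names into row names.
df <- data.frame(c = counts)

# Move the row names into a column called rn and reset the row names to 1..n.
df <- cbind(rn = rownames(df), df, row.names = NULL)
df
```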
You can try this as well.
rows = row.names(df)
df1 = cbind(rows,df)
