apply in R to change column names - r

I am hoping to change one of the column names (the 14th column) in each of many files, but I cannot figure out how to go about it. I have tried several kinds of apply, but that approach isn't working and I don't know where to start looking for another one. Here is my code so far:
File.names<-(tk_choose.files(default="", caption="Files", multi=TRUE, filters=NULL, index=1))
Num.Files<-NROW(File.names)
test<-sapply(1:Num.Files,function(x){readLines(File.names[x])})
lapply(1:Num.Files, function(x){data<-read.table(header=TRUE, text=test)})
#This is the issue
names(data)[14]<-'column14'
names(data)
As I mentioned I tried varying types of apply but to no avail. Is there a different way of going about this? Any suggestions would be welcome.

You have to call names inside another lapply and return the modified object from the function. E.g.:
l <- list(x=c(a=1, b=1), y=c(a=1, b=1))
l2 <- lapply(l, function(x) {
names(x)[2] <- "d"
return(x)
})
l2
#$x
#a d
#1 1
#
#$y
#a d
#1 1
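The same pattern carries over to data frames read from files. The following is only a minimal sketch under stated assumptions: rename_column is a hypothetical helper, the two temporary files stand in for the files picked with tk_choose.files(), and the second column is renamed instead of the 14th to keep the example small.

```r
# Hypothetical helper: read one file and rename a single column by position.
rename_column <- function(path, index, new_name) {
  df <- read.table(path, header = TRUE)
  names(df)[index] <- new_name
  df  # return the modified data frame so lapply() can collect it
}

# Temporary files stand in for the user's real files (assumption).
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
for (p in paths) {
  write.table(data.frame(a = 1:2, b = 3:4, c = 5:6), p, row.names = FALSE)
}

# One data frame per file, each with its second column renamed.
renamed <- lapply(paths, rename_column, index = 2, new_name = "column2")
names(renamed[[1]])
```

The key point is that the renaming happens inside the function and the data frame is returned, so the result of lapply is a list of renamed data frames rather than a discarded side effect.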

Split the names out first, then alter, then assign as a group. Like,
new.names <- names( data )
new.names[[14]] <- 'column14'
names( data ) <- new.names

Related

How can I make a tibble/tbl_df/data_frame from a vector or vectors

I have a name and a vector
my.name <- 'data.values'
my.vec <- 1:5
and I'd like to make a tibble/tbl_df/data_frame with one column that has my.name as the name of that column and my.vec as the values. What I have is
df <- data_frame(placeholder = rep(NA, length(my.vec)))
df[[my.name]] <- my.vec
df[['placeholder']] <- NULL
Which just feels silly. Is there an easier way to do this?
I am also interested in the case where I have multiple vectors and multiple names, e.g.
my.name1 <- 'data.values.day1'
my.name2 <- 'data.values.day2'
my.vec1 <- 1:5
my.vec2 <- 2:6
...
I think the best answer came in a comment.
DirtySockSniffer recommended:
as_data_frame(setNames(list(my.vec), my.name))
which generalizes nicely to the multiple column situation
as_data_frame(setNames(list(my.vec1, my.vec2),
c(my.name1, my.name2)))
You can create a data_frame first and then set its column names:
my.data <- data_frame(my.vec.1, my.vec.2, ...)
names(my.data) <- c(my.name.1, my.name.2, ...) # Order is important here
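To see what the setNames step contributes on its own, here is a small base-R sketch. It assumes as.data.frame as a stand-in for the tibble constructor, so it runs without any extra packages; the variable names are the ones from the question.

```r
my.name1 <- 'data.values.day1'
my.name2 <- 'data.values.day2'
my.vec1 <- 1:5
my.vec2 <- 2:6

# setNames() attaches the names to the list before it becomes a table.
named.list <- setNames(list(my.vec1, my.vec2), c(my.name1, my.name2))

# Base-R stand-in for as_data_frame(); the column names come from the list names.
df <- as.data.frame(named.list)
names(df)
```

Because the names travel with the list, no placeholder column or separate renaming step is needed.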

R: Merging lists of data frames

I'm a total noob at R and I've tried (and retried) to search for an answer to the following problem, but I've not been able to get any of the proposed solutions to do what I'm interested in.
I have two lists of named elements, with each element pointing to data frames with identical layouts:
(EDIT)
df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"))
df2 <- data.frame(A=c(98,99),B=c("Y","Z"))
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"))
lst2 <- c(X=df3)
(EDIT 2)
So it seems like storing multiple data frames in a list is a bad idea, as it will convert the data frames to lists. So I'll go out looking for an alternative way to store a set of named data frames.
In general the names of the elements in the two lists might overlap partially, completely, or not at all.
I'm looking for a way to merge the two lists into a single list:
<some-function-sequence>(lst1, lst2)
->
c(X=rbind(df1,df3),Y=df2)
-resulting in something like this:
[EDIT: Syntax changed to correctly reflect desired result (list-of-data frames)]
$X
A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
$Y
A B
1 98 Y
2 99 Z
I.e:
IF the lists contain identical element names, each pointing to a data frame, THEN I want to 'rbind' the rows from these two data frames and assign the resulting data frame to the same element name in the resulting list.
Otherwise the element names and data frames from both lists should just be copied into the resulting list.
I've tried the solutions from a number of discussions such as:
Can I combine a list of similar dataframes into a single dataframe?
Combine/merge lists by elements names
Simultaneously merge multiple data.frames in a list
Combine/merge lists by elements names (list in list)
Convert a list of data frames into one data frame
-but I've not been able to find the right solution. A general problem seems to be that the data frame ends up being converted into a list by the application of 'mapply/sapply/merge/...' - and usually also sliced and/or merged in ways which I am not interested in. :)
Any help with this will be much appreciated!
[SOLUTION]
The solution seems to be to change the use of c(...) when collecting data frames to list(...) after which the solution proposed by Pierre seems to give the desired result.
Here is a proposed solution using split and c to combine like terms. Please read the caveat at the bottom:
s <- split(c(lst1, lst2), names(c(lst1,lst2)))
lapply(s, function(lst) do.call(function(...) unname(c(...)), lst))
# $X.A
# [1] 1 2 3 4 5
#
# $X.B
# [1] "A" "B" "C" "D" "E"
#
# $Y.A
# [1] 98 99
#
# $Y.B
# [1] "Y" "Z"
This solution relies on the string columns NOT being stored as factors. With factors it will not throw an error, but the factors will be converted to their integer codes. Below I show how I transformed the data to remove factors. Let me know if you require factors:
df1 <- data.frame(A=c(1,2,3),B=c("A","B","C"), stringsAsFactors=FALSE)
df2 <- data.frame(A=c(98,99),B=c("Y","Z"), stringsAsFactors=FALSE)
lst1 <- c(X=df1,Y=df2)
df3 <- data.frame(A=c(4,5),B=c("D","E"), stringsAsFactors=FALSE)
lst2 <- c(X=df3)
If the data is stored in lists we can use:
lapply(split(c(lst1, lst2), names(c(lst1,lst2))), function(lst) do.call(rbind, lst))
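Putting the pieces together, a minimal end-to-end sketch, assuming the data frames are collected with list(...) as described in the question's [SOLUTION] note. The row-name reset and the unname call are additions of this sketch, not part of the answer above.

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(A = c(98, 99), B = c("Y", "Z"), stringsAsFactors = FALSE)
df3 <- data.frame(A = c(4, 5), B = c("D", "E"), stringsAsFactors = FALSE)

# list() keeps each data frame intact; c() on data frames would flatten them.
lst1 <- list(X = df1, Y = df2)
lst2 <- list(X = df3)

combined <- c(lst1, lst2)
grouped <- split(combined, names(combined))  # one group of data frames per name

merged <- lapply(grouped, function(dfs) {
  out <- do.call(rbind, unname(dfs))  # stack the data frames sharing a name
  rownames(out) <- NULL               # use plain sequential row names
  out
})
merged
```

Names that occur in only one list end up in a group of length one, so their data frame passes through rbind unchanged.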
The following solution is probably not the most efficient way. However, if I got your problem right this should work ;)
# Example data
# Some vectors
a <- 1:5
b <- 3:7
c <- rep(5, 5)
d <- 5:1
# Some dataframes, data1 and data3 have identical column names
data1 <- data.frame(a, b)
data2 <- data.frame(c, b)
data3 <- data.frame(a, b)
data4 <- data.frame(c, d)
# 2 lists
list1 <- list(data1, data2)
list2 <- list(data3, data4)
# Loop, which checks the column names and rbinds dataframes with the same column names
final_list <- list1
used_lists <- numeric()
for(i in 1:length(list1)) {
for(j in 1:length(list2)) {
if(sum(colnames(list1[[i]]) == colnames(list2[[j]])) == ncol(list1[[i]])) {
final_list[[i]] <- rbind(list1[[i]], list2[[j]])
used_lists <- c(used_lists, j)
}
}
}
# Adding the other dataframes, which did not have the same column names
for(i in 1:length(list2)) {
if((i %in% used_lists) == FALSE) {
final_list[[length(final_list) + 1]] <- list2[[i]]
}
}
# Final list, which includes all other lists
final_list

finding similar element between two data

I asked a question before which was complicated and I did not get any help. So I tried to simplify the question and input output.
I have tried many approaches but none worked. For example, here are some of them:
# 1
for(i in ncol(mydata)){
corsA = grep(colnames(mydata)[i] , colnames(mysecond))
mydata[,corsA]%in%mysecond[,i]}
# here if I get true then means they have match
## 2
are.cols.identical <- function(col1, col2) identical(mydata[,col1], mysecond[,col2])
res <- outer(colnames(mydata), colnames(mysecond),FUN = Vectorize(are.cols.identical))
cut <- apply(res, 1, function(x)match(TRUE, x))
### 3
(mydata$Rad) %in% (mysecond$Ro5_P1_A5)
#### 4
which(mydata %in% mysecond)
#### 5
match(mydata$sus., mysecond$R5_P1_A5)
or
which(mydata$sus. %in% mysecond$RP1_A5)
matches <- sapply(mydata,function(x) sapply(mysecond,identical,x))
and few others, but none led me to an answer
Here is another solution using regex:
rows<-mapply(grep,mysecond,mydata)
The step above will return a list with the matched rows in each column:
rows
If you would like to see how many rows were matched you can do this:
lapply(rows,length)
Now we can go ahead and get the rows of interest in mydata. Since rows is a list, we need to unlist() it first. Some rows may have matched more than once, and we don't want them to appear twice in the output, so we also apply unique():
rows<-unique(unlist(rows))
mydata[rows,]
#View(mydata[rows,])
require(plyr)
dat <- strsplit(as.character(mydata$subunits..UniProt.IDs.), ',')
dat <- data.frame(mydata[,1],rbind.fill(lapply(dat,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)})))
mydata[unlist(apply(dat,2, function(x) which(x %in% mysecond[,2]))),]

How to assign to a subset of an R object with a name given as string

I have the name of a matrix as string and would like to assign to a column of that matrix.
A <- matrix(1:4,2)
v <- 10:11
name <- "A"
get(name)[,2] <- v
This does not work because the LHS is just a value (i.e. a vector) and has lost the meaning of "the second column of A".
eval(parse(text=paste0(name,'[,2]<- v')))
This does the job, but a lot of people discourage the use of such a structure. What is the recommended way to go about this?
EDIT:
Most comments on similar problems I have found discourage the use of object names that can only be passed as strings and instead promote the use of lists, i.e.
l <- list(A=matrix(1:4,2))
v <- 10:11
name <- "A"
l[[name]][,2] <- v
but this does not really answer my question.
For changing names of columns, you should work on a data.frame and not on a matrix:
A <- matrix(1:4,2)
v <- 10:11
name <- "A"
A <- as.data.frame(A)
v <- as.data.frame(v)
colnames(A)[2] <- name
A[,2] <- v
Is this what you were looking for?
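For the assignment itself, one commonly used pattern that avoids eval(parse()) is to fetch the object with get, modify the copy, and write it back with assign. A minimal sketch, assuming the object lives in the current environment; tmp is a hypothetical temporary name:

```r
A <- matrix(1:4, 2)
v <- 10:11
name <- "A"

tmp <- get(name)   # fetch the object that the string refers to
tmp[, 2] <- v      # modify the copy as usual
assign(name, tmp)  # write the result back under the same name
A
```

This still relies on looking objects up by name, so the list-based approach from the question's EDIT remains the tidier option when the data structure can be changed.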

Succinctly assign names and values simultaneously

I find myself often writing the following two lines. Is there a succinct alternative?
newObj <- vals
names(newObj) <- nams
# This works, but is ugly and not necessarily preferred
'names<-'(newObj <- vals, nams)
I'm looking for something similar to this (which of course does not work):
newObj <- c(nams = vals)
Wrapping it up in a function is an option as well, but I am wondering if the functionality might already be present.
sample data
vals <- c(1, 2, 3)
nams <- c("A", "B", "C")
You want the setNames function
# Your example data
vals <- 1:3
names <- LETTERS[1:3]
# Using setNames
newObj <- setNames(vals, names)
newObj
#A B C
#1 2 3
The names<- method often (if not always) copies the object internally. setNames is simply a wrapper for names<-.
If you want to assign names and values succinctly in code and memory, then the setattr function, from either the bit or data.table packages, will do this by reference (no copying), e.g.:
library(data.table) # or library(bit)
setattr(vals, 'names', names)
Perhaps slightly less succinct, but you could write yourself a simple wrapper
name <- function(x, names){ setattr(x,'names', names)}
val <- 1:3
names <- LETTERS[1:3]
name(val, names)
# and it has worked!
val
## A B C
## 1 2 3
Note that if you assign to a new object, both the old and new object will have the names!
