I have a dataset where I only want to loop through certain columns in a dataframe one at a time to create a graph. The structure of my dataframe consists of data that I parsed from a larger dataset into a vector containing multiple dataframes.
I want to call one column from one dataframe in the vector. I want to loop on the dataframes to call each column.
See example below:
d1 <- data.frame(y1=c(1,2,3),y2=c(4,5,6))
d2 <- data.frame(y1=c(3,2,1),y2=c(6,5,4))
my.list <- list(d1, d2)
All I have to work with is my.list
How would I do this?
You can use lapply to plot each of the individual data frames in your list. For example,
d1 <- data.frame(y1=c(1,2,3),y2=c(4,5,6),y3=c(7,8,9))
d2 <- data.frame(y1=c(3,2,1),y2=c(6,5,4),y3=c(11,12,13))
mylist <- list(d1, d2)
par(mfrow=c(2,1))
# lapply on a subset of columns
lapply(mylist, function(x) plot(x$y2, x$y3))
You don't need a for loop to get their data points. You can call the column by their column names.
# a toy dataframe
d <- data.frame(A = 1:20, B = sample(c(FALSE, TRUE), 20, replace = TRUE),
C = LETTERS[1:20], D = rnorm(20, 0, 1))
col_names <- c("A", "B", "D") # names of columns I want to get
d[,col_names] # returns a dataset with the values of the columns you want
Here is a solution to your problem using a for loop:
# a toy dataframe
mylist <- list(dat1 = data.frame(A = 1:20, B = LETTERS[1:20]),
dat2 = data.frame(A = 21:40, B = LETTERS[1:20]),
dat3 = data.frame(A = 41:60, B = LETTERS[1:20]))
col_names <- c("A") # name of columns I want to get
for (i in 1:length(mylist)){
# you can do whatever you want with what is returned;
# here I am just print them out
print(names(mylist)[i]) # name of the data frame
print(mylist[[i]][,col_names]) # values in Column A
}
I think the simplest answer to your question is to use double brackets.
for (i in 1:length(my.list)) {
print(my.list[[i]]$column)
}
That works assuming all of the columns in your list of data frames have the same names. You could also call the position of the column in the data frame if you wanted.
Yes, lapply can be more elegant, but in some situations a for loop makes more sense.
Related
I have a list of data.frames whereas the first column's colname in each data.frame is supposed to be complemented by dynamic information from a vector.
Example:
set.seed(1)
df1 <- data.frame(matrix(sample(32), ncol = 8))
names(df1) <- paste(rep(c("a", "b"), each = 4), 1:4, sep = "")
set.seed(2)
df2 <- data.frame(matrix(sample(32), ncol = 8))
names(df2) <- paste(rep(c("a", "b"), each = 4), 1:4, sep = "")
list_dfs <- list(df1, df2)
add_info <- c("add1", "other")
How can I add information from add_info to change the colname for a1 in df1 to "a1 add1" and a1 in df2 to "a1 add2" in a scalable way within the given list structure? The other colnames are not supposed to be changed.
I tried several approaches setting colnames using paste0 within lapply or a for loop and reviewed similar questions on SO but couldn't solve this problem so far.
You can do the following:
list_dfs <- lapply(1:length(list_dfs), function(i) {
setNames(list_dfs[[i]],paste(names(list_dfs[[i]]),add_info[[i]]))
})
Now the first dataframe in the list has its original name concatenated with the first element of add_info, the second has its names concatenated with second element of add_info. You can easily scale this to longer lists of data.frames and corresponding add_info-vectors.
Update:
If you only want to change the first name, do
list_dfs <- lapply(1:length(list_dfs), function(i) {
lastNames <- names(list_dfs[[i]])[2:NCOL(list_dfs[[i]])]
firstName <- paste(names(list_dfs[[i]])[1],add_info[[i]])
setNames(list_dfs[[i]],c(firstName,lastNames))
})
I have two dataframes. I want to replace the ids in dataframe1 with generic ids. In dataframe2 I have mapped the ids from dataframe1 with the generic ids.
Do I have to merge the two dataframes and after it is merged do I delete the column I don't want?
Thanks.
With dplyr
library(dplyr)
left_join(df1, df2, by = 'ids')
We can use merge and then delete the ids.
dataframe1 <- data.frame(ids = 1001:1010, variable = runif(min=100,max = 500,n=10))
dataframe2 <- data.frame(ids = 1001:1010, generics = 1:10)
result <- merge(dataframe1,dataframe2,by="ids")[,-1]
Alternatively we can use match and replace by assignment.
dataframe1$ids <- dataframe2$generics[match(dataframe1$ids,dataframe2$ids)]
Subsetting data frames isn't very difficult in R: hope this helps, you didn't provide much code so I hope this will be of help to you:
#create 4 random columns (vectors) of data, and merge them into data frames:
a <- rnorm(n=100,mean = 0,sd=1)
b <- rnorm(n=100,mean = 0,sd=1)
c <- rnorm(n=100,mean = 0,sd=1)
d<- rnorm(n=100,mean = 0,sd=1)
df_ab <- as.data.frame(cbind(a,b))
df_cd <- as.data.frame(cbind(c,d))
#if you want column d in df_cd to equal column a in df_ab simply use the assignment operator
df_cd$d <- df_ab$a
#you can also use the subsetting with square brackets:
df_cd[,"d"] <- df_ab[,"a"]
I am facing the following challenge:
I have a list of dataframes in R and I'd like to extract some specific information from it. Here is an example:
df_1 <- data.frame(A = c(1,2), B = c(3,4), D = c(5,6))
df_2 <- data.frame(A = c(7,8), B = c(9,10), D = c(11,12))
df_3 <- data.frame(A = c(0,1), B = c(2,3), D = c(4,5))
L <- list(df_1, df_2, df_3)
What I'd like to extract are the values at position (1,1) in each of these dataframes. In the above case this would be: 1, 7, 0.
Is there a way to extract this information easily, probably with one line of code?
As Ronak has suggested , you can use function like lapply and wrap it with unlist for desired output.
unlist(lapply(L,function(x) x[1,1]))
In addition to the *apply methods shown above, you can also do this in a Vectorized manner. Since all the data frames in your list have the same column names, and you want the first element from the first column, i.e. 'A1', then you can simply unlist (which will create a named vector) and grab the values with the name A1.
v1 <- unlist(L)
v1[names(v1) == 'A1']
#A1 A1 A1
# 1 7 0
I already have a list of data frames (mylist) and need to switch the first and second column for all the data frames in the list.
Test Data Frame in List
[reads] [phylum]
1 phylum1
2 phylum2
3 phylum3
Into....
[phylum] [reads]
phylum1 1
phylum2 2
phylum3 3
I know I need to use lapply, but not sure what to input for the FUN=
mylist <- lapply(mylist, FUN = mylist[ ,c("phylum", "reads")])
errors saying incorrect number of dimensions
Sorry if this is a simple question and thanks in advance for your help!
-Brand new R user
The FUN asks for a function that it can apply to every element in the list. You are passing mylist[ ,c("phylum", "reads")]) which is not a function.
# sample data
df1 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df2 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df3 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df4 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
ldf <- list(df1,df2,df3,df4)
ldf_re <- lapply(ldf, FUN = function(X){X[c('phylum', 'reads')]})
In the last line, the lapply will iterate through all the dataframes, they will be passed as the X argument for the function defined in the FUN argument and the columns will be dataframes will be stored in the list ldf_re with their columns rearranged.
I have a list of dataframes and I need to transform a certain variable in each of the dataframes as factor.
E.g.
myList <- list(df1 = data.frame(A = sample(10), B = rep(1:2, 10)),
df2 = data.frame(A = sample(10), B = rep(1:2, 10))
)
Lets say that variable B needs to be factor in each dataframe. I've tried this:
TMP <- setNames(lapply(seq_along(myList), function(x) apply(myList[[x]][c("B")], 2, factor)), names(myList))
but it only returns the transformed variable, not the whole dataframe as I need. I know how to do this with for loop, but I don't want to resort to that.
Per comment from David Arenburg, this solution should work:
TMP <- lapply(myList, function(x) {x[, "B"] <- factor(x[, "B"]) ; x}) ; str(TMP)