Loop show head of several data frames - r

I have several data frames and I would like to run the head function over all of them. I tried the following but it doesn’t work, as it returns the name of the data frame but not the head of the data frame itself.
df.a <- data.frame(col1 = "a", col2 = 1)
df.b <- data.frame(col1 = "b", col2 = 2)
df.c <- data.frame(col1 = "c", col2 = 3)
list <- ls()
for (i in 1:length(list())){
head(list[i])
}
lapply(ls(),head)
Any idea on how to do it or why it is not working?

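Put your data frames into a list, and add print to your loop. Inside a for loop R does not auto-print values, so head() on its own shows nothing, and ls() returns only the object names as strings, not the objects themselves.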
my.list <- list(df.a, df.b, df.c)
for (i in seq_along(my.list)){
print(head(my.list[[i]]))
}

ls() returns the object names as a character vector, so we need to get the values of those objects. If the object names follow a pattern, specify the pattern in ls(), wrap the call in mget() to get the values in a list, then loop over the list with lapply() and take the head of each:
lapply(mget(ls(pattern="df\\.")), head)
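A minimal sketch of what each step produces, using the three data frames from the question. The anchored pattern "^df\\." and the names obj_names, frames and heads are assumptions added for illustration; the anchor only keeps ls() from matching objects that merely contain "df." somewhere in their name.

```r
# The three data frames from the question
df.a <- data.frame(col1 = "a", col2 = 1)
df.b <- data.frame(col1 = "b", col2 = 2)
df.c <- data.frame(col1 = "c", col2 = 3)

# ls() returns only the names, as a character vector
obj_names <- ls(pattern = "^df\\.")

# mget() looks the names up and returns the objects in a named list
frames <- mget(obj_names)

# head() now receives data frames rather than strings
heads <- lapply(frames, head)
print(heads)
```

The result is a list named after the original objects, so each printed head is labelled with its data frame.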

Related

Replace list names if they exist

I have example data as follows:
# list of data frames:
l = list(a=mtcars, b=mtcars, c=mtcars)
I would like to replace the list names, if they exist in the vector list_names_available_for_name_change with new_list_names.
list_names_available_for_name_change <- c("a", "c")
new_list_names <- c("android", "circus")
I thought of doing something like:
names(l)[names(l) == "a"] <- "android"
But I would like to do this for the entire list. Something like:
names(l)[names(l) == list_names_available_for_name_change ] <- new_list_names
How should I write the syntax to achieve this?
Desired output:
# list of data frames:
l = list(android=mtcars, b=mtcars, circus=mtcars)
In base R, use match to find the positions of the list's names within the subset of names to change, use those positions to pick the corresponding new_list_names, and assign them to the names of the list:
nm1 <- new_list_names[match(names(l), list_names_available_for_name_change)]
i1 <- !is.na(nm1)
names(l)[i1] <- nm1[i1]
-output
names(l)
[1] "android" "b" "circus"
Or with mapvalues
names(l) <- plyr::mapvalues(names(l),
list_names_available_for_name_change, new_list_names)
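To see why match works where == does not, here is a small sketch. The list values and the shortened names old_names, new_names, pos and hit are placeholders chosen for illustration. The == comparison pairs elements position by position (recycling the shorter vector), so it does not test membership, whereas match reports where each name sits in the lookup vector.

```r
# Small stand-in list; the values are placeholders for the data frames
l <- list(a = 1, b = 2, c = 3)
old_names <- c("a", "c")
new_names <- c("android", "circus")

# Position of each list name in old_names, NA where there is no match
pos <- match(names(l), old_names)
print(pos)

# Rename only the matched elements, picking the new name by position
hit <- !is.na(pos)
names(l)[hit] <- new_names[pos[hit]]
print(names(l))
```

Names that are absent from old_names are left untouched, which is the behaviour the question asks for.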

How can lapply work with addressing columns as unknown variables?

So, I have a list of strings named control_for. I have a data frame sampleTable with some of the columns named as strings from control_for list. And I have a third object dge_obj (DGElist object) where I want to append those columns. What I wanted to do - use lapply to loop through control_for list, and for each string, find a column in sampleTable with the same name, and then add that column (as a factor) to a DGElist object. For example, for doing it manually with just one string, it looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
x <- as.factor(sampleTable[, x])
dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?
Here are two base R ways of doing it. The data set is taken from the example in help("DGEList"), together with a mock-up data.frame sampleTable.
Define a vector common_vars holding the names in control_for that are also column names of sampleTable. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
# lapply keeps each column a factor; sapply would simplify them to a character matrix
tmp <- lapply(sampleTable[common_vars], factor)
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
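One possible one-liner, sketched without edgeR so that it runs on its own. Here dge_obj is a plain list holding a samples data frame, a hypothetical stand-in for a real DGEList; the sketch assumes a real DGEList accepts the same assignment to dge_obj$samples.

```r
# Hypothetical stand-in for a DGEList: a list holding a 'samples' data frame
dge_obj <- list(samples = data.frame(group = factor(c(1, 1, 2, 2)),
                                     lib.size = c(100, 120, 90, 110)))
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))

# Assign all selected columns at once, each converted to a factor
dge_obj$samples[common_vars] <- lapply(sampleTable[common_vars], factor)
str(dge_obj$samples)
```

Assigning a list to several column names at once adds the columns when they do not exist yet and keeps each one as a factor.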
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))

Add different suffix to column names on multiple data frames in R

I'm trying to add different suffixes to my data frames so that I can distinguish them after I've merged them. I have my data frames in a list and created a vector for the suffixes but so far I have not been successful.
data2016 is the list containing my 7 data frames
new_names <- c("june2016", "july2016", "aug2016", "sep2016", "oct2016", "nov2016", "dec2016")
data2016v2 <- lapply(data2016, paste(colnames(data2016)), new_names)
Your question is not quite clear, so here are two solutions.
The beginning is the same for either solution. Suppose you have these four dataframes:
df1x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df2x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
df3x <- data.frame(v1 = rnorm(50),
v2 = runif(50))
df4x <- data.frame(v3 = rnorm(60),
v4 = runif(60))
Suppose further you assemble them in a list, something akin to your data2016, using mget and ls and describing a pattern to match them:
my_list <- mget(ls(pattern = "^df\\d+x$"))
The names of the dataframes in this list are the following:
names(my_list)
[1] "df1x" "df2x" "df3x" "df4x"
Solution 1:
Suppose you want to change the names of the dataframes thus:
new_names <- c("june2016", "july2016","aug2016", "sep2016")
Then you can simply assign new_names to names(my_list):
names(my_list) <- new_names
And the result is:
names(my_list)
[1] "june2016" "july2016" "aug2016" "sep2016"
Solution 2:
You want to add the new_names literally as suffixes to the 'old' names, in which case you would use paste or paste0 thus:
names(my_list) <- paste0(names(my_list), "_", new_names)
And the result is:
names(my_list)
[1] "df1x_june2016" "df2x_july2016" "df3x_aug2016" "df4x_sep2016"
You could use an index number within lapply to reference both the list and your vector of suffixes. Because there are a couple steps, I'll wrap the process in a function(). (Called an anonymous function because we aren't assigning a name to it.)
data2016v2 <- lapply(seq_along(data2016), function(i) { # seq_along() gives 1:7 for seven data frames
this_data <- data2016[[i]] # Double brackets for a list
names(this_data) <- paste0(names(this_data), new_names[i]) # Single bracket for vector
this_data # The renamed data frame to be placed into data2016v2
})
Notice in the paste0() line we are recycling the term in new_names[i], so for example if new_names[i] is "june2016" and your first data.frame has columns "A", "B", and "C" then it would give you this:
> paste0(c("A", "B", "C"), "june2016")
[1] "Ajune2016" "Bjune2016" "Cjune2016"
(You may want to add an underscore in there?)
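As a variation on the index-based loop, Map() can walk the list and the suffix vector in parallel. This is only a sketch: the two small data frames stand in for the seven in data2016, and the underscore separator is an assumption.

```r
# Two small stand-in data frames; data2016 in the question holds seven
data2016 <- list(data.frame(A = 1:2, B = 3:4),
                 data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

# Map() passes one data frame and one suffix to the function at a time
data2016v2 <- Map(function(d, suffix) {
  setNames(d, paste0(names(d), "_", suffix))
}, data2016, new_names)

print(lapply(data2016v2, names))
```

This avoids indexing by hand, at the cost of requiring the list and the suffix vector to have the same length and order.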
As an aside, it sounds like you might be better served by adding the "june2016" as a column in your data (like say a variable named month with "june2016" as the value in each row) and combining your data using something like bind_rows() from the dplyr package, running it "long" instead of "wide".
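A rough sketch of that "long" layout, using base R's rbind as a stand-in for dplyr's bind_rows() so that it runs without extra packages. The two small data frames and the column name month are assumptions for illustration.

```r
# Two small stand-in data frames with identical columns
data2016 <- list(data.frame(A = 1:2, B = 3:4),
                 data.frame(A = 5:6, B = 7:8))
new_names <- c("june2016", "july2016")

# Tag every row with its month, then stack the data frames
tagged <- Map(function(d, m) cbind(d, month = m), data2016, new_names)
long <- do.call(rbind, tagged)
print(long)
```

The column names stay the same across months, and the month column carries the information that the suffixes would otherwise encode.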

Why does lapply() convert my R data frame to list?

I was modifying a data frame in R with lapply() and observed that my data frame was converted to a list object when I didn't use brackets to assign it.
For example, the following returns a list
junk <- data.frame(col1 = 1:3,
col2 = c("a,b,c"),
col3 = c(T,T,F))
junk <- lapply(junk, function(x) {
if (is.numeric(x)) return(x*2)
else return(x)})
str(junk)
whereas the following returns a data frame.
junk <- data.frame(col1 = 1:3,
col2 = c("a,b,c"),
col3 = c(T,T,F))
junk[] <- lapply(junk, function(x) {
if (is.numeric(x)) return(x*2)
else return(x)})
str(junk)
I'd like to know why [] preserves the data frame structure, and what [] is doing in this case. I understand why the first code chunk converts junk to a list, but don't understand why the second chunk preserves structure, though I couldn't think of a clear title to describe the question/situation. Thanks.
It is natural for lapply to return a list, because the function FUN is not guaranteed to return results of the same length for every element.
dat <- data.frame(a = c(1,1,2), b = c(1,1,1))
lapply(dat, unique)
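The second version keeps the data frame structure, but not because lapply modifies the original data frame in place. lapply still returns a list; the assignment junk[] <- then replaces the columns of the existing data frame with the list elements, keeping its class and other attributes. It is equivalent to this: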
tmp <- lapply(...); junk[] <- tmp; rm(tmp)
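The difference can be checked directly with class(). In this sketch the helper double_numeric and the object names as_list and kept are made up for illustration.

```r
junk <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"))
double_numeric <- function(x) if (is.numeric(x)) x * 2 else x

# Plain assignment: the result of lapply() is stored as-is, a list
as_list <- lapply(junk, double_numeric)
print(class(as_list))

# Empty-bracket assignment: only the columns of the data frame are replaced
kept <- junk
kept[] <- lapply(kept, double_numeric)
print(class(kept))
```

With the empty brackets the assignment goes through the data frame replacement method, so the container stays a data frame and only its contents change.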

One liner wanted: Create data frame and give colnames: R data.frame(..., colnames = c("a", "b", "c"))

Is there an easier (i.e. one line of code instead of two!) way to do the following:
results <- as.data.frame(str_split_fixed(c("SampleID_someusefulinfo.countsA" , "SampleID_someusefulinfo.countsB" , "SampleID_someusefulinfo.counts"), "\\.", n=2))
names(results) <- c("a", "b")
Something like:
results <- data.frame(str_split_fixed(c("SampleID_someusefulinfo.countsA" , "SampleID_someusefulinfo.countsB" , "SampleID_someusefulinfo.counts"), "\\.", n=2), colnames = c("a", "b"))
I do this a lot, and would really love to have a way to have this in one line of code.
/data.table works too, if it's easier to do there than in base data.frame/
Clarifying:
My expected output (which is achieved by running the two lines of code at the top - AND I WANT IT TO BE ONE - THAT's IT!!!) is a result data frame of the structure:
results
a b
1 SampleID_someusefulinfo countsA
2 SampleID_someusefulinfo countsB
3 SampleID_someusefulinfo counts
What I would like to do is:
CREATE the data frame from a matrix or with some content (for example the toy code of matrix(c(1,2,3,4),nrow=2,ncol=2) I provided in the first example I wrote)
SPECIFY IN THAT SAME LINE what I would like the column names of my data frame to be
Use setNames() around a data.frame
setNames(data.frame(matrix(c(1,2,3,4),nrow=2,ncol=2)), c("a","b"))
# a b
#1 1 3
#2 2 4
?setNames:
a convenience function that sets the names on an object and returns the object
> setNames
function (object = nm, nm)
{
names(object) <- nm
object
}
We can use the dimnames option in matrix as the OP was using matrix to create the data.
data.frame(matrix(1:4, 2, 2, dimnames=list(NULL, c("a", "b"))))
Or
`colnames<-`(data.frame(matrix(1:4, 2, 2)), c('a', 'b'))
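Applied to the strings from the question, the setNames() approach might look like the sketch below. It uses base R's strsplit() as a stand-in for stringr's str_split_fixed(), which is an assumption made to keep the example free of extra packages; it only gives the same two columns here because each string contains exactly one dot.

```r
x <- c("SampleID_someusefulinfo.countsA",
       "SampleID_someusefulinfo.countsB",
       "SampleID_someusefulinfo.counts")

# Split on the literal dot, bind the pieces into a matrix, name the columns
results <- setNames(
  as.data.frame(do.call(rbind, strsplit(x, ".", fixed = TRUE))),
  c("a", "b")
)
print(results)
```

The whole thing is a single expression, so it can be written on one line if preferred.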
