R Programming - Combine lists with similar names after looping

I currently have the following list and loop.
> margin_values
$margINCBJP
[1] 0.8481856 0.9165585 0.9270849 0.7932756 0.8296131 0.8284826 0.7584834 0.2566567
$margINCTRS
[1] NA NA NA NA NA 0.84499199 0.73135251 -0.06664292
$margBJPTRS
[1] NA NA NA NA NA 0.01650935 -0.02713086 -0.32329962
for(i in 1:length(margin_values)) {
nam <- paste("x", i, sep = "")
assign(nam, margin_values[[i:i]])
}
This creates separate lists, x1 to xn. How can I then automatically combine the numbers from all the lists to create one list? I know I can manually type c(x1, x2, x3...) all the way up to n, but since n is variable, is there any way to have R simply do c() on all values starting with x? For this example, n=3, but depending on parameters I have earlier in my code it may change.

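I just ran into this myself, and here is what I came up with (tweaked for you, of course):
total <- lapply(ls(pattern = "^x[0-9]+$"), get)
This creates a list, total, in which each element is one of your variables named x followed by a number; the anchored pattern avoids picking up other objects that merely contain an x. Wrap the call in unlist() to get a single vector. Note that ls() returns names in alphabetical order, so x10 comes before x2.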
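If the goal is simply one vector holding every number, the intermediate x1 ... xn variables may not be needed at all. Here is a minimal sketch, assuming margin_values is a named list of numeric vectors like the one printed in the question (the values are copied from it; combined, var_names and combined_from_vars are names chosen here for illustration):

```r
# The list from the question, rebuilt by hand
margin_values <- list(
  margINCBJP = c(0.8481856, 0.9165585, 0.9270849, 0.7932756,
                 0.8296131, 0.8284826, 0.7584834, 0.2566567),
  margINCTRS = c(NA, NA, NA, NA, NA, 0.84499199, 0.73135251, -0.06664292),
  margBJPTRS = c(NA, NA, NA, NA, NA, 0.01650935, -0.02713086, -0.32329962)
)

# Option 1: flatten the list directly, no x1..xn needed
combined <- unlist(margin_values, use.names = FALSE)

# Option 2: if x1..xn already exist, fetch them by explicit name so the
# order is x1, x2, ..., xn rather than the alphabetical order of ls()
for (i in seq_along(margin_values)) {
  assign(paste0("x", i), margin_values[[i]])
}
var_names <- paste0("x", seq_along(margin_values))
combined_from_vars <- unlist(mget(var_names), use.names = FALSE)

length(combined)
identical(combined, combined_from_vars)
```

Both options should give the same 24 numbers; the first avoids creating variables with assign() in the first place.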

Related

Add Unique Names from List to Create Columns in Dataframe

I currently have the below list and am trying to develop a data frame from the results. Essentially, I would like to take the first list and add it to a new data frame, creating columns with variable names, and storing the results in the first row. Then move to the next list and create a new y column so that any variables with y results would be added to that list.
list
[[1]]
x xy
1.000365 1.000365
[[2]]
x y
1.007184 1.007184
[[3]]
x y
1.020803 1.020803
[[4]]
NA
Is this possible to do? I've been trying to figure out how a for loop or lapply might work in this scenario but am unsure.
Thanks.
You can use [ in lapply, indexing each vector by the unique names found across all vectors:
i <- unique(unlist(lapply(x, names)))
setNames(as.data.frame(do.call(rbind, lapply(x, `[`, i))), i)
# x xy y
#1 1.000365 1.000365 NA
#2 1.007184 NA 1.007184
#3 1.020803 NA 1.020803
#4 NA NA NA
Data:
x <- list(c(x=1.000365, xy=1.000365), c(x=1.007184, y=1.007184),
c(x=1.020803, y=1.020803), NA)
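To see why this works, it may help to look at what [ returns when asked for a name that a vector does not have. A small sketch using the second vector from the data above (v, wanted, picked and unnamed are illustrative names, not part of the answer):

```r
v <- c(x = 1.007184, y = 1.007184)
wanted <- c("x", "xy", "y")

# Indexing by name: "xy" is absent from v, so that slot comes back as NA
picked <- v[wanted]
unname(picked)

# The bare NA entry has no names at all, so every requested slot is NA
unnamed <- NA
unnamed[wanted]
```

Because every vector is stretched to the same three slots in the same order, rbind can stack them into a matrix with one row per list element.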
lapply, or even better bind_rows would be good for this:
library(dplyr)
d <- bind_rows( your.list )
Note, I assume the xy name of the 2nd element of the first list entry is a typo?

How to create a new variable based on another variable's value?

I have a vector containing strings representing the names of variables that should be in my final df. Those names could change every time based on other conditions.
x <- colnames(df)
y <- c("blue", "yellow", "red")
z <- setdiff(y,x)
Let's say my result now is that: z = c("blue", "red")
I would like a function that, for every element of vector y that is missing from df (that is, every element of z), creates a column on df with that element as the variable name.
Here's my inconclusive attempt:
if (length(z) > 0) {
for (i in z) {
df$i <- NA
}
}
The part I don't know how to do is pass i as an argument for creating a new variable on df.
In my example, I should finally get df$blue and df$red as new variables of df.
I checked many posts, but either I don't understand how they work or they don't do what I need. Some for reference:
Create new variables based on another df
Rename variable based on textInput value in Shiny
Executing a function with paste to create a new variable in a dataframe in R
Evaluate expression given as a string
This is one possibility without any loops:
df <- data.frame(x = 1:5)
z <- c("blue", "red")
df[z] <- NA_character_
x blue red
1 1 NA NA
2 2 NA NA
3 3 NA NA
4 4 NA NA
5 5 NA NA
The solution was indeed the simple suggestion from @akrun:
You can use [ instead of $, i.e. df[z] <- NA. Reproducible example: mtcars[z] <- NA; head(mtcars)
Hence, as follows:
if (length(z) > 0) {
for (i in z) {
df[i] <- NA
}
}
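The reason the original attempt fails is that $ takes the name after it literally, so df$i always refers to a column called i, whatever the loop variable holds. A minimal sketch of the difference, assuming a toy df with a single column x (so here all three names in y count as missing):

```r
df <- data.frame(x = 1:5)
y <- c("blue", "yellow", "red")
z <- setdiff(y, colnames(df))   # names in y that df does not have yet

# $ does not evaluate the loop variable: this creates one column named "i"
for (i in z) {
  df$i <- NA
}
colnames(df)

# Remove the stray column, then index by the string held in the variable
df$i <- NULL
for (nm in z) {
  df[[nm]] <- NA
}
colnames(df)
```

The [[ ]] form evaluates nm first and uses its value as the column name, which is what the df[i] <- NA loop above also relies on.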

doing for loop in R

I have a file in which I have filtered my SNPs for LD (my.filtered.snp.id in the example below). I want to keep only these SNPs in my genotype matrix (geno_snp). I am trying to write a for loop in R, and I would appreciate any help fixing my code. I want to keep the lines of the genotype matrix (the whole line, including snp.id and genotype information) where snp.id matches a snp.id in my.filtered.snp.id, and delete those that do not match.
head(my.filtered.snp.id)
Chr10_31458
Chr10_31524
Chr10_45901
Chr10_102754
Chr10_102828
Chr10_103480
head (geno_snp)
XRQChr10_103805 NA NA NA 0 NA 0 NA NA NA NA NA 0 0
XRQChr10_103937 NA NA NA 0 NA 1 NA NA NA NA NA 0 2
XRQChr10_103990 NA NA NA 0 NA 0 NA NA NA NA NA 0 NA
I am trying something like this:
for (i in 1:length(geno_snp[,1])){
for (j in 1:length(my.filtered.snp.id)){
if geno_snp[i,] == my.filtered.snp.id[j]
print (the whole line in geno_snp)
}
else (remove the line)
}
If I understood it correctly, you want a subset of your data.frame geno_snp in which the row names must match the selected SNP IDs from the vector my.filtered.snp.id.
Please check if this solution works for you:
index <- which(row.names(geno_snp) %in% my.filtered.snp.id)
selected_subset <- geno_snp[index,]
What I did was create an index addressing the rows whose names exactly match any value in my.filtered.snp.id. Then I used the index to make the subset of the data frame. The index has to hold positions within row.names(geno_snp), which is why the row names go on the left-hand side of %in%.
EDIT:
I noticed you had some row.names that weren't an exact match with your original my.filtered.snp.id values. In this case, maybe what you wanna do is:
index <- unlist(sapply(my.filtered.snp.id, function(x) grep(pattern = x, x = row.names(geno_snp))))
selected_subset <- geno_snp[index,]
The thing is that you have row names beginning with XRQ..., so in this last case the code uses the reference values from my.filtered.snp.id to detect matches in row.names(geno_snp), even if there is this XRQ string at the beginning. Since sapply combined with grep can return a list, unlist turns the result into a plain vector of row indices.
Finally, in the case I have misunderstood your data and what I'm calling row names here are, in fact, data in a column (the SNP IDs), just use geno_snp[,1] instead of row.names(geno_snp) in both codes above.
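One thing to keep in mind with grep is that it matches substrings, so a shorter ID could also match a longer one that starts with the same characters. A sketch of an exact-match alternative, assuming the only difference between the two sets of IDs is a leading XRQ on the row names (the toy geno_snp and the IDs below are made up for illustration, loosely following the question's output):

```r
# Hypothetical filtered IDs and a tiny genotype table
my.filtered.snp.id <- c("Chr10_31458", "Chr10_103805", "Chr10_103990")
geno_snp <- data.frame(
  g1 = c(NA, NA, NA),
  g2 = c(0, 1, 0),
  row.names = c("XRQChr10_103805", "XRQChr10_103937", "XRQChr10_103990")
)

# Strip the assumed XRQ prefix, then keep rows whose cleaned ID is listed
clean_ids <- sub("^XRQ", "", row.names(geno_snp))
keep <- clean_ids %in% my.filtered.snp.id
selected_subset <- geno_snp[keep, , drop = FALSE]

row.names(selected_subset)
```

No loop is needed: %in% compares every row name against the whole ID vector at once, and drop = FALSE keeps the result a data frame even if only one column or row remains.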

Appending to an R List one by one

Let's say I have data like:
> data[295:300,]
Date sulfate nitrate ID
295 2003-10-22 NA NA 1
296 2003-10-23 NA NA 1
297 2003-10-24 3.47 0.363 1
298 2003-10-25 NA NA 1
299 2003-10-26 NA NA 1
300 2003-10-27 NA NA 1
Now I would like to add all the nitrate values into a new list/vector. I'm using the following code:
i <- 1
my_list <- c()
for(val in data)
{
my_list[i] <- val
i <- i + 1
}
But this is what happens:
Warning message:
In x[i] <- val :
number of items to replace is not a multiple of replacement length
> i
[1] 2
> x
[1] NA
Where am I going wrong? The data is part of the Coursera R Programming coursework. I can assure you that this is not an assignment/quiz. I have been trying to understand the best way to append elements to a list with a loop. I have not yet reached the lapply or sapply part of the coursework, so I am thinking about workarounds.
Thanks in advance.
If it's a duplicate question, please direct me to it.
As mentioned in the comments, you are not looping over the rows of your data frame but over its columns (sometimes also called variables). Hence, loop over data$nitrate.
i <- 1
my_list <- c()
for(val in data$nitrate)
{
my_list[i] <- val
i <- i + 1
}
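To check for yourself what a for loop over a data frame visits, a small sketch along these lines may help (the two-row data frame and the counter n_iter are made up for illustration):

```r
data <- data.frame(
  sulfate = c(NA, 3.47),
  nitrate = c(NA, 0.363),
  ID      = c(1L, 1L)
)

n_iter <- 0
for (val in data) {
  n_iter <- n_iter + 1
  # val is a whole column, so its length is the number of rows
  print(length(val))
}

n_iter == ncol(data)
```

Each val is an entire column, which is why assigning it to the single slot my_list[i] triggers the "number of items to replace" warning.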
Now, instead of looping over the values, a better way is to use the fact that the new vector and the old data share the same index, so loop over the index i. How do you tell R how many indices there are? Here you have several choices again: 1:nrow(data), 1:length(data$nitrate), and several other ways. Below are a few examples of how to extract from the data frame; the three assignments inside the loop are alternatives that all store the same value.
my_vector <- c()
for(i in 1:nrow(data)){
my_vector[i] <- data$nitrate[i] ## Version 1 of extracting from data.frame
my_vector[i] <- data[i,"nitrate"] ## Version 2: [row,column name]
my_vector[i] <- data[i,3] ## Version 3: [row,column number]
}
My suggestion: Rather than calling the collection a list, call it a vector, since that is what it is. Vectors and lists behave a little differently in R.
Of course, in reality you don't want to get the data out one by one. A much more efficient way of getting your data out is
my_vector2 <- data$nitrate
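To tie the pieces together, here is a minimal sketch on a toy copy of the six rows shown in the question. It assumes the column types below, preallocates the result, and uses seq_len() so the loop is skipped cleanly if the data frame has no rows; it also shows what appending to an actual list looks like:

```r
data <- data.frame(
  Date    = as.Date("2003-10-22") + 0:5,
  sulfate = c(NA, NA, 3.47, NA, NA, NA),
  nitrate = c(NA, NA, 0.363, NA, NA, NA),
  ID      = 1L
)

# Vector version: preallocate, then fill by index
my_vector <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  my_vector[i] <- data$nitrate[i]
}
identical(my_vector, data$nitrate)

# List version: [[ ]] appends one element at the end each time
my_list <- list()
for (val in data$nitrate) {
  my_list[[length(my_list) + 1]] <- val
}
length(my_list)
```

The loop result is the same as the direct extraction data$nitrate, so the loop is only useful here as an exercise.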

R: Mean subset sequence from dataframe

I have been facing a problem for three days and I cannot work out why my code does not work. I have tried quite a lot of different ways, but I am just going to post the one I believe is closest to the solution. I am going to give a reduced example of what I want to ask.
I have 7 csv files (called 001.csv, 002.csv, ... etc), in a folder called "Myfolder".
I have been trying to write a function that merges all these different .csv files into a single data.frame using a for loop and rbind, and finally returns the mean of either column "Colour1" or "Colour2", depending on the "colour" (column) and the "Children" (rows) I choose, and of course without the missing values (NA). As an example, when I merge the files I get a data frame like this:
Colour1 Colour2 Children
NA NA 1
9 NA 2
NA NA 2
NA 5 3
7 NA 4
NA NA 5
NA 8 5
2 NA 6
6 3 6
14 NA 7
This is an example of the function I want to build: get_mean <- function(directory, colour, children)
What I have tried
get_mean <- function(directory, colour, children) {
files <- list.files(directory, full.names=TRUE)
allfiles <- data.frame()
for(i in 1:7) {
allfiles <- rbind(allfiles, read.csv(files[i]))
}
if(colour == "colour1"){
mean(allfiles$colour1[allfiles$Children == children], na.rm = TRUE)
}
if(colour == "colour2"){
mean(alllists$colour2[alllist$Children == children], na.rm = TRUE)
}
}
When I tried for example:
get_mean("Myfolder", "colour1", 3:6)
I get
In alllist$ID == id :
longer object length is not a multiple of shorter object length
and when I try:
get_mean("Myfolder", "colour1", 6)
I get:
>
Yes guys....I get back absolutely nothing.
What do you think guys? any correction to it? any other way to get the mean?
Note: the data I put here is not the data I am using. This is just an example from a much bigger exercise. I have tried to make a really small example with different names and numbers so that we don't discuss the exercise itself and others could copy the solution.
Here is a corrected and more readable version of your function. I renamed your data.frame allfiles to df, and I also added a check on colour:
get_mean <- function(directory, colour, children) {
  files <- list.files(directory, full.names = TRUE)
  df <- do.call(rbind, lapply(files[1:7], read.csv))
  # check that the colour argument names a column of df
  if (!colour %in% names(df))
    stop(sprintf('colour argument value %s is not part of df column', colour))
  # %in% lets children be a vector such as 3:6
  mean(df[[colour]][df$Children %in% children], na.rm = TRUE)
}
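Two things in the original attempt are worth spelling out. A function returns the value of its last expression, and an if whose condition is FALSE returns NULL invisibly, so when the final if (colour == "colour2") does not fire, nothing is printed. Also, comparing a column with == against a vector such as 3:6 recycles the shorter vector, which is where the "longer object length" warning comes from; %in% tests membership instead. A sketch on the example table, skipping the file reading (mean_from_df and example_df are hypothetical names, and the column names follow the printed table):

```r
example_df <- data.frame(
  Colour1  = c(NA, 9, NA, NA, 7, NA, NA, 2, 6, 14),
  Colour2  = c(NA, NA, NA, 5, NA, NA, 8, NA, 3, NA),
  Children = c(1, 2, 2, 3, 4, 5, 5, 6, 6, 7)
)

# Hypothetical helper: same logic as get_mean, but on an in-memory data frame
mean_from_df <- function(df, colour, children) {
  if (!colour %in% names(df)) {
    stop(sprintf("colour argument value %s is not part of df column", colour))
  }
  mean(df[[colour]][df$Children %in% children], na.rm = TRUE)
}

mean_from_df(example_df, "Colour1", 3:6)   # mean of 7, 2 and 6
mean_from_df(example_df, "Colour2", 6)     # only one non-missing value, 3
```

With the real files, the same helper could be called on the result of the do.call(rbind, ...) step.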
