Appending to an R List one by one - r

Let's say I have data like:
> data[295:300,]
Date sulfate nitrate ID
295 2003-10-22 NA NA 1
296 2003-10-23 NA NA 1
297 2003-10-24 3.47 0.363 1
298 2003-10-25 NA NA 1
299 2003-10-26 NA NA 1
300 2003-10-27 NA NA 1
Now I would like to add all the nitrate values into a new list/vector. I'm using the following code:
i <- 1
my_list <- c()
for(val in data)
{
my_list[i] <- val
i <- i + 1
}
But this is what happens:
Warning message:
In x[i] <- val :
number of items to replace is not a multiple of replacement length
> i
[1] 2
> x
[1] NA
Where am I going wrong? The data is part of the Coursera R Programming coursework. I can assure you that this is not an assignment/quiz. I have been trying to understand the best way to append elements to a list with a loop. I have not yet reached the lapply or sapply part of the course, so I am thinking about workarounds.
Thanks in advance.
If it's a duplicate question, please direct me to it.

As mentioned in the comments, you are looping not over the rows of your data frame but over its columns (sometimes also called variables). Hence, loop over data$nitrate.
i <- 1
my_list <- c()
for(val in data$nitrate)
{
my_list[i] <- val
i <- i + 1
}
Now, instead of looping over the values, a better way is to exploit the fact that the new vector and the old data share the same index, so loop over the index i. How do you tell R how many indices there are? Here you have several choices again: 1:nrow(data), 1:length(data$nitrate), and several other ways. Below are a few examples of how to extract from the data frame; the three assignments are equivalent, so you only need one of them.
my_vector <- c()
for(i in 1:nrow(data)){
my_vector[i] <- data$nitrate[i] ## Version 1 of extracting from data.frame
my_vector[i] <- data[i,"nitrate"] ## Version 2: [row,column name]
my_vector[i] <- data[i,3] ## Version 3: [row,column number]
}
My suggestion: Rather than calling the collection a list, call it a vector, since that is what it is. Vectors and lists behave a little differently in R.
Of course, in reality you don't want to get the data out one by one. A much more efficient way of getting your data out is
my_vector2 <- data$nitrate
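To make the points above concrete, here is a minimal sketch. The small data frame is rebuilt from the six rows shown in the question (the Date and ID columns are typed in by hand, so treat them as an assumption). It shows that a for loop over a data frame visits one column per iteration, and how appending to a vector differs from appending to a true list.

```r
# Toy copy of the six rows printed in the question
data <- data.frame(
  Date    = as.Date("2003-10-22") + 0:5,
  sulfate = c(NA, NA, 3.47, NA, NA, NA),
  nitrate = c(NA, NA, 0.363, NA, NA, NA),
  ID      = 1
)

# Looping over a data frame visits each COLUMN once, not each row
n_iter <- 0
for (val in data) n_iter <- n_iter + 1

# Appending to a vector one value at a time
my_vector <- numeric(0)
for (val in data$nitrate) my_vector <- c(my_vector, val)

# Appending to a real list: use [[ ]] with the next free position
my_list <- list()
for (val in data$nitrate) my_list[[length(my_list) + 1]] <- val
```

Both loops grow the object on every iteration, which is fine for learning but slow for large data; the direct assignment from data$nitrate remains the better choice.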

Related

R Programming - Combine lists with similar names after looping

I currently have the following loop.
> margin_values
$margINCBJP
[1] 0.8481856 0.9165585 0.9270849 0.7932756 0.8296131 0.8284826 0.7584834 0.2566567
$margINCTRS
[1] NA NA NA NA NA 0.84499199 0.73135251 -0.06664292
$margBJPTRS
[1] NA NA NA NA NA 0.01650935 -0.02713086 -0.32329962
for(i in 1:length(margin_values)) {
nam <- paste("x", i, sep = "")
assign(nam, margin_values[[i:i]])
}
This creates separate lists starting at x1 to xn. How can I then automatically combine the numbers from all the lists to create one list? I know I can manually type c(x1, x2, x3...) all the way up until n, but since n is variable, is there any way to have R simply do c() on all values starting with x? For this example, n=3, but depending on parameters I have earlier in my code it may change.
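I just ran into this myself and here is what I came up with (tweaked for you, of course):
total <- lapply(ls(pattern = "^x[0-9]+$"), get)
This will create a list, total, with each element being one of your variables named x1, x2, and so on. A bare pattern = "x" would also pick up any other object whose name contains an x, hence the anchored pattern. If you want a single vector rather than a list, wrap the result in unlist().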
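As a hedged alternative sketch: since the values already live in one list, the intermediate x1..xn variables may not be needed at all. The margin_values below is a shortened toy stand-in for the list in the question (made-up values), and the names combined and from_vars are chosen here purely for illustration.

```r
# Toy stand-in for the margin_values list in the question
margin_values <- list(
  margINCBJP = c(0.85, 0.92),
  margINCTRS = c(NA, 0.84),
  margBJPTRS = c(NA, 0.02)
)

# Option 1: flatten the list directly, no x1..xn needed
combined <- unlist(margin_values, use.names = FALSE)

# Option 2: if x1..xn already exist, fetch them by name in numeric order
for (i in seq_along(margin_values)) {
  assign(paste0("x", i), margin_values[[i]])
}
from_vars <- unlist(mget(paste0("x", seq_along(margin_values))),
                    use.names = FALSE)
```

Building the names with paste0 keeps the numeric order (x1, x2, ..., x10), whereas ls() returns names sorted alphabetically.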

ifelse in 'r' returns NA value

I have two data frames, and I want to match the contents of one against the other. For this I am using the following code:
t <- read.csv("F:/M.Tech/Semester4/Thesis/Code/Book1.csv")
s <- read.csv("F:/M.Tech/Semester4/Thesis/Code/a4.csv")
x <- nrow(s)
y <- nrow(t)
for(i in 1:x)
for(j in 1:y)
ifelse (match(s[i,2], t[j,1]), s[i,9] <- t[j,2] , s[i,9] <- 0)
With this code, when the contents match, it works fine. But the else part returns NA. How can I assign 0 to all the places where there is no match?
I am getting the result as:
# word     count  word     tf score     word probability  log values   TFxIDF score  Keyword Probability
# yemen    380    yemen    1            0.053938964       2.919902172  2.919902172   NA
# strikes  116    strikes  0.305263158  0.016465578       4.106483233  1.25355804    0.5
# deadly   105    deadly   0.276315789  0.014904187       4.206113074  1.162215455   0.7
# new      88     new      0.231578947  0.012491128       4.38273661   1.014949531   NA
Instead of the NA, I want to store 0 there.
Issue 1: ifelse returns one of two values, depending on the test condition. It's not a flow control function that executes code snippet one or code snippet two based on a condition.
This is right:
my_var <- ifelse(thing_to_test, value_if_true, value_if_false)
This is wrong, and doesn't make sense in R:
ifelse(thing_to_test, my_var <- value_if_true, my_var <- value_if_false)
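Issue 2: make sure thing_to_test is a logical expression. match() returns the position of the match, or NA when there is none, so ifelse() yields NA for the non-matching cases instead of taking the "false" branch.
Putting those things together, you can see you should follow the instruction left by Richard Scriven as a comment above.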
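A minimal vectorised sketch of both points, assuming two small made-up data frames (s and lookup, with hypothetical column names) in place of the CSV files from the question. The test is turned into a proper logical with is.na(), and the result of ifelse() is assigned once, with no loops.

```r
# Hypothetical stand-ins for the two CSV files in the question
s <- data.frame(word = c("yemen", "strikes", "deadly", "new"),
                stringsAsFactors = FALSE)
lookup <- data.frame(word = c("strikes", "deadly"),
                     prob = c(0.5, 0.7),
                     stringsAsFactors = FALSE)

# match() gives the row position in lookup, or NA where there is no match
idx <- match(s$word, lookup$word)

# Logical test, one value per row; the assignment happens outside ifelse()
s$keyword_prob <- ifelse(is.na(idx), 0, lookup$prob[idx])
```

Because match() and ifelse() are both vectorised, the double for loop over rows is not needed.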

Generate a sequence of Data frame from function

I searched but couldn't find a similar question, so apologies in advance if this is a duplicate. I am trying to generate data frames from within a for loop in R.
What I want to do:
Define the columns of each data frame by a function,
Generate n data frames (n being the length of my sequence of data frames) using a loop.
As an example I will use n=100:
n<-100
k<-8
d1 <- data.frame()
for(i in 1:(k)) {d1 <- rbind(d1,c(a="i+1",b="i-1",c="i/1"))}
d2 <- data.frame()
for(i in 1:(k+2)) {d2 <- rbind(d2,c(a="i+2",b="i-2",c="i/2"))}
...
d100 <- data.frame()
for(i in 1:(k+100)) {d100 <- rbind(d100,c(i+100, i-100, i/100))}
Clearly it would be tedious to construct each data.frame one by one. I tried this:
d<-list()
for(j in 1:100) {
d[j] <- data.frame()
for(i in 1:(k+j)) {d[j] <- rbind(d[j] ,c(i+j, i-j, i/j))}
But I can't really do anything with it; I run into an error:
Error in d[j] <- data.frame() : replacement has length zero
In addition: Warning messages:
1: In d[j] <- rbind(d[j], c(i + j, i - j, i/j)) :
number of items to replace is not a multiple of replacement length
A few more remarks about my example:
the number of rows is not the same in each data frame: d1 has 8 rows, d2 has 10 rows, and d100 has 8+100 rows,
the algorithm should give us: D=(d1,d2, ... ,d100).
It would be great to get an answer using the same approach (rbind) and a more base like approach. Both will aid in my understanding. Of course, please point out where I'm going wrong if it's obvious.
Here's how to create an empty data.frame (and it's not what you are trying): see the question "Create an empty data.frame".
And you should not be creating 100 separate dataframes but rather a list of dataframes. I would not do it with rbind, since that would be very slow. Instead I would create them with a function that returns a dataframe of the required structure:
make_df <- function(n,var) {data.frame( a=(1:n)+var,b=(1:n)-var,c=(1:n)/var) }
mylist <- setNames(
lapply(1:100, function(n) make_df(n,n)) , # the dataframes
paste0("d_", 1:100)) # the names for access
head(mylist,3)
#---------------
$d_1
a b c
1 2 0 1
$d_2
a b c
1 3 -1 0.5
2 4 0 1.0
$d_3
a b c
1 4 -2 0.3333333
2 5 -1 0.6666667
3 6 0 1.0000000
Then if you want the "d_40" dataframe it's just:
mylist[[ "d_40" ]]
Or
mylist$d_40
If you want to perform the same operation or get a result from all of them at once, just use lapply:
lapply(mylist, nrow) # will be a list
Or:
sapply(mylist, nrow) #will be a vector because each value is the same length.
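For completeness, here is a sketch of the rbind-style loop the question asks about. It is slow for large n, as noted above, but it shows where the original attempt went wrong: list elements must be read and written with d[[j]], not d[j]. It assumes the row count 1:(k + j) from the question's loop attempt and uses a small n to keep it quick.

```r
k <- 8
n <- 10                      # the question uses 100; kept small here
d <- vector("list", n)       # pre-allocated list of length n

for (j in 1:n) {
  dj <- data.frame()         # start empty, grow row by row
  for (i in 1:(k + j)) {
    dj <- rbind(dj, data.frame(a = i + j, b = i - j, c = i / j))
  }
  d[[j]] <- dj               # [[ ]] stores the whole data frame as element j
}
names(d) <- paste0("d", 1:n)
```

After this, d$d2 or d[["d2"]] retrieves the second data frame, just as with mylist above.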

Creating a list of data frames using indexes for start and stop, in R

In R, take any large data frame (for example, 300,000 rows and 30 columns). I want to create a list of data frames using start and stop index values stored in another data frame (two columns: the first holds the start values and the second the stop values). The number of rows in the start-stop df will be the number of data frames stored in the list (in this small example, 6). It sounds like there might be an easy function for this, but until now I have always created lists of data frames with the split command or with various conditional statements. I did some research but couldn't find a solution. Also, I'm double looping below, which is not preferable. Any help greatly appreciated!
Example of start, stop data frame
> df
headID tailID
[1,] 688 704
[2,] 2576 2583
[3,] 4005 4018
[4,] 4336 5761
[5,] 5762 7201
[6,] 7202 8641
So I'm thinking something like (pseudo-code):
n <- length(bigDF)
subList <- list()
start.idx <- NA
obs <- dim(bigDF)
for(i in 2:obs){
for(j in 1:df) {
start.idx <- df$headID[j]
}
else if
end.idx <- df$tailID[j]
subMat <- bigDF[start.idx:end.idx,]
subList[[counter]] <- subMat
start.idx <- NA
counter <- counter + 1
}
}
}
I would write a function and apply it...
f <- function(x, data) {
data[x[1]:x[2],]
}
apply(df, 1, f, bigDF)
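An equivalent sketch with Map, which pairs each start value with its stop value and returns a list directly. The bigDF and df below are tiny made-up stand-ins for the real data; only the column names headID and tailID come from the question.

```r
# Tiny stand-ins for the real data
bigDF <- data.frame(id = 1:20, value = (1:20)^2)
df <- data.frame(headID = c(2, 8, 15), tailID = c(4, 10, 20))

# One sub data frame per (start, stop) pair
subList <- Map(function(start, stop) bigDF[start:stop, , drop = FALSE],
               df$headID, df$tailID)
```

The drop = FALSE keeps each piece a data frame even if bigDF has only one column.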

Change multiple dataframes in a loop

I have, for example, these three datasets (in my case, there are many more, with a lot of variables):
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
To each data frame I want to add a variable resulting from a transformation of an existing variable in that data frame. I would like to do this with a loop. For example:
datasets <- c("data_frame1","data_frame2","data_frame3")
vars <- c("a","b","c")
for (i in datasets){
for (j in vars){
# here I need a code that create a new variable with transformed values
# I thought this would work, but it didn't...
get(i)$new_var <- log(get(i)[,j])
}
}
Do you have some valid suggestions about that?
Moreover, it would be great for me if it were possible also to assign the new column names (in this case new_var) by a character string, so I could create the new variables by another for loop nested in the other two.
I hope my explanation of the problem has not been too tangled.
Thanks in advance.
You can put your dataframes in a list and use lapply to process them one by one. So no need to use a loop in this case.
For example you can do this :
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
ll <- list(data_frame1,data_frame2,data_frame3)
lapply(ll,function(df){
df$log_a <- log(df$a) ## new column with the log a
df$tans_col <- df$a+df$b+df$c ## new column with sums of some columns or any other
## transformation
### .....
df
})
data_frame1 becomes:
[[1]]
a b c log_a tans_col
1 1 3 4 0.0000000 8
2 5 6 4 1.6094379 15
3 3 1 1 1.0986123 5
4 3 5 9 1.0986123 17
5 2 5 2 0.6931472 9
I had the same need and wanted to change also the columns in my actual list of dataframes.
I found a great method here (the purrr::map2 method in the question works for dataframes with different columns), followed by
list2env(list_of_dataframes ,.GlobalEnv)
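Putting the two answers together, here is a base R sketch (it does not use purrr) that also covers the request to build new column names from a character string. The shortened data frames and the name dfs are assumptions made for illustration.

```r
# Shortened versions of the example data frames
data_frame1 <- data.frame(a = c(1, 5, 3), b = c(3, 6, 1))
data_frame2 <- data.frame(a = c(6, 2, 9), b = c(2, 7, 2))

# A NAMED list, so each element knows which variable it came from
dfs <- list(data_frame1 = data_frame1, data_frame2 = data_frame2)
vars <- c("a", "b")

dfs <- lapply(dfs, function(df) {
  for (v in vars) {
    # [[ ]] accepts a character string, so column names can be built on the fly
    df[[paste0("log_", v)]] <- log(df[[v]])
  }
  df
})

# Copy the modified data frames back over the original variables
list2env(dfs, envir = environment())
```

At the top level of a script, environment() is the global environment, so this has the same effect as passing .GlobalEnv.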
