Summing multiple values from a data frame using a loop - r

I have a single data frame data and a vector cryptos <- c("btc","eth","bnb","xrp") (where "btc" and the others are the names of cryptocurrencies). I need to write a for loop that sums the values of these coins.
So far, I've managed to 'return' every value with a print function:
cryptos <- c("btc","eth","bnb","xrp")
for(i in 1:4) {
print(data[data$crypto_name == cryptos[i], 3]) #where 3 is the number of a column with crypto values
}
So it prints the given currencies' values:
[1] 45065
[1] 2190.07
[1] 459.61
[1] 1.12
However, I do not want to print these values; I want to sum them using a loop. How can I do this?

Is this what you need?
sum( data[data$crypto_name %in% cryptos, 3] )
A basic sum loop is trivial:
s = 0
for(i in 1:4) {
s = s + data[data$crypto_name == cryptos[i], 3]
}
s
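A minimal sketch with a made-up data frame may help to check both approaches. The column names id and value and all the numbers for "doge" are assumptions for illustration; only crypto_name and cryptos come from the question. Wrapping each lookup in sum() guards against a coin that has no matching rows, which would otherwise turn the running total into a zero-length vector.

```r
# Hypothetical data: the value column is the third one, as in the question
data <- data.frame(
  id          = 1:5,
  crypto_name = c("btc", "eth", "bnb", "xrp", "doge"),
  value       = c(45065, 2190.07, 459.61, 1.12, 0.3)
)
cryptos <- c("btc", "eth", "bnb", "xrp")

total <- 0
for (coin in cryptos) {
  # sum() returns 0 when no row matches, so total stays a single number
  total <- total + sum(data[data$crypto_name == coin, 3])
}

# The vectorized form should give the same result
vectorized <- sum(data[data$crypto_name %in% cryptos, 3])
print(total)
print(isTRUE(all.equal(total, vectorized)))
```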

Related

How do I save a single column of data produced from a while loop in R to a dataframe?

I have written the following very simple while loop in R.
i=1
while (i <= 5) {
print(10*i)
i = i+1
}
I would like to save the results to a dataframe that will be a single column of data. How can this be done?
You may try this (if you want to keep the while loop):
df1 <- c()
i=1
while (i <= 5) {
print(10*i)
df1 <- c(df1, 10*i)
i = i+1
}
as.data.frame(df1)
df1
1 10
2 20
3 30
4 40
5 50
Or
df1 <- data.frame()
i=1
while (i <= 5) {
df1[i,1] <- 10 * i
i = i+1
}
df1
If you already have a data frame (let's call it dat), you can create a new, empty column in the data frame, and then assign each value to that column by its row number:
# Make a data frame with column `x`
n <- 5
dat <- data.frame(x = 1:n)
# Fill the column `y` with the "missing value" `NA`
dat$y <- NA
# Run your loop, assigning values back to `y`
i <- 1
while (i <= 5) {
result <- 10*i
print(result)
dat$y[i] <- result
i <- i+1
}
Of course, in R we rarely need to write loops like this. Normally, we use vectorized operations to carry out tasks like this faster and more succinctly:
n <- 5
dat <- data.frame(x = 1:n)
# Same result as your loop
dat$y <- 10 * (1:n)
Also note that, if you really did need a loop instead of a vectorized operation, that particular while loop could also be expressed as a for loop.
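For reference, the for loop form might look like the following sketch, which rebuilds the same dat and n as above. seq_len(n) is used instead of 1:n so that the loop body is skipped entirely when n is 0.

```r
n <- 5
dat <- data.frame(x = 1:n)
dat$y <- NA_real_            # preallocate the column with missing values

for (i in seq_len(n)) {
  dat$y[i] <- 10 * i         # assign the result to row i
}
dat
```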
I recommend consulting an introductory book or other guide to data manipulation in R. Data frames are very powerful and their use is a necessary and essential part of programming in R.

R programming: How to set while loop condition based if all required values in vector have been copied from sample?

I am new to R and I'm trying to see how many iterations are needed to fill a vector with the numbers 1 to 55 (no duplicates) from a random sample generated with runif.
At the moment, the vector has a lot of duplicates in it, and the number of iterations returned is just the size of the vector, so I'm not sure my logic is correct.
The aim of the if statement is to check whether the value from the sample already exists in the vector and, if it does, to choose the next one. But I'm not sure this is correct, since the next number could also already exist in the vector. Any help would be much appreciated.
numbers=as.integer(runif(800, min=1, max=55)) ## my sample from runif
i=sample(numbers, 1)
## setting up my vector to store 55 unique values (1 to 55)
p=rep(0,55)
## my counters
j=0
n=1
## my while loop
while (p[n] %in% 0){
## if the sample value already exists in the vector, choose the next value from the sample
if (numbers[n] %in% p) {
p[n]=numbers[n+1]
}
else {
p[n] = numbers[n]
}
n = n + 1
j = j + 1
}
I believe that the following is what you want. Instead of a while loop on p, the while loop should search for a new value in numbers.
set.seed(2021) # make the results reproducible
numbers <- sample(55, 800, TRUE)
## setting up my vector to store 55 unique values (1 to 55)
p <- integer(55)
# assign the elements of p one by one
for(j in seq_along(p)){
## if the sample value already exists in the vector,
## choose the next value from the sample
n <- 1
while (numbers[n] %in% p) {
n <- n + 1
}
if(n <= length(numbers)){
p[j] <- numbers[n]
}
}
j
#[1] 55
length(unique(p)) == length(p)
#[1] TRUE
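If the goal is only to count how many draws it takes before every value from 1 to 55 has appeared, a sketch like the following may be simpler. It assumes the draws come from sample(), as in the answer above. Note that as.integer(runif(800, min=1, max=55)) from the question truncates toward zero, so it can only produce 1 to 54 and never 55. match() returns the position of the first occurrence of each value, so the largest of those positions is the draw on which the collection became complete (or NA if some value never appeared).

```r
set.seed(2021)                          # make the results reproducible
numbers <- sample(55, 800, replace = TRUE)

# Position of the first occurrence of each value 1..55 in the sample
first_seen <- match(1:55, numbers)

# Number of draws needed to have seen every value at least once
draws_needed <- max(first_seen)
draws_needed

# The distinct values, in order of first appearance
p <- unique(numbers)
```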

cor() function in R with a subset

I have a table in R with three columns. I want to get the correlation of the first two columns with a subset of the third column following a specific set of conditions (values are all numeric, I want them to be > a certain number). The cor() function doesn't seem to have an argument to define such a subset.
I know that I could use the summary(lm()) function and square-root the r^2, but the issue is that I'm doing this inside a for loop and am just appending the correlation to a separate list that I have. I can't really append part of the summary of the regression easily to a list.
Here is what I am trying to do:
for (i in x) {list[i] = cor(data$column_a, data$column_b, subset = data$column_c > i)}
Obviously, though, I can't do that because the cor() function doesn't work with subsets.
(Note: x = seq(1,100) and list = NULL)
You can do this without a loop using lapply. Here's some code that will output a data frame with the month-range in one column and the correlation in another column. The do.call(rbind... business is just to take the list output from lapply and turn it into a data frame.
corrs = do.call(rbind, lapply(min(airquality$Month):max(airquality$Month),
function(x) {
data.frame(month_range=paste0(x," - ", max(airquality$Month)),
correlation = cor(airquality$Temp[airquality$Month >= x & airquality$Temp < 80],
airquality$Wind[airquality$Month >= x & airquality$Temp < 80]))
}))
corrs
month_range correlation
1 5 - 9 -0.3519351
2 6 - 9 -0.2778532
3 7 - 9 -0.3291274
4 8 - 9 -0.3395647
5 9 - 9 -0.3823090
You can subset the data first, and then find the correlation.
a <- subset(airquality, Temp < 80 & Month > 7)
cor(a$Temp, a$Wind)
Edit: I don't really know what your list variable is, but here is an example of dynamically changing the subset based on i (see how the month requirement changes with each iteration)
list <- seq(1, 5)
for (i in 1:5){
a <- subset(airquality, Temp < 80 & Month > i)
list[i] <- cor(a$Temp, a$Wind)
}
Based on the pseudo-code you provided alone, here's something that should work:
for (i in x) {
df <- subset(data, column_c > i)
list[i] = cor(df$column_a, df$column_b)
}
However, I don't know why you would want your index in list[i] to be the same value that you use to subset column_c. That could be another source of problems.
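The same idea can be written with sapply() and a logical index instead of subset(). This sketch uses the built-in airquality data, and the month cutoffs are arbitrary choices for illustration. Naming the result by cutoff keeps each correlation tied to the condition that produced it, which sidesteps the indexing concern raised above.

```r
cutoffs <- 4:8     # hypothetical cutoffs, chosen for illustration

corrs <- sapply(cutoffs, function(m) {
  keep <- airquality$Month > m                 # rows that pass the condition
  cor(airquality$Temp[keep], airquality$Wind[keep])
})
names(corrs) <- cutoffs                        # label each result by its cutoff
corrs
```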

Fill data.frame with missing columns

I have the following function taken from R: iterative outliers detection (this is an updated version):
dropout<-function(x) {
outliers <- NULL
res <- NULL
if(length(x)<2) return (1)
vals <- rep.int(1, length(x))
r <- chisq.out.test(x)
while (r$p.value<.05 & sum(vals==1)>2) {
if (grepl("highest",r$alternative)) {
d <- which.max(ifelse(vals==1,x, NA))
res <- rbind(list(as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value)),fill=TRUE)
}
else {
d <- which.min(ifelse(vals==1, x, NA))
}
vals[d] <- r$p.value
r <- chisq.out.test(x[vals==1])
}
return(res)
}
The problem is that in each round it gives me some missing rows to fill in the data.frame.
I want to fill res, but in some iterations it contains missing values.
I tried everything I could think of, e.g. rbindlist, rbind.fill, and rbind (with fill=TRUE), but nothing works.
When I do something like:
res <- c(res,as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value))
it works, but it creates 2 rows for each set of (V1,V2): one with r$alternative in the last column, and a second with the same first 2 columns but with the p-value in the last column instead.
This is how I'm calling the function, on data similar to the data in the mentioned question:
outliers <- d[, dropout(V3), list(V1, V2)]
and I always get this error: j doesn't evaluate to the same number of columns for each group
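The error message suggests that dropout() returns differently shaped results for different groups: the bare value 1 for short groups, NULL when nothing is flagged, and a matrix otherwise. One possible way around that, sketched here in base R only, is to return a data frame with the same columns on every code path, even when it has zero rows. The function flag_outliers() and its two-standard-deviation rule are hypothetical stand-ins for the chisq.out.test() logic, and the data are made up.

```r
# Hypothetical stand-in for the outlier test: flag values more than
# 2 standard deviations away from the group mean
flag_outliers <- function(x) {
  empty <- data.frame(value = numeric(0), z = numeric(0))
  if (length(x) < 3) return(empty)        # same columns, zero rows
  z <- (x - mean(x)) / sd(x)
  hit <- abs(z) > 2
  if (!any(hit)) return(empty)
  data.frame(value = x[hit], z = z[hit])
}

# Made-up grouped data: group "a" has one extreme value, "c" is too short
d <- data.frame(
  g  = rep(c("a", "b", "c"), times = c(8, 8, 1)),
  V3 = c(1, 2, 3, 2, 1, 2, 3, 50,  5, 6, 5, 6, 5, 6, 5, 6,  9)
)

# Apply per group; every piece has identical columns, so rbind() succeeds
pieces <- lapply(split(d$V3, d$g), flag_outliers)
res <- do.call(rbind, pieces)
res
```

Whether the same shape rule resolves the grouped call in the question depends on the rest of that code, so treat this as an illustration of the principle rather than a drop-in fix.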

Store values in For Loop

I have a for loop in R in which I want to store the result of each calculation (for all the values looped through). In the for loop a function is called, and its output is currently stored in a variable r. However, this is overwritten on each successive iteration. How can I store the result of each pass through the function and access it afterwards?
Thanks,
example
for (par1 in 1:n) {
var<-function(par1,par2)
c(var,par1)->var2
print(var2)
}
So print returns every instance of var2, but var2 itself only holds the value for the last n. Is there any way to get an array of the data or something similar?
Initialise an empty object and then assign the values by index:
a <- 0
for (i in 1:10) {
a[i] <- mean(rnorm(50))
}
print(a)
EDIT:
To include an example with two output variables, in the most basic case, create an empty matrix with the number of columns corresponding to your output parameters and the number of rows matching the number of iterations. Then save the output in the matrix, by indexing the row position in your for loop:
n <- 10
mat <- matrix(ncol=2, nrow=n)
for (i in 1:n) {
var1 <- function_one(i,par1)
var2 <- function_two(i,par2)
mat[i,] <- c(var1,var2)
}
print(mat)
The iteration number i corresponds to the row number in the mat object. So there is no need to explicitly keep track of it.
However, this is just to illustrate the basics. Once you understand the above, it is more efficient to use the elegant solution given by @eddi, especially if you are handling many output variables.
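Since function_one() and function_two() are placeholders, here is a runnable sketch of the matrix approach. The two function bodies and the values of par1 and par2 are hypothetical and chosen only so the code executes.

```r
# Hypothetical stand-ins for function_one() and function_two()
function_one <- function(i, par) i + par
function_two <- function(i, par) i * par
par1 <- 1
par2 <- 2

n <- 10
# Preallocate an n x 2 matrix, one named column per output
mat <- matrix(NA_real_, nrow = n, ncol = 2,
              dimnames = list(NULL, c("var1", "var2")))
for (i in seq_len(n)) {
  mat[i, ] <- c(function_one(i, par1), function_two(i, par2))
}
head(mat, 3)
```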
To get a list of results:
n = 3
lapply(1:n, function(par1) {
# your function and whatnot, e.g.
par1*par1
})
Or sapply if you want a vector instead.
A bit more complicated example:
n = 3
some_fn = function(x, y) { x + y }
par2 = 4
lapply(1:n, function(par1) {
var = some_fn(par1, par2)
return(c(var, par1)) # don't have to type return, but I chose to make it explicit here
})
#[[1]]
#[1] 5 1
#
#[[2]]
#[1] 6 2
#
#[[3]]
#[1] 7 3
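If a list is awkward to work with afterwards, the results can be collapsed into a vector or a matrix. This sketch reuses some_fn and par2 from the example above; the names squares, pairs, and result are made up for illustration.

```r
n <- 3
par2 <- 4
some_fn <- function(x, y) x + y

# One number per iteration: sapply() simplifies the list to a vector
squares <- sapply(1:n, function(par1) par1 * par1)

# Two numbers per iteration: bind the list elements into rows of a matrix
pairs <- lapply(1:n, function(par1) c(var = some_fn(par1, par2), par1 = par1))
result <- do.call(rbind, pairs)
result
```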
