Colwise sum of matrix in a sequential order [closed] - r

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I would like to compute column-wise sums of a matrix over a sliding window of rows. For example, if I have a matrix of 50 rows, rows 1 to 4 are summed column-wise, then rows 2 to 5, then rows 3 to 6, and so on, following that pattern. How can I do this in R?

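set.seed(123)
mat <- matrix(sample(100, 50 * 10, replace = TRUE), nrow = 50)
n <- nrow(mat)
# Column sums over each window of four consecutive rows (one output column per window)
sapply(1:(n - 3), function(i) colSums(mat[i:(i + 3), ]))
# Update: odd and even row indices within each window
oddInd <- sapply(1:(n - 3), function(i) { ind <- i:(i + 3); ind[ind %% 2 == 1] })
evenInd <- sapply(1:(n - 3), function(i) { ind <- i:(i + 3); ind[ind %% 2 == 0] })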
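The same sliding-window sums can also be sketched without a per-window call to colSums, by differencing cumulative column sums. This is only an alternative sketch: the names `k`, `cs` and `roll` are introduced here for illustration and do not come from the original answer.

```r
set.seed(123)
mat <- matrix(sample(100, 50 * 10, replace = TRUE), nrow = 50)
n <- nrow(mat)
k <- 4  # window width (assumed name; the code above hard-codes 4 rows)

# Cumulative column sums, with a leading row of zeros so the differences line up
cs <- rbind(0, apply(mat, 2, cumsum))

# Row i of roll holds the column sums of rows i..(i + k - 1) of mat
roll <- cs[(k + 1):(n + 1), , drop = FALSE] - cs[1:(n - k + 1), , drop = FALSE]
dim(roll)
```

Note that `roll` has one row per window, so it is the transpose of the sapply result.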

Related

Dice Probability in R script [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
Roll five six-sided dice. Write a script in R to calculate the probability of getting between 15 and 20 as the total sum of your roll. Exact solutions are preferred.
dice <- expand.grid(1:6, 1:6, 1:6, 1:6, 1:6)
dice.sums <- rowSums(dice)
mean(15 <= dice.sums & dice.sums <=20)
[1] 0.5570988
This is the code I have, and it gives 0.5570988. Is there a way to write it in one line of code, or otherwise condense it? Any thoughts are welcome.
From this answer, which references this answer:
dDice <- Vectorize(function(k, m, n) {
  # returns the probability of n m-sided dice summing to k
  s <- 0:(floor((k - n) / m))
  return(sum((-1)^s * choose(n, s) * choose(k - s * m - 1, n - 1)) / m^n)
}, "k")
sum(dDice(15:20, 6, 5))
#> [1] 0.5570988
Note that I did not take care in the order in which I added the terms of the alternating sum, so the function may need to be modified to return accurate probabilities for larger input values.
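Another way to get an exact answer, sketched here as an alternative rather than as part of either answer, is to count outcomes by repeated convolution: start from one way to roll a total of 0 with zero dice and add one die at a time. The names `n_dice`, `sides`, `counts` and `favorable` are illustrative.

```r
n_dice <- 5
sides <- 6

counts <- 1  # one way to reach a total of 0 with zero dice
for (d in seq_len(n_dice)) {
  # Adding a die shifts the existing counts by each face value and sums them
  new_counts <- numeric(length(counts) + sides - 1)
  for (face in seq_len(sides)) {
    idx <- face:(face + length(counts) - 1)
    new_counts[idx] <- new_counts[idx] + counts
  }
  counts <- new_counts
}
names(counts) <- n_dice:(n_dice * sides)  # possible totals 5..30

favorable <- sum(counts[as.character(15:20)])  # outcomes with a total in 15..20
favorable / sides^n_dice
```

Because the counts stay whole numbers until the final division, the result can also be read as the exact fraction 4332/7776, which reduces to 361/648.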

Loop over NA values in subsequent rows in R [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I want to replace each missing value in the first column of my data frame with the previous value multiplied by a scalar (e.g. 3).
nRowsDf <- nrow(df)
for (i in 1:nRowsDf) {
  df[i, 1] = ifelse(is.na(df[i, 1]), lag(df[i, 1]) + 3 * lag(df[i, 1]), df[i, 1])
}
The above code does not give me an error but does not do the job either.
In addition, is there a better way to do this instead of writing a loop?
Update and Data:
Here is some example data. I want to replace each missing value in the first column of my data frame with the previous value multiplied by a scalar (e.g. 3). The NA values are in consecutive rows.
df <- mtcars
df[c(2,3,4,5),1] <-NA
IND <- is.na(df[,1])
df[IND,1] <- df[dplyr::lead(IND,1L, F),1] * 3
The last line of the above code does the job one row at a time (I have to run it 4 times to fill the 4 missing rows). How can I do it in a single pass for all rows?
Reproducible data, which you should provide:
df <- mtcars
df[c(1,5,8),1] <-NA
Code:
IND <- is.na(df[,1])
df[IND,1] <- df[dplyr::lag(IND,1L, F),1] * 3
Since you use lag, I use lag too. But you say "previous", so maybe you want lead instead.
What happens if the first value (in the lead case) or the last value (in the lag case) is missing remains an open question.
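For runs of consecutive NAs, one possible single-pass approach is to locate the last non-missing row above each position and multiply its value by 3 once per step of distance. This is a sketch under two assumptions: the first value of the column is not NA, and each filled value should be 3 times the (possibly just filled) value directly above it. The names `x`, `idx` and `last_obs` are illustrative.

```r
df <- mtcars
df[c(2, 3, 4, 5), 1] <- NA

x <- df[[1]]
idx <- seq_along(x)

# Index of the most recent non-missing value at or before each row
# (assumes x[1] is not NA, otherwise last_obs would start at 0)
last_obs <- cummax(ifelse(is.na(x), 0L, idx))

# Non-missing rows are multiplied by 3^0 = 1; the j-th NA in a run gets 3^j
df[[1]] <- x[last_obs] * 3^(idx - last_obs)
head(df[[1]])
```

With this example data the first row is 21, so rows 2 to 5 become 63, 189, 567 and 1701.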

Subset rows from list of dataframe [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I want to subset rows, say 130:150, in each data frame in a list. I have written the code below to subset:
test<-lapply(res,subset, [130:150,]) # res contains the list of dataframes
But this code throws the error below:
Error in res[130:150, ] : incorrect number of dimensions
Thanks in advance!
res <- list(mtcars,mtcars)
lapply(res, function(x) return(x[2:4,]))
returns rows 2 to 4 of each data frame. If you want the columns instead, use
lapply(res, function(x) return(x[,2:4]))
or Gregor's solution, lapply(res, "[", 2:4), which likewise selects columns 2 to 4.
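Applied to the row range from the question, the same pattern might look like the sketch below. Since mtcars has only 32 rows, iris (150 rows) stands in for the asker's data here; that substitution is an assumption made purely so that rows 130:150 exist.

```r
# Stand-in list of data frames with at least 150 rows each
res <- list(iris, iris)

# Keep rows 130 to 150 of every data frame; drop = FALSE keeps a data frame
# even if only one column remains
test <- lapply(res, function(x) x[130:150, , drop = FALSE])
sapply(test, nrow)
```

Each element of `test` then has 21 rows.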

R: sort rows, query them and add results as colum [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an R data frame with dimensions 32 x 11. For each row I would like to determine the highest, second highest, and third highest values and add them as extra columns to the initial data frame (giving 32 x 14). Many thanks in advance!
library(car)
data(mtcars)
mtcars
First, create a function to get the nth highest value for a vector. Then, create a copy of the dataframe, since the second highest value may change as you add more columns. Then apply your function using apply and 1 to operate row-wise. I'm not sure what would happen if there are NAs in the data. I haven't tested it...
Something like this...
nth_highest <- function(x, n) sort(x, decreasing = TRUE)[n]
tmp <- mtcars
mtcars$highest <- apply(tmp, 1, function(x) nth_highest(x, 1))
mtcars$second_highest <- apply(tmp, 1, function(x) nth_highest(x, 2))
mtcars$third_highest <- apply(tmp, 1, function(x) nth_highest(x, 3))
rm(tmp)
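The three columns can also be computed in one pass over the rows, which avoids sorting each row three times. This is a sketch of that variation, not the answer's own code; the names `dat`, `top3` and `result` are illustrative, and it works on a fresh copy of the built-in mtcars data.

```r
dat <- datasets::mtcars  # fresh 32 x 11 copy of the built-in data

# For each row, sort in decreasing order and keep the first three values
top3 <- t(apply(dat, 1, function(x) sort(x, decreasing = TRUE)[1:3]))
colnames(top3) <- c("highest", "second_highest", "third_highest")

result <- cbind(dat, top3)
dim(result)
```

On the NA question raised above: sort() drops NA values by default, so a row with fewer than three non-missing values would get NA in the remaining columns.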

Faster method for aggregating this data in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a fairly large dataset (6.5 M rows, 8 cols) that I'm summarizing in a time series of aggregate counts of observations by day.
I'm currently summing across the intersection of two vectors that are the axes in my time series matrix. The iterations are taking hours to run, and I'm wondering if I'm overlooking something that might give better performance.
My code:
m <- length(datespace)
sensorlist <- as.vector(unique(sensordata$SOURCE))
n <- length(sensorlist)
y <- matrix(0, nrow = m, ncol = n)
colnames(y) <- sensorlist
for (sensor in 1:n) {
  for (date in 1:m) {
    count <- sum(as.vector(sensordata$SOURCE == sensorlist[sensor] & di == datespace[date]))
    y[date, sensor] <- count
  }
}
I know for loops are less efficient and are an indicator that there's probably a better way in R to get this done.
The crux of this problem seems to be a fast way to create a sparse matrix that fills in the missing summary data with zeros.
Pretty sure this is a simple tally:
library(dplyr)
sensordata %>%
  group_by(SOURCE) %>% # or maybe group_by(SOURCE, di)?
  tally()
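If the goal is the full date-by-sensor matrix with zeros for missing combinations, base R's table() may be enough on its own. The sketch below uses a small made-up `sensordata`, `di` and `datespace` (all values are hypothetical, chosen only to mirror the names in the question); converting the dates to a factor with every day in `datespace` as a level is what makes empty days appear as rows of zeros.

```r
# Hypothetical toy data mirroring the question's object names
sensordata <- data.frame(SOURCE = c("a", "b", "a", "c", "a"),
                         stringsAsFactors = FALSE)
di <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-03",
                "2020-01-03", "2020-01-03"))
datespace <- seq(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")

# One row per day in datespace, one column per sensor, counts as cells
day <- factor(as.character(di), levels = as.character(datespace))
y <- table(day, sensordata$SOURCE)
y
```

This counts every date/sensor pair in a single pass over the data instead of rescanning all rows for each cell.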
