Faster method for aggregating this data in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a fairly large dataset (6.5 M rows, 8 cols) that I'm summarizing in a time series of aggregate counts of observations by day.
I'm currently summing across the intersection of two vectors that are the axes in my time series matrix. The iterations are taking hours to run, and I'm wondering if I'm overlooking something that might give better performance.
My code:
m <- length(datespace)
sensorlist <- as.vector(unique(sensordata$SOURCE))
n <- length(sensorlist)
y <- matrix(0, nrow = m, ncol = n)
colnames(y) <- sensorlist
for (sensor in 1:n) {
  for (date in 1:m) {
    count <- sum(as.vector(sensordata$SOURCE == sensorlist[sensor] & di == datespace[date]))
    y[date, sensor] <- count
  }
}
I know for loops are less efficient, and they're an indicator that there's probably a better way in R to get this done.
The crux of this problem seems to be a fast way to create a sparse matrix that fills in the missing summary data with zeros.

Pretty sure this is a simple tally:
library(dplyr)
sensordata %>%
  group_by(SOURCE) %>% # or maybe group_by(SOURCE, di)?
  tally()
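If the goal is the full date-by-sensor matrix with explicit zeros, one possible base R sketch is to call table() on two factors whose levels are fixed in advance, so that empty combinations are counted as 0. The toy sensordata, di, and datespace below are made-up stand-ins for the question's objects; the sketch assumes di is a vector of dates aligned with the rows of sensordata.

```r
# Hypothetical toy data standing in for the question's objects
sensordata <- data.frame(SOURCE = c("a", "b", "a", "c", "a"),
                         stringsAsFactors = FALSE)
di <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-03",
                "2020-01-03", "2020-01-03"))
datespace <- seq(as.Date("2020-01-01"), as.Date("2020-01-03"), by = "day")
sensorlist <- unique(sensordata$SOURCE)

# Fixing the factor levels makes table() report zero counts for
# date/sensor combinations that never occur in the data
counts <- table(factor(as.character(di), levels = as.character(datespace)),
                factor(sensordata$SOURCE, levels = sensorlist))

# Convert the table to a plain matrix with the same layout as y in the question
y <- matrix(as.vector(counts), nrow = length(datespace),
            dimnames = list(as.character(datespace), sensorlist))
```

This makes a single pass over the data instead of one full scan per matrix cell, which is where the nested loops spend their time.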

Related

Dice Probability in R script [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
Roll five six-sided dice. Write a script in R to calculate the probability of getting between 15 and 20 as the total sum of your roll. Exact solutions are preferred.
dice <- expand.grid(1:6, 1:6, 1:6, 1:6, 1:6)
dice.sums <- rowSums(dice)
mean(15 <= dice.sums & dice.sums <=20)
[1] 0.5570988
This is the code that I have, and it gives 0.5570988. Is there a way to write it in one line of code, or otherwise condense it? Any thoughts are welcome.

From this answer, which references this answer:
dDice <- Vectorize(function(k, m, n) {
  # returns the probability of n m-sided dice summing to k
  s <- 0:(floor((k - n)/m))
  return(sum((-1)^(s)*choose(n, s)*choose(k - s*m - 1, n - 1))/m^n)
}, "k")
sum(dDice(15:20, 6, 5))
#> [1] 0.5570988
Note that I did not take care in the order in which I added the terms of the alternating sum, so the function may need to be modified to return accurate probabilities for larger input values.
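One way to sidestep the alternating sum altogether is to build the distribution by repeated convolution, adding one die at a time. This is a different technique from the formula above, shown only as a sketch; dice_sum_pmf is a hypothetical helper name, not something from the linked answers.

```r
# Probability mass function of the sum of n fair m-sided dice,
# built by adding one die at a time (repeated convolution).
dice_sum_pmf <- function(n, m) {
  pmf <- 1  # with zero dice the sum is 0 with probability 1
  for (i in seq_len(n)) {
    new_pmf <- numeric(length(pmf) + m - 1)
    for (face in seq_len(m)) {
      # shift the current distribution by `face` and weight it by 1/m
      idx <- face:(face + length(pmf) - 1)
      new_pmf[idx] <- new_pmf[idx] + pmf / m
    }
    pmf <- new_pmf
  }
  names(pmf) <- n:(n * m)  # possible sums run from n to n*m
  pmf
}

pmf <- dice_sum_pmf(5, 6)
p <- sum(pmf[as.character(15:20)])  # 4332/7776, about 0.5570988
```

Every term added here is non-negative, so there is no cancellation between large terms of opposite sign.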

How to construct matrices/vectors from a table in R? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm quite new to R. Suppose I import a .csv file in which rows represent
time and columns represent n variables of interest. How could I construct a
function that returns any given 1xn vector from the table?
P.S. I'm not just interested in constructing a vector. I will also perform
matrix algebra with iterative calculations to estimate parameters, which means
I will need to use a for-loop.

If the data structure contains e.g. m rows and n columns i.e. n variables, you can easily construct the n vectors without much effort.
data<-read.csv(".../file.csv")
class(data)
[1] "data.frame"
class(as.numeric(data[1,]))
[1] "numeric"
So it is not a big deal to convert a single row (a 1xn data frame) into a numeric vector of length ncol(data).
In a loop just use
data["required Row Number",]
to access a particular row. Each time it returns a one-row (1xn) data frame, which as.numeric() converts to a vector of length n.
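As a rough sketch of the function the question asks for, the row lookup can be wrapped in a small helper and called inside a loop. The toy data frame and the name get_row are hypothetical, and the sketch assumes every column is numeric.

```r
# Hypothetical toy table standing in for the imported .csv
data <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), z = c(0.5, 1.5, 2.5))

# Return row i of a data frame as a plain numeric vector of length ncol(df)
get_row <- function(df, i) {
  unlist(df[i, ], use.names = FALSE)
}

# Example loop: v is the 1xn vector for time step i
row_totals <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  v <- get_row(data, i)
  row_totals[i] <- sum(v)  # placeholder for the real per-row calculation
}

# For matrix algebra it is often simpler to convert once up front;
# X[i, ] is then already a numeric vector
X <- as.matrix(data)
```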

You can use the function melt() from the package reshape2
Or if you want to use the for loop, try something like:
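one_col <- data[, 1]
for (i in 2:ncol(data)) {
  # rbind() adds each column as a new row, so the result is a matrix
  # with one row per original column
  one_col <- rbind(one_col, data[, i])
}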

Function for matrix in R [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I'm new (very new) to R. I'm struggling with writing a function that takes a matrix (old_matrix) and returns a new matrix (new_matrix) in which every value of old_matrix that is prime has been multiplied by 2. So the new matrix should look the same as the old matrix, except that wherever a prime occurs in the old one, that element is doubled.
I'm thinking that I should start out with a for loop, but I'm already struggling with how to make the loop go through all elements of the matrix. I appreciate all the help I can get to get closer to making this function!

The isPrime function in the numbers package could be a big help
# Start by creating an example to work with
old_matrix <- matrix(sample.int(100, 25), 5, 5)
# Create your new matrix and determine which numbers are prime
new_matrix <- old_matrix
primeVals <- numbers::isPrime(old_matrix)
# Index into the matrix using the prime value indicator and multiply by 2
new_matrix[primeVals] <- new_matrix[primeVals]*2
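Since the question asks for a function, the same indexing idea can be wrapped up as below. This is only a sketch: is_prime is a hypothetical base R helper written here so the example runs without the numbers package, double_primes is an invented name, and the code assumes the matrix holds positive whole numbers.

```r
# Hypothetical helper: trial division, adequate for small positive integers
is_prime <- function(x) {
  vapply(x, function(k) {
    if (k < 2) return(FALSE)
    if (k < 4) return(TRUE)  # 2 and 3
    all(k %% 2:floor(sqrt(k)) != 0)
  }, logical(1))
}

# Return a copy of old_matrix with every prime entry multiplied by 2
double_primes <- function(old_matrix) {
  new_matrix <- old_matrix
  primes <- is_prime(as.vector(old_matrix))
  new_matrix[primes] <- new_matrix[primes] * 2
  new_matrix
}

example <- matrix(1:6, nrow = 2, ncol = 3)
doubled <- double_primes(example)
```

No explicit loop over rows and columns is needed, because a logical vector of the same length as the matrix selects elements in column-major order.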

R: sort rows, query them and add results as column [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an R dataframe with dimensions 32 x 11. For each row I would like to determine the highest, second highest, and third highest values and add them as extra columns to the initial dataframe (making it 32 x 14). Many thanks in advance!
library(car)
data(mtcars)
mtcars

First, create a function to get the nth highest value for a vector. Then, create a copy of the dataframe, since the second highest value may change as you add more columns. Then apply your function using apply and 1 to operate row-wise. I'm not sure what would happen if there are NAs in the data. I haven't tested it...
Something like this...
nth_highest <- function(x, n) sort(x, decreasing = TRUE)[n]
tmp <- mtcars
mtcars$highest <- apply(tmp, 1, function(x) nth_highest(x, 1))
mtcars$second_highest <- apply(tmp, 1, function(x) nth_highest(x, 2))
mtcars$third_highest <- apply(tmp, 1, function(x) nth_highest(x, 3))
rm(tmp)
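A possible variant sorts each row only once and takes the top three values in a single apply() call. It is a sketch that works on a fresh copy of the built-in dataset; cars, top3, and result are names chosen for this example.

```r
cars <- datasets::mtcars  # fresh 32 x 11 copy of the built-in data

# For each row, sort in decreasing order and keep the first three values;
# apply() returns a 3 x 32 matrix, so transpose it to 32 x 3
top3 <- t(apply(cars, 1, function(x) sort(x, decreasing = TRUE)[1:3]))
colnames(top3) <- c("highest", "second_highest", "third_highest")

# Attach the three new columns to the original 11
result <- cbind(cars, top3)
```

Because the new columns are computed from the untouched copy before being attached, no temporary duplicate of the data frame is needed.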

ttest on many columns in Matlab/R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Does anybody know of examples showing how to run a paired t-test in Matlab/R/SAS or Python/Java on many columns (I have 1139 variables), either for all combinations or for selected pairs of columns in a loop?
thank you

MATLAB Solution:
If I understand correctly, you're just looking for a way to feed ttest two different columns from your input matrix each time. You can get all possible combinations of column pairs using nchoosek:
pairs = nchoosek(1:size(X, 2), 2);
Now you can iterate over these indices, each time invoking ttest with a different pair:
for idx = transpose(pairs)
    h = ttest(X(:, idx(1)), X(:, idx(2)));
    %// Do something with the result...
end
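Since the question also mentions R, roughly the same approach might look like this there: combn() enumerates the column pairs and t.test() with paired = TRUE runs each test. The matrix X is random made-up data, and results is a name chosen for this sketch.

```r
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)  # hypothetical data: 10 rows, 4 variables

# Each column of `pairs` holds the indices of one pair of variables
pairs <- combn(ncol(X), 2)

# Run a paired t-test for every pair and keep the p-value
p_values <- apply(pairs, 2, function(idx) {
  t.test(X[, idx[1]], X[, idx[2]], paired = TRUE)$p.value
})

results <- data.frame(col1 = pairs[1, ], col2 = pairs[2, ], p_value = p_values)
```

With 1139 variables there are choose(1139, 2) = 648091 pairs, so some correction for multiple testing would likely be needed when interpreting the p-values.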