I'm new to R and am struggling with the apply function. It is really slow to execute and I was trying to optimize some code I received.
I am trying to do some matrix operations (element-wise multiplication and division on ~10^6 element matrices) then sum the rows of the resulting matrix. I found the fantastic library Rfast and it executes what I thought was the same code in about 1/30 the time, but I am getting systematic differences between my 'optimized' answer and the previous answer.
The original code was something along the lines of
ans <- apply(object, 1, function(x) sum((x - a) / b))
and my code is
ans <- Rfast::rowsums((object - a) / b)
I'm not sure if it's because one of the methods is throwing away precision or making rounding errors - any thoughts?
Edit
Trying to reproduce the error is pretty hard...
I have been able to isolate the discrepancy to the division by my vector b, whose entries are each ~3000 (e.g. [3016.460436, 3021.210321, 3033.3303219]). If I take this term out, the two methods give the same answer.
I then tried two things to improve my answer. One was dividing b by 1000 and then dividing the sum by 1000 at the end. This didn't work, presumably because the floating-point precision is the same either way.
I also tried forcing my b vector to be integers, which also didn't work.
Sample data doesn't reproduce my error either, which is frustrating...
objmat = rbind(rep(c(1,0,0),1000),rep(c(0,0,1),1000))
amat = rbind(rep(c(0.064384654, 0.025465132, 0.36543214),1000))
bmat = rbind(rep(c(1016.460431,1021.210431,1033.330431),1000))
ans = apply(objmat,1,function(x) sum((x-amat)/bmat))
gives
ans[1] = 0.5418828413
rowsums((objmat[1,]-amat)/bmat) = 0.5418828413
I think it has to be a floating point precision error, but I'm not sure why my dummy data doesn't reproduce it, or which method (apply or rowsums) would be more accurate!
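One thing worth checking before blaming floating point, assuming a and b are plain vectors with one entry per column of object (the question does not say): R recycles a vector against a matrix in column-major order, so (object - a) / b pairs a[1] with object[1,1], a[2] with object[2,1], and so on, whereas inside apply(object, 1, ...) each row x is paired with a element by element. The sketch below emulates both pairings in plain Python (the helper names and the small data set are made up for illustration) and shows that they generally disagree. The sample data in the edit would not expose this, because there amat and bmat are only ever combined with one row at a time.

```python
def row_wise(obj, a, b):
    """Emulates apply(obj, 1, function(x) sum((x - a) / b)):
    a[j] and b[j] are paired with column j of every row."""
    return [sum((x - a[j]) / b[j] for j, x in enumerate(row)) for row in obj]


def column_major_recycled(obj, a, b):
    """Emulates row sums of (obj - a) / b when a and b are plain vectors:
    the matrix is walked column by column and the vectors are recycled."""
    nrow, ncol = len(obj), len(obj[0])
    sums = [0.0] * nrow
    for j in range(ncol):
        for i in range(nrow):
            k = j * nrow + i  # position of obj[i][j] in column-major order
            sums[i] += (obj[i][j] - a[k % len(a)]) / b[k % len(b)]
    return sums


# Hypothetical 2 x 3 data, chosen so the two pairings give different sums.
obj = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
a = [0.1, 0.2, 0.3]
b = [1.0, 2.0, 4.0]

per_row = row_wise(obj, a, b)
recycled = column_major_recycled(obj, a, b)
print(per_row)
print(recycled)
```

If this turns out to be the cause, something like sweep(object, 2, a) or t((t(object) - a) / b) should restore the per-column pairing in R. A pure summation-order effect would typically show up only in the last few significant digits, not as a systematic offset.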
I'm trying to understand this question on leetcode
Partition Equal subset problem
The solution section has recommended a naive approach to recurse, in one part it suggests this:
isSum (subSetSum, n) = isSum(subSetSum- nums[n], n-1) || isSum(subSetSum, n-1)
But in the sample code, the recursion logic is set as:
bool result = dfs(nums, n - 1, subSetSum - nums[n - 1]) || dfs(nums, n - 1, subSetSum);
Why does the explanation subtract nums[n] while the sample code subtracts nums[n - 1]? And which one is right? I tried both and both seem to give the right result, so something is off here but I cannot see what.
Any suggestions?
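Both forms can encode the same recurrence; they differ only in what n means. A minimal Python sketch, assuming that in the formula n is the 0-based index of the last element still under consideration (so the top-level call passes len(nums) - 1), while in the sample code n is the count of elements still under consideration (so the top-level call passes len(nums)). The function names are hypothetical and the inputs are assumed to be positive integers.

```python
def is_sum_by_index(nums, target, n):
    """n is the 0-based index of the last candidate element."""
    if target == 0:
        return True
    if n < 0 or target < 0:
        return False
    # Either take nums[n] or skip it.
    return (is_sum_by_index(nums, target - nums[n], n - 1)
            or is_sum_by_index(nums, target, n - 1))


def is_sum_by_count(nums, target, n):
    """n is how many leading elements are still candidates."""
    if target == 0:
        return True
    if n == 0 or target < 0:
        return False
    # The last of n candidates lives at index n - 1.
    return (is_sum_by_count(nums, target - nums[n - 1], n - 1)
            or is_sum_by_count(nums, target, n - 1))


def can_partition(nums):
    total = sum(nums)
    if total % 2:
        return False
    half = total // 2
    by_index = is_sum_by_index(nums, half, len(nums) - 1)
    by_count = is_sum_by_count(nums, half, len(nums))
    assert by_index == by_count  # the two conventions agree
    return by_count


print(can_partition([1, 5, 11, 5]))
print(can_partition([1, 2, 3, 5]))
```

What does not work is mixing the conventions: starting at len(nums) and reading nums[n] looks one element past the end of the array.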
I'm a beginner in RStudio and I'm facing a problem. I have a dataset called sensor_data which has sensors S12, S13, S14 (as column names). I want to compute S14 - S13 and S13 - S12 (in this order only) and include the results in my data frame. Below is a simple example of what I tried, to show what the result should look like. It does not work, because [[val - 1]] does not act like an index.
I can do them individually and then add them to the dataframe but that is a costly operation. Wondering if there is a smarter way to do it through a for loop.
P001<- list("S12","S13","S14")
for (val in P001){
print(sensor_data[[val]] - sensor_data[[val - 1]])
}
I follow the logic from Python programming where I can index lists through a for loop but that doesn't seem to be the case in R.
Any sort of help will be useful. Plus if anybody can recommend a good book where I can learn to do such operations then that would be amazing as well.
You can use the base::transform function or dplyr::mutate
Using mutate:
library(dplyr)
sensor_data %>%
  mutate(difference1 = S14 - S13,
         difference2 = S13 - S12)
If you want to use a for loop over multiple columns, you could do this:
newdata <- sensor_data
for (i in 2:ncol(sensor_data)) {
  # append column i minus column i - 1 as a new column
  newdata[ncol(sensor_data) + i - 1] <- sensor_data[i] - sensor_data[i - 1]
  colnames(newdata)[ncol(sensor_data) + i - 1] <-
    paste0(colnames(sensor_data)[i], "-", colnames(sensor_data)[i - 1])
}
Though there may be a more easily readable way out there to do it.
Check out: https://r4ds.had.co.nz/transform.html and https://r4ds.had.co.nz/iteration.html for info on manipulating datasets and iterations.
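The loop in the question fails because val is a column name, and a name has no predecessor; iterating over positions (or over neighbouring pairs of names) avoids that. The same idea in Python, which the question mentions, as a minimal sketch with made-up readings. Here sensor_data is a plain dict of lists standing in for the data frame.

```python
# Hypothetical readings; the real sensor_data is not shown in the question.
sensor_data = {
    "S12": [1.0, 2.0, 4.0],
    "S13": [1.5, 2.5, 5.0],
    "S14": [3.0, 2.0, 7.0],
}

names = list(sensor_data)  # column order: S12, S13, S14

differences = {}
# Walk neighbouring pairs of column names: (S12, S13), (S13, S14).
for prev, cur in zip(names, names[1:]):
    differences[f"{cur}-{prev}"] = [
        c - p for c, p in zip(sensor_data[cur], sensor_data[prev])
    ]

print(differences)
```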
I'm trying to figure out how to use recursion on count and sum rules.
I usually do it with lists, using findall and length or findall and sum_list, but I'm not sure if that's my best option in all cases.
This is my approach with lists:
%person(name, surname, age)
person('A', 'H', 22).
person('B', 'G', 24).
person('C', 'F', 20).
person('D', 'E', 44).
person('E', 'D', 45).
person('F', 'C', 51).
person('G', 'B', 40).
person('H', 'A', 51).
count_person(Total_count) :- % rule to count how many persons there are.
    findall(N, person(N, _, _), List),
    length(List, Total_count).
sum_ages(Total_sum) :- % rule to sum all the ages.
    findall(Age, person(_, _, Age), List),
    sum_list(List, Total_sum).
or here: https://swish.swi-prolog.org/p/cswl.pl
How should I do this using recursion?
You should take a look at library(aggregate).
For instance:
count_person(Total_count) :-
aggregate(count, A^B^C^person(A,B,C), Total_count).
or the simpler form (try to understand the difference; it's a good way to learn the basics of variable quantification)
count_person(Total_count) :-
aggregate_all(count, person(_,_,_), Total_count).
The library has grown out of the necessity to simplify the implementation of typical aggregation functions available in SQL (since Prolog is relational at heart):
sum_ages(Total_sum) :-
aggregate(sum(Age), A^B^person(A,B,Age), Total_sum).
You can also get combined aggregates in a step. Average is readily implemented:
ave_ages(Ave) :-
    aggregate(t(count,sum(Age)), A^B^person(A,B,Age), t(Count,Sum)),
    Ave is Sum/Count.
If you implemented this using count_person/1 and sum_ages/1, the interpreter would scan the goal twice...
I do not have an elegant solution. But with retract and assert you can control the recursion:
:- dynamic([person/3,person1/3]).
count_person(N) :-
    count_person(0,N).
count_person(Acc,N) :-
    retract(person(A,B,C)),
    !,
    assert(person1(A,B,C)),
    N1 is Acc+1,
    count_person(N1,N).
count_person(N,N) :-
    clean_db.
clean_db :-
    retract(person1(A,B,C)),
    assert(person(A,B,C)),
    fail.
clean_db.
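For comparison, the plain recursive pattern the question asks about is to collect the facts into a list once and then walk that list with an accumulator. A language-neutral sketch in Python (the persons list mirrors the person/3 facts from the question; the function names are hypothetical). In Prolog the same shape would be a base clause for the empty list and a recursive clause for [Head|Tail].

```python
# Mirrors the person(Name, Surname, Age) facts from the question.
persons = [
    ("A", "H", 22), ("B", "G", 24), ("C", "F", 20), ("D", "E", 44),
    ("E", "D", 45), ("F", "C", 51), ("G", "B", 40), ("H", "A", 51),
]


def count_persons(items, acc=0):
    """Base case: empty list returns the accumulator."""
    if not items:
        return acc
    # Recursive case: drop the head, add one to the accumulator.
    return count_persons(items[1:], acc + 1)


def sum_ages(items, acc=0):
    """Same shape, but the accumulator grows by the head's age."""
    if not items:
        return acc
    (_, _, age), rest = items[0], items[1:]
    return sum_ages(rest, acc + age)


print(count_persons(persons))
print(sum_ages(persons))
```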
I'm trying to translate some code from MATLAB into R, but I'm stuck on the following line:
SqO=U.* sqrt(D)*V'
I feel like I'm close:
SqO<-Conj(t(U%*%sqrt(D)*V))
...but the output still isn't matching up. All the variables (SqO, U, D, and V) are 20x20 matrices if that helps.
Hmmm, I'm no expert in R, but I do know a bit of Matlab. In Matlab the sub-expression
U.* sqrt(D)
does an element-by-element multiplication of U and the square root of D. That is, element (i,j) in U is multiplied by element (i,j) in sqrt(D), so this is not the usual matrix multiplication. Is that what your U%*%sqrt(D) does? sqrt(D) also operates on the individual elements, so in general it is not the matrix square root: sqrt(D)*sqrt(D) ~= D.
Then the Matlab code multiplies the result of the previous operation by the transpose of V (if V is a real array); again my R is too weak to know whether you've done this or an equivalent operation.
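From what HighPerformanceMark wrote, the translation should be:
SqO=U.* sqrt(D)*V' % Matlab
SqO <- (U * sqrt(D)) %*% t(V) # R
The parentheses matter: Matlab evaluates .* and * from left to right, whereas in R %*% binds more tightly than *, so without them R would compute U * (sqrt(D) %*% t(V)). If V is complex, use Conj(t(V)) instead of t(V).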
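The grouping of the two products changes the result, which is easy to miss because Matlab evaluates .* and * left to right while R gives %*% higher precedence than *. A small Python sketch with hand-rolled helpers (hypothetical names, made-up 2x2 matrices, with S standing in for sqrt(D)) compares the two groupings:

```python
def hadamard(A, B):
    """Element-wise product (Matlab .* , R *)."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    """Matrix product (Matlab * , R %*%)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


U = [[1, 2], [3, 4]]
S = [[2, 1], [1, 3]]  # stands in for sqrt(D)
V = [[1, 0], [1, 1]]

# Matlab's left-to-right reading: (U .* S) * V'
element_wise_first = matmul(hadamard(U, S), transpose(V))
# What R computes without parentheses: U * (S %*% t(V))
matrix_product_first = hadamard(U, matmul(S, transpose(V)))

print(element_wise_first)
print(matrix_product_first)
```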