non-comprehensible modulo calculation [duplicate] - r

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 6 years ago.
I've got a strange result for my modulo query here. Maybe somebody has a solution for it:
d <- seq(0.0,1.0,0.1)
lab.y <- ifelse(((d*10) %% 2.0 == 0.0),d, NA)
will give the result:
[1] 0.0 NA 0.2 NA 0.4 NA NA NA 0.8 NA 1.0
so the 0.6 is missing.
I tried to add a query like:
ifelse((d*10/2 == 3.0), d, NA)
which is all FALSE even though
d*10/2
[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
...
I don't really understand what's going on here.
Thanks a lot in advance!

This is due to floating-point error. The stored value is not exactly 0.6 but something a tiny bit off (just as an illustration, 0.6000000003 or 0.5999999997), so an exact comparison with == fails. Test for a small absolute difference instead of an exact match. Try something like:
ifelse((abs((d*10) %% 2)<0.000001), d, NA)
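One caveat with a one-sided tolerance check: if the product lands slightly below an even number (say 5.9999999), the remainder comes out close to 2 rather than close to 0, and the test misses it. A minimal sketch that measures the distance to the nearest multiple of 2 instead; the names r, tol and is_even are my own, and the tolerance value is an arbitrary assumption:

```r
d <- seq(0.0, 1.0, 0.1)

# Remainder of d*10 modulo 2; may be just above 0 or just below 2
r <- (d * 10) %% 2

# Assumed tolerance; pick one that suits the scale of your data
tol <- 1e-6

# Distance to the nearest multiple of 2, from either side
is_even <- pmin(r, 2 - r) < tol

lab.y <- ifelse(is_even, d, NA)
lab.y
```

With this check the 0.6 entry is kept along with the other even tenths.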

Related

How to rank data from multiple rows and columns?

Example data:
df <- data.frame(A = c(20, 40, 53), B = c(40, 11, 60))
What's the easiest way in R to get from this
A B
1 20 40
2 40 11
3 53 60
to this?
A B
1 2.0 3.5
2 3.5 1.0
3 5.0 6.0
I couldn't find a way to make rank() or frank() work across multiple rows and columns at once. Googling things like "r rank dataframe" or "r rank multiple rows" only turned up questions on how to rank each row or column individually, which is odd, as I suspect this must have been answered before.
Try rank like below
df[] <- rank(df)
or
df <- list2DF(relist(rank(df),skeleton = unclass(df)))
and you will get
> df
A B
1 2.0 3.5
2 3.5 1.0
3 5.0 6.0
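If it helps to see what is going on, here is a more explicit sketch of the same idea: flatten every cell into one vector, rank that vector, and write the ranks back into the original shape. The name ranked is my own, and I'm assuming all columns are numeric:

```r
df <- data.frame(A = c(20, 40, 53), B = c(40, 11, 60))

# Flatten all cells column by column into one plain numeric vector
all_values <- unlist(df, use.names = FALSE)

# Rank across every cell; ties get the average rank by default
all_ranks <- rank(all_values)

# Write the ranks back, keeping the shape and column names of df
ranked <- df
ranked[] <- all_ranks
ranked
```

The two 40s tie for ranks 3 and 4, which is why both show up as 3.5.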

Extracting complete paired values (non-NA) from a matrix in R [duplicate]

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Closed 7 years ago.
I apologize if this is elementary or has been answered before, but I haven't found an answer to my question despite extensive searching. I'm also very new to programming so please bear with me here.
I have a bunch of 25 by 2 matrices of data, however some of the cells have NA values. I'm looking to extract a subset of the matrix consisting of only the complete paired values (so no NA values).
So say I have:
3.6 4.2
9.2 8.4
4.8 NA
1.1 8.2
NA 11.6
NA NA
2.7 3.5
I want:
3.6 4.2
9.2 8.4
1.1 8.2
2.7 3.5
Is there some function that would do this easily?
Thanks!
Try this
df <- read.table(text = "3.6 4.2
9.2 8.4
4.8 NA
1.1 8.2
NA 11.6
NA NA
2.7 3.5")
df[complete.cases(df), ]
# V1 V2
# 1 3.6 4.2
# 2 9.2 8.4
# 4 1.1 8.2
# 7 2.7 3.5
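Alternatively, build the data frame directly and keep the rows where every value is non-NA:
df <- data.frame(V1 = c(3.6,9.2,4.8,1.1,NA,NA,2.7),
                 V2 = c(4.2,8.4,NA,8.2,11.6,NA,3.5))
df[ apply(!is.na(df), 1, all) , ]
EDIT: I forgot about na.omit and complete.cases. Doh.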
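Since the question mentions matrices rather than data frames, here is a small sketch showing that complete.cases and na.omit work on a numeric matrix too. The names m, kept and cleaned are mine:

```r
# The example values from the question as a 7 x 2 matrix
m <- matrix(c(3.6,  4.2,
              9.2,  8.4,
              4.8,  NA,
              1.1,  8.2,
              NA,  11.6,
              NA,   NA,
              2.7,  3.5),
            ncol = 2, byrow = TRUE)

# Keep only rows with no NA; drop = FALSE keeps the result a matrix
# even if a single row survives
kept <- m[complete.cases(m), , drop = FALSE]

# na.omit drops the same rows and records which ones in an attribute
cleaned <- na.omit(m)

kept
```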

R: Improvement of loop to create distance matrix from data frame

I am creating a distance matrix using the data from a data frame in R.
My data frame has the temperature of 2244 locations:
plot temperature
A 12
B 12.5
C 15
... ...
I would like to create a matrix that shows the temperature difference between each pair of locations:
.   A    B    C
A   0    0.5  3
B   0.5  0    2.5
C   3    2.5  0
This is what I have come up with in R:
temp_data #my data frame with the two columns: location and temperature
temp_dist<-matrix(data=NA, nrow=length(temp_data[,1]), ncol=length(temp_data[,1]))
temp_dist<-as.data.frame(temp_dist)
names(temp_dist)<-as.factor(temp_data[,1]) #the locations are numbers in my data
rownames(temp_dist)<-as.factor(temp_data[,1])
for (i in 1:2244)
{
  for (j in 1:2244)
  {
    temp_dist[i,j]<-abs(temp_data[i,2]-temp_data[j,2])
  }
}
I have tried the code with a small sample with:
for (i in 1:10)
and it works fine.
My problem is that the computer has now been running for two full days and still hasn't finished.
I was wondering if there is a quicker way of doing this. I am aware that nested loops take a lot of time, and since I am filling in a matrix of more than 5 million cells it makes sense that it takes long. Still, I am hoping there is a function that gets the same result faster, as I have to do the same with precipitation and other variables.
I have also read about dist, but I am unsure if with the data frame I have I can use that formula.
I would very much appreciate your collaboration.
Many thanks.
Are you perhaps just looking for the following?
out <- dist(temp_data$temperature, upper=TRUE, diag=TRUE)
out
# 1 2 3
# 1 0.0 0.5 3.0
# 2 0.5 0.0 2.5
# 3 3.0 2.5 0.0
If you want different row/column names, it seems you have to convert this to a matrix first:
out_mat <- as.matrix(out)
dimnames(out_mat) <- list(temp_data$plot, temp_data$plot)
out_mat
# A B C
# A 0.0 0.5 3.0
# B 0.5 0.0 2.5
# C 3.0 2.5 0.0
Or just as an alternative from the toolbox:
m <- with(temp_data, abs(outer(temperature, temperature, "-")))
dimnames(m) <- list(temp_data$plot, temp_data$plot)
m
# A B C
# A 0.0 0.5 3.0
# B 0.5 0.0 2.5
# C 3.0 2.5 0.0
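For what it's worth, the original loop is most likely slow because every temp_dist[i,j] <- ... assigns into a data frame, which is far more expensive than assigning into a plain matrix. Both vectorised approaches avoid that entirely. A small self-contained sketch on the three example rows, checking that they agree; the names m_outer and m_dist are mine:

```r
temp_data <- data.frame(plot = c("A", "B", "C"),
                        temperature = c(12, 12.5, 15))

labels <- as.character(temp_data$plot)

# All pairwise absolute differences in one vectorised call
m_outer <- abs(outer(temp_data$temperature, temp_data$temperature, "-"))
dimnames(m_outer) <- list(labels, labels)

# Same result via dist(); for a single variable the Euclidean
# distance is just the absolute difference
m_dist <- as.matrix(dist(temp_data$temperature))
dimnames(m_dist) <- list(labels, labels)

# Stops with an error if the two results differ
stopifnot(isTRUE(all.equal(m_outer, m_dist)))
m_outer
```

The same pattern should carry over to precipitation or any other single numeric column.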

generate an output from a calculation between 2 columns in R

I have a data set representing movement through a 2d environment with respect to time:
time(s) start_pos fwd_dist rev_dist end_pos
1 0.0 4.0 -3.0 2.0
2 2.0 5.1 0.5 3.0
3 3.0 4.7 -0.5 3.5
4 3.5 3.6 -1.8 2.1
5 2.1 2.6 -2.1 1.0
6 1.0 1.5 -1.5 -0.2
I want to make another column which is the result of a check to see which is larger between "end_pos" and "start_pos" and subtracting the larger number from "fwd_dist". I'm trying to loop through the dataset but seem to be struggling with the syntax in R
i<-0
while (i < length(data[,1])) {if (data[i,4] > data[i,1]) {print(data[i,2]-data[i,4])} else {print(data[i,2]-data[i,1])}; i<-i+1}
I keep getting the error:
Error in if (data[i, 4] > data[i, 1]) { :
argument is of length zero
pmax(start_pos,end_pos)
will give you the parallel maximum (i.e., componentwise) of two vectors. So you are probably looking for
fwd_dist-pmax(start_pos,end_pos)
A data frame based approach:
data$difference <- data$fwd_dist - pmax(data$start_pos, data$end_pos)
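As for the error itself: R indexes from 1, so with i starting at 0 the expression data[0, 4] is a zero-length vector, and if() cannot test a zero-length condition. A minimal sketch with the example data, assuming time(s) is an ordinary column that I have named time here:

```r
data <- data.frame(
  time      = 1:6,
  start_pos = c(0.0, 2.0, 3.0, 3.5, 2.1, 1.0),
  fwd_dist  = c(4.0, 5.1, 4.7, 3.6, 2.6, 1.5),
  rev_dist  = c(-3.0, 0.5, -0.5, -1.8, -2.1, -1.5),
  end_pos   = c(2.0, 3.0, 3.5, 2.1, 1.0, -0.2)
)

# Row 0 does not exist, so this has length 0; that is what
# triggers "argument is of length zero" inside if()
length(data[0, "end_pos"])  # 0

# Vectorised version: no loop and no index bookkeeping needed
data$difference <- data$fwd_dist - pmax(data$start_pos, data$end_pos)
data$difference
```

If you do want an explicit loop, for (i in seq_len(nrow(data))) starts at 1 and avoids the problem.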

R - moving window comparison with datasets of unequal size

I need to compare a large set of values to a small set and find the minimum difference between the two. Maybe this is “moving window” comparison? I’ve looked at several time series packages but can’t find (or recognize) a function that compares data sets of different sizes. Text example below. Any help is greatly appreciated.
----------1st comparison-----------
Time  S1   S2   Diff  Mean Diff
1     1.3  1.2   0.1
2     1.7  1.6   0.1  0.10
3     1.2
4     1.6
----------2nd comparison------------
1     1.3
2     1.7  1.2   0.5
3     1.2  1.6  -0.4  0.05
4     1.6
----------3rd comparison------------
1     1.3
2     1.7
3     1.2  1.2   0.0
4     1.6  1.6   0.0  0.00  <- minimum difference
What about something like this:
require(zoo)
S1 <- c(1.3,1.7,1.2,1.6)
S2 <- c(1.2,1.6)
We can use rollapply to apply a function rolling along a vector. The width is set at the size of the smaller comparison vector. We then use an anonymous function to pass the values from our large vector, S1, as the variable x from which we then subtract the values from the small vector and take the mean. We can then use min to return the smallest value:
> min( rollapply( S1 , width = 2 , function(x) mean(x-S2) ) )
[1] 0
It's hard to make it more generalisable without the structure of your data
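If you would rather not depend on zoo, the same sliding comparison can be sketched in base R. Note that I'm assuming "minimum difference" means the mean difference closest to zero, so I take the absolute value before picking the minimum; drop the abs() if you really want the most negative value. The names w, starts, mean_diff and best are mine:

```r
S1 <- c(1.3, 1.7, 1.2, 1.6)
S2 <- c(1.2, 1.6)

# Window width is the length of the shorter vector
w <- length(S2)

# Every position in S1 where a full window still fits
starts <- seq_len(length(S1) - w + 1)

# Mean difference between each window of S1 and S2
mean_diff <- sapply(starts, function(i) mean(S1[i:(i + w - 1)] - S2))

# Start position of the window whose mean difference is closest to zero
best <- starts[which.min(abs(mean_diff))]

mean_diff
best
```

Here best is the start position in S1 of the best-matching window, which also tells you where the match occurs and not just how good it is.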
