Suppose I have the following data:
a<- c(1:10)
b<- c(10:1)
Now I want to make a consecutive (rolling) calculation over a window of variable length (2 in this example) on both vectors (a and b) and save the output in two separate lists (a and b).
The calculation should look like the following:
for a:
(1+2)/2; (2+3)/2; (3+4)/2;...; (9+10)/2
for b(the same):
(10+9)/2; (9+8)/2; (8+7)/2;...;(2+1)/2
The expected output is:
a
1,5 2,5 3,5 ... 9,5
b
9,5 8,5 7,5 ... 1,5
I found this function in StackOverflow:
v <- c(1, 2, 3, 10, 20, 30)
grp <- 3
res <- sapply(1:(length(v)-grp+1),function(x){sum(v[x:(x+grp-1)])})
This pretty much does what I need, but I would prefer a function that does it without sapply, using just base R.
Any help would be appreciated!
You can do this in base R:
f = function(x) (head(x,-1) + tail(x,-1))/2
list(a=f(a), b=f(b))
#$a
#[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
#$b
#[1] 9.5 8.5 7.5 6.5 5.5 4.5 3.5 2.5 1.5
Or, if you want an apply-style rolling function, use rollapply from the zoo package:
library(zoo)
list(a=rollapply(a,2, mean), b=rollapply(b,2, mean))
sapply is really not recommended here, but if you want to use it (just as a test!):
sapply(1:(length(a)-1), function(i) mean(a[i:(i+1)]))
#[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
#same for b
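The question asks for a window of variable length, while the head/tail approach above is fixed at 2. A minimal base R sketch for an arbitrary window length k, using differences of cumulative sums, could look like this. The name roll_mean is hypothetical and not part of base R.

```r
# roll_mean is a hypothetical helper name (an assumption, not a base R function)
roll_mean <- function(x, k = 2) {
  n <- length(x)
  stopifnot(k >= 1, k <= n)
  cs <- c(0, cumsum(x))            # cs[j + 1] holds sum(x[1:j])
  # sum of x[i:(i + k - 1)] equals cs[i + k] - cs[i]
  (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
}

a <- 1:10
b <- 10:1
res <- list(a = roll_mean(a, 2), b = roll_mean(b, 2))
res
```

Multiplying the result by k gives the rolling sum from the sapply snippet in the question.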
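# stats::filter computes a moving sum; the namespace is spelled out in case dplyr::filter is loaded
na.omit(stats::filter(a, c(1,1))/2)
na.omit(stats::filter(b, c(1,1))/2)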
You could try this:
d1 <- ((a + a[seq_along(a)+1])/2)[-length(a)]
#[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
and
d2 <- ((b + b[seq_along(b)+1])/2)[-length(b)]
#[1] 9.5 8.5 7.5 6.5 5.5 4.5 3.5 2.5 1.5
The trailing [-length(a)] and [-length(b)] drop the NA produced at the end of the sequence, where the shifted index runs one position past the end of the vector.
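If a and b have the same length:
list1 <- numeric(length(a) - 1)  # preallocate the result vectors
list2 <- numeric(length(b) - 1)
for(i in seq_len(length(a) - 1))
{
list1[i] <- (a[i] + a[i+1])/2
list2[i] <- (b[i] + b[i+1])/2
}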
> list1
#[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
> list2
#[1] 9.5 8.5 7.5 6.5 5.5 4.5 3.5 2.5 1.5
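Otherwise, write a separate loop for each vector:
list1 <- numeric(length(a) - 1)  # preallocate the result vectors
list2 <- numeric(length(b) - 1)
for(i in seq_len(length(a) - 1))
{
list1[i] <- ((a[i] + a[i+1])/2)
}
for(i in seq_len(length(b) - 1))
{
list2[i] <- ((b[i] + b[i+1])/2)
}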
My data frame is based on a 0.25 degree dataset and contains latitudes, longitudes, and the corresponding temperatures. I now want to change the resolution from 0.25 to 0.5. For example, the latitudes and longitudes in my data frame are 70.5, 70.25, 70, 69.75, 69.5, ..., and I only need the coordinates whose decimal part is 0 or 0.5, such as 70.5, 70, 69.5, 69, ... How can I do that easily?
We can use round_any from plyr:
library(plyr)
unrounded <- c(runif(10)*10)
> unrounded
[1] 9.796907 4.237637 4.758592 1.109172 5.037765 3.077775 7.616236 3.872094
[9] 3.471238 8.831574
rounded <- round_any(unrounded, 0.5)
> rounded
[1] 10.0 4.0 5.0 1.0 5.0 3.0 7.5 4.0 3.5 9.0
For a data.frame, apply it to the column and wrap the result back into a data.frame:
unrounded2 <- data.frame(x = c(runif(10)*10))
> unrounded2
x
1 6.1078737
2 1.8496701
3 3.5469245
4 9.7893189
5 0.5503520
6 8.4338650
7 2.5316328
8 0.1954177
9 4.0447613
10 7.9741839
rounded2 <- data.frame(x= round_any(unrounded2$x, 0.5))
> rounded2
x
1 6.0
2 2.0
3 3.5
4 10.0
5 0.5
6 8.5
7 2.5
8 0.0
9 4.0
10 8.0
You can round to the nearest 0.5 by multiplying by two, rounding, and then dividing by two.
set.seed(1)
x <- c(runif(10)*10)
x
# [1] 2.6550866 3.7212390 5.7285336 9.0820779 2.0168193 8.9838968 9.4467527 6.6079779 6.2911404 0.6178627
round(x * 2)/2
# [1] 2.5 3.5 5.5 9.0 2.0 9.0 9.5 6.5 6.5 0.5
As part of a data.frame
d <- data.frame(lon=x)
d$lon <- round(d$lon * 2) / 2
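The multiply, round, divide trick generalizes to any grid resolution. The sketch below assumes a hypothetical helper round_to and made-up coordinates that are not from the question. Values lying exactly halfway between two grid points follow round()'s own tie handling, so the sample values avoid ties.

```r
# round_to is a hypothetical helper name; it is not part of base R or plyr
round_to <- function(x, res) round(x / res) * res

# made-up coordinates for illustration only
d <- data.frame(lat = c(70.4, 70.2, 69.9, 69.7),
                lon = c(10.1, 10.3, 10.6, 10.8))
d$lat <- round_to(d$lat, 0.5)
d$lon <- round_to(d$lon, 0.5)
d
```

With res = 0.5 this is the same computation as round(x * 2) / 2.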
I am trying to write a function that replaces each negative value in a vector with that value divided by two.
# replacement function for a vector
rep <- function(x)
x.half <- {abs(replace(x, which(x<0),x/2))}
But I know that this function does not work properly, because if I simulate a vector with negative and positive numbers I get the wrong result:
a <- c(1,-1,2,-2,3,-3,4,-4,5,-5,11,-11,12,-12,13,-13,21,-21,25,-25)
a.rep <- rep(a)
# data frame to test
test <- cbind(a,a.rep)
Also, when I apply this function to vector a, I get a warning message like this:
Warning message:
In replace(x, which(x < 0), x/2) :
number of items to replace is not a multiple of replacement length
Obviously, there is something wrong with my function.
The third argument to replace() must have the same length as the set of values selected by the second argument.
## determine which values are below zero
a0 <- a < 0
## replace them with their halved values
replace(a, a0, a[a0] / 2)
[1] 1.0 -0.5 2.0 -1.0 3.0 -1.5 4.0 -2.0 5.0 -2.5 11.0
[12] -5.5 12.0 -6.0 13.0 -6.5 21.0 -10.5 25.0 -12.5
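To see where the warning in the question comes from, it may help to compare the two lengths directly. This is a small sketch on a shortened vector; the names neg and fixed are made up for illustration.

```r
a <- c(1, -1, 2, -2)
neg <- which(a < 0)       # positions to be replaced

length(neg)               # 2 positions to fill
length(a / 2)             # 4 replacement values: the mismatch behind the warning
length(a[neg] / 2)        # 2 replacement values: lengths now agree

fixed <- replace(a, neg, a[neg] / 2)
fixed
```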
Although the question has been answered already, I felt challenged to add an arithmetical solution.
The expression
(sign(a) + 3) / 4 * a
will return
# [1] 1.0 -0.5 2.0 -1.0 3.0 -1.5 4.0 -2.0 5.0 -2.5 11.0 -5.5 12.0 -6.0 13.0 -6.5
#[17] 21.0 -10.5 25.0 -12.5
as requested.
How it works
The requirement is that
negative numbers should be multiplied by 1/2,
positive numbers should not be changed.
This can be translated to
if sign(a) == -1 then multiply a by 1/2
if sign(a) == +1 then multiply a by 1
Now, we need to find a linear function y = f(x) = p * x + q which satisfies the equations
f(-1) = -p + q = 1/2
f(1) = p + q = 1.
After solving for p and q we get f(x) = 1/4 * x + 3/4. With x = sign(a), the factor to multiply each element of a with is:
1/4 * sign(a) + 3/4
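As a quick sanity check of the derivation, the multiplier can be compared against a plain conditional. This is only a sketch; the names multiplier, arithmetic, and conditional are made up for illustration. A zero is included because sign(0) is 0, which gives a factor of 3/4 but still leaves the value at 0.

```r
a <- c(1, -1, 2, -2, 0, 25, -25)

multiplier <- 1/4 * sign(a) + 3/4   # 1 for positives, 1/2 for negatives, 3/4 for zero
arithmetic <- multiplier * a
conditional <- ifelse(a < 0, a / 2, a)

all.equal(arithmetic, conditional)
```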
You could use ifelse():
half_if_neg <- function(x) {ifelse(x < 0, x / 2, x)}
#sapply(a, half_if_neg)
half_if_neg(a)
We can also do the assignment
i1 <- a < 0
a[i1] <- a[i1]/2
Or using pmax:
pmax(a, (a*NA^(a >=0))/2, na.rm = TRUE)
#[1] 1.0 -0.5 2.0 -1.0 3.0 -1.5 4.0 -2.0 5.0 -2.5 11.0
#[12] -5.5 12.0 -6.0 13.0 -6.5 21.0 -10.5 25.0 -12.5
When subsetting with '[' ']', I normally expect an error or at least a warning when the index does not match the dimensions of the parent object. However, I have just discovered that I am not seeing any such warnings or errors. Is there a setting for this, or a way to force an error? Example:
x = 1:5
y = 10:12
x[y>10]
y[x>2]
Likewise, this applies to data frames and other R objects:
dat = data.frame(x=runif(100),y=1:100)
# replace = TRUE is required to draw 23 values from a population of two
dat[sample(c(TRUE,FALSE),23,replace=TRUE),c(TRUE,FALSE)]
The silent recycling and truncation of the index to match the dimensions of the parent object is unexpected. Having used R for years, I've somehow never noticed this before.
I'm using R Console (64-bit) 3.0.1 for Windows (could be updated yes, but I hope this isn't the cause).
Edit: Fixed data.frame example as data.frame's don't allow more column references than columns. Thanks zero323.
You could modify the `[.data.frame` function to throw a warning when indexing with a logical vector that doesn't evenly divide the number of rows:
`[.data.frame` <- function(x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1) {
if (!missing(i) && is.logical(i) && nrow(x) %% length(i) != 0) {
warning("Indexing data frame with logical vector that doesn't evenly divide row count")
}
base::`[.data.frame`(x, i, j, drop)
}
Here's a demonstration with the 150-row iris dataset, passing logical indexing vectors of length 11 (should cause warning) and 15 (should not cause warning):
iris[c(rep(FALSE, 10), TRUE),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 11 5.4 3.7 1.5 0.2 setosa
# 22 5.1 3.7 1.5 0.4 setosa
# 33 5.2 4.1 1.5 0.1 setosa
# 44 5.0 3.5 1.6 0.6 setosa
# 55 6.5 2.8 4.6 1.5 versicolor
# 66 6.7 3.1 4.4 1.4 versicolor
# 77 6.8 2.8 4.8 1.4 versicolor
# 88 6.3 2.3 4.4 1.3 versicolor
# 99 5.1 2.5 3.0 1.1 versicolor
# 110 7.2 3.6 6.1 2.5 virginica
# 121 6.9 3.2 5.7 2.3 virginica
# 132 7.9 3.8 6.4 2.0 virginica
# 143 5.8 2.7 5.1 1.9 virginica
# Warning message:
# In `[.data.frame`(iris, c(rep(FALSE, 10), TRUE), ) :
# Indexing data frame with logical vector that doesn't evenly divide row count
iris[c(rep(FALSE, 14), TRUE),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 15 5.8 4.0 1.2 0.2 setosa
# 30 4.7 3.2 1.6 0.2 setosa
# 45 5.1 3.8 1.9 0.4 setosa
# 60 5.2 2.7 3.9 1.4 versicolor
# 75 6.4 2.9 4.3 1.3 versicolor
# 90 5.5 2.5 4.0 1.3 versicolor
# 105 6.5 3.0 5.8 2.2 virginica
# 120 6.0 2.2 5.0 1.5 virginica
# 135 6.1 2.6 5.6 1.4 virginica
# 150 5.9 3.0 5.1 1.8 virginica
Expanding on @josilber's answer, I've written the following for atomic vector and matrix subsetting in case anyone else wants it:
`[` <- function(x, i) {
if(!missing(i) && is.logical(i) && (length(x) %% length(i) != 0 || length(i) > length(x))) {
warning("Indexing atomic vector with logical vector that doesn't evenly divide row count")
}
base::`[`(x,i)
}
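# Matrix version; note that defining it replaces the vector version above
`[` <- function(x,i,j,...,drop=TRUE) {
if (!missing(i) && is.logical(i) && nrow(x) %% length(i) != 0) {
warning("Indexing matrix with logical vector that doesn't evenly divide row count")
}
# columns are checked against ncol(x), not nrow(x)
if (!missing(j) && is.logical(j) && ncol(x) %% length(j) != 0) {
warning("Indexing matrix with logical vector that doesn't evenly divide column count")
}
# drop must be passed by name, otherwise it is treated as another index
base::`[`(x,i,j,...,drop=drop)
}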
Testing my original example afterwards with this modification now produces the warning and other operations behave as per normal:
> x = 1:5
> y = 10:12
> x[y>10]
[1] 2 3 5
Warning message:
In x[y > 10] :
Indexing atomic vector with logical vector that doesn't evenly divide row count
> y[x>2]
[1] 12 NA NA
Warning message:
In y[x > 2] :
Indexing atomic vector with logical vector that doesn't evenly divide row count
> x[x>2]
[1] 3 4 5
> x[1:2]
[1] 1 2
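A less invasive alternative to redefining `[` is a small wrapper that refuses any logical index whose length differs from the object's length. This is only a sketch, and strict_subset is a hypothetical name rather than an existing R function. It turns the silent recycling into an error for the calls that go through it and leaves ordinary `[` untouched everywhere else.

```r
# strict_subset is a hypothetical helper name, not part of base R
strict_subset <- function(x, i) {
  if (is.logical(i) && length(i) != length(x)) {
    stop("logical index has length ", length(i),
         " but the object has length ", length(x))
  }
  x[i]
}

x <- 1:5
y <- 10:12

ok <- strict_subset(x, x > 2)   # lengths match, behaves like x[x > 2]

# lengths differ, so this call raises an error instead of recycling
msg <- tryCatch(strict_subset(x, y > 10),
                error = function(e) conditionMessage(e))
ok
msg
```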
I have a 207x7 xts object (called temp). I have a 207x3 matrix (called ac.topn), each row of which contains the columns I'd like from the corresponding row in the xts object.
For example, given the following top two rows of temp and ac.topn,
temp
v1 v2 v3 v4 v5 v6 v7
1997-09-30 14.5 8.7 -5.8 2.6 4.7 1.9 17.2
1997-10-31 6.0 -2.0 -25.7 2.9 4.9 9.6 8.4
head(ac.topn)
Rank1 Rank2 Rank3
1997-09-30 7 4 2
1997-10-31 6 5 7
I would like to get the result:
1997-09-30 17.2 2.6 8.7 (elements 7, 4, and 2 from the first row of temp)
1997-10-31 9.6 4.9 8.4 (elements 6, 5, 7 from the second row of temp)
My first attempt was temp[,ac.topn]. I've browsed for help, but am struggling to word my request effectively.
Thank you.
Well, this works, but I've got to think there's a better way...
result <- do.call(rbind,lapply(index(temp),function(i)temp[i,ac.topn[i]]))
colnames(result) <- colnames(ac.topn)
result
# Rank1 Rank2 Rank3
# 1997-09-30 17.2 2.6 8.7
# 1997-10-31 9.6 4.9 8.4
You may subset a matrix version of the xts object, using indexing via a numeric matrix:
m <- as.matrix(temp)
cols <- as.vector(ac.topn)
rows <- rep(1:nrow(ac.topn), ncol(ac.topn))
vals <- m[cbind(rows, cols)]
xts(x = matrix(vals, nrow = nrow(temp)), order.by = index(temp))
# [,1] [,2] [,3]
# 1997-09-30 17.2 2.6 8.7
# 1997-10-31 9.6 4.9 8.4
However, I say the same as @jlhoward: I've got to think there's a better way...
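The matrix indexing step can be tried without xts. The sketch below rebuilds the two rows shown in the question as plain matrices, which is an assumption about the real objects, and uses row() to generate the row numbers. The names idx and picked are made up for illustration.

```r
# the two rows from the question, rebuilt as plain matrices (assumed layout)
temp <- matrix(c(14.5,  8.7,  -5.8, 2.6, 4.7, 1.9, 17.2,
                  6.0, -2.0, -25.7, 2.9, 4.9, 9.6,  8.4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("1997-09-30", "1997-10-31"), paste0("v", 1:7)))
ac.topn <- matrix(c(7, 4, 2,
                    6, 5, 7),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(rownames(temp), paste0("Rank", 1:3)))

# a two-column matrix of (row, column) pairs selects one element per pair
idx <- cbind(as.vector(row(ac.topn)), as.vector(ac.topn))
picked <- matrix(temp[idx], nrow = nrow(ac.topn), dimnames = dimnames(ac.topn))
picked
```

Both as.vector() calls unroll their matrices column by column, so each row number stays paired with the right column number.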
I have a data frame laid out in the following manner:
Species Trait.p Trait.y Trait.z
a 20.1 7.2 14.1
b 20.4 8.3 15.2
b 19.2 6.8 13.9
For each species combination, I would like to compute (Xa) - (Xb), where X is the trait value, the letter is the species, and Xa > Xb. That is, the larger value of each species pair has to come first, and the difference is calculated for every trait.
Would this be a multi-step process?
An example output could be
Combination Trait.p Trait.y Trait.z
a/b 0.3 1.1 1.1
I assumed you choose the largest value, but David brings up a good point. I doubt this is the best approach, but I think it gives you what you're after. Note that I added a species c, as I'm sure your problem is a bit more complex than just a and b:
dat <- read.table(text="Species Trait.p Trait.y Trait.z
a 20.1 7.2 14.1
b 20.4 8.3 15.2
b 19.2 6.8 13.9
c 14.2 3.8 11.9", header=TRUE)
li <- lapply(split(dat, dat$Species), function(x) apply(x[, -1], 2, max))
com <- expand.grid(names(li), names(li))
inds <- com[com[, 1] != com[, 2], ]
inds <- t(apply(inds, 1, sort))
inds <- inds[!duplicated(inds), ]
ans <- lapply(1:nrow(inds), function(i) {
abs(li[[inds[i, 1]]]-li[[inds[i, 2]]])
})
cbind(Combination = paste(inds[, 1], inds[, 2], sep="/"),
as.data.frame(do.call(rbind, ans)))
This gives us:
Combination Trait.p Trait.y Trait.z
1 a/b 0.3 1.1 1.1
2 a/c 5.9 3.4 2.2
3 b/c 6.2 4.5 3.3
Sorry for the lack of annotation but I'm heading to class.
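Since the answer above comes without annotation, here is a commented sketch of the same idea using aggregate() and combn() instead of expand.grid(). It is a swapped-in alternative, not the answerer's code, and it keeps the same assumption that the largest value per species is the one to compare. The names mx, traits, pairs, diffs, and res are made up for illustration.

```r
dat <- read.table(text = "Species Trait.p Trait.y Trait.z
a 20.1 7.2 14.1
b 20.4 8.3 15.2
b 19.2 6.8 13.9
c 14.2 3.8 11.9", header = TRUE)

# largest value of every trait within each species
mx <- aggregate(. ~ Species, data = dat, FUN = max)
traits <- as.matrix(mx[, -1])
rownames(traits) <- as.character(mx$Species)

# every unordered pair of species, one pair per column
pairs <- combn(rownames(traits), 2)

# absolute difference per trait, so the larger value effectively comes first
diffs <- t(apply(pairs, 2, function(p) abs(traits[p[1], ] - traits[p[2], ])))

res <- data.frame(Combination = paste(pairs[1, ], pairs[2, ], sep = "/"), diffs)
res
```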