Return attributes of pmax function output

I have the following numeric vectors x and y
x <- c(a=1,b=2,c=3)
y <- c(d=2,e=1,f=4)
I want to find the parallel maximum of each pair of elements in the vectors, so I used:
> pmax(x,y)
a b c
2 2 4
The output has the right values, but the wrong names. The documentation for pmax mentions that it returns the attributes of the first argument, hence the a b c. Is there a way of getting the names of the maximum values? The desired output is as follows:
d b f
2 2 4

One option is to use max.col to find the index of the maximum value in each row. For that, we create a matrix by cbinding the vectors ('xy') and their names ('nmxy'), build a row/column index ('ij'), subset the elements of 'xy' with it, and set the names from 'nmxy'. Note that max.col breaks ties at random by default; pass ties.method = "first" if ties should always resolve to x.
xy <- cbind(x,y)
nmxy <- cbind(names(x), names(y))
ij <- cbind(1:nrow(xy), max.col(xy))
setNames(xy[ij], nmxy[ij])
# d b f
# 2 2 4

Let
r <- pmax(x,y)
Then rename the elements where y supplied the maximum:
names(r)[y == r] <- names(y)[y == r]
If you want to be fancy, you can mask the pmax function with a wrapper that produces the desired output. Note that on ties this takes the name from y.
old.pmax <- pmax
pmax <- function(x,y){
  r <- old.pmax(x,y)
  names(r)[y == r] <- names(y)[y == r]
  return(r)
}
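If masking a base function feels risky, a separate helper is one alternative. The following is only a minimal sketch and not part of the original answers: pmax_named is a hypothetical name, and it assumes every input is a named vector of the same length. It reuses the max.col idea with ties.method = "first", so on ties the name comes from the earliest argument.

```r
# Hypothetical helper: parallel maximum that keeps the name of the winning element.
# Assumes all inputs are named vectors of equal length.
pmax_named <- function(...) {
  vecs <- list(...)
  vals <- do.call(cbind, vecs)                 # one column per input vector
  nms  <- do.call(cbind, lapply(vecs, names))  # matching matrix of names
  # (row, column) index of the first maximum in each row
  idx  <- cbind(seq_len(nrow(vals)), max.col(vals, ties.method = "first"))
  setNames(vals[idx], nms[idx])
}

x <- c(a = 1, b = 2, c = 3)
y <- c(d = 2, e = 1, f = 4)
res <- pmax_named(x, y)
res
# d b f
# 2 2 4
```

Because the helper has its own name, base pmax stays untouched for any other code that relies on it.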

Related

Tables and bins from two vectors in R

As an exercise I was given two samples, u and v, generated from a seed, and asked to show how many of the values that are in v but not in u fall into the bins [1,50] and [51,100]. Then I am asked to add a line of code to confirm my answer using a relational operator (like >) and sum().
I solved the first part:
table(findInterval(setdiff(v,u),c(50)))
But for the second part, I don't really get what I need to do; any help is appreciated!
Example:
set.seed(1201)
u = sample(100,100,replace=TRUE)
v = sample(100,100,replace=TRUE)
table(findInterval(setdiff(v,u),c(50)))
Output:
0 1
12 12

If we want to use a comparison operator and sum, create a logical vector and take its sum (TRUE counts as 1)
i1 <- v[!v %in% u] > 50
sum(i1)
sum(!i1)
Note: If the OP intended to use only unique values (as in setdiff), then take the unique values first
i1 <- unique(v[!v %in% u]) > 50
out1 <- sum(i1)
out2 <- sum(!i1)
Checking against the output of table (which lists the lower bin first, so out2 comes before out1)
tbl1 <- table(findInterval(setdiff(v,u),c(50)))
all.equal(as.numeric(tbl1), c(out2, out1), check.attributes = FALSE)
#[1] TRUE

Since you are splitting the values at a single cut point, you can verify your answer using > directly.
This is your code
set.seed(1201)
u = sample(100,100,replace=TRUE)
v = sample(100,100,replace=TRUE)
table(findInterval(setdiff(v,u),50))
#0 1
#9 9
Without findInterval
table(setdiff(v,u) > 50)
#FALSE TRUE
# 9 9
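One caveat worth checking, sketched here with a small made-up vector (vals is hypothetical, not the OP's data): findInterval(x, 50) treats the bins as closed on the left, so a value of exactly 50 lands in bin 1, whereas x > 50 puts it in the lower group. For integer data, using 51 as the break point makes the two agree with the bins [1,50] and [51,100].

```r
# Made-up values that straddle the boundary
vals <- c(10, 50, 51, 90)

bins_at_50 <- findInterval(vals, 50)  # 0 1 1 1: the value 50 falls in the upper bin
bins_at_51 <- findInterval(vals, 51)  # 0 0 1 1: matches [1,50] and [51,100]
upper      <- vals > 50               # FALSE FALSE TRUE TRUE

sum(upper)   # 2 values in [51,100]
sum(!upper)  # 2 values in [1,50]
```

Whether this matters for the OP's samples depends on whether 50 happens to be among the values in v but not in u.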

Define function that takes arguments from multiple vectors sequentially

See how addition works over components:
a<-1:3
a+a #Gives (1+1), (2+2), (3+3)
I've considered looping over the argument lengths, or transforming the arguments into a data.frame and then using apply, but I have the intuition there is a more efficient way of going about this.
Specifically, I'd like to calculate the mean of each set of components ignoring zero values, like so:
function(x) {
  mean(x[x!=0])
}
Except x would be the i-th components of an arbitrary number of arguments.

If I understand correctly, mapply or its wrapper Map would work fairly well here.
mapply(function(...) {temp <- c(...); mean(temp[temp != 0])}, 1:10, 11:20)
[1] 6 7 8 9 10 11 12 13 14 15
With mapply, the given function is applied to the collection of the first elements of each vector, then the collection of the second elements and so on. The function creates a new vector with c and then calculates the mean for all non-zero elements. This function returns an atomic vector.
Map(function(...) {temp <- c(...); mean(temp[temp != 0])}, 1:10, 11:20)
returns a list instead. This could be wrapped in unlist to return a vector.
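As a small illustration with inputs that actually contain zeros, here is a sketch using three made-up vectors (a, b, d and the name mean_nonzero are hypothetical, chosen only for this example):

```r
# Mean of the i-th elements of any number of vectors, ignoring zeros
mean_nonzero <- function(...) {
  vals <- c(...)
  mean(vals[vals != 0])
}

a <- c(1, 0, 3)
b <- c(3, 4, 0)
d <- c(0, 8, 9)

res <- mapply(mean_nonzero, a, b, d)
res
# position 1: mean(1, 3) = 2; position 2: mean(4, 8) = 6; position 3: mean(3, 9) = 6
```

If every element at some position is zero, the mean of an empty vector is NaN, which may or may not be the behaviour you want.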

If we need to add the vectors element by element
Reduce(`+`, listofvectors)
Or rbind (or cbind) them into a matrix and then take the colSums (or rowSums)
colSums(m1)
Update
Regarding the second part of the question (which is not entirely clear): if the goal is the mean of each individual vector in a list, excluding the 0 values
sapply(listofvectors, function(x) mean(x[x!=0]))
Or, if we need the mean across vectors at each position (the columns of the matrix created by rbinding the vectors), replace the 0 values with NA and get the colMeans with na.rm = TRUE
colMeans(replace(m1, m1==0, NA), na.rm = TRUE)
colMeans(replace(m2, m2==0, NA), na.rm = TRUE)
#[1] 6 7 8 9 10 11 12 13 14 15
NOTE: The colMeans and matrix approaches are vectorized, so no explicit R-level loop is needed.
data
a1 <- 1:5
b1 <- 6:10
c1 <- 11:15
listofvectors <- list(a1, b1, c1)
m1 <- rbind(a1, b1, c1)
m2 <- rbind(1:10, 11:20)
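The data above contain no zeros, so the NA replacement has no visible effect there. A minimal sketch with a made-up matrix (m is hypothetical) shows the zeros being skipped:

```r
# Each row is one input vector; zeros should not count towards the mean
m <- rbind(c(2, 0, 4),
           c(4, 6, 0),
           c(0, 0, 8))

# Turn zeros into NA, then average each column over the remaining values
res <- colMeans(replace(m, m == 0, NA), na.rm = TRUE)
res
# column 1: mean(2, 4) = 3; column 2: 6; column 3: mean(4, 8) = 6
```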

R Vectorize: Assign values to each row, diff col based on col index for every row

I have a matrix A with 40000 rows and 9 columns and a vector B with 40000 items.
Each item in B is a number from 1 to 9. For each row of A, I want to set the column given by the corresponding item in B to 1.
Right now, I'm using a for loop for it.
for(r in 1:40000){
  A[r,B[r]]=1
}
But is there a way to vectorize it?
Thanks

You could try
A[cbind(1:nrow(A), B)] <- 1
Checking the result against the OP's loop, run on A1, a copy of A made before the assignment (see data)
for(r in 1:nrow(A1)){
  A1[r, B[r]] <- 1
}
identical(A, A1)
#[1] TRUE
Here the index is a two-column matrix created with cbind, where each row is a (row, column) pair. From ?"[":
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
data
set.seed(24)
A <- matrix(sample(1:40, 25*9, replace=TRUE), ncol=9)
B <- sample(1:9, 25, replace=TRUE)
A1 <- A
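To see the matrix-indexing idea in isolation, here is a minimal sketch on a tiny made-up case (the names cols and onehot are hypothetical): starting from a zero matrix, each row gets a single 1 in the column named by the index vector.

```r
cols   <- c(2, 1, 3, 3)                        # target column for each row
onehot <- matrix(0, nrow = length(cols), ncol = 3)

# Each row of the index matrix is a (row, column) pair to overwrite
onehot[cbind(seq_along(cols), cols)] <- 1
onehot
#      [,1] [,2] [,3]
# [1,]    0    1    0
# [2,]    1    0    0
# [3,]    0    0    1
# [4,]    0    0    1
```

seq_along(cols) is used instead of 1:length(cols) so that an empty index vector does not produce the sequence 1, 0.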

Don't understand how apply gets its parameters in r

I am struggling to make my apply() work: I have two dataframes:
from <- c(1,2,3)
to <- c(2,3,4)
df1 <- data.frame(from, to)
long <-c(9,9.2,9.4,9.6)
lat <- c(45,45.2,45.4,45.6)
id <- c(1,2,3,4)
df2 <- data.frame(long, lat, id)
Now I want something like this:
myFunction <- function(arg){
  # >>> How do I access arg$from and arg$to? <<<
}
apply(df1,1,myFunction)
In myFunction I need to make some calculations and return a value for each from-to pair. I don't understand how to access parts of the arg, since arg[0] gives me numeric(0) and arg$from just crashes.

The problem is that apply(...) expects a matrix or array as its first argument. If you pass a dataframe, it is coerced to a matrix, and each row is then handed to your function as a plain vector. R vectors are 1-indexed, so the first element is arg[1], not arg[0] (which returns an empty vector). Also, elements of an atomic vector cannot be accessed with the $ notation; use a position such as arg[1] or a name such as arg["from"] instead.
So,
f <- function(x) {
  from <- x[1]
  to <- x[2]
  # do stuff with from and to...
}
apply(df1,1,f)
would work.
One other thing to watch out for is that if your dataframe has (other) columns that contain character strings, the conversion will make everything character (including the numbers!). This is because, by definition, all elements of a matrix must have the same data type. Your example does not have that problem, though.
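Both points can be checked with a short sketch. The data frame df_mixed and the function bodies are made up for illustration; df1 is rebuilt here so the example is self-contained.

```r
df1 <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))

# Each row arrives as a named numeric vector, so access by name works
span <- apply(df1, 1, function(row) row[["to"]] - row[["from"]])
span  # every from-to pair differs by 1

# With a character column present, every row becomes a character vector
df_mixed <- data.frame(from = c(1, 2), label = c("a", "b"))
row_classes <- apply(df_mixed, 1, function(row) class(row))
row_classes  # "character" for each row
```

If the columns have mixed types, going column-wise with mapply or Map over the individual columns avoids the coercion altogether.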

Try mapply(). It's a multivariate version of sapply(). For example:
> myFunction <- function(arg1, arg2){
+ return(sum(arg1, arg2))
+ }
>
> mapply(myFunction, df1$from, df1$to)
[1] 3 5 7
You can also use it to make a new variable in your data frame.
> df1$newvar <- mapply(myFunction, df1$from, df1$to)
> df1
  from to newvar
1    1  2      3
2    2  3      5
3    3  4      7

Sum object in a column between an interval defined by another dataframe

I am trying to obtain the sum of values of a column (B) based on the interval between two values on another column (A) in a "reference" dataframe (df):
A <- seq(1:10)
B <- c(4,3,5,7,5,7,4,7,3,7)
df <- data.frame(A,B)
I have found two ways of doing this:
y <- sum(subset(df, A < 3 & A >= 1, select = "B"))
> y
[1] 7
and
z <- with(df,sum(df[A<3 & A>=1,"B"]))
> z
[1] 7
However, I would like to do this based on two vectors of values stored in another dataframe
C <- c(3,7,7)
D <- c(1,1,5)
df2 <- data.frame(C,D)
to obtain a column of y values for each pair of C and D values.
I have created a function:
myfn <- function(c,d) {
  y <-sum(subset(df, A < c & A >= d, select = "B"))
  return(y)
}
Which works fine with numbers
myfn(3,1)
[1] 7
but not with vectors.
myfn(c=C,d=D)
[1] 19
Warning messages:
1: In A < a :
longer object length is not a multiple of shorter object length
2: In A >= b :
longer object length is not a multiple of shorter object length
> myfn(df2$C,df2$D)
[1] 19
Warning messages:
1: In A < a :
longer object length is not a multiple of shorter object length
2: In A >= b :
longer object length is not a multiple of shorter object length
>
Does anyone have a suggestion for how I could calculate such interval sums for a sequence of values?

Try:
mapply(myfn, C, D)
# [1] 7 31 12
The problem is that your function is not vectorized over c and d: sum collapses its input to a single value, so myfn always returns one number no matter how many bounds you pass in.
Beyond that, the expression A < c & A >= d doesn't do what you want when c and d have more than one value. There, each value of A is compared to the corresponding (recycled) value of C and D (first to first, second to second, and so on), instead of all the values of A being compared to each pair of bounds in turn.
By using mapply, I'm basically looping over your function, passing it a single value from C and from D at a time.
Fortunately, in your case the length of A (10) is not a multiple of the length of C and D (3), so you got a warning. Had the lengths been compatible, you would have gotten no warning and still a single value, instead of the three you are presumably looking for.
There are better ways to do this, but the mapply approach is simple here and works with your code pretty much as is.
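One loop-free possibility, offered only as a sketch under the question's data: build a logical matrix with outer that marks which rows of df fall into which interval, then weight it by B and sum each column. The names upper, lower and in_range are hypothetical; upper and lower play the roles of C and D.

```r
df <- data.frame(A = 1:10, B = c(4, 3, 5, 7, 5, 7, 4, 7, 3, 7))
upper <- c(3, 7, 7)   # exclusive upper bounds (C in the question)
lower <- c(1, 1, 5)   # inclusive lower bounds (D in the question)

# in_range[i, j] is TRUE when df$A[i] lies in [lower[j], upper[j])
in_range <- outer(df$A, upper, "<") & outer(df$A, lower, ">=")

# df$B is recycled down each column, so row i is weighted by B[i]
sums <- colSums(in_range * df$B)
sums
# [1]  7 31 12
```

This trades memory for speed: the intermediate matrix has one row per row of df and one column per interval, which is fine here but may be too large for very long inputs.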

Another way...
is.between <- function(x,vec){
  return(x>=min(vec) & x<max(vec))
}
apply(df2,1,function(x){sum(df[is.between(df$A,x),]$B)})
# [1] 7 31 12
