I'm looking for an easy way to shift each column in my dataframe by its minimum value, so that the smallest value in each column becomes 0.
This feels like a common thing, but I haven't been able to find any good answers yet...maybe I'm missing something obvious.
Let's say I've got a few columns (in reality I have close to 100) with positive and negative numbers.
w <- c(9, 9, 9, 9)
x <- c(-2, 0, 1, 3)
y <- c(-1, 1, 3, 4)
z <- as.data.frame(cbind(w, x, y))
w x y
1 9 -2 -1
2 9 0 1
3 9 1 3
4 9 3 4
I want z to look like this after transforming only the x and y columns (z[, 2:3]):
w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
Does that make sense?
library(dplyr)
dplyr::mutate(z, across(c(x, y), ~ . + abs(min(.))))
w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
You can also select by column position rather than by column name: change c(x, y) to 2:3, or to c(2:3, 5) for non-sequential column positions.
It depends on exactly what you mean and on what you want to happen if there are no negative values. Whatever the values, this anchors each column's minimum at 0, and you should be able to adapt it if you want something slightly different. Note that, as written, it transforms every column of z.
z[] = lapply(z, function(col) col - min(col))
z
# x y
# 1 0 0
# 2 2 2
# 3 3 4
# 4 5 5
As a side note, as.data.frame(cbind(x, y)) is bad practice: cbind() builds a matrix, so if you have a mix of numeric and character values it will convert everything to character. It's shorter and safer to write data.frame(x, y).
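To make the side note concrete, here is a minimal sketch with made-up vectors (num and chr are hypothetical names, not from the question). It only checks whether the numeric column is still numeric, because the exact class of the coerced column depends on the R version.

```r
# Hypothetical mixed-type data: one numeric and one character vector
num <- c(1, 2, 3)
chr <- c("a", "b", "c")

# cbind() builds a character matrix first, so the numbers are coerced
via_cbind <- as.data.frame(cbind(num, chr))

# data.frame() keeps each column's own type
direct <- data.frame(num, chr)

is.numeric(via_cbind$num)  # FALSE
is.numeric(direct$num)     # TRUE
```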
Do you want this? (As written, it transforms every column of z.)
z[] <- lapply(z, function(columnValues) columnValues + abs(min(columnValues)))
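The two base R answers use slightly different formulas and both touch every column. A possible sketch, assuming the three-column z from the question and a hypothetical cols vector naming the columns to transform, restricts the change to x and y and shows where the formulas disagree:

```r
z <- data.frame(w = c(9, 9, 9, 9), x = c(-2, 0, 1, 3), y = c(-1, 1, 3, 4))

# Restrict the transformation to the columns of interest (assumed names)
cols <- c("x", "y")
z[cols] <- lapply(z[cols], function(col) col - min(col))
z$x  # 0 2 3 5
z$y  # 0 2 4 5
z$w  # 9 9 9 9 (untouched)

# The two formulas only disagree when the column minimum is positive
z$w - min(z$w)       # 0 0 0 0      (minimum anchored at 0)
z$w + abs(min(z$w))  # 18 18 18 18  (minimum doubled)
```

For columns whose minimum is zero or negative, both expressions give the same result.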
I followed the example from "rollmin in R similar to zoo package rollmax" to compute a rolling mean, but the first few values are filled with NAs. How can I fill those NAs with the original values so that I don't lose data points?
We can use coalesce with the original vector to replace each NA with the corresponding non-NA element of the original vector:
library(dplyr)
library(zoo)
coalesce(rollmeanr(x, 3, fill = NA), x)
If the data is in a data.frame:
ctd %>%
group_by(station) %>%
mutate(roll_mean_beam = coalesce(rollmeanr(beam_coef,
k = 5, fill = NA), beam_coef))
data
x <- 1:10
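For readers without dplyr or zoo, the same fallback idea can be sketched in base R. This is only an illustration: stats::filter is used here as a stand-in for rollmeanr, and roll and filled are made-up names.

```r
x <- 1:10

# Trailing (right-aligned) mean of width 3; the first two entries are NA
roll <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))

# Same idea as coalesce(): keep the rolling value, fall back to x where it is NA
filled <- ifelse(is.na(roll), x, roll)
filled
# 1 2 2 3 4 5 6 7 8 9
```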
1) Using the original values seems a bit bizarre. Taking the rolling minimum of 1:10 with a width of 3 and filling the first two positions with the original values would give
1 2 1 2 3 4 5 6 7 8
I think what you really want is to apply min to however many points are available so that in this example we get
1 1 1 2 3 4 5 6 7 8
Now rollapplyr with partial=TRUE will use whatever number of points are available if fewer than width=3 exist at that point. At the first point only one point is available so it returns min(x[1]). At the second only two points are available so it returns min(x[1:2]). For all the rest it can use three points. Only zoo is used.
library(zoo)
x <- 1:10
rollapplyr(x, 3, min, partial = TRUE)
## [1] 1 1 1 2 3 4 5 6 7 8
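To see what partial = TRUE is doing, the windows can be written out by hand in base R. This is a rough sketch of the idea, not zoo's implementation; width, start and partial_min are assumed names.

```r
x <- 1:10
width <- 3

# For each position, take the minimum over however many trailing points exist
partial_min <- sapply(seq_along(x), function(i) {
  start <- max(1, i - width + 1)  # window is shorter near the start
  min(x[start:i])
})
partial_min
# 1 1 1 2 3 4 5 6 7 8
```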
2) The above seems more logical than filling the first two points with the first two input values. If you really want to do that anyway, simply prefix the series with the original values using c, or use one of the other alternatives shown below. Only zoo is used.
c(x[1:2], rollapplyr(x, 3, min))
## [1] 1 2 1 2 3 4 5 6 7 8
pmin(rollapplyr(x, 3, min, fill = max(x)), x)
## [1] 1 2 1 2 3 4 5 6 7 8
replace(rollapplyr(x, 3, min, fill = NA), 1:2, x[1:2])
## [1] 1 2 1 2 3 4 5 6 7 8
Min <- function(x) if (length(x) < 3) tail(x, 1) else min(x)
rollapplyr(x, 3, Min, partial = TRUE)
## [1] 1 2 1 2 3 4 5 6 7 8
How can I shift the third column of my data down by one row in R, so that its first row becomes NA and its last value is dropped?
I want the following data:
x y z
1 2 3
4 5 6
7 8 9
to
x y z
1 2 NA
4 5 3
7 8 6
The code:
ave(data, data$z, FUN = function(x) c(diff(x), NA))
gives me the difference, which is not what I want.
In base R you could do:
transform(df, z = c(NA, head(z, -1)))
x y z
1 1 2 NA
2 4 5 3
3 7 8 6
You could also do:
library(tidyverse)
mutate(df, z = lag(z))
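If the column needs to move down by more than one row, the base R idiom generalizes. The helper below is hypothetical (shift_down is not an existing function) and assumes df holds the data from the question.

```r
# Hypothetical helper: shift a vector down by n positions, padding with NA
shift_down <- function(v, n = 1) {
  stopifnot(n >= 0, n <= length(v))
  c(rep(NA, n), head(v, length(v) - n))
}

df <- data.frame(x = c(1, 4, 7), y = c(2, 5, 8), z = c(3, 6, 9))
df$z <- shift_down(df$z)
df$z
# NA 3 6
```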
Given a random integer vector below:
z <- c(3, 2, 4, 2, 1)
I'd like to create a new vector that contains each index of z repeated the number of times given by the corresponding element of z. To illustrate, the desired result in this case is:
[1] 1 1 1 2 2 3 3 3 3 4 4 5
There must be a simple way to do this.
You can use rep and seq to repeat the indices of a vector based on the values of that same vector: seq gets the indices and rep repeats them.
rep(seq(z), z)
# [1] 1 1 1 2 2 3 3 3 3 4 4 5
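One caveat worth sketching: seq(z) behaves differently when z has a single element, so seq_along(z) is the safer way to get the indices. The vector z1 below is a made-up edge case.

```r
z <- c(3, 2, 4, 2, 1)
rep(seq_along(z), z)
# 1 1 1 2 2 3 3 3 3 4 4 5

# With a single element, seq() counts up to the value instead of indexing it
z1 <- 3
rep(seq(z1), z1)        # 1 2 3 1 2 3 1 2 3
rep(seq_along(z1), z1)  # 1 1 1
```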
Start with all the indices of the vector z. These are given by:
1:length(z)
Then these elements should be repeated. The number of times these numbers should be repeated is specified by the values of z. This can be done using a combination of the lapply or sapply function and the rep function:
unlist(lapply(X = 1:length(z), FUN = function(x) rep(x = x, times = z[x])))
[1] 1 1 1 2 2 3 3 3 3 4 4 5
unlist(sapply(X = 1:length(z), FUN = function(x) rep(x = x, times = z[x])))
[1] 1 1 1 2 2 3 3 3 3 4 4 5
Both alternatives give the same result.
I've read most of the similar questions here, but I'm still having a hard time understanding how passing additional arguments to the order function breaks ties.
The example in the R documentation shows that
order(x <- c(1,1,3:1,1:4,3), y <- c(9,9:1), z <- c(2,1:9))
returns
[1] 6 5 2 1 7 4 10 8 3 9
However, what does it mean for y to 'break ties' in x, and for z to 'break ties' in y? The x vector is:
[1] 1 1 3 2 1 1 2 3 4 3
and the y vector is:
[1] 9 9 8 7 6 5 4 3 2 1
Also, if I eliminate z from the first function,
order(x <- c(1,1,3:1,1:4,3), y <- c(9,9:1))
it returns:
[1] 6 5 1 2 7 4 10 8 3 9
so I'm unclear how the numbers in the y vector are relevant to ordering the four 1s, the two 2s, and the three 3s in x. I would very much appreciate the help. Thanks!
Let's take a look at
idx <- order(x <- c(1,1,3:1,1:4,3), y <- c(9,9:1), z <- c(2,1:9))
idx
#[1] 6 5 2 1 7 4 10 8 3 9
First thing to note is that
x[idx]
# [1] 1 1 1 1 2 2 3 3 3 4
So idx orders entries in x from smallest to largest values.
Values in y and z affect how order treats ties in x.
Take entries x[5] = 1 and x[6] = 1. Since there is a tie here, order looks up the entries at the corresponding positions in y, i.e. y[5] = 6 and y[6] = 5. Since y[6] < y[5], index 6 is placed before index 5 in the result.
If there is a tie in y as well, order looks up the entries in the next vector, z. This happens for x[1] = 1 and x[2] = 1, where both y[1] = 9 and y[2] = 9. Here z breaks the tie: z[2] = 1 < z[1] = 2, so index 2 is placed before index 1.
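One way to check this reasoning is to line up the three vectors in sorted order. The sketch below reuses the vectors from the question; sorted is just an illustrative name.

```r
x <- c(1, 1, 3:1, 1:4, 3)
y <- c(9, 9:1)
z <- c(2, 1:9)
idx <- order(x, y, z)

# Rows in sorted order: x ascending, y ascending within equal x,
# z ascending within equal x and y
sorted <- data.frame(idx, x = x[idx], y = y[idx], z = z[idx])
sorted

# Dropping z leaves the x/y tie between positions 1 and 2 unresolved,
# so they keep their original relative order
order(x, y)
# 6 5 1 2 7 4 10 8 3 9
```

Reading the printed data frame row by row shows y deciding the order inside each group of equal x values, with z only mattering for the first two positions.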
I have these numbers:
login.day$wday
[1] 5 6 7 1 2 3 4
and I want to map them to:
login.day$wday
[1] 4 5 6 7 1 2 3
Each number is decreased by 1, and if the result is 0 it wraps around to 7. This is embarrassingly simple but I just can't figure it out. My attempt keeps giving me a zero:
> (login.day$wday + 6) %% 7
[1] 4 5 6 0 1 2 3
I'd prefer a solution in R. Is it possible to do this with modular arithmetic, or must I use an if statement?
Mathematically equivalent to the other solution, and with some explanation.
(login.day$wday - 1 - 1) %% 7 + 1
The problem is that it is hard to do modular arithmetic with numbers starting at 1.
We start by subtracting 1 to shift everything down, so we have zero-based numbers in the range [0,6].
We then subtract 1, because that is what we were trying to do to begin with.
Next, we take the modulus, and add 1 back to shift everything back up to the range [1,7].
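The three steps can be wrapped into a small general helper. This is only a sketch: wrap_shift and its arguments by and n are hypothetical names, and it relies on R's %% returning a non-negative result for a positive modulus.

```r
# Hypothetical helper: shift 1-based values by `by`, wrapping within 1..n
wrap_shift <- function(v, by, n = 7) {
  # to zero-based, apply the shift, wrap, then back to one-based
  (v - 1 + by) %% n + 1
}

wday <- c(5, 6, 7, 1, 2, 3, 4)
wrap_shift(wday, -1)  # 4 5 6 7 1 2 3
wrap_shift(wday, 1)   # 6 7 1 2 3 4 5
```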
(login.day$wday + 5) %% 7 + 1
perhaps?
The boundary conditions are 7 -> 6, 1 -> 7 and 2 -> 1.
The result had to involve %% 7 as you so rightly spotted.
Since the last of these boundary conditions results in 1, we need to add 1 after taking the modulo and reduce the number added before the modulo by 1 (from 6 to 5).
I have a silly function I've written called shift that does this:
shift <- function(x = 1:10, n = 1) {
if (n == 0) x <- x
else x <- c(tail(x, -n), head(x, n))
x
}
x <- c(5, 6, 7, 1, 2, 3, 4)
shift(x, -1)
# [1] 4 5 6 7 1 2 3
shift(x, -2)
# [1] 3 4 5 6 7 1 2
The use I had in mind for this was something like the following:
set.seed(1)
X <- sample(7, 20, TRUE)  # output below is from R < 3.6; newer versions draw different values for this seed
X
# [1] 2 3 5 7 2 7 7 5 5 1 2 2 5 3 6 4 6 7 3 6
shift(sort(unique(X)), -1)[X]
# [1] 1 2 4 6 1 6 6 4 4 7 1 1 4 2 5 3 5 6 2 5
I like the solution of @merlin2011, but just to add to the options, here is a lookup table approach:
c(7, 1:6)[login.day$wday]