Tidyverse mutate using value from column to reference another column value in perhaps a different row - r

I have created a tibble thus:
library(tidyverse)
set.seed(68)
a <- c(1, 2, 3, 4, 5)
b <- runif(5)
c <- c(1, 3, 3, 3, 1)
tib <- tibble(a, b, c)
which produces this
tib
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 0.924 1
2 2 0.661 3
3 3 0.402 3
4 4 0.637 3
5 5 0.353 1
I would like to add another column, d, whose value is the b value from the row whose a value equals this row's c value. The resulting data frame should look thus:
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 0.924 1 0.924
2 2 0.661 3 0.402
3 3 0.402 3 0.402
4 4 0.637 3 0.402
5 5 0.353 1 0.924
Thanks for looking!

Use c to index the desired row of b (this works because a is simply the row number here):
tib %>% mutate(d = b[c])
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 0.924 1 0.924
2 2 0.661 3 0.402
3 3 0.402 3 0.402
4 4 0.637 3 0.402
5 5 0.353 1 0.924
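Indexing with b[c] works because a happens to equal the row number here. If a were an arbitrary ID instead, a slightly more defensive variant (a sketch, not part of the original answer) looks up the row with match():

```r
library(tidyverse)

set.seed(68)
tib <- tibble(a = c(1, 2, 3, 4, 5), b = runif(5), c = c(1, 3, 3, 3, 1))

# match(c, a) finds, for each value of c, the row where a equals it,
# so the lookup still works when a is not simply 1..n
out <- tib %>% mutate(d = b[match(c, a)])
out$d
```

With a = 1:5, match(c, a) reduces to c itself, so this reproduces the answer above exactly.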

Related

Display Image in R

I have a 2D matrix of pixel coordinates (an n × 2 matrix) and an n × 3 matrix of RGB pixel values. Does anyone know of a function in R that can display the image (similar to imshow() in Python, for example)?
I have googled and looked on this platform but I haven't found anything similar - I apologize if I've missed something obvious. Thanks for any help you can provide!
The imager package can help with this!
1. Prepare data for a demonstration
require( tidyr )
require( imager )
# example data from `imager` package
dat <- boats
# plot( boats )
# tall
dat_original <- as.data.frame( dat )
head( dat_original )
# x y cc value
#1 1 1 1 0.3882
#2 2 1 1 0.3859
#3 3 1 1 0.3849
#4 4 1 1 0.3852
#5 5 1 1 0.3860
#6 6 1 1 0.3856
dat_wide <- pivot_wider( dat_original, names_from = 'cc', values_from = 'value' )
head( dat_wide )
# A tibble: 6 × 5
# x y `1` `2` `3`
# <int> <int> <dbl> <dbl> <dbl>
#1 1 1 0.388 0.388 0.388
#2 2 1 0.386 0.386 0.386
#3 3 1 0.385 0.385 0.385
#4 4 1 0.385 0.385 0.385
#5 5 1 0.386 0.386 0.386
#6 6 1 0.386 0.386 0.386
2. Demonstrate
You will need to reshape your data from M2d and M3d into five columns so that it looks like this. For example, you might use:
YOUR_DATA <- data.frame(
x = M2d$x
, y = M2d$y
, R = M3d$red
, G = M3d$green
, B = M3d$blue
)
head( YOUR_DATA )
# A tibble: 6 × 5
# x y R G B
# <int> <int> <dbl> <dbl> <dbl>
#1 1 1 0.388 0.388 0.388
#2 2 1 0.386 0.386 0.386
#3 3 1 0.385 0.385 0.385
#4 4 1 0.385 0.385 0.385
#5 5 1 0.386 0.386 0.386
#6 6 1 0.386 0.386 0.386
Then make your data tall
dat_tall <- pivot_longer(
YOUR_DATA
, cols= c( 'R','G','B' )
, names_to = 'cc'
, values_to = 'value'
)
head(dat_tall)
## A tibble: 6 × 4
# x y cc value
# <int> <int> <chr> <dbl>
#1 1 1 R 0.388
#2 1 1 G 0.388
#3 1 1 B 0.388
#4 2 1 R 0.386
#5 2 1 G 0.386
#6 2 1 B 0.386
And fix column 'cc': as.cimg() expects an integer channel index (1, 2, 3), not the character labels, so map them
dat_tall$cc <- match(dat_tall$cc, c("R", "G", "B"))
head(dat_tall)
## A tibble: 6 × 4
# x y cc value
# <int> <int> <int> <dbl>
#1 1 1 1 0.388
#2 1 1 2 0.388
#3 1 1 3 0.388
#4 2 1 1 0.386
#5 2 1 2 0.386
#6 2 1 3 0.386
Finally, make an image
image_data <- dat_tall
as.cimg( image_data ) %>% plot
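If you would rather not depend on imager, base graphics can also render an RGB image once each channel is a matrix. A minimal sketch with made-up 2 × 2 data:

```r
# One matrix per colour channel (hypothetical 2 x 2 example; values in
# [0, 1], matching the scale of the question's data)
r <- matrix(c(1, 0, 0, 1), nrow = 2)
g <- matrix(c(0, 1, 0, 1), nrow = 2)
b <- matrix(c(0, 0, 1, 1), nrow = 2)

img <- rgb(r, g, b)   # collapse the channels into hex colour strings
dim(img) <- dim(r)    # restore the pixel-grid shape
plot(as.raster(img))  # draw the image
```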

Flag run-length of grouped intervals

I have a dataframe grouped by grp:
df <- data.frame(
v = rnorm(25),
grp = c(rep("A",10), rep("B",15)),
size = 2)
I want to flag the run-length of intervals determined by size. For example, for grp == "A", size is 2, and the number of rows is 10. So the interval should have length 10/2 = 5. This code, however, creates intervals with length 2:
df %>%
group_by(grp) %>%
mutate(
interval = (row_number() -1) %/% size)
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 1
4 -0.913 A 2 1
5 0.486 A 2 2
6 -1.80 A 2 2
7 -0.370 A 2 3
8 -0.209 A 2 3
9 -0.661 A 2 4
10 -0.177 A 2 4
# … with 15 more rows
How can I flag the correct run-length of the size-determined intervals? The desired output is this:
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 0
4 -0.913 A 2 0
5 0.486 A 2 0
6 -1.80 A 2 1
7 -0.370 A 2 1
8 -0.209 A 2 1
9 -0.661 A 2 1
10 -0.177 A 2 1
# … with 15 more rows
If I interpreted your question correctly, this small change should do the trick?
df %>%
group_by(grp) %>%
mutate(
interval = (row_number() -1) %/% (n()/size))
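One caveat with this approach (my addition, not the answerer's): n() / size is only a whole number when size divides the group size evenly; for uneven groups, wrapping it in ceiling() keeps the number of intervals equal to size. A sketch on illustrative data:

```r
library(dplyr)

# 10 rows with size = 2 should give two intervals of 5 rows each
df <- data.frame(v = 1:10, grp = "A", size = 2)

out <- df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% ceiling(n() / first(size))) %>%
  ungroup()
out$interval
```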
You can use gl:
df %>%
group_by(grp) %>%
mutate(interval = gl(first(size), ceiling(n() / first(size)))[1:n()])
Output (note: this appears to have been generated from a slightly different df, with 11 rows in group A, hence 26 rows):
# A tibble: 26 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <fct>
1 -1.12 A 2 1
2 3.04 A 2 1
3 0.235 A 2 1
4 -0.0333 A 2 1
5 -2.73 A 2 1
6 -0.0998 A 2 1
7 0.976 A 2 2
8 0.414 A 2 2
9 0.912 A 2 2
10 1.98 A 2 2
11 1.17 A 2 2
12 -0.509 B 2 1
13 0.704 B 2 1
14 -0.198 B 2 1
15 -0.538 B 2 1
16 -2.86 B 2 1
17 -0.790 B 2 1
18 0.488 B 2 1
19 2.17 B 2 1
20 0.501 B 2 2
21 0.620 B 2 2
22 -0.966 B 2 2
23 0.163 B 2 2
24 -2.08 B 2 2
25 0.485 B 2 2
26 0.697 B 2 2

Create a column that takes the first value of another column and subsequent values are a scalar of the prior value

I am trying to create a new column called g_it in a grouped data frame where the first value for each group will be the initial value in the column called exp and the subsequent values are (1 - 0.1) * lag(g_it) + exp.
I believe purrr::accumulate is what I'm looking for, but I'm not sure how to set it up.
My data is:
structure(list(group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4), exp = c(0.493735461892577, 0.501836433242221, 0.4916437138759,
0.515952808021378, 0.503295077718154, 0.49179531615882, 0.504874290524285,
0.507383247051292, 0.505757813516535, 0.496946116128436, 0.515117811684508,
0.503898432364114, 0.493787594194582, 0.477853001128225, 0.511249309181431,
0.499550663909848)), class = "data.frame", row.names = c(NA,
-16L))
Expected output:
group exp g_it
1 0.4937355 0.4937355
1 0.5018364 0.94619835
1 0.4916437 1.343222215
1 0.5159528 1.724852794
2 0.5032951 0.5032951
2 0.4917953 0.94476089
2 0.5048743 1.355159101
2 0.5073832 1.727026391
3 0.5057578 0.5057578
3 0.4969461 0.95212812
3 0.5151178 1.372033108
3 0.5038984 1.738728197
4 0.4937876 0.4937876
4 0.477853 0.92226184
4 0.5112493 1.341284956
4 0.4995507 1.70670716
If you provide a function to accumulate with the ~ syntax, .x is the "accumulated" (previous) value and .y is the "next" value.
df <- structure(list(group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4), exp = c(0.493735461892577, 0.501836433242221, 0.4916437138759,
0.515952808021378, 0.503295077718154, 0.49179531615882, 0.504874290524285,
0.507383247051292, 0.505757813516535, 0.496946116128436, 0.515117811684508,
0.503898432364114, 0.493787594194582, 0.477853001128225, 0.511249309181431,
0.499550663909848)), class = "data.frame", row.names = c(NA,
-16L))
library(dplyr, warn.conflicts = F)
library(purrr)
df %>%
group_by(group) %>%
mutate(g_it = accumulate(exp, ~ (1 - 0.1)*.x + .y))
#> # A tibble: 16 × 3
#> # Groups: group [4]
#> group exp g_it
#> <dbl> <dbl> <dbl>
#> 1 1 0.494 0.494
#> 2 1 0.502 0.946
#> 3 1 0.492 1.34
#> 4 1 0.516 1.72
#> 5 2 0.503 0.503
#> 6 2 0.492 0.945
#> 7 2 0.505 1.36
#> 8 2 0.507 1.73
#> 9 3 0.506 0.506
#> 10 3 0.497 0.952
#> 11 3 0.515 1.37
#> 12 3 0.504 1.74
#> 13 4 0.494 0.494
#> 14 4 0.478 0.922
#> 15 4 0.511 1.34
#> 16 4 0.500 1.71
Created on 2022-01-10 by the reprex package (v2.0.1)
A base R option using ave + Reduce
transform(
df,
g_it = ave(
exp,
group,
FUN = function(v) {
Reduce(
function(x, y) 0.9 * x + y,
v,
accumulate = TRUE
)
}
)
)
gives
group exp g_it
1 1 0.4937355 0.4937355
2 1 0.5018364 0.9461983
3 1 0.4916437 1.3432222
4 1 0.5159528 1.7248528
5 2 0.5032951 0.5032951
6 2 0.4917953 0.9447609
7 2 0.5048743 1.3551591
8 2 0.5073832 1.7270264
9 3 0.5057578 0.5057578
10 3 0.4969461 0.9521281
11 3 0.5151178 1.3720331
12 3 0.5038984 1.7387283
13 4 0.4937876 0.4937876
14 4 0.4778530 0.9222618
15 4 0.5112493 1.3412850
16 4 0.4995507 1.7067071
Another possible solution, based only on dplyr and cumsum. The recursion g_it[i] = 0.9 * g_it[i-1] + exp[i] expands to a decay-weighted sum, so dividing each exp by its decay factor before the cumsum and multiplying back afterwards reproduces it:
library(dplyr)
df %>%
group_by(group) %>%
mutate(g_it = (1 - 0.1)^(row_number() - 1) * cumsum(exp / (1 - 0.1)^(row_number() - 1))) %>% ungroup
#> # A tibble: 16 × 3
#> group exp g_it
#> <dbl> <dbl> <dbl>
#> 1 1 0.494 0.494
#> 2 1 0.502 0.946
#> 3 1 0.492 1.34
#> 4 1 0.516 1.72
#> 5 2 0.503 0.503
#> 6 2 0.492 0.945
#> 7 2 0.505 1.36
#> 8 2 0.507 1.73
#> 9 3 0.506 0.506
#> 10 3 0.497 0.952
#> 11 3 0.515 1.37
#> 12 3 0.504 1.74
#> 13 4 0.494 0.494
#> 14 4 0.478 0.922
#> 15 4 0.511 1.34
#> 16 4 0.500 1.71
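As a sanity check (my addition), a cumsum reformulation of the recursion can be validated against the Reduce definition on a small illustrative vector:

```r
# Recursive definition: g[i] = 0.9 * g[i-1] + e[i]
e <- c(0.5, 0.4, 0.6)
rec <- Reduce(function(x, y) 0.9 * x + y, e, accumulate = TRUE)

# cumsum trick: divide by the decay factor, cumsum, multiply back
i <- seq_along(e)
via_cumsum <- 0.9^(i - 1) * cumsum(e / 0.9^(i - 1))

all.equal(rec, via_cumsum)  # TRUE
```

Note that 0.9^-(i - 1) grows exponentially, so for very long groups the accumulate/Reduce versions are numerically safer.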

Function that compares one column's values against all other column values and returns the matching one in R

So let's say I have two data frames
df1 <- data.frame(n = rep(c(0, 1, 2, 3, 4), times = 2), nn = c(rep(x = 1, 5), rep(x = 2, 5)),
y = rnorm(10), z = rnorm(10))
df2 <- data.frame(x = rnorm(20))
Here is the first df:
> head(df1)
n nn y z
1 0 1 1.5683647 0.48934096
2 1 1 1.2967556 -0.77891030
3 2 1 -0.2375963 1.74355935
4 3 1 -1.2241501 -0.07838729
5 4 1 -0.3278127 -0.97555379
6 0 2 -2.4124503 0.07065982
Here is the second df:
x
1 -0.4884289
2 0.9362939
3 -1.0624084
4 -0.9838209
5 0.4242479
6 -0.4513135
I'd like to subtract the x column values of df2 from the z column values of df1, and return the rows of both data frames for which the subtracted value is approximately equal to the y value of df1.
Is there a way to construct such a function, so that I could specify the tolerance within which the values should be considered equal?
To be clear: I'd like to subtract all x values from all z values, compare each result to the y column of df1, and check whether there is an approximately matching value.
Here's an approach where I match every row of df1 with every row of df2, then subtract x and y from z (as implied by your logic of comparing z - x to y; this is the same as comparing z - x - y to zero). Finally, I look at each row of df1 and keep the match with the lowest absolute difference.
library(dplyr)
left_join(
df1 %>% mutate(dummy = 1, row = row_number()),
df2 %>% mutate(dummy = 1, row = row_number()), by = "dummy") %>%
mutate(diff = z - x - y) %>%
group_by(row.x) %>%
slice_min(abs(diff)) %>%
ungroup()
Result (I used set.seed(42) before generating df1+df2.)
# A tibble: 10 x 9
n nn y z dummy row.x x row.y diff
<dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl>
1 0 1 1.37 1.30 1 1 0.0361 20 -0.102
2 1 1 -0.565 2.29 1 2 1.90 5 0.956
3 2 1 0.363 -1.39 1 3 -1.76 8 0.0112
4 3 1 0.633 -0.279 1 4 -0.851 18 -0.0607
5 4 1 0.404 -0.133 1 5 -0.609 14 0.0713
6 0 2 -0.106 0.636 1 6 0.705 12 0.0372
7 1 2 1.51 -0.284 1 7 -1.78 2 -0.0145
8 2 2 -0.0947 -2.66 1 8 -2.41 19 -0.148
9 3 2 2.02 -2.44 1 9 -2.41 19 -2.04
10 4 2 -0.0627 1.32 1 10 1.21 4 0.168
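If you want every pair within a stated tolerance rather than only the closest match per row, a base R sketch (the toy data and tol value here are illustrative, not from the question) is to expand all row pairs and filter on the absolute difference:

```r
# Toy data: two rows in df1 and two candidate x values in df2
df1 <- data.frame(y = c(0.1, 0.5), z = c(1.0, 2.0))
df2 <- data.frame(x = c(0.89, 1.52))

# Every (df1 row, df2 row) pair, with the comparison z - x - y
pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))
pairs$diff <- df1$z[pairs$i] - df2$x[pairs$j] - df1$y[pairs$i]

# Keep pairs whose difference is approximately zero
tol <- 0.05
matches <- pairs[abs(pairs$diff) < tol, ]
nrow(matches)  # 2
```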

Creating a new column using the previous value of a different column and the previous value of itself

How can I create a new column whose starting value is 1 and whose following values are a multiplication of the previous value of another column (b) and the previous value of itself (d)?
these data are only made up, but have the structure of my data:
> a <- rep(1:10, 3)
> b <- runif(30)
> c <- tibble(a,b)
> c
# A tibble: 30 x 2
a b
<int> <dbl>
1 1 0.945
2 2 0.280
3 3 0.464
4 4 0.245
5 5 0.917
6 6 0.913
7 7 0.144
8 8 0.481
9 9 0.873
10 10 0.754
# ... with 20 more rows
Then I try to calculate column d:
> c <- c %>%
+ group_by(a) %>%
+ mutate(d = accumulate(lag(b, k = 1), `*`, .init = 1))
and it should look like this
# A tibble: 30 x 3
# Groups: a [10]
a b d
<int> <dbl> <dbl>
1 1 0.945 1 <--- b[1] * d[1] = d[2]
2 2 0.280 0.945
3 3 0.464 0.265
4 4 0.245 0.123
5 5 0.917 0.03
#...
But instead I am getting this error message.
Error: Column `d` must be length 3 (the group size) or one, not 4
The problem is that initializing accumulate with `.init` adds an extra element at the start of the result, making it one longer than the input.
You could try this:
library(dplyr)
library(purrr)
c %>%
group_by(a) %>%
mutate(d = accumulate(b[(2:length(b))-1], `*`,.init=1)) %>%
arrange(a)
# a b d
# <int> <dbl> <dbl>
# 1 1 0.266 1
# 2 1 0.206 0.266
# 3 1 0.935 0.0547
# 4 2 0.372 1
# 5 2 0.177 0.372
# … with 25 more rows
Data
library(tibble)
set.seed(1)
a <- rep(1:10, 3)
b <- runif(30)
c <- tibble(a,b)
Using dplyr, I would do this:
c %>%
mutate(d = accumulate(.x = b[-length(b)],
.init = 1,
.f = `*`))
# # A tibble: 30 x 3
# a b d
# <int> <dbl> <dbl>
# 1 1 0.562 1
# 2 2 0.668 0.562
# 3 3 0.100 0.375
# 4 4 0.242 0.0376
# 5 5 0.0646 0.00907
# 6 6 0.373 0.000586
# 7 7 0.664 0.000219
# 8 8 0.915 0.000145
# 9 9 0.848 0.000133
# 10 10 0.952 0.000113
# # ... with 20 more rows
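Since the recursion here is a pure running product, the same column can also be built without accumulate at all. A base R sketch (my addition) using cumprod on a shifted copy of b:

```r
# d starts at 1, and each later value is the product of all earlier b's:
# the cumulative product of b shifted down one position
b <- c(0.5, 0.2, 0.4)
d <- cumprod(c(1, head(b, -1)))
d  # 1.0 0.5 0.1
```

Inside a grouped mutate the same expression works per group.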
