Tidyverse mutate using value from column to reference another column value in perhaps a different row - r

I have created a tibble thus:
library(tidyverse)
set.seed(68)
a <- c(1, 2, 3, 4, 5)
b <- runif(5)
c <- c(1, 3, 3, 3, 1)
tib <- tibble(a, b, c)
which produces this
tib
# A tibble: 5 x 3
a b c
<dbl> <dbl> <dbl>
1 1 0.924 1
2 2 0.661 3
3 3 0.402 3
4 4 0.637 3
5 5 0.353 1
I would like to add another column, d, whose value is the b value from the row whose a value equals this row's c value. The resulting data frame should look thus:
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 0.924 1 0.924
2 2 0.661 3 0.402
3 3 0.402 3 0.402
4 4 0.637 3 0.402
5 5 0.353 1 0.924
Thanks for looking!

Use c to index the desired row of b (this works because a is simply the row number here):
tib %>% mutate(d = b[c])
a b c d
<dbl> <dbl> <dbl> <dbl>
1 1 0.924 1 0.924
2 2 0.661 3 0.402
3 3 0.402 3 0.402
4 4 0.637 3 0.402
5 5 0.353 1 0.924
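Indexing with b[c] works because a happens to equal the row number here. If a were an arbitrary ID instead, a slightly more defensive variant (a sketch, not part of the original answer) looks up the row with match():

```r
library(tidyverse)

set.seed(68)
tib <- tibble(a = c(1, 2, 3, 4, 5), b = runif(5), c = c(1, 3, 3, 3, 1))

# match(c, a) finds, for each value of c, the row where a equals it,
# so the lookup still works when a is not simply 1..n
out <- tib %>% mutate(d = b[match(c, a)])
out$d
```

With a = 1:5, match(c, a) reduces to c itself, so this reproduces the answer above exactly.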

Related

Display Image in R

I have a 2D matrix of pixel coordinates (an n × 2 matrix) and an n × 3 matrix of RGB pixel values. Does anyone know of a function in R that can display the image (similar to imshow() in Python, for example)?
I have googled and looked on this platform but I haven't found anything similar - I apologize if I've missed something obvious. Thanks for any help you can provide!
The imager package can help with this!
1. Prepare data for a demonstration
require( tidyr )
require( imager )
# example data from `imager` package
dat <- boats
# plot( boats )
# tall
dat_original <- as.data.frame( dat )
head( dat_original )
# x y cc value
#1 1 1 1 0.3882
#2 2 1 1 0.3859
#3 3 1 1 0.3849
#4 4 1 1 0.3852
#5 5 1 1 0.3860
#6 6 1 1 0.3856
dat_wide <- pivot_wider( dat_original, names_from = 'cc', values_from = 'value' )
head( dat_wide )
# A tibble: 6 × 5
# x y `1` `2` `3`
# <int> <int> <dbl> <dbl> <dbl>
#1 1 1 0.388 0.388 0.388
#2 2 1 0.386 0.386 0.386
#3 3 1 0.385 0.385 0.385
#4 4 1 0.385 0.385 0.385
#5 5 1 0.386 0.386 0.386
#6 6 1 0.386 0.386 0.386
2. Demonstrate
You will need to reshape your data from M2d and M3d into five columns so that it looks like this. For example, you might use:
YOUR_DATA <- data.frame(
x = M2d$x
, y = M2d$y
, R = M3d$red
, G = M3d$green
, B = M3d$blue
)
head( YOUR_DATA )
# A tibble: 6 × 5
# x y R G B
# <int> <int> <dbl> <dbl> <dbl>
#1 1 1 0.388 0.388 0.388
#2 2 1 0.386 0.386 0.386
#3 3 1 0.385 0.385 0.385
#4 4 1 0.385 0.385 0.385
#5 5 1 0.386 0.386 0.386
#6 6 1 0.386 0.386 0.386
Then make your data tall
dat_tall <- pivot_longer(
YOUR_DATA
, cols= c( 'R','G','B' )
, names_to = 'cc'
, values_to = 'value'
)
head(dat_tall)
## A tibble: 6 × 4
# x y cc value
# <int> <int> <chr> <dbl>
#1 1 1 R 0.388
#2 1 1 G 0.388
#3 1 1 B 0.388
#4 2 1 R 0.386
#5 2 1 G 0.386
#6 2 1 B 0.386
And fix column 'cc': as.cimg() expects an integer channel index (1, 2, 3), not the character labels, so map them
dat_tall$cc <- match(dat_tall$cc, c("R", "G", "B"))
head(dat_tall)
## A tibble: 6 × 4
# x y cc value
# <int> <int> <int> <dbl>
#1 1 1 1 0.388
#2 1 1 2 0.388
#3 1 1 3 0.388
#4 2 1 1 0.386
#5 2 1 2 0.386
#6 2 1 3 0.386
Finally, make an image
image_data <- dat_tall
as.cimg( image_data ) %>% plot
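If you would rather not depend on imager, base graphics can also render an RGB image once each channel is a matrix. A minimal sketch with made-up 2 × 2 data:

```r
# One matrix per colour channel (hypothetical 2 x 2 example; values in
# [0, 1], matching the scale of the question's data)
r <- matrix(c(1, 0, 0, 1), nrow = 2)
g <- matrix(c(0, 1, 0, 1), nrow = 2)
b <- matrix(c(0, 0, 1, 1), nrow = 2)

img <- rgb(r, g, b)   # collapse the channels into hex colour strings
dim(img) <- dim(r)    # restore the pixel-grid shape
plot(as.raster(img))  # draw the image
```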

Flag run-length of grouped intervals

I have a dataframe grouped by grp:
df <- data.frame(
v = rnorm(25),
grp = c(rep("A",10), rep("B",15)),
size = 2)
I want to flag the run-length of intervals determined by size. For example, for grp == "A", size is 2, and the number of rows is 10. So the interval should have length 10/2 = 5. This code, however, creates intervals with length 2:
df %>%
group_by(grp) %>%
mutate(
interval = (row_number() -1) %/% size)
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 1
4 -0.913 A 2 1
5 0.486 A 2 2
6 -1.80 A 2 2
7 -0.370 A 2 3
8 -0.209 A 2 3
9 -0.661 A 2 4
10 -0.177 A 2 4
# … with 15 more rows
How can I flag the correct run-length of the size-determined intervals? The desired output is this:
# A tibble: 25 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <dbl>
1 -0.166 A 2 0
2 -1.12 A 2 0
3 0.941 A 2 0
4 -0.913 A 2 0
5 0.486 A 2 0
6 -1.80 A 2 1
7 -0.370 A 2 1
8 -0.209 A 2 1
9 -0.661 A 2 1
10 -0.177 A 2 1
# … with 15 more rows
If I interpreted your question correctly, this small change should do the trick?
df %>%
group_by(grp) %>%
mutate(
interval = (row_number() -1) %/% (n()/size))
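One caveat with this approach (my addition, not the answerer's): n() / size is only a whole number when size divides the group size evenly; for uneven groups, wrapping it in ceiling() keeps the number of intervals equal to size. A sketch on illustrative data:

```r
library(dplyr)

# 10 rows with size = 2 should give two intervals of 5 rows each
df <- data.frame(v = 1:10, grp = "A", size = 2)

out <- df %>%
  group_by(grp) %>%
  mutate(interval = (row_number() - 1) %/% ceiling(n() / first(size))) %>%
  ungroup()
out$interval
```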
You can use gl:
df %>%
group_by(grp) %>%
mutate(interval = gl(first(size), ceiling(n() / first(size)))[1:n()])
Output (note: this appears to have been generated from a slightly different df, with 11 rows in group A, hence 26 rows):
# A tibble: 26 × 4
# Groups: grp [2]
v grp size interval
<dbl> <chr> <dbl> <fct>
1 -1.12 A 2 1
2 3.04 A 2 1
3 0.235 A 2 1
4 -0.0333 A 2 1
5 -2.73 A 2 1
6 -0.0998 A 2 1
7 0.976 A 2 2
8 0.414 A 2 2
9 0.912 A 2 2
10 1.98 A 2 2
11 1.17 A 2 2
12 -0.509 B 2 1
13 0.704 B 2 1
14 -0.198 B 2 1
15 -0.538 B 2 1
16 -2.86 B 2 1
17 -0.790 B 2 1
18 0.488 B 2 1
19 2.17 B 2 1
20 0.501 B 2 2
21 0.620 B 2 2
22 -0.966 B 2 2
23 0.163 B 2 2
24 -2.08 B 2 2
25 0.485 B 2 2
26 0.697 B 2 2

Create a column that takes the first value of another column and subsequent values are a scalar of the prior value

I am trying to create a new column called g_it in a grouped data frame where the first value for each group will be the initial value in the column called exp and the subsequent values are (1 - 0.1) * lag(g_it) + exp.
I believe purrr::accumulate is what I'm looking for, but I'm not sure how to set it up.
My data is:
structure(list(group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4), exp = c(0.493735461892577, 0.501836433242221, 0.4916437138759,
0.515952808021378, 0.503295077718154, 0.49179531615882, 0.504874290524285,
0.507383247051292, 0.505757813516535, 0.496946116128436, 0.515117811684508,
0.503898432364114, 0.493787594194582, 0.477853001128225, 0.511249309181431,
0.499550663909848)), class = "data.frame", row.names = c(NA,
-16L))
Expected output:
group exp g_it
1 0.4937355 0.4937355
1 0.5018364 0.94619835
1 0.4916437 1.343222215
1 0.5159528 1.724852794
2 0.5032951 0.5032951
2 0.4917953 0.94476089
2 0.5048743 1.355159101
2 0.5073832 1.727026391
3 0.5057578 0.5057578
3 0.4969461 0.95212812
3 0.5151178 1.372033108
3 0.5038984 1.738728197
4 0.4937876 0.4937876
4 0.477853 0.92226184
4 0.5112493 1.341284956
4 0.4995507 1.70670716
If you provide a function to accumulate with the ~ syntax, .x is the "accumulated" (previous) value and .y is the "next" value.
df <- structure(list(group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4), exp = c(0.493735461892577, 0.501836433242221, 0.4916437138759,
0.515952808021378, 0.503295077718154, 0.49179531615882, 0.504874290524285,
0.507383247051292, 0.505757813516535, 0.496946116128436, 0.515117811684508,
0.503898432364114, 0.493787594194582, 0.477853001128225, 0.511249309181431,
0.499550663909848)), class = "data.frame", row.names = c(NA,
-16L))
library(dplyr, warn.conflicts = F)
library(purrr)
df %>%
group_by(group) %>%
mutate(g_it = accumulate(exp, ~ (1 - 0.1)*.x + .y))
#> # A tibble: 16 × 3
#> # Groups: group [4]
#> group exp g_it
#> <dbl> <dbl> <dbl>
#> 1 1 0.494 0.494
#> 2 1 0.502 0.946
#> 3 1 0.492 1.34
#> 4 1 0.516 1.72
#> 5 2 0.503 0.503
#> 6 2 0.492 0.945
#> 7 2 0.505 1.36
#> 8 2 0.507 1.73
#> 9 3 0.506 0.506
#> 10 3 0.497 0.952
#> 11 3 0.515 1.37
#> 12 3 0.504 1.74
#> 13 4 0.494 0.494
#> 14 4 0.478 0.922
#> 15 4 0.511 1.34
#> 16 4 0.500 1.71
Created on 2022-01-10 by the reprex package (v2.0.1)
A base R option using ave + Reduce
transform(
df,
g_it = ave(
exp,
group,
FUN = function(v) {
Reduce(
function(x, y) 0.9 * x + y,
v,
accumulate = TRUE
)
}
)
)
gives
group exp g_it
1 1 0.4937355 0.4937355
2 1 0.5018364 0.9461983
3 1 0.4916437 1.3432222
4 1 0.5159528 1.7248528
5 2 0.5032951 0.5032951
6 2 0.4917953 0.9447609
7 2 0.5048743 1.3551591
8 2 0.5073832 1.7270264
9 3 0.5057578 0.5057578
10 3 0.4969461 0.9521281
11 3 0.5151178 1.3720331
12 3 0.5038984 1.7387283
13 4 0.4937876 0.4937876
14 4 0.4778530 0.9222618
15 4 0.5112493 1.3412850
16 4 0.4995507 1.7067071
Another possible solution, based only on dplyr and cumsum. The recursion g_it[i] = 0.9 * g_it[i-1] + exp[i] expands to a decay-weighted sum, so dividing each exp by its decay factor before the cumsum and multiplying back afterwards reproduces it:
library(dplyr)
df %>%
group_by(group) %>%
mutate(g_it = (1 - 0.1)^(row_number() - 1) * cumsum(exp / (1 - 0.1)^(row_number() - 1))) %>% ungroup
#> # A tibble: 16 × 3
#> group exp g_it
#> <dbl> <dbl> <dbl>
#> 1 1 0.494 0.494
#> 2 1 0.502 0.946
#> 3 1 0.492 1.34
#> 4 1 0.516 1.72
#> 5 2 0.503 0.503
#> 6 2 0.492 0.945
#> 7 2 0.505 1.36
#> 8 2 0.507 1.73
#> 9 3 0.506 0.506
#> 10 3 0.497 0.952
#> 11 3 0.515 1.37
#> 12 3 0.504 1.74
#> 13 4 0.494 0.494
#> 14 4 0.478 0.922
#> 15 4 0.511 1.34
#> 16 4 0.500 1.71
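As a sanity check (my addition), a cumsum reformulation of the recursion can be validated against the Reduce definition on a small illustrative vector:

```r
# Recursive definition: g[i] = 0.9 * g[i-1] + e[i]
e <- c(0.5, 0.4, 0.6)
rec <- Reduce(function(x, y) 0.9 * x + y, e, accumulate = TRUE)

# cumsum trick: divide by the decay factor, cumsum, multiply back
i <- seq_along(e)
via_cumsum <- 0.9^(i - 1) * cumsum(e / 0.9^(i - 1))

all.equal(rec, via_cumsum)  # TRUE
```

Note that 0.9^-(i - 1) grows exponentially, so for very long groups the accumulate/Reduce versions are numerically safer.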

Function that compares one column's values against all other column values and returns the matching one in R

So let's say I have two data frames
df1 <- data.frame(n = rep(c(0, 1, 2, 3, 4), times = 2), nn = c(rep(x = 1, 5), rep(x = 2, 5)),
y = rnorm(10), z = rnorm(10))
df2 <- data.frame(x = rnorm(20))
Here is the first df:
> head(df1)
n nn y z
1 0 1 1.5683647 0.48934096
2 1 1 1.2967556 -0.77891030
3 2 1 -0.2375963 1.74355935
4 3 1 -1.2241501 -0.07838729
5 4 1 -0.3278127 -0.97555379
6 0 2 -2.4124503 0.07065982
Here is the second df:
x
1 -0.4884289
2 0.9362939
3 -1.0624084
4 -0.9838209
5 0.4242479
6 -0.4513135
I'd like to subtract the x column values of df2 from the z column values of df1, and return the rows of both data frames for which the subtracted value is approximately equal to the y value of df1.
Is there a way to construct such a function, so that I could specify the tolerance within which the values should be considered equal?
To be clear: I'd like to subtract all x values from all z values, compare each result to the y column of df1, and check whether there is an approximately matching value.
Here's an approach where I match every row of df1 with every row of df2, then subtract x and y from z (as implied by your logic of comparing z - x to y; this is the same as comparing z - x - y to zero). Finally, I look at each row of df1 and keep the match with the lowest absolute difference.
library(dplyr)
left_join(
df1 %>% mutate(dummy = 1, row = row_number()),
df2 %>% mutate(dummy = 1, row = row_number()), by = "dummy") %>%
mutate(diff = z - x - y) %>%
group_by(row.x) %>%
slice_min(abs(diff)) %>%
ungroup()
Result (I used set.seed(42) before generating df1+df2.)
# A tibble: 10 x 9
n nn y z dummy row.x x row.y diff
<dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl>
1 0 1 1.37 1.30 1 1 0.0361 20 -0.102
2 1 1 -0.565 2.29 1 2 1.90 5 0.956
3 2 1 0.363 -1.39 1 3 -1.76 8 0.0112
4 3 1 0.633 -0.279 1 4 -0.851 18 -0.0607
5 4 1 0.404 -0.133 1 5 -0.609 14 0.0713
6 0 2 -0.106 0.636 1 6 0.705 12 0.0372
7 1 2 1.51 -0.284 1 7 -1.78 2 -0.0145
8 2 2 -0.0947 -2.66 1 8 -2.41 19 -0.148
9 3 2 2.02 -2.44 1 9 -2.41 19 -2.04
10 4 2 -0.0627 1.32 1 10 1.21 4 0.168
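If you want every pair within a stated tolerance rather than only the closest match per row, a base R sketch (the toy data and tol value here are illustrative, not from the question) is to expand all row pairs and filter on the absolute difference:

```r
# Toy data: two rows in df1 and two candidate x values in df2
df1 <- data.frame(y = c(0.1, 0.5), z = c(1.0, 2.0))
df2 <- data.frame(x = c(0.89, 1.52))

# Every (df1 row, df2 row) pair, with the comparison z - x - y
pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))
pairs$diff <- df1$z[pairs$i] - df2$x[pairs$j] - df1$y[pairs$i]

# Keep pairs whose difference is approximately zero
tol <- 0.05
matches <- pairs[abs(pairs$diff) < tol, ]
nrow(matches)  # 2
```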

Creating a new column using the previous value of a different column and the previous value of itself

How can I create a new column whose starting value is 1 and whose following values are a multiplication of the previous value of another column (b) and the previous value of itself (d)?
these data are only made up, but have the structure of my data:
> a <- rep(1:10, 3)
> b <- runif(30)
> c <- tibble(a,b)
> c
# A tibble: 30 x 2
a b
<int> <dbl>
1 1 0.945
2 2 0.280
3 3 0.464
4 4 0.245
5 5 0.917
6 6 0.913
7 7 0.144
8 8 0.481
9 9 0.873
10 10 0.754
# ... with 20 more rows
Then I try to calculate column d:
> c <- c %>%
+ group_by(a) %>%
+ mutate(d = accumulate(lag(b, k = 1), `*`, .init = 1))
and it should look like this
# A tibble: 30 x 3
# Groups: a [10]
a b d
<int> <dbl> <dbl>
1 1 0.945 1 <--- b[1] * d[1] = d[2]
2 2 0.280 0.945
3 3 0.464 0.265
4 4 0.245 0.123
5 5 0.917 0.03
#...
But instead I am getting this error message.
Error: Column `d` must be length 3 (the group size) or one, not 4
The problem is that initializing accumulate with `.init` adds an extra element at the start of the result, making it one longer than the input.
You could try this:
library(dplyr)
library(purrr)
c %>%
group_by(a) %>%
mutate(d = accumulate(b[(2:length(b))-1], `*`,.init=1)) %>%
arrange(a)
# a b d
# <int> <dbl> <dbl>
# 1 1 0.266 1
# 2 1 0.206 0.266
# 3 1 0.935 0.0547
# 4 2 0.372 1
# 5 2 0.177 0.372
# … with 25 more rows
Data
library(tibble)
set.seed(1)
a <- rep(1:10, 3)
b <- runif(30)
c <- tibble(a,b)
Using dplyr, I would do this:
c %>%
mutate(d = accumulate(.x = b[-length(b)],
.init = 1,
.f = `*`))
# # A tibble: 30 x 3
# a b d
# <int> <dbl> <dbl>
# 1 1 0.562 1
# 2 2 0.668 0.562
# 3 3 0.100 0.375
# 4 4 0.242 0.0376
# 5 5 0.0646 0.00907
# 6 6 0.373 0.000586
# 7 7 0.664 0.000219
# 8 8 0.915 0.000145
# 9 9 0.848 0.000133
# 10 10 0.952 0.000113
# # ... with 20 more rows
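Since the recursion here is a pure running product, the same column can also be built without accumulate at all. A base R sketch (my addition) using cumprod on a shifted copy of b:

```r
# d starts at 1, and each later value is the product of all earlier b's:
# the cumulative product of b shifted down one position
b <- c(0.5, 0.2, 0.4)
d <- cumprod(c(1, head(b, -1)))
d  # 1.0 0.5 0.1
```

Inside a grouped mutate the same expression works per group.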
