mutate rnorm with multiple column output - r

I would like to create random variables using rnorm(), with the mean and sd taken from separate columns of a tibble:
n <- 2
ti <- tibble(Name = letters[1:10], mean = 1:10, sd = 1:10)
How do I use mutate to add n columns to the tibble with output from rnorm(n, mean, sd) for every row?
(I know I can do this in base R, but am curious to learn how this works using dplyr)

One dplyr and tidyr option could be:
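library(dplyr)
library(tidyr)

ti %>%
  rowwise() %>%
  # one named vector of n draws per row; the names become the new column names
  mutate(col_rnorm = list(setNames(rnorm(n, mean, sd), paste0("col", seq_len(n))))) %>%
  unnest_wider(col_rnorm)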
Name mean sd col1 col2
<chr> <int> <int> <dbl> <dbl>
1 a 1 1 -1.73 1.18
2 b 2 2 3.86 0.0943
3 c 3 3 3.54 -0.502
4 d 4 4 3.21 -3.90
5 e 5 5 3.61 9.48
6 f 6 6 7.07 16.1
7 g 7 7 17.4 5.95
8 h 8 8 5.32 13.6
9 i 9 9 19.2 19.8
10 j 10 10 9.67 11.3
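For comparison, here is a minimal base R sketch of the same idea without rowwise(). It relies on rnorm() recycling its mean and sd arguments; the helper name add_rnorm_cols and the prefix argument are made up for illustration and are not part of dplyr or tidyr.

```r
# Hypothetical helper: append n columns of normal draws, one set per row.
# Assumes df has numeric columns called `mean` and `sd`.
add_rnorm_cols <- function(df, n, prefix = "col") {
  k <- nrow(df)
  # rnorm() is vectorised over mean and sd. Repeating both n times and
  # filling the matrix column by column keeps row i paired with
  # df$mean[i] and df$sd[i] in every new column.
  draws <- matrix(
    rnorm(n * k, mean = rep(df$mean, times = n), sd = rep(df$sd, times = n)),
    nrow = k, ncol = n,
    dimnames = list(NULL, paste0(prefix, seq_len(n)))
  )
  cbind(df, draws)
}

set.seed(42)
ti <- data.frame(Name = letters[1:10], mean = 1:10, sd = 1:10)
out <- add_rnorm_cols(ti, n = 3)
names(out)
```

Because the column names are built from n, the same call works for any number of draws.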

Related

How to create a new column which is less than the previous column value

I am trying to create a new column by sampling a value whose upper bound is the value in column A. I tried the following, but to no avail. Any suggestions? For example, if A is 15.25, I would like to generate a value between 1 and 15.25 from a grid spaced by 1/4.
my_data = data.frame(A = sample(seq(1, 20, 1/4), 10))
my_data %>% mutate(B = sample(seq(1, A, 1/4), 1))
You can use rowwise(), which "allows you to compute on a data frame a row-at-a-time", as follows. Otherwise the entire A column is passed to seq(), which needs a single value as its upper bound:
my_data %>%
rowwise() %>%
mutate(B = sample(seq(1, A, 1/4), size = 1))
# A tibble: 10 × 2
# Rowwise:
A B
<dbl> <dbl>
1 19 9.25
2 8 6.25
3 7.25 1.5
4 3.25 2.75
5 2.75 2.25
6 11 4
7 14.8 13
8 9.25 7.75
9 10 9
10 10.5 8.25
A base R solution:
set.seed(11) # for reproducibility
my_data <- data.frame(A = sample(seq(1, 20, 1/4), 10))
my_data$B <- sapply(my_data$A, \(x) sample(seq(1, x, 1/4), 1))
which gives:
> my_data
A B
1 9.25 9.25
2 14.75 13.00
3 7.00 2.75
4 4.75 4.50
5 10.00 2.25
6 15.75 12.25
7 3.75 1.50
8 16.25 9.50
9 9.50 5.25
10 10.50 3.75
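To see why the ungrouped mutate() fails, and what handling one row at a time changes, here is a small base R sketch. The vector A and the names whole_column, grid and B are illustrative only.

```r
A <- c(3.25, 15.25, 7)

# seq() accepts only a single upper bound, so handing it the whole column
# raises an error; the message is captured here instead of stopping.
whole_column <- tryCatch(
  seq(1, A, by = 1/4),
  error = function(e) conditionMessage(e)
)

# Handling one value of A at a time gives each row its own grid to sample from.
set.seed(1)
B <- vapply(A, function(a) {
  grid <- seq(1, a, by = 1/4)
  # Index with sample.int() so a one-element grid is returned as is,
  # rather than being treated by sample() as "1:grid".
  grid[sample.int(length(grid), 1)]
}, numeric(1))

B
```

Indexing with sample.int() is a defensive choice: sample(x, 1) treats a single number x as the range 1:x, which only happens to be harmless here because the grid starts at 1.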

The recast fun to spread the rows is omitting key columns in R?

I am trying to spread my data so that months become the columns, keyed by both site and species. I tried to use recast, but I lose the information about species. What do I do to get the expected output (attached)?
set.seed(111)
month <- rep(c("J","F","M"), each = 6)
site <- rep(c(1,2,3,4,5,6), times = 3)
spA <- rnorm(18,0,2)
spB <- rnorm(18,0,2)
spC <- rnorm(18,0,2)
spD <- rnorm(18,0,2)
df <- data.frame(month, site, spA, spB, spC, spD)
df.test <- reshape2::recast(df, site ~ month)
Here is what I am getting.
site F J M
1 1 5 5 5
2 2 5 5 5
3 3 5 5 5
4 4 5 5 5
5 5 5 5 5
6 6 5 5 5
#Expected output (It's dummy data)
site sp J F M
1 A 5 6 7
1 B 2 3 4
..
6 D 1 2 3
If the intention is not to aggregate but just to transpose, we can use pivot_longer to reshape to long format and then pivot_wider to reshape back to wide:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('sp'), names_prefix = 'sp',
names_to = 'sp') %>%
pivot_wider(names_from = month, values_from = value)
Output:
# A tibble: 24 × 5
site sp J F M
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 0.470 -2.99 3.69
2 1 B -2.39 0.653 -6.23
3 1 C -0.232 -2.72 4.97
4 1 D 0.350 -0.433 0.405
5 2 A -0.661 -2.02 0.788
6 2 B 0.728 1.20 -1.88
7 2 C 0.669 0.962 3.92
8 2 D -1.69 2.89 -1.61
9 3 A -0.623 -1.90 1.60
10 3 B 0.723 -3.68 2.80
# … with 14 more rows
Or, using recast, specify id.var and also include variable in the formula:
library(reshape2)
reshape2::recast(df, site + variable ~ month, id.var = c("month", "site"))
site variable F J M
1 1 spA -2.99485331 0.4704414 3.6912725
2 1 spB 0.65309848 -2.3872179 -6.2264346
3 1 spC -2.72380897 -0.2323101 4.9713231
4 1 spD -0.43285732 0.3501913 0.4046144
5 2 spA -2.02037684 -0.6614717 0.7881082
6 2 spB 1.19650840 0.7283735 -1.8827148
7 2 spC 0.96224916 0.6685120 3.9199634
8 2 spD 2.89295633 -1.6945355 -1.6123984
9 3 spA -1.89695121 -0.6232476 1.5950570
10 3 spB -3.68306860 0.7233249 2.8005176
11 3 spC 1.48394325 -1.2417162 0.3833268
12 3 spD 0.81941960 1.9564633 0.5892684
13 4 spA -0.98792443 -4.6046913 -3.1333307
14 4 spB 5.43611120 0.6939287 -3.2409401
15 4 spC 0.05564925 -2.6196898 3.1050885
16 4 spD 1.82183314 3.6117365 2.8097662
...
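The same long-then-wide reshape can also be sketched in base R with stats::reshape(), which may help when neither tidyr nor reshape2 is available. The intermediate names long, wide and sp_cols are illustrative, and the example rebuilds the question's data so that it runs on its own.

```r
set.seed(111)
month <- rep(c("J", "F", "M"), each = 6)
site <- rep(1:6, times = 3)
df <- data.frame(month, site,
                 spA = rnorm(18, 0, 2), spB = rnorm(18, 0, 2),
                 spC = rnorm(18, 0, 2), spD = rnorm(18, 0, 2))

sp_cols <- c("spA", "spB", "spC", "spD")

# Step 1: stack the species columns into one long value column.
long <- data.frame(
  site  = rep(df$site, times = length(sp_cols)),
  month = rep(df$month, times = length(sp_cols)),
  sp    = rep(sub("^sp", "", sp_cols), each = nrow(df)),
  value = unlist(df[sp_cols], use.names = FALSE)
)

# Step 2: spread the months back out, one row per site and species.
wide <- reshape(long, idvar = c("site", "sp"), timevar = "month",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # value.J -> J, etc.
rownames(wide) <- NULL

head(wide)
```

No aggregation takes place: each (site, sp, month) combination occurs exactly once, so every cell holds an original value.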

Add observation number by group in R [duplicate]

This is a silly question but I am new to R and it would make my life so much easier if I could figure out how to do this!
So here is some sample data
data <- read.table(text = "Category Y
A 5.1
A 3.14
A 1.79
A 3.21
A 5.57
B 3.68
B 4.56
B 3.32
B 4.98
B 5.82
",header = TRUE)
I want to add a column that counts the number of observations within a group. Here is what I want it to look like:
Category Y OBS
A 5.1 1
A 3.14 2
A 1.79 3
A 3.21 4
A 5.57 5
B 3.68 1
B 4.56 2
B 3.32 3
B 4.98 4
B 5.82 5
I have tried:
data <- data %>% group_by(Category) %>% mutate(count = c(1:length(Category)))
which just creates another column numbered from 1 to 10, and
data <- data %>% group_by(Category) %>% add_tally()
which just creates another column of all 5s
Base R:
data$OBS <- ave(seq_len(nrow(data)), data$Category, FUN = seq_along)
data
# Category Y OBS
# 1 A 5.10 1
# 2 A 3.14 2
# 3 A 1.79 3
# 4 A 3.21 4
# 5 A 5.57 5
# 6 B 3.68 1
# 7 B 4.56 2
# 8 B 3.32 3
# 9 B 4.98 4
# 10 B 5.82 5
BTW: one can use any of the frame's columns as the first argument, including ave(data$Category, data$Category, FUN=seq_along), but ave chooses its output class based on the input class, so using a string as the first argument will result in a return of strings:
ave(data$Category, data$Category, FUN = seq_along)
# [1] "1" "2" "3" "4" "5" "1" "2" "3" "4" "5"
While not heinous, it needs to be an intentional choice. Since it appears that you wanted an integer in that column, I chose the simplest integer-in, integer-out approach. It could also have used rep(1L,nrow(data)) or anything that is both integer and the same length as the number of rows in the frame, since seq_along (the function I chose) won't otherwise care.
library(data.table)
setDT(data)[, OBS := seq_len(.N), by = .(Category)]
data
Category Y OBS
1: A 5.10 1
2: A 3.14 2
3: A 1.79 3
4: A 3.21 4
5: A 5.57 5
6: B 3.68 1
7: B 4.56 2
8: B 3.32 3
9: B 4.98 4
10: B 5.82 5
library(dplyr)
data %>% group_by(Category) %>% mutate(Obs = row_number())
# A tibble: 10 x 3
# Groups: Category [2]
Category Y Obs
<chr> <dbl> <int>
1 A 5.1 1
2 A 3.14 2
3 A 1.79 3
4 A 3.21 4
5 A 5.57 5
6 B 3.68 1
7 B 4.56 2
8 B 3.32 3
9 B 4.98 4
10 B 5.82 5
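Or, in base R with ave() (it returns character values here because Category is character, hence the as.integer()):
data$OBS <- as.integer(ave(data$Category, data$Category, FUN = seq_along))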
data
Category Y OBS
1 A 5.10 1
2 A 3.14 2
3 A 1.79 3
4 A 3.21 4
5 A 5.57 5
6 B 3.68 1
7 B 4.56 2
8 B 3.32 3
9 B 4.98 4
10 B 5.82 5
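Another base R option, which assumes the rows of each category are contiguous:
category <- c(rep("A", 5), rep("B", 5))
# rle() gives the length of each run of equal values; sequence() counts 1..length within each run
obs <- sequence(rle(category)$lengths)
data <- data.frame(category = category, obs = obs)
head(data, 10)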
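The ave() and rle() approaches above agree only when each group's rows sit next to each other. A small sketch with deliberately interleaved groups shows the difference; the vector cat_unsorted is made up for illustration.

```r
cat_unsorted <- c("A", "B", "A", "B", "B", "A")

# ave() splits by group value, so the count continues across gaps.
obs_ave <- ave(seq_along(cat_unsorted), cat_unsorted, FUN = seq_along)
obs_ave
# 1 1 2 2 3 3

# rle() only sees runs of adjacent equal values, so the count restarts
# every time the category changes.
obs_rle <- sequence(rle(cat_unsorted)$lengths)
obs_rle
# 1 1 1 1 2 1
```

If the data may be unsorted, either sort by the grouping column first or prefer a group-aware approach such as ave(), data.table's by, or dplyr's group_by().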

Plot aggregate with multiple columns and multiple variables

Attempting to plot aggregate data from the following data.
Person Time Period Value SMA2 SMA3 SMA4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 1 1 14 NA NA NA
2 A 2 1 8 11 NA NA
3 A 3 1 13 10.5 11.7 NA
4 A 4 1 12 12.5 11 11.8
5 A 5 1 19 15.5 14.7 13
6 A 6 1 9 14 13.3 13.2
7 A 7 2 14 NA NA NA
8 A 8 2 7 10.5 NA NA
9 A 9 2 11 9 10.7 NA
10 A 10 2 14 12.5 10.7 11.5
# ... with 26 more rows
I have used aggregate(DataSet[,c(4,5,6,7)], by=list(DataSet$Person), na.rm = TRUE, max) to get the following:
Group.1 Value SMA2 SMA3 SMA4
1 A 20 18.0 16.66667 15.25
2 B 20 17.0 16.66667 15.00
3 C 19 18.5 14.33333 14.50
I'd like to plot the maxes for each SMA for Person A, B, and C on the same plot.
I would also like to be able to plot the mean of these maxes for each SMA column.
Any help is appreciated.
Like so? Or are you looking for something different?
library(dplyr)
library(tidyr)
library(ggplot2)

# Maxima per person, copied from the aggregate() output above
df <- data.frame(Group.1 = c("A", "B", "C"), Value = c(20, 20, 19),
                 SMA2 = c(18.0, 17.0, 18.5), SMA3 = c(16.667, 16.667, 14.333),
                 SMA4 = c(15.25, 15.00, 14.50))

# Long format: one row per person and series
df.g <- df %>%
  gather(SMA, Value, -Group.1)
df.g$SMA <- factor(df.g$SMA, levels = c("Value", "SMA2", "SMA3", "SMA4"))

# Mean of the maxima for each series
means <- df.g %>%
  group_by(SMA) %>%
  summarise(m = mean(Value))

# One line per person, plus one point per series for the mean
ggplot(df.g, aes(x = SMA, y = Value, group = Group.1, colour = Group.1)) +
  geom_line() +
  geom_point(data = means, aes(x = SMA, y = m), inherit.aes = FALSE)
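If only the numbers are needed, the mean of the maxima per column can be checked in base R without reshaping. This sketch retypes the aggregate() output from the question; the names maxes and mean_of_maxes are illustrative.

```r
# Maxima per person, retyped from the aggregate() output in the question
maxes <- data.frame(
  Group.1 = c("A", "B", "C"),
  Value   = c(20, 20, 19),
  SMA2    = c(18.0, 17.0, 18.5),
  SMA3    = c(16.66667, 16.66667, 14.33333),
  SMA4    = c(15.25, 15.00, 14.50)
)

# Drop the grouping column, then average each remaining column
mean_of_maxes <- colMeans(maxes[, -1])
round(mean_of_maxes, 2)
```

These are the values the extra points in the plot should sit on, which makes the figure easy to sanity-check.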

dplyr: How to prevent transmute from showing 'adding missing grouping variables' message?

I am performing a mutation on a dataframe using dplyr's group_by and then transmute. How do I prevent the message 'Adding missing grouping variables: gender'?
Reproducible example:
library(tidyverse)
df <- data.frame(gender = rep(c("M", "F"), 10), score = rnorm(10, 5, 2))
df %>%
group_by(gender) %>%
transmute(rank = rank(score), score = score) %>% arrange(score)
# Adding missing grouping variables: `gender`
# A tibble: 20 x 2
# Groups: gender [2]
gender rank
<fct> <dbl>
1 M 9.50
2 F 1.50
3 M 7.50
4 F 3.50
5 M 3.50
6 F 5.50
7 M 1.50
8 F 7.50
9 M 5.50
10 F 9.50
11 M 9.50
12 F 1.50
13 M 7.50
14 F 3.50
15 M 3.50
16 F 5.50
17 M 1.50
18 F 7.50
19 M 5.50
20 F 9.50
df %>%
group_by(gender) %>%
transmute(gender, rank = rank(score))
# Error in mutate_impl(.data, dots) :
# Column `gender` can't be modified because it's a grouping variable
This code is part of a larger Shiny app. I know I can prevent the message using 'suppressMessages()', but I am looking for a more specific solution.
Edit: R version 3.3.2
dplyr_0.7.4
Off topic: I have used stackoverflow as a reader for quite some time, but this is my first question. Hope it follows the rules.
We can use transmute after group_by. The only caveat is that the grouping variable does not need to be listed in transmute, since group_by keeps that column in the result:
#R 3.4.4
library(dplyr) # using dplyr_0.7.5
df %>%
group_by(gender) %>%
transmute(rank = rank(score))
# A tibble: 20 x 2
# Groups: gender [2]
# gender rank
# <fct> <dbl>
# 1 M 3.5
# 2 F 1.5
# 3 M 7.5
# 4 F 7.5
# 5 M 9.5
# 6 F 9.5
# 7 M 1.5
# 8 F 5.5
# 9 M 5.5
#10 F 3.5
#11 M 3.5
#12 F 1.5
#13 M 7.5
#14 F 7.5
#15 M 9.5
#16 F 9.5
#17 M 1.5
#18 F 5.5
#19 M 5.5
#20 F 3.5
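If the installed dplyr version still prints the message, one alternative sketch is to compute the rank with mutate(), drop the grouping, and only then pick the columns. Nothing has to be added back implicitly, so there is nothing to report. The seed and the 20 scores are chosen for illustration, and the name ranked is made up.

```r
library(dplyr)

set.seed(1)
df <- data.frame(gender = rep(c("M", "F"), 10), score = rnorm(20, 5, 2))

# Rank within gender, then ungroup before select() so that select()
# is free to keep exactly the columns that are named.
ranked <- df %>%
  group_by(gender) %>%
  mutate(rank = rank(score)) %>%
  ungroup() %>%
  select(gender, rank)

head(ranked)
```

The result is an ungrouped tibble, which is usually what is wanted before handing the data to the rest of a Shiny app.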
