R animated plotly: line graph not plotting line

I am trying to create an animated plotly line graph in R using my own data. The animation works when I use markers as the 'mode'; however, when I change the mode to 'lines' or 'plines', nothing shows on the graph.
Any suggestions?
Data:
CH4
X FIRST SECOND
1 1 23.9 71.9
2 2 2.9 23.7
3 3 85.7 6.0
4 4 1.2 94.0
5 5 1.1 66.8
6 6 1.5 99.9
Code:
plot_ly(CH4, x=~X, y=~FIRST, name="FIRST",
hoverinfo = 'text',
text = ~paste('Test Round: ', CH4$X, '<br>',
'Concentration: ', CH4$SECOND),
type = "scatter", mode = "plines", frame=~frame) %>%
add_trace(x=~X, y = ~SECOND, name="SECOND", mode = 'plines') %>%
layout(yaxis = list(title = "CH4 Concentrations"), xaxis = list(title =
"Test Round"))

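A line needs at least two points per animation frame, so each frame has to contain all the observations up to that point rather than a single one.
Starting from your data, you need to generate an accumulated data frame, for example with the accumulate_by function defined below.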
dts <- read.table(text="
X FIRST SECOND
1 1 23.9 71.9
2 2 2.9 23.7
3 3 85.7 6.0
4 4 1.2 94.0
5 5 1.1 66.8
6 6 1.5 99.9
", header=T)
library(plotly)
library(lazyeval)
library(dplyr)
accumulate_by <- function(dat, var) {
var <- f_eval(var, dat)
lvls <- plotly:::getLevels(var)
dats <- lapply(seq_along(lvls), function(x) {
cbind(dat[var %in% lvls[seq(1, x)], ], frame = lvls[[x]])
})
bind_rows(dats)
}
CH4 <- dts %>% accumulate_by(~X)
head(CH4, 10)
# X FIRST SECOND frame
# 1 1 23.9 71.9 1
# 2 1 23.9 71.9 2
# 3 2 2.9 23.7 2
# 4 1 23.9 71.9 3
# 5 2 2.9 23.7 3
# 6 3 85.7 6.0 3
# 7 1 23.9 71.9 4
# 8 2 2.9 23.7 4
# 9 3 85.7 6.0 4
# 10 4 1.2 94.0 4
Now your code works once the mode is set to 'lines' ('plines' is not one of plotly's scatter modes, which are combinations of 'lines', 'markers' and 'text'):
plot_ly(CH4, x=~X, y=~FIRST, frame=~frame,
name='FIRST', hoverinfo = 'text',
text = ~paste('Test Round: ', CH4$X, '<br>',
'Concentration: ', CH4$SECOND),
type = 'scatter', mode = 'lines') %>%
add_trace(x=~X, y = ~SECOND, name='SECOND', mode = 'lines') %>%
layout(yaxis = list(title = 'CH4 Concentrations'),
xaxis = list(title = 'Test Round'))
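To see what the accumulation step produces without relying on plotly's internal getLevels(), here is a minimal base R sketch. The helper name accumulate_frames is hypothetical (it is not part of plotly), and it assumes the frame variable is numeric so that its levels can be ordered with sort().

```r
# Same data as in the question.
dts <- data.frame(
  X      = 1:6,
  FIRST  = c(23.9, 2.9, 85.7, 1.2, 1.1, 1.5),
  SECOND = c(71.9, 23.7, 6.0, 94.0, 66.8, 99.9)
)

# Hypothetical helper: for every level of `col`, keep all rows up to and
# including that level and tag them with the level as the frame id.
accumulate_frames <- function(dat, col) {
  lvls <- sort(unique(dat[[col]]))
  frames <- lapply(lvls, function(l) {
    cbind(dat[dat[[col]] <= l, , drop = FALSE], frame = l)
  })
  do.call(rbind, frames)
}

CH4_acc <- accumulate_frames(dts, "X")
table(CH4_acc$frame)  # frame k holds the first k observations
```

Each frame therefore carries every point drawn so far, which is what lets a line trace grow from one frame to the next.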

Related

How to write function with multiple grouping variable in R? I am using curly curly operator

I have written following function to calculate average of the desired columns after grouping by another variable.
calculate_avg <- function(df, of_colmn, by_var = NULL){
df %>%
group_by({{ by_var }}) %>%
summarise(across({{ of_colmn }}, list(avg = ~ mean(., na.rm = TRUE))))
}
This works fine with zero or one by_var, but when I run it with more than one by_var, it produces an error.
For example, when I run this code on the mpg dataset, which has 234 rows, it produces the following error:
Code : calculate_avg(mpg, c(cty, hwy, displ), c(class, trans))
Error: Problem adding computed columns in `group_by()`.
x Problem with `mutate()` input `..1`.
i `..1 = c(class, trans)`.
i `..1` must be size 234 or 1, not 468.
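Use across() inside group_by(). Passed directly, c(class, trans) is evaluated as a single computed column of length 468 (two columns of 234 values concatenated); across() treats it as a column selection instead.
library(dplyr)
library(ggplot2) # provides the mpg dataset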
calculate_avg <- function(df, of_colmn, by_var = NULL){
df %>%
group_by(across({{ by_var }})) %>%
summarise(across({{ of_colmn }}, list(avg = ~ mean(., na.rm = TRUE))))
}
calculate_avg(mpg, c(cty, hwy, displ), c(class, trans))
# class trans cty_avg hwy_avg displ_avg
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 2seater auto(l4) 15 23 5.7
# 2 2seater auto(s6) 15 25 6.2
# 3 2seater manual(m6) 15.7 25.3 6.3
# 4 compact auto(av) 19.5 28.5 2.55
# 5 compact auto(l3) 24 30 1.8
# 6 compact auto(l4) 20.2 27.9 2.25
# 7 compact auto(l5) 16.2 26.2 2.3
# 8 compact auto(s4) 20 26 2.5
# 9 compact auto(s5) 20 29 2.85
#10 compact auto(s6) 20.2 27.8 2.32
# … with 27 more rows
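If the grouping columns arrive as character strings rather than bare names, the same across() idea can be combined with all_of(). The sketch below is a hypothetical variant (calculate_avg_chr is not from the original answer) and uses the built-in mtcars data so that it runs without ggplot2.

```r
library(dplyr)

# Hypothetical variant of calculate_avg() that takes column names as strings.
calculate_avg_chr <- function(df, of_cols, by_vars) {
  df %>%
    group_by(across(all_of(by_vars))) %>%
    summarise(
      across(all_of(of_cols), ~ mean(.x, na.rm = TRUE), .names = "{.col}_avg"),
      .groups = "drop"
    )
}

res <- calculate_avg_chr(mtcars,
                         of_cols = c("mpg", "hp"),
                         by_vars = c("cyl", "gear"))
res
```

The result has one row per observed cyl/gear combination and one "_avg" column per summarised variable.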

How to add "labels" and "value" arguments with highcharter

I want to specify which columns are used as the label and the value in a pie chart.
The problem is that when I use hc_add_series_labels_values(), which accepts these two arguments, I get no output, apparently because the function is deprecated.
hc_add_series() seems to pick the two columns automatically depending on order, type, and so on.
The package is not well documented and I couldn't find what I need.
In my example I want to use the name2 column as the label
and high as the value. How can I do that?
library(dplyr)
library(stringr) # provides fruit and str_length()
library(highcharter)
n <- 5
set.seed(123)
colors <- c("#d35400", "#2980b9", "#2ecc71", "#f1c40f", "#2c3e50", "#7f8c8d")
colors2 <- c("#000004", "#3B0F70", "#8C2981", "#DE4968", "#FE9F6D", "#FCFDBF")
df <- data.frame(x = seq_len(n) - 1) %>%
mutate(
y = 10 + x + 10 * sin(x),
y = round(y, 1),
z = (x*y) - median(x*y),
e = 10 * abs(rnorm(length(x))) + 2,
e = round(e, 1),
low = y - e,
high = y + e,
value = y,
name = sample(fruit[str_length(fruit) <= 5], size = n),
color = rep(colors, length.out = n),
segmentColor = rep(colors2, length.out = n)
)
df$name2 <- c("mos", "ok", "kk", "jji", "hufg")
## x y z e low high value name color segmentColor
## 1 0 10.0 -25.6 7.6 2.4 17.6 10.0 plum #d35400 #000004
## 2 1 19.4 -6.2 4.3 15.1 23.7 19.4 lemon #2980b9 #3B0F70
## 3 2 21.1 16.6 17.6 3.5 38.7 21.1 mango #2ecc71 #8C2981
## 4 3 14.4 17.6 2.7 11.7 17.1 14.4 pear #f1c40f #DE4968
## 5 4 6.4 0.0 3.3 3.1 9.7 6.4 apple #2c3e50 #FE9F6D
highchart() %>%
hc_chart(type = "pie") %>%
hc_add_series(df, name = "Fruit Consumption", showInLegend = FALSE)
For anyone who has the same problem:
The package works much like ggplot2. The hchart() function does the job when the columns are mapped with hcaes():
hchart(df, type = "pie", hcaes(name2, high))

How can I plot using 2 y-axes using a single data frame with 7 variables having a wide range of values?

I have 7 variables (density of plankton functional groups) in a time series which I want to place in a single plot to compare their trends over time. I used ggplot, geom_point and geom_line. Since the variables differ widely in range, those with smaller values end up as almost flat lines when plotted against those with larger values. Since I am only after the trends, not the density, I would prefer to see all lines in one plot. I considered using the sec.axis argument, but could not figure out how to assign the variables to either of the y-axes.
Below is my sample data:
seq=1:6
fgrp:
Cop<-c(4.166667,4.722222,3.055556,4.444444,2.777778,2.222222)
Cyan<-c(7.222222,3.888889,1.388889,0.555556,6.944444,3.611111)
Dia<-c(96.66667,43.88889,34.44444,111.8056,163.0556,94.16667)
Dino<-c(126.9444,71.11111,50,55.97222,65,38.33333)
Naup<-c(271.9444,225.5556,207.7778,229.8611,139.7222,92.5)
OT<-c(22.5,19.16667,10.27778,18.61111,18.88889,8.055556)
Prot<-c(141.9444,108.8889,99.16667,113.8889,84.44444,71.94444)
And the ggplot script without the sec.axis since I could not make it work yet:
ggplot(data=df,aes(x=seq,y=mean,shape=fgrp,linetype=fgrp))+geom_point(size=2.5)+geom_line(size=0.5)+scale_shape_manual(values=c(16,17,15,18,8,1,0),
guide=guide_legend(title="Functional\nGroups"))+scale_linetype_manual(values=c("solid","longdash","dotted","dotdash","dashed","twodash","12345678"),guide=F)+scale_y_continuous(sec.axis = sec_axis(~./3)) +geom_errorbar(mapping=aes(ymax=mean+se,ymin=mean-se), width=0.04,linetype="longdash",color="gray30")+theme_minimal()+labs(list(title="Control",x="time",y="density"),size=12)+theme(plot.title = element_text(size = 12,hjust = 0.5 ))
The lines do not look terrible, as is, but here's an example that leverages facet_wrap with scales = "free_y" that should get you going in the right direction:
library(tidyverse)
seq <- 1:6
Cop <- c(4.166667,4.722222,3.055556,4.444444,2.777778,2.222222)
Cyan <- c(7.222222,3.888889,1.388889,0.555556,6.944444,3.611111)
Dia <- c(96.66667,43.88889,34.44444,111.8056,163.0556,94.16667)
Dino <- c(126.9444,71.11111,50,55.97222,65,38.33333)
Naup <- c(271.9444,225.5556,207.7778,229.8611,139.7222,92.5)
OT <- c(22.5,19.16667,10.27778,18.61111,18.88889,8.055556)
Prot <- c(141.9444,108.8889,99.16667,113.8889,84.44444,71.94444)
df <- tibble(
seq = seq,
cop = Cop,
cyan = Cyan,
dia = Dia,
dino = Dino,
naup = Naup,
ot = OT,
prot = Prot
)
df
#> # A tibble: 6 x 8
#> seq cop cyan dia dino naup ot prot
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 4.17 7.22 96.7 127. 272. 22.5 142.
#> 2 2 4.72 3.89 43.9 71.1 226. 19.2 109.
#> 3 3 3.06 1.39 34.4 50 208. 10.3 99.2
#> 4 4 4.44 0.556 112. 56.0 230. 18.6 114.
#> 5 5 2.78 6.94 163. 65 140. 18.9 84.4
#> 6 6 2.22 3.61 94.2 38.3 92.5 8.06 71.9
df_tidy <- df %>%
gather(grp, value, -seq)
df_tidy
#> # A tibble: 42 x 3
#> seq grp value
#> <int> <chr> <dbl>
#> 1 1 cop 4.17
#> 2 2 cop 4.72
#> 3 3 cop 3.06
#> 4 4 cop 4.44
#> 5 5 cop 2.78
#> 6 6 cop 2.22
#> 7 1 cyan 7.22
#> 8 2 cyan 3.89
#> 9 3 cyan 1.39
#> 10 4 cyan 0.556
#> # ... with 32 more rows
ggplot(df_tidy, aes(x = seq, y = value, color = grp)) +
geom_line()
ggplot(df_tidy, aes(x = seq, y = value, color = grp)) +
geom_line() +
facet_wrap(~ grp, scales = "free_y")
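In current tidyr, gather() is superseded by pivot_longer(). As a hedged sketch of the same reshape step, the block below uses a reduced version of the data (three of the seven groups; the name df_small is only for illustration) and produces the long format that facet_wrap() needs.

```r
library(tidyr)

# Reduced version of the answer's data: three of the seven groups.
df_small <- data.frame(
  seq  = 1:6,
  cop  = c(4.166667, 4.722222, 3.055556, 4.444444, 2.777778, 2.222222),
  dia  = c(96.66667, 43.88889, 34.44444, 111.8056, 163.0556, 94.16667),
  naup = c(271.9444, 225.5556, 207.7778, 229.8611, 139.7222, 92.5)
)

# One row per (seq, group) pair; the group names go into `grp`.
df_long <- pivot_longer(df_small, cols = -seq,
                        names_to = "grp", values_to = "value")
head(df_long)
```

Note that pivot_longer() orders the rows by seq first, whereas gather() stacks one group after another; the plot is the same either way.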

Simulate data and randomly add missing values to dataframe

How can I randomly add missing values to some or each column (say random ~5% missing in each) in a simulated dataframe, plus, is there a more efficient way of simulating a dataframe with both continuous and factor columns?
#Simulate some data
N <- 2000
data <- data.frame(id = 1:2000,age = rnorm(N,18:90),bmi = rnorm(N,15:40),
chol = rnorm(N,50:350), insulin = rnorm(N,2:40),sbp = rnorm(N, 50:200),
dbp = rnorm(N, 30:150), sex = c(rep(1, 1000), rep(2, 1000)),
smoke = rep(c(1, 2), 1000), educ = sample(LETTERS[1:4]))
#Manually add some missing values
data <- data %>%
mutate(age = "is.na<-"(age, age <19 | age >88),
bmi = "is.na<-"(bmi, bmi >38 | bmi <16),
insulin = "is.na<-"(insulin, insulin >38),
educ = "is.na<-"(educ, bmi >35))
In my opinion the best solution is the mice package, an R package dedicated to imputation. It also has a function called ampute for introducing missing data into a data.frame.
ampute - Generate Missing Data For Simulation Purposes
This function generates multivariate missing data in a MCAR, MAR or MNAR manner.
The advantage of this solution is you can set multiple parameters for the simulation of your missing data.
ampute(data, prop = 0.5, patterns = NULL, freq = NULL, mech = "MAR",
weights = NULL, cont = TRUE, type = NULL, odds = NULL,
bycases = TRUE, run = TRUE)
As you can see you can set the percentage of missing values, the missing data mechanism (MCAR would be your choice for missing completely at random) and several other parameters. This solution would also be quite clean since it is only 1 line of code.
Here's a tidyverse approach that will remove roughly 20% of your data for each column you specify:
set.seed(1)
# example data
N <- 20
data <- data.frame(id = 1:N,
age = rnorm(N,18:90),
bmi = rnorm(N,15:40),
chol = rnorm(N,50:350))
library(tidyverse)
# specify which variables should have missing data and prc of missing data
c_names = c("age","bmi")
prc_missing = 0.20
data %>%
gather(var, value, -id) %>% # reshape data
mutate(r = runif(nrow(.)), # simulate a random number from 0 to 1 for each row
value = ifelse(var %in% c_names & r <= prc_missing, NA, value)) %>% # if it's one of the variables you specified and the random number is less than your threshold update to NA
select(-r) %>% # remove random number
spread(var, value) # reshape back to original format
# id age bmi chol
# 1 1 17.37355 15.91898 49.83548
# 2 2 19.18364 16.78214 50.74664
# 3 3 19.16437 17.07456 52.69696
# 4 4 NA 16.01065 53.55666
# 5 5 22.32951 19.61983 53.31124
# 6 6 22.17953 19.94387 54.29250
# 7 7 24.48743 NA 56.36458
# 8 8 25.73832 20.52925 57.76853
# 9 9 26.57578 NA 57.88765
# 10 10 26.69461 24.41794 59.88111
# 11 11 29.51178 26.35868 60.39811
# 12 12 NA 25.89721 60.38797
# 13 13 NA 27.38767 62.34112
# 14 14 28.78530 27.94619 61.87064
# 15 15 33.12493 27.62294 65.43302
# 16 16 32.95507 NA 66.98040
# 17 17 33.98381 30.60571 65.63278
# 18 18 35.94384 NA 65.95587
# 19 19 36.82122 34.10003 68.56972
# 20 20 37.59390 34.76318 68.86495
And this is an alternative that will remove exactly 20% of data for the columns you specify:
set.seed(1)
# example data
N <- 20
data <- data.frame(id = 1:N,
age = rnorm(N,18:90),
bmi = rnorm(N,15:40),
chol = rnorm(N,50:350))
library(tidyverse)
# specify which variables should have missing data and prc of missing data
c_names = c("age","bmi")
prc_missing = 0.20
n_remove = prc_missing * nrow(data)
data %>%
gather(var, value, -id) %>% # reshape data
sample_frac(1) %>% # shuffle rows
group_by(var) %>% # for each variables
mutate(value = ifelse(var %in% c_names & row_number() <= n_remove, NA, value)) %>% # update to NA top x number of rows if it's one of the variables you specified
spread(var, value) # reshape to original format
# # A tibble: 20 x 4
# id age bmi chol
# <int> <dbl> <dbl> <dbl>
# 1 1 17.4 15.9 49.8
# 2 2 19.2 16.8 50.7
# 3 3 19.2 17.1 52.7
# 4 4 NA 16.0 53.6
# 5 5 22.3 NA 53.3
# 6 6 22.2 19.9 54.3
# 7 7 24.5 20.8 56.4
# 8 8 25.7 NA 57.8
# 9 9 26.6 NA 57.9
# 10 10 NA NA 59.9
# 11 11 NA 26.4 60.4
# 12 12 NA 25.9 60.4
# 13 13 29.4 27.4 62.3
# 14 14 28.8 27.9 61.9
# 15 15 33.1 27.6 65.4
# 16 16 33.0 29.6 67.0
# 17 17 34.0 30.6 65.6
# 18 18 35.9 31.9 66.0
# 19 19 36.8 34.1 68.6
# 20 20 37.6 34.8 68.9
Would this work?
n_rows <- nrow(data)
perc_missing <- 5 # percentage missing data
row_missing <- sample(1:n_rows, round(perc_missing/100 * n_rows, 0)) # sample randomly x% of rows
col_missing <- 1 # define column
data[row_missing, col_missing] <- NA # assign missing values
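The row-sampling idea above extends to several columns with a small loop. This is a minimal base R sketch under stated assumptions: the helper name add_missing and the simulated columns are hypothetical, missingness is completely at random, and each listed column loses exactly round(prop * nrow(df)) values, drawn independently per column.

```r
# Hypothetical helper: set a fixed share of values to NA in each listed column.
add_missing <- function(df, cols, prop = 0.05) {
  n_na <- round(prop * nrow(df))
  for (col in cols) {
    # Draw a fresh set of distinct row indices for every column.
    df[sample(nrow(df), n_na), col] <- NA
  }
  df
}

set.seed(42)
sim <- data.frame(
  id   = 1:200,
  age  = rnorm(200, mean = 50, sd = 10),
  educ = sample(LETTERS[1:4], 200, replace = TRUE)
)

sim_na <- add_missing(sim, cols = c("age", "educ"), prop = 0.05)
colSums(is.na(sim_na))  # number of NAs per column
```

Because the rows are drawn separately for each column, the same row can end up missing in more than one column, just as it can with real data.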

Removing a column from a matrix

I'm a bit new to R and want to remove a column from a matrix by the name of that column. I know that X[,2] gives the second column and X[,-2] gives every column except the second one. What I really want to know is whether there's a similar command using column names. I've got a matrix and want to remove the "sales" column, but X[,-"sales"] doesn't work. How should I do this? I would use the column number, but I want to be able to reuse this for other matrices later, which have different dimensions. Any help would be much appreciated.
I'm not sure why all the answers are solutions for data frames and not matrices.
Per @Sotos's and @Moody_Mudskipper's comments, here is an example with the built-in state.x77 data matrix.
dat <- head(state.x77)
dat
#> Population Income Illiteracy Life Exp Murder HS Grad Frost Area
#> Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
#> Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
#> Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
#> Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
#> California 21198 5114 1.1 71.71 10.3 62.6 20 156361
#> Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
# for removing one column
dat[, colnames(dat) != "Area"]
#> Population Income Illiteracy Life Exp Murder HS Grad Frost
#> Alabama 3615 3624 2.1 69.05 15.1 41.3 20
#> Alaska 365 6315 1.5 69.31 11.3 66.7 152
#> Arizona 2212 4530 1.8 70.55 7.8 58.1 15
#> Arkansas 2110 3378 1.9 70.66 10.1 39.9 65
#> California 21198 5114 1.1 71.71 10.3 62.6 20
#> Colorado 2541 4884 0.7 72.06 6.8 63.9 166
# for removing more than one column
dat[, !colnames(dat) %in% c("Area", "Life Exp")]
#> Population Income Illiteracy Murder HS Grad Frost
#> Alabama 3615 3624 2.1 15.1 41.3 20
#> Alaska 365 6315 1.5 11.3 66.7 152
#> Arizona 2212 4530 1.8 7.8 58.1 15
#> Arkansas 2110 3378 1.9 10.1 39.9 65
#> California 21198 5114 1.1 10.3 62.6 20
#> Colorado 2541 4884 0.7 6.8 63.9 166
#be sure to use `colnames` and not `names`
names(state.x77)
#> NULL
Created on 2020-06-27 by the reprex package (v0.3.0)
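Since the question asks for something reusable across matrices of different dimensions, the name-based indexing above can be wrapped in a small function. This is only a sketch: drop_cols is a hypothetical helper, and it chooses to stop on unknown names and to always return a matrix.

```r
# Hypothetical helper: remove columns from a matrix by name.
drop_cols <- function(m, cols) {
  unknown <- setdiff(cols, colnames(m))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # drop = FALSE keeps a matrix even if only one column is left.
  m[, setdiff(colnames(m), cols), drop = FALSE]
}

dat <- head(state.x77)
colnames(drop_cols(dat, "Area"))
dim(drop_cols(dat, colnames(dat)[-1]))  # one column left, still a matrix
```

Failing loudly on an unknown name is a design choice; silently ignoring it would also be reasonable, but it can hide typos such as "sale" for "sales".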
My favorite way, shown here for a data frame:
# create data
df <- data.frame(x = runif(100),
y = runif(100),
remove_me = runif(100),
remove_me_too = runif(100))
# remove column
df <- df[,!names(df) %in% c("remove_me", "remove_me_too")]
so this dataframe:
> df
x y remove_me remove_me_too
1 0.731124508 0.535219259 0.33209113 0.736142042
2 0.612017350 0.404128030 0.84923974 0.624543223
3 0.415403559 0.369818154 0.53817387 0.661263087
4 0.199780006 0.679946936 0.58782429 0.085624708
5 0.343304259 0.892128112 0.02827132 0.038203599
becomes this:
> df
x y
1 0.731124508 0.535219259
2 0.612017350 0.404128030
3 0.415403559 0.369818154
4 0.199780006 0.679946936
5 0.343304259 0.892128112
As always in R there are many potential solutions. You can use the package dplyr and select() to easily remove or select columns in a data frame.
df <- data.frame(x = runif(100),
y = runif(100),
remove_me = runif(100),
remove_me_too = runif(100))
library(dplyr)
select(df, -remove_me, -remove_me_too) %>% head()
#> x y
#> 1 0.35113636 0.134590652
#> 2 0.72545356 0.165608839
#> 3 0.81000067 0.090696049
#> 4 0.29882204 0.004602398
#> 5 0.93492918 0.256870750
#> 6 0.03007377 0.395614901
You can read more about dplyr and its verbs in the dplyr documentation.
As a general case, if you remove so many columns that only one column remains, R will drop the result to a plain vector (here a numeric one). You can prevent this by setting drop = FALSE; the same applies when indexing a matrix.
(df <- data.frame(x = runif(6),
y = runif(6),
remove_me = runif(6),
remove_me_too = runif(6)))
# x y remove_me remove_me_too
# 1 0.4839869 0.18672217 0.0973506 0.72310641
# 2 0.2467426 0.37950878 0.2472324 0.80133920
# 3 0.4449471 0.58542547 0.8185943 0.57900456
# 4 0.9119014 0.12089776 0.2153147 0.05584816
# 5 0.4979701 0.04890334 0.7420666 0.44906667
# 6 0.3266374 0.37110822 0.6809380 0.29091746
df[, -c(3, 4)]
# x y
# 1 0.4839869 0.18672217
# 2 0.2467426 0.37950878
# 3 0.4449471 0.58542547
# 4 0.9119014 0.12089776
# 5 0.4979701 0.04890334
# 6 0.3266374 0.37110822
# Result is a numeric vector
df[, -c(2, 3, 4)]
# [1] 0.4839869 0.2467426 0.4449471 0.9119014 0.4979701 0.3266374
# Keep the data frame structure (with a matrix, drop = FALSE keeps a one-column matrix)
df[, -c(2, 3, 4), drop = FALSE]
# x
# 1 0.4839869
# 2 0.2467426
# 3 0.4449471
# 4 0.9119014
# 5 0.4979701
# 6 0.3266374

Resources