I have paired data: the first column contains values for individuals with disease and the second column contains values for individuals without disease. I would like to make a scatter plot with those with disease on the x-axis and those without disease on the y-axis. I want to show disease and non-disease in two different colors and also connect each pair with a line. Each row is one pair. For example, in the first pair the value with disease is 27 and the value without disease is 29, and so on. I have tried the code below, but I am not sure how to continue. Any guidance is appreciated.
d <- structure(list(id_case = c(27, 17, 35, 18, 27, 40, 20, 25, 30, 20, 35, 26, 30, 31, 15, 11, 41),
id_control = c(29, 26, 39, 22, 24, 41, 29, 24, 25, 21, 29, 24, 26, 29, 15, 11, 35)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -17L))
library(ggplot2)
ggplot(d, aes(id_case, id_control)) + geom_point() +
  xlab("with disease") + ylab("without disease")
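One possible way to get two colors and a line per pair is to reshape the data to long format and draw a paired ("slope") plot. This is only a sketch under an assumption that differs from the axes described in the question: the group (with or without disease) goes on the x-axis and the measured value on the y-axis. The names `pair`, `group`, `value`, and `long` are introduced here for illustration.

```r
library(ggplot2)

d <- data.frame(
  id_case    = c(27, 17, 35, 18, 27, 40, 20, 25, 30, 20, 35, 26, 30, 31, 15, 11, 41),
  id_control = c(29, 26, 39, 22, 24, 41, 29, 24, 25, 21, 29, 24, 26, 29, 15, 11, 35)
)

# One row per measurement; `pair` records which row of `d` it came from
long <- data.frame(
  pair  = rep(seq_len(nrow(d)), times = 2),
  group = rep(c("with disease", "without disease"), each = nrow(d)),
  value = c(d$id_case, d$id_control)
)

p <- ggplot(long, aes(x = group, y = value)) +
  # One grey segment per pair, joining its two measurements
  geom_line(aes(group = pair), colour = "grey60") +
  # Points colored by disease status
  geom_point(aes(colour = group), size = 2) +
  labs(x = NULL, y = "value", colour = NULL)
p
```

If the original axes (case on x, control on y) are kept instead, each pair is a single point, so there is nothing to connect within a pair.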
I have a data frame that looks like this:
y = data.frame(subdel = c(1, 2, 3, 1, 57, 14, 1, 2, 57, 57, 57, 3, 1, 1,
31, 21, 34, 56, 12, 45, 1, 63, 31, 34), muni = c("A01", "A83", "A40", NA, NA, NA, NA, NA, NA, NA, NA, "A45", "B26", "B42","B61", "B70", "B90", "C53", "C89","A45", "B26", "B42","B61", "B70"))
I expect the following result:
z = data.frame(subdel = c(1, 2, 3, 57, 57, 57, 57, 3, 1, 1, 31, 21, 34, 56, 12, 45, 1, 63, 31, 34), muni = c("A01", "A83", "A40", NA, NA, NA, NA, "A45", "B26", "B42","B61", "B70", "B90", "C53", "C89", "A45", "B26", "B42","B61", "B70"))
Among the rows where muni is NA, I want to keep only those where subdel == 57, while preserving all the other observations in the data frame, as you can see above.
Any help would be appreciated.
We can use subset with a logical condition: keep the rows where 'muni' is NA (is.na(muni)) and (&) 'subdel' is 57 (subdel == 57), or (|) the rows where 'muni' is not NA (!is.na(muni)).
subset(y, is.na(muni) & subdel == 57 | !is.na(muni))
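As a quick check, the filter can be compared with the expected result `z` from the question. This sketch uses base R only; `res` and `res2` are names introduced here. It also shows an equivalent shorter condition, which relies on the assumption that `subdel` itself contains no NA (true for this example).

```r
y <- data.frame(
  subdel = c(1, 2, 3, 1, 57, 14, 1, 2, 57, 57, 57, 3, 1, 1,
             31, 21, 34, 56, 12, 45, 1, 63, 31, 34),
  muni = c("A01", "A83", "A40", NA, NA, NA, NA, NA, NA, NA, NA, "A45",
           "B26", "B42", "B61", "B70", "B90", "C53", "C89", "A45",
           "B26", "B42", "B61", "B70")
)
z <- data.frame(
  subdel = c(1, 2, 3, 57, 57, 57, 57, 3, 1, 1, 31, 21, 34, 56, 12, 45,
             1, 63, 31, 34),
  muni = c("A01", "A83", "A40", NA, NA, NA, NA, "A45", "B26", "B42",
           "B61", "B70", "B90", "C53", "C89", "A45", "B26", "B42",
           "B61", "B70")
)

# Filter from the answer
res <- subset(y, is.na(muni) & subdel == 57 | !is.na(muni))

# Logically equivalent when subdel has no NA: (A & B) | !A  ==  B | !A
res2 <- y[y$subdel == 57 | !is.na(y$muni), ]

# Compare column contents with the expected result
identical(res$subdel, z$subdel)
identical(as.character(res$muni), as.character(z$muni))
```

Note that `subset` keeps the original row names, so call `rownames(res) <- NULL` if consecutive row numbers are needed.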
This is a small example of my data set, which contains weekly data for 52 weeks. You can reproduce the data with the code below:
# CODE
#Data
library(tidyverse)
library(plotly)
ARTIFICIALDATA<-dput(structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52), `2019 Series_1` = c(534.771929824561,
350.385964912281, 644.736842105263, 366.561403508772, 455.649122807018,
533.614035087719, 829.964912280702, 466.035087719298, 304.421052631579,
549.473684210526, 649.719298245614, 537.964912280702, 484.982456140351,
785.929824561404, 576.736842105263, 685.508771929824, 514.842105263158,
464.491228070175, 608.245614035088, 756.701754385965, 431.859649122807,
524.315789473684, 739.40350877193, 604.736842105263, 669.684210526316,
570.491228070175, 641.649122807018, 649.298245614035, 664.210526315789,
530.385964912281, 754.315789473684, 646.80701754386, 764.070175438596,
421.333333333333, 470.842105263158, 774.245614035088, 752.842105263158,
575.368421052632, 538.315789473684, 735.578947368421, 522, 862.561403508772,
496.526315789474, 710.631578947368, 584.456140350877, 843.19298245614,
563.473684210526, 568.456140350877, 625.368421052632, 768.912280701754,
679.824561403509, 642.526315789474), `2020 Series_1` = c(294.350877192983,
239.824561403509, 709.614035087719, 569.824561403509, 489.438596491228,
561.964912280702, 808.456140350877, 545.157894736842, 589.649122807018,
500.877192982456, 584.421052631579, 524.771929824561, 367.438596491228,
275.228070175439, 166.736842105263, 58.2456140350878, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA)), row.names = c(NA, -52L), class = c("tbl_df", "tbl",
"data.frame")))
colnames(ARTIFICIALDATA) <- c('week', 'series1', 'series2')
The next step is to plot this data with the plotly package for R. I want a plot like the example below. Because this is weekly data, series1 has 52 observations while series2 has only 16 (series1 is the mean data for 2019 and series2 is the data for 2020). For that reason, the comparison must use only those 16 observations (the rows that have no NA), as in the example below:
Can anybody help me plot this graph with plotly?
Try this:
colnames(ARTIFICIALDATA) <- c("week", "series1", "series2")
ARTIFICIALDATA %>%
  # Drop rows with NA
  drop_na() %>%
  # Convert to long format
  pivot_longer(-week, names_to = "series") %>%
  # Set the labels for the plot. If you want other labels, simply adjust them here
  mutate(label = case_when(
    series == "series1" ~ "2019 Series_1",
    series == "series2" ~ "2020 Series_1")) %>%
  # Compute the sum by series
  group_by(label) %>%
  summarise(sum = sum(value, na.rm = TRUE)) %>%
  ungroup() %>%
  # Plot
  plot_ly(x = ~label, y = ~sum) %>%
  add_bars() %>%
  # Remove the title of the x-axis. You can also label it as you like
  layout(xaxis = list(title = ""))
I want to perform multiple imputation for a set of variables using the MICE package in R.
# Example data
data <- data.frame(
gcs = c(3, 10, NA, NA, NA, 15, 14, 15, 15, 14, 15, NA, 13, 15, 15),
hf = c(50, 66, 78, 99, NA, NA, 56, 55, NA, 76, 98, 105, NA, NA, 65),
...
)
The minimum for gcs is 3 and the maximum is 15, and it must be a whole number. How can I set these constraints in MICE? The same goes for hf, except that it only has a lower limit of 0.
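One option worth trying is predictive mean matching (`method = "pmm"`), which fills each missing entry with a value actually observed in that column, so imputations stay within the observed range and remain whole numbers whenever the observed data are. The sketch below assumes the `mice` package is installed and uses only the two columns shown above (the elided columns are left out); `imp` and `completed` are names introduced here.

```r
library(mice)

data <- data.frame(
  gcs = c(3, 10, NA, NA, NA, 15, 14, 15, 15, 14, 15, NA, 13, 15, 15),
  hf  = c(50, 66, 78, 99, NA, NA, 56, 55, NA, 76, 98, 105, NA, NA, 65)
)

# pmm draws every imputed value from the observed values of the same column
imp <- mice(data, method = "pmm", m = 5, seed = 123, printFlag = FALSE)

# Inspect the first completed data set
completed <- complete(imp, 1)
range(completed$gcs)
range(completed$hf)
```

For imputation methods that can generate values outside the observed range, the mice documentation describes post-processing through the `post` argument together with `squeeze()` to clamp imputations to given bounds.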
This question comes from a previous one I posted a while ago:
rollsum with fixed dates
I cannot get the given solution to work. I have a large data set; the columns of interest are:
id = c(145658, 145658, 145658, 145658, 145658, 145658, 145658, 145658, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
week_number = c(24, 35, 44, 71, 82, 117, 127, 142, 4, 15, 20, 24, 30, 36, 42, 46, 59, 67, 68, 71, 75, 78, 79, 86, 93, 96)
amount = c(51.9, 51.9, 51.9, 51.9, 51.9, 103.8, 51.9, 51.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 101.0, 168.9, 101.0, 101.0, 135.8, 168.9, 168.9, 67.9)
df = data.frame(id = id, week_number = week_number, amount = amount)
In reality, I have thousands of ids, and each has different week numbers. I want to calculate the rolling sum of the "amount" column over the past n weeks (including the present week) for each id.
An extreme example would be with the past 100 weeks. The results would look like:
past_100wk = c(NA, NA, NA, NA, NA, 363.3, 363.3, 363.3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
Again, this is an extreme case, but it shows that the result should be NA (or -1) when the row's week_number does not cover the full window (100 weeks, in this case).
Thank you!
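A minimal base R sketch of the calculation described above, under two assumptions read off the expected output: the window for a row covers weeks `week_number - n + 1` through `week_number` of the same id, and the result is NA whenever `week_number < n`. The function name `roll_sum_weeks` is introduced here for illustration.

```r
id <- c(145658, 145658, 145658, 145658, 145658, 145658, 145658, 145658,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)
week_number <- c(24, 35, 44, 71, 82, 117, 127, 142, 4, 15, 20, 24, 30, 36,
                 42, 46, 59, 67, 68, 71, 75, 78, 79, 86, 93, 96)
amount <- c(51.9, 51.9, 51.9, 51.9, 51.9, 103.8, 51.9, 51.9, 67.9, 67.9,
            67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 67.9, 101.0, 168.9,
            101.0, 101.0, 135.8, 168.9, 168.9, 67.9)
df <- data.frame(id = id, week_number = week_number, amount = amount)

roll_sum_weeks <- function(id, week, amount, n) {
  out <- rep(NA_real_, length(id))
  # Handle each id separately so windows never mix ids
  for (rows in split(seq_along(id), id)) {
    w <- week[rows]
    a <- amount[rows]
    out[rows] <- vapply(seq_along(w), function(k) {
      # Assumption: fewer than n weeks of history gives NA
      if (w[k] < n) return(NA_real_)
      # Sum amounts in the n weeks ending at (and including) week w[k]
      sum(a[w > w[k] - n & w <= w[k]])
    }, numeric(1))
  }
  out
}

df$past_100wk <- roll_sum_weeks(df$id, df$week_number, df$amount, n = 100)
df
```

With n = 100 this gives 363.3 for weeks 117, 127, and 142 of id 145658 and NA everywhere else. The inner step is quadratic in the number of rows per id, which is usually acceptable when each id has only a few dozen rows.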