I am trying to plot several line graphs from a list using a loop in R. The list temp.list looks like this:
> temp.list[1]
$`1`
NEW_UPC Week1 Week2 Week3 Week4 Week5 Week6 Week7 Week8 Week9 Week10 Week11 Week12
5 11410008398 3 6 11 15 15 27 31 33 34 34 34 34
Life Status Num_markets Sales
5 197 1 50 186048.1
I use only part of the data above to plot: columns 2 to 13 of each list element go on the y-axis, i.e. 3, 6, 11, 15, ..., 34. For the x-axis, I would like to have Week 1, Week 2, ..., Week 12 at the tick marks. As I don't know how to assign character values to x in the ggplot command, I created a variable called weeks for the x-axis as below:
weeks = c(1,2,3,4,5,6,7,8,9,10,11,12)
The code that I used to generate the plot is below:
for (i in 1:2) {
markets= temp.list[[i]][2:13]
ggplot(data = NULL,aes(x=weeks,y=markets))+geom_line()+
scale_x_continuous(breaks = seq(0,12,1))+
scale_y_continuous(breaks = seq(0,50,5))
}
This code does not generate any plot. When I run just the below lines:
ggplot(data = NULL,aes(x=weeks,y=markets))+geom_line()+
scale_x_continuous(breaks = seq(0,12,1))+
scale_y_continuous(breaks = seq(0,50,5))
I get this error:
Error: geom_line requires the following missing aesthetics: y
In addition: Warning message:
In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
row names were found from a short variable and have been discarded
Any help to fix this will be appreciated. I looked at some related discussions here, but I am not clear how to proceed.
Also, any better way to generate multiple plots is also welcome. From temp.list, I am looking to generate over 300 separate line graphs (i.e, not all lines in one chart).
Here is a solution using tidyverse:
library(tidyverse)
# 1: create a function for the plot
## gather will give a long format data.frame
## then extract the number of the week and format it to numeric
## then use ggplot on this data
plot_function <- function(data) {
data %>%
gather(., key = week, value = markets, starts_with("Week")) %>%
mutate(week= as.numeric(str_sub(week, 5, -1))) %>%
ggplot(., aes(x = week, y = markets)) +
geom_line() +
scale_x_continuous(breaks = seq(0, 12, 1)) +
scale_y_continuous(breaks = seq(0, 50, 5))
}
# 2: map the function to your list
plots <- map(temp.list, plot_function)
# 3: access each plot easily
plots[[1]]
plots[[2]]
...
#Data used for this example
temp.list = list('1' = data.frame(NEW_UPC = 11410008398,
Week1 = 3,
Week2 = 6,
Week3 = 11,
Week4 = 15,
Week5 = 15,
Week6 = 27,
Week7 = 31,
Week8 = 33,
Week9 = 34,
Week10 = 34,
Week11 = 34,
Week12 = 34,
Life = 197,
Status = 1,
Num_markets = 50,
Sales = 186048.1),
'2' = data.frame(NEW_UPC = 11410008398,
Week1 = 4,
Week2 = 5,
Week3 = 8,
Week4 = 13,
Week5 = 14,
Week6 = 25,
Week7 = 29,
Week8 = 30,
Week9 = 31,
Week10 = 33,
Week11 = 34,
Week12 = 34,
Life = 201,
Status = 1,
Num_markets = 50,
Sales = 186048.1)
)
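As for why the original loop showed nothing: ggplot objects are only drawn when they are printed, and auto-printing does not happen inside a for loop. In addition, temp.list[[i]][2:13] is a one-row data frame rather than a numeric vector, which is the likely cause of the "missing aesthetics: y" error. Below is a minimal sketch of a loop-based version under those assumptions; the shortened three-week temp.list and the names plot_data and week_cols are illustrative only.

```r
library(ggplot2)

# Shortened stand-in for temp.list (hypothetical: only three week columns)
temp.list <- list(
  "1" = data.frame(NEW_UPC = 11410008398, Week1 = 3, Week2 = 6, Week3 = 11, Life = 197),
  "2" = data.frame(NEW_UPC = 11410008398, Week1 = 4, Week2 = 5, Week3 = 8, Life = 201)
)

plots <- vector("list", length(temp.list))
for (i in seq_along(temp.list)) {
  df <- temp.list[[i]]
  week_cols <- grep("^Week", names(df))

  # unlist() turns the one-row data frame slice into a plain numeric vector
  plot_data <- data.frame(
    week    = seq_along(week_cols),
    markets = unlist(df[1, week_cols], use.names = FALSE)
  )

  p <- ggplot(plot_data, aes(x = week, y = markets)) +
    geom_line() +
    # numeric positions on the axis, "Week n" text at the tick marks
    scale_x_continuous(breaks = plot_data$week,
                       labels = paste("Week", plot_data$week))

  print(p)          # explicit print() is needed inside a loop
  plots[[i]] <- p   # keep the object for later use
}
```

To write each plot to its own file instead of drawing it on screen, a call such as ggsave(sprintf("plot_%03d.png", i), p) inside the loop should work.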
I am trying to create a dodged bar chart (geom_bar) in R with this data, but I am not getting the plot that I need:
Department Male Female
<chr> <int> <int>
1 "Admin Offices" 3 6
2 "Executive Office" 0 1
3 "IT/IS" 28 22
4 "Production " 83 126
5 "Sales" 16 15
6 "Software Engineering" 5 6
What I tried seems pretty wrong, so can anyone help?
I can only guess what you actually need. However, the result could be this:
library(dplyr)
library(tidyr)
library(ggplot2)
df<-data.frame(Department = c("Admin Offices",
"Executive Office" ,
"IT/IS",
"Production",
"Sales",
"Software Engineering" ),
Male = c(3, 0, 28, 83, 16, 5),
Female = c(6, 1, 22, 126, 15, 6))
df %>% pivot_longer(cols = c("Male", "Female")) %>%
transmute(Department, Gender = as.factor(name), Value = value) %>%
ggplot() +
geom_bar(aes(x = Department, y = Value, fill = Gender), stat = "identity", position = position_dodge(0.9))
And the same with barplot from base R:
df<-data.frame(Department = c("Admin Offices",
"Executive Office" ,
"IT/IS",
"Production",
"Sales",
"Software Engineering" ),
Male = c(3, 0, 28, 83, 16, 5),
Female = c(6, 1, 22, 126, 15, 6))
barplot(t(df[,-1]), beside = TRUE,
names.arg=df$Department,
legend.text=names(df[,-1]))
I am trying to subset a data.table within a function, but subsetting by using !is.na(x) is not working. I know it could work, because as I was building my example on a still simpler problem, the subset call worked fine.
library(data.table)
library(ggpubr)
tj = as.data.table(cbind(Name = c("Tom", "Tom", "Tim", "Jerry", NA, "Jerry", "Tim", NA),
var1 = c(12, 12, 20, 30, 31, 21, 21, 31),
var2 = c(12, 11, 27, 32, 31, 11, 21, 41),
var3 = c(10, 10,11, 13, 12, 12, 11, 10),
time = as.numeric(c(1, 2, 1,1, 1,2,2,2))))
plot.tj<- function(dat = tj, color = NULL) {
name <- names(dat)[2:4] # a character vector of column names to loop over
for (i in seq_along(name)) {
plotms <- ggline(dat[!is.na(color),], x = "time", y = name[i], color = color)
print(plotms)
}
}
plot.tj(color = "Name")
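The expected output is the 3 var graphs, but without the NA group.
The thing is that your variable color holds a character string, so is.na(color) tests the string "Name" itself (which is never NA) rather than the Name column, and no rows are dropped. You must call it with get() so that data.table looks up the column. This works: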
plot.tj<- function(dat = tj, color = NULL) {
name <- names(dat)[2:4] # a character vector of column names to loop over
for (i in seq_along(name)) {
plotms <- ggline(dat[!is.na(get(color)),], x = "time", y = name[i], color = color)
print(plotms)
}
}
plot.tj(color = "Name")
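The difference can be checked on a tiny table without any plotting. This is a minimal sketch with a made-up three-row data.table (dt and the n_* names are illustrative); it also shows dt[[color]] as an alternative to get().

```r
library(data.table)

# Hypothetical toy table with one missing name
dt <- data.table(Name = c("Tom", NA, "Tim"), var1 = c(12, 31, 20))
color <- "Name"

# is.na("Name") is FALSE, so the filter is a single TRUE and keeps every row
n_string <- nrow(dt[!is.na(color)])

# get() resolves the string to the Name column, so the NA row is dropped
n_get <- nrow(dt[!is.na(get(color))])

# Equivalent filter using [[ ]] to pull the column by name
n_bracket <- nrow(dt[!is.na(dt[[color]])])

c(n_string, n_get, n_bracket)
```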
I have a linear mixed effects model that determines change in grass based on both the previous year's grass and several environmental variables (and their interaction) at different distinct sites over time.
Using this mixed effects model and established, projected future environmental variables, I want to predict change in grass density. Each year's prediction thus depends on the previous year's density, located on the row above it in my dataframe. We begin with a real value from the present year, and then predict into the future.
library(tidyverse); library(lme4)
#data we have from the past, where each site has annual ChlA/Sal/Temp as well as grass density. our formula, change.mod, predicts grass.change, based on these env variables AND last year's grass coverage (grass.y1)
ThePast = tibble(
year = c(2017, 2018, 2019, 2020, 2021, 2017, 2018, 2019, 2020, 2021,2017, 2018, 2019, 2020, 2021),
site = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
ChlA = c(50, 210, 190, 101, 45, 20, 20, 80, 5, 40, 25, 12, 11, 5, 20),
Sal= c(1, 4, 5, 0.1, 10, 18, 14, 17, 10, 21, 30, 28, 25, 20, 22),
Temp = c(28, 21, 24, 25, 22, 19, 20, 17, 18, 15, 18, 16, 19, 20, 20),
grass = c(.5, .3, .1, .4, .1, .25, .33, .43, .44, .08, .75, .54, .69, .4, .6)) %>%
group_by(site) %>%
mutate(grass.y1 = lag(grass, order_by = year)) %>% #last year's grass
mutate(grass.change = grass - grass.y1) %>% #calculate change
ungroup()
#the ME model
change.mod = lmer(grass.change ~ grass.y1 + log10(ChlA) + log10(Sal) + grass.y1:log10(Temp) + grass.y1:log10(Sal) + (1|site), data = ThePast)
#Future environmental data per site per year, to be used to predict grass.
TheDistantFuture <- tibble(
year = c(2022, 2022, 2022, 2023, 2023, 2023, 2024, 2024, 2024),
site = c( "A", "B", "C","A", "B", "C", "A", "B", "C"),
ChlA = c(40, 200, 10, 95, 10, 4, 149, 10, 15),
Sal= c(12, 11, 15, 16, 21, 32, 21, 21, 22),
Temp = c(24, 22, 26, 28, 29, 32, 31, 20, 18))
#The final dataframe should look like this, where both of the grass columns are predicted out into the future. could have the grass.y1 column in here if we wanted
PredictedFuture <- tibble(
year = c(2022, 2022, 2022, 2023, 2023, 2023, 2024, 2024, 2024),
site = c( "A", "B", "C","A", "B", "C", "A", "B", "C"),
ChlA = c(40, 200, 10, 95, 10, 4, 149, 10, 15),
Sal= c(12, 11, 15, 16, 21, 32, 21, 21, 22),
Temp = c(24, 22, 26, 28, 29, 32, 31, 20, 18),
grass = c(0.237, 0.335, 0.457, 0.700, 0.151, 0.361, 0.176, 0.380, 0.684),
grass.change = c(0.1368, 0.2550, -0.1425, -0.1669, -0.18368, -0.0962, 0.106, 0.229, 0.323 ))
Right now, I can generate the next year's (2022) correct predictions using group_by() and predict(), referencing last year's grass density with a lag function.
#How do we get to PredictedFuture?? Here is what I'm trying:
FutureIsNow = ThePast %>%
filter(year == 2021) %>% #take last year of real data to have baseline starting grass density
bind_rows(TheDistantFuture) %>% #bind future data
arrange(site, year) %>% #arrange by site then year
group_by(site) %>% #maybe this should be rowwise?
mutate(grass.change = predict(change.mod, newdata = data.frame(
grass.y1 = lag(grass, n = 1, order_by = year),
ChlA = ChlA, Sal = Sal, Temp = Temp, site = site))) %>% #this correctly predicts 2022 grass change
mutate(grass = grass.change + lag(grass, n = 1)) #this also works to calculate grass in 2022
This df looks like this:
> FutureIsNow
# A tibble: 12 × 7
# Groups: site [3]
year site ChlA Sal Temp grass grass.change
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021 A 45 10 22 NA NA
2 2022 A 40 12 24 0.237 0.137
3 2023 A 95 16 28 NA NA
4 2024 A 149 21 31 NA NA
5 2021 B 40 21 15 NA NA
6 2022 B 200 11 22 0.335 0.255
7 2023 B 10 21 29 NA NA
8 2024 B 10 21 20 NA NA
9 2021 C 20 22 20 NA NA
10 2022 C 10 15 26 0.457 -0.143
11 2023 C 4 32 32 NA NA
12 2024 C 15 22 18 NA NA
Close, but not really repeatable...
Any ideas for predicting grass change for 2023, 2024, down the rows? I prefer working in tidyverse, though it may be possible to solve this more easily with nested for loops. Potential solutions include a rowwise data structure, or maybe to nest_by(station), but I don't know how to then reference the grass.y1 column. Maybe the solution could be via a rolling prediction with rollify, but I am not sure!
Thank you in advance for your help! Long time reader, first time asker!
So, let's go with a simpler example for a reprex to show how purrr::accumulate2() can work for you here.
Let's set up a discrete-time population model where there is also some covariate that affects the population:
$N_t = 1.5N_{t-1} + C_t$
Simple! Heck, we can even use accumulate2 to simulate a population, and then add some noise.
library(tidyverse)
# ok, let's make a population from a simple discrete time growth model
# but, with a covariate!
covariate <- runif(5, 5, 10)
# use accumulate2 with the covariate to generate a population timeseries
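# accumulate2() calls the function with (accumulated value, element of 1:5, element of covariate),
# so the previous population is ..1 and the covariate is ..3 (.y would be the time index)
pop <- accumulate2(1:5, covariate, ~ ..1 * 1.5 + ..3, .init = 0) %>% unlist()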
pop <- pop[-1]
pop_obs <- rnorm(5, pop, 1) #add some noise
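A quick deterministic check of how accumulate2() hands its arguments to the function may help here. According to the purrr documentation, the function receives the accumulated value first, then the current element of the first vector, then the current element of the second vector. The sketch below uses fixed covariates (an assumption, in place of runif()) and a named-argument function so the roles are explicit; n_prev, t and cov are illustrative names.

```r
library(purrr)

# Fixed covariates instead of random ones, so the result can be checked by hand
covariate_fixed <- c(10, 20, 30)

# Argument order: accumulated value, element of 1:3, element of covariate_fixed
pop_check <- accumulate2(
  1:3, covariate_fixed,
  function(n_prev, t, cov) n_prev * 1.5 + cov,
  .init = 0
)

unlist(pop_check)
# 0.0 10.0 35.0 82.5  (0*1.5+10, 10*1.5+20, 35*1.5+30)
```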
Great! Now, turn it into data and fit a model
# the data ####
dat <- tibble(
time = 1:5,
covariate = covariate,
pop_obs = pop_obs,
lag_pop = lag(pop_obs)
)
# the model ####
mod <- lm(pop_obs ~ covariate + lag_pop, data = dat)
# does this look reasonable?
coef(mod)
My coefficients looked reasonable, but, set a seed and see!
Now we will need some data we want to simulate for - new covariates, but, we will need to incorporate the lag.
# now, simulation data ####
simdat <- tibble(
time = 6:10,
covariate = runif(5, 15,20),
lag_pop = dat$pop_obs[5] #the last observed value, used as the starting lag
)
Great! To make this work, we'll need a function that takes arguments of the lagged value and covariate and runs a prediction. Note, here the second argument is just a numeric. But, you could pass an element of a list - a row of a data frame, if you will. This might be accomplished later with some rowwise nesting or somesuch. For you to work out!
# OK, now we need to get predictions for pop at each step in time! ####
sim_pred <- function(lag_pop, covariate){
newdat <- tibble(covariate = covariate,
lag_pop = lag_pop)
predict(mod, newdata = newdat)
}
With this in hand, we can simulate forward using lag_pop to generate a new population. Note, we'll need to use .init to make sure our first value is correct, and then strip that initial value off the front of the result, since accumulate2() returns it as the first element.
# and let her rip!
# note, we have to init with the first value and
# for multiple covariates, make a rowwise list -
# each element of the list is
# one row of the data and the sim_pred function takes it apart
simdat %>%
mutate(pop = accumulate2(lag_pop,
covariate,
# ..1 is the running prediction, ..3 the covariate (..2 is the unused lag_pop element)
~sim_pred(..1, ..3),
.init = lag_pop[1]) %>% `[`(-1) %>% unlist())
That should do!
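For several covariates, the "row of a data frame" idea mentioned above might look like the following. This is only a sketch under stated assumptions: it uses a plain lm() on made-up, noise-free data rather than the lmer() model from the question, handles a single site, and the names past, future, step and y_future are all hypothetical.

```r
library(purrr)

# Made-up history: the change depends on last year's value and two covariates
past <- data.frame(
  y_prev = c(1, 2, 3, 4, 5, 6),
  temp   = c(10, 12, 11, 13, 12, 14),
  sal    = c(5, 4, 6, 5, 7, 9)
)
past$change <- 0.5 - 0.1 * past$y_prev + 0.02 * past$temp - 0.01 * past$sal
mod_multi <- lm(change ~ y_prev + temp + sal, data = past)

# Future covariates; the response has to be rolled forward one row at a time
future <- data.frame(temp = c(15, 16, 14), sal = c(5, 6, 7))
rows <- split(future, seq_len(nrow(future)))  # list of one-row data frames

# One step: predict the change from the previous value plus this row's covariates
step <- function(y_prev, row) {
  newdat <- cbind(row, y_prev = y_prev)
  y_prev + unname(predict(mod_multi, newdata = newdat))
}

# Start from the last known value (6 here) and drop that starting value afterwards
y_future <- unname(unlist(accumulate(rows, step, .init = 6)))[-1]
y_future
```

With several sites, the same step could presumably be run once per site, for example inside dplyr::group_modify(), with each site's last observed value as .init.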
I am looking to get a bar graph of medals in R. I have 3 distinct columns (gold, silver, bronze). The gold column has a total of 8, the silver has 10, and the bronze has 13.
For the code, I started writing: ggplot(data, aes(x=?)) + geom_bar()
I am not sure how to put all 3 medal columns into the function where it shows x=?
Thanks
For plotting purposes, it is "easier" to work with long data instead of wide. Below I converted the data you mentioned in your comment to long and plotted the data as a grouped bar.
library(tidyverse)
# load data
raw_data <- structure(list(Rank = c(1, 2, 3, 4, 5, 6),
`Team/Noc` = c("United States of America", "People's Republic of China", "Japan", "Great Britain", "ROC", "Australia"),
Gold = c(39, 38, 27, 22, 20, 17),
Silver = c(41,32, 14, 21, 28, 7),
Bronze = c(33, 18, 17, 22, 23, 22),
Total = c(113, 88, 58, 65, 71, 46),
`Rank by Total` = c(1, 2, 5, 4, 3, 6)),
row.names = c(NA,-6L),
class = c("tbl_df", "tbl", "data.frame"))
# convert wide data to long
long_data <- raw_data %>%
pivot_longer(cols = -`Team/Noc`, names_to = 'Medal') %>% # convert wide data to long format
filter(Medal %in% c("Gold", "Silver", "Bronze")) # only select medal columns
# plot
ggplot(long_data) +
geom_col(aes(x = `Team/Noc`,
y = value,
fill = Medal),
position = "dodge" # grouped bars
)
Hope this gets you started!
I have this data:
samp
date_block_num sales
<dbl> <dbl>
1 0 131479
2 1 128090
3 2 147142
4 3 107190
5 4 106970
6 5 125381
7 6 116966
8 7 125291
9 8 133332
10 9 127541
# ... with 25 more rows
date_block_num represents a month. I want to plot the date in a time series fashion.
If I use this code, date_block_num will be plotted as a continuous variable (0, 10, 20, etc.) but it should be discrete (1,2,3, etc.).
samp %>%
ggplot(aes(date_block_num, sales)) +
geom_line()
If I use this:
samp %>%
ggplot(aes(as.factor(date_block_num), sales)) +
geom_line()
or
samp %>%
ggplot(aes(date_block_num, sales)) +
geom_line(aes(group = date_block_num))
I get:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Any idea how to fix this?
dput(samp)
structure(list(date_block_num = c(0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34), sales = c(131479, 128090,
147142, 107190, 106970, 125381, 116966, 125291, 133332, 127541,
130009, 183342, 116899, 109687, 115297, 96556, 97790, 97429,
91280, 102721, 99208, 107422, 117845, 168755, 110971, 84198,
82014, 77827, 72295, 64114, 63187, 66079, 72843, 71056, 0)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -35L))
You should be able to specify the x-axis labels using scale_x_continuous.
samp %>%
ggplot(aes(date_block_num, sales)) +
geom_line() +
scale_x_continuous(breaks = samp$date_block_num)
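If a truly discrete axis is preferred, the "each group consists of only one observation" message can usually be avoided by keeping the factor on x and putting all points into a single group with group = 1. A minimal sketch, using only the first six rows of the data above (samp_small is an illustrative name):

```r
library(ggplot2)

# First six rows of samp, enough to show the idea
samp_small <- data.frame(
  date_block_num = 0:5,
  sales = c(131479, 128090, 147142, 107190, 106970, 125381)
)

# factor() makes the axis discrete; group = 1 tells geom_line() to
# connect all points as one series instead of one group per factor level
p_discrete <- ggplot(samp_small,
                     aes(factor(date_block_num), sales, group = 1)) +
  geom_line() +
  labs(x = "date_block_num")

print(p_discrete)
```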