The data I'm trying to visualize has two assessments performed on the same time scale but at different intervals (e.g. temperature taken 4 times over 12 hours, pain assessed every hour). As an example:
df <- data.frame(
Hour = 0:12,
Pain = sample(7:10, 13, TRUE),
Temp = c(36.8,rep(NA,3),37.2,rep(NA,3),37.4,rep(NA,3),37.0)
)
In ggplot, I'd visualize it like this:
library(ggplot2)
ggplot(df) +
geom_col(aes(x = Hour, y = Pain)) +
geom_point(aes(x = Hour, y = Temp/3)) +
geom_line(data = df[!is.na(df$Temp), ], aes(x = Hour, y = Temp/3)) +
scale_y_continuous(sec.axis = sec_axis(~.*3,name = "Temp"))
In echarts4r, however, I cannot get my line to be continuous (I believe because of the NA values):
library(echarts4r)
e_chart(df, Hour) |> e_bar(Pain) |> e_line(Temp)
Is there a way to subset the dataset before e_line to remove the missing values? I've searched online and can't find anything. Or should I be structuring my data differently?
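One possible approach, sketched here rather than taken from the question, is to leave the data as it is and ask ECharts to bridge the gaps. This assumes that e_line forwards extra arguments through its `...` to the underlying ECharts line series, where `connectNulls` is an option. The `y_index = 1` argument is an additional assumption about the desired layout: it puts Temp on a secondary axis, mirroring the sec_axis in the ggplot version.

```r
library(echarts4r)

set.seed(1)
df <- data.frame(
  Hour = 0:12,
  Pain = sample(7:10, 13, TRUE),
  Temp = c(36.8, rep(NA, 3), 37.2, rep(NA, 3), 37.4, rep(NA, 3), 37.0)
)

p <- df |>
  e_charts(Hour) |>
  e_bar(Pain) |>
  # connectNulls is passed on to the ECharts line series so that
  # the line skips over the NA hours instead of breaking
  e_line(Temp, y_index = 1, connectNulls = TRUE)
p
```

If the line should stay on the same axis as the bars, dropping `y_index = 1` is enough.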
Related
Here is the problem:
I have to plot a time series (9 years of non-continuous data). In the data frame I am using, a row of NA separates each year. My aim is to plot this series with a smoothing line.
When I plot the time series with the smooth, I use this code:
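library(ggplot2)
library(scales)     # date_format()
library(lubridate)  # year(), used for grouping further down
df$Date <- as.Date(df$Date,"%Y/%m/%d")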
ggplot(df, aes(x = Date , y = antSO4 )) +
geom_line(color="gray40")+
scale_y_continuous(expand = c(0, 0), limits = c(0, 3000)) +
scale_x_date(breaks="year", labels=date_format("%Y")) +
geom_smooth(aes(x= Date, y=antSO4), method = lm, formula = y ~ splines::bs(x, 3), se = FALSE, colour = 'red')
I obtain this:
As you can see, there are some lines that connect two different yearly series, though not all of them. To eliminate this problem I grouped the dataset by year with:
ggplot(df, aes(x = Date , group = year(Date), y = antSO4))
This is the plot that I obtain.
Grouping the data eliminates the connections, but the smooth is then (correctly) calculated year by year rather than on the complete dataset.
I am quite new to R. I searched previous posts, but most of the problems there involve separator rows with NA, and generally the connections appear between every series.
Thanks in advance for any kind of help!
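A minimal sketch of one way around this, assuming the goal is per-year lines plus a single smooth over all years: map the group aesthetic only inside geom_line, so geom_smooth does not inherit it. The data below are synthetic stand-ins, and the `yr` column is a name introduced only for this example.

```r
library(ggplot2)

# Synthetic stand-in: three years, 30 weekly observations each,
# which leaves a gap between the end of one year and the start of the next
set.seed(1)
dates <- do.call(c, lapply(2001:2003, function(y) {
  seq(as.Date(paste0(y, "-01-01")), by = "week", length.out = 30)
}))
df <- data.frame(Date = dates, antSO4 = runif(length(dates), 500, 2500))
df$yr <- format(df$Date, "%Y")

p <- ggplot(df, aes(x = Date, y = antSO4)) +
  # group applies to this layer only, so lines never cross a year boundary
  geom_line(aes(group = yr), colour = "gray40") +
  # no group here, so the spline is fitted to the complete dataset
  geom_smooth(method = lm, formula = y ~ splines::bs(x, 3),
              se = FALSE, colour = "red")
p
```

Because the grouping lives in one layer rather than in the top-level ggplot() call, the two layers can disagree about what counts as a series.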
I would like to plot a number of symmetric bars like these two, in which the width of the bar corresponds to the relative abundance of the variable through time. I could not find anything similar in R; any help is appreciated.
Are you looking for a violin plot?
As per your comment, the violin plot is not what you are after.
There are two approximate solutions, neither of them ideal but they get you a bit further:
library(dplyr)
library(tibble)
library(ggplot2)
set.seed(123)
data <- tibble(
Date = seq.Date(from = as.Date("2020/01/01"), length = 50, by = "day"),
Value = runif(50, min = 0, max = 10)
)
data <- data %>%
mutate(Value_plus = Value,
Value_min = -Value)
p <- ggplot(data = data, aes(fill = "red")) +
geom_step(aes(x = Date, y = Value_plus)) +
geom_step(aes(x = Date, y = Value_min))
p
p <- ggplot(data = data) +
geom_ribbon(aes(x = Date, ymin = Value_min, ymax = Value_plus))
p
The first plot has the steps that you suggest in your example but a fill for geom_step appears non-trivial. The second plot, using geom_ribbon gives you a fill but not the steps. There are several examples of solutions (e.g. here) on how to get to a filled step plot.
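As a rough sketch of one way to get both the steps and the fill (not necessarily what the linked solutions do), each observation can be drawn as its own rectangle with geom_rect. This assumes the dates are evenly spaced one day apart, as in the example data, so every rectangle spans from its date to the next.

```r
library(ggplot2)

set.seed(123)
data <- data.frame(
  Date  = seq.Date(from = as.Date("2020-01-01"), length.out = 50, by = "day"),
  Value = runif(50, min = 0, max = 10)
)

# One rectangle per day: one day wide, reaching from -Value to +Value,
# so the outline matches the two mirrored step lines
p <- ggplot(data) +
  geom_rect(aes(xmin = Date, xmax = Date + 1, ymin = -Value, ymax = Value),
            fill = "red")
p
```

For irregularly spaced dates, xmax would need to be the next observation's date rather than Date + 1.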
Using geom_step:
Using geom_ribbon:
I have a time series of monthly data for 10 years:
myts <- ts(rnorm(12*10), frequency = 12, start = 2001)
Now I'd like to plot the data with the x-axis restricted to Jan - Dec (a generic year). The whole time series should be broken into ten lines, each starting at Jan and ending at Dec, and overplotted on one another so that I can visually compare different years. Is there a straightforward command to do that in R?
So far I came up with following solution using matplot which might not be the most sophisticated one:
mydf <- as.data.frame(matrix(myts, 12))
matplot(mydf,type="l")
Even better would be a way to calculate the average value and the corresponding CI/standard deviation for each month, then plot the average from Jan - Dec as a line with the CI/standard deviation as a band around it.
Consider using ggplot2.
library(ggplot2)
library(ggfortify)
d <- fortify(myts)
d$year <- format(d$Index, "%Y")
d$month <- format(d$Index, "%m")
It's useful to start by reshaping the ts object into a long dataframe. Given the dataframe, it's straightforward to create the plots you have in mind:
ggplot(d, aes(x = month, y = Data, group = year, colour = year)) +
geom_line()
ggplot(d, aes(x = month, y = Data, group = month)) +
stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))
You can also summarise the data yourself, then plot it:
d_sum <- do.call(rbind, (lapply(split(d$Data, d$month), mean_se, mult = 1.96)))
d_sum$month <- rownames(d_sum)
ggplot(d_sum, aes(x = month, y = y, ymin = ymin, ymax = ymax)) +
geom_errorbar() +
geom_point() +
geom_line(aes(x = as.numeric(month)))
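The question also asks for the monthly average drawn as a line with a band around it. A minimal sketch of that variant follows, using base R for the summary and geom_ribbon for the band. The band here is mean ± 1 standard deviation, which is an assumption; swap in a confidence interval if that is what is needed. `d_band` is a name introduced only for this example.

```r
library(ggplot2)

set.seed(42)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

# Month index (1-12) of every observation, taken from the ts object itself
month  <- as.integer(cycle(myts))
values <- as.numeric(myts)

# Per-month mean and standard deviation across the ten years
d_band <- data.frame(
  month = 1:12,
  mean  = as.numeric(tapply(values, month, mean)),
  sd    = as.numeric(tapply(values, month, sd))
)

ggplot(d_band, aes(x = month, y = mean)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd), alpha = 0.3) +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
```

Keeping month numeric avoids the discrete-axis grouping issue that otherwise requires group = 1 or as.numeric() inside geom_line.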
I need help with an R graphics issue using ggplot2.
Let's take an example:
date <- c("oct", "dec")
min.national <- c(17, 20)
min.international <- c(11, 12)
min.roaming <- c(5, 7)
mb.national <- c(115, 150)
mb.international <- c(72, 75)
mb.roaming <- c(30, 40)
df <- data.frame(min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
What I want is two plots side by side, one for the minutes and one for the megabytes, each showing bars for the three variables (national, international and roaming) on the same plot with fill = date.
Is that clear?
Thanks
I appreciate there may be a language barrier here, and it sounds like you're just getting started with ggplot2, so I hope you find this useful.
It makes sense to treat the minutes and mb separately; they're different units. So I'll just use the minutes as an example. What I understand you're trying to achieve is easy with the right approach and the tidyr library.
library(tidyr)
library(ggplot2)
#first get your data in a data frame
min.df <- data.frame(national = min.national, international = min.international, roaming = min.roaming, month = date)
#now use the tidyr function to create a long data frame, you should recognize that this gives you a data structure readily suited to what you want to plot
min.df.long <- gather(min.df, "region", "minutes", 1:3)
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = month), stat = "identity")
If you want the months side by side, as I understand your question, then you could do:
ggplot(min.df.long) + geom_bar(aes(x = region, y = minutes, fill = factor(month, levels = c("oct", "dec"))), position = "dodge", stat = "identity") + labs(fill = "month")
The key parameter is the position keyword, the rest is just to make it neater.
df <- data.frame(date, min.national, min.international, min.roaming, mb.national, mb.international, mb.roaming)
# melt() comes from reshape2; date is kept as the id column
df.stk <- tidyr::separate(reshape2::melt(df, id.vars = "date"), col="variable", into=c("min_byte", "type"), sep="\\.")
plt <- ggplot(df.stk, aes(type, value, fill = date)) +
geom_bar(stat = "identity") +
facet_grid(.~min_byte)
print(plt)
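The same reshape can be sketched with tidyr alone, using pivot_longer to split the column names at the dot in one step. This is an alternative sketch, not the code from either answer. The column names `unit`, `type` and `value` are names chosen for this example, and the free y scales are an assumption made because minutes and megabytes are on very different ranges.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  date = c("oct", "dec"),
  min.national = c(17, 20), min.international = c(11, 12), min.roaming = c(5, 7),
  mb.national = c(115, 150), mb.international = c(72, 75), mb.roaming = c(30, 40)
)

# Split each column name at the dot:
# "min.national" becomes unit = "min", type = "national"
df_long <- pivot_longer(df, cols = -date,
                        names_to = c("unit", "type"), names_sep = "\\.",
                        values_to = "value")

ggplot(df_long, aes(x = type, y = value,
                    fill = factor(date, levels = c("oct", "dec")))) +
  geom_col(position = "dodge") +          # months side by side
  facet_wrap(~unit, scales = "free_y") +  # one panel per unit
  labs(fill = "month")
```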
I am investigating a dataset with loan information from Prosper, specifically investor behavior.
The plot I would like to create would show investors on the y axis, and time on the x axis, binned to the average month. This would also be faceted by a Credit Grade. Ultimately, I would like each bin to show what percentage of total investors were allocated to each Credit Grade (the facet variable), per calculated month (or actual month, but calculated seems easier for binning).
I have tried ..density.., ..count../sum(..count..), geom_density, etc and seen plenty of posts that will sum each facet to 1 or the entire plot to 1. To re-iterate I am trying to sum each bin, among all the facets, to 1. I was also hoping to do this directly in ggplot, rather than alter the dataframe, but I'll take what I can get.
The following code shows two ways to display the investor counts (count per bin and percentage of entire plot per bin):
t1 <- ggplot(data = loans, aes(x=as.POSIXct(strptime(LoanOriginationDate, '%Y-%m-%d %H:%M:%S')))) +
geom_histogram(binwidth = 60*60*24*30.4375, aes(y = ..count../sum(..count..), group = Investors)) +
facet_wrap(~ProsperCreditGrade) +
scale_y_continuous()
t2 <- ggplot(loans,aes(x=as.POSIXct(strptime(LoanOriginationDate, '%Y-%m-%d %H:%M:%S')),fill=ProsperCreditGrade))+
geom_histogram(aes(y=2629800* ..count../sum(..count..)),
alpha=1,position='identity',binwidth=2629800) +
facet_wrap(~ProsperCreditGrade) +
stat_bin(aes(y = ..density..))
grid.arrange(t1,t2,ncol=1)
As you can see in the plot, total investors went up quite a bit toward the end of the time covered in the dataset. This does not show relative investment behavior over a given time, which is what I am trying to investigate.
What else can I try?
With help from Stephen of Udacity.com and dplyr, the final code is as follows:
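library(dplyr)      # %>%, group_by(), summarise(), mutate()
library(lubridate)  # month(), year()
library(ggplot2)
loans$month <- month(as.POSIXct((round(as.numeric(as.POSIXct(loans$LoanOriginationDate))/2629800)*2629800), origin = "1969-12-31 19:00:00"))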
loans$year <- year(as.POSIXct((round(as.numeric(as.POSIXct(loans$LoanOriginationDate))/2629800)*2629800), origin = "1969-12-31 19:00:00"))
loans$calculatedMonth <- ((loans$year-2005)*12)+loans$month
loanInvestors <- loans %>% group_by(calculatedMonth, ProsperCreditGrade) %>% summarise (n = n()) %>% mutate(proportion = n / sum(n))
ggplot(data = loanInvestors, aes(x = calculatedMonth, y = proportion, fill = proportion, width = 3)) +
geom_bar(stat = "identity") + facet_wrap(~ProsperCreditGrade) +
scale_y_sqrt() + geom_smooth(color = "red") +
scale_fill_gradient()
Investors per quarter by Credit Grade
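To see why the group_by / summarise / mutate chain makes each month sum to 1 across credit grades, here is a small sketch on made-up data (`loans_toy` and its values are invented for illustration and are not the Prosper dataset). After summarise drops the last grouping variable, the result is still grouped by calculatedMonth, so sum(n) in mutate is the per-month total.

```r
library(dplyr)

# Made-up stand-in for the loans data: a month index and a credit grade per loan
set.seed(7)
loans_toy <- data.frame(
  calculatedMonth    = sample(1:6, 200, replace = TRUE),
  ProsperCreditGrade = sample(c("A", "B", "C"), 200, replace = TRUE)
)

loanInvestors_toy <- loans_toy %>%
  group_by(calculatedMonth, ProsperCreditGrade) %>%
  # drop_last keeps the result grouped by calculatedMonth only
  summarise(n = n(), .groups = "drop_last") %>%
  # n / sum(n) is therefore the share within each month
  mutate(proportion = n / sum(n)) %>%
  ungroup()

# Sanity check: proportions across grades add up to 1 in every month
check <- loanInvestors_toy %>%
  group_by(calculatedMonth) %>%
  summarise(total = sum(proportion))
check
```

Note that n() counts rows (loans) per cell. If the shares should instead be weighted by the number of investors, something like sum(Investors) in place of n() would be the natural change, assuming Investors is a numeric column as the question suggests.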