I am trying to use spplot to visualize plots from different months. I'd like to change this figure so the same months are in the same columns to easily compare. I would like to push May 2016 5 panels in, so all the rest of the months are in line. I hope this makes sense.
click here for figure
I have missing data for Dec2017 for now which is why it's blacked out.
Here is my code:
stack_months <- stack(May2016, June2016, July2016, Aug2016, Sep2016, Oct2016, Nov2016, Dec2016, January2017, Febuary2017, March2017, April2017, May2017, June2017, July2017, July2017, Aug2017, Sep2017, Oct2017, Nov2017, Dec2017, January2018, Febuary2018, March2017, April2018, May2018, June2017, July2017, July2018, Aug2018, Sep2018, Oct2018, Nov2018, Dec2018, January2019, Febuary2019, March2019, April2019, May2019, June2019, July2019, July2019)
spplot(stack_months, col.regions=viridis(20), names.attr = c("May2016", "June2016", "July2016", "Aug2016", "Sep2016", "Oct2016", "Nov2016", "Dec2016",
"Jan2017", "Feb2017", "March2017", "April2017", "May2017", "June2017", "July2017", "July2017", "Aug2017", "Sep2017", "Oct2017", "Nov2017", "Dec2017",
"Jan2018", "Feb2018", "March2017", "April2018", "May2018", "June2017", "July2017", "July2018", "Aug2018", "Sep2018", "Oct2018", "Nov2018", "Dec2018",
"Jan2019", "Feb2019", "March2019", "April2019", "May2019", "June2019", "July2019", "July2019"), layout = c(12,4))
Is there an easy way to manipulate the panels?
Note that you have typed some of the months more than once: July2017 appears 3 times, and March2017, June2017 and July2019 each appear twice.
What I have below are complete months from May2016 to July2019, so that when you plot, the months will align.
library(raster)
library(sp)
library(viridis)
library(lattice)
months <- c("May2016", "June2016", "July2016", "Aug2016", "Sep2016", "Oct2016",
"Nov2016", "Dec2016", "Jan2017", "Feb2017", "March2017", "April2017",
"May2017", "June2017", "July2017", "Aug2017", "Sep2017",
"Oct2017", "Nov2017", "Dec2017", "Jan2018", "Feb2018", "March2018",
"April2018", "May2018", "June2018", "July2018", "Aug2018",
"Sep2018", "Oct2018", "Nov2018", "Dec2018", "Jan2019", "Feb2019",
"March2019", "April2019", "May2019", "June2019", "July2019")
I don't have your data, so I simulate something for the image:
r <- raster(system.file("external/test.grd", package="raster"))
stack_months = do.call(stack,lapply(months,function(i)runif(1)*r))
You set layout = c(12, 4), which gives 48 panel slots, filled row by row. The 39 months start in May, so the first 4 slots (January to April 2016) and the last 5 slots (August to December 2019) should stay empty:
SKIP = rep(FALSE,12*4)
SKIP[1:4] = TRUE
SKIP[44:48] = TRUE
Then we plot using the SKIP above:
spplot(stack_months, col.regions=viridis(20),
layout = c(12,4),
strip = strip.custom(par.strip.text = list(cex = 0.65)),
names.attr = months,
skip=SKIP
)
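If the first month or the number of layers changes, the skip vector can be computed instead of hard-coded. This is only a sketch in base R: make_skip is a hypothetical helper (not part of lattice or sp), and it assumes one layer per consecutive month and a 12-column layout.

```r
# Hypothetical helper: build the logical skip vector for a month-aligned layout.
# first_month: calendar month (1-12) of the first layer
# n_panels:    number of layers that will actually be drawn
make_skip <- function(first_month, n_panels, n_cols = 12) {
  lead <- first_month - 1                          # empty slots before the first panel
  n_rows <- ceiling((lead + n_panels) / n_cols)    # rows needed to hold everything
  skip <- rep(FALSE, n_cols * n_rows)
  skip[seq_len(lead)] <- TRUE                      # blank out the leading slots
  used <- lead + n_panels
  if (used < length(skip)) {
    skip[(used + 1):length(skip)] <- TRUE          # blank out the trailing slots
  }
  skip
}

# May start (month 5) with 39 monthly layers, as in the example above
SKIP <- make_skip(first_month = 5, n_panels = 39)
which(SKIP)
```

The resulting vector should match the hand-written one and can be passed as skip = SKIP in the spplot() call.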
I wrote a function to generate multiple graphs in plotly. In each of those graph, I am only adding annotations for last data point.
To plot all those graphs at once, I am using subplot function.
This, however, shows some extra arrows on the graph. I'm not sure what I am doing wrong, where they are coming from, or how to turn them off.
(Turning them white wouldn't be a solution, as they are also problematic in that their size stays roughly constant: if, for example, the Y axis is formatted as %, they just dwarf everything.)
Really appreciate some assistance with this.
library(plotly)
library(tibble)
library(dplyr)
# A function to generate plots in the required format
plotbundlefunction<-function(data1,ttitle){
mypalette <- c("#4E79A7","#F28E2B","#E15759","#76B7B2","#59A14F","#EDC948","#B07AA1","#FF9DA7","#9C755F","#BAB0AC") %>% head(ncol(data1)-1)
lineannot<-c()
for(i in 2:ncol(data1)){
lineannot[[i]]<-list(x = tail(na.omit(data1 %>% select('ID',i)),n=1L)[['ID']], y = tail(na.omit(data1[[i]]),n=1L), text = tail(na.omit(data1[[i]]),n=1L),
font=list(color=mypalette[i-1]),xanchor = "left", bgcolor="#D4D8DF", showarrow = F)
}
p <- plot_ly()
for(i in 2:ncol(data1)){
p<-add_trace(p,x=data1[['ID']],y=data1[[i]],name=colnames(data1)[i], type='scatter', mode='lines')
}
p %>% layout(colorway=mypalette, annotations = lineannot) %>% return()
}
# Numerous dataframe representing snapshot at a point in time for same data characteristics
dflist<-list(
KPI1 = data.frame(ID=c(1,2,3,4,5), Japan=c(100,98,97,95,94), Korea = c(100,97,94,91,87) , Laos=c(100,97,94,90,84)),
KPI2 = data.frame(ID=c(1,2,3,4,5), Japan=c(5,7,8,9,3) , Korea = c(6,8,7,9,5) , Laos=c(7,5,5,2,1)),
KPI3 = data.frame(ID=c(1,2,3,4,5), Japan=c(78,89,56,48,92) , Korea = c(42,49,85,99,72) , Laos=c(78,58,88,87,68))
)
#Iterate over a function that generates a separate graph for each columns across dataframes
mainplotset<-lapply(1:length(dflist),function(s){
plotbundlefunction(dflist[[names(dflist)[s]]],names(dflist)[s])
})
#Do a subplot to show all results
subplot(mainplotset,nrows = 1,margin=0.05)
Update based on your comment
As I pointed out in my comment after I had already posted this answer, the actual solution is to revise the for statement that creates the annotations. Instead of
for(i in 2:ncol(data1)){
lineannot[[i]] <- ...
It should be
for(i in 2:ncol(data1)){
lineannot[[i - 1]] <- ...
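To see why the index matters, here is a minimal base R sketch (the names bad and good are just for illustration). Filling a list from position 2 onwards leaves an empty first element, and an empty annotation has no showarrow entry at all:

```r
# Filling from index 2 leaves a NULL placeholder at position 1
bad <- list()
for (i in 2:4) {
  bad[[i]] <- list(showarrow = FALSE)
}

# Shifting the index by one fills the list without gaps
good <- list()
for (i in 2:4) {
  good[[i - 1]] <- list(showarrow = FALSE)
}

length(bad)        # one more element than was actually filled in
is.null(bad[[1]])  # the extra element is the empty placeholder
length(good)
```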
On to how I found the arrows...
I meant to include how I came up with the annotations traces, sorry about leaving that out!
I can't think of any way an arrow can get into a plot without annotations, so I knew where to start. So first, I set the subplot to an object and looked at whether showarrow was set to TRUE or FALSE.
plt <- subplot(mainplotset,nrows = 1,margin=0.05)
invisible(lapply(
1:length(plt$x$layout$annotations),
function(k) {
res <- plt$x$layout$annotations[[k]]$showarrow
message("arrow? ", k, " ", res)
}
))
The default for annotations is showarrow = TRUE, so that's why your plots were returned with arrows.
Original answer
Such an odd error! I'm not sure how to prevent this error. (I'm still trying to figure that out.) In the meantime, I thought I could give you a way to fix it.
I used lapply to find out what traces were creating these arrows.
plt <- subplot(mainplotset, nrows = 1, margin = 0.05)
# arrow? 1
# arrow? 2 FALSE
# arrow? 3 FALSE
# arrow? 4 FALSE
# arrow? 5
# arrow? 6 FALSE
# arrow? 7 FALSE
# arrow? 8 FALSE
# arrow? 9
# arrow? 10 FALSE
# arrow? 11 FALSE
# arrow? 12 FALSE
When I looked at the traces that didn't indicate true or false, there was nothing in the traces except xref and yref.
To remove them:
plt$x$layout$annotations <- plt$x$layout$annotations[c(-1, -5, -9)]
plt
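If you would rather not hard-code the positions, the empty annotations can be filtered out by checking for a missing showarrow entry. The sketch below uses a small mock list in place of plt$x$layout$annotations (the mock values are made up), so it runs without plotly:

```r
# Mock of plt$x$layout$annotations: entries 1 and 3 are the empty placeholders
annotations <- list(
  list(xref = "x", yref = "y"),
  list(text = "94", showarrow = FALSE, xref = "x", yref = "y"),
  list(xref = "x2", yref = "y2"),
  list(text = "3", showarrow = FALSE, xref = "x2", yref = "y2")
)

# Keep only annotations that explicitly carry a showarrow setting
has_flag <- vapply(annotations,
                   function(a) !is.null(a[["showarrow"]]),
                   logical(1))
cleaned <- annotations[has_flag]
length(cleaned)
```

On the real object, the same filter would presumably be computed on plt$x$layout$annotations and the result assigned back to it.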
I have 11 plots, drawn in a loop (see my code below). However, I can't get them to fit on one page or less; the plots are too big. I am using R and writing my work in R Markdown. I have spent almost an entire week trying to resolve this.
graph3 <- output3.long %>%
group_by(Firm_category) %>%
doo(
~ggboxplot(
data =., x = "Means.type", y = "means",
fill ="grey", palette = "npg", legend = "none",
ggtheme = theme_pubr()
),
result = "plots"
)
graph3
# Add statistical tests to each corresponding plot
Firm_category <- graph3$Firm_category
xx <- for(i in 1:length(Firm_category)){
graph3.i <- graph3$plots[[i]] +
labs(title = Firm_category[i]) +
stat_pvalue_manual(stat.test[i, ], label = "p.adj.signif")
print(graph3.i)
}
#output3.long data sample below as comments
#Firm_category billmonth Means.type means
#Agric 1 Before 38.4444
#Agric 1 After 51.9
Complete data is on my github: https://github.com/Fridahnyakundi/Descriptives-in-R/blob/master/Output3.csv
This code prints all the graphs, but spread over about 4 pages. I want to group them into a grid. I have tried adding each of the snippets below just before my last curly bracket, and none of them works. Please help me out.
library(cowplot)
print(plot_grid(plotlist = graph3.i[1:11], nrow = 4, ncol = 3))
library(ggpubr)
print(ggarrange(graph3.i[1:11], nrow = 4, ncol = 3))
I tried the gridExtra command as well (they all seem to do the same thing). I am the one with a mistake and I guess it has to do with my list. I read a lot of similar work here, some suggested
dev.new()
dev.off()
I still don't understand what they do, and adding either of them made my code stop.
I tried assigning my 'for' loop to an object, say 'XX', and calling it later to make a list of graphs, but it returned NULL.
I also tried defining an empty list (as suggested in some answers here) and filling it so it could be printed, but I got many errors.
I have done this for almost 3 days and will appreciate your help in resolving this.
Thanks!
I tried to complete your code, and this works (although I don't have your 'stat.test' object, so that line is commented out). Basically, I added graph3.i <- list() before the loop and assigned to graph3.i[[i]] inside it.
Is this what you wanted to do?
library(magrittr)
library(dplyr)
library(rstatix)
library(ggplot2)
library(ggpubr)
data <- read.csv(url('http://raw.githubusercontent.com/Fridahnyakundi/Descriptives-in-R/master/Output3.csv'))
graph3 <- data %>% group_by(Firm_category) %>%
doo(
~ggboxplot(
data =., x = "Means.type", y = "means",
fill ="grey", palette = "npg", legend = "none",
ggtheme = theme_pubr()
),
result = "plots"
)
graph3
# Add statistical tests to each corresponding plot
graph3.i <- list()
Firm_category <- graph3$Firm_category
for(i in seq_along(Firm_category)){
graph3.i[[i]] <- graph3$plots[[i]] +
labs(title = Firm_category[i]) # +
# stat_pvalue_manual(stat.test[i, ], label = "p.adj.signif")
# No print() inside the loop: the plots are only collected in the list here
}
library(cowplot)
print(plot_grid(plotlist = graph3.i[1:11], nrow = 4, ncol = 3))
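On the NULL output mentioned in the question: in R a for loop always returns NULL, so assigning it to a name does not collect anything. A minimal base R sketch with numbers instead of plots (res and squares are illustrative names only):

```r
# A for loop evaluates to NULL, whatever happens inside it
res <- for (i in 1:3) i^2
is.null(res)

# lapply() returns one list element per iteration, which is what
# plot_grid(plotlist = ...) expects when the elements are plots
squares <- lapply(1:3, function(i) i^2)
length(squares)
```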
So sorry I'm quite new to R and have been trying to do this by myself but have been struggling.
I'm trying to do some sort of barplot or histogram of the tag 'Amateur' over the years 2007 to 2013 to show how it's changed over time.
The data set was downloaded from: https://sexualitics.github.io/ specifically looking at the hamster.csv
Here is some of the initial preprocessing of the data below.
head(xhamster) # Need to change upload_date into a date column, then add new column containing year
xhamster$upload_date<-as.Date(xhamster$upload_date,format="%d/%m/%Y")
xhamster$Year<-year(ymd(xhamster$upload_date)) #Adds new column containing just the year
xhamster$Year<-as.integer(xhamster$Year) # Changing new Year variable into an integer
head(xhamster) # Check changes made correctly
The filter for the years:
Yr2007<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2007")))
Yr2008<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2008")))
Yr2009<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2009")))
Yr2010<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2010")))
Yr2011<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2011")))
Yr2012<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2012")))
Yr2013<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2013")))
For example, I want to create a plot for the tag 'Amateur' in the data. Here is some of the code I have already done:
Amateur<-grep("Amateur",xhamster$channels)
Amateur_2007<-grep("Amateur", Yr2007$channels)
Amateur_2008<-grep("Amateur", Yr2008$channels)
Amateur_2009<-grep("Amateur", Yr2009$channels)
Amateur_2010<-grep("Amateur", Yr2010$channels)
Amateur_2011<-grep("Amateur", Yr2011$channels)
Amateur_2012<-grep("Amateur", Yr2012$channels)
Amateur_2013<-grep("Amateur", Yr2013$channels)
Amateur_2007 <- length(Amateur_2007)
Amateur_2008 <- length(Amateur_2008)
Amateur_2009 <- length(Amateur_2009)
Amateur_2010 <- length(Amateur_2010)
Amateur_2011 <- length(Amateur_2011)
Amateur_2012 <- length(Amateur_2012)
Amateur_2013 <- length(Amateur_2013)
Plot:
Amateur <- cbind(Amateur_2007, Amateur_2008, Amateur_2009,Amateur_2010, Amateur_2011, Amateur_2012, Amateur_2013)
barplot((Amateur),beside=TRUE,col = c("red","orange"),ylim=c(0,90000))
title(main="Usage of 'Amateur' as a tag from 2007 to 2013")
title(xlab="Amateur")
title(ylab="Frequency")
Plot showing amateur tag over the years
However this isn't exactly a great plot. I'm looking for a way to plot using ggplot ideally and to have the names of each bar to be the year rather than 'Amateur_2010' etc. How do I do this?
An even better bonus if I can add 'nb_views' for each year with this tag usage or something like that.
There are lots of ways to approach this, here is how I would tackle it:
library(tidyverse)
library(lubridate)
library(vroom)
xhamster <- vroom("xhamster.csv")
xhamster$upload_date<-as.Date(xhamster$upload_date,format="%d/%m/%Y")
xhamster$Year <- year(ymd(xhamster$upload_date))
xhamster %>%
filter(Year %in% 2007:2013) %>%
filter(grepl("Amateur", channels)) %>%
ggplot(aes(x = Year, y = ..count..)) +
geom_bar() +
scale_x_continuous(breaks = c(2007:2013),
labels = c(2007:2013)) +
ylab(label = "Count") +
xlab(label = "Amateur") +
labs(title = "Usage of 'Amateur' as a tag from 2007 to 2013",
caption = "Data obtained from https://sexualitics.github.io/ under a CC BY-NC-SA 3.0 license") +
theme_minimal(base_size = 14)
As Jared said, there are lots of ways, but I want to solve it with your way, so that you can internalize the solution better.
I just changed your cbind in the plot:
Amateur <- cbind("2007" = Amateur_2007,"2008" = Amateur_2008,"2009" = Amateur_2009, "2010" =Amateur_2010, "2011" = Amateur_2011, "2012" = Amateur_2012, "2013" = Amateur_2013)
As you can see, you can name the columns directly in the cbind() call like that :)
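Staying in base R, the per-year filtering and counting can also be collapsed into a single table() call, which names the bars by year automatically. This is only a sketch on a made-up toy data frame; the column names Year, channels and nb_views are taken from the question, the values are invented:

```r
# Toy stand-in for the real data
toy <- data.frame(
  Year     = c(2007, 2007, 2008, 2009, 2009, 2009),
  channels = c("Amateur, A", "Other", "Amateur", "Amateur", "Other", "Amateur, B"),
  nb_views = c(100, 50, 200, 10, 5, 30)
)

# Rows whose channels field contains the tag
is_amateur <- grepl("Amateur", toy$channels, fixed = TRUE)

# Number of tagged uploads per year; the names of the table are the years
counts <- table(toy$Year[is_amateur])

# Total views of tagged uploads per year
views <- tapply(toy$nb_views[is_amateur], toy$Year[is_amateur], sum)

barplot(counts, xlab = "Year", ylab = "Frequency",
        main = "Usage of 'Amateur' as a tag")
```

Note that a year with no matching rows would simply be missing from the table rather than shown as zero.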
I want to make a histogram for each column. Each Column has three values (Phase_1_Mean, Phase_2_Mean and Phase_3_Mean)
The output should be:
12 histograms (because we have 12 rows), and per histogram the 3 values showed in a bar (Y axis = value, X axis = Phase_1_Mean, Phase_2_Mean and Phase_3_Mean).
Stuck: When I search the internet, almost everyone is making a "long" data frame. That is not helpful for this example, because then we would generate a single "value" column, but I want to keep the three "rows" separate.
At the bottom you can find my data. Appreciated!
I tried this (How do I generate a histogram for each column of my table?), but here is the "long table" problem, after that I tried Multiple Plots on 1 page in R, that solved how we can plot multiple graphs on 1 page.
dput(Plots1)
structure(list(`0-0.5` = c(26.952381, 5.455598, 28.32947), `0.5-1` =
c(29.798635,
25.972696, 32.87372), `1-1.5` = c(32.922764, 41.95935, 41.73577
), `1.5-2` = c(31.844156, 69.883117, 52.25974), `2-2.5` = c(52.931034,
128.672414, 55.65517), `2.5-3` = c(40.7, 110.1, 63.1), `3-3.5` =
c(73.466667,
199.533333, 70.93333), `3.5-4` = c(38.428571, 258.571429, 95),
`4-4.5` = c(47.6, 166.5, 233.4), `4.5- 5` = c(60.846154,
371.730769, 74.61538), `5-5.5` = c(7.333333, 499.833333,
51), `5.5-6` = c(51.6, 325.4, 82.4), `6-6.5` = c(69, 411.5,
134)), class = "data.frame", .Names = c("0-0.5", "0.5-1",
"1-1.5", "1.5-2", "2-2.5", "2.5-3", "3-3.5", "3.5-4", "4-4.5",
"4.5- 5", "5-5.5", "5.5-6", "6-6.5"), row.names = c("Phase_1_Mean",
"Phase_2_Mean", "Phase_3_Mean"))
Something like what is shown in this example (which didn't work for me, because it is Python): https://www.google.com/search?rlz=1C1GCEA_enNL765NL765&biw=1366&bih=626&tbm=isch&sa=1&ei=Yqc8XOjMLZDUwQLp9KuYCA&q=multiple+histograms+r&oq=multiple+histograms+r&gs_l=img.3..0i19.4028.7585..7742...1.0..1.412.3355.0j19j1j0j1......0....1..gws-wiz-img.......0j0i67j0i30j0i5i30i19j0i8i30i19j0i5i30j0i8i30j0i30i19.j-1kDXNKZhI#imgrc=L0Lvbn1rplYaEM:
I think you have to reshape to long to make this work, but I don't see why this is a problem. I think this code achieves what you want. Note that there are 13 plots because you have 13 (not 12) columns in the dataframe you posted.
# Load libraries
library(reshape2)
library(ggplot2)
Plots1$ID <- rownames(Plots1) # Add an ID variable
Plots2 <- melt(Plots1) # melt to long format
ggplot(Plots2, aes(y = value, x = ID)) + geom_bar(stat = "identity") + facet_wrap(~variable)
Below is the resulting plot. I've kept it basic, but of course you can make it pretty by adding further layers.
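If reshaping really is unwanted, a base R loop over the columns can draw one bar chart per column directly from the wide table. This is a rough sketch, not a replacement for the ggplot2 version: Plots_small is a hypothetical two-column subset of the posted data, and par(mfrow = ...) would need more cells for all 13 columns.

```r
# Two of the posted columns, kept in wide format
Plots_small <- data.frame(
  `0-0.5` = c(26.952381, 5.455598, 28.32947),
  `0.5-1` = c(29.798635, 25.972696, 32.87372),
  check.names = FALSE,
  row.names = c("Phase_1_Mean", "Phase_2_Mean", "Phase_3_Mean")
)

# One panel per column, one bar per phase
op <- par(mfrow = c(1, ncol(Plots_small)))
for (col in names(Plots_small)) {
  barplot(Plots_small[[col]],
          names.arg = rownames(Plots_small),
          main = col, ylab = "value")
}
par(op)  # restore the previous graphics settings
```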
I ran a Pig job on a Hadoop cluster that crunched a bunch of data down into something R can handle to do a cohort analysis. I have the following script, and as of the second to last line I have the data in the format:
> names(data)
[1] "VisitWeek" "ThingAge" "MyMetric"
VisitWeek is a Date. ThingAge and MyMetric are integers.
The data looks like:
2010-02-07 49 12345
The script I have so far is:
# Load ggplot2 for charting
library(ggplot2);
# Our file has headers - column names
data = read.table('weekly_cohorts.tsv',header=TRUE,sep="\t");
# Print the names
names(data)
# Convert to dates
data$VisitWeek = as.Date(data$VisitWeek)
data$ThingCreation = as.Date(data$ThingCreation)
# Fill in the age column
data$ThingAge = as.integer(data$VisitWeek - data$ThingCreation)
# Filter data to thing ages lt 10 weeks (70 days) + a sanity check for gt 0, and drop the creation week column
data = subset(data, data$ThingAge <= 70, c("VisitWeek","ThingAge","MyMetric"))
data = subset(data, data$ThingAge >= 0)
print(ggplot(data, aes(x=VisitWeek, y=MyMetric, fill=ThingAge)) + geom_area())
This last line does not work. I've tried lots of variations, bars, histograms, but as usual R docs defeat me.
I want it to show a standard Excel style stacked area chart: one time series for each ThingAge, stacked across the weeks on the x axis, with the metric on the y axis. An example of this kind of chart is here: http://upload.wikimedia.org/wikipedia/commons/a/a1/Mk_Zuwanderer.png
I've read the docs here: http://had.co.nz/ggplot2/geom_area.html and http://had.co.nz/ggplot2/geom_histogram.html and this blog http://chartsgraphs.wordpress.com/2008/10/05/r-lattice-plot-beats-excel-stacked-area-trend-chart/ but I can't quite make it work for me.
How can I achieve this?
library(ggplot2)
set.seed(134)
df <- data.frame(
VisitWeek = rep(as.Date(seq(Sys.time(),length.out=5, by="1 day")),3),
ThingAge = rep(1:3, each=5),
MyMetric = sample(100, 15))
ggplot(df, aes(x=VisitWeek, y=MyMetric)) +
geom_area(aes(fill=factor(ThingAge)))
gives me the image below. I suspect your problem lies in correctly specifying the fill mapping for the area plot: fill=factor(ThingAge)
ggplot(data.set, aes(x = Time, y = Value, colour = Type)) +
geom_area(aes(fill = Type), position = 'stack')
You need to give geom_area a fill aesthetic and also stack it (though stacking might already be the default).
Found here: http://www.mail-archive.com/r-help@r-project.org/msg84857.html
I was able to get my result with this:
I loaded the stackedPlot() function from https://stat.ethz.ch/pipermail/r-help/2005-August/077475.html
The function (not mine, see link) was:
stackedPlot = function(data, time=NULL, col=1:length(data), ...) {
if (is.null(time))
time = 1:length(data[[1]]);
plot(0,0
, xlim = range(time)
, ylim = c(0,max(rowSums(data)))
, t="n"
, ...
);
for (i in length(data):1) {
# Sum up to the current column
prep.data = rowSums(data[1:i]);
# The polygon must have its first and last points on the zero line
prep.y = c(0
, prep.data
, 0
)
prep.x = c(time[1]
, time
, time[length(time)]
)
polygon(prep.x, prep.y
, col=col[i]
, border = NA
);
}
}
Then I reshaped my data to wide format. Then it worked!
wide = reshape(data, idvar="ThingAge", timevar="VisitWeek", direction="wide");
stackedPlot(wide);
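For reference, the stacking that stackedPlot() performs with rowSums(data[1:i]) is a running sum across the series at each time point. A small base R sketch on made-up numbers (the column names follow the question; long, wide_tab and stacked are illustrative names), with weeks as rows and one column per ThingAge:

```r
# Toy long-format data: 3 weeks, 2 ages
long <- data.frame(
  VisitWeek = rep(as.Date("2010-02-07") + 7 * 0:2, times = 2),
  ThingAge  = rep(c(0, 7), each = 3),
  MyMetric  = c(10, 20, 30, 1, 2, 3)
)

# Wide layout: one row per week, one column per ThingAge
wide_tab <- xtabs(MyMetric ~ VisitWeek + ThingAge, data = long)

# Upper edge of each stacked band: cumulative sum across the columns of each row
stacked <- t(apply(wide_tab, 1, cumsum))
stacked
```

Each column of stacked is the top boundary of one band, so the last column is the total per week.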
Turning integers into factors and using geom_bar rather than geom_area worked for me:
df<-expand.grid(x=1:10,y=1:6)
df<-cbind(df,val=runif(60))
df$fx<-factor(df$x)
df$fy<-factor(df$y)
qplot(fy,val,fill=fx,data=df,geom='bar')