GTrendsR + ggplot2?

I want to generate a plot of interest over time using GTrendsR and ggplot2.
The plot I want is the one generated by Google Trends itself (image not reproduced here).
Any help will be much appreciated.
Thanks!
This is the best I was able to get:
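library(ggplot2)
library(devtools)
library(GTrendsR)  # old package name; current versions are called gtrendsR
# Log in to Google with your own credentials
usr = "my.email"
psw = "my.password"
ch = gConnect(usr, psw)
location = "all"
query = "MOOCs"
MOOCs_trends = gTrends(ch, geo = location, query = query)
# The first list element holds the interest-over-time data
MOOCs <- MOOCs_trends[[1]]
# moocs may come back as a factor: convert via character to get the real numbers
MOOCs$moocs <- as.numeric(as.character(MOOCs$moocs))
# as.Date() parses the first date of each "start - end" week string
MOOCs$Week <- as.character(MOOCs$Week)
MOOCs$start <- as.Date(MOOCs$Week)
# Plot the non-zero weeks as a blue line
ggplot(MOOCs[MOOCs$moocs != 0, ], aes(start, moocs)) +
  geom_line(colour = "blue") +
  ylab("Trends") + xlab("") + theme_bw()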
I think that to match the graph generated by Google, I would need to aggregate the data by month instead of by week, but I'm not sure how to do that yet.

The object returned by gtrendsR is a list, and its trend element is the data.frame you want to plot.
library(gtrendsR)
library(ggplot2)
usr = "my.email"
psw = "my.password"
gconnect(usr, psw)
MOOCs_trends = gtrends('MOOCs')
# The trend element is the data.frame to plot
MOOCsDF <- MOOCs_trends$trend
ggplot(data = MOOCsDF) + geom_line(aes(x = start, y = moocs))
This gives the weekly plot (image not reproduced here).
Now if you want to aggregate by month, I would suggest using the floor_date function from the lubridate package, in combination with dplyr (note that I am using the chain operator %>% which dplyr re-exports from the magrittr package).
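library(gtrendsR)
library(ggplot2)
library(lubridate)  # floor_date()
library(dplyr)      # group_by(), summarise() and the %>% operator
usr = "my.email"
psw = "my.password"
gconnect(usr, psw)
MOOCs_trends = gtrends('MOOCs')
# Take the trend data.frame out of the returned list
MOOCsDF <- MOOCs_trends$trend
# Round each weekly start date down to the first day of its month
MOOCsDF$start <- floor_date(MOOCsDF$start, unit = 'month')
# Sum the weekly values within each month, then plot
MOOCsDF %>%
  group_by(start) %>%
  summarise(moocs = sum(moocs)) %>%
  ggplot() + geom_line(aes(x = start, y = moocs))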
This gives the monthly version (image not reproduced here).
Note 1: gtrendsR changed the query MOOCs to lowercase moocs; this is reflected in the name of the y variable you are plotting.
Note 2: the capitalization of some names has changed (e.g. gtrendsR, not GTrendsR; gconnect, not gConnect); I am using the current versions.
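If you would rather avoid the lubridate and dplyr dependencies, the same monthly aggregation can be sketched in base R. This is a minimal sketch on a small hypothetical data frame (`weekly` and its values are made up and merely stand in for `MOOCs_trends$trend`); `format(..., "%Y-%m-01")` plays the role of `floor_date(..., unit = 'month')`.

```r
# Hypothetical weekly data standing in for MOOCs_trends$trend
weekly <- data.frame(
  start = seq(as.Date("2014-01-05"), by = "week", length.out = 8),
  moocs = 1:8
)

# Round each date down to the first day of its month
# (a base-R analogue of lubridate::floor_date(x, unit = "month"))
weekly$month <- as.Date(format(weekly$start, "%Y-%m-01"))

# Sum the weekly values within each month; aggregate() sorts by month
monthly <- aggregate(moocs ~ month, data = weekly, FUN = sum)
print(monthly)
```

The result can then be plotted exactly like the dplyr version, e.g. `ggplot(monthly, aes(month, moocs)) + geom_line()`.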

This will get you most of the way there. The plot doesn't look quite right, but that's mostly because the data are a bit different. Here are the necessary conversions to numeric and to dates.
library(ggplot2)
MOOCs <- MOOCs_trends[[1]]
## Convert to string, and to numeric via character (in case moocs is a factor)
MOOCs$Week <- as.character(MOOCs$Week)
MOOCs$moocs <- as.numeric(as.character(MOOCs$moocs))
# Split each "start - end" string and keep the second date (the end of the week)
MOOCs$start <- vapply(strsplit(MOOCs$Week, " - ", fixed = TRUE),
                      function(x) x[2], character(1))
MOOCs$start <- as.Date(MOOCs$start)
ggplot(MOOCs, aes(x = start, y = moocs)) + geom_point() + geom_path()
Google might do some smoothing, but this will plot the data you have.
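Both conversions can be checked on a tiny hypothetical data frame. This sketch assumes, as the code above does, that Week holds strings of the form "start - end" and that moocs may arrive as a factor (as it would if read with stringsAsFactors = TRUE, the default before R 4.0.0); the values are made up. It keeps both dates, since the code above stores the second (end) date in a column named start.

```r
# Hypothetical stand-in for MOOCs_trends[[1]], with both columns as factors
MOOCs <- data.frame(
  Week  = factor(c("2014-01-05 - 2014-01-11", "2014-01-12 - 2014-01-18")),
  moocs = factor(c("40", "100"))
)

# as.numeric() on a factor returns the level codes, not the values:
# the levels sort as "100", "40", so this prints 2 1
print(as.numeric(MOOCs$moocs))

# Converting via character recovers the actual numbers
MOOCs$moocs <- as.numeric(as.character(MOOCs$moocs))

# Split each "start - end" string and keep both dates
parts <- strsplit(as.character(MOOCs$Week), " - ", fixed = TRUE)
MOOCs$start <- as.Date(vapply(parts, `[`, character(1), 1))
MOOCs$end   <- as.Date(vapply(parts, `[`, character(1), 2))
print(MOOCs)
```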

Related

ggplot2 - Barchart or Histogram in R - plotting more than one variable

Sorry, I'm quite new to R. I've been trying to do this by myself but have been struggling.
I'm trying to make a barplot or histogram of the tag 'Amateur' over the years 2007 to 2013 to show how its usage has changed over time.
The data set was downloaded from https://sexualitics.github.io/; specifically, I'm looking at xhamster.csv.
Here is some of the initial preprocessing of the data:
library(lubridate)  # year()
head(xhamster)  # upload_date needs converting to a Date, then a Year column added
xhamster$upload_date <- as.Date(xhamster$upload_date, format = "%d/%m/%Y")
xhamster$Year <- year(xhamster$upload_date)  # add a new column containing just the year
xhamster$Year <- as.integer(xhamster$Year)   # store Year as an integer
head(xhamster)  # check the changes were made correctly
The filters for the years:
Yr2007<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2007")))
Yr2008<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2008")))
Yr2009<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2009")))
Yr2010<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2010")))
Yr2011<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2011")))
Yr2012<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2012")))
Yr2013<-xhamster%>%
filter_at(vars(Year),any_vars(.%in%c("2013")))
For example, I want to create a plot for the tag 'Amateur'. Here is the code I have so far:
Amateur<-grep("Amateur",xhamster$channels)
Amateur_2007<-grep("Amateur", Yr2007$channels)
Amateur_2008<-grep("Amateur", Yr2008$channels)
Amateur_2009<-grep("Amateur", Yr2009$channels)
Amateur_2010<-grep("Amateur", Yr2010$channels)
Amateur_2011<-grep("Amateur", Yr2011$channels)
Amateur_2012<-grep("Amateur", Yr2012$channels)
Amateur_2013<-grep("Amateur", Yr2013$channels)
Amateur_2007 <- length(Amateur_2007)
Amateur_2008 <- length(Amateur_2008)
Amateur_2009 <- length(Amateur_2009)
Amateur_2010 <- length(Amateur_2010)
Amateur_2011 <- length(Amateur_2011)
Amateur_2012 <- length(Amateur_2012)
Amateur_2013 <- length(Amateur_2013)
Plot:
Amateur <- cbind(Amateur_2007, Amateur_2008, Amateur_2009,Amateur_2010, Amateur_2011, Amateur_2012, Amateur_2013)
barplot((Amateur),beside=TRUE,col = c("red","orange"),ylim=c(0,90000))
title(main="Usage of 'Amateur' as a tag from 2007 to 2013")
title(xlab="Amateur")
title(ylab="Frequency")
Plot showing amateur tag over the years
However, this isn't a great plot. Ideally I'd like to use ggplot, and to have each bar labelled with its year rather than 'Amateur_2010' etc. How do I do this?
As a bonus, it would be even better if I could add 'nb_views' for each year for videos with this tag, or something like that.
There are lots of ways to approach this; here is how I would tackle it:
library(tidyverse)
library(lubridate)
library(vroom)
xhamster <- vroom("xhamster.csv")
xhamster$upload_date <- as.Date(xhamster$upload_date, format = "%d/%m/%Y")
xhamster$Year <- year(xhamster$upload_date)
xhamster %>%
  filter(Year %in% 2007:2013) %>%
  filter(grepl("Amateur", channels)) %>%
  # geom_bar() counts the rows in each year by default
  ggplot(aes(x = Year)) +
  geom_bar() +
  scale_x_continuous(breaks = 2007:2013) +
  ylab(label = "Count") +
  xlab(label = "Amateur") +
  labs(title = "Usage of 'Amateur' as a tag from 2007 to 2013",
       caption = "Data obtained from https://sexualitics.github.io/ under a CC BY-NC-SA 3.0 license") +
  theme_minimal(base_size = 14)
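For the bonus question (nb_views per year), one option is to summarise before plotting: count the tagged videos and total their views for each year. This base-R sketch uses a small hypothetical data frame that only borrows the column names Year, channels and nb_views from xhamster.csv; the rows and numbers are invented for illustration.

```r
# Hypothetical rows with the same column names as xhamster.csv
videos <- data.frame(
  Year     = c(2007, 2007, 2008, 2008, 2008, 2009),
  channels = c("Amateur,Other", "Other", "Amateur", "Amateur,Other", "Other", "Amateur"),
  nb_views = c(100, 50, 200, 300, 10, 400)
)

# Keep only the videos tagged 'Amateur'
amateur <- videos[grepl("Amateur", videos$channels, fixed = TRUE), ]

# One row per year: number of tagged videos and their total views.
# table() and tapply() both group by the sorted unique years.
per_year <- data.frame(
  Year     = sort(unique(amateur$Year)),
  n_videos = as.vector(table(amateur$Year)),
  views    = as.vector(tapply(amateur$nb_views, amateur$Year, sum))
)
print(per_year)
```

With ggplot2, `ggplot(per_year, aes(Year, views)) + geom_col()` would then draw one bar per year from the precomputed totals.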
As Jared said, there are lots of ways, but I want to solve it your way so that you can internalize the solution better.
I only changed your cbind() call in the plot code:
Amateur <- cbind("2007" = Amateur_2007,"2008" = Amateur_2008,"2009" = Amateur_2009, "2010" =Amateur_2010, "2011" = Amateur_2011, "2012" = Amateur_2012, "2013" = Amateur_2013)
As you can see, you can name the columns directly inside the cbind() call, and barplot() then uses those names as the bar labels :)
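The seven near-identical Amateur_20xx variables can also be produced in one step. A minimal base-R sketch, assuming a data frame with Year and channels columns (the toy `xh` data below is hypothetical): `sapply()` over the years as character strings returns a named vector, so the names come for free.

```r
# Hypothetical stand-in for xhamster
xh <- data.frame(
  Year     = c(2007, 2007, 2008, 2009, 2009, 2009),
  channels = c("Amateur", "Other", "Amateur,Other", "Amateur", "Amateur", "Other")
)
years <- 2007:2009

# Count the 'Amateur' tags for each year; sapply() over a character
# vector uses the strings themselves as names
Amateur <- sapply(as.character(years), function(y) {
  sum(grepl("Amateur", xh$channels[xh$Year == as.numeric(y)], fixed = TRUE))
})
print(Amateur)
```

Calling `barplot(Amateur, ylab = "Frequency")` would then label the bars 2007, 2008 and 2009.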

transform "mFilter" object (list of Time-Series) to plot with ggplot2

I'm working with the hpfilter() function from the mFilter package, and I can't find a simple way to convert the list of time-series objects returned by hpfilter to a format I can use with ggplot2. I realize I can take it all apart and put it back together, but I imagine there's some simple way I have overlooked. I tried the code suggested in the SO discussion "R list to data frame", but I couldn't find a simple way to convert the list of time-series objects to a data.frame. The final goal is to reproduce the default plot that plot() produces for the mFilter object.
Here's some example code:
# install.packages(c("mFilter"), dependencies = TRUE)
library(mFilter)
data(unemp)
unemp.hp <- hpfilter(unemp, type=c("lambda"), freq = 1606)
# str(unemp.hp)
class(unemp.hp)
# [1] "mFilter"
plot(unemp.hp)
Hit <Return> to see next plot:
Also, why am I asked to "Hit <Return>" to see the plot?
The plot() call dispatches to plot.mFilter(), whose ask argument defaults to interactive(), which is TRUE in interactive sessions.
You can disable the prompt by passing ask = FALSE to plot():
plot(unemp.hp,ask=FALSE)
Data:
library(mFilter)
library(ggplot2)
library(gridExtra)  # grid.arrange()
library(reshape2)   # melt(), used in the plots below; zoo must also be installed
data(unemp)
unemp.hp <- hpfilter(unemp, type=c("lambda"), freq = 1606)
# str(unemp.hp)
class(unemp.hp)
# [1] "mFilter"
plot(unemp.hp,ask=FALSE)
To list the components of the unemp.hp object:
names(unemp.hp)
# [1] "cycle" "trend" "fmatrix" "title" "xname" "call" "type" "lambda" "method"
#[10] "x"
The relevant components are x (the original unemp series), trend and cycle. All three are of class ts, so we first convert them to
data.frames with a custom function and then plot them with ggplot2 and gridExtra (for grid.arrange()):
objectList = list(unemp.hp$x,unemp.hp$trend,unemp.hp$cycle)
names(objectList) = c("unemp","trend","cycle")
sapply(objectList,class)
#unemp trend cycle
# "ts" "ts" "ts"
Conversion from ts to data.frame:
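# Convert the x-th series in objectList to a data.frame with a date column
fn_ts_to_DF = function(x) {
  DF = data.frame(date = zoo::as.Date(time(objectList[[x]])),
                  tseries = as.matrix(objectList[[x]]))
  # Name the value column after the series
  colnames(DF)[2] = names(objectList)[x]
  return(DF)
}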
DFList=lapply(seq_along(objectList),fn_ts_to_DF)
names(DFList) = c("unemp","trend","cycle")
seriesTrend = merge(DFList$unemp,DFList$trend,by="date")
cycleSeries = DFList$cycle
Plots:
gSeries = ggplot(melt(seriesTrend, "date"), aes(x = date, y = value, color = variable)) +
  geom_line() +
  ggtitle('Hodrick-Prescott Filter for unemp') +
  theme(legend.title = element_blank(), legend.justification = c(0.1, 0.8),
        legend.position = c(0, 1), legend.direction = "horizontal",
        legend.background = element_rect(fill = "transparent", size = .5, linetype = "dotted"))
gCycle = ggplot(cycleSeries, aes(x = date, y = cycle)) +
  geom_line(color = "#619CFF") +
  ggtitle("Cyclical component (deviations from trend)")
# Stack the two plots vertically
gComb = grid.arrange(gSeries, gCycle, nrow = 2)
I tried the previous answer, but it didn't work for me.
I was extracting the trend and cycle from a quarterly GDP series.
The data was a time series, so I did the following, which worked for me:
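# Collect the x, trend and cycle components (avoid naming the variable `list`)
series_list <- list(gdp_ln$x, gdp_ln$trend, gdp_ln$cycle)
names(series_list) = c("gdp", "trend", "cycle")
# c() turns each ts into a plain numeric vector, so sapply() returns a matrix
gdp <- data.frame(sapply(series_list, c))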
Data:
> dput(gdp_ln)
structure(c(16.0275785360442, 16.0477176062761, 16.0718936895007,
16.0899963371452, 16.0875707712141, 16.0981391378223, 16.0988601288276,
16.1110815092797, 16.1244321329861, 16.1384685077996, 16.1451472350838,
16.148178781735, 16.161163569502, 16.1418894206861, 16.1634877625667,
16.1965372621761, 16.2216815829736, 16.2387677536829, 16.249412380526,
16.2690521777631, 16.2812185880068, 16.2951024427095, 16.2964024092233,
16.3127733881018, 16.3233290487177, 16.3369922768377, 16.3486515031696,
16.3489275708763, 16.3451264371757, 16.3524856433069, 16.3666338513045,
16.3801691039135, 16.3959993202765, 16.4135937981601, 16.4321203154987,
16.4488104165345, 16.4344524213544, 16.4302554348621, 16.4240722287677,
16.425087582257, 16.4350803035092, 16.4507216431126, 16.4670532627455,
16.4985227751756, 16.5094864456079, 16.5352746165004, 16.5504689966469,
16.5594976247513, 16.5754312535087, 16.592641573353, 16.6003340665324,
16.6063100774853, 16.6163655606058, 16.6370227688187, 16.6564363783854,
16.6577160570216, 16.6543595214556, 16.6773721241902, 16.6911082706925,
16.6935398489076, 16.6956102943815, 16.6798673418354, 16.6772670544553,
16.6678707780266, 16.6606889172344, 16.6678398460835, 16.6668473810049,
16.676020524389, 16.6775934319312, 16.6882821147755, 16.6957985899994,
16.7032334217472, 16.6926036544774, 16.7027214366522, 16.7103625977254,
16.7105344224572, 16.7042504851486, 16.7063913529457, 16.7100598555556,
16.6960591147037, 16.686477079594, 16.5740423808036, 16.6181175035946
), .Tsp = c(2000, 2020.5, 4), class = "ts")
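The sapply(list, c) approach above drops the time index. A minimal base-R sketch that keeps it, assuming quarterly series (frequency 4) like the GDP data above; the short x and trend series are made up and merely stand in for the x and trend components of an hpfilter() result.

```r
# Hypothetical quarterly series standing in for hpfilter() components
x     <- ts(c(10, 12, 11, 13, 14, 13), start = c(2000, 1), frequency = 4)
trend <- ts(c(10.5, 11, 11.5, 12, 12.5, 13), start = c(2000, 1), frequency = 4)
series_list <- list(x = x, trend = trend, cycle = x - trend)

# Rebuild a calendar date from the year and quarter of each observation
yrs   <- as.integer(floor(as.numeric(time(x))))
qtrs  <- as.integer(cycle(x))  # 1 to 4 within each year
dates <- as.Date(sprintf("%d-%02d-01", yrs, 3L * (qtrs - 1L) + 1L))

# One column per series; as.numeric() drops the ts attributes
df <- data.frame(date = dates, sapply(series_list, as.numeric))
print(df)
```

For ggplot2, `reshape2::melt(df, id.vars = "date")` turns this into the long format used in the earlier answer.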

plotly not getting geom_text in R / ggplot

I have a ggplot that works fine on its own. But when I import it into plotly through the API, the geom_text layer doesn't show up; everything else works. Can anyone help me?
My R version is 3.1.2 (2014-10-31) and my plotly version is 0.5.23.
The data that I am using is in file.csv and looks like this:
Province,Community,General Shelters,General Beds,Mens Shelters,Mens Beds,Womens Shelters,Womens Beds,Youth Shelters,Youth Beds,Family Shelters,Family Beds,Total Shelters,Total Beds
New Brunswick,Saint John,0,0,1,35,1,10,0,0,0,0,2,45
Quebec,Montréal,7,114,9,916,12,259,17,197,1,7,45,"1,493"
Quebec,Québec City,3,49,2,102,1,12,2,15,0,0,8,178
Ontario,Toronto,4,250,13,"1,483",10,572,10,416,4,496,41,"3,217"
British Columbia,Vancouver,13,545,7,291,9,238,7,90,2,30,38,"1,194"
British Columbia,Victoria,1,84,1,21,1,25,1,10,1,5,5,145
And here's my full code:
library(ggplot2)
library(zoo)
library(DAAG)
library(mapdata) #for canada map from worldhires database
library(ggmap)
library("plotly") # for plotly
homeless <- function()
{
allcit <- NULL
#read csv
allcittmp <- read.csv("file.csv", sep=",", header=TRUE, colClasses="character")
#cast data to proper format from character for both data frames
allcittmp[,1] <- as.character(allcittmp[,1])
allcittmp[,2] <- as.character(allcittmp[,2])
allcittmp[,13] <- as.integer(allcittmp[,13])
allcittmp[,14] <- as.integer(gsub(",","",allcittmp[,14]))
#get only relevant columns to a new data frame
allcit <- allcittmp[,c(1,2,13,14)]
#delete temp data frames for hygiene
allcittmp <- NULL
#give better colnames
colnames(allcit) <- c("prov","community","totshelters","totbeds")
#concatenate col2,1 to get city, province
allcit$hcity <- paste(allcit$community,allcit$prov, sep=", ")
#clean up NA's
allcit <- na.omit(allcit)
plmap <- mapcit3(allcit$hcity, allcit$totshelters, allcit$community)
#uncommenting the following two lines makes the plotly graph
#everything is fine except that the city names don't show up
#py <- plotly()
#py$ggplotly(plmap)
}
mapcit3 <- function(citiesM, indM, cityname)
{
#concatenate Canada to city names, to be safe and not pick up similar US cities:
citiesM <- paste(citiesM,", Canada", sep="")
freqM <- data.frame(citiesM, indM, cityname) #make dataframe
lonlat <- geocode(citiesM) #courtesy of Google: longitude, latitude (gives two variables, lon and lat, among others)
citiesC <- cbind(freqM,lonlat) #make new df with long/lat
mappts2 <- ggplot(citiesC, aes(lon, lat)) +
borders(regions="canada", name="borders") +
coord_equal() +
geom_point(aes(text=cityname, size=indM), colour="red", alpha=1/2, name="cities", label=citiesC$cityname) +
geom_text(size=2, aes(label=cityname),hjust=0, vjust=0)
return(mappts2)
}
Here is the version without plotly (attached as map1_without_plotly.png; image not reproduced here).
And here is the map with plotly, as it appears on the plotly site via the API (yes, the plotly version has more cities, but only because I stripped down the CSV file for Stack Overflow so that it is easily reproducible; image not reproduced here).
Basically, the plotly version is missing the geom_text layer (the city names) that appears in the non-plotly version.
Okay, I spotted several shortcomings in the ggplotly conversion. For now, I can suggest the following workaround:
mappts2 <- ggplot(citiesC, aes(x=lon, y=lat)) +
geom_text(size=10, aes(label=cityname), hjust=0, vjust=0) +
borders(regions="canada", name="borders") +
coord_equal() +
geom_point(aes(text=cityname, size=indM), colour="red", alpha=0.5,
name="cities", label=citiesC$cityname)
# Take a look
mappts2
# Yes, text is too big in ggplot2
first_version <- py$ggplotly(mappts2, kwargs=list(filename="map_text",
fileopt="overwrite"))
# Has the labels, misses the markers
my_account <- "marianne2" # Replace with yours
account_url <- paste0("https://plot.ly/~", my_account, "/")
plot_number <- as.integer(gsub(account_url, "", first_version$response$url))
text_marker <- py$get_figure(my_account, plot_number)
text_marker$data[[1]]$mode
# Says "text"
text_marker$data[[1]]$mode <- "text+markers"
final_version <- py$plotly(text_marker$data,
kwargs=list(layout=text_marker$layout,
fileopt="overwrite",
filename="text_markers_mode"))
# Visit final_version$url
Size conversion is not perfect, hence my replacement of size=2 with size=10.
Unfortunately, the hjust and vjust arguments are not supported (they are ignored here).
When geom_text and geom_point are used on the same data, ggplotly should set mode="text+markers", which is not currently the case in the R "plotly" package (version 0.5.25).
read.csv() already defaults to header = TRUE and sep = ",", so you don't need to specify these.
If you have run allcittmp <- read.csv("file.csv", colClasses="character"), you don't need a conversion like
for (i in c(1, 2)) {
allcittmp[, i] <- as.character(allcittmp[, i])
}
because that's precisely what colClasses="character" takes care of.
I'm not too fond of the mapcit3() function, which seems to be doing some processing and then some plotting(?!).
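The two read.csv() suggestions can be combined into a short base-R sketch: read everything as character with colClasses, then strip the thousands separators (e.g. "3,217") before converting to integer. The CSV text below is a trimmed excerpt of the sample file.csv with only three of its columns.

```r
csv_text <- 'Community,Total Shelters,Total Beds
Saint John,2,45
Toronto,41,"3,217"
Vancouver,38,"1,194"'

# header = TRUE and sep = "," are already the read.csv() defaults
allcit <- read.csv(text = csv_text, colClasses = "character")

# read.csv() turns "Total Beds" into the syntactic name Total.Beds
allcit$Total.Shelters <- as.integer(allcit$Total.Shelters)
# Remove the thousands separator before converting to integer
allcit$Total.Beds <- as.integer(gsub(",", "", allcit$Total.Beds, fixed = TRUE))
str(allcit)
```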

Annotation Track in Gviz

Does anyone with experience using the Bioconductor package Gviz know how to add an AnnotationTrack directly over a DataTrack?
For example, in ggplot2 I can add to a pre-existing plot using + geom_text, but I haven't been able to find a similar feature in Gviz.
Thanks!
Although it's not exactly what you want, one possible solution is to add a HighlightTrack that covers your region of interest. Although this won't specifically label / add the key elements on to your DataTrack, it will help highlight the alignment between differing DataTracks.
Example:
library(Gviz)
library(GenomicRanges)
data(geneModels)
data(cpgIslands)
gen <- genome(cpgIslands)
chr <- 1
start <- 120005434
end <- 129695434
itrack <- IdeogramTrack(genome = gen, chromosome = chr)
gtrack <- GenomeAxisTrack()
grtrack <- GeneRegionTrack(geneModels, genome = gen, chromosome = chr, name = "foo")
htrack <- HighlightTrack(trackList = list(gtrack, grtrack), start = 121535434, end = 124535434, chromosome = chr)
plotTracks(list(itrack, htrack), from = start, to = end)
Graphical output
Although the grtrack is empty, it demonstrates how the HighlightTrack will span the specified DataTracks (in this case, grtrack and gtrack).
See the Gviz documentation for more info.

Making simple R GUI with tcltk package

I'm trying to make a very simple GUI for my script. In a nutshell, the problem looks like this:
dataset is a data frame; I would like to plot one column at a time and use a simple GUI to choose the next/previous column.
dataset <-data.frame(rnorm(10), rnorm(10), rnorm(10))
columnPlot <- function(dataset, i){
plot(dataset[, i])
}
How can I use tcltk to call columnPlot() with different values of i?
Not what you asked for (not tcltk-related), but I would advise you to have a look at the new shiny package from RStudio.
Are you particularly attached to the idea of using tcltk? I've been working on something similar using the gWidgets package and have had some success. According to its CRAN page, "gWidgets provides a toolkit-independent API for building interactive GUIs". The package can use tcltk or GTK2, and I've been using the GTK2 backend. Here's a quick example of a GUI with a spin button for changing i. I also added a little fanciness to your function: since you mentioned you would be plotting time series, I made the x axis Time.
library(ggplot2)
library(gWidgets)
data <- data.frame(rnorm(11), rnorm(11), rnorm(11))
i = 1
# Plot column i of data as a daily time series starting on 1 Jan 2012
fplot <- function(i, data) {
  TimeStart <- as.Date('1/1/2012', format = '%m/%d/%Y')
  plotdat <- data.frame(Value = data[, i],
                        Time = seq(TimeStart, TimeStart + nrow(data) - 1, by = 1))
  myplot <- ggplot(plotdat, aes(x = Time, y = Value)) +
    geom_line()
  print(myplot)
}
options(guiToolkit = 'RGtk2')
window <- gwindow("Time Series Plots", visible = TRUE)
notebook <- gnotebook(cont = window)
group1 <- ggroup(cont = notebook, label = "Choose i", horizontal = FALSE)
# The spin button updates the global i whenever its value changes
ichooser <- gspinbutton(cont = group1, from = 1, to = ncol(data), by = 1, value = i,
                        handler = function(h, ...) {
                          i <<- svalue(h$obj)
                        })
# The Plot button calls fplot() with the currently selected i
plotbutton <- gbutton('Plot', cont = group1, handler = function(h, ...) {
  fplot(i, data)
})
graphicspane1 <- ggraphics(cont = group1)
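Whichever toolkit is used (tcltk, gWidgets or shiny), next/previous buttons only need to move a column index and replot. A minimal sketch of that logic, using a hypothetical `make_stepper()` closure so the index lives inside the helper rather than in a global i; it wraps around at either end, which is a design choice, not something the question asked for.

```r
# Hypothetical helper: keeps the current column index in a closure and
# wraps around at either end
make_stepper <- function(n_cols, start = 1) {
  i <- start
  list(
    current  = function() i,
    next_col = function() {
      i <<- if (i == n_cols) 1 else i + 1
      i
    },
    prev_col = function() {
      i <<- if (i == 1) n_cols else i - 1
      i
    }
  )
}

dataset <- data.frame(rnorm(10), rnorm(10), rnorm(10))
step <- make_stepper(ncol(dataset))
step$next_col()  # 2
step$next_col()  # 3
step$next_col()  # wraps around to 1
step$prev_col()  # back to 3
```

A Next button's handler could then simply run `columnPlot(dataset, step$next_col())`, using the columnPlot() function from the question.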
