Reprex
install.packages("tidyverse") ## general use and data cleaning
library(tidyverse)
library(purrr)
library(lubridate)
library(here)
library(hms)
library(scales)
library(rstudioapi)
install.packages("ggmap")
library(ggmap)
ggplot(data = bikeshare_v8) +
geom_point(mapping = aes(x = ride_id,
y = ride_length))
Task
Analyzing the Divvy bikeshare data for the Google Data Analytics course to find differences between members and casual riders.
Problem
Although the data could previously be plotted without incident, ggplot no longer works. The code chunk keeps its green bar on the left side, and the spinning progress circle stays in the R Markdown output below the chunk. While this is happening, RAM use keeps growing, but even after several minutes nothing is displayed and the code does not complete.
Solutions Attempted
running with only tidyverse installed and loaded - no change
running without the project being loaded - just markdown - no change
making sure windows defender is excluded from R files - no change
Why did this work before, but not now?
How can I get ggplot to work again?
Found the solution - wrong datatype
Lubridate had been used to convert the time information to the Duration data type, which ggplot could not handle here.
Converting the column to a numeric data type with as.numeric() solved the issue.
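The conversion described above can be sketched as follows. This is a minimal illustration, not the original analysis: the `started_at` and `ended_at` vectors are hypothetical stand-ins, since the columns of `bikeshare_v8` are not shown in the post.

```r
library(lubridate)

# Hypothetical start and end times (the real bikeshare_v8 data is not shown).
started_at <- ymd_hms(c("2021-06-01 08:00:00", "2021-06-01 09:15:00", "2021-06-01 10:30:00"))
ended_at   <- ymd_hms(c("2021-06-01 08:25:30", "2021-06-01 09:20:00", "2021-06-01 11:00:00"))

# A lubridate Duration vector, similar to the ride_length column in the post.
ride_length <- as.duration(ended_at - started_at)
class(ride_length)

# Convert to plain numeric seconds before handing the values to ggplot.
ride_length_sec <- as.numeric(ride_length)
class(ride_length_sec)
```

Plotting `ride_length_sec` (or the same values divided by 60 for minutes) keeps the axis numeric and avoids the Duration class entirely.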
Related
I have been conducting some exercises from OpenIntro statistics to start getting familiar with R and RStudio.
I have completed all the exercises. When I run my code in RStudio, all of my tables and graphs are generated without a problem.
However, when it is time to knit the document, I get an error (one that I believe I should not be getting, given that I was able to run my code in RStudio without any errors and my tables and graphs are generated accurately).
Knitting fails at exercise 3, where I am asked to generate a plot of the proportion of boys born over time. Here is a sample of my code (lines 53 to 58):
```{r plot-prop-boys-arbuthnot}
mutate (arbuthnot, boy_ratio = boys / total)
ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
geom_line()
```
However, I then get a big error message that I do not understand. It says that total was not found. I tried defining the total by inserting:
total <- boys + girls
or by inserting:
total <- arbuthnot$boys + arbuthnot$girls
It just does not seem to work no matter what I do. For instance, even if I successfully define the total, knitting fails again with another error when I try to generate the lab report. I also changed the way I wrote the mutate code. For instance, I used
arbuthnot <- arbuthnot %>%
mutate(boy_ratio = boys / total)
However, even when I use this code in combination with the solutions I tried for defining the total, it still does not work.
I am not sure what to do at this point because the graph is displayed in RStudio. The ratio is accurate, it also shows up in a table that I have generated.
The variable total is in that table. I tried restarting and re-running all the chunks of code in R. All of my tables and graphs come out perfectly, and then when I try to knit my lab report again it fails at line 54.
I have been trying to solve this for 2 days now and I am not sure what I should do.
I hope the community here will be able to give me a couple of pointers on how to solve this problem :) ! If you need more information or a bit more code let me know :) !
Wishing everyone a wonderful day !
To help others help you, consider making a minimal working example (MWE), for example using the reprex package. Without more details, it is nearly impossible to know exactly what went wrong.
The error message states that there is no total in the environment and that arbuthnot does not contain a column total, so possibly the latter was created but not assigned. It may be that the variable exists in your environment when you run the code interactively, because you created the column or the variable at some point (using the code you provided). However, note that the script compiles in a new environment from scratch when knitting the .Rmd file, in which case it cannot find the variable and aborts.
To debug your code, consider replacing the code chunk lines 53-58 by a print statement, like head(arbuthnot), to see what comes out in the output file and confirm that the tibble indeed contains total.
Alternatively, debug by running the code chunk by chunk until you get the error message in a new environment. In RStudio, try Ctrl + Shift + F10 (equivalent to Session > Restart R) to clear everything and start afresh.
The following code chunk should work
library(openintro)
library(tidyverse)
data(arbuthnot)
arbuthnot <- arbuthnot %>%         # note the assignment (overwrites the data frame)
  mutate(total = boys + girls,     # define total first
         boy_ratio = boys / total)
ggplot(data = arbuthnot,
       mapping = aes(x = year, y = boy_ratio)) +
  geom_line()
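The point about assignment can be checked in isolation. The sketch below uses a small made-up data frame called `births` (a hypothetical name with invented counts, not the real arbuthnot data) to show that mutate() without assignment leaves the original object unchanged, which is what a fresh knitting session would see.

```r
library(dplyr)

# Made-up counts for illustration only (not the real arbuthnot data).
births <- data.frame(year = c(1700, 1701), boys = c(60, 55), girls = c(40, 45))

# Without assignment the result is only printed; `births` itself is unchanged.
mutate(births, total = boys + girls)
"total" %in% names(births)   # FALSE

# With assignment the new columns are stored and are available to later chunks.
births <- births %>%
  mutate(total = boys + girls,
         boy_ratio = boys / total)
"total" %in% names(births)   # TRUE
```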
Thank you @lbelzile for the great tips.
In the future, I will use a minimal working example to better inform other contributors on Stack Overflow. I thought that the evidence I had provided was sufficient.
That being said, thanks to the bits of code you sent me, I was able to solve the problem.
Following parts of your instructions, here is the code that worked:
head(arbuthnot)
library(tidyverse)
library(openintro)
data(arbuthnot)
arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls, boy_ratio = boys / total)
ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line()
After inserting this code, the file knitted successfully and my lab report was generated.
I would like to thank you for taking the time to help me :) !
Have a great week.
I am trying to replicate these:
https://radacad.com/interactive-map-using-r-and-power-bi-create-custom-visual-part-1
Is it possible to use R Plotly library in R Script Visual of Power BI?
The issue with Plotly is that, for my dataset, it is very slow even when I run it in R: it takes several minutes. So I have decided to replace it with googleVis, which is really fast (I am open to any other interactive Gantt chart in R).
Here is my code in R:
df <- data.frame(Values)
library("googleVis")
#df$Project.Name <- toString(df$Project.Name)
df$Processed_start_date_cut <- as.Date(df$Processed_start_date_cut)
df$Processed_End_date <- as.Date(df$Processed_End_date)
#df$Milestone <- toString(df$Milestone)
g <- gvisTimeline(data=df,
rowlabel="Project.Name",
barlabel="Milestones",
start="Processed_start_date_cut",
end="Processed_End_date",
options=list(timeline="{rowLabelStyle:{fontName:'Helvetica',
fontSize:10, color:'#603913'},
barLabelStyle:{fontName:'Garamond',
fontSize:12}}",
backgroundColor='#ffd',
height=350 ))
cat(g$html$chart, file="out.html")
I have tried it in R and it works great. In Power BI it works the first time, but when I change any filter, nothing shows up in this newly developed pbiviz item unless I go to another tab of my report and then come back to the tab that contains it (this was the reason I thought it was not working at first, sorry).
See the screenshot
I have also noticed that if I maximize this item (pbivis), then the chart disappears (i.e. shows nothing).
I guess I need a kind of code for refreshing the visuals which possibly can come before df <- data.frame(Values), maybe something like F5 in IE.
I also tried this, and it did not work:
if (file.exists("out.html"))
#Delete file if it exists
file.remove("out.html")
As user3867743 suggested, the issue is related to
cat(g$html$chart, file="out.html")
After replacing it by
print(g, file="out.html")
It has started working fine.
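A rough sketch of the difference between the two calls, under stated assumptions: the `tasks` data frame and its column names are hypothetical, and the file is written to a temporary directory here rather than to the `out.html` that the Power BI visual expects. As far as the structure of a gvis object goes, `g$html` holds several page parts and `chart` is only one of them, whereas print() with a `file` argument writes the assembled page.

```r
library(googleVis)

# Hypothetical two-row timeline; the column names are placeholders.
tasks <- data.frame(
  project   = c("Alpha", "Beta"),
  milestone = c("Design", "Build"),
  start     = as.Date(c("2020-01-01", "2020-02-01")),
  end       = as.Date(c("2020-01-31", "2020-03-15"))
)

g <- gvisTimeline(data = tasks,
                  rowlabel = "project",
                  barlabel = "milestone",
                  start = "start",
                  end = "end")

# g$html is a list of page parts; the chart fragment is one element of it.
names(g$html)

# print() on a gvis object writes the whole page to the given file.
out_file <- file.path(tempdir(), "out.html")
print(g, file = out_file)
```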
I am using dplyr in R (Microsoft R Open 3.5.3, to be precise). I'm having a slight problem with dplyr whereby I sometimes see lots of additional information in the data frame I create. For example, for these lines of code:
claims_frame_2 <- left_join(claims_frame,
select(new_policy_frame, c(Lookup_Key_4, Exposure_Year, RowName)),
by = c("Accident_Year" = "Exposure_Year", "Lookup_Key_4" = "Lookup_Key_4")
)
claims_frame_3 <- claims_frame_2 %>% group_by(Claim.Number) %>% filter(RowName == max(RowName))
No problem with the left_join command, but when I do the second command (group by/filter), the data structure of the claims_frame_3 object is different to that of the claims_frame_2 object. Seems to suddenly have lots of attributes (something I know little about) attached to the RowName field. See the attached photo.
Does anyone know why this happens and how I can stop it?
I had hoped to put together a small chunk of reproducible code that demonstrates this happening, but so far I haven't been successful. I will keep trying. In the meantime, I'm hoping someone might see this code (from a real project) and immediately know why this is happening!
Grateful for any advice.
Thanks
Alan
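One possible explanation, offered as an assumption because the post has no reproducible example: group_by() returns a grouped data frame that carries grouping metadata as attributes, and filter() keeps that grouping. The sketch below uses a hypothetical three-row `claims` data frame to show that ungroup() returns an object without the grouping. Whether this accounts for the attributes seen on RowName in the real project is not confirmed.

```r
library(dplyr)

# Hypothetical stand-in for claims_frame_2.
claims <- data.frame(Claim.Number = c("C1", "C1", "C2"),
                     RowName = c(1, 3, 2))

# Same pattern as in the post: keep the row with the largest RowName per claim.
latest <- claims %>%
  group_by(Claim.Number) %>%
  filter(RowName == max(RowName))
is_grouped_df(latest)         # TRUE: the result still carries grouping metadata

# ungroup() drops the grouping and returns an ordinary (ungrouped) tibble.
latest_plain <- ungroup(latest)
is_grouped_df(latest_plain)   # FALSE
```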
I have previously written a script to create a colored map of the US, with each state colored based on some simulated data. The idea is to later be able to replace the simulated data with some measure. It was written to be self-contained and originally ran just fine, but now crashes when the fortify {ggplot2} command is run.
I believe this is due to a problem with the fortify command, as it returns a fatal error and restarts R at that point. Here is the code up to the point of the fatal error:
###Load libraries
library(maptools)
library(ggplot2)
library(ggmap)
library(rgdal)
library(dplyr)
#Set working directory to where you want your files to exist (or where they already exist)
#Download, read and translate coord data for shape file of US States
if(!file.exists('tl_2014_us_state.shp')){
download.file('ftp://ftp2.census.gov/geo/tiger/TIGER2014/STATE/tl_2014_us_state.zip',
'tl_2014_us_state.zip')
files <- unzip('tl_2014_us_state.zip')
tract <- readOGR(".","tl_2014_us_state") %>% spTransform(CRS("+proj=longlat +datum=WGS84"))
} else {
tract <- readShapeSpatial("./tl_2014_us_state.shp") #%>% spTransform(CRS("+proj=longlat +datum=WGS84"))
}
# shape<-readShapeSpatial("./fao/World_Fao_Zones.shp")
#Download reference data for state names and abbreviations - a matter of convenience if there are
#states for which you have no data
if(!file.exists('states.csv')){
download.file('http://www.fonz.net/blog/wp-content/uploads/2008/04/states.csv',
'states.csv')
states <- read.csv('states.csv')
} else {
states <- read.csv('states.csv')
}
#simulated data for plotting values of some 'characteristic'
mydata <- data.frame(rnorm(51, 0, 1)) #51 "states" in the state dataset
names(mydata)[1] <- 'value' #give the simulated column of data a name
#Turn geo data into R dataframe
tract_geom<-fortify(tract,region="STUSPS") #STUSPS is the state abbreviation which will act as a key for merge
The script stops working at the line above and crashes R with a fatal error. I have tried a workaround described in another post, in which you place an explicit "id" column in the spatial dataframe, which fortify then uses as the key by default. With this modification the lines:
tract@data$id <- tract@data$STUSPS
tract_geom <- fortify(tract)
would replace tract_geom<-fortify(tract,region="STUSPS") in the previous code,
where STUSPS is the key for a later data merge.
Unfortunately, when I then fortify the tract data, the id column is not the state abbreviation as expected, but is instead a vector of characters between "0" and "55" (56 unique values). It appears that the state abbreviations (of which there are 56) are somehow being transformed into numbers and then into characters.
I am working on figuring out why this is happening and looking for a fix. If the fortify function worked with the region argument, that would be ideal, but if I can get the workaround to work, that would be great too. Any help would be greatly appreciated. I have looked at the documentation and at solutions to various similar problems and have come up short (even tried ArcGIS).
Try:
readOGR(..., stringsAsFactors=FALSE, ...)
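The suggestion above targets columns that were read in as factors. A small base R sketch of that pitfall follows; the three abbreviations are illustrative, and whether a factor conversion is what actually produced the "0" to "55" ids in the question is an assumption, not something the thread confirms.

```r
# A factor stores integer level codes plus a set of labels.
abbr <- factor(c("TX", "CA", "NY"))

# Coercing to integer returns the level codes, not the labels.
as.integer(abbr)     # 3 1 2 (levels are sorted: CA, NY, TX)

# Coercing to character returns the original labels.
as.character(abbr)   # "TX" "CA" "NY"

# With stringsAsFactors = FALSE the column would already be character,
# so no level codes are involved.
abbr_chr <- c("TX", "CA", "NY")
is.character(abbr_chr)
```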
I was able to solve my own question by running update.packages(). I'm not entirely sure which package was the culprit, but it could have been maptools, rgdal, or sp, as these were among the packages to be updated that may have influenced the problem.
In the end, after updates, the script runs in its original form with the line tract_geom<-fortify(tract,region="STUSPS") intact. Thank you to those who helped me work through this problem.
I am integrating Java and R using JRI.
Please find the script below:
String path = "C:\\Users\\hrpatel\\Desktop\\CSVs\\DataNVOCT.csv";
rengine.eval("library(tseries)");
rengine.eval(String.format("mydata <- read.csv('%s')",path.replace('\\', '/')));
String exportFilePath= "C:\\Users\\hrpatel\\Desktop\\CSVs\\arima3.jpg";
rengine.eval("Y <- NewVisits");
rengine.eval("t <- Day.Index");
rengine.eval("summary(Y)");
rengine.eval("adf.test(Y, alternative='stationary')");
rengine.eval("adf.test(Y, alternative='stationary', k=0)");
rengine.eval("acf(Y)");
rengine.eval("pacf(Y)");
rengine.eval("mydata.arima101 <- arima(Y,order=c(1,0,1))");
rengine.eval("mydata.pred1 <- predict(mydata.arima101, n.ahead=1000)");
rengine.eval(String.format("jpeg('%s')",exportFilePath.replace('\\', '/')));
rengine.eval("plot(t,Y)");
rengine.eval("lines(mydata.pred1$pred, col='blue',size=10)");
rengine.eval("lines(mydata.pred1$pred+1*mydata.pred1$se, col='red')");
rengine.eval("lines(mydata.pred1$pred-1*mydata.pred1$se, col='red')");
rengine.eval("dev.off()");
In the code above, when I try plot(t,Y) or plot(Y), it exports a blank image, while plot(mydata) works fine.
One more thing: when I run the above code in R, it creates the image (through JRI it produces a blank image).
I have spent a day trying to solve this but have not found a solution.
Please suggest any alternatives you know of.
Your help is needed.
Thanks in advance
If I understand correctly, you have a data set named mydata that has two columns, NewVisits and Day.Index. In that case you need to change:
rengine.eval("Y <- NewVisits");
to
rengine.eval("Y <- mydata$NewVisits");
and
rengine.eval("t <- Day.Index");
to
rengine.eval("t <- mydata$Day.Index");
This also explains why plot(mydata) works for you - because R recognizes it.
If this isn't the solution, then I can't see where you are reading NewVisits and Day.Index from.
BTW, I strongly recommend plotting with the ggplot2 package.
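The column-lookup point can be tried directly in R, outside JRI. This is a minimal sketch with a made-up three-row `mydata` (the real CSV is not shown), and `day` is a hypothetical name used here in place of `t`.

```r
# Made-up stand-in for the CSV read in the question.
mydata <- data.frame(Day.Index = 1:3, NewVisits = c(10, 12, 9))

# The column names are not variables in the workspace by themselves.
exists("NewVisits")   # FALSE unless such an object was created elsewhere

# Columns have to be pulled out of the data frame explicitly.
Y   <- mydata$NewVisits
day <- mydata$Day.Index
```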