I have been working through some exercises from OpenIntro Statistics to get familiar with R and RStudio.
I have completed all the exercises. When I run my code in RStudio, all of my tables and graphs are generated without a problem.
However, when I knit the document, I get an error, which I did not expect given that the same code runs in RStudio without errors and the tables and graphs are correct.
Knitting fails at exercise 3, where I am asked to plot the proportion of boys born over time. Here is the relevant code (lines 53 to 58):
```{r plot-prop-boys-arbuthnot}
mutate (arbuthnot, boy_ratio = boys / total)
ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
geom_line()
```
However, then I get a big error message that I do not understand. It says that total was not found. I tried defining the total by inserting :
total <- boys + girls
or by inserting :
total <- arbuthnot$boys + arbuthnot$girls
Nothing I try seems to work. Even when I manage to define total, knitting the lab report fails with a different error. I also tried writing the mutate call differently, for instance:
arbuthnot <- arbuthnot %>%
mutate(boy_ratio = boys / total)
However, even when I use this code in combination with the solutions I tried for defining the total, it still does not work.
I am not sure what to do at this point because the graph is displayed in RStudio. The ratio is accurate, and it also shows up in a table that I have generated.
The variable total is in that table. I tried restarting and re-running all the chunks of code in R. All of my tables and graphs come out perfectly, and then when I try to knit my lab report again it fails at line 54.
I have been trying to solve this for 2 days now and I am not sure what I should do.
I hope the community here will be able to give me a couple of pointers on how to solve this problem :) ! If you need more information or a bit more code let me know :) !
Wishing everyone a wonderful day !
To help others help you, consider making a minimal working example (MWE), for example using the reprex package. Without more details, it is nearly impossible to know exactly what went wrong.
The error message says that there is no object total in the environment and that arbuthnot does not contain a column total, so the column was possibly created but never assigned back to arbuthnot. When you run the code interactively, total may exist in your environment because you created the column or the variable at some point (using the code you provided). Knitting the .Rmd file, however, runs the document from scratch in a new environment, where the variable cannot be found, so the process aborts.
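To make the "created but not assigned" point concrete, here is a minimal sketch on a toy data frame. The `births` table and its numbers are invented for illustration and merely stand in for arbuthnot; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for arbuthnot; the values are invented.
births <- data.frame(
  year  = c(1629, 1630, 1631),
  boys  = c(520, 490, 450),
  girls = c(480, 460, 410)
)

# mutate() returns a modified copy. Without an assignment the result is
# discarded and `births` itself is unchanged.
mutate(births, total = boys + girls)
has_total_before <- "total" %in% names(births)

# Assigning the result back makes the new columns available to later code.
births <- births %>%
  mutate(total = boys + girls,
         boy_ratio = boys / total)
has_total_after <- "total" %in% names(births)

print(c(before = has_total_before, after = has_total_after))
```

The first check shows that the column is missing after the unassigned call, and the second shows that it is present once the result is stored.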
To debug your code, consider replacing the code chunk lines 53-58 by a print statement, like head(arbuthnot), to see what comes out in the output file and confirm that the tibble indeed contains total.
Alternatively, debug by running the code chunk by chunk until you get the error message in a new environment. In RStudio, try Ctrl + Shift + F10 (equivalent to Session > Restart R) to clear everything and start afresh.
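As a rough illustration of why an object that exists interactively can be missing at knit time, the base R sketch below evaluates an expression in a clean environment. This only mimics knitting; the names `clean_env` and `result` and all numbers are made up for the example.

```r
# `total` exists only in the current workspace, as it might after
# interactive experimentation.
total <- c(1000, 950, 860)

# A clean environment whose parent is the base environment, so workspace
# objects are invisible from it. This only mimics a fresh knitting run.
clean_env <- new.env(parent = baseenv())
assign("boys", c(520, 490, 450), envir = clean_env)

# Evaluate the expression in the clean environment and capture the error
# message instead of stopping.
result <- tryCatch(
  eval(quote(boys / total), envir = clean_env),
  error = function(e) conditionMessage(e)
)
print(result)
```

The captured message complains about `total`, even though `total` is defined in the workspace the code was launched from.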
The following code chunk should work:
library(openintro)
library(tidyverse)
data(arbuthnot)
arbuthnot <- arbuthnot %>% # note the assignment (overwrites the data frame)
  mutate(total = boys + girls, # define total first
         boy_ratio = boys / total)
ggplot(data = arbuthnot,
       mapping = aes(x = year, y = boy_ratio)) +
  geom_line()
Thank you @lbelzile for the great tips.
In the future, I will include a minimal working example to better inform other contributors on Stack Overflow. I thought that the evidence I had provided was sufficient.
That being said, thanks to the bits of code you sent me, I was able to solve the problem.
Following parts of your instructions, here is the code that worked:
head(arbuthnot)
library(tidyverse)
library(openintro)
data(arbuthnot)
arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls, boy_ratio = boys / total)
ggplot(data = arbuthnot, aes(x = year, y = boy_ratio)) +
  geom_line()
After inserting this code, the file knitted successfully and my lab report was generated.
I would like to thank you for taking the time to help me :) !
Have a great week.
Reprex
install.packages("tidyverse") ## general use and data cleaning
library(tidyverse)
library(purrr)
library(lubridate)
library(here)
library(hms)
library(scales)
library(rstudioapi)
install.packages("ggmap")
library(ggmap)
ggplot(data = bikeshare_v8) +
  geom_point(mapping = aes(x = ride_id,
                           y = ride_length))
Task
Analyzing the Divvy bikeshare data for the Google Data Analytics course, finding differences between members and casuals.
Problem
Though the data could previously be plotted without incident, ggplot no longer works. The code chunk keeps its green bar on the left side, and the spinning progress circle stays in the R Markdown output below the code chunk. While this is happening, RAM use keeps growing, but even after several minutes nothing is displayed and the code does not complete.
Solutions Attempted
running with only tidyverse installed and loaded - no change
running without the project being loaded - just markdown - no change
making sure windows defender is excluded from R files - no change
Why did this work before, but not now?
How can I get ggplot to work again?
Found the solution - wrong datatype
lubridate had been used to convert the time information to its Duration class, which ggplot did not like.
Converting the column to a numeric type with as.numeric() solved the issue.
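A minimal sketch of that conversion, assuming lubridate is installed. The column names `started_at` and `ended_at` and the timestamps are hypothetical, since the original data is not shown.

```r
library(lubridate)

# Hypothetical start and end times; the real bikeshare columns may differ.
started_at <- ymd_hms(c("2021-06-01 08:00:00", "2021-06-01 09:00:00"))
ended_at   <- ymd_hms(c("2021-06-01 08:12:30", "2021-06-01 10:00:00"))

# A lubridate Duration, the class reported to cause trouble when plotted.
ride_length <- as.duration(ended_at - started_at)

# Plain numeric minutes, which plotting functions treat like any number.
ride_minutes <- as.numeric(ride_length, "minutes")
print(ride_minutes)
```

Passing the unit explicitly keeps the scale of the numeric column unambiguous. Without it, the value is in seconds.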
It's me again, quite the beginner at R but somehow fumbling my way through it for my thesis. I've run a bunch of regressions and made them into tables using Stargazer. Now I need to share all these results (the glm models/their summaries/the coefficients and confidence intervals and the stargazer tables ... basically everything in my console) with a friend of mine to discuss, but I figure there's got to be a more efficient way to do this than 1) screenshot-ing the hell out of my console or 2) copy and pasting the console and thus botching the formatting. Does anyone have any advice for this?
Some of my code (the rest is just variations on the same stuff) is below in case that's helpful!
Mod4 <- glm(`HC Annual Total` ~ `state population` + Year + `Trump Presidency`,
            data = thesis.data, family = poisson())
summary(Mod4)
#pulling the coefs out, then add exp for what reason I don't remember
exp(coef(Mod4))
#finding the confidence intervals
exp(confint(Mod4))
#Using stargazer to turn Mod4 into a cleaner table
library(stargazer)
stargazer(Mod4, type = "text", dep.var.labels = c("Hate Crimes"),
          covariate.labels = c("State Population", "Year", "Trump Presidency"),
          out = "models.txt")
When you need it fast and without frills, you can send console output to a plain text file using sink().
sink(file="./my_code.txt") ## open sink connection
timestamp()
(s <- summary(fit <- lm(mpg ~ hp, mtcars)))
cat('\n##', strrep('~', 77), '\n')
texreg::screenreg(fit, override.se=s$coefficients[,3], override.pvalues=s$coefficients[,4])
cat('\n# Note:
We could report t-values
instead of SEs\n')
cat('\n##', strrep('~', 77), '\n')
cat('\nCheers!\nJ')
sink() ## close it!
file.show("./my_code.txt") ## look at it
Note that an unclosed sink can easily leave you in a mess where no output is shown on the console. Try closeAllConnections() in this case, or perhaps milder solutions. Also consider rmarkdown, as suggested in the comments.
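One way to lower the risk of an unclosed sink is to close it in a `finally` clause, so it is released even if the code in between fails. This is a base R sketch; the temporary file and the `mtcars` model are placeholders only.

```r
# Write to a temporary file so the sketch leaves nothing behind.
out_file <- tempfile(fileext = ".txt")

sink(out_file)                      # open the sink
tryCatch({
  fit <- lm(mpg ~ hp, data = mtcars)
  print(summary(fit))               # goes to the file, not the console
}, finally = sink())                # always close the sink, even on error

# Read the file back to confirm that the output was captured.
captured <- readLines(out_file)
cat(head(captured, 5), sep = "\n")
```

For a single object, `capture.output(summary(fit), file = out_file)` does the same job without an open sink to remember.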
savehistory() is your friend:
savehistory(file = "my-code.txt")
You can then edit the code at will.
Or, in RStudio, you can use the history pane and copy and paste relevant bits to a text file.
If a full rmarkdown document is overkill, you could try using knitr::spin() to compile your code.
Lastly:
In future, always write scripts.
Reproducibility is best thought of at the start of a project, not as an add-on at the end. It's much easier to run a carefully-written script at the console, than it is to turn your meandering console input into a useful script. A good workflow is to try a few things at the console, then once you know what you are doing, add a line to your script.
I think reprex is an intermediate solution between rmarkdown and sink. First, make sure your R script can be executed without any errors. Then use the following code:
library(reprex)
r.file <- "path/to/Rscript/test.R" ## path to your R script file
reprex(input = r.file, outfile = NA)
There will be four files created in the directory of your R script file, i.e.
test_reprex.R
test_reprex.html
test_reprex.md
test_reprex.utf8.md
The html and md files contain the code from your original R script together with its output. You can share them with someone, who can then see the output without running any code.
The html file looks like:
I'm using dplyr in R (Microsoft R Open 3.5.3, to be precise). I'm having a slight problem whereby I sometimes see lots of additional information in the data frames I create. For example, for these lines of code:
claims_frame_2 <- left_join(
  claims_frame,
  select(new_policy_frame, c(Lookup_Key_4, Exposure_Year, RowName)),
  by = c("Accident_Year" = "Exposure_Year", "Lookup_Key_4" = "Lookup_Key_4")
)
claims_frame_3 <- claims_frame_2 %>%
  group_by(Claim.Number) %>%
  filter(RowName == max(RowName))
No problem with the left_join command, but when I do the second command (group by/filter), the data structure of the claims_frame_3 object is different to that of the claims_frame_2 object. Seems to suddenly have lots of attributes (something I know little about) attached to the RowName field. See the attached photo.
Does anyone know why this happens and how I can stop it?
I had hoped to put together a small chunk of reproducible code that demonstrates this happening, but so far I haven't been successful. I will keep trying. In the meantime, I'm hoping someone might see this code (from a real project) and immediately know why this is happening!
Grateful for any advice.
Thanks
Alan
The following is a segment of code that I need to run so that, in later stages, I can use other functions to make histograms of the dataset. My problem is that even after correctly importing the dataset, R does not recognize the "Aboard" variable.
This clearly shows that the variable exists and that the dataset has been imported properly, but whenever I try to run the chunks, the second chunk fails with an error saying that it does not recognise the variable. I have also tried this with another variable and get the same error. I do not know why this is happening or whether it is because I have missed a step. I tried to fix it by adding the colnames(Before) line, but it did no good.
```{r}
setwd("~/Uni Y2/Stats/Group Project")
Before <- read.csv("before_911_no_summary.csv", header = TRUE)
After <- read.csv("after_911_no_summary.csv", header = TRUE)
colnames(Before) <- c("Date", "Location", "Aboard", "Fatalities", "Ground",
                      "Total.dead")
colnames(After) <- c("Date1", "Location1", "Aboard1", "Fatalities1", "Ground1",
                     "Total.dead1")
```
```{r}
Survivors<- (Aboard-Fatalities)
Survivors1<- (Aboard1-Fatalities1)
```
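One likely cause, offered as a guess since the exact error text is not shown: Aboard and Fatalities are columns inside Before rather than standalone objects, so a bare `Aboard` cannot be found. Below is a minimal sketch with a made-up data frame standing in for the CSV import.

```r
# Hypothetical stand-in for the imported CSV; the numbers are invented.
Before <- data.frame(
  Aboard     = c(120, 64, 8),
  Fatalities = c(120, 10, 0)
)

# Refer to the columns through the data frame with `$` ...
Survivors <- Before$Aboard - Before$Fatalities

# ... or evaluate the expression inside the data frame with with().
Survivors_alt <- with(Before, Aboard - Fatalities)

print(Survivors)
```

The same pattern would apply to After and its renamed columns.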
This one is strange. In an R markdown document, every single code cell displays its output without error, but when I try to knit the document into html, I get an error:
Error: stat_bin() must not be used with a y aesthetic. Execution halted
The closest code I could find to the line number and the last cell name to flash by before the error occurred was this:
g + geom_histogram() # default: bins=30 (for diamonds: 5.01 - 0.2 / 30)
g <- ggplot(data = diamonds, aes(x = carat))
g + geom_histogram(binwidth = 1) # not fine grained enough
g + geom_histogram(binwidth = 0.1)
g + geom_histogram(binwidth = 0.01) # too fine grained
A confusing aspect of the RStudio environment is that things can be loaded in memory that no longer reflect the current state of the code.
In the example given, g was changed in an earlier cell, but its clean, perfect output continued to display in the later cell. Once all the code errors were tracked down, the document knitted correctly.
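A small sketch of the stale-object problem, assuming ggplot2 is installed. `stale_g` is a hypothetical stand-in for whatever `g` held after the earlier cell, and the choice of `price` as the y aesthetic is an assumption, since that cell is not shown.

```r
library(ggplot2)

# Stand-in for a `g` left over from an earlier cell, with a y aesthetic.
stale_g <- ggplot(data = diamonds, aes(x = carat, y = price))

# Building a histogram on it fails. The wording of the message differs
# between ggplot2 versions, so only the failure itself is captured here.
build_error <- tryCatch({
  ggplot_build(stale_g + geom_histogram(bins = 30))
  NULL
}, error = function(e) conditionMessage(e))

# Define `g` in the same chunk, before its first use, so the result does
# not depend on what happens to be in memory.
g <- ggplot(data = diamonds, aes(x = carat))
p <- g + geom_histogram(binwidth = 0.1)
```

Keeping each definition directly above its first use means the chunk behaves the same interactively and when knitted from top to bottom.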
Among the things that needed to be addressed:
All packages in use need an explicit declaration, as in library(dplyr). Some were in memory but not included in any of the markdown cells.
eval cannot be FALSE on any cell whose code affects later markdown cells, but include can be FALSE if the goal is to leave that cell out of the final knitted document.
Code loading data from files needed to have its paths checked and included, because the working directory had changed from what it was when the files were loaded.
These are some of the things that can throw off the knit process, but once addressed, then the document should knit fine. Know any more things to check? Feel free to edit this post and add them in.
Thought about deleting this post after I caught my mistakes, but decided to write this up in case it is helpful for anyone else. Best wishes.