I wasn't able to find much pertinent to this on Stack Overflow, or the web.
I'm getting this error:
```
> library(knitr)
> knit2html("pa1_template.rmd")
Error in knit2html("pa1_template.rmd") :
  It seems you should call rmarkdown::render() instead of knitr::knit2html() because pa1_template.rmd appears to be an R Markdown v2 document.
```
I just ran it with rmarkdown::render(), and it created the HTML file. However, my assignment requires me to run it through knit2html() and create an .md file.
When I run the Rmd file through the RStudio "Knit HTML" menu option, it creates the HTML file fine.
Any pointers appreciated.
Here is the content of the rmd file:
## Loading and preprocessing the data
Read the data file in.
```{r readfile}
steps<-read.csv("activity.csv",header=TRUE, sep=",")
steps_good<-subset(steps, !is.na(steps))
```
Sum the number of steps per day
```{r summarize/day}
steps_day<-aggregate(steps~date, data=steps_good, sum)
```
Create a histogram of the results
```{r histogram}
hist(steps_day$steps, main="Frequency of Steps/day", xlab="Steps/Day", border="blue", col="orange")
```
## What is the mean total number of steps taken per day?
Calculate the mean of the steps per day
```{r means_steps/day}
mean_steps<-mean(steps_day$steps)
mean_steps
```
Calculate the median of the steps per day
```{r median_steps/day}
med_steps<-median(steps_day$steps)
med_steps
```
## What is the average daily activity pattern?
Get the average steps per 5 minute interval
```{r avg_5_min}
step_5min<-aggregate(steps~interval, data=steps_good, mean)
```
Plot steps against time interval, averaged across all days
```{r plot_interval}
plot(step_5min$interval,step_5min$steps, type="l", main="steps per time interval",ylab="Steps",xlab="Interval")
```
On average, which interval during the day has the most steps?
```{r max_interval}
step_5min$interval[which.max(step_5min$steps)]
```
## Imputing missing values
How many NAs are there in the original table?
```{r NAs}
steps_na<-which(is.na(steps))
length(steps_na)
```
Merge 5 minute interval with original steps table
```{r merge}
steps_filled<-merge(steps, step_5min,by="interval")
```
Replace NA values with mean of steps values for that time interval
```{r replace_na}
steps_na<-which(is.na(steps_filled$steps.x))
steps_filled$steps.x[steps_na]<-steps_filled$steps.y[steps_na]
```
Create a histogram of the results
```{r new_hist}
steps_day_new<-aggregate(steps.x~date, data=steps_filled, sum)
hist(steps_day_new$steps.x, main="Frequency of Steps/day", xlab="Steps/Day", border="blue", col="orange")
```
It looks like imputing the NA values increases the height of the middle bar (around the mean/median), but the other bars seem unchanged.
Calculate the new mean of the steps per day
```{r new_means_steps/day}
mean_steps<-mean(steps_day_new$steps.x)
mean_steps
```
Calculate the new median of the steps per day
```{r new_median_steps/day}
med_steps<-median(steps_day_new$steps.x)
med_steps
```
It looks like the mean did not change, but the median took on the value of the mean, now that some non-integer values were plugged in.
## Are there differences in activity patterns between weekdays and weekends?
Regenerate steps_filled, and flag whether a date is a weekend or a weekday.
Convert resulting column to factor.
```{r fill_weekdays}
steps_filled<-merge(steps, step_5min,by="interval")
steps_na<-which(is.na(steps_filled$steps.x))
steps_filled$steps.x[steps_na]<-steps_filled$steps.y[steps_na]
steps_filled<-cbind(steps_filled, wkday=weekdays(as.Date(steps_filled$date)))
steps_filled<-cbind(steps_filled, day_type="", stringsAsFactors=FALSE)
for(i in 1:nrow(steps_filled)){
  if(steps_filled$wkday[i] %in% c("Saturday","Sunday"))
    steps_filled$day_type[i]="Weekend"
  else
    steps_filled$day_type[i]="Weekday"
}
steps_filled$day_type<-as.factor(steps_filled$day_type)
```
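The weekday/weekend flag built by the loop above can also be computed in one vectorized step. A minimal sketch, using a few hypothetical dates in place of `steps_filled$date`; it swaps `weekdays()` for `as.POSIXlt()$wday`, so the result does not depend on the locale's day names:

```r
# Hypothetical stand-in for steps_filled$date (the real values come from activity.csv)
dates <- c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08")

# POSIXlt$wday is 0 for Sunday and 6 for Saturday regardless of locale,
# whereas weekdays() returns names in the session's language
wday <- as.POSIXlt(as.Date(dates))$wday

# One vectorized ifelse() replaces the row-by-row for loop
day_type <- factor(ifelse(wday %in% c(0, 6), "Weekend", "Weekday"),
                   levels = c("Weekday", "Weekend"))
day_type
```

On the real data this would become `steps_filled$day_type <- factor(ifelse(as.POSIXlt(as.Date(steps_filled$date))$wday %in% c(0, 6), "Weekend", "Weekday"))`.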
Get average steps per interval and day_type
```{r plot_interva_day_type}
steps_interval_day<-aggregate(steps_filled$steps.x,by=list(steps_filled$interval,steps_filled$day_type),mean)
```
Plot the weekend and weekday results in a panel plot.
```{r day_type_plot}
weekday_intervals<-subset(steps_interval_day, steps_interval_day$Group.2=="Weekday",select=c("Group.1","x"))
weekend_intervals<-subset(steps_interval_day, steps_interval_day$Group.2=="Weekend",select=c("Group.1","x"))
par(mfrow=c(1,2))
plot(weekday_intervals$Group.1,weekday_intervals$x,type="l",xlim=c(0,2400), ylim=c(0,225),main="Weekdays",xlab="Intervals",ylab="Mean Steps/day")
plot(weekend_intervals$Group.1,weekend_intervals$x,type="l",xlim=c(0,2400), ylim=c(0,225),main="Weekends",xlab="Intervals",ylab="")
```
In RStudio, you can add keep_md: true in your YAML header:
---
title: "Untitled"
output:
  html_document:
    keep_md: true
---
With this option, you get both HTML and md files.
It worked with knit() instead of knit2html().
Try this:
setwd("working_directory")
library(knitr)
knit("PA1_template.Rmd", output = NULL)
Adding output = NULL was key for me.
Good luck!
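For an assignment that only needs the `.md` file, the `knit()` route can be checked end to end. A minimal sketch, assuming knitr is installed; the file name `pa1_demo.Rmd` and its contents are hypothetical stand-ins for `PA1_template.Rmd`:

```r
library(knitr)

# Hypothetical stand-in for PA1_template.Rmd, written to a temporary folder
rmd <- file.path(tempdir(), "pa1_demo.Rmd")
writeLines(c(
  "## Demo",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd)

# knit() runs the R chunks and writes Markdown; it does not call Pandoc.
# It returns the path of the file it wrote.
md <- knit(rmd, output = file.path(tempdir(), "pa1_demo.md"), quiet = TRUE)
readLines(md)
```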
I created an R Markdown document in which I would like to create a plot at the start of the document and then print it at the end.
I thought the best way to achieve this would be to save the plot in the environment and recall it later, so I saved it as follows:
plot(1:5, 1:5) ; plot1 <- recordPlot() # I create a plot and save it as plot1
This plot is saved under "Data" in the environment.
If I enter plot1 into the console, my plot is reproduced, but when I try to display it directly in Rmarkdown as follows I get the following error:
plot(plot1)
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' is a list, but does not have components 'x' and 'y'
How can I take the plot that I saved into Data and print it anywhere I like in my R Markdown document?
P.S. I know it's tempting to suggest repeating the plot later in the document, but the parameters that build the plot are subsequently altered for another part of my analysis.
Reproducible example:
x = 1
plot_later <- function() {
  plot(x)
}
plot_later()
x = -10
plot_later()
`x` starts at 1 and then changes to -10 on the y axis; I want it to stay at the initial value of 1.
Solution based on https://bookdown.org/yihui/rmarkdown-cookbook/reuse-chunks.html :
---
title: plot now, render later
output: html_document
---
We put some plot expression here to evaluate it later:
```{r, deja-vu, eval=FALSE}
x = 1
plot(x)
```
Here we change `x`. Because knitr evaluates all chunks in one shared environment, this reassigns `x` for the rest of the document, not just for this chunk:
```{r}
x = 10
```
... moving on
Here, we re-run the code of the chunk defined earlier. Because that code itself sets `x = 1`, the plot still shows `1` (and `x` is reset to `1` for any later chunks):
```{r, deja-vu, eval=TRUE}
```
One option is to save the plot as a grob object using the as.grob() function from ggplotify and then draw it elsewhere.
---
title: "Saving A Plot"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
```{r}
library(ggplotify)
library(grid)
show_captured_plot <- function(grb) {
  grid::grid.newpage()
  grid::grid.draw(grb)
}
```
```{r}
x <- 1
p <- as.grob(~plot(x))
```
Now we can plot the figure here.
```{r}
x <- 10
show_captured_plot(p)
```
Check out this link: https://bookdown.org/yihui/rmarkdown-cookbook/fig-chunk.html
It has a lot of instructions on how to get started with R Markdown.
Specifically answering your question:
We generate a plot in this code chunk but do not show it:
```{r cars-plot, dev='png', fig.show='hide'}
plot(cars)
```
After another paragraph, we introduce the plot:
![A nice plot.](`r knitr::fig_chunk('cars-plot', 'png')`)
Basically, a chunk with `fig.show='hide'` still writes the plot to a file without displaying it, and `knitr::fig_chunk()` returns the path to that file, so you can include the image wherever you like.
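As for why `plot(plot1)` fails: a recorded plot is a list of drawing operations, and `plot()` has no method for it, so the call falls through to the default method, which tries to read the list as `x`/`y` coordinates. Typing `plot1` works because printing a recorded plot replays it; calling `replayPlot(plot1)` does the same explicitly, and in a later knitr chunk it should redraw the saved plot. A minimal sketch of the idea outside knitr, using an off-screen PDF device (an assumption made so the example runs non-interactively):

```r
# Off-screen device so the sketch runs without a display
pdf(NULL)
# recordPlot() needs the device's display list, which is off by default here
dev.control(displaylist = "enable")

x <- 1
plot(x)
plot1 <- recordPlot()   # snapshot of the plot as drawn with x = 1

x <- -10                # later changes to x do not affect the snapshot
replayPlot(plot1)       # redraws the saved plot, still showing the value 1

invisible(dev.off())
```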
I'm analyzing some data and would like to check for Simpson's paradox in R. I've installed the Simpsons package and loaded the library. Here is an example based on the package documentation:
---
output: html_document
---
```{r}
library(Simpsons)
#generating data
Coffee1=rnorm(100,100,15)
Neuroticism1=(Coffee1*.8)+rnorm(100,15,8)
g1=cbind(Coffee1, Neuroticism1)
Coffee2=rnorm(100,170,15)
Neuroticism2=(300-(Coffee2*.8)+rnorm(100,15,8))
g2=cbind(Coffee2, Neuroticism2)
Coffee3=rnorm(100,140,15)
Neuroticism3=(200-(Coffee3*.8)+rnorm(100,15,8))
g3=cbind(Coffee3, Neuroticism3)
data2=data.frame(rbind(g1,g2,g3))
colnames(data2) <- c("Coffee","Neuroticism")
example <- Simpsons(Coffee,Neuroticism,data=data2)
plot(example)
```
This returns a plot with 3 clusters (exactly what I need). However, when I knit the Rmd file to HTML, I get a lot of equals signs (======) with a percentage next to them, like a progress bar, which I would like to remove from my final output.
You can suppress unwanted output in R Markdown by setting knitr chunk options. If we wish to hide all code output other than plots, we can use the following solution:
---
output: html_document
---
```{r echo=FALSE, results='hide', fig.keep='all', message = FALSE}
library(Simpsons)
#generating data
Coffee1=rnorm(100,100,15)
Neuroticism1=(Coffee1*.8)+rnorm(100,15,8)
g1=cbind(Coffee1, Neuroticism1)
Coffee2=rnorm(100,170,15)
Neuroticism2=(300-(Coffee2*.8)+rnorm(100,15,8))
g2=cbind(Coffee2, Neuroticism2)
Coffee3=rnorm(100,140,15)
Neuroticism3=(200-(Coffee3*.8)+rnorm(100,15,8))
g3=cbind(Coffee3, Neuroticism3)
data2=data.frame(rbind(g1,g2,g3))
colnames(data2) <- c("Coffee","Neuroticism")
example <- Simpsons(Coffee,Neuroticism,data=data2)
plot(example)
```
I would note that this package seems to print out a lot more content than most packages, and therefore the combination of options is quite long.
An easier method would probably be to move the plot to a separate chunk and run all the analysis before it. The include argument can be used to suppress all outputs, but this also suppresses plots, which is why we need two chunks:
```{r, include = FALSE}
# your code to build model
```
```{r}
plot(example)
```
Check out the full list of knitr chunk options in the knitr documentation (https://yihui.org/knitr/options/).
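A third option, not part of the answer above, is to silence only the noisy call by wrapping it in `capture.output()`, which diverts everything printed to standard output while the assignment inside it still takes effect. A minimal sketch; `noisy_fit()` is a hypothetical stand-in that draws the same kind of `=====` bar with `utils::txtProgressBar()` (whether Simpsons uses that exact function is an assumption):

```r
# Hypothetical stand-in for a function that draws a text progress bar
noisy_fit <- function(n = 5) {
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)  # "====  | 40%" style bar
  for (i in seq_len(n)) utils::setTxtProgressBar(pb, i)
  close(pb)
  sum(seq_len(n))  # the result we actually want to keep
}

# capture.output() swallows the bar; res is still assigned in this environment
bar_text <- utils::capture.output(res <- noisy_fit())
res
```

In the question's chunk this would look like `invisible(capture.output(example <- Simpsons(Coffee, Neuroticism, data = data2)))` followed by `plot(example)`; the `invisible()` keeps knitr from printing the captured text. If the bar were written to standard error instead, `capture.output(..., type = "message")` would be needed.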
When I run my code, everything works, but when I try to convert it to PDF it gives me an error. Do I need to install some other packages?
## Normality of the variables
# QQ-plot
```{r}
par(mfrow=c(2,2))
with(Melanoma, qqPlot(age, dist="norm", id.method="y", id.n=2,
                      labels=rownames(Melanoma), main="qq-plot Age"))
with(Melanoma, qqPlot(thickness, dist="norm", id.method="y", id.n=2,
                      labels=rownames(Melanoma), main="qq-plot Thickness"))
with(Melanoma, qqPlot(time, dist="norm", id.method="y", id.n=2,
                      labels=rownames(Melanoma), main="qq-plot Time"))
```
The variable "age" seems to follow a normal distribution; we can't say the same for the variables "thickness" and "time".
# Shapiro-Wilk test
To be more precise, I performed a Shapiro-Wilk test:
```{r}
normalityTest(~age, test="shapiro.test", data=Melanoma)
```
```{r}
normalityTest(~thickness, test="shapiro.test", data=Melanoma)
```
```{r}
normalityTest(~time, test="shapiro.test", data=Melanoma)
```
We can see that only the variables "time" and "age" seem to follow a normal distribution.
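The actual error message isn't shown, so this is a guess, but there are two common causes of code that runs in the console yet fails when knitted to PDF. First, PDF output needs a LaTeX distribution (for example TinyTeX, installable with `tinytex::install_tinytex()`). Second, the RStudio Knit button renders in a separate R session, so packages and data loaded only in the console are not visible to the document. A minimal setup chunk for the second case, assuming `Melanoma` is the `MASS` data set and that `qqPlot()` and `normalityTest()` come from the car and RcmdrMisc packages:

```r
# Setup chunk: load everything later chunks use, since knitting
# does not see what was loaded interactively in the console.

# Assumption: Melanoma is the data set shipped with MASS (a recommended package)
data(Melanoma, package = "MASS")

# Assumption: qqPlot() is from car and normalityTest() from RcmdrMisc.
# Load each one if installed; otherwise say how to install it.
for (pkg in c("car", "RcmdrMisc")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
  } else {
    message("Package '", pkg, "' is missing: install.packages('", pkg, "')")
  }
}

# Base-R check with shapiro.test(), the test requested via test = "shapiro.test":
# one p-value per variable
sw_p <- sapply(Melanoma[c("age", "thickness", "time")],
               function(v) shapiro.test(v)$p.value)
sw_p
```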
I need to use results = "asis" for reasons stated here: https://stackoverflow.com/a/36381976/
However, using that chunk option means other outputs render poorly. Specifically, I'm having issues outputting prop.test results, but I'm sure this would happen for other data types too.
I've provided 4 options in the example below, all of which fall short in some way:
---
title: "R Notebook"
output:
  html_document:
    df_print: paged
---
```{r, echo=F, message=F, warning=F, results="asis"}
library(knitr)
library(pander)
out <- prop.test(c(10,30), c(20,40))
cat("# Header \n")
cat(" \n## Straight output\n")
out # Only properly renders first line
cat(" \n## Print\n")
print(out) # Only properly renders first line
cat(" \n## Kable\n")
#kable(out) # Will fail: Error in as.data.frame.default(x) : cannot coerce class '"htest"' to a data.frame
kable(unlist(out)) # Renders everything but in an ugly way
cat(" \n## Pander\n")
pander(out) # Misses confidence interval.
cat(" \n As you can see, Pander misses some information, such as the confidence interval")
```
Pander gets it closest to a nice display but misses some information (confidence interval). Perhaps there's a way to make it display all?
How can I nicely display the output of prop.test and similar?
One option is to return to results = "markup" (the default) and replace your cat calls with asis_output (from the knitr package).
---
title: "R Notebook"
output:
  html_document:
    df_print: paged
---
```{r, echo=F, message=F, warning=F}
library(knitr)
library(pander)
out <- prop.test(c(10,30), c(20,40))
asis_output("# Header \n")
asis_output(" \n## Straight output\n")
out # Only properly renders first line
asis_output(" \n## Print\n")
print(out) # Only properly renders first line
asis_output(" \n## Kable\n")
#kable(out) # Will fail: Error in as.data.frame.default(x) : cannot coerce class '"htest"' to a data.frame
kable(unlist(out)) # Renders everything but in an ugly way
asis_output(" \n## Pander\n")
pander(out) # Misses confidence interval.
asis_output(" \n As you can see, Pander misses some information, such as the confidence interval")
```
You can use formattable like this:
library(knitr)
library(formattable)
out <- prop.test(c(10,30), c(20,40))
cat("# Header \n")
cat(" \n## Straight output\n")
out # Only properly renders first line
cat(" \n## Print\n")
print(out) # Only properly renders first line
cat(" \n## Kable\n")
#kable(out) # Will fail: Error in as.data.frame.default(x) : cannot coerce class '"htest"' to a data.frame
kable(unlist(out)) # Renders everything but in an ugly way
cat(" \n## Pander\n")
df <- data.frame(value = unlist(out))
tdf <- as.data.frame(t(df))
formattable(tdf)
Because all of this is now in a data frame, you can keep only the columns you want and update the column names. A rough example of how it looks is here.
I have a data frame `data` of `n` observations of several numeric and factor variables. I would like to produce an HTML report in which `class` and `describe` are reported and a histogram (`qplot` or `ggplot`) is plotted for every variable.
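Another route, not taken in the answers above, is to pull the fields of the `htest` object into a one-row data frame yourself, so nothing (including the confidence interval) is dropped and the result can go straight into a table. A minimal sketch; `htest_to_df()` is a hypothetical helper that assumes the test reports a statistic, a parameter and a confidence interval, as a two-group `prop.test()` does:

```r
# Hypothetical helper: flatten the commonly reported fields of an htest
# object into a one-row data frame (assumes statistic, parameter, conf.int exist)
htest_to_df <- function(h) {
  data.frame(
    statistic = unname(h$statistic),   # X-squared for prop.test
    df        = unname(h$parameter),
    p.value   = h$p.value,
    conf.low  = h$conf.int[1],         # lower bound of the confidence interval
    conf.high = h$conf.int[2],         # upper bound of the confidence interval
    method    = h$method,
    stringsAsFactors = FALSE
  )
}

out <- prop.test(c(10, 30), c(20, 40))
res <- htest_to_df(out)
res
```

Inside a chunk, `knitr::kable(res, digits = 3)` then renders it as a normal table without needing `results = "asis"`. The broom package's `tidy()` offers a ready-made version of the same idea for `htest` objects.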
How can I do that?
Is it possible in R Markdown to produce an automatic header preceding every variable analysis?
Thank you for your help.
Corrado
You can put a loop in the R chunks of your R Markdown file. For example:
```{r, echo=FALSE}
library(ggplot2)
```
This is an introductory sentence with absolutely no interest.
```{r, results="asis", eval=TRUE, echo=FALSE}
data(cars)
for (varname in names(cars)) {
  var <- cars[,varname]
  cat(paste0("<h2>",varname,"</h2>"))
  cat(paste0("Class : <pre>",class(var),"</pre>"))
  cat("Summary : <pre>")
  print(summary(var))
  cat("</pre>")
  if (is.numeric(var)) print(qplot(var, binwidth=diff(range(var))/30))
}
```
This is an astonishing conclusion.
This gives the following result: http://rpubs.com/juba/mdloop
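For the automatic header the question asks about, the loop can also emit Markdown headings instead of raw HTML, so they are treated as ordinary section headings of the document. A minimal sketch; `describe_var()` is a hypothetical helper, and its output only renders as headings inside a chunk with `results = "asis"`:

```r
# Hypothetical helper: write one Markdown section per variable.
# Use inside a chunk with results = "asis" so the text is passed through as Markdown.
describe_var <- function(df, varname) {
  var <- df[[varname]]
  cat("\n\n## ", varname, "\n\n", sep = "")          # automatic header
  cat("Class: `", class(var)[1], "`\n\n", sep = "")
  cat("```\n")                                       # keep summary() aligned
  print(summary(var))
  cat("```\n\n")
}

data(cars)
for (varname in names(cars)) describe_var(cars, varname)
```

The histogram from the answer above can be added at the end of `describe_var()` in the same way.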