Cannot plot all data in a box plot in R

I want to make a box plot. My data has more than 1000 rows, but the plot shows only a few entries.
Dataset:
https://www.dropbox.com/s/tgaqfgm2gkl7i3r/maintenance_data_updated.csv
#Start of Box plot Temperature
training_data <- read.csv("C:/Users/akhan/Documents/maintenance_data_updated_2.csv", stringsAsFactors = TRUE)
library(dplyr)
dt_temperature <- select(training_data, Runtime, Defect, Machine, Temperature, Plant)
dt_temperature$Machine_Plant = paste(dt_temperature$Machine,dt_temperature$Plant,sep = "_")
attach(dt_temperature)
class(Temperature)
class(Defect)
class(Runtime)
class(Machine)
?boxplot
boxplot(Temperature ~ Machine_Plant)
Current output: https://www.dropbox.com/s/7nv5n80en1vpkyt/Rplot01.png
Can anyone give me a hint toward a solution?

What do you mean by 'it shows only a few entries'? If the problem is that only 4 of the boxplots are labelled on the x-axis, rotating the axis labels so that they all fit may solve it:
boxplot(Temperature ~ Machine_Plant, las=3)
Type
?par
for more information about the las parameter.
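As a minimal sketch of the suggestion above, using made-up data (the demo data frame and its 12 group names are hypothetical stand-ins for the real file): rotating the labels and widening the bottom margin usually lets every group label fit. For the x-axis, las = 2 has the same effect as las = 3.

```r
# Hypothetical stand-in data: 12 machine/plant groups, 20 readings each
set.seed(42)
demo <- data.frame(
  Machine_Plant = rep(paste0("M", 1:12, "_PlantA"), each = 20),
  Temperature   = rnorm(240, mean = 70, sd = 5)
)

# Widen the bottom margin so the rotated labels are not clipped
op <- par(mar = c(8, 4, 2, 1))
# las = 2 draws labels perpendicular to the axis; cex.axis shrinks them
bp <- boxplot(Temperature ~ Machine_Plant, data = demo,
              las = 2, cex.axis = 0.8, xlab = "")
par(op)

# Number of boxes that were drawn (one per group)
length(bp$names)
```

Passing data = demo to boxplot() also avoids the need for attach().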

Related

Plotting multiple csv's on one graph in R

I need to plot two sets of data on one graph, and then use locator() to draw a vertical line at a given date. I have the code below and it works until after the plot function. The lines function is not adding the second dataset to my graph.
Can someone please help me understand what I'm doing incorrectly?
Thank you in advance
LMT <- read.csv("LMT.csv",header = T)
JNJ <- read.csv("JNJ.csv",header = T)
LMT$Date <- as.Date(LMT$Date)
JNJ$Date <- as.Date(JNJ$Date)
plot(y=LMT$Adj.Close, x=LMT$Date, ylim = c(0,500), type = "l")
lines(JNJ$Adj.Close,type = "l")
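A likely cause, sketched here with made-up data in place of the two CSV files: lines() is given only the y values, so they are drawn against the index 1, 2, ..., n rather than against the dates, and that range lies far to the left of the date axis. Passing the dates as x should fix it. The event_date below is a hypothetical fixed date; in an interactive session locator(1)$x could supply it instead.

```r
# Hypothetical stand-ins for LMT.csv and JNJ.csv
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 100)
LMT <- data.frame(Date = dates, Adj.Close = 350 + cumsum(rnorm(100)))
JNJ <- data.frame(Date = dates, Adj.Close = 140 + cumsum(rnorm(100)))

plot(x = LMT$Date, y = LMT$Adj.Close, ylim = c(0, 500), type = "l",
     xlab = "Date", ylab = "Adj.Close")
# Pass the dates as x; without them the values are plotted against 1..n
lines(x = JNJ$Date, y = JNJ$Adj.Close, col = "red")

# Vertical line at a chosen date
event_date <- as.Date("2020-02-15")
abline(v = event_date, lty = 2)
```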

How do I use the group argument for the plot_summs() function from the jtools package?

I am plotting my coefficient estimates using the function plot_summs() and would like to divide my coefficients into two separate groups.
The function plot_summs() has an argument groups, however, when I try to use it as explained in the documentation, I do not get any results nor error. Can someone give me an example of how I can use this argument please?
This is the code I currently have:
plot_summs(model.c, scale = TRUE, groups = list(pane_1 = c("AQI_average", "temp_yearly"), pane_2 = c("rain_1h_yearly", "snow_1h_yearly")), coefs = c("AQI Average"= "AQI_average", "Temperature (in Farenheit)" = "temp_yearly","Rain volume in mm" = "rain_1h_yearly", "Snow volume in mm" = "snow_1h_yearly"))
The image below is what I get as a result. What I would like is two separate panes: one that includes "AQI_average" and "temp_yearly", and another with "rain_1h_yearly" and "snow_1h_yearly". Even though I use the groups argument, I do not get this.
Output of my code
By minimal reproducible example, markus is referring to a piece of code that lets others reproduce the issue exactly on their own computers, as described in the link they provided.
To me, the problem seems to be that the groups argument does not work in plot_summs; it seems someone here also pointed it out.
If plot_summs is replaced by plot_coefs, the groups argument works for me. However, the scale argument does not seem to be available there. A workaround might be:
library(jtools)
library(ggplot2) # for theme_linedraw()
r <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
y <- plot_summs(r, scale = TRUE) # Plot for the scaled version
t <- plot_coefs(r, # Plot for the unscaled version, but with faceting
  groups = list(
    pane_1 = c("Sepal.Width", "Petal.Length"),
    pane_2 = c("Petal.Width"))) + theme_linedraw()
y$data$group <- t$data$group # Add the faceting column to the scaled plot's data
t$data <- y$data # Replace the data with the scaled version
t
I hope this is what you meant!

Create horizon chart on R using ggplot2: show percentage change

I am a beginner at this and am really lost about it.
I would like to create a horizon chart that shows the percentage change in sales for the different towns using ggplot2 and R. Would anyone guide me in the approach I can take to create the chart?
The data that I have looks like this.
This is the type of chart I would like to do.
(source: https://harmoniccode.blogspot.com/2017/11/friday-fun-li-horizon-charts.html)
Thanks in advance for any help given!
Edit: here is sample code for the data:
library(dplyr)
library(lubridate)
library(zoo)
x <- data.frame(
  town = c('sad','sad','sad','sad','happy','happy','happy','happy'),
  month = c("2017-01","2017-02","2017-03","2017-04","2017-01","2017-02","2017-03","2017-04"),
  median_sales = c(336500,355000,375000,395000,359000,361500,360000,375000),
  percentage_change = c(NA,5.4977712,5.6338028,5.3333333,NA,0.6963788,-0.4149378,4.1666667)
)
x <- x %>%
  mutate(month = floor_date(as_date(as.yearmon(month)), "month"))
It would be helpful to give an example that will result in a reasonable plot, and to provide your example data as data rather than an image.
If you google 'horizon plot' the first answer should give you what you need.
Here is a simple example based on the data you gave:
library(latticeExtra)
sales.ts <- ts(matrix(sales$median_sales, ncol=2), names = c("sad", "happy"),
start = c(2017, 1), frequency = 365)
horizonplot(sales.ts)
I think this is correctly presenting your results, but again hard to tell as you haven't given a realistic dataset.
UPDATE: based on the data provided, this is the answer. Again, as you've only provided one time point a horizonplot is probably not what you want. They are designed to plot time series.
x.ts <- ts(matrix(x$median_sales, ncol=2), names = c("sad", "happy"),
           start = c(2017, 1), frequency = 12)
horizonplot(x.ts)
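Since the question asks about percentage change rather than raw sales, here is a rough base R sketch of how that series could be derived and shaped into a ts object. The helper name pct_change is made up for illustration, and the sketch uses 360000 for the 2017-03 "happy" value, which is what the stated percentage changes imply (the posted 36000 looks like a typo).

```r
# Data copied from the question (360000 assumed for the 2017-03 "happy" value)
x <- data.frame(
  town = rep(c("sad", "happy"), each = 4),
  month = rep(c("2017-01", "2017-02", "2017-03", "2017-04"), times = 2),
  median_sales = c(336500, 355000, 375000, 395000,
                   359000, 361500, 360000, 375000)
)

# Month-over-month percentage change within each town; the first month is NA
pct_change <- function(v) c(NA, 100 * diff(v) / head(v, -1))
x$percentage_change <- ave(x$median_sales, x$town, FUN = pct_change)

# Monthly series of percentage changes, one column per town,
# with the first (NA) month dropped
pct.ts <- ts(matrix(x$percentage_change, ncol = 2)[-1, ],
             names = c("sad", "happy"), start = c(2017, 2), frequency = 12)
pct.ts
```

The resulting pct.ts could then be passed to horizonplot() in place of x.ts, although with only three time points per town the chart will not be very informative.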

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1<-data.frame("date1" = seq(1,10),
"temp1" = rnorm(10))
set2<-data.frame("date2" = seq(8,17),
"temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops only three showed up bc the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (span the range of both datasets)
plot(set1$date1, set1$temp1,
     xlim = range(set1$date1, set2$date2),
     ylim = range(set1$temp1, set2$temp2),
     type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))
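The question also asks for two separate y-axes, which the example above does not cover. One common base R approach, sketched here with made-up data (the grewl and kbll data frames below are hypothetical stand-ins for the two CSV files), is to overlay a second plot with par(new = TRUE) and draw its axis on the right with axis(4):

```r
set.seed(1234)
# Hypothetical stand-ins for the two CSV files
t0 <- as.POSIXct("2021-01-01 00:00", tz = "UTC")
grewl <- data.frame(date_time = t0 + 3600 * 0:23,
                    temp_c = rnorm(24, mean = 5, sd = 2))
kbll <- data.frame(date_time = t0 + 3600 * 12:35,
                   temp_c = rnorm(24, mean = 20, sd = 3))

# Shared x range so both series line up on the same time axis
xlim <- range(grewl$date_time, kbll$date_time)

op <- par(mar = c(5, 4, 2, 4))  # leave room for the right-hand axis
plot(grewl$date_time, grewl$temp_c, type = "l", xlim = xlim,
     xlab = "date_time", ylab = "grewl temp_c")
par(new = TRUE)  # draw the next plot on top of the current one
plot(kbll$date_time, kbll$temp_c, type = "l", col = "red", xlim = xlim,
     axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = "red")  # second y-axis on the right
mtext("kbll temp_c", side = 4, line = 3, col = "red")
par(op)
```

Both plot() calls must use the same xlim, otherwise the two lines would be drawn on different time scales without any visible hint.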

Visualize data using histogram in R

I am trying to visualize some data and in order to do it I am using R's hist.
Below are my data:
jancoefabs <- as.numeric(as.vector(abs(Janmodelnorm$coef)))
jancoefabs
[1] 1.165610e+00 1.277929e-01 4.349831e-01 3.602961e-01 7.189458e+00
[6] 1.856908e-04 1.352052e-05 4.811291e-05 1.055744e-02 2.756525e-04
[11] 2.202706e-01 4.199914e-02 4.684091e-02 8.634340e-01 2.479175e-02
[16] 2.409628e-01 5.459076e-03 9.892580e-03 5.378456e-02
Now as the more cunning of you might have guessed these are the absolute values of some model's coefficients.
What I need is a histogram with the following axes:
x will be the number (count or length) of coefficients which is 19 in total, along with their names.
y will show values of each column (as breaks?) having a ylim="" set, according to min and max of those values (or something similar).
Note that Janmodelnorm$coef simply produces the following
  (Intercept)           LON           LAT            ME           RAT
 1.165610e+00 -1.277929e-01 -4.349831e-01 -3.602961e-01 -7.189458e+00
           DS           DSA           DSI          DRNS          DREW
-1.856908e-04  1.352052e-05  4.811291e-05 -1.055744e-02 -2.756525e-04
        ASPNS         ASPEW            SI           CUR     W_180_270
-2.202706e-01 -4.199914e-02  4.684091e-02 -8.634340e-01 -2.479175e-02
      W_0_360      W_90_180       W_0_180          NDVI
 2.409628e-01  5.459076e-03 -9.892580e-03 -5.378456e-02
So far, after consulting ?hist, I have been playing with the code below without success, so I am starting again from scratch.
# hist(jancoefabs, col="lightblue", border="pink",
# breaks=8,
# xlim=c(0,10), ylim=c(20,-20), plot=TRUE)
When plot=FALSE is set, I get a bunch of somewhat useful information about the data. I also find it hard to use the breaks argument effectively.
Any suggestion will be appreciated. Thanks.
Rather than using hist, why not use a barplot or a standard plot? For example,
## Generate some data
set.seed(1)
y = rnorm(19, sd=5)
names(y) = c("Inter", LETTERS[1:18])
Then plot the coefficients
barplot(y)
Alternatively, you could use a scatter plot
plot(1:19, y, axes=FALSE, ylim=c(-10, 10))
axis(2)
axis(1, 1:19, names(y))
and add error bars to indicate the standard errors (see for example Add error bars to show standard deviation on a plot in R)
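Applied to the coefficients printed in the question, a sketch might look like the following. Two choices here are assumptions rather than part of the answer above: the log-scaled y-axis (picked because the absolute values span roughly 1e-5 to 7) and the variable names coefs and mids. Note that abs() keeps the names, whereas as.numeric(as.vector(...)) in the question drops them.

```r
# Coefficients as printed in the question
coefs <- c("(Intercept)" = 1.165610e+00, LON = -1.277929e-01,
           LAT = -4.349831e-01, ME = -3.602961e-01, RAT = -7.189458e+00,
           DS = -1.856908e-04, DSA = 1.352052e-05, DSI = 4.811291e-05,
           DRNS = -1.055744e-02, DREW = -2.756525e-04, ASPNS = -2.202706e-01,
           ASPEW = -4.199914e-02, SI = 4.684091e-02, CUR = -8.634340e-01,
           W_180_270 = -2.479175e-02, W_0_360 = 2.409628e-01,
           W_90_180 = 5.459076e-03, W_0_180 = -9.892580e-03,
           NDVI = -5.378456e-02)
jancoefabs <- abs(coefs)  # abs() preserves the names

# One bar per coefficient, labelled by name, on a log-scaled y-axis
op <- par(mar = c(7, 4, 2, 1))
mids <- barplot(jancoefabs, las = 2, log = "y", col = "lightblue",
                ylab = "|coefficient| (log scale)")
par(op)
```

barplot() returns the bar midpoints (mids here), which can be reused to place text labels or error bars above each bar.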
Are you sure you want a histogram for this? A lattice barchart might be pretty nice. An example with the mtcars built-in data set.
coef <- lm(mpg ~ ., data = mtcars)$coef
library(lattice)
barchart(coef, col = 'lightblue', horizontal = FALSE,
         ylim = range(coef), xlab = '',
         scales = list(y = list(labels = coef),
                       x = list(labels = names(coef))))
A base R dotchart might be good too:
dotchart(coef, pch = 19, xlab = 'value')
text(coef, seq(coef), labels = round(coef, 3), pos = 2)
