Sorting of categorical variables in ggplot - r

Good day, I wish to produce a graphic using ggplot2, but not using its default sorting of the categorical variable (alphabetically, in script: letters), but using the associated value of a continuous variable (in script: number) .
Here is an example script:
library(ggplot2)
trial<-data.frame(letters=letters, numbers=runif(n=26,min=1,max=26))
trial<-trial[sample(1:26,26),]
trial.plot<-qplot(x=numbers, y=letters, data=trial)
trial.plot
trial<-trial[order(trial$numbers),]
trial.plot<-qplot(x=numbers, y=letters, data=trial)
trial.plot
trial.plot+stat_sort(variable=numbers)
The last line does not work.

I'm pretty sure stat_sort does not exist, so it's not surprising that it doesn't work as you think it should. Luckily, there's the reorder() function which reorders the level of a categorical variable depending on the values of a second variable. I think this should do what you want:
trial.plot <- qplot( x = numbers, y = reorder(letters, numbers), data = trial)
trial.plot

If you could be more specific about how you want it to look, I think the community could make improvements on my answer, regardless is this what you are looking for:
qplot(numbers, reorder(letters, numbers), data=trial)

Related

Aggregate data and plot in a bar plot in R

i have a data set with parameter_variations and a score. This score has four scales: like, anth, comf and ueq.
The bargraph.CI function accepts raw data, not aggregated data. So try the following:
bargraph.CI(parameter_variants, response=score, group=scale, data=dat,
main="likeability", legend=TRUE)
This should give you one "two-way" plot. If you don't like the look of it, there are many arguments that make superficial adjustments. Check the help page for details.
To obtain separate plots for each of the four scales, I think you can do something like this:
library(dplyr)
dat %>%
filter(scale=="like") %>% # change the value here.
bargraph.CI(parameter_variants, response=score, data=., main="likeability")
Base R solution:
with(subset(dat, subset=scale=="like"),
bargraph.CI(parameter_variants, response=score, main="likeability")
)

Plotting in ggplot after converting to data.frame with a single column?

I'm trying to convert some simple data into a form I thought ggplot2 would accept.
I snag some simple stock data and now I just want to plot, later I want to plot say a 10-day moving average or a 30-day historical volatility period to go with it, which is I'm using ggplot.
I thought it would work something like this line of pseudocode
ggplot(maindata)+geom_line(moving average)+geom_line(30dayvol)
library(quantmod)
library(ggplot2)
start = as.Date("2008-01-01")
end = as.Date("2019-02-13")
start
tickers = c("AMD")
getSymbols(tickers, src = 'yahoo', from = start, to = end)
closing_prices = as.data.frame(AMD$AMD.Close)
ggplot(closing_prices, aes(y='AMD.Close'))
But I can't even get this to work. The problem of course appears to be that I don't have an x-axis. How do I tell ggplot to use the index column as a. Can this not work? Do I have to create a new "date" or "day" column?
This line for instance using the Regular R plot function works just fine
plot.ts(closing_prices)
This works without requiring me to enter a hard x-axis, and produces a graph, however I haven't figured out how to layer other lines onto this same graph, evidently ggplot is better so I tried that.
Any advice?
as.Date(rownames(df)) will get you the rownames and parse it as a date. You also need to specify a geom_line()
library(quantmod)
library(ggplot2)
start = as.Date("2008-01-01")
end = as.Date("2019-02-13")
start
tickers = c("AMD")
getSymbols(tickers, src = 'yahoo', from = start, to = end)
closing_prices = as.data.frame(AMD$AMD.Close)
ggplot(closing_prices, aes(x = as.Date(rownames(closing_prices)),y=AMD.Close))+
geom_line()
Edit
Thought it would be easier to explain in the answers as opposed to the comments.
ggplot and dplyr have two methods of evaluation. Standard and non standard evaluation. Which is why in ggplot you have both aes and aes_(). The former being non standard evaluation and the later being standard evaluation. In addition there is also aes_string() which is also standard evaluation.
How are these different?
Its easy to see when we explore all the methods,
#Cleaner to read, define every operation in one step
#Non Standard Evaluation
closing_prices%>%
mutate(dates = as.Date(rownames(.)))%>%
ggplot()+
geom_line(aes(x = dates,y = AMD.Close))
#Standard Evaluation
closing_prices%>%
mutate(dates = as.Date(rownames(.)))%>%
ggplot()+
geom_line(aes_(x = quote(dates),y = quote(AMD.Close)))
closing_prices%>%
mutate(dates = as.Date(rownames(.)))%>%
ggplot()+
geom_line(aes_string(x = "dates",y = "AMD.Close"))
Why are there so many different ways of doing the same thing? In most cases its okay to use non standard evaluation. However if we want to wrap these plots in functions and dynamically change the column to plot based on function parametrs passed as strings. It is helpful to plot using the aes_ and aes_string.

R: Problems while plotting sampled values from a curve

I am trying to simulate a signal in order to apply some methods of non-linear fittings, but I have some problems when plotting it.
x<-sample(seq(0,1,length.out = 1000),200)
y<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x+rnorm(200,0,0.5)
s<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x
plot(x,y)
lines(x,s,col="red")
The idea I want to have 200 observations uniformly sampled with an additive white noise term, and the I would like to plot this "perturbed" signal together with the original signal. (y and s respectively).
The fact is that if I use the code that I wrote I obtain as result something like:
Probably is such a simple thing, but I'm kinda stuck with this.
Any hint or suggestion will be greatly appreciated.
Lines are plotted sequentially, and you decided to randomly draw your X values, so x values sitting next to each other in x are not next to each other on the axis - hence the mess. Just sort it:
x<-sort(sample(seq(0,1,length.out = 1000),200))
y<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x+rnorm(200,0,0.5)
s<-2*sin(4*pi*x)-6*abs(x-0.4)^(0.3)+2*exp(-30*(4*x-2)^2)+8*x
plot(x,y)
lines(x,s,col="red")
Another way to do this on the fly mentioned by mickey is:
ord = order(x)
lines(x[ord], s[ord], col = 'red')
You need to reorder the x observations order in ascending order, you can do that by storing everything in a dataframe object and then ordering it:
x<-sample(seq(0,1,length.out = 1000),200)
df_p= data.frame(x)
df_p$y<-2*sin(4*pi*df_p$x)-6*abs(df_p$x-0.4)^(0.3)+2*exp(-30*(4*df_p$x-2)^2)+8*df_p$x+rnorm(200,0,0.5)
df_p$s<-2*sin(4*pi*df_p$x)-6*abs(df_p$x-0.4)^(0.3)+2*exp(-30*(4*df_p$x-2)^2)+8*df_p$x
df_p = df_p[order(df_p$x),]
plot(df_p$x,df_p$y)
lines(df_p$x, df_p$s,col="red")
Also if you want to avoid this step you can use the ggplot2 library:
p <- ggplot(df_p) + geom_point(aes(x = x,y= y)) + geom_line(aes(x=x,y=s,color='red'))
plot(p)

R : Bad graphic of ordered boxplot according to median

Here is what I am trying to do : I have a data.frame (data) of 160 rows with 2 variables (fact (8 groups) and response) and I want to do a boxplot of response ~ fact, ordered in increasing order of the medians.
Code :
data <- read.table("box.txt",header=T)
attach(data)
index <- order(tapply(response,fact,median))
ordered <- factor(rep(index,rep(20,8)))
boxplot(response~ordered,notch=T,names=as.character(index),xlab="treatments",ylab="response")
but on the graphic the boxes are badly plotted (not in the right order and with "false" Min, Max, etc...).
I'm using RStudio with R 3.0.2 on Windows 7.
Any clue about what does that mean?
One reproducible and seemingly correct answer would be :
set.seed(1)
data <- data.frame(response=10*rnorm(160), fact=factor(rep(1:8), labels=letters[1:8]))
data$fact <- reorder(data$fact, data$response, median)
boxplot(response~fact, data=data, notch=TRUE, xlab="treatments", ylab="response")
Names on the ticks of the x axis are correct, without further ado.
No idea why it looks 'bad', but the order is wrong because you use order instead of rank to find the index. For the other issues you probably have to make a reproducible example.
The reproducible example is as follows, with two boxplots to compare. In my case the plot (possibly) looks bad because of the devil's ears. Regarding the OP's question, I interpret his phrasing as bad referring to the fact that using order() instead of rank() resulted in other mishap as well (although I wouldn't know why).
data <- data.frame(response=rnorm(160), fact=factor(rep(1:8), labels=letters[1:8]))
boxplot(response~fact, data=data, notch=TRUE, xlab="treatments", ylab="response")
data$ordered <- rank(tapply(data$response, data$fact, median))
boxplot(response~ordered, data=data, notch=TRUE, xlab="treatments", ylab="response")

How to produce leverage stats?

I know how to produce the plots using leveragePlot(), but I can not find a way to produce a statistic for leverage for each observation like in megastat output.
I think you're looking for the hat values.
Use hatvalues(fit). The rule of thumb is to examine any observations 2-3 times greater than the average hat value. I don't know of a specific function or package off the top of my head that provides this info in a nice data frame but doing it yourself is fairly straight forward. Here's an example:
fit <- lm(hp ~ cyl + mpg, data=mtcars) #a fake model
hatvalues(fit)
hv <- as.data.frame(hatvalues(fit))
mn <-mean(hatvalues(fit))
hv$warn <- ifelse(hv[, 'hatvalues(fit)']>3*mn, 'x3',
ifelse(hv[, 'hatvalues(fit)']>2*mn, 'x3', '-' ))
hv
For larger data sets you could use subset and/or orderto look at just certain values ranges for the hat values:
subset(hv, warn=="x3")
subset(hv, warn%in%c("x2", "x3"))
hv[order(hv['hatvalues(fit)']), ]
I actually came across a nice plot function that does this in the book R in Action but as this is a copyrighted book I will not display Kabacoff's intellectual property. But that plot would work even better for mid sized data sets.
Here is a decent hat plot though that you may also want to investigate:
plot(hatvalues(fit), type = "h")

Resources