How to manage parameters with different lengths in R

I have two data sets (80211 and mine) as follows; each file has one column of data.
80211
1
2
3
4
5
mine
1
2
3
I need to read these two files and plot the CDF with ggplot2, but it complains that the parameters have different lengths.
The code is here.
library(ggplot2)
data1 <- read.csv('80211')
data2 <- read.csv('mine')
df <- data.frame(x = c(data1, data2), ggg=factor(rep(1:2, c(5,3))))
ggplot(df, aes(x, colour = ggg)) +
stat_ecdf()+
scale_colour_hue(name="my legend", labels=c('80211','mine'))

#Here this seems to work:
require(ggplot2)
data1 <- 1:5
data2 <- 1:3
df <- data.frame(x = c(data1, data2), ggg=factor(rep(1:2, c(5,3))))
ggplot(df, aes(x, colour = ggg)) +
stat_ecdf()+
scale_colour_hue(name="my legend", labels=c('80211','mine'))
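The file-based version most likely fails because read.csv() returns a data frame rather than a vector (and, by default, treats the first line as a header), so c(data1, data2) builds a list instead of one numeric vector. A minimal sketch, assuming each file holds one number per line with no header; the temporary files below are stand-ins for '80211' and 'mine':

```r
library(ggplot2)

# Stand-in files for '80211' and 'mine' (assumption: one value per line, no header)
f1 <- tempfile(); writeLines(as.character(1:5), f1)
f2 <- tempfile(); writeLines(as.character(1:3), f2)

# header = FALSE keeps the first value; [[1]] extracts the column as a vector
data1 <- read.csv(f1, header = FALSE)[[1]]
data2 <- read.csv(f2, header = FALSE)[[1]]

# Build the grouping factor from the actual lengths instead of hard-coding 5 and 3
df <- data.frame(
  x   = c(data1, data2),
  ggg = factor(rep(c("80211", "mine"), c(length(data1), length(data2))))
)

p <- ggplot(df, aes(x, colour = ggg)) +
  stat_ecdf() +
  scale_colour_hue(name = "my legend")
p
```

Because the factor levels already carry the names, the labels argument is no longer needed, and the code keeps working if the files change length.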

Related

ggplot: adding a label to a geom_line aes_string

I have a for loop plotting 3 geom_lines, how do I add a label/legend so they won't all be 3 indiscernible black lines?
methods.list <- list(rwf,snaive,meanf)
cv.list <- lapply(methods.list, function(method) {
taylor%>% tsCV(forecastfunction = method, h=48)
})
gg <- ggplot(NULL, aes(x))
for (i in seq(1,3)){
gg <- gg + geom_line(aes_string( y=sqrt(colMeans(cv.list[[i]]^2, na.rm=TRUE))))
}
gg + guides(colour=guide_legend(title="Forecast"))
If I don't use a loop, I can use aes instead of that horrible aes_string and then everything works, but I have to write the same code 3 times and replace the loop with this:
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[1]]^2, na.rm=TRUE)), colour=names(cv.list)[1]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[2]]^2, na.rm=TRUE)), colour=names(cv.list)[2]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[3]]^2, na.rm=TRUE)), colour=names(cv.list)[3]))
and then there are nice automatic colors and legend. What am I missing? Why is r being so noob-unfriendly?
The example is not reproducible (there is no data!), but it seems you have a list cv.list containing multiple data.frames, and you want to plot a summary statistic of each against a common variable stored in x.
The simplest method is simply to create a data.frame and plot using the data.frame.
#Create 3 data.frames with data (forecast?)
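df <- lapply(1:3, function(group){
  # index with the function argument (group), not an undefined i
  summ_stat <- sqrt(colMeans(cv.list[[group]]^2, na.rm=TRUE))
  # a factor gives one discrete colour per group
  data.frame(summ_stat, group = factor(group), x = x)
})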
#bind the data.frames into a single data.frame
df <- do.call(rbind, df)
#Create the plot
ggplot(data = df, aes(x = x, y = summ_stat, colour = group)) +
geom_line() +
labs(colour = "Forecast")
Note the labs() call: it sets the legend title for colour, which is one of the aesthetics mapped in aes().
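Since the question has no data, here is a self-contained sketch of the same approach. The error matrices are simulated with rnorm(), and the names rwf, snaive and meanf are used only as labels; both are assumptions standing in for the real tsCV() output.

```r
library(ggplot2)

set.seed(42)
h <- 48  # forecast horizon, as in the question

# Simulated stand-in for cv.list: one error matrix per method
# (rows = forecast origins, columns = horizons)
cv.list <- list(
  rwf    = matrix(rnorm(100 * h, sd = 1.0), ncol = h),
  snaive = matrix(rnorm(100 * h, sd = 1.5), ncol = h),
  meanf  = matrix(rnorm(100 * h, sd = 2.0), ncol = h)
)

# One data frame per method: RMSE at each horizon, labelled with the method name
rmse <- lapply(names(cv.list), function(method) {
  data.frame(
    x         = seq_len(h),
    summ_stat = sqrt(colMeans(cv.list[[method]]^2, na.rm = TRUE)),
    group     = method
  )
})
rmse <- do.call(rbind, rmse)

p <- ggplot(rmse, aes(x = x, y = summ_stat, colour = group)) +
  geom_line() +
  labs(colour = "Forecast")
p
```

Iterating over names(cv.list) rather than 1:3 means the legend shows the method names directly.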

Drawing a multiple line ggplot figure

I am working on a figure which should contain 3 different lines on the same graph. The data frame I am working on is the following:
I would like to use ind (my data point) on the x axis and then draw 3 different lines using the data from the columns med, b and c.
I only managed to draw one line.
Could you please help me? The code I am using now is
ggplot(data=f, aes(x=ind, y=med, group=1)) +
geom_line(aes())+ geom_line(colour = "darkGrey", size = 3) +
theme_bw() +
theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank())
The key is to gather the columns in question into a single new variable. This happens in the gather() step in the code below. The rest is pretty much boilerplate ggplot2.
library(ggplot2)
library(tidyr)
xy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10),
ind = 1:10)
# we gather a, b and c into a new key/value pair so that each becomes its own line
xy <- gather(xy, key = myvariable, value = myvalue, a, b, c)
ggplot(xy, aes(x = ind, y = myvalue, color = myvariable)) +
theme_bw() +
geom_line()
With melt and ggplot:
library(ggplot2)
library(reshape2)  # provides melt()
df$ind <- 1:nrow(df)
head(df)
           a          b       med         c ind
1  -87.21893  -84.72439 -75.78069 -70.87261   1
2 -107.29747  -70.38214 -84.96422 -73.87297   2
3 -106.13149 -105.12869 -75.09039 -62.61283   3
4  -93.66255  -97.55444 -85.01982 -56.49110   4
5  -88.73919  -95.80307 -77.11830 -47.72991   5
6  -86.27068  -83.24604 -86.86626 -91.32508   6
df <- melt(df, id='ind')
ggplot(df, aes(ind, value, group=variable, col=variable)) + geom_line(lwd=2)
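In current versions of tidyr, gather() is superseded by pivot_longer(). A sketch of the same reshaping with the column names from the question (ind, med, b, c); the data frame f here is filled with random numbers and is only illustrative:

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Illustrative stand-in for the question's data frame f
f <- data.frame(ind = 1:10, med = rnorm(10), b = rnorm(10), c = rnorm(10))

# Stack med, b and c into one value column, with a key column naming the source
f_long <- pivot_longer(f, cols = c("med", "b", "c"),
                       names_to = "variable", values_to = "value")

p <- ggplot(f_long, aes(x = ind, y = value, colour = variable)) +
  geom_line() +
  theme_bw()
p
```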

Plot two functions in ggplot2 with different x range limits

I have plotted linear functions with ggplot as follows:
ggplot(data.frame(x=c(0,320)), aes(x)) +
stat_function(fun=function(x)60.762126*x-549.98, geom="line", colour="black") +
stat_function(fun=function(x)-0.431181333*x+2.378735e+02, geom="line", colour="black")+
ylim(-600,600)
However, I want the 1st function to be plotted for x ranging from 0 to 12 and the 2nd function to be plotted for x ranging from 12 to max(x).
Does anyone know how to do it?
It's easiest to just calculate the data you need outside of the ggplot call first.
fun1 <- function(x) 60.762126 * x - 549.98
dat1 <- data.frame(x = c(0, 12), y = NA)
dat1$y <- fun1(dat1$x)
fun2 <- function(x) -0.431181333 * x + 2.378735e+02
dat2 <- data.frame(x = c(12, 320), y = NA)
dat2$y <- fun2(dat2$x)
ggplot(mapping = aes(x, y)) +
geom_line(data = dat1) +
geom_line(data = dat2)
Or you can join the data for the lines first (as suggested by @Heroka), resulting in an identical plot:
dat.com <- rbind(dat1, dat2)
dat.com$gr <- rep(1:2, c(nrow(dat1), nrow(dat2)))
ggplot(dat.com, aes(x, y, group = gr)) +
geom_line()
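If you would rather keep stat_function(), it also accepts an xlim argument that restricts the range over which each function is evaluated. A sketch using the two functions from the question:

```r
library(ggplot2)

fun1 <- function(x) 60.762126 * x - 549.98
fun2 <- function(x) -0.431181333 * x + 2.378735e+02

# Each layer evaluates its function only within its own xlim
p <- ggplot(data.frame(x = c(0, 320)), aes(x)) +
  stat_function(fun = fun1, xlim = c(0, 12)) +
  stat_function(fun = fun2, xlim = c(12, 320)) +
  ylim(-600, 600)
p
```

Note that the two lines do not meet at x = 12, since the functions give different values there.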

How to plot three point lines using ggplot2 instead of the default plot in R

I have three matrices and I want to plot them using ggplot2. I have the data below.
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
w <- matrix(W[4,])
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
I want to combine the three plots into one good-looking ggplot2 plot.
Moreover, I want to make the points with different values have different colors.
I'm not quite sure what you're after, so here's a guess.
Your data...
max <- c(175523.9, 33026.97, 21823.36, 12607.78, 9577.648, 9474.148, 4553.296, 3876.221, 2646.405, 2295.504)
min <- c(175523.9, 33026.97, 13098.45, 5246.146, 3251.847, 2282.869, 1695.64, 1204.969, 852.1595, 653.7845)
w <- c(175523.947, 33026.971, 21823.364, 5246.146, 3354.839, 2767.610, 2748.689, 1593.822, 1101.469, 1850.013)
Slight modification to your base plot code to make it work...
plot(1:10,max,type='b',xlab='Number',ylab='groups',col=3)
points(1:10,min,type='b', col=2)
points(1:10,w,type='b',col=1)
Is this what you meant?
If you want to reproduce this with ggplot2, you might do something like this...
# ggplot likes a long table, rather than a wide one, so reshape the data, and add the 'time' variable explicitly (ie. my_time = 1:10)
require(reshape2)
df <- melt(data.frame(max, min, w, my_time = 1:10), id.var = 'my_time')
# now plot, with some minor customisations...
require(ggplot2); require(scales)
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
UPDATE after the question was edited and the example data changed, here's an edit to suit the new example data:
Here's your example data (there's scope for simplification and speed gains here, but that's another question):
library(cluster)
require(ggplot2)
require(scales)
require(reshape2)
data(ruspini)
x <- as.matrix(ruspini[-1])
wss <- NULL
W=matrix(data=NA,ncol=10,nrow=100)
for(j in 1:100){
k=10
for(i in 1: k){
wss[i]=kmeans(x,i)$tot.withinss
}
W[j,]=as.matrix(wss)
}
max_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
max_Wmk[,i]=max(W[,i],na.rm=TRUE)
}
min_Wmk <- matrix(data=NA, nrow=1,ncol=10)
for(i in 1:10){
min_Wmk[,i]=min(W[,i],na.rm=TRUE)
}
w <- matrix(W[4,])
Here's what you need to do to make the three objects into vectors so you can make the data frame as expected:
max_Wmk <- as.numeric(max_Wmk)
min_Wmk <- as.numeric(min_Wmk)
w <- as.numeric(w)
Now reshape and plot as before...
df <- melt(data.frame(max_Wmk, min_Wmk, w, my_time = 1:10), id.var = 'my_time')
ggplot(df, aes(colour = variable, x = my_time, y = value)) +
geom_point(size = 3) +
geom_line() +
scale_y_continuous(labels = comma) +
theme_minimal()
And here's the result:

ggplot2 Scatter Plot Labels

I'm trying to use ggplot2 to create and label a scatterplot. The variables that I am plotting are both scaled such that the horizontal and the vertical axes are in units of standard deviation (1, 2, 3, 4, etc. from the mean). What I would like to do is label ONLY those elements that are beyond a certain number of standard deviations from the mean. Ideally, this labeling would be based on another column of data.
Is there a way to do this?
I've looked through the online manual, but I haven't been able to find anything about defining labels for plotted data.
Help is appreciated!
Thanks!
BEB
Use subsetting:
library(ggplot2)
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- letters[1:10]
ggplot(data=x, aes(a, b, label=lab)) +
geom_point() +
geom_text(data = subset(x, abs(b) > 0.2), vjust=0)
The labeling can be done in the following way:
library("ggplot2")
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- rep("", 10) # create empty labels
x$lab[c(1,3,4,5)] <- LETTERS[1:4] # some labels
ggplot(data=x, aes(x=a, y=b, label=lab)) + geom_point() + geom_text(vjust=0)
Subsetting outside of the ggplot function:
library(ggplot2)
set.seed(1)
x <- data.frame(a = 1:10, b = rnorm(10))
x$lab <- letters[1:10]
x$lab[!(abs(x$b) > 0.5)] <- NA
ggplot(data = x, aes(a, b, label = lab)) +
geom_point() +
geom_text(vjust = 0)
Using qplot:
qplot(a, b, data = x, label = lab, geom = c('point','text'))
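Applied to the question as asked, a sketch might scale both variables to standard-deviation units and label only the points that lie more than a chosen number of standard deviations from the mean on either axis. The data, the column names (name, u, v) and the threshold of 1.5 are assumptions for illustration:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 'name' is the separate column that supplies the labels
d <- data.frame(name = paste0("obs", 1:50), u = rnorm(50), v = rnorm(50))

# Express both variables in units of standard deviation from the mean
d$u_sd <- as.numeric(scale(d$u))
d$v_sd <- as.numeric(scale(d$v))

threshold <- 1.5  # hypothetical cut-off, in standard deviations
outliers <- subset(d, abs(u_sd) > threshold | abs(v_sd) > threshold)

# All points are drawn; only the outliers receive a text label
p <- ggplot(d, aes(u_sd, v_sd)) +
  geom_point() +
  geom_text(data = outliers, aes(label = name), vjust = -0.5)
p
```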
