I want to add multiple vertical lines to a plot.
Normally you would specify abline(v=x-intercept) but my x-axis is in the form Jan-95 - Dec-09. How would I adapt the abline code to add a vertical line for example in Feb-95?
I have tried abline(v=as.Date("Jan-95")) and other variants of this piece of code.
Following this is it possible to add multiple vertical lines with one piece of code, for example Feb-95, Feb-97 and Jan-98?
An alternate solution could be to alter my plot, I have a column with month information and a column with the year information, how do I collaborate these to have a year month on the X-axis?
example[25:30,]
Year Month YRM TBC
25 1997 1 Jan-97 136
26 1997 2 Feb-97 157
27 1997 3 Mar-97 163
28 1997 4 Apr-97 152
29 1997 5 May-97 151
30 1997 6 Jun-97 170
The first note: your YRM column is probably a factor, not a datetime object, unless you converted it manually. I assume we do not want to do that and our plot is looking fine with YRM as a factor.
In that case
vline_month <- function(s) abline(v=which(s==levels(df$YRM)))
# keep original order of levels
df$YRM <- factor(df$YRM, levels=unique(df$YRM))
plot(df$YRM, df$TBC)
vline_month(c("Jan-97", "Apr-97"))
Disclaimer: this solution is a quick hack; it is neither universal nor scalable. For accurate representation of datetime objects and extensible tools for them, see packages zoo and xts.
I see two issues:
a) converting your data to a date/POSIX element, and
b) actually plotting vertical lines at specific rows.
For the first, create a proper date string then use strptime().
The second issue is resolved by converting the POSIX date to numeric using as.numeric().
# dates need Y-M-D
example$ymd <- paste(example$Year, '-', example$Month, '-01', sep='')
# convet to POSIX date
example$ymdPX <- strptime(example$ymd, format='%Y-%m-%d')
# may want to define tz otherwise system tz is used
# plot your data
plot(example$ymdPX, example$TBC, type='b')
# add vertical lines at first and last record
abline(v=as.numeric(example$ymdPX[1]), lwd=2, col='red')
abline(v=as.numeric(example$ymdPX[nrow(example)]), lwd=2, col='red')
Related
I have a dataframe like this
geo 2001 2002
Spain 21 23
Germany 34 50
Italy 57 89
France 19 13
As the names of 2nd an 3rd column are considered as number I'm not able to get a bar chart wth ggplot2. Is there any solution to set the column names to be considered as text?
data
pivot_dat <- read.table(text="geo 2001 2002
Spain 21 23
Germany 34 50
Italy 57 89
France 19 13",strin=F,h=T)
pivot_dat <- setNames(pivot_dat,c("geo","2001","2002"))
Here's how to do it :
library(ggplot2)
ggplot(pivot_dat, aes(x = geo, y = `2002`)) + geom_col()+ coord_flip()
by using ticks instead of quotes/double quotes you make sure you pass a name to the function and not a string.
If you use quotes, ggplot will convert this character value to a factor and recycle it, so all bars will have the same length of 1, and a label of value "2002".
Note 1 :
You might want to learn the difference between geom_col and geom_bar :
?ggplot2::geom_bar
In short geom_col is geom_bar with stat = "identity", which is what you want here since you want to show on your plot the raw values from your table.
Note 2:
aes_string can be used to give string instead of names but here it doesn't work as "2002" is evaluated as a number :
ggplot(pivot_dat, aes_string(x = "geo", y = "2002")) +
geom_col()+ coord_flip() # incorrect output
ggplot(pivot_dat, aes_string(x = "geo", y = "`2002`")) +
geom_col()+ coord_flip() # correct output
Without an example to see exactly what your problem is, and what you want, it is hard to give you a perfect answer. But here's the thing.
You can do a geom_bar with numeric data. There are 3 possible ways I see that you could have problems (but I may not be able to guess every way.
First, let's set up the r for plotting.
library(readr)
library(ggplot2)
test <- read_csv("geo,2001,2002
Spain,21,23
Germany,34,50
Italy,57,89
France,19,13")
Next, let's make the first mistake...incorrectly calling the column name. In the next example I will tell ggplot to make a bar of the number 2001. Not the column 2001! r has to guess whether we mean 2001 or whether we mean the object 2001. By default it always picks the number instead of the column.
ggplot(test) +
geom_bar(aes(x=2001))
Ok, that just gives you a bar at 2001...because you gave it a single number input instead of a column. Let's fix that. Use the right facing quotes `` to identify the column name 2001 instead of the number 2001.
ggplot(test) +
geom_bar(aes(x=`2001`))
This creates a perfectly workable bar chart. But maybe you don't want the spaces? That's the only possible reason you would use text instead of a number. But you want text so I'm going to show you how to use as.factor to do something similar (and more powerful).
ggplot(test) +
geom_bar(aes(x=as.factor(`2001`)))
Here I have a data frame which looks like following way,with the first column "POSIXct" and second "latitude"
> head(b)
sample_time latitude
3813442 2015-05-21 19:02:41 39.92770
3813483 2015-05-21 19:03:16 39.92770
3813485 2015-05-21 19:14:30 39.92433
3813515 2015-05-21 19:14:59 39.92469
3813550 2015-05-21 19:15:30 39.92520
3813585 2015-05-21 19:16:00 39.92585
Now,I want to plot latitude vs sample_time, with x axis representing 24 hours timestamp within a single day and group latitude by different days.
Any help will be appreciated!Many thanks.
First, you need to define "day", as opposed to the full time. Then you need to figure out what you mean by "group" ... let's just say you want to aggregate and take the daily mean. Third, you need to make the plot.
b$day <- round.Date(b[,"sample_time"], units="days")
b_agg <- aggregate(list(sample_time=b[,"sample_time"]), by=list(day=b[,"day"]), FUN=mean)
plot(b_agg)
Edit:
Just an additional thought, if you didn't want to aggregate, you could skip the second step, and change the third to plot(b[,"day"], b[,"latitude"]. Alternatively, you may even want something like boxplot(latitude~day, data=b).
I've seen several questions about the order of x axis marks but still none of them could solve my problem.
I'm trying to do a density plot which shows the distribution of people by percentile within each score given like this
library(dplyr); library(ggplot2); library(ggtheme)
ggplot(KA,aes(x=percentile,group=kscore,color=kscore))+
xlab('Percentil')+ ylab('Frecuencia')+ theme_tufte()+ ggtitle("Prospectos")+
scale_color_brewer(palette = "Greens")+geom_density(size=3)
but the x axis mark gets ordered like 1,10,100,11,12,..,2,20,21,..,99 instead of just 1,2,3,..,100 which is my desired output
I fear this affects the whole plot not just the labels
I'll turn my comment to an answer so this can be marked resolved:
Your x variable is (almost certainly) a factor. You probably want it to be numeric.
KA$percentile = as.numeric(as.character(KA$percentile))
When you're seeing weird stuff, it's good to check on your data. Running str(KA) is a good way to see what's there. If you just want to see classes, sapply(KA, class) is a nice summary.
And it's a common R quirk that if you're converting from factor to numeric, go by way of character or you risk ending up with just the level numbers:
year_fac = factor(1998:2002)
as.numeric(year_fac) # not so good
# [1] 1 2 3 4 5
as.numeric(as.character(year_fac)) # what you want
# [1] 1998 1999 2000 2001 2002
I have a data frame that looks like this (though thousands of times larger).
df<-data.frame(sample(1:100,10,replace=F),sample(1:100,10,replace=F),runif(10,0,1),runif(10,0,1),runif(10,0,1), rep(c("none","summer","winter","sping","allyear"),2))
names(df)<-c("Mother","ID","Wavelength1","Wavelength2","Wavelength3","WaterTreatment")
df
Mother ID Wavelength1 Wavelength2 Wavelength3 WaterTreatment
1 2 34 0.9143670 0.03077356 0.82859497 none
2 24 75 0.6173382 0.05958151 0.66552338 summer
3 62 77 0.2655572 0.63731302 0.30267893 winter
4 30 98 0.9823510 0.45690437 0.40818031 sping
5 4 11 0.7503750 0.93737900 0.24909228 allyear
6 55 76 0.6451885 0.60138475 0.86044856 none
7 97 21 0.5711019 0.99732068 0.04706894 summer
8 87 14 0.7699293 0.81617911 0.18940531 winter
9 92 30 0.5855559 0.70152698 0.73375917 sping
10 93 44 0.1040359 0.85259166 0.37882469 allyear
I want to plot wavelength values on the y axis, and wavelength on the x. I have two ways of doing this:
First method which works, but uses base plot and requires more code than should be necessary:
colors=c("red","blue","green","orange","yellow")
plot(0,0,xlim=c(1,3),ylim=c(0,1),type="l")
for (i in 1:10) {
if (df$WaterTreatment[i]=="none"){
a<-1
} else if (df$WaterTreatment[i]=="allyear") {
a<-2
}else if (df$WaterTreatment[i]=="summer") {
a<-3
}else if (df$WaterTreatment[i]=="winter") {
a<-4
}else if (df$WaterTreatment[i]=="spring") {
a<-5
}
lines(seq(1,3,1),df[i,3:5],type="l",col=colors[a])
}
Second method: I attempt to melt the data to put it in long form, then use ggplot2. The plot it produces is not correct because there is a line for each water treatment, rather than a line for each "Mother" "ID" (the unique identifier, what were the rows in the original data frame).
require(reshape2)
require(data.table)
df_m<-melt(df,id.var=c("Mother","ID","WaterTreatment"))
df_m$variable<-as.numeric(df_m$variable) #sets wavelengths to numeric
qplot(x=df_m$variable,y=df_m$value,data=df_m,color=df_m$WaterTreatment,geom = 'line')
There is probably something simple I'm missing about ggplot2 that would fix the plotting of the lines. I'm a newbie with ggplot, but am working to get more familiar with it and would like to use it in this application.
But more broadly, is there an efficient way to plot this type of wide form data in ggplot2? The time it takes to transform/melt the data is enormous and I'm wondering if it is worth it, or if there is some kind of work around that can eliminate the redundant cells created when melting.
Thanks for your help, if you need more clarity on this question please let me know and I can edit.
I'd like to point out that you are basically re-inventing an existing base plotting function, namely matplot. This could replace your plot and for-loop:
matplot(1:3, t( df[ ,3:5] ), type="l",col=colors[ as.numeric(df$WaterTreatment)] )
With that in mind you might want to search SO for: [r] matplot ggplot2, as I did, and see if this see if this or any of the other hits are effective.
It looks like you want a separate line for each ID, but you want the lines colored based on the value of WaterTreatment. If so, you can do it like this in ggplot:
ggplot(df_m, aes(x=variable, y=value, group=ID, colour=WaterTreatment)) +
geom_line() + geom_point()
You can also use faceting to make it easier to see the different levels of WaterTreatment
ggplot(df_m, aes(x=variable, y=value, group=ID, colour=WaterTreatment)) +
geom_line() + geom_point() +
facet_grid(WaterTreatment ~ .)
To answer your general question: ggplot is set up to work most easily and powerfully with a "long" (i.e., melted) data frame. I guess you could work with a "wide" data frame and plot separate layers for each combination of factors you want to plot. But that would be a lot of extra work compared to a single melt command to get your data into the right format.
I am trying to plot histograms with long term (several years) mean precipitation (pp) for each day of the month from a series of files. Each file has data collected from a different place (and has a different code). Each of my files looks like this:
X code year month day pp
1 2867 1945 1 1 0.0
2 2867 1945 1 2 0.0
...
And I am using the following code:
files <- list.files(pattern=".csv")
par(mfrow=c(4,6))
for (i in 1:24) {
obs <- read.table(files[i],sep=",", header=TRUE)
media.dia <- ddply(obs, .(day), summarise, daily.mean<-mean(pp))
codigo <- unique(obs$code)
hist(daily.mean, main=c("hist per day of month", codigo))
}
I get 24 histograms with 24 different codes in the title, but instead of 24 DIFFERENT histograms from 24 different locations, I get the same histogram 24 times (with 24 different titles). Can anybody tell me why? Thanks!
There are at least two errors I can see in your code.
There is an error in your ddply statement.
You are passing the wrong variable to hist, thus plotting something that may or may not exist depending on previous session actions.
The problem in your ddply statement is that you are doing an invalid assign (using <- ). Fix this by using =:
media.dia<- ddply(obs, .(day),summarise, daily.mean = mean(pp))
Then edit your hist statement:
hist(media.dia$daily.mean,main=c("hist per day of month",codigo))
I suspect the problem is that you are not passing the correct parameter to hist. The reason that your code actually produces a plot at all, is because in some previous step in your session you must have created a variable called daily.mean (as Brandon points out in the comment.)
I think the daily.mean calculated in the ddply function is assigned in a separate environment, and does not exist in an environment hist can see.
Try daily.mean<<-mean(pp)