I want to plot my .csv data (which I named p) in R using ggplot2, but I am having difficulties.
Time d d d d c m m m m c c c ... (top row of data p)
There are 14 rows and 304 columns. The first column is Time and the rest are d, c, m and so on. I want to plot Time on the x axis against the other 303 columns on the y axis in a single plot window, with the 303 lines distinguished by color.
The letters in the top row stand for my 3 forest groups: d (deciduous), c (coniferous) and m (mixed). So I want all lines labelled 'd' drawn in one color, all 'c' lines in another, and all 'm' lines in a third.
I found a way to do that using ggplot:
ggplot(p, aes(x = Time, group = 1)) +
geom_line(aes(y = d), colour="blue") +
geom_line(aes(y = c), colour = "red") +
geom_line(aes(y = m), colour = "green") +
ylab(label="NDVI") + xlab("Time")
But of the 303 columns, 117 are d, 77 are c and 109 are m.
What code should I use so that R plots all the columns, giving the d, c and m columns different colors?
Please help, I have been stuck on this for days.
Here is a suggestion that I hope will suit your needs:
require(ggplot2)
require(reshape2)
time <- 1:14 # your 14 rows
# here data with only 3 forest types, consider using yours instead !
data.test <- data.frame(time=time, d=runif(14, max=1)*time, c=runif(14, max=2)*time, m=runif(14, max=3)*time)
# reshape your data to suit ggplot
data.test <- melt(data.test, id.vars="time", variable.name="forest", value.name="y")
# add a numeric version of the factor forest (for gradient color)
data.test$numeric.forest <- rep(1:3, each=14) # replace 1:3 by 1:303 for your 303 series
# first plot: each forest type gets its own color ... probably not readable at your dimensions (303 lines)
ggplot(data.test) + geom_line(aes(x=time, y=y, group=forest, color=forest))
# if you have 303 forest series, you might consider using a color gradient,
# but you need to identify a logical order in your forests for the gradient to be informative ...
ggplot(data.test) + geom_line(aes(x=time, y=y, group=forest, color=numeric.forest), alpha=.9)
# consider playing with alpha (transparency) to keep your lines readable
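The question asks for one color per forest type rather than one per column. A minimal sketch of that, assuming the columns keep their d/c/m header letters (possibly with suffixes such as d.1 added by read.csv()); the data below is simulated and the names types, series_cols, long and forest_plot are made up for illustration:

```r
library(ggplot2)

set.seed(42)

# Hypothetical stand-in for `p`: 14 time steps and a handful of series per
# forest type (the real data has 117 d, 77 c and 109 m columns).
types <- rep(c("d", "c", "m"), times = c(4, 3, 5))
p <- data.frame(Time = 1:14, matrix(runif(14 * length(types)), nrow = 14))
names(p) <- c("Time", types)          # duplicated names, as in the CSV header

# Build the long table by column position, so duplicated names do not matter.
series_cols <- seq_along(p)[-1]
long <- data.frame(
  Time   = rep(p$Time, times = length(series_cols)),
  series = rep(series_cols, each = nrow(p)),            # one id per column
  # read.csv() may rename duplicates to d.1, d.2, ...; strip such suffixes
  forest = rep(sub("\\.\\d+$", "", names(p)[series_cols]), each = nrow(p)),
  NDVI   = unlist(p[series_cols], use.names = FALSE)
)

# group = series keeps every column as its own line; colour = forest gives
# all lines of one forest type the same color.
forest_plot <- ggplot(long, aes(x = Time, y = NDVI, group = series, colour = forest)) +
  geom_line(alpha = 0.5) +
  scale_colour_manual(values = c(d = "blue", c = "red", m = "green")) +
  ylab("NDVI") + xlab("Time")
```

Printing forest_plot draws the figure. The split between group and colour is the design choice here: it yields three legend entries instead of 303.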
I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are aggregated and end up too high - out of the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr) # attaches the %>% pipe used below
plotdata <- mydat %>%
dplyr::mutate(Crop = factor(Crop)) %>%
tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
tidyr::separate(period, c("Type","Event")) %>%
tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle that we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y but geom_rect prefers numeric values. Since Crop is a factor now, we use the numeric codes of the factor to create the ymin and ymax positions. Then we need to relabel the y axis with the names of the factor levels.
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
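One thing to watch with the rectangle approach: some periods in the sample data run past the end of the year (for example Begin 245, End 1), so xmin ends up larger than xmax. A minimal sketch of one way to handle that, assuming a table with Begin and End columns like plotdata; split_wrapping is a hypothetical helper, not part of any package, and the 365-day year is an assumption:

```r
# Hypothetical helper: split intervals that wrap past the end of the year
# into one piece before and one piece after New Year.
split_wrapping <- function(df, year_end = 365) {
  wraps <- df$Begin > df$End
  before_new_year <- df[wraps, ]
  before_new_year$End <- rep(year_end, nrow(before_new_year))   # Begin .. year end
  after_new_year <- df[wraps, ]
  after_new_year$Begin <- rep(1, nrow(after_new_year))          # day 1 .. End
  rbind(df[!wraps, ], before_new_year, after_new_year)
}

# Made-up rows in the same shape as the reshaped crop data.
example <- data.frame(
  Crop  = c("Soybean", "Soybean"),
  Type  = c("Planting", "Harvest"),
  Begin = c(245, 1),
  End   = c(1, 122)
)
pieces <- split_wrapping(example)
```

Every row of pieces then has Begin no larger than End, so each one can be drawn as an ordinary rectangle.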
I'm currently trying to develop a surface plot that examines the results of the below data frame. I want to plot the increasing values of noise on the x-axis and the increasing values of mu on the y-axis, with the point estimate values on the z-axis. After looking at ggplot2 and ggplotly, it's not clear how I would plot each of these columns in surface or 3D plot.
df <- "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646"
Thanks in advance.
Here's one way using geom_tile(). First, you will want to get your data frame into more of a Tidy format, where the goal is to have columns:
mu: nothing changes here
noise: need to combine your "noise0", "noise1", ... columns together, and
z: serves as the value of the noise and we will apply the fill= aesthetic using this column.
To do that, I'm using dplyr and tidyr's gather(), but there are other ways (melt() or pivot_longer() gets you there too). I'm also adding some code to pull out just the number portion of the "noise" column names and then converting that to an integer, so that both the x and y axes are numeric:
# assumes that df is your data as data.frame
library(dplyr)
library(tidyr)
df <- df %>% gather(key="noise", value="z", -mu)
df <- df %>% separate(col = "noise", into=c('x', "noise"), sep=5) %>% select(-x)
df$noise <- as.integer(df$noise)
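The same reshaping can be sketched with pivot_longer(), which the answer mentions as an alternative. This assumes a reasonably recent tidyr (names_transform is not available in old releases); the small wide table is made up and only mimics the layout of the question's data:

```r
library(tidyr)

# Small made-up table with the same layout as the question's data.
wide <- data.frame(mu = 1:3,
                   noise0 = c(1.0, 2.0, 3.0),
                   noise1 = c(0.9, 2.1, 3.0))

long <- pivot_longer(
  wide,
  cols = -mu,                                    # every column except mu
  names_to = "noise",
  names_prefix = "noise",                        # drop the text part of the name
  names_transform = list(noise = as.integer),    # "0", "1" -> 0L, 1L
  values_to = "z"
)
```

This folds the gather(), separate() and as.integer() steps into a single call.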
Here's an example of how you could plot it, but aesthetics are up to you. I decided to also include geom_text() to show the actual values of df$z so that we can see better what's going on. Also, I'm using rainbow because "it's pretty" - you may want to choose a more appropriate quantitative comparison scale from the RColorBrewer package.
ggplot(df, aes(x=noise, y=mu, fill=z)) + theme_bw() +
geom_tile() +
geom_text(aes(label=round(z, 2))) +
scale_fill_gradientn(colors = rainbow(5))
EDIT: To answer OP's follow-up: yes, you can also show this with plotly. Here's a direct translation:
library(plotly)
p <- plot_ly(
df, x= ~noise, y= ~mu, z= ~z,
type='mesh3d', intensity = ~z,
colors= colorRamp(rainbow(5))
)
p
Static image here:
A much more informative way to show this particular set of information is to look at how df$z deviates from df$mu, by creating df$delta_z and plotting that instead (you can also plot it via ggplot() + geom_tile() as above):
df$delta_z <- df$z - df$mu
p1 <- plot_ly(
df, x= ~noise, y= ~mu, z= ~delta_z,
type='mesh3d', intensity = ~delta_z,
colors= colorRamp(rainbow(5))
)
Giving you this (static image here):
ggplot accepts data in the long format, which means that you need to melt your dataset, for example with melt() from the reshape2 package:
library(reshape2)
dfLong = melt(df,
id.vars = "mu",
variable.name = "noise",
value.name = "meas")
The resulting column noise contains entries such as noise0, noise1, etc. You can extract the numbers and convert to a numeric column:
dfLong$noise = with(dfLong, as.numeric(gsub("noise", "", noise)))
This converts your data to:
mu noise meas
1 1 0 1.0000000
2 2 0 2.0000000
3 3 0 3.0000000
...
As per ggplot documentation:
ggplot2 can not draw true 3D surfaces, but you can use geom_contour(), geom_contour_filled(), and geom_tile() to visualise 3D surfaces in 2D.
So, for example:
ggplot(dfLong,
aes(x = noise,
y = mu,
fill = meas)) +
geom_tile() +
scale_fill_gradientn(colours = terrain.colors(10))
This produces a tile plot (heatmap) of meas over noise and mu.
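If a perspective view is wanted without leaving base R, persp() from the graphics package can draw one from a matrix. A rough sketch, assuming the wide layout from the question (one row per mu, one column per noise level); the values below are made up:

```r
# Small made-up table with the same layout as the question's data.
wide <- data.frame(mu = 1:4,
                   noise0 = c(1.0, 2.0, 3.0, 4.0),
                   noise1 = c(0.9, 2.1, 2.9, 4.0),
                   noise2 = c(1.2, 1.8, 3.1, 3.9))

z_matrix <- as.matrix(wide[, -1])        # rows follow mu, columns follow noise
noise_levels <- as.numeric(sub("noise", "", colnames(z_matrix)))

# persp() needs increasing x and y vectors, and z must have length(x) rows and
# length(y) columns, so the matrix is transposed to put noise on the x axis.
view <- persp(x = noise_levels, y = wide$mu, z = t(z_matrix),
              xlab = "noise", ylab = "mu", zlab = "estimate",
              theta = 40, phi = 25, ticktype = "detailed")
```

theta and phi set the viewing angles and are arbitrary choices here.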
I have two data frames z (1 million observations) and b (500k observations).
z= Tracer time treatment
15 0 S
20 0 S
25 0 X
04 0 X
55 15 S
16 15 S
15 15 X
20 15 X
b= Tracer time treatment
2 0 S
35 0 S
10 0 X
04 0 X
20 15 S
11 15 S
12 15 X
25 15 X
I'd like to create grouped boxplots using time as a factor and treatment as colour. Essentially I need to bind them together and then differentiate between them but not sure how. One way I tried was using:
zz <- factor(rep("Z", nrow(z)))
bb <- factor(rep("B", nrow(b)))
dumZ <- merge(z, zz) # this won't work because it says it's too big
dumB <- merge(b, bb)
total <- rbind(dumB, dumZ)
But the merge of z and zz won't work because it says the result is 10G in size (which can't be right).
The end plot might be similar to this example: Boxplot with two levels and multiple data.frames
Any thoughts?
Cheers,
EDIT: Added boxplot
I would approach it as follows:
# create a list of your data.frames
l <- list(z,b)
# assign names to the dataframes in the list
names(l) <- c("z","b")
# bind the dataframes together with rbindlist from data.table
# the idcol parameter creates a column holding the names of the dataframes
# you could also use 'bind_rows(l, .id="id")' from 'dplyr' for this
library(data.table)
library(ggplot2)
zb <- rbindlist(l, idcol="id")
# create the plot
ggplot(zb, aes(x=factor(time), y=Tracer, color=treatment)) +
geom_boxplot() +
facet_wrap(~id) +
theme_bw()
which gives:
Other alternatives for creating your plot:
# facet by 'time'
ggplot(zb, aes(x=id, y=Tracer, color=treatment)) +
geom_boxplot() +
facet_wrap(~time) +
theme_bw()
# facet by 'time' & color by 'id' instead of 'treatment'
ggplot(zb, aes(x=treatment, y=Tracer, color=id)) +
geom_boxplot() +
facet_wrap(~time) +
theme_bw()
In response to your last comment: to get everything in one plot, you can use interaction() to distinguish between the different groupings as follows:
ggplot(zb, aes(x=treatment, y=Tracer, color=interaction(id, time))) +
geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) +
theme_bw()
which gives:
The key is that you do not need to perform a merge, which is computationally expensive on large tables. Instead, add a new variable to each dataframe that identifies it (source, set to "b" or "z" in my code below) and then rbind them. After that it becomes straightforward. My solution is very similar to @Jaap's, just with different faceting.
library(ggplot2)
#Create some mock data
t<-seq(1,55,by=2)
z<-data.frame(tracer=sample(t,size = 10,replace = T), time=c(0,15), treatment=c("S","X"))
b<-data.frame(tracer=sample(t,size = 10,replace = T), time=c(0,15), treatment=c("S","X"))
#Add a variable to each table to id itself
b$source<-"b"
z$source<-"z"
#concatenate the tables together
all<-rbind(b,z)
ggplot(all, aes(source, tracer, group=interaction(treatment,source), fill=treatment)) +
geom_boxplot() + facet_grid(~time)
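To see why the merge in the question blows up: when the two inputs share no column names, merge() returns the Cartesian product, so every row of one is paired with every element of the other. A tiny illustration with made-up numbers:

```r
z <- data.frame(Tracer = c(15, 20, 25), time = c(0, 0, 15))
zz <- factor(rep("Z", nrow(z)))

# No shared column names, so merge() pairs every row of z with every element of zz.
crossed <- merge(z, zz)
nrow(crossed)                 # nrow(z) * length(zz)

# Tagging each row directly keeps the table at its original size.
z$source <- "Z"
nrow(z)
```

With a million rows on each side, that product would have on the order of 10^12 rows, which would explain the size warning.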
So I have a data frame which I will call R. Looks something like this:
zep SEX AGE BMI
1 O F 3.416667 16.00000
2 O F 3.833333 14.87937
3 O G 3.416667 14.80223
4 O F 4.000000 15.09656
5 N G 3.666667 16.50000
6 O G 4.000000 16.49102
7 N G 3.916667 16.02413
With this data frame I want to draw several plots side by side comparing different aspects, such as how gender affects BMI. Like so:
par(mfrow=c(1,3))
boxplot(DF$BMI ~ DF$zep)
boxplot(DF$BMI ~ DF$SEX)
boxplot(DF$BMI ~ DF$AGE)
But for some reason the columns are stored as characters instead of factors.
So my question is: is there a way to plot these if they are characters? If not, what can I do?
Also, is there a way to change zep and SEX into logical vectors? For example, in zep, O would be true (1) and anything else false (0), and likewise for SEX: G would be true (1) and anything else false (0).
I have to plot categorical variables for my advanced data analysis class, so I can help you out. beed stands for border entry and employment data; don't steal my research plz.
Here is an example of the code I use to create factors. I have a column called portname that holds dummy-coded values, and I create a new column with the corresponding names as labels. This is how I would make the logical you describe; I've added that code to the larger code chunk below.
beed$portdisc <- as.numeric(beed$portname)
beed$portdisc[beed$portdisc==0] <- "Columbus Port of Entry"
beed$portdisc[beed$portdisc==1] <- "Santa Teresa Port of Entry"
beed$portdisc[beed$portdisc==2] <- "New Mexico All Ports Aggregate"
So what I've done here is take my dataframe beed and use the specific column containing my portname variables. I add a new column to my dataframe called beed$portdisc, then use [ ] to define which value gets which label.
In your case I think this should work (I only think so, but I did test it using the data you provided).
I have a hard time making the labels come out right with discrete variables. My apologies, but this gets you very close.
library(ggplot2)
DF$SEX.factor <- as.character(DF$SEX)
DF$SEX.factor[DF$SEX.factor== "G"] <- "0"
DF$SEX.factor[DF$SEX.factor== "F"] <- "1"
DF$SEX.factor <- as.factor(DF$SEX.factor)
bar <- ggplot()
bar <- bar + geom_bar(data = DF, aes(x = SEX.factor), width = 0.5) + xlab("Sex")
# labels assume G (coded "0") is male and F (coded "1") is female; swap them if your coding differs
bar <- bar + scale_x_discrete(labels = c("0" = "Male", "1" = "Female"))
bar
# DF.BMI5 = cut(DF$BMI,pretty(DF$BMI,5)) # Creates close to 5 integer ranges as factors, automatically choosing pretty break points.
# This would be good to compare, say, age and BMI; best with one discrete and one continuous variable
p <- ggplot(DF, aes(x = SEX.factor, y = BMI))
p <- p + geom_boxplot(width = 0.25, alpha = 0.4)
p <- p + geom_jitter(position = position_jitter(width = 0.1), alpha = .35, color = "blue")
# diamond at mean for each group
p <- p + stat_summary(fun = mean, geom = "point", shape = 18, size = 6,
colour = "red", alpha = 0.8) # older ggplot2 versions call this argument fun.y
p <- p + scale_x_discrete(labels = c("0" = "Male", "1" = "Female")) + xlab("Sex")
p
Here is what I got when I ran this code on my own data. I think this is what you're looking to create; I've included the code above. It'll work with anything where x is a discrete variable: just use as.factor() on x and keep y continuous.
If you need any more help just let me know. I like to help out people on here because it helps me hone my R skills. I'm more of a Visual Studio kind of guy; VBA is my friend.
Hope this helps!
If you ever need to change a character to a factor, you can always use as.factor('A'), for instance.
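For the logical columns the question asks about, a comparison is enough. A short sketch using made-up rows shaped like the question's data frame; the new column names zep_is_O and sex_is_G are only illustrative:

```r
# Made-up rows shaped like the question's data frame.
DF <- data.frame(zep = c("O", "O", "N", "O"),
                 SEX = c("F", "G", "G", "F"),
                 BMI = c(16.0, 14.8, 16.5, 15.1),
                 stringsAsFactors = FALSE)

# Logical indicators: TRUE where the value matches, FALSE otherwise.
DF$zep_is_O <- DF$zep == "O"
DF$sex_is_G <- DF$SEX == "G"

# Factor versions of the character columns for plotting.
DF$zep <- factor(DF$zep)
DF$SEX <- factor(DF$SEX)

par(mfrow = c(1, 2))
boxplot(BMI ~ zep, data = DF)
boxplot(BMI ~ SEX, data = DF)
```

Wrapping a logical column in as.integer() turns TRUE/FALSE into 1/0 if numeric codes are preferred.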
I have data conditioned on two variables, one major condition, one minor condition. I want a xyplot (lattice) with points and lines (type='b'), in one panel so that the major condition determines the color and the minor condition is used for drawing the lines.
Here is an example that is representative of my problem (see the code below to produce the data frame). d is the major condition, and c is the minor condition.
> dat
x y c d
1 1 0.9645269 a A
2 2 1.4892217 a A
3 3 1.4848654 a A
....
10 10 2.4802803 a A
11 1 1.5606218 b A
12 2 1.5346806 b A
....
98 8 2.0381943 j B
99 9 2.0826099 j B
100 10 2.2799917 j B
The way to get the connecting lines to be conditioned on c is to use groups=c in the plot. Then the way to tell them apart is to use a formula conditioned on d:
xyplot(y~x|d, data=dat, type='b', groups=c)
However, I want the plots in the same panel. Removing the formula condition on d produces one panel, but when groups=d is specified, there are "retrace" lines drawn:
xyplot(y~x, data=dat, type='b', groups=d, auto.key=list(space='inside'))
What I want looks very like the above plot, only without these "retrace" lines.
It's possible to set the colors explicitly in this example, as I know that there are five lines of category 'A' followed by five of category 'B', but this won't easily work for my real problem. In addition, auto.key is useless when setting the colors this way:
xyplot(y~x, data=dat, type='b', groups=c, col=rep(5:6, each=5))
The data:
set.seed(1)
dat <- do.call(
rbind,
lapply(1:10,
function(x) {
firsthalf <- x < 6
data.frame(x=1:10, y=log(1:10 + rnorm(10, .25) + 2 * firsthalf),
c=letters[x],
d=LETTERS[2-firsthalf]
)
}
)
)
The default graphical parameters are taken from the superpose.symbol and superpose.line settings. One solution is to set them using the par.settings argument.
library(lattice)
## I compute the color by group
col <-by(dat,dat$c,
FUN=function(x){
v <- ifelse(x$d=='A','darkgreen','orange')
v[1] ## I return one parameter , since I need one color
}
)
xyplot(y~x, data=dat, type='b', groups=c,
auto.key = list(text = levels(factor(dat$d)), points = FALSE), ## factor() so this also works when d is a character column
par.settings=
list(superpose.line = list(col = col), ## color of lines
superpose.symbol = list(col=col), ## colors of points
add.text = list(col=c('darkgreen','orange')))) ## color of text in the legend
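The per-group color vector can also be built with a small lookup table instead of by(). A sketch under the same assumption as above, namely that each minor group c belongs to exactly one major group d; the data is made up and lookup, major_colors and plt are illustrative names:

```r
library(lattice)

# Made-up data: minor condition c nested inside major condition d.
dat <- data.frame(x = rep(1:3, times = 4),
                  y = c(1, 2, 3, 2, 3, 4, 5, 6, 7, 6, 7, 8),
                  c = rep(letters[1:4], each = 3),
                  d = rep(c("A", "B"), each = 6))

# One row per minor group, sorted the way the group levels are ordered.
lookup <- unique(dat[c("c", "d")])
lookup <- lookup[order(lookup$c), ]

# Map each minor group to the color of its major group.
major_colors <- c(A = "darkgreen", B = "orange")
col <- unname(major_colors[as.character(lookup$d)])

plt <- xyplot(y ~ x, data = dat, type = "b", groups = c,
              par.settings = list(superpose.line   = list(col = col),
                                  superpose.symbol = list(col = col)))
```

Printing plt draws the panel. Adding a major condition only requires a new entry in major_colors.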
Does it have to be lattice? In ggplot it is rather easy:
library(ggplot2)
ggplot(dat, aes(x=x,y=y,colour=d)) + geom_line(aes(group=c),size=0.8) + geom_point(shape=1)
This is a quick and dirty example. You can customize the colour of the lines, the legend, the axes, the background, ...