I have been trying to make a 3D plot of my data, but I cannot figure out how to get past some errors. Any help is highly appreciated.
> head(d1)  # produced with melt(), as shown below
Date variable value
1 2007 Q2 0.890 1.1358560
2 2007 Q3 0.890 1.1560433
3 2007 Q4 0.890 0.3747925
4 2008 Q1 0.890 0.3866533
5 2008 Q2 0.890 0.3872620
6 2008 Q3 0.890 0.3844887
I have successfully plotted a heatmap with this code:
library(reshape2)  # assumed source of melt()
library(ggplot2)
library(plotly)    # ggplotly()
d1 <- melt(mydata, id.vars = "Date")
P1 <- ggplot(data = d1, aes(x = Date, y = variable, fill = value)) +
  geom_tile() +
  ggtitle("My heatmap") +
  scale_fill_gradientn(colors = colorRampPalette(c("lightgray", "royalblue", "seagreen", "orange", "red", "brown"))(500), name = "Variable") +
  labs(x = "Quarter", y = "Alpha") +
  theme_bw()
ggplotly(P1)
*Don't know how to automatically pick scale for object of type yearqtr. Defaulting to continuous.*
However, what I actually want is a 3D plot:
open3d()
rgl.surface(x=d1$variable, y=d1$Date,
coords=c(1,3,2),z=d1$value,
color=colorzjet[ findInterval(d1$value, seq(min(d1$value), max(d1$value), length=100))] )
axes3d()
Error in rgl.surface(x = d1$variable, y = d1$Date, coords = c(1, 3, 2), :
'y' length != 'x' rows * 'z' cols
plot_ly(x=d1$Date,y=d1$variable,z=d1$value,type="surface",colors=colors)
Error: `z` must be a numeric matrix
I have tried as.matrix(apply(d1, 2, as.numeric)), but this turns the Date column into NAs.
Could the quarterly dates be what breaks the plot? (Even the heatmap doesn't show the dates as quarters.) Any tips?
dput(d1) output: see the linked file.
The file you uploaded is a CSV file, not dput output. But you can read it and plot it like this:
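library(rgl)  # persp3d()
d1csv <- read.csv("dput_output.csv")
# Convert "YYYY Qn" labels to numeric years, e.g. "2007 Q3" -> 2007.5
year <- as.numeric(sub(" .*", "", d1csv$Date))
quarter <- as.numeric(sub(".*Q", "", d1csv$Date))
# Reshape the long columns into matrices: one column per variable,
# one row per date (55 = number of dates per variable in this data)
Date <- matrix(year + (quarter - 1)/4, 55)
variable <- matrix(d1csv$variable, 55)
value <- matrix(d1csv$value, 55)
persp3d(Date, variable, value, col = "red")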
This gives the following plot:
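Both errors in the question seem to come from the same requirement: a surface needs z as a numeric matrix with one row per x value and one column per y value, and the "YYYY Qn" dates have to become numbers first. The NAs from apply() most likely appear because as.matrix() turns the whole data frame into character, and as.numeric("2007 Q2") is NA. The base R sketch below does both conversions on a small hypothetical long data set shaped like d1 (the values are made up). It uses tapply() instead of matrix(..., 55) so the row order of the long data does not matter.

```r
# Hypothetical long-format data shaped like d1:
# one row per (Date, variable) pair, Date stored as "YYYY Qn" text
long <- data.frame(
  Date = rep(c("2007 Q2", "2007 Q3", "2007 Q4"), times = 2),
  variable = rep(c(0.89, 0.90), each = 3),
  value = c(1.14, 1.16, 0.37, 1.20, 1.10, 0.40)
)

# Convert "YYYY Qn" to a numeric year: "2007 Q3" -> 2007.5
year <- as.numeric(sub(" .*", "", long$Date))
quarter <- as.numeric(sub(".*Q", "", long$Date))
long$t <- year + (quarter - 1) / 4

# Reshape to a matrix: rows = dates, columns = variable values.
# tapply() places each value by its (t, variable) pair.
z <- tapply(long$value, list(long$t, long$variable), mean)
x <- as.numeric(rownames(z))  # numeric dates
y <- as.numeric(colnames(z))  # numeric variable values
```

The result has the layout rgl's persp3d(x, y, z) expects (nrow(z) == length(x), ncol(z) == length(y)). If Date is still a zoo yearqtr object, as.numeric(d1$Date) should already give these fractional years, since yearqtr is stored as year + (quarter - 1)/4.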
I have a data frame containing 2479 peptides with their sequence, p-value and log2 fold change.
# A tibble: 6 x 3
Sequence p log2fold
<chr> <dbl> <dbl>
1 FLENEDR 0.343 1.21
2 DTEEEDFHVDQATTVK 0.270 0.771
3 DTEEEDFHVDQATTVK 0.112 1.18
4 SCRASQSVSSSF 0.798 0.139
5 RLSCTTSGF 0.739 0.110
6 SCRASQSVSSSY 0.209 0.375
I'm trying to make a volcano plot that labels the up- and downregulated peptides. However, for some reason ggplot only draws 6 labels, and I have no idea why.
I have tried lots of different things: I tried using up- and downregulation in an expression column, I tried increasing and decreasing my cutoff values to check whether they were the problem, and I used ggrepel to try to spread the labels out more. Nothing seems to work. My latest attempt is in the code below.
As a last resort, I made a new group containing only the peptides that pass both the significance and the fold-change cutoffs, which leaves 39 peptides. I then used head() of this subset as the label data and matched peptides between the two data frames.
Another problem: since I started using geom_text_repel, a stray character appears in my legend. I have no idea how or why this is happening.
library(ggplot2)
library(ggrepel)
library(tidyverse)
Volc <- R_volcano
expression <- ifelse(Volc$p < 0.05 & abs (Volc$log2fold) >=1, ifelse(Volc$log2fold>1, 'up', 'down'), 'stable')
Volc <- cbind(Volc, expression)
colnames(Volc)[1] <- 'Sequencenames'
Volc["group"] <- "NotSignificant"
Volc[which(Volc['p'] < 0.05 & abs(Volc['log2fold']) < 1 ),"group"] <- "Significant"
Volc[which(Volc['p'] > 0.05 & abs(Volc['log2fold']) > 1 ),"group"] <- "FoldChange"
Volc[which(Volc['p'] < 0.05 & abs(Volc['log2fold']) > 1 ),"group"] <- "Significant&FoldChange"
VolcFilter <- Volc %>% filter(group=="Significant&FoldChange")
p <- ggplot(data = Volc, aes(x = log2fold, y = -log10(p), colour=expression, label='Sequencenames')) +
geom_point(alpha=0.4, size=2) +
scale_color_manual(values=c("blue", "grey","red"))+
xlim(c(-4.5, 4.5)) +
geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = 1.301,lty=4,col="black",lwd=0.8) +
geom_text_repel(data=head(VolcFilter), aes(label=Sequencenames))+
labs(x="log2(fold change)",
y="-log10 (p-value)",
title="Differential expression") +
theme_bw()+
theme(plot.title = element_text(hjust = 0.5),
legend.position="right",
legend.title = element_blank())
p
Any help is much appreciated; I'm fairly new to R.
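Two details in the volcano code above may explain both symptoms, though this is untested against the full data: head() returns only the first six rows by default, so geom_text_repel(data = head(VolcFilter), ...) can never draw more than six labels; and text layers add a letter key ("a") to the colour legend unless they get show.legend = FALSE. The base R sketch below uses the six peptides printed in the tibble and one consistent pair of cutoffs (p < 0.05 and |log2fold| >= 1, assumed from the question's ifelse()) to build the label subset without head().

```r
# The six peptides shown in the tibble above
volc <- data.frame(
  Sequence = c("FLENEDR", "DTEEEDFHVDQATTVK", "DTEEEDFHVDQATTVK",
               "SCRASQSVSSSF", "RLSCTTSGF", "SCRASQSVSSSY"),
  p = c(0.343, 0.270, 0.112, 0.798, 0.739, 0.209),
  log2fold = c(1.21, 0.771, 1.18, 0.139, 0.110, 0.375)
)

# One consistent pair of cutoffs for both up and down
p_cut <- 0.05
fc_cut <- 1
volc$expression <- ifelse(volc$p < p_cut & volc$log2fold >= fc_cut, "up",
                   ifelse(volc$p < p_cut & volc$log2fold <= -fc_cut, "down",
                          "stable"))

# Label subset: every non-stable peptide, NOT head() of it
sig <- volc[volc$expression != "stable", ]

# head() keeps only 6 rows by default
demo39 <- data.frame(id = 1:39)  # hypothetical stand-in for the 39 hits
nrow(head(demo39))               # 6
```

None of these six peptides pass p < 0.05, so sig is empty here. With the real data, something like geom_text_repel(data = VolcFilter, aes(label = Sequencenames), show.legend = FALSE) should label all 39 filtered peptides and keep the stray key out of the legend.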
I have the following data frame and code (the data run from 01/2016 to 04/2020):
Date A01_Price A02_Price A03_Price A04_Price A05_Price A06_Price A07_Price A08_Price A09_Price A10_Price A11_Price A12_Price A13_Price A14_Price A15_Price
1 2016-01-04 49.5010 21.6400 90.0100 93.676 81.6110 27.6450 28.4600 44.1930 33.0140 216.5460 36.201 41.7360 25.495 16.200 69.197
2 2016-01-05 49.7855 21.9870 88.5695 92.329 82.4590 28.2235 28.2790 44.8180 34.0180 218.5370 35.929 40.8520 25.431 16.157 69.828
3 2016-01-06 49.0595 21.5060 87.4735 88.601 81.4320 28.1725 27.4720 43.5065 36.2670 207.5960 35.256 39.9220 25.076 16.061 66.947
4 2016-01-07 47.7785 20.8415 82.8735 83.725 78.9340 26.7820 26.3485 39.0500 34.3570 203.4775 33.967 38.2115 24.062 15.738 64.135
5 2016-01-08 47.7435 20.2600 82.9275 82.609 79.0000 26.4495 26.1980 41.0055 33.1400 199.9250 33.361 37.3630 23.993 15.641 63.434
6 2016-01-09 47.8160 20.3800 83.0530 83.503 79.3925 26.4900 26.2460 41.0680 33.1910 200.9835 33.447 37.4530 24.110 15.734 63.510
7 2016-01-10 47.7770 20.4750 82.9860 83.325 79.6450 26.4680 26.2200 41.0340 33.1640 201.3680 33.401 37.4050 23.998 15.713 63.469
8 2016-01-11 48.8095 20.8440 83.0320 83.513 78.6720 26.1275 26.5150 40.3455 33.9770 202.6630 33.516 37.5030 23.947 15.753 65.583
9 2016-01-12 48.9545 21.0340 83.7325 85.732 81.0900 27.0205 27.5920 41.3570 35.6450 205.9610 34.443 38.0980 24.004 15.939 66.032
10 2016-01-13 48.0195 20.4640 82.6305 81.151 81.1780 27.1925 26.8050 39.2130 35.3825 197.9070 33.023 37.5945 23.423 15.737 64.682
I would like to display the long and short exponential moving averages for each "Axy_Price" column in the plot, like in this graphic. It would be great if I could do this in ggplot, because I want to show more variables in the plot. Does anybody have an idea? Here is what I've tried so far:
library(plyr)
library(tidyverse)
library(quantmod)
library(TTR)
library(zoo)
library(PerformanceAnalytics)
library(xts)
#### EMA function ####
EMAf <- function (price,n){
ema <- c()
ema[1:(n-1)] <- NA
ema[n]<- mean(price[1:n])
beta <- 2/(n+1)
for (i in (n+1):length(price)){
ema[i]<-beta * price[i] +
(1-beta) * ema[i-1]
}
ema <- reclass(ema,price)
return(ema)
}
#
df_A01 <- df %>%
dplyr::select(Date, A01_Price)
#
EMAf_A01 <- ddply(df_A01, "A01", f)
#
library(ggplot2)
ggplot(df_A01, aes(x=Year, y=Value)) +
geom_line(mapping=aes(shape=Type), size=0.5) +
theme_bw() +
geom_line(data = madf, mapping=aes(x = Date, y = EMAf_A01, linetype=Type,
color = A01), size = 1) +
ylab(expression(paste("mean",mu, "g",C~L^{-1}, day^{-1}))) +
theme(legend.key = element_blank())
I also tried chartSeries(), but I get a zoo error:
#
xts_stock01L <- as.xts(df_A01[, c(2)], order.by=df_A01[[1]])
#
chartSeries(xts_stock01L,
subset="2016-01::2020-04",
theme=chartTheme("white"))
addEMA(n=30,on=1,col = "blue")
addEMA(n=200,on=1,col = "red")
Maybe I need to approach this whole thing differently.
You can add the lines one by one and compute the averages with the TTR::EMA function.
Code:
library(TTR)       # EMA()
library(quantmod)  # getSymbols()
library(ggplot2)
getSymbols("AAPL")
df <- as.data.frame(AAPL)
# The short EMA uses the smaller window
df$EMA_short <- EMA(df$AAPL.Close, 100)
df$EMA_long <- EMA(df$AAPL.Close, 1000)
df$time <- as.POSIXct(rownames(df))
df <- na.omit(df)
ggplot(df) +
  geom_line(aes(y = EMA_short, x = time), col = "green") +
  geom_line(aes(y = EMA_long, x = time), col = "red") +
  geom_line(aes(y = AAPL.Close, x = time))
If you want to plot all the lines with a single command, melt df into long format first; there are some good posts on how to do that.
I think the problem might be that you are using multiple data frames. Keep all the variables in one data frame.
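For reference, here is a base R sketch of the same recursion as EMAf in the question: seed with the simple mean of the first n prices, then ema[i] = beta * price[i] + (1 - beta) * ema[i - 1] with beta = 2 / (n + 1). It is applied to the first ten A01_Price values from the question's table and stacked into one long data frame, so a single plot call can draw all three lines. The windows n = 3 and n = 5 are only chosen so the ten-row toy series is long enough; real short and long windows would be much larger.

```r
# EMA seeded with the simple mean of the first n prices
ema <- function(price, n) {
  out <- rep(NA_real_, length(price))
  if (length(price) < n) return(out)  # not enough data to seed
  out[n] <- mean(price[1:n])          # seed value
  beta <- 2 / (n + 1)                 # smoothing factor
  if (length(price) > n) {
    for (i in (n + 1):length(price)) {
      out[i] <- beta * price[i] + (1 - beta) * out[i - 1]
    }
  }
  out
}

# First ten rows of Date and A01_Price from the question's table
prices <- data.frame(
  Date = seq(as.Date("2016-01-04"), by = "day", length.out = 10),
  A01_Price = c(49.5010, 49.7855, 49.0595, 47.7785, 47.7435,
                47.8160, 47.7770, 48.8095, 48.9545, 48.0195)
)

# Long format: one row per (Date, series)
long <- rbind(
  data.frame(Date = prices$Date, series = "price", value = prices$A01_Price),
  data.frame(Date = prices$Date, series = "EMA short (n = 3)",
             value = ema(prices$A01_Price, 3)),
  data.frame(Date = prices$Date, series = "EMA long (n = 5)",
             value = ema(prices$A01_Price, 5))
)
long <- long[!is.na(long$value), ]  # drop the warm-up NAs
```

With ggplot2, ggplot(long, aes(Date, value, colour = series)) + geom_line() would then draw the price and both averages in one call; the same stacking could be repeated for each Axy_Price column and combined before faceting.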
Using the following code:
library("ggplot2")
require(zoo)
args <- commandArgs(TRUE)
input <- read.csv(args[1], header=F, col.names=c("POS","ATT"))
id <- args[2]
prot_len <- nrow(input)
manual <- prot_len/100 # 4.3
att_name <- "Entropy"
att_zoo <- zoo(input$ATT)
att_avg <- rollapply(att_zoo, width = manual, by = manual, FUN = mean, align = "left")
autoplot(att_avg, col="att1") + labs(x = "Positions", y = att_name, title="")
With data:
> str(input)
'data.frame': 431 obs. of 2 variables:
$ POS: int 1 2 3 4 5 6 7 8 9 10 ...
$ ATT: num 0.652 0.733 0.815 1.079 0.885 ...
I do:
I would like to read in input2, which has a different length (and therefore a different x-axis), and overlay the two curves in the same plot. By overlay I mean that I want both curves at the same plot size, so I will ignore the overlapping axis labels and titles: I want to compare the shapes regardless of the length of the input.
First I tried generating a toy input2 by changing the manual value, so that I have att_avg2, in which manual equals e.g. 7. Between the original autoplot and the new autoplot I added par(new=TRUE), but this does not give my expected output. Any hints on how to do this? Maybe it's better to convert att_avg from a zoo series to a data.frame and not use autoplot? Thanks
UPDATE, response to G. Grothendieck:
If I do:
[...]
att_zoo <- zoo(input$ATT)
att_avg <- rollapply(att_zoo, width = manual, by = manual, FUN = mean, align = "left") #manual=4.3
att_avg2 <- rollapply(att_zoo, width = 7, by = 7, FUN = mean, align = "left")
autoplot(cbind(att_avg, att_avg2), facet=NULL) +
labs(x = "Positions", y = att_name, title="")
I get the plot and the following warning message:
Removed 1 rows containing missing values (geom_path).
par() is used with classic graphics, not with ggplot2. If you have two zoo series, just cbind or merge them together and autoplot the result with facet = NULL:
library(zoo)
library(ggplot2)
z1 <- zoo(1:3) # length 3
z2 <- zoo(5:1) # length 5
autoplot(cbind(z1, z2), facet = NULL)
Note: The question omitted input2 so there could be some additional considerations from aspects not shown.
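If the goal is to compare shapes of series with different lengths, note that cbind() aligns them by index, so the shorter series simply stops early. One alternative (an assumption about what is wanted, not part of the answer above) is to rescale each series' positions to the interval 0 to 1 before plotting. The base R sketch below computes non-overlapping block means, which should match rollapply(width = k, by = k, FUN = mean, align = "left") for an integer k, and stacks two hypothetical toy series on that relative x-axis. The helpers block_means() and rel_pos() are illustrative names.

```r
# Non-overlapping block means of width k (hypothetical helper)
block_means <- function(x, k) {
  n_blocks <- length(x) %/% k
  vapply(seq_len(n_blocks),
         function(b) mean(x[((b - 1) * k + 1):(b * k)]),
         numeric(1))
}

# Map block i of n to the midpoint of its share of [0, 1]
rel_pos <- function(n) (seq_len(n) - 0.5) / n

# Hypothetical toy inputs of different lengths
att1 <- as.numeric(1:12)
att2 <- as.numeric(1:10)

m1 <- block_means(att1, 4)  # 3 blocks
m2 <- block_means(att2, 2)  # 5 blocks

# Stack both on a shared relative axis
both <- rbind(
  data.frame(pos = rel_pos(length(m1)), value = m1, series = "input"),
  data.frame(pos = rel_pos(length(m2)), value = m2, series = "input2")
)
```

ggplot(both, aes(pos, value, colour = series)) + geom_line() would then overlay the two shapes on a common 0 to 1 axis instead of on raw positions.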
Suppose I have the following data for a student's score on a test:
set.seed(1)
df <- data.frame(question = 0:10,
                 resp = c(NA, sample(c("Correct", "Incorrect"), 10, replace = TRUE)),
                 score.after.resp = 50)
# Each correct answer adds 5 points; each incorrect answer subtracts 5
for (i in 1:10) {
  if (df$resp[i + 1] == "Correct") {
    df$score.after.resp[i + 1] <- df$score.after.resp[i] + 5
  } else {
    df$score.after.resp[i + 1] <- df$score.after.resp[i] - 5
  }
}
df
question resp score.after.resp
1 0 <NA> 50
2 1 Correct 55
3 2 Correct 60
4 3 Incorrect 55
5 4 Incorrect 50
6 5 Correct 55
7 6 Incorrect 50
8 7 Incorrect 45
9 8 Incorrect 40
10 9 Incorrect 35
11 10 Correct 40
I want to get the following graph:
library(ggplot2)
ggplot(df,aes(x = question, y = score.after.resp)) + geom_line() + geom_point()
My problem: I want to color the segments of this line according to the student's response. If the response is correct (increasing), the segment should be green; if it is incorrect (decreasing), the segment should be red.
I tried the following code, but it did not work:
ggplot(df,aes(x = question, y = score.after.resp, color=factor(resp))) +
geom_line() + geom_point()
Any ideas?
I would probably approach this a little differently, and use geom_segment instead:
df1 <- as.data.frame(with(df,cbind(embed(score.after.resp,2),embed(question,2))))
colnames(df1) <- c('yend','y','xend','x')
df1$col <- ifelse(df1$y - df1$yend >= 0,'Decrease','Increase')
ggplot(df1) +
geom_segment(aes(x = x,y = y,xend = xend,yend = yend,colour = col)) +
geom_point(data = df,aes(x = question,y = score.after.resp))
A brief explanation:
I'm using embed to transform the x and y variables into starting and ending points for each line segment, and then simply adding a variable that indicates whether each segment went up or down. Then I used the previous data frame to add the original points themselves.
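To see what embed() does here: embed(x, 2) returns a two-column matrix whose first column is x[t] and whose second column is x[t - 1], so each row holds one segment's end and start. The sketch below checks that against an equivalent segment table built by dropping the first or last element of each vector, using the scores printed in the question, and labels each segment by direction.

```r
# Scores and question numbers from the printed df above
score <- c(50, 55, 60, 55, 50, 55, 50, 45, 40, 35, 40)
question <- 0:10

e <- embed(score, 2)  # column 1 = score[t], column 2 = score[t - 1]

# Equivalent segment table built by shifting the vectors
segs <- data.frame(x = question[-length(question)], xend = question[-1],
                   y = score[-length(score)], yend = score[-1])
segs$col <- ifelse(segs$yend > segs$y, "Increase", "Decrease")
```

The four increasing segments line up with the four "Correct" responses. To get the requested colours, adding scale_colour_manual(values = c(Decrease = "red", Increase = "green")) to the geom_segment() plot should map decreasing segments to red and increasing ones to green.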
Alternatively, I suppose you could use geom_line something like this:
df$resp1 <- c(as.character(df$resp[-1]),NA)
ggplot(df,aes(x = question, y = score.after.resp, color=factor(resp1),group = 1)) +
geom_line() + geom_point(color = "black")
By default, ggplot2 groups the data according to the aesthetics that are mapped to factors. You can override this default by setting group explicitly:
last_plot() + aes(group=NA)
I have a dataset with multiple response variables and three treatments. Treatment 2 is nested within treatment 1, and treatment 3 is nested within treatment 2. For simplicity, the demo table shows only three of the 22 response variables; I would like to run the analysis over all 22.
My objectives:
To visualize how the response variable(s) change with the treatment combination. I have written a script that does this for one response variable, and I copy and paste the code to run it on the other columns, which seems like an extremely crude way to do it. That leads to my second objective:
Automate or modify the following script so that it automatically loops through the columns and produces the desired tables and graphs.
Demo data:
demo.table
Here is my script:
library(doBy)
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
attach (demo)
cdataNA <- summaryBy(tyr ~ spp + wat + ins, data=demo, FUN=c(length2,mean,sd), na.rm=TRUE)
# Rename column tyr.length2 to just N
names(cdataNA)[names(cdataNA)=="tyr.length2"] <- "N"
# Calculate standard error of the mean
cdataNA$tyr.SE <- cdataNA$tyr.sd / sqrt(cdataNA$N)
cdataNA
# Now create a barplot using ggplot2
library(ggplot2)
a <- ggplot(cdataNA, aes(x = wat, y = tyr.mean, fill = ins))
b <- a + geom_bar(stat = "identity", position = "dodge") + facet_grid (~ spp)
# Now put errorbars.
c <- b + geom_errorbar(aes(ymin=tyr.mean-tyr.SE, ymax=tyr.mean+tyr.SE),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
xlab ("wat") +
ylab ("tyr (PA/PA std)")
c
## esc
library(doBy)
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
cdataNA1 <- summaryBy(esc ~ spp + wat + ins, data=demo, FUN=c(length2,mean,sd), na.rm=TRUE)
# Rename column esc.length2 to just N
names(cdataNA1)[names(cdataNA1)=="esc.length2"] <- "N"
# Calculate standard error of the mean
cdataNA1$esc.SE <- cdataNA1$esc.sd / sqrt(cdataNA1$N)
cdataNA1
# Now create a barplot using ggplot2
library(ggplot2)
a1 <- ggplot(cdataNA1, aes(x = wat, y = esc.mean, fill = ins))
b1 <- a1 + geom_bar(stat = "identity", position = "dodge") + facet_grid (~ spp)
# Now put errorbars.
c1 <- b1 + geom_errorbar(aes(ymin=esc.mean-esc.SE, ymax=esc.mean+esc.SE),
width=.2, # Width of the error bars
position=position_dodge(.9)) +
xlab ("wat") +
ylab ("esc (PA/PA std)")
c1
Resulting table for tyr:
spp wat ins N tyr.mean tyr.sd tyr.SE
1 Bl High No 4 0.305325 0.034102041 0.017051020
2 Bl High Yes 5 0.186140 0.045165894 0.020198802
3 Bl Low No 5 0.310540 0.061810096 0.027642315
4 Bl Low Yes 5 0.202840 0.029034944 0.012984822
5 Man High No 4 0.122725 0.075867005 0.037933503
6 Man High Yes 5 0.081800 0.013463469 0.006021046
7 Man Low No 5 0.079880 0.009569587 0.004279650
8 Man Low Yes 4 0.083550 0.018431947 0.009215973
Resulting graph for esc:
demo figure for esc
So the whole thing works, but it still requires considerable manual labor, which slows down the workflow. It would be great to automate it.
Thanks in advance.
You can organize the data in just two lines:
library(reshape)  # melt() and cast() used here come from the reshape package
melt.dta <- melt(dta, id.vars = c("spp", "wat", "ins"), measure.vars = "tyr")
cast(melt.dta, spp + wat + ins ~ .,
function (x) c("N" = sum(!is.na(x)),
"mean" = mean(x, na.rm = TRUE),
"sd" = sd(x, na.rm = TRUE),
"se" = sd(x, na.rm = TRUE)/sqrt(sum(!is.na(x)))))
It returns:
spp wat ins N mean sd se
1 Bl High No 4 0.3053 0.03410 0.01705
2 Bl High Yes 5 0.1861 0.04517 0.02020
3 Bl Low No 5 0.3105 0.06181 0.02764
4 Bl Low Yes 5 0.2028 0.02903 0.01298
5 Man High No 4 0.1227 0.07587 0.03793
6 Man High Yes 5 0.0818 0.01346 0.00602
7 Man Low No 5 0.0799 0.00957 0.00428
8 Man Low Yes 4 0.0835 0.01843 0.00922
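To automate this across all 22 response columns, one option (a sketch in base R, not the reshape approach above) is to wrap the summary in a function and lapply() it over the column names. It uses aggregate() with the same N, mean, sd, and SE definitions as the question's script. The demo data frame here is a small hypothetical stand-in with two responses, tyr and esc, and two observations per treatment combination; summarise_response() is an illustrative name.

```r
# Hypothetical stand-in for the demo data: 2 rows per spp/wat/ins group
demo <- data.frame(
  spp = rep(c("Bl", "Man"), each = 8),
  wat = rep(rep(c("High", "Low"), each = 4), times = 2),
  ins = rep(c("No", "Yes"), times = 8),
  tyr = as.numeric(1:16),
  esc = as.numeric(16:1)
)
demo$esc[1] <- NA  # one missing value, as in the real data

# N, mean, sd and SE of one response for every treatment combination
summarise_response <- function(response, data, groups = c("spp", "wat", "ins")) {
  f <- reformulate(groups, response = response)
  stats <- aggregate(f, data = data, na.action = na.pass, FUN = function(x) {
    x <- x[!is.na(x)]
    c(N = length(x), mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
  })
  # aggregate() stores the four statistics as a matrix column
  out <- cbind(stats[groups], as.data.frame(stats[[response]]))
  out$response <- response
  out
}

responses <- c("tyr", "esc")  # extend to all 22 response columns
all_stats <- do.call(rbind, lapply(responses, summarise_response, data = demo))
```

The plots could then come from the same long table, for example one ggplot() per response via lapply(split(all_stats, all_stats$response), ...), or all at once with facet_grid(response ~ spp, scales = "free_y"), using mean and se in place of tyr.mean and tyr.SE.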