I have the following data in R.
> dat
algo taxi.d taxi.s hanoi.d hanoi.s ep
1 plain VI 7.81 9.67 32.92 38.12 140.33
2 model VI 12.00 46.67 53.17 356.68 229.89
3 our algorithm 6.66 6.86 11.71 21.96 213.27
I have made a graph of this in Excel; now I want something similar in R. Please note that the vertical scale is logarithmic, with powers of 2.
What R commands do I need to use to get this?
Sorry if this is a very easy question; I am a complete novice to R.
The reshape2 and ggplot2 packages should help accomplish what you want:
dat = read.table(header=TRUE, text=
"algo taxi.d taxi.s hanoi.d hanoi.s ep
1 'plain VI' 7.81 9.67 32.92 38.12 140.33
2 'model VI' 12.00 46.67 53.17 356.68 229.89
3 'our algorithm' 6.66 6.86 11.71 21.96 213.27")
install.packages("reshape2") # only run the first time
install.packages("ggplot2") # only run the first time
library(reshape2)
library(ggplot2)
# convert the data into a more graph-friendly format
data2 = melt(dat, id.vars='algo', value.name='performance', variable.name='benchmark')
# graph data + bar chart + log scale
ggplot(data2) +
geom_bar(aes(x = benchmark, y = performance, fill = algo), stat='identity', position='dodge') +
scale_y_log10()
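If you want the powers-of-2 axis mentioned in the question, one way (a sketch, not tested against your data) is to replace scale_y_log10() with a log-2 transform from the scales package:
library(scales)  # provides log2_trans() and trans_breaks()
ggplot(data2) +
geom_bar(aes(x = benchmark, y = performance, fill = algo), stat='identity', position='dodge') +
scale_y_continuous(trans = log2_trans(),
                   breaks = trans_breaks("log2", function(x) 2^x))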
Hope this code will help you with your plot:
dat <- matrix(c(
c(0.25,0.25,0.25,0.25),
c(0.05,0,0.95,0),
c(0.4,0.1,0.1,0.4)),
nrow=4,ncol=3,byrow=FALSE,
dimnames=list(c("A","C","G","T"),
c("E","S","I"))
)
barplot(dat,border=FALSE,beside=TRUE,
col=rainbow(4),ylim=c(0,1),
legend=rownames(dat),main="Plot name",
xlab="State",ylab="observation")
grid()
box()
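If you prefer base graphics for the question's own data, here is a rough sketch along the same lines. It assumes dat is the data frame from the first answer's read.table (note the example above reassigns dat to a different matrix); log="y" gives a logarithmic value axis, and the axis() call is just one way to label it with powers of 2:
m <- as.matrix(dat[, -1])              # keep only the benchmark columns
rownames(m) <- as.character(dat$algo)  # one group of bars per benchmark, one bar per algorithm
barplot(m, beside=TRUE, log="y", axes=FALSE,
        col=rainbow(3), legend.text=rownames(m),
        xlab="benchmark", ylab="performance")
axis(2, at=2^(2:9), labels=2^(2:9))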
I have colored a graph with ggplot2 based on a threshold value of 1. Surface scores greater than 1 were colored azure and surface scores less than 1 were colored beige. Here is my sample code.
library(ggplot2)
setwd("F:/SUST_mutation/Graph_input")
d <- read.csv(file = "N.csv", sep = ",", header = TRUE)
ggplot(d, aes(x= Position,y= wild_Score)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(wild_Score,1), ymax=1), fill="beige", alpha= 1.5) +
geom_ribbon(aes(ymin=1, ymax=pmax(wild_Score,1)), fill="azure", alpha= 1.5)
My problem is that where the curve passes from the upper surface to the lower surface, I expect the two surfaces to meet in a single line.
But if you look at the figure, you will see that they do not. Around the threshold line, the lower surface does not meet the upper surface; instead it creates some extra surface. For convenience, I have marked the portions with a red circle.
Extra surface on the negative portion close to the threshold. Here is my data:
Position Wild_Score
4 1.048
5 1.052
6 1.016
7 0.996
8 0.97
9 0.951
10 0.971
11 1.047
12 1.036
13 1.051
14 1.124
15 1.172
16 1.172
17 1.164
18 1.145
19 1.186
20 1.197
21 1.197
22 1.216
23 1.193
24 1.216
25 1.216
26 1.262
Problem-2:
I have a data frame like following.
Position Score_1 Score_2
4 1.048 1.048
5 1.052 1.052
6 1.016 1.016
7 0.996 1.433
8 0.97 1.432
9 0.951 1.567
10 0.971 1.231
11 1.047 1.055
12 1.036 1.036
13 1.051 1.051
14 1.124 1.124
15 1.172 1.172
16 1.172 1.172
17 1.164 1.164
I plot the surface for Position vs Score_1 using a tibble, and a line graph on top of that surface for the same positions vs Score_2, like the following:
desired graph
As the line only differs at some points, I subsetted the main data set (both columns and rows).
I get the following error:
"Error: Aesthetics must be either length 1 or the same as the data (13): x"
I guess this is because I used two different data frames for the graphs.
Here is my code:
d <- read.csv(file = "E.csv", sep = ",", header = TRUE)
d1 <- tibble::tibble(
x = seq(min(d$Position), max(d$Position), length.out = 1000),
y = approx(d$Position, d$Score_1, xout = x)$y
)
ggplot(d1, aes(x= x,y= y)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(y,1), ymax=1), fill="red", alpha= 1.5) +
geom_ribbon(aes(ymin=1, ymax=pmax(y,1)), fill="blue", alpha= 1.5) +
geom_line(aes(y=1)) + geom_line(d = d[c(3:10), c(1,3)],aes(y =
Score_2), color = "blue", size = 1)
I want to know what is causing the problem and how I should deal with it.
It's because the negative surface at, for example, rows 3 and 4 starts from 1 and goes to 0.996, instead of going from 1.016 to 0.996. There is relevant discussion and there are other examples at ggplot2's issue tracker.
This problem is typically only visible when the number of observations is smallish, so the usual way people overcome it is to interpolate the data. You can find an example of that below (I've omitted your colours because they were hard to see):
library(ggplot2)
# txt <- "your_example_table" # Omitted for brevity
df <- read.table(text = txt, sep = "\t", header = TRUE)
data2 <- tibble::tibble(
x = seq(min(df$Position), max(df$Position), length.out = 1000),
y = approx(df$Position, df$Wild_Score, xout = x)$y
)
ggplot(data2, aes(x= x,y= y)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(y,1), ymax=1, fill = "A")) +
geom_ribbon(aes(ymin=1, ymax=pmax(y,1), fill = "B"))
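If you want to keep your beige/azure colours on the interpolated version, a minimal sketch reusing the data2 tibble from above (alpha has to be between 0 and 1, so the original alpha = 1.5 is simply dropped):
ggplot(data2, aes(x= x, y= y)) + xlab("Positions") + ylab("Scores") +
geom_ribbon(aes(ymin=pmin(y,1), ymax=1), fill="beige") +
geom_ribbon(aes(ymin=1, ymax=pmax(y,1)), fill="azure")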
This is great for hiding the problem, but calculating the exact line intersection points is a bit of a pain. I apologise for the self-promotion, but I ran into this too and wrapped my solution for finding these intersection points in a function in the dev version of my package ggh4x, which you might find useful.
library(ggh4x) # devtools::install_github("teunbrand/ggh4x")
ggplot(df, aes(x= Position,y= Wild_Score)) +
stat_difference(aes(ymin = 1, ymax = Wild_Score))
Created on 2021-08-15 by the reprex package (v1.0.0)
I have a data frame that looks like this:
Teff logg M_div_H U B V R I J H K L Lprime M
1 2000 4.0 -0.1 -13.443 -11.390 -7.895 -4.464 -1.831 1.666 3.511 2.701 4.345 4.765 5.680
2 2000 4.5 -0.1 -13.402 -11.416 -7.896 -4.454 -1.794 1.664 3.503 2.728 4.352 4.772 5.687
3 2000 5.0 -0.1 -13.358 -11.428 -7.888 -4.431 -1.738 1.664 3.488 2.753 4.361 4.779 5.685
4 2000 5.5 -0.1 -13.220 -11.079 -7.377 -4.136 -1.483 1.656 3.418 2.759 4.355 4.753 5.638
5 2200 3.5 -0.1 -11.866 -9.557 -6.378 -3.612 -1.185 1.892 3.294 2.608 3.929 4.289 4.842
6 2200 4.5 -0.1 -11.845 -9.643 -6.348 -3.589 -1.132 1.874 3.310 2.648 3.947 4.305 4.939
...
Let's say I have two values:
input_Teff = 4.8529282904170595E+003
input_log_g = 1.9241934741026787E+000
Notice how every V value corresponds to a unique (Teff, logg) combination. From the input values, I would like to interpolate a value for V. Is there a way to do this in R?
Edit 1: Here is the link to the full data frame: https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=0
Building on Ian Campbell's observation that you can consider your data as points on a two-dimensional plane, you can use spatial interpolation methods. The simplest approach is inverse-distance weighting, which you can implement like this
library(data.table)
d <- fread("https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=1")
setnames(d,"#Teff","Teff")
First rescale the data as appropriate (not shown here, see Ian's answer)
library(gstat)
# fit model
idw <- gstat(id="V", formula = V~1, locations = ~Teff+logg, data=d, nmax=7, set=list(idp = .5))
# new "points" to predict to
newd <- data.frame(Teff=c(4100, 4852.928), logg=c(1.5, 1.9241934741026787))
p <- predict(idw, newd)
#[inverse distance weighted interpolation]
p$V.pred
#[1] -0.9818571 -0.3602857
For higher dimensions you could use fields::Tps (I think you can force that to be an exact method, that is, exactly honor the observations, by making each observation a node)
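For what it's worth, a rough, untested sketch of that fields::Tps idea (with d being the data.table read above; lambda = 0 requests an exact interpolator that honours the observations):
library(fields)
# thin-plate spline over the (Teff, logg) plane, predicting V
fit <- Tps(as.matrix(d[, c("Teff", "logg")]), d$V, lambda = 0)
predict(fit, matrix(c(4852.928, 1.9241934741026787), nrow = 1))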
We can imagine that Teff and logg exist in a 2-dimensional plane. We can see that your input point exists in that same space:
library(tidyverse)
ggplot(data,aes(x = Teff, y = logg)) +
geom_point() +
geom_point(data = data.frame(Teff = 4.8529282904170595e3, logg = 1.9241934741026787),
color = "orange")
However, we can see the scale of Teff and logg are not the same. Simply taking log(Teff) gets us pretty close, but not quite. So we can rescale between 0 and 1 instead. We can create a custom rescale function. It will become clear why we can't use scales::rescale in a moment.
rescale = function(x,y){(x - min(y))/(max(y)-min(y))}
We can now rescale the data:
data %>%
mutate(Teff.scale = rescale(Teff,Teff),
logg.scale = rescale(logg,logg)) -> data
From here, we might use raster::pointDistance to calculate the distance from the input point to all of the scaled values:
raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),
data[,c("Teff.scale","logg.scale")],
lonlat = FALSE)
We can use which.min to find the row with the minimum distance:
data[which.min(raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),
data[,c("Teff.scale","logg.scale")],
lonlat = FALSE)),]
Teff logg M_div_H U B V R I J H K L Lprime M Teff.scale logg.scale
1: 4750 2 -0.1 -2.447 -1.438 -0.355 0.159 0.589 1.384 1.976 1.881 2.079 2.083 2.489 0.05729167 0.4631902
Here we can visualize the result:
ggplot(data,aes(x = Teff.scale, y = logg.scale)) +
geom_point() +
geom_point(data = data[which.min(raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),data[,c("Teff.scale","logg.scale")], FALSE)),],
color = "blue") +
geom_point(data = data.frame(Teff.scale = rescale(input_Teff,data$Teff),logg.scale = rescale(input_log_g,data$logg)),
color = "orange")
And access the appropriate value for V:
data[which.min(raster::pointDistance(cbind(rescale(input_Teff,data$Teff),rescale(input_log_g,data$logg)),data[,c("Teff.scale","logg.scale")], FALSE)),"V"]
V
1: -0.355
Data:
library(data.table)
data <- fread("https://www.dropbox.com/s/prbceabxmd25etx/lcb98cor.dat?dl=1")
setnames(data,"#Teff","Teff")
input_Teff = 4.8529282904170595E+003
input_log_g = 1.9241934741026787E+000
I'm trying to label my scatter points in R. This is my first plot, very straightforward, but I can't seem to figure out how to add text. I've looked at some of the other posts on here and they partially make sense, but I just don't understand the lingo yet.
stats <- read.csv(file.choose())
qplot(data=stats, x=Avg.of.FD.Points, y=Avg.FD.Dev)
text(x, y, label=Home.Skater)
Home.Skater Avg.of.FD.Points Avg.FD.Dev
A.J. Greer | 4.27 | 2.84
Aaron Ekblad | 12.40 | 6.22
Aaron Ness | 5.60 | 4.00
Here is a simple scatterplot example with geom_text based on your sample data.
df <- read.table(text =
"Home.Skater Avg.FD.PTS Avg.FD.Dev
A.J._Greer 4.27 2.84
Aaron_Ekblad 12.40 6.22
Aaron_Ness 5.60 4.00", header = TRUE)
library(ggplot2)
ggplot(df, aes(x = Avg.FD.PTS, y = Avg.FD.Dev, label = Home.Skater)) +
geom_point() +
geom_text(hjust = 0, nudge_x = 0.05) +
xlim(0, 15)
To avoid cluttering of (many) labels, you may want to consider the R library ggrepel.
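For reference, a minimal sketch of that ggrepel approach with the df built above (geom_text_repel pushes overlapping labels apart):
library(ggrepel)
ggplot(df, aes(x = Avg.FD.PTS, y = Avg.FD.Dev, label = Home.Skater)) +
geom_point() +
geom_text_repel()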
I have the following data in CSV format:
0.828666667 0.100333333
0.725666667 0.153666667
0.364333333 0.036666667
0.475666667 0.051
0.522333333 0.052333333
0.457 0.041666667
0.644666667 0.093333333
0.404333333 0.039333333
0.497 0.042333333
0.155666667 0.031666667
0.160666667 0.081333333
0.145666667 0.026666667
0.138666667 0.033666667
0.094333333 0.03
0.141 0.023666667
0.148666667 0.052
0.195666667 0.039
0.196333333 0.039333333
......
I am using the following code:
library(ggplot2)
data<-read.csv("sample.csv",header=TRUE,sep=",")
ggplot(data,aes(x=A,y=B,))+ geom_line() + scale_x_continuous(breaks=seq(0,9,0.5)) +
scale_y_continuous(breaks=seq(0,9,0.5))
I want to have the same scale on the x and y axes; that's why I am using breaks. But this doesn't give what I want; it results in overplotting.
I want an image like the second example image.
Thanks for the help.
I think you need to manipulate the data a little bit...
library(reshape2)
library(ggplot2)
dat <- read.csv("sample.csv", header = TRUE)  # your CSV from the question
names(dat) <- c('a', 'b')
# need an x for the plot
dat$Num <- as.numeric(row.names(dat))
meltDat <- melt(dat, id.vars = 'Num')
ggplot(meltDat,
aes(x = Num, y = value, group = variable, color = variable)) +
geom_line()
I'm not sure I understand exactly what you want to do. But here's my attempt to get from your data to something similar to your second plot.
data <- read.table(text="0.828666667 0.100333333
0.725666667 0.153666667
0.364333333 0.036666667
0.475666667 0.051
0.522333333 0.052333333
0.457 0.041666667
0.644666667 0.093333333
0.404333333 0.039333333
0.497 0.042333333
0.155666667 0.031666667
0.160666667 0.081333333
0.145666667 0.026666667
0.138666667 0.033666667
0.094333333 0.03
0.141 0.023666667
0.148666667 0.052
0.195666667 0.039
0.196333333 0.039333333")
names(data) <- c("A", "B")
# prepare data for plotting
require(reshape2)
data$id <- 1:nrow(data)
df <- melt(data, id.var="id")
# plot
library(ggplot2)
ggplot(df, aes(x=id, y=value, color=variable)) + geom_line()
If this did not answer your question, please try to be more specific.
I'm new here, but I hope that you can help me. I googled my problem but couldn't solve it.
I have a data frame containing lots of data which I want to plot with ggplot in R. Everything works very well, but the legend drives me crazy. The line types in the legend are always solid instead of what I defined.
I'm loading a CSV file, then making subsets with loops and summarizing the subsets with SummarySE().
A subset looks like this:
ExperimentCombinations LB TargetPosition N C_measured sd se ci
1 HS 0.10 Foveal 10 0.11007970 0.04114193 0.013010221 0.02943116
2 HS 0.21 Foveal 10 0.09821870 0.04838134 0.015299523 0.03460992
3 HS 0.30 Foveal 9 0.07911856 0.04037776 0.013459252 0.03103709
4 HS 1.00 Foveal 11 0.06657355 0.02688821 0.008107099 0.01806374
5 LED 0.10 Foveal 8 0.12569725 0.03607487 0.012754393 0.03015935
6 LED 0.21 Foveal 10 0.08797370 0.02091996 0.006615472 0.01496524
7 LED 0.30 Foveal 10 0.07358290 0.03002596 0.009495042 0.02147928
8 LED 1.00 Foveal 8 0.06630350 0.01894423 0.006697796 0.01583777
In this case, TargetPosition has the levels Foveal or Peripheral.
The ggplot code I'm using is below (it looks awful because I was trying to solve my problem...):
ColourFoveal <- c("#FFCC33","#00CCFF")
ColourPeripheral <- c("#FFCC33","#00CCFF")
#ColourPeripheral <- c("#FF9900","#0066FF")
PointType <- c(20,20)
PointTypeSweepUp <- c(24,24)
PointTypeSweepDown <- c(25,25)
ColourHSFillFoveal <- c("#FFCC33", "#FFCC33")
ColourHSFillPeripheral <- c("#FF9900", "#FF9900")
#LineTypeFoveal <- c("solid", "solid")
LineTypePeripheral <- c("dashed","dashed")
xbreaks <- c(0.1,0.21,0.3,1.0)
plotsYmax <- 0.2
if(field=="Peripheral"){
lineType<-sprintf("dashed")
lineColour<-ColourPeripheral
}else{
lineType<-"solid"
lineColour<-ColourFoveal
}
ggplot(df, aes(x=LB, y=C_measured, shape=ExperimentCombinations, colour=ExperimentCombinations)) +
geom_errorbar(aes(ymin=C_measured-se, ymax=C_measured+se), width=.1) +
geom_line(linetype=lineType) +
geom_point() +
ggtitle(paste(targetDataFrame$AgeGroup, targetDataFrame$TargetPosition)) +
scale_colour_manual(name="", values=lineColour)+
scale_shape_manual(name="", values=PointType)+
scale_fill_manual(name="", values=lineColour)+
scale_x_continuous(breaks=xbreaks)+
coord_cartesian(ylim = c(0, plotsYmax+0.01))+
scale_y_continuous(breaks=c(0,0.05,0.1,0.15,0.2))+
theme(axis.line=element_line(colour="black"))+
theme(panel.grid=element_blank())+
theme_bw()+
theme(legend.key.width=unit(2,"line"))
}
The peripheral plots should have dashed lines, the foveal ones solid lines.
What I get is always like this (as a new user I'm not allowed to post images!):
The lines are dashed and in the colours I want, and the points are right, too. But in the legend, the lines are solid instead of dashed. The colours and points are alright in the legend.
Could you help me get the line types in the legend to show as dashed in the peripheral case?