I'm trying to plot line graphs using the ggplot2 library in R. The plots look fine, but I need to reduce the vertical space between the lines, because they come out with a big separation between them.
This is my R script:
library(ggplot2)
library(reshape2)
data <- read.csv('/Users/keepo/Desktop/G.Con/Int18/input-int18.csv')
chart_data <- melt(data, id='NRO')
names(chart_data) <- c('NRO', 'leyenda', 'DTF')
ggplot() +
geom_line(data = chart_data, aes(x = NRO, y = DTF, color = leyenda), size = 1)+
xlab("iteraciones") +
ylab("valores")
and this is my current graph:
The first line is very far from the second. How can I reduce the height between them?
Regards.
The lines are far apart because the values of the variable plotted on the y-axis are far apart. If you need them closer together, you fundamentally have 3 options:
Change the scale (e.g. convert the plot to a log scale), although this can make it harder for people to interpret the numbers. This can also change the shape of each line, not just the space between the lines. I'm guessing this isn't what you will want, ultimately.
Normalize the data. If the actual value of the variable on the y-axis isn't important, just standardize the data (separately for each value of leyenda).
Graph each line separately. The main drawback here is that you need 3 graphs where 1 might do.
Not recommended:
I know that some graphs use a "squiggle" to change scales or skip space. Generally, this is considered poor practice (and I doubt it's an option in ggplot2) because it masks the true separation between the data points. If you really do want a gap, I would look at this post: axis.break and ggplot2 or gap.plot? plot may be too complexe
In a nutshell, the answer here depends on what your numbers mean. What is the story you are trying to tell? Is the important feature of your plots the change between them (in which case, normalizing might be your best option), or the actual numbers themselves (in which case, the space is relevant).
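The three main options above can be sketched roughly as follows. This is only an illustration on made-up data: the data frame `demo` and its values are assumptions standing in for the asker's `chart_data`, which is not shown in the question.

```r
library(ggplot2)

# Hypothetical stand-in for chart_data: three series on very different levels
demo <- data.frame(
  NRO     = rep(1:10, times = 3),
  leyenda = rep(c("s1", "s2", "s3"), each = 10),
  DTF     = c(800 + (1:10)^2, 2500 + 20 * sin(1:10), 3100 + 3 * (1:10))
)

base_plot <- ggplot(demo, aes(x = NRO, y = DTF, color = leyenda)) +
  geom_line()

# Option 1: log scale on the y-axis
p_log <- base_plot + scale_y_log10()

# Option 2: standardize each series separately (mean 0, sd 1 within leyenda)
demo$DTF_std <- ave(demo$DTF, demo$leyenda,
                    FUN = function(v) (v - mean(v)) / sd(v))
p_std <- ggplot(demo, aes(x = NRO, y = DTF_std, color = leyenda)) +
  geom_line()

# Option 3: one panel per series, each with its own y-axis range
p_facet <- base_plot + facet_wrap(~ leyenda, ncol = 1, scales = "free_y")
```

Printing `p_log`, `p_std` or `p_facet` draws the corresponding plot. Note that after standardizing, the y-axis no longer shows the original units, which is the trade-off described above.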
You could use an axis transformation that maps your data to the screen in a non-linear fashion:
fun_trans <- function(x){
d <- data.frame(x=c(800, 2500, 3100), y=c(800,1950, 3100))
model1 <- lm(y~poly(x,2), data=d)
model2 <- lm(x~poly(y,2), data=d)
scales::trans_new("fun",
function(x) as.vector(predict(model1,data.frame(x=x))),
function(x) as.vector(predict(model2,data.frame(y=x))))
}
last_plot() + scale_y_continuous(trans = "fun")
I have a dataset:
a<-c(1,2,3,4,5,6,7,8,9,10)
b<-c(2,2,2,2,4,5,6,8,4,1)
c<-c("red","red","red","blue","blue","blue","orange","orange","orange","orange")
data<-data.frame(a=a,b=b,c=c)
I now want to plot the data on a graph with each group having a different colour:
plot(a[c=="red"],b[c=="red"],col="red",xlim=c(min(a),max(a)),ylim=c(min(b),max(b)))
points(a[c=="blue"],b[c=="blue"],col="blue")
points(a[c=="orange"],b[c=="orange"],col="orange")
This works fine - however, say if I have 30 groups, the task of writing the code becomes tedious. I am wondering if there is a better way of writing the code such that R will automatically plot the graph and give different colours to different groups?
Also, I wonder if there is a quick way to display a legend in the graph.
Thank you for all your help.
Try this:
with(data,plot(a,b,col=c))
The col argument in plot() stands for color. This can contain a vector of the colors you want.
Additionally, you don't have to make a column just to define the color if the color-group relationship is not that important. For example, you could make column c a more meaningful column like this:
a<-c(1,2,3,4,5,6,7,8,9,10)
b<-c(2,2,2,2,4,5,6,8,4,1)
c<-c(rep('Group1',3),rep('Group2',3),rep('Group3',4))
data<-data.frame(a=a,b=b,c=factor(c)) # factor: col= uses its integer codes, legend() uses its levels
Then to plot, use:
with(data,plot(a,b,col=c))
To add a legend:
legend('topleft',legend = levels(data[,'c']),col=1:nlevels(data[,'c']),pch=1)
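With many groups (the question mentions 30), the default palette has only eight colours and recycles them, so it may help to build one colour per factor level and index that palette by the group. A minimal sketch, assuming a made-up data frame `many` with 30 groups (not from the original post):

```r
set.seed(42)

# Hypothetical data: 30 groups, 5 points each
many <- data.frame(
  a = runif(150),
  b = runif(150),
  g = factor(rep(paste0("Group", 1:30), each = 5))
)

# One distinct colour per factor level
pal <- rainbow(nlevels(many$g))

# Look up each point's colour through the integer code of its group
point_cols <- pal[as.integer(many$g)]

plot(many$a, many$b, col = point_cols, pch = 19)
legend("topright", legend = levels(many$g), col = pal, pch = 19,
       ncol = 3, cex = 0.6)
```

`rainbow()` is just one choice here; any function that returns `nlevels(many$g)` colours could be swapped in.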
Try ggplot2
library(ggplot2)
ggplot(data=data, aes(x=a, y=b, colour=c)) + geom_point()
There are lots of situations where I use ggplot to create a nice looking graph, but I would like to play around with the colors/shapes/sizes for data belonging to a certain group (e.g. to highlight it).
I understand how to set these properties differently for each group when I first create the plot. However, I would like to know if there is a simple command to change the properties after the plot has been created (preferably without having to specify the properties for all other subsets).
As an example consider the following code:
library(ggplot2)
x = seq(0,1,0.2)
y = seq(0,1,0.2)
types = c("a","a","a","b","b","c")
df = data.frame(x,y,types)
table_of_colors = c("a"="red","b"="blue","c"="green")
table_of_shapes = c("a"=15,"b"=15,"c"=16)
my_plot = ggplot(df) +
theme_bw() +
geom_point(aes(x=x,y=y,color=types,shape=types),size=10) +
scale_color_manual(values = table_of_colors) +
scale_shape_manual(values=table_of_shapes)
which produces the following plot:
I'm wondering:
Is there a way to change the color of the green point (type=="c") without having to type out the colors for the other points?
Is there a way to change the shape of the blue/red points (type %in% c("a","b")) without having to type out the shapes for all the other points?
The size of all points is currently set to 10. Is there a way to change the size of only the green point to say 15, while keeping the size of all remaining points at 10?
I'm not sure if this is an existing feature, but hacks are welcome (so long as the changes will be reflected in the legend).
This seems kind of hacky to me, but the code below addresses items 1 and 2 in your list:
my_plot +
scale_colour_manual(values=c(table_of_colors[1:2],c="green")) +
scale_shape_manual(values=c(a=4,b=6, table_of_shapes[3]))
I thought maybe you could change the size with something like scale_size_manual(values=c(10,10,15)), but that doesn't work, perhaps because size was hard-coded, rather than set with an aesthetic to begin with.
It would probably be cleaner to just create new vectors of shapes, colors, etc., as needed, rather than to make individual ad hoc changes like those above.
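One way to follow that advice is sketched below, under the assumption that the plot can be rebuilt with size mapped to `types` (the original hard-codes `size=10`, so this changes the plot definition rather than patching it afterwards). The named lookup vectors are kept, single entries are overwritten by name, and all three are passed to manual scales. The vector `table_of_sizes` and the new colour and shape values are illustrative choices, not from the question.

```r
library(ggplot2)

df <- data.frame(x = seq(0, 1, 0.2), y = seq(0, 1, 0.2),
                 types = c("a", "a", "a", "b", "b", "c"))

table_of_colors <- c(a = "red", b = "blue", c = "green")
table_of_shapes <- c(a = 15, b = 15, c = 16)
table_of_sizes  <- c(a = 10, b = 10, c = 10)  # hypothetical, not in the question

# Overwrite only the entries that should change; the rest stay as they were
table_of_colors["c"]         <- "purple"
table_of_shapes[c("a", "b")] <- 17
table_of_sizes["c"]          <- 15

# Size is now an aesthetic, so scale_size_manual() can act on it
my_plot <- ggplot(df, aes(x = x, y = y, color = types,
                          shape = types, size = types)) +
  theme_bw() +
  geom_point() +
  scale_color_manual(values = table_of_colors) +
  scale_shape_manual(values = table_of_shapes) +
  scale_size_manual(values = table_of_sizes)
```

Because colour, shape and size are all mapped to the same variable, the changes should show up together in a single legend.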
I am looking to create a plot in R that shows the relative change of some variables between two factors. I would like to stack them to reduce redundant text and make it easy to visually compare the changes between the two factors. I would like it to look something like this: http://postimg.org/image/clmw5zj37/.
where the lines (or bars) represent the relative change (y) in each variable (x), a solid circle (or any other symbol) represents no change, and an asterisk indicates that the change is statistically significant. Does anyone have an idea of how to accomplish this in R?
This?
set.seed(1)
df <- data.frame(x=toupper(letters[1:10]),
y=rnorm(20,0,50),
sig=sample(0:1,20,replace=T),
factor=rep(c("Factor1","Factor2"),each=10))
library(ggplot2)
ggplot(df) +
geom_point(aes(x=x,y=y),shape=1,size=3)+
geom_linerange(aes(x=x,ymin=0,ymax=y))+
geom_text(data=df[df$sig==1,], aes(x=x,y=y+10*sign(y)),label="*",size=10)+
geom_hline(yintercept=0)+
facet_grid(factor~.)
Note that it is considered polite to provide a representative dataset. See this link for the way to formulate a question well.
Edit In response to OP's comment.
To plot points only when y=0, set data=df[df$y==0,] in the call to geom_point(...). Vertical alignment of the stars can be done using vjust= in the call to geom_text(...). So, this code:
set.seed(1)
df <- data.frame(x=toupper(letters[1:10]),
y=rnorm(20,0,50),
sig=sample(0:1,20,replace=T),
factor=rep(c("Factor1","Factor2"),each=10))
df[sample(1:nrow(df),4),2:3]=0 # add some zeros to example
library(ggplot2)
ggplot(df) +
geom_point(data=df[df$y==0,],aes(x=x,y=y),size=5)+
geom_linerange(aes(x=x,ymin=0,ymax=y))+
geom_text(data=df[df$sig==1,], aes(x=x,y=y+10*sign(y)),
label="*", size=10, vjust=+0.65)+
geom_hline(yintercept=0)+
facet_grid(factor~.)
generates this plot:
I have performed a multidimensional cluster analysis in matlab. For each cluster, I have calculated mean and covariance (assuming conditional independence).
I have chosen two or three dimensions out of my raw data and plotted it into a scatter or scatter3 plot.
Now I would like to add the cluster means and the corresponding standard deviations to the same plot.
In other words, I want to add some data points with error bars to a scatter plot.
This question is almost what I want. But I would be ok with bars instead of boxes and I wonder if in that case there is a built-in way to do it with less effort.
Any suggestions on how to do that?
Once you realize that line segments will probably suffice for your purpose (and may be less ugly than the usual error bars with the whiskers, depending on the number of points), you can do something pretty simple (which applies to probably any plotting package, not just MATLAB).
Just plot a scatter, then write a loop to plot all line-segments you want corresponding to error bars (or do it in the opposite order like I did with error bars first then the scatter plot, depending if you want your dots or your error bars on top).
Here is the simple MATLAB code, along with an example figure showing error bars in two dimensions (sorry for the boring near-linearity):
As you can see, you can plot error bars for each axis in different colors to aid in visualization.
function scatterError(x, y, xe, ye, varargin)
%Brandon Barker 01/20/2014
nD = length(x);
%Make these defaults later:
dotColor = [1 0.3 0.3]; % conservative pink
yeColor = [0, 0.4, 0.8]; % bright navy blue
xeColor = [0.35, 0.35, 0.35]; % not-too-dark grey
dotSize = 23;
figure();
set(gcf, 'Position', get(0,'Screensize')); % Maximize figure.
set(gca, 'FontSize', 23);
hold all;
for i = 1:nD
plot([(x(i) - xe(i)) (x(i) + xe(i))], [y(i) y(i)], 'Color', xeColor);
plot([x(i) x(i)], [(y(i) - ye(i)) (y(i) + ye(i))], 'Color', yeColor);
end
scatter(x, y, dotSize, repmat(dotColor, nD, 1));
set(gca, varargin{:});
axis square;
With some extra work, it wouldn't be too hard to add whiskers to your error bars if you really want them.
If you are not too picky about what the graph looks like and are looking for performance, a builtin function is indeed often a good choice.
My first thought would be to try using a boxplot. It has quite a lot of options, so one combination of them will probably give you the result you need.
Sidenote: At first sight the answer you referred to does not look very inefficient so you may have to manage your expectations when it comes to achievable speedups.