If I have a head(df) like:
feature Comparison Primary diff key
1 work 15.441176 20.588235 5.1470588 1
2 employee 22.794118 19.117647 -3.6764706 2
3 good 11.029412 11.764706 0.7352941 3
4 improve 8.088235 10.294118 2.2058824 4
5 career 2.941176 8.823529 5.8823529 5
6 manager 2.941176 8.823529 5.8823529 6
and I'm trying to plot something with:
p = ggplot(x, aes(x = feature,size=8)) + geom_point(aes(y = Primary)) +
geom_point(aes(y=Comparison)) + coord_flip()
ggplotly(p)
Is there something I'm missing that causes p not to plot the order of the data above? the first five on the plot are
work
train
time
skill
people
But according to the df, it should be work, employee, good, improve, career.
There are these things called "levels" which ggplot uses to determine the order things should appear in the plot. If you ran levels(x$feature) in the console, then I bet the list you see has the same order as what appears in the plot.
To have them show up in the order you want, you can just have to override the "levels" for the feature column.
x$feature = factor(x$feature, levels = c("work",
"employee",
"good",
"improve",
"manager"))
Related
I have a dataset (df) that looks like this:
EIN Year Cat Fund
1 16 2005 A 9784.490
2 16 2006 A 10020.720
3 16 2007 A 9232.796
4 15 2008 B 8567.893
5 15 2009 B 10292.670
6 17 2010 C 9274.589
The data has relatively large dimensions (around 300k observations), which makes plotting a potentially slow process. I would like to plot the variable Fund for each year, by the identifier EIN. Based on this post I have tried the following code:
library(ggplot2)
ggplot(df, mapping = aes(x = Year, y = Fund)) +
geom_line(aes(linetype = as.factor(EIN)))
Here are my questions:
This code becomes pretty slow given the high amount of observations that I have. Do you suggest any alternatives that could speed up the process?
Since I have a huge number of EINs, the legend ends-up taking all the space available for the graph, so I would like to get rid of it unsuccesfully. I tried adding + guides(fill=FALSE) at the end, but it did not work. Any advice?
If I wanted to either subset or color code my plot by Cat, what would be the best way to do it?
Thanks a lot for your help!
You can get rid of the legend using:
+ theme(legend.position = 'none')
To subset (facet) your plot, especially if there aren't too many categories, use facet_wrap:
+ facet_wrap(~Cat)
To colour instead, put colour = Cat inside your aes() calll.
I'm new to R and feeling a bit lost ... I'm working on a dataset which contains 7 point-likert-scale answers.
My data looks like this for example:
My goal is to create a barplot which displays the likert scale on the x-lab and frequency on y-lab.
What I understood so far is that I first have to transform my data into a frequency table. For this I used a code that I found in another post on this site:
data <- factor(data, levels = c(1:7))
table(data)
However I always get this output:
data
1 2 3 4 5 6 7
0 0 0 0 0 0 0
Any ideas what went wrong or other ideas how I could realize my plan?
Thanks a lot!
Lorena
This is a very simple way of handling your question, only using base-R
## your data
my_obs <- c(4,5,3,4,5,5,3,3,3,6)
## use a factor for class data
## you could consider making it ordered (ordinal data)
## which makes sense for Likert data
## type "?factor" in the console to see the documentation
my_factor <- factor(my_obs, levels = 1:7)
## calculate the frequencies
my_table <- table(my_factor)
## print my_table
my_table
# my_factor
# 1 2 3 4 5 6 7
# 0 0 4 2 3 1 0
## plot
barplot(my_table)
yielding the following simple barplot:
Please, let me know whether this is what you want
Lorena!
First, there's no need to apply factor() neither table() in the dataset you showed. From what I gather, it looks fine.
R comes with some interesting plotting options, hist() is one of them.
Histogram with hist()
In the following example, I'll use the "Valenz" variable, as named in your dataset.
To get the frequency without needing to beautify it, you can simply ask:
hist(dataset, Valenz)
The first argument (dataset) informs where these values are; the second argument (Valenz) informs which values from dataset you want to use.
If you only want to know the frequency, without having to inform it in some elegant way, that oughta do it (:
Histogram with ggplot()
If you want to make it prettier, you can style your plot with the ggplot2 package, one of the most used packages in R.
First, install and then load the package.
install.packages("ggplot2")
library(ggplot2)
Then, create a histogram with x as the number of times some score occurred.
ggplot(dataset, aes(x = Valenz)) +
geom_histogram(bins = 7, color = "Black", fill = "White") +
labs(title = NULL, x = "Name of my variable", y = "Count of 'Variable'") +
theme_minimal()
ggplot() takes the value of your dataframe, then aes() specifies you want Valenz to be in the x-axis.
geom_histogram() gives you a histogram with "bins = 7" (7 options, since it's a likert scale), and the bars with "color = 'Black'" and "fill = 'White'".
labs() specifies the labels that appear beneath x ("x = "Name of my variable") and then by y (y = "Count of 'Variable'").
theme_minimal() makes the plot look cooler.
I hope I helped you in some way, Lorena. (:
I am looking for a way where data points are connected following a top-down manner to visualize a ranking. In that the y-axis represents the rank and the x-axis the attributes. With the normal setting the line connects the point starting from left to right. This results that the points are connected in the wrong order.
With the data below the line should be connected from (6,1) to (4,2) and then (5,3) etc. Optimally the ranking scale need to be inverted so that rank one starts on the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates. The next command (coord_flip()) flips x- and y-axis, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.
I am trying to create an interaction plot, and R is throwing the error geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?, which I don't understand why. Below is my data frame:
topPagesCount DIRTY_INDUSTRY IND_DIRTY_HETEROGENEITY
1 10 1.4444444 1.1727001
2 831 1.4444444 1.1727001
3 1 0.8218391 0.4599108
4 0 0.8218391 0.4599108
5 0 0.8821549 0.4870270
6 30 0.8190476 0.6582197
7 26 0.8218391 0.4599108
8 0 1.4444444 1.1727001
9 7 0.8821549 0.4870270
10 398 0.8218391 0.4599108
and below is my code:
greatDF$DIRTY_INDUSTRY_fac <- as.factor (greatDF$DIRTY_INDUSTRY)
ggplot(data = greatDF, aes(x = IND_DIRTY_HETEROGENEITY, y=topPagesCount,
colour=DIRTY_INDUSTRY_fac, group=DIRTY_INDUSTRY_fac))+
stat_summary(fun.y=mean, geom="point")+
stat_summary(fun.y=mean, geom="line")
I don't see any reason for the error because clearly, there are more than 1 type of value for my response variable topPagesCount for the interaction term DIRTY_INDUSTRY:IND_DIRTY_HETEROGNEITY...am I right? maybe I am misunderstanding something...
thanks,
The reason this happens, as #Troy points out, is because the grouping itself is meaningless for geom_line() or geom_path(). There are no points to be connected with lines at all!
That's why everything is OK when you remove the last line. Note that this "error" is not an actual error, it plots the legend as it is intended to look, there is not a single actual line that should be plotted according to your aesthetics and stats.
How to fix this? Well, that depends on what you are trying to achieve, as usual. Note the difference between your code and mine:
ggplot(data = greatDF, aes(x = IND_DIRTY_HETEROGENEITY, y=topPagesCount,
colour=DIRTY_INDUSTRY_fac, group=DIRTY_INDUSTRY_fac)) +
geom_line(size=1.4) +
geom_point(size=5, shape=10) +
stat_summary(fun.y=mean, geom="point", size=5)
Is my guess correct? You may see this question for more insights on the topic.
I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(x,y,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot x and y on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both a and b vectors (different colors) on x-axis and number of days on y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is via a the approach of melting the data (as you have mentioned). Especially if you are going to add more vectors. This is the code
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
A simpler solution is to use only ggplot. The following code will work in your case
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
ggplot(data = df)+
geom_line(aes(x = df$days,y = df$a), color = "blue")+
geom_line(aes(x = df$days,y = df$b), color = "red")
I added the colors, you might want to use them to differentiate between your variables.