I'm new to R and feeling a bit lost ... I'm working on a dataset which contains 7 point-likert-scale answers.
My data looks like this for example:
My goal is to create a barplot which displays the likert scale on the x-lab and frequency on y-lab.
What I understood so far is that I first have to transform my data into a frequency table. For this I used a code that I found in another post on this site:
data <- factor(data, levels = c(1:7))
table(data)
However I always get this output:
data
1 2 3 4 5 6 7
0 0 0 0 0 0 0
Any ideas what went wrong or other ideas how I could realize my plan?
Thanks a lot!
Lorena
This is a very simple way of handling your question, only using base-R
## your data
my_obs <- c(4,5,3,4,5,5,3,3,3,6)
## use a factor for class data
## you could consider making it ordered (ordinal data)
## which makes sense for Likert data
## type "?factor" in the console to see the documentation
my_factor <- factor(my_obs, levels = 1:7)
## calculate the frequencies
my_table <- table(my_factor)
## print my_table
my_table
# my_factor
# 1 2 3 4 5 6 7
# 0 0 4 2 3 1 0
## plot
barplot(my_table)
yielding the following simple barplot:
Please, let me know whether this is what you want
Lorena!
First, there's no need to apply factor() neither table() in the dataset you showed. From what I gather, it looks fine.
R comes with some interesting plotting options, hist() is one of them.
Histogram with hist()
In the following example, I'll use the "Valenz" variable, as named in your dataset.
To get the frequency without needing to beautify it, you can simply ask:
hist(dataset, Valenz)
The first argument (dataset) informs where these values are; the second argument (Valenz) informs which values from dataset you want to use.
If you only want to know the frequency, without having to inform it in some elegant way, that oughta do it (:
Histogram with ggplot()
If you want to make it prettier, you can style your plot with the ggplot2 package, one of the most used packages in R.
First, install and then load the package.
install.packages("ggplot2")
library(ggplot2)
Then, create a histogram with x as the number of times some score occurred.
ggplot(dataset, aes(x = Valenz)) +
geom_histogram(bins = 7, color = "Black", fill = "White") +
labs(title = NULL, x = "Name of my variable", y = "Count of 'Variable'") +
theme_minimal()
ggplot() takes the value of your dataframe, then aes() specifies you want Valenz to be in the x-axis.
geom_histogram() gives you a histogram with "bins = 7" (7 options, since it's a likert scale), and the bars with "color = 'Black'" and "fill = 'White'".
labs() specifies the labels that appear beneath x ("x = "Name of my variable") and then by y (y = "Count of 'Variable'").
theme_minimal() makes the plot look cooler.
I hope I helped you in some way, Lorena. (:
Related
I have scripted a ggplot compiled from two separate data frames, but as it stands there is no legend as the colours aren't included in aes. I'd prefer to keep the two datasets separate if possible, but can't figure out how to add the legend. Any thoughts?
I've tried adding the colours directly to the aes function, but then colours are just added as variables and listed in the legend instead of colouring the actual data.
Plotting this with base r, after creating the plot I would've used:
legend("top",c("Delta 18O","Delta 13C"),fill=c("red","blue")
and gotten what I needed, but I'm not sure how to replicate this in ggplot.
The following code currently plots exactly what I want, it's just missing the legend... which ideally should match what the above line would produce, except the "18" and "13" need superscripted.
Examples of an old plot using base r (with a correct legend, except lacking superscripted 13 and 18) and the current plot missing the legend can be found here:
Old: https://imgur.com/xgd9e9C
New, missing legend: https://imgur.com/eGRhUzf
Background data
head(avar.data.x)
time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470
head(avar.data.y)
time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470
The following avarn function produces a data frame with three columns and several thousand rows (see header above). These are then graphed over time on a log/log plot.
avar.data.x <- avarn(data3$"d Intl. Std:d 13C VPDB - Value",frequency)
avar.data.y <- avarn(data3$"d Intl. Std:d 18O VPDB-CO2 - Value",frequency)
Create allan deviation plot
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av)),color="red")+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av)),color="blue")+
scale_x_log10()+
scale_y_log10()+
labs(x=expression(paste("Averaging Time ",tau," (seconds)")),y="Allan Deviation (per mil)")
The above plot is only missing a legend to show the name of the two plotted datasets and their respective colours. I would like the legend in the top centre of the graph.
How to superscript legend titles?:
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av),
color =expression(paste("Delta ",18^,"O"))))+
geom_line(data=avar.data.xmod,aes(x=time,y=sqrt(av),
color=expression(paste("Delta ",13^,"C"))))+
scale_color_manual(values = c("blue", "red"),name=NULL) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
Set color inside the aes and add a scale_color_ function to your plot should do the trick.
ggplot()+
geom_line(data=avar.data.y,aes(x=time,y=sqrt(av), color = "a"))+
geom_line(data=avar.data.x,aes(x=time,y=sqrt(av), color="b"))+
scale_color_manual(
values = c("red", "blue"),
labels = expression(avar.data.x^2, "b")
) +
scale_x_log10()+
scale_y_log10()+
labs(
x=expression(paste("Averaging^2 Time ",tau," (seconds)")),
y="Allan Deviation (per mil)") +
theme(legend.position = c(0.5, 0.9))
You can make better use of ggplot's aesthetics by combining both data sets into one. This is particularly easy when your data frames have the same structure. Here, you could then for example use color.
This way you only need one call to geom_line and it is easier to control the legend(s). You could even make some fancy function to automate your labels. etc.
Also note that white spaces in column names are not great (you're making your own life very difficult) and that you may want to think about automating your avarn calls, e.g. with lapply, which would result in a list of data frames and makes the binding of the data frames even easier.
avar.data.x <- readr::read_table("0 time av error
1 1.015223 0.030233604 0.003726832
2 2.030445 0.014819145 0.005270609
3 3.045668 0.010054801 0.006455241
4 4.060891 0.007477541 0.007453974
5 5.076113 0.006178282 0.008333912
6 6.091336 0.004949045 0.009129470")
avar.data.y <- readr::read_table("0 time av error
1 1.015223 0.06810001 0.003726832
2 2.030445 0.03408136 0.005270609
3 3.045668 0.02313839 0.006455241
4 4.060891 0.01737148 0.007453974
5 5.076113 0.01405144 0.008333912
6 6.091336 0.01172788 0.009129470")
library(tidyverse)
combine_df <- bind_rows(list(a = avar.data.x, b = avar.data.y), .id = 'ID')
ggplot(combine_df)+
geom_line(aes(x = time, y = sqrt(av), color = ID))+
scale_color_manual(values = c("red", "blue"),
labels = c(expression("Delta 18"^"O"), expression("Delta 13"^"C")))
Created on 2019-11-11 by the reprex package (v0.2.1)
This question already has answers here:
Plotting two variables as lines using ggplot2 on the same graph
(5 answers)
Closed 5 years ago.
This is my first time using ggplot2. I have a table of 3 columns and I want to plot frequency distribution of all three columns in one figure. I have only used hist() before so I am a little lost on this ggplot2. Here is an example of my table. Tab separated table with 3 columns A,B,C headers.
A B C
1.38502 1.38502 -nan
0.637291 0.753084 1.55556
0.0155242 0.0164394 -nan
3.29355 1.15757 -nan
1.00254 1.10108 0.132039
0.0155424 0.0155424 nan
0.760261 0.681639 0.298851
1.21365 1.21365 -nan
1.216 1.22541 -nan
0.61317 0.738528 0.585657
0.618276 0.940312 0.820591
1.96779 1.31051 1.58609
0.725413 2.29621 1.78989
0.684681 0.67331 0.290221
I have used the following code by looking up similar posts but I end up with error.
library(ggplot2)
dnds <- read.table('dNdS_plotfile', header =TRUE)
ggplot(data=dnds, melt(dnds), aes_(value, fill = L1))+
geom_histogram()
ERROR:No id variables; using all as measure variables
Error: Mapping should be created with aes() or aes_().
I am really lost on how to solve this error. I want one figure with three different colored histograms that do not overlap in my final figure. Please help me achieve this. Thank you.
This should accomplish what you're looking for. I like to load the package tidyverse, which loads a bunch of helpful packages like ggplot2 and dplyr.
In geom_histogram(), you can specify the bindwidth of the histograms with the argument binwidth() or the number of bins with bins(). If you also want the bars to not be stacked you can use the argument position = "dodge".
See the documentation here: http://ggplot2.tidyverse.org/reference/geom_histogram.html
library(tidyverse)
data <- read.table("YOUR_DATA", header = T)
graph <- data %>%
gather(category, value)
ggplot(graph, aes(x = value, fill = category)) +
geom_histogram(binwidth = 0.5, color = "black")
I am having trouble deciding how to graph the data I have.
It consists of overlapping quantities that represent a population, hence my decision to use a stacked bar.
These represent six population divisions ("groups") wherein group 1 and group 2 are the main division. Groups 4 to 6 are subgroups of two, and these are subgroups of each other. Its simple diagram is below:
Note: groups 1 and 2 complete the entire population or group 1 + group 2 = 100%.
I want all of these information in one chart which I do not know what and how to implement.
So far I have the one below, which is wrong because Group 1 is included in the main bar.
require(ggplot2)
require(reshape)
tab <- data.frame(
set=c("XXX","XXX","XXX","XXX","XXX","XXX"),
group=c("1","6","5","4","3","2"),
rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
dat <- melt(tab)
dat$time <- factor(dat$group,levels=dat$group)
ggplot(dat,aes(x=set)) +
geom_bar(aes(weight=value,fill=group),position="fill",color="#7F7F7F") +
scale_fill_brewer("Groups", palette="OrRd")
What do you guys suggest to visualize it? I want to use R and ggplot for consistency and uniformity with the other graphs I have made already.
Using facets you can divide your plot into two:
# changed value of set for group 1
tab <- data.frame(
set=c("UUU","XXX","XXX","XXX","XXX","XXX"),
group=c("1","6","5","4","3","2"),
rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
# explicitly defined id.vars
dat <- melt(tab, id.vars=c('set','group'))
dat$time <- factor(dat$group,levels=dat$group)
# added facet_wrap, in geom_bar aes changed weight to y,
# added stat="identity", changed position="stack"
ggplot(dat,aes(x=set)) +
geom_bar(aes(y=value,fill=group),position="stack", stat="identity", color="#7F7F7F") +
scale_fill_brewer("Groups", palette="OrRd") +
facet_wrap(~set, scale="free_x")
My guess is what you need is a treemap. Please correct me if I misunderstood your question.
here a link on Treemapping]1
If tree map is what you need you can use either portfolio package or googleVis.
assume the following frequency table in R, which comes out of a survey:
1 2 3 4 5 8
m 5 16 3 16 5 0
f 12 25 3 10 3 1
NA 1 0 0 0 0 0
The rows stand for the gender of the survey respondent (male/female/no answer). The colums represent the answers to a question on a 5 point scale (let's say: 1= agree fully, 2 = agree somewhat, 3 = neither agree nor disagree, 4= disagree somewhat, 5 = disagree fully, 8 = no answer).
The data is stored in a dataframe called "slm", the gender variable is called "sex", the other variable is called "tv_serien".
My problem is, that I don't find a (in my opinion) proper way to create a line chart, where the x-axis represents the 5-point scale (plus the don't know answers) and the y-axis represents the frequencies for every point on the scale. Furthemore I want to create two lines (one for males, one for females).
My solution so far is the following:
I create a plot without plotting the "content" and the x-axis:
plot(slm$tv_serien, xlim = c(1,6), ylim = c(0,100), type = "n", xaxt = "n")
The problem here is that it feels like cheating to specify the xlim=c(1,6), because the raw scores of slm$tv_serienare 100 values. I tried also to to plot the variable via plot(factor(slm$tv_serien)...), but then it would still create a metric scale from 1 to 8 (because the dont know answer is 8).
So my first question is how to tell R that it should take the six distinct values (1 to 5 and 8) and take that as the x-axis?
I create the new x axis with proper labels:
axis(1, 1:6, labels = c("1", "2", "3", "4", "5", "DK"))
At least that works pretty well. ;-)
Next I create the line for the males:
lines(1:5, table(slm$tv_serien[slm$sex == 1]), col = "blue")
The problem here is that there is no DK (=8) answer, so I manually have to specify x = 1:5 instead of 1:6 in the "normal" case. My question here is, how to tell R to also draw the line for nonexisting values? For example, what would have happened, if no male had answered with 3, but I want a continuous line?
At last I create the line for females, which works well:
lines(1:6, table(slm$tv_serien[slm$sex == 2], col = "red")
To summarize:
How can I tell R to take the 6 distinct values of slm$tv_serien as the x axis?
How can i draw continuous lines even if the line contains "0"?
Thanks for your help!
PS: Attached you find the current plot for the abovementiond functions.
PPS: I tried to make a list from "1." to "4." but it seems that every new list element started again with "1.". Sorry.
Edit: Response to OP's comment.
This directly creates a line chart of OP's data. Below this is the original answer using ggplot, which produces a far superior output.
Given the frequency table you provided,
df <- data.frame(t(freqTable)) # transpose (more suitable for plotting)
df <- cbind(Response=rownames(df),df) # add row names as first column
plot(as.numeric(df$Response),df$f,type="b",col="red",
xaxt="n", ylab="Count",xlab="Response")
lines(as.numeric(df$Response),df$m,type="b",col="blue")
axis(1,at=c(1,2,3,4,5,6),labels=c("Str.Agr.","Sl.Agr","Neither","Sl.Disagr","Str.Disagr","NA"))
Produces this, which seems like what you were looking for.
Original Answer:
Not quite what you asked for, but converting your frequency table to a data frame, df
df <- data.frame(freqTable)
df <- cbind(Gender=rownames(df),df) # append rownames (Gender)
df <- df[-3,] # drop unknown gender
df
# Gender X1 X2 X3 X4 X5 X8
# m m 5 16 3 16 5 0
# f f 12 25 3 10 3 1
df <- df[-3,] # remove unknown gender column
library(ggplot2)
library(reshape2)
gg=melt(df)
labels <- c("Agree\nFully","Somewhat\nAgree","Neither Agree\nnor Disagree","Somewhat\nDisagree","Disagree\nFully", "No Answer")
ggp <- ggplot(gg,aes(x=variable,y=value))
ggp <- ggp + geom_bar(aes(fill=Gender), position="dodge", stat="identity")
ggp <- ggp + scale_x_discrete(labels=labels)
ggp <- ggp + theme(axis.text.x = element_text(angle=90, vjust=0.5))
ggp <- ggp + labs(x="", y="Frequency")
ggp
Produces this:
Or, this, which is much better:
ggp + facet_grid(Gender~.)
I have a data like this
subject<-1:208
ev<-runif(208, min=1, max=2)
seeds<-gl(6,40,labels=c('seed1', 'seed2','seed3','seed4','seed5','seed6'),length=208)
ngambles<-gl(2,1, labels=c('4','32'))
trial<-rep(1:20, each= 2, length=208)
ngambles<-rep('4','32' ,each=1, length=208)
data<-data.frame(subject,ev,seeds,ngambles,trial)
the data looks like this
subject ev seeds ngambles trial
1 1.996717 seed1 4 1
2 1.280977 seed1 32 1
3 1.571648 seed1 4 2
4 1.153311 seed1 32 2
5 1.502559 seed1 4 3
6 1.644001 seed1 32 3
I plot a graph with rep as x axis and expected_value as y axis for each seed and n_gambles by this command.
qplot(trial,ev,data=data,
facets=ngambles~seeds,xlab="Trial", ylab="Expected Value", geom="line")+
opts(title = "Expected Value for Each Seed")
now I want to draw a new graph by aggregating ev for trial equal to 1-5, 6-10,11-15,and 16-20. I also want to draw an error bar.
I have no clue how to do in R
maybe somebody can help me
thanks in advance
Assuming that your data frame is called df. First, added new column ag that show to which interval original trial value will belong with function cut().
df$ag<-cut(df$trial,c(1,6,11,16,21),right=FALSE)
Now there is two possibilities - first, aggregate your data using stat_.. functions of ggplot2. There is stat_summary() function already defined and then you should define also stat_sum_df() function (taken from stat_summary() help file) to calculate more than one summary value.
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="red", geom=geom, width=0.2, ...)
}
With stat_sum_df() and argument "mean_cl_normal" calculate confidence intervals to use in geom="errorbar" and with stat_summary() mean value for geom="line". As x value use new column ag. With scale_x_discrete() you can get right labels for x axis.
ggplot(df, aes(ag,ev,group=seeds))+stat_sum_df("mean_cl_normal",geom="errorbar")+
stat_summary(fun.y="mean",geom="line",color="red")+
facet_grid(ngambles~seeds)+
scale_x_discrete(labels=c("1-5","6-10","11-15","16-20"))
Second approach is to summarize data before plotting, for example, with function ddply() from library plyr. Also in this case you need column ag made in first example. And then use new data for plotting.
library(plyr)
df.new<-ddply(df,.(ag,seeds,ngambles),summarise,ev.m=mean(ev),
ev.lim=qt(0.975,length(ev)-1)*sd(ev)/sqrt(length(ev)))
ggplot(df.new,aes(ag,group=seeds))+
geom_errorbar(aes(y=ev.m,ymin=ev.m-ev.lim,ymax=ev.m+ev.lim))+
geom_line(aes(y=ev.m))+
facet_grid(ngambles~seeds)+
scale_x_discrete(labels=c("1-5","6-10","11-15","16-20"))