R plot function - axes for a line chart

Assume the following frequency table in R, which comes from a survey:
     1  2  3  4  5  8
m    5 16  3 16  5  0
f   12 25  3 10  3  1
NA   1  0  0  0  0  0
The rows stand for the gender of the survey respondent (male/female/no answer). The columns represent the answers to a question on a 5-point scale (let's say: 1 = agree fully, 2 = agree somewhat, 3 = neither agree nor disagree, 4 = disagree somewhat, 5 = disagree fully, 8 = no answer).
The data is stored in a dataframe called "slm", the gender variable is called "sex", the other variable is called "tv_serien".
My problem is that I can't find what I would consider a proper way to create a line chart where the x-axis represents the 5-point scale (plus the don't know answers) and the y-axis represents the frequencies for every point on the scale. Furthermore, I want to create two lines (one for males, one for females).
My solution so far is the following:
I create a plot without plotting the "content" and the x-axis:
plot(slm$tv_serien, xlim = c(1,6), ylim = c(0,100), type = "n", xaxt = "n")
The problem here is that it feels like cheating to specify xlim = c(1,6), because slm$tv_serien holds 100 raw scores. I also tried to plot the variable via plot(factor(slm$tv_serien)...), but then it would still create a metric scale from 1 to 8 (because the don't know answer is 8).
So my first question is how to tell R that it should take the six distinct values (1 to 5 and 8) and take that as the x-axis?
I create the new x axis with proper labels:
axis(1, 1:6, labels = c("1", "2", "3", "4", "5", "DK"))
At least that works pretty well. ;-)
Next I create the line for the males:
lines(1:5, table(slm$tv_serien[slm$sex == 1]), col = "blue")
The problem here is that there is no DK (=8) answer, so I have to specify x = 1:5 manually instead of 1:6 as in the "normal" case. My question here is how to tell R to also draw the line through nonexistent values. For example, what would happen if no male had answered with 3, but I still wanted a continuous line?
At last I create the line for females, which works well:
lines(1:6, table(slm$tv_serien[slm$sex == 2]), col = "red")
To summarize:
How can I tell R to take the 6 distinct values of slm$tv_serien as the x axis?
How can I draw continuous lines even if the line contains "0"?
Thanks for your help!
PS: Attached you find the current plot for the above-mentioned functions.
PPS: I tried to make a list from "1." to "4." but it seems that every new list element started again with "1.". Sorry.

Edit: Response to OP's comment.
This directly creates a line chart of OP's data. Below this is the original answer using ggplot, which produces a far superior output.
Given the frequency table you provided,
df <- data.frame(t(freqTable)) # transpose (more suitable for plotting)
df <- cbind(Response=rownames(df),df) # add row names as first column
x <- seq_len(nrow(df)) # plot positions 1..6; works whether Response is a factor or character
plot(x,df$f,type="b",col="red",
xaxt="n", ylab="Count",xlab="Response")
lines(x,df$m,type="b",col="blue")
axis(1,at=c(1,2,3,4,5,6),labels=c("Str.Agr.","Sl.Agr","Neither","Sl.Disagr","Str.Disagr","NA"))
Produces this, which seems like what you were looking for.
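The two questions in the post (using only the six distinct values on the x-axis, and keeping the line continuous through zero counts) can both be handled by turning the answers into a factor with explicit levels before tabulating. The sketch below assumes raw data shaped like the question's slm data frame; the small data set is made up for illustration and is not the OP's survey.

```r
# Hypothetical raw data standing in for slm (assumption: sex is coded 1/2,
# answers are 1-5 plus 8 for "don't know").
slm <- data.frame(
  sex       = c(1, 1, 1, 2, 2, 2, 2, 1),
  tv_serien = c(1, 2, 2, 1, 2, 4, 8, 5)
)

# Explicit levels keep categories with zero counts in the table.
lev <- c(1, 2, 3, 4, 5, 8)
slm$tv_f <- factor(slm$tv_serien, levels = lev)
counts <- table(slm$sex, slm$tv_f)  # one row per sex, always 6 columns

# Plot against positions 1..6 so the gap between 5 and 8 disappears.
x <- seq_along(lev)
plot(x, counts["1", ], type = "b", col = "blue", xaxt = "n",
     ylim = c(0, max(counts)), xlab = "Response", ylab = "Count")
lines(x, counts["2", ], type = "b", col = "red")
axis(1, at = x, labels = c("1", "2", "3", "4", "5", "DK"))
```

Because every level appears in the table, both lines have six points and no x vector has to be shortened by hand.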
Original Answer:
Not quite what you asked for, but start by converting your frequency table to a data frame, df:
df <- data.frame(freqTable)
df <- cbind(Gender=rownames(df),df) # append rownames (Gender)
df <- df[-3,] # drop unknown gender
df
# Gender X1 X2 X3 X4 X5 X8
# m m 5 16 3 16 5 0
# f f 12 25 3 10 3 1
library(ggplot2)
library(reshape2)
gg <- melt(df, id.vars="Gender") # long format: Gender, variable, value
labels <- c("Agree\nFully","Somewhat\nAgree","Neither Agree\nnor Disagree","Somewhat\nDisagree","Disagree\nFully", "No Answer")
ggp <- ggplot(gg,aes(x=variable,y=value))
ggp <- ggp + geom_bar(aes(fill=Gender), position="dodge", stat="identity")
ggp <- ggp + scale_x_discrete(labels=labels)
ggp <- ggp + theme(axis.text.x = element_text(angle=90, vjust=0.5))
ggp <- ggp + labs(x="", y="Frequency")
ggp
Produces this:
Or, this, which is much better:
ggp + facet_grid(Gender~.)


How to create surface plot in R

I'm currently trying to develop a surface plot that examines the results of the below data frame. I want to plot the increasing values of noise on the x-axis and the increasing values of mu on the y-axis, with the point estimate values on the z-axis. After looking at ggplot2 and ggplotly, it's not clear how I would plot each of these columns in surface or 3D plot.
df <- "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646"
Thanks in advance.
Here's one way using geom_tile(). First, you will want to get your data frame into more of a Tidy format, where the goal is to have columns:
mu: nothing changes here
noise: need to combine your "noise0", "noise1", ... columns together, and
z: serves as the value of the noise and we will apply the fill= aesthetic using this column.
To do that, I'm using dplyr together with gather() and separate() from tidyr, but there are other ways (melt(), or pivot_longer() gets you that too). I'm also adding some code to pull out just the number portion of the "noise" columns and then reformatting that as an integer to ensure that you have x and y axes as numeric/integers:
library(dplyr) # for %>% and select()
library(tidyr) # for gather() and separate()
# assumes that df is your data as data.frame
df <- df %>% gather(key="noise", value="z", -mu)
df <- df %>% separate(col = "noise", into=c('x', "noise"), sep=5) %>% select(-x)
df$noise <- as.integer(df$noise)
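For comparison, the same reshaping can be sketched with pivot_longer(), which was mentioned above as an alternative. This assumes tidyr 1.1.0 or later (for names_transform); df_wide is a small made-up stand-in for the wide data, not the OP's full table.

```r
library(tidyr)

# Hypothetical wide data with the same column naming pattern as the question.
df_wide <- data.frame(
  mu     = 1:3,
  noise0 = c(1, 2, 3),
  noise1 = c(0.95, 1.96, 0.53)
)

# Stack every column except mu, strip the "noise" prefix from the old column
# names, and store what is left as an integer.
df_long <- pivot_longer(
  df_wide,
  cols            = -mu,
  names_to        = "noise",
  names_prefix    = "noise",
  names_transform = list(noise = as.integer),
  values_to       = "z"
)
df_long
```

This does the gather(), separate() and as.integer() steps in one call.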
Here's an example of how you could plot it, but aesthetics are up to you. I decided to also include geom_text() to show the actual values of df$z so that we can see better what's going on. Also, I'm using rainbow because "it's pretty" - you may want to choose a more appropriate quantitative comparison scale from the RColorBrewer package.
library(ggplot2)
ggplot(df, aes(x=noise, y=mu, fill=z)) + theme_bw() +
geom_tile() +
geom_text(aes(label=round(z, 2))) +
scale_fill_gradientn(colors = rainbow(5))
EDIT: To answer OP's follow up, yes, you can also showcase this via plotly. Here's a direct transition:
library(plotly)
p <- plot_ly(
df, x= ~noise, y= ~mu, z= ~z,
type='mesh3d', intensity = ~z,
colors= colorRamp(rainbow(5))
)
p
Static image here:
A much more informative way to show this particular set of information is to see the variation of df$z as it relates to df$mu by creating df$delta_z and then using that to plot. (you can also plot via ggplot() + geom_tile() as above):
df$delta_z <- df$z - df$mu
p1 <- plot_ly(
df, x= ~noise, y= ~mu, z= ~delta_z,
type='mesh3d', intensity = ~delta_z,
colors= colorRamp(rainbow(5))
)
Giving you this (static image here):
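The delta_z idea also carries over to the 2D tile plot. Here is a minimal sketch, assuming long-format data with mu, noise and z columns as built earlier; df_demo is made up for illustration, and the diverging blue/white/red scale is my own choice rather than anything from the plots above.

```r
library(ggplot2)

# Hypothetical long-format data (same column names as the reshaped df).
df_demo <- data.frame(
  mu    = rep(1:3, times = 2),
  noise = rep(c(0L, 1L), each = 3),
  z     = c(1, 2, 3, 0.95, 1.96, 0.53)
)

# Deviation of the point estimate from the true mu.
df_demo$delta_z <- df_demo$z - df_demo$mu

# A diverging scale centred on 0 makes over- and under-estimates easy to spot.
p <- ggplot(df_demo, aes(x = noise, y = mu, fill = delta_z)) +
  geom_tile() +
  geom_text(aes(label = round(delta_z, 2))) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_bw()
print(p)
```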
ggplot accepts data in the long format, which means that you need to melt your dataset using, for example, a function from the reshape2 package:
library(reshape2)
dfLong = melt(df,
id.vars = "mu",
variable.name = "noise",
value.name = "meas")
The resulting column noise contains entries such as noise0, noise1, etc. You can extract the numbers and convert to a numeric column:
dfLong$noise = with(dfLong, as.numeric(gsub("noise", "", noise)))
This converts your data to:
mu noise meas
1 1 0 1.0000000
2 2 0 2.0000000
3 3 0 3.0000000
...
As per ggplot documentation:
ggplot2 can not draw true 3D surfaces, but you can use geom_contour(), geom_contour_filled(), and geom_tile() to visualise 3D surfaces in 2D.
So, for example:
ggplot(dfLong,
aes(x = noise,
y = mu,
fill = meas)) +
geom_tile() +
scale_fill_gradientn(colours = terrain.colors(10))
Produces:

R: creating a likert scale barplot

I'm new to R and feeling a bit lost ... I'm working on a dataset which contains 7 point-likert-scale answers.
My data looks like this for example:
My goal is to create a barplot which displays the Likert scale on the x-axis and the frequency on the y-axis.
What I understood so far is that I first have to transform my data into a frequency table. For this I used a code that I found in another post on this site:
data <- factor(data, levels = c(1:7))
table(data)
However I always get this output:
data
1 2 3 4 5 6 7
0 0 0 0 0 0 0
Any ideas what went wrong or other ideas how I could realize my plan?
Thanks a lot!
Lorena
This is a very simple way of handling your question, only using base-R
## your data
my_obs <- c(4,5,3,4,5,5,3,3,3,6)
## use a factor for class data
## you could consider making it ordered (ordinal data)
## which makes sense for Likert data
## type "?factor" in the console to see the documentation
my_factor <- factor(my_obs, levels = 1:7)
## calculate the frequencies
my_table <- table(my_factor)
## print my_table
my_table
# my_factor
# 1 2 3 4 5 6 7
# 0 0 4 2 3 1 0
## plot
barplot(my_table)
yielding the following simple barplot:
Please, let me know whether this is what you want
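One possible reason for the all-zero table in the question (an assumption, since the original data is not shown) is that factor() was applied to the whole data frame instead of to one column. The sketch below uses a made-up data frame named dataset with a column Valenz to show the difference.

```r
# Hypothetical stand-in for the data in the question (assumption: a data frame
# with a numeric column named Valenz).
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# factor() on the whole data frame: the data frame is converted to text as a
# whole, nothing matches the levels 1..7, so every count is 0.
wrong <- table(factor(dataset, levels = 1:7))
wrong

# factor() on the column itself gives the expected counts.
right <- table(factor(dataset$Valenz, levels = 1:7))
right
```

If the first call reproduces the output from the question, selecting the column with $ is the fix.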
Lorena!
First, there's no need to apply factor() or table() to the dataset you showed. From what I gather, it looks fine.
R comes with some interesting plotting options, hist() is one of them.
Histogram with hist()
In the following example, I'll use the "Valenz" variable, as named in your dataset.
To get the frequency without needing to beautify it, you can simply ask:
hist(dataset$Valenz)
hist() expects a numeric vector, so dataset$Valenz pulls the Valenz column out of dataset (the data frame that holds your values).
If you only want to know the frequency, without having to inform it in some elegant way, that oughta do it (:
Histogram with ggplot()
If you want to make it prettier, you can style your plot with the ggplot2 package, one of the most used packages in R.
First, install and then load the package.
install.packages("ggplot2")
library(ggplot2)
Then, create a histogram with x as the number of times some score occurred.
ggplot(dataset, aes(x = Valenz)) +
geom_histogram(bins = 7, color = "Black", fill = "White") +
labs(title = NULL, x = "Name of my variable", y = "Count of 'Variable'") +
theme_minimal()
ggplot() takes the value of your dataframe, then aes() specifies you want Valenz to be in the x-axis.
geom_histogram() gives you a histogram with "bins = 7" (7 options, since it's a likert scale), and the bars with "color = 'Black'" and "fill = 'White'".
labs() specifies the labels that appear beneath x (x = "Name of my variable") and next to y (y = "Count of 'Variable'").
theme_minimal() makes the plot look cooler.
I hope I helped you in some way, Lorena. (:
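Since Likert answers are discrete, a bar chart on a factor is a possible alternative to the histogram; it keeps empty categories on the axis and avoids any binning. This is a sketch with a made-up dataset and a helper column name (Valenz_f) that is not in the original post.

```r
library(ggplot2)

# Hypothetical data (assumption: Valenz holds answers on a 1-7 scale).
dataset <- data.frame(Valenz = c(4, 5, 3, 4, 5, 5, 3, 3, 3, 6))

# Explicit levels so that unused answers (here 1, 2 and 7) are still known.
dataset$Valenz_f <- factor(dataset$Valenz, levels = 1:7)

p <- ggplot(dataset, aes(x = Valenz_f)) +
  geom_bar(color = "black", fill = "white") +
  scale_x_discrete(drop = FALSE) +  # keep empty categories on the axis
  labs(x = "Name of my variable", y = "Count") +
  theme_minimal()
print(p)
```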

How can I have different color for each bar of stack barplots? in R

My question may be very simple but I couldn't find the answer!
I have a matrix with 12 entries and I made a stack barplot with barplot function in R.
With this code:
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
barplot(mydata, xlim=c(0,25),horiz=T,
legend.text = c("A","B","C","D","E","F"),
col=c("blue","green"),axisnames = T, main="Stack barplot")
Here is the image from the code:
What I want to do is give each of the groups (A:F, only the blue part) a different color, but I couldn't add more than two colors.
I would also like to know how I can start the plot from x=2 instead of 0.
I know it's possible to choose the range of x by using xlim=c(2,25), but when I do that, part of my bars fall outside the plot region and I get a picture like this:
What I want is to ignore the part of the bars below 2, start the x-axis from 2, and show the rest of the bars instead of having them drawn out of range.
Thank you in advance,
As already mentioned in the other answer, your desired output is not entirely clear. Here is another option using ggplot2. I think the difficulty here is reshaping the data; after that, the plot step is straightforward.
library(reshape2)
library(ggplot2)
## Set a seed to make your data reproducible
set.seed(1)
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
## transform your matrix to a named data.frame
myData <- setNames(as.data.frame(mydata),LETTERS[1:6])
## put the data in the long format
dd <- melt(t(myData))
## transform the fill variable to the desired behavior.
## I used cumsum to be sure to have a unique value for all Var2==1.
## maybe you should change this step if you want an alternate behavior
## (see other solution)
dd <- transform(dd,Var2 =ifelse(Var2==1,cumsum(Var2)+2,Var2))
## a simple bar plot
ggplot(dd) +
## use stat identity since you want to set the y aes
geom_bar(aes(x=Var1,fill=factor(Var2),y=value),stat='identity') +
## horizontal rotation and zooming
coord_flip(ylim = c(2, max(dd$value)*2)) +
theme_bw()
Another option using lattice package
I like the formula notation in lattice and its flexibility for flipping coordinates for example:
library(lattice)
barchart(Var1~value,groups=Var2,data=dd,stack=TRUE,
auto.key = list(space = "right"),
prepanel = function(x,y, ...) {
list(xlim = c(2, 2*max(x, na.rm = TRUE)))
})
You do this by using the "add" and "offset" arguments to barplot(), along with setting axes and axisnames FALSE to avoid double-plotting: (I'm throwing in my color-blind color palette, as I'm red-green color-blind)
# Conservative 8-color palette adapted for color blindness, with first color = "black".
# Wong, Bang. "Points of view: Color blindness." nature methods 8.6 (2011): 441-441.
colorBlind.8 <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",
yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
cols <- colorBlind.8[1:ncol(mydata)]
bar2col <- colorBlind.8[8]
barplot(mydata[1,], xlim=c(0,25), horiz=T, col=cols, axisnames=T,
legend.text=c("A","B","C","D","E","F"), main="Stack barplot")
barplot(mydata[2,], offset=mydata[1,], add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
For the second part of your question, the "offset" argument is used for the first set of bars also, and you change xlim and use xaxp to adjust the x-axis numbering, and of course you must also adjust the height of the first row of bars to remove the excess offset:
offset <- 2
h <- mydata[1,] - offset
h[h < 0] <- 0
barplot(h, offset=offset, xlim=c(offset,25), xaxp=c(offset,24,11), horiz=T,
legend.text=c("A","B","C","D","E","F"),
col=cols, axisnames=T, main="Stack barplot")
barplot(mydata[2,], offset=offset+h, add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
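The height adjustment above only trims the first row. When a first-row bar is shorter than the offset, the part of the second row that lies below the offset would also need trimming. As a sketch under that assumption, a small helper (clip_stack is a name I made up, it is not part of barplot) can compute the visible length of every stacked segment:

```r
# Hypothetical helper: visible length of each stacked segment once everything
# below `lower` is cut away. Expects a matrix with at least two rows.
clip_stack <- function(m, lower) {
  top    <- apply(m, 2, cumsum)  # upper edge of each segment within its stack
  bottom <- top - m              # lower edge of each segment
  pmax(top, lower) - pmax(bottom, lower)
}

set.seed(1)
mydata <- matrix(sample(1:12, replace = TRUE), nrow = 2, ncol = 6)
lower  <- 2
vis    <- clip_stack(mydata, lower)

cols <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
barplot(vis[1, ], offset = lower, xlim = c(lower, 25), horiz = TRUE,
        col = cols, names.arg = LETTERS[1:6], main = "Stack barplot")
barplot(vis[2, ], offset = lower + vis[1, ], add = TRUE, axes = FALSE,
        axisnames = FALSE, horiz = TRUE, col = "#CC79A7")
```

For a stack with segments 1 and 5 and lower = 2, the first segment disappears and the second one is drawn with length 4 (from 2 to 6).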
I'm not entirely sure if this is what you're looking for: 'A' has two values (x1 and x2), but your legend seems to hint otherwise.
Here is a way to approach what you want with ggplot. First we set up the data.frame (required for ggplot):
set.seed(1)
df <- data.frame(
name = letters[1:6],
x1=sample(1:6, replace=T),
x2=sample(1:6, replace=T))
name x1 x2
1 a 5 3
2 b 3 5
3 c 5 6
4 d 3 2
5 e 5 4
6 f 6 1
Next, ggplot requires it to be in a long format:
# Make it into ggplot format
require(dplyr); require(reshape2)
df <- df %>%
melt(id.vars="name")
name variable value
1 a x1 5
2 b x1 3
3 c x1 5
4 d x1 3
5 e x1 5
6 f x1 6
...
Now, as you want some bars to be a different colour, we need to give them an alternate name so that we can assign their colour manually.
df <- df %>%
mutate(variable=ifelse(
name %in% c("b", "d", "f") & variable == "x1",
"highlight_x1",
as.character(variable)))
name variable value
1 a x1 2
2 b highlight_x1 3
3 c x1 4
4 d highlight_x1 6
5 e x1 2
6 f highlight_x1 6
7 a x2 6
8 b x2 4
...
Next, we build the plot. This uses the standard colours:
require(ggplot2)
p <- ggplot(data=df, aes(y=value, x=name, fill=factor(variable))) +
geom_bar(stat="identity", colour="black") +
theme_bw() +
coord_flip(ylim=c(1,10)) # Zooms in on y = c(1,10)
Note that I use coord_flip (which in turn calls coord_cartesian) with the ylim=c(1,10) parameter to 'zoom in' on the data. It doesn't remove the data, it just ignores it (unlike setting the limits in the scale). Now, if you manually specify the colours:
p + scale_fill_manual(values = c(
"x1"="coral3",
"x2"="chartreuse3",
"highlight_x1"="cornflowerblue"))
I would like to simplify the proposed solution by @tedtoal, which was the finest one for me.
I wanted to create a barplot with different colors for each bar, without the need to use ggplot or lattice.
color_range<- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
barplot(c(1,6,2,6,1), col= color_range[1:length(c(1,6,2,6,1))])

Color Dependent Bar Graph in R

I'm a bit out of my depth with this one here. I have the following code that generates two equally sized matrices:
MAX<-100
m<-5
n<-40
success<-matrix(runif(m*n,0,1),m,n)
samples<-floor(MAX*matrix(runif(m*n),m))+1
the success matrix is the probability of success and the samples matrix is the corresponding number of samples that was observed in each case. I'd like to make a bar graph that groups each column together with the height being determined by the success matrix. The color of each bar needs to be a color (scaled from 1 to MAX) that corresponds to the number of observations (i.e., small samples would be more red, for instance, whereas high samples would be green perhaps).
Any ideas?
Here is an example with ggplot. First, get data into long format with melt:
library(reshape2)
data.long <- cbind(melt(success), melt(samples)[3])
names(data.long) <- c("group", "x", "success", "count")
head(data.long)
# group x success count
# 1 1 1 0.48513473 8
# 2 2 1 0.56583802 58
# 3 3 1 0.34541582 40
# 4 4 1 0.55829073 64
# 5 5 1 0.06455401 37
# 6 1 2 0.88928606 78
Note melt will iterate through the row/column combinations of both matrices the same way, so we can just cbind the resulting molten data frames. The [3] after the second melt is so we don't end up with repeated group and x values (we only need the counts from the second melt). Now let ggplot do its thing:
library(ggplot2)
ggplot(data.long, aes(x=x, y=success, group=group, fill=count)) +
geom_bar(position="stack", stat="identity") +
scale_fill_gradient2(
low="red", mid="yellow", high="green",
midpoint=mean(data.long$count)
)
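The long-format step can also be sketched without reshape2, which makes the pairing of the two matrices explicit: as.vector() walks a matrix column by column, and row() and col() supply the matching indices. The name data.long.base is mine, and set.seed(1) is added only to make the sketch reproducible.

```r
set.seed(1)
MAX <- 100
m <- 5
n <- 40
success <- matrix(runif(m * n, 0, 1), m, n)
samples <- floor(MAX * matrix(runif(m * n), m)) + 1

# Both matrices are flattened in the same column-major order, so element i of
# `success` and element i of `count` come from the same cell.
data.long.base <- data.frame(
  group   = as.vector(row(success)),
  x       = as.vector(col(success)),
  success = as.vector(success),
  count   = as.vector(samples)
)
head(data.long.base)
```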
Using @BrodieG's data.long, this plot might be a little easier to interpret.
library(ggplot2)
library(RColorBrewer) # for brewer.pal(...)
ggplot(data.long) +
geom_bar(aes(x=x, y=success, fill=count),colour="grey70",stat="identity")+
scale_fill_gradientn(colours=brewer.pal(9,"RdYlGn")) +
facet_grid(group~.)
Note that actual values are probably different because you use random numbers in your sample. In future, consider using set.seed(n) to generate reproducible random samples.
Edit [Response to OP's comment]
You get numbers for x-axis and facet labels because you start with matrices instead of data.frames. So convert success and samples to data.frames, set the column names to whatever your test names are, and prepend a group column with the "list of factors". Converting to long format is a little different now because the first column has the group names.
library(reshape2)
set.seed(1)
success <- data.frame(matrix(runif(m*n,0,1),m,n))
success <- cbind(group=rep(paste("Factor",1:nrow(success),sep=".")),success)
samples <- data.frame(floor(MAX*matrix(runif(m*n),m))+1)
samples <- cbind(group=success$group,samples)
data.long <- cbind(melt(success,id=1), melt(samples, id=1)[3])
names(data.long) <- c("group", "x", "success", "count")
One way to set a threshold color is to add a column to data.long and use that for fill:
threshold <- 25
data.long$fill <- with(data.long,ifelse(count>threshold,max(count),count))
Putting it all together:
library(ggplot2)
library(RColorBrewer)
ggplot(data.long) +
geom_bar(aes(x=x, y=success, fill=fill),colour="grey70",stat="identity")+
scale_fill_gradientn(colours=brewer.pal(9,"RdYlGn")) +
facet_grid(group~.)+
theme(axis.text.x=element_text(angle=-90,hjust=0,vjust=0.4))
Finally, when you have names for the x-axis labels they tend to get jammed together, so I rotated the names -90°.

Gantt style time line plot (in base R)

I have a dataframe that looks like this:
person n start end
1 sam 6 0 6
2 greg 5 6 11
3 teacher 4 11 15
4 sam 4 15 19
5 greg 5 19 24
6 sally 5 24 29
7 greg 4 29 33
8 sam 3 33 36
9 sally 5 36 41
10 researcher 6 41 47
11 greg 6 47 53
Where start and end are times or durations (sam spoke from 0 to 6; greg from 6 to 11 etc.). n is how long (in this case # of words) the person spoke. I want to plot this as a time line in base R (I eventually may ask a similar question using ggplot2 but this question is specific to base R [when I say base I mean the packages that come with a standard install]).
The y axis will be by person and the x axis will be time. Hopefully the final product looks something like this for the data above:
I would like to use base R to make this. I'm not sure how to approach this. My thoughts are to use a dot plot but leave out the dots, then go over this with square-ended segments. I'm not sure how this will work since the segments need numeric x and y points and the y axis is categorical. Another thought is to convert the factors to numeric (assign each factor a number), plot a blank scatterplot and then go over it with square-ended line segments. This could be a powerful tool in my field for looking at speech patterns.
I thank you in advance for your help.
PS the argument for square ended line segments is segments(... , lend=2) to save time looking this information up for those not familiar with all the segment arguments.
You say you want a base R solution, but you don't say why. Since this is one line of code in ggplot, I show this anyway.
library(ggplot2)
ggplot(dat, aes(colour=person)) +
geom_segment(aes(x=start, xend=end, y=person, yend=person), size=3) +
xlab("Duration")
Pretty similar to @John's approach, but since I did it, I will post it :)
Here's a generic function to plot a gantt (no dependencies):
plotGantt <- function(data, res.col='resources',
start.col='start', end.col='end', res.colors=rainbow(30))
{
#slightly enlarge Y axis margin to make space for labels
op <- par('mar')
par(mar = op + c(0,1.2,0,0))
minval <- min(data[,start.col],na.rm=T)
maxval <- max(data[,end.col],na.rm=T)
res.colors <- rev(res.colors)
resources <- sort(unique(data[,res.col]),decreasing=T)
plot(c(minval,maxval),
c(0.5,length(resources)+0.5),
type='n', xlab='Duration',ylab=NA,yaxt='n' )
axis(side=2,at=1:length(resources),labels=resources,las=1)
for(i in 1:length(resources))
{
yTop <- i+0.1
yBottom <- i-0.1
subset <- data[data[,res.col] == resources[i],]
for(r in 1:nrow(subset))
{
color <- res.colors[((i-1)%%length(res.colors))+1]
start <- subset[r,start.col]
end <- subset[r,end.col]
rect(start,yBottom,end,yTop,col=color)
}
}
par(mar=op) # reset the plotting margins
}
Usage example:
data <- read.table(text=
'"person","n","start","end"
"sam",6,0,6
"greg",5,6,11
"teacher",4,11,15
"sam",4,15,19
"greg",5,19,24
"sally",5,24,29
"greg",4,29,33
"sam",3,33,36
"sally",5,36,41
"researcher",6,41,47
"greg",6,47,53',sep=',',header=T)
plotGantt(data, res.col='person',start.col='start',end.col='end',
res.colors=c('green','blue','brown','red','yellow'))
Result:
Although the y-axis is categorical, all you need to do is assign numbers to the categories (1:5) and track them. Using the default as.numeric() of the factor will usually number them alphabetically but you should check anyway. Make your plot with the yaxt = 'n' argument. Then use the axis() command to put in a y-axis.
axis(2, 1:5, myLabels)
Keep in mind that whenever you're plotting the only way to place things is with a number. Categorical x or y values are always just the numbers 1:nCategories with category name labels in place of the numbers on the axis.
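To see the category-to-number mapping described here, a quick check like the following may help. The vector is a made-up sample of the person column.

```r
# Hypothetical sample of the person column.
person <- c("sam", "greg", "teacher", "sam", "sally", "researcher")

f <- factor(person)  # levels are sorted alphabetically by default
levels(f)            # the labels that belong at y = 1, 2, 3, ...
as.integer(f)        # the y position assigned to each observation
```

The integer codes are what segments() needs for its y arguments, and levels(f) gives the labels for axis() in the matching order.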
Something like the following gets you close enough (assuming your data.frame object is called datf)...
datf$pNum <- as.numeric(factor(datf$person)) # factor() first, in case person is stored as character
plot(datf$pNum, xlim = c(0, 53), type = 'n', yaxt = 'n', xlab ='Duration (words)', ylab = 'person', main = 'Speech Duration')
axis(2, 1:5, sort(unique(datf$person)), las = 2, cex.axis = 0.75)
with(datf, segments(start, pNum, end, pNum, lwd = 3, lend=2))
