Adding character values of a column in R

I have two columns, square_id and Smart_Nsmart, as given below.
I want to count the N's and S's for each square_id and plot the result with ggplot, i.e. plot square_id vs Smart_Nsmart.
square_id:    1 1 2 2 2 2 3 3 3 3
Smart_Nsmart: S N N N S S N S S S

We can use count to get the frequency of each square_id/Smart_Nsmart combination and then plot it with ggplot. Here we plot it with geom_bar, since the OP's post does not say which kind of plot is wanted.
library(dplyr)
library(ggplot2)
df %>%
  count(square_id, Smart_Nsmart) %>%
  ggplot(., aes(x = square_id, y = n, fill = Smart_Nsmart)) +
  geom_bar(stat = 'identity')
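The answer does not define df, so here is a minimal sketch of the counting step on the data from the question. Pairing the two columns row by row is an assumption about how the original table was laid out.

```r
library(dplyr)

# Data from the question; row-wise pairing of the two columns is assumed
df <- data.frame(
  square_id    = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  Smart_Nsmart = c("S", "N", "N", "N", "S", "S", "N", "S", "S", "S")
)

# One row per (square_id, Smart_Nsmart) pair, with the frequency in column n
counts <- df %>% count(square_id, Smart_Nsmart)
print(counts)
```

Under that assumption, counts is the table that the geom_bar call receives.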

The above answer works well. However, instead of the count function, you can use group_by and summarise, which makes it easier to apply other summary functions later on.
library(dplyr)
library(ggplot2)
dff <- data.frame(a=c(1,1,1,1,2,1,2),b=c("C","C","N","N","N","C","N"))
dff %>%
  group_by(a, b) %>%
  summarise(n = n()) %>%   # n() counts the rows in each group
  ggplot(., aes(x = a, y = n, fill = b)) +
  geom_bar(stat = 'identity')
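To illustrate the point about applying other functions, here is a small sketch on the same dff. The share column is hypothetical, added only to show that summarise and mutate can carry more than a plain count.

```r
library(dplyr)

dff <- data.frame(a = c(1, 1, 1, 1, 2, 1, 2),
                  b = c("C", "C", "N", "N", "N", "C", "N"))

# count() is shorthand for group_by() + summarise(n = n())
via_count <- dff %>% count(a, b)

via_summarise <- dff %>%
  group_by(a, b) %>%
  summarise(n = n(), .groups = "drop")

# Both approaches produce the same frequencies
print(identical(via_count$n, via_summarise$n))

# Further statistics can be added alongside the count
# (share is a hypothetical extra column, not part of the original answer)
with_share <- via_summarise %>%
  group_by(a) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()
print(with_share)
```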

Related

R geom_point() number of points reflect value in column

Say I have mydf, a dataframe which is as follows:
Name  Value
Mark  101
Joe   121
Bill  131
How would I go about creating a scatterplot in ggplot that takes the data in the value column (e.g., 101) and makes that number of points on a chart? Would this be a stat = that I am unfamiliar with, or would I have to structure the data such that Mark, for example, has 101 unique rows, Joe has 121, etc.?
Update: As suggested by Ben Bolker (many thanks), we can set the width of geom_jitter; additionally, we can add a colour aesthetic:
df %>%
  group_by(Name) %>%
  complete(Value = 1:Value) %>%
  ggplot(aes(x = Name, y = Value, colour = Name)) +
  geom_jitter(width = 0.1)
OR more compact as suggested by Henrik (many thanks) using uncount:
ggplot(uncount(df, Value, .id = "y"), aes(x = Name, y = y)) + ...
First answer:
Something like this?
library(dplyr)
library(ggplot2)
library(tidyr) # complete
df %>%
  group_by(Name) %>%
  complete(Value = 1:Value) %>%
  ggplot(aes(x = Name, y = Value)) +
  geom_jitter()
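The uncount variant from the update is only given as a fragment. A fuller sketch, assuming the data frame from the question (named df here to match the answer), might look like this. The jitter width and colour mapping are carried over from the update and are choices rather than requirements.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(Name  = c("Mark", "Joe", "Bill"),
                 Value = c(101, 121, 131))

# uncount() repeats each row Value times and drops the Value column;
# .id = "y" adds a running index (1, 2, ..., Value) for each original row
expanded <- uncount(df, Value, .id = "y")

p <- ggplot(expanded, aes(x = Name, y = y, colour = Name)) +
  geom_jitter(width = 0.1)
```

Printing p draws the plot. Each name ends up with as many points as its Value, which is the restructuring the question asks about.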

summarise column and add its common values in R

I have a dataframe something like this and my end goal is to make a bar chart.
Here is the data frame.
a 5
a 7
b 23
b 12
c 21
c 21
c 27
I want to group the data frame by the first column, add up the values of the 2nd column within each group, and make a bar chart of those sums. The resulting data frame should be:
a 12
b 35
c 69
I tried something like this but it does not work:
d %>%
  group_by(V1) %>%
  summarise(V2) %>%
  ggplot(aes(x = V1, y = V2)) + geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
A simple base R option using barplot + aggregate
barplot(SumValue ~ ., aggregate(cbind(SumValue = Value) ~ ., df, sum))
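The one-liner above aggregates and plots in a single call. As a sketch, the two steps can be separated so the sums can be checked first. The column names Key and Value are assumptions (the question does not name its columns), and the formula interface of barplot requires R 4.0.0 or later.

```r
# Column names Key and Value are assumed; the question does not name them
df <- data.frame(Key   = c("a", "a", "b", "b", "c", "c", "c"),
                 Value = c(5, 7, 23, 12, 21, 21, 27))

# Sum Value within each Key
sums <- aggregate(Value ~ Key, data = df, FUN = sum)
print(sums)

# Bar chart of the aggregated values, one bar per Key
barplot(Value ~ Key, data = sums)
```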
Seems to be pretty straightforward. Let me know if this helps.
library(dplyr)
library(ggplot2)
# Convert the values into a data frame
data <- data.frame(Key = c("a","a","b","b","c","c","c"),
                   Value = c(5,7,23,12,21,21,27))
# Sum Value within each Key
data <- data %>%
  group_by(Key) %>%
  summarise(Value = sum(Value))
# Plot
ggplot(data, aes(x = Key, y = Value)) +
  geom_bar(stat = "identity")

How to `table` all factors columns and ggplot geom_bar with facet_wrap?

df <- data.frame(
  cola = c('1',NA,'c','1','1','e','1',NA,'c','d'),
  colb = c("a",NA,"c","d",'a','b','c','d','c','d'),
  colc = c('a',NA,'1','d','a',NA,'c',NA,'c','d'),
  stringsAsFactors = TRUE)
table(df$cola)
Output of above R script is:
1 c d e
4 2 1 1
We can plot these counts as a bar chart in ggplot with geom_bar(stat = "identity", ...).
How can I use ggplot geom_bar with facet_wrap to plot cola, colb and colc in one go?
We gather the columns to 'long' format and then do the ggplot
library(tidyverse)
df %>%
  # gather to long format
  gather(na.rm = TRUE) %>%
  # get the frequency count of key, value columns
  count(key, value) %>%
  ggplot(., aes(x = value, y = n)) +
  geom_bar(stat = "identity") +
  # facet wrap with key column
  facet_wrap(~ key)
Try this
library(tidyverse)
df %>%
  map(function(x){as.data.frame(table(x))}) %>%
  bind_rows(.id = "variable") %>%
  ggplot(aes(x = x, y = Freq)) +
  geom_col() +
  facet_wrap(~variable)
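gather has since been superseded in tidyr by pivot_longer. The following is a sketch of the same reshape-then-count idea using pivot_longer. It is a swapped-in variant rather than either answer's original code, and converting the factors to character first is a precaution so that columns with different levels combine cleanly.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  cola = c('1', NA, 'c', '1', '1', 'e', '1', NA, 'c', 'd'),
  colb = c("a", NA, "c", "d", 'a', 'b', 'c', 'd', 'c', 'd'),
  colc = c('a', NA, '1', 'd', 'a', NA, 'c', NA, 'c', 'd'),
  stringsAsFactors = TRUE)

counts <- df %>%
  # factors with different level sets are easier to stack as character
  mutate(across(everything(), as.character)) %>%
  # long format: one row per (column name, value), NAs dropped
  pivot_longer(everything(), names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  count(key, value)

p <- ggplot(counts, aes(x = value, y = n)) +
  geom_col() +
  facet_wrap(~ key)
```

The rows of counts with key equal to "cola" should match the table(df$cola) output shown in the question.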

ggplot2() bar chart and dplyr() grouped and overall data in R

I'd like to make a stacked proportional bar chart representing the prevalence of diabetes in a cohort of individuals residing in towns A, B, and C. I'd also like the plot to feature a bar representing the entire cohort.
I'm happy with the below plot, but I'd like to know if there is a way of incorporating the pre-processing step into the processing step, ie piping it with dplyr()?
Thanks!
Starting point (dfa):
dfa <- data.frame(
  town = c("A","A","A","B","B","C","C","C","C","C"),
  diabetes = c("y","y","n","n","y","n","y","n","n","y"),
  heartdisease = c("n","y","y","n","y","y","n","n","n","y"))
Pre-processing:
dfb <- rbind(dfa, transform(dfa, town = "ALL"))
Processing and plot:
library(dplyr)
library(ggplot2)
dfc <- dfb %>%
  group_by(town) %>%
  count(diabetes) %>%
  mutate(prop = n / sum(n))
ggplot(dfc, aes(x = town, y = prop, fill = diabetes)) +
  geom_bar(stat = "identity") +
  coord_flip()
Like this:
dfc <- dfa %>%
  bind_rows(dfa %>%
              mutate(town = "ALL")) %>%
  group_by(town) %>%
  count(diabetes) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(aes(x = town, y = prop, fill = diabetes)) +
  geom_bar(stat = "identity") +
  coord_flip()
EDIT: added pre-processing into pipeline using bind_rows and mutate instead of rbind and transform
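One way to sanity-check the combined pipeline is to stop before the ggplot call and inspect the proportions. A minimal sketch, rebuilding only the two columns of dfa that the plot uses:

```r
library(dplyr)

dfa <- data.frame(
  town     = c("A","A","A","B","B","C","C","C","C","C"),
  diabetes = c("y","y","n","n","y","n","y","n","n","y"))

props <- dfa %>%
  bind_rows(dfa %>% mutate(town = "ALL")) %>%  # append a copy labelled ALL
  group_by(town) %>%
  count(diabetes) %>%                          # counts per town and status
  mutate(prop = n / sum(n)) %>%                # still grouped by town here
  ungroup()
print(props)
```

Because the data are still grouped by town when prop is computed, the proportions within each town (including ALL) add up to 1.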

How to Plot Every Column in Descending Order in R

I intend to plot every categorical column in the data frame with its levels in descending order of frequency.
I have already found out how to plot every column and how to reorder the levels, but I cannot figure out how to combine the two. Could you please give me some suggestions?
Code for plotting every column:
library(purrr)
library(tidyr)
library(ggplot2)
diamonds %>%
  keep(is.factor) %>%
  gather() %>%
  ggplot(aes(value)) +
  facet_wrap(~ key, scales = "free") +
  geom_bar()
Code for reordering the levels of one variable:
tb <- table(x)
factor(x, levels = names(tb[order(tb, decreasing = TRUE)]))
BTW, if you feel there is a better way to write this code, please let me know.
Thanks.
Alternative 1
No need to use gridExtra to emulate facet_wrap, just include the function reorder_size inside aes:
reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
diamonds %>%
  keep(is.factor) %>%
  gather() %>%
  ggplot(aes(x = reorder_size(value))) +
  facet_wrap(~ key, scales = "free") +
  geom_bar()
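To see what reorder_size does on its own, here is a small sketch with a toy vector (made up for illustration, not taken from diamonds):

```r
reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

# Toy vector: "b" appears 3 times, "a" twice, "c" once
x <- c("b", "a", "b", "c", "b", "a")
levels(reorder_size(x))
# "b" "a" "c"
```

The levels come out ordered from most to least frequent, which is the order ggplot uses for the bars.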
Alternative 2
Using dplyr to calculate the count, grouping by key and value. Then we reorder the value in descending order by count inside aes.
library(dplyr)
diamonds %>%
  keep(is.factor) %>%
  gather() %>%
  group_by(key, value) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = reorder(value, -n), y = n)) +
  facet_wrap(~ key, scales = "free") +
  geom_bar(stat = 'identity')
The problem with your approach is that the long form of your data-frame will introduce a lot of factors that would be plotted as 0 for the geom_bar().
Instead of relying on facet_wrap and dealing with the long data-form, here's an alternative.
Reordering by size function:
reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
Using gridExtra::grid.arrange function to deliver similar facet_wrap style figure:
library(gridExtra)
a <- ggplot(diamonds, aes(x=reorder_size(cut))) + geom_bar()
b <- ggplot(diamonds, aes(x=reorder_size(color))) + geom_bar()
c <- ggplot(diamonds, aes(x=reorder_size(clarity))) + geom_bar()
grid.arrange(a,b,c, nrow=1)
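With more than a handful of factor columns, writing one ggplot call per column gets repetitive. A possible sketch that builds the plots in a loop instead; the use of the .data pronoun and of the grobs argument of grid.arrange are additions here, not part of the answer above.

```r
library(ggplot2)
library(gridExtra)

reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

# Names of all factor columns in diamonds
factor_cols <- names(Filter(is.factor, diamonds))

# One bar chart per factor column, bars ordered by descending frequency
plots <- lapply(factor_cols, function(col) {
  ggplot(diamonds, aes(x = reorder_size(.data[[col]]))) +
    geom_bar() +
    xlab(col)
})

# Lay the plots out side by side, similar to a single-row facet_wrap
grid.arrange(grobs = plots, nrow = 1)
```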
