Plot only specific data frame rows that match a criterion in R

I have a data frame built like this:
Id Client data
1 5 25
2 8 63
3 13 42
4 5 87
5 8 35
and an array: clients <- c(5,8)
I need to plot a separate histogram (of the data column) for each client that is in the "clients" array. In this example I would plot a histogram for client 5 with two bars (25, 87) and one for client 8, also with two bars (63, 35). I think I need to use the facet_wrap function to plot a histogram for each client. I also tried a for loop that plots each client in turn, but it didn't work. I'm not sure how to do this, so any help would be great!

It seems like you just didn't do enough data wrangling. Also, from your description, you need a bar plot, not a histogram (a histogram would show counts of particular values in data, not the values themselves).
Here is a solution in base R.
dt = data.frame("id" = 1:5, "client" = c(5,8,13,5,8), "data"=c(25,63,42,87,35))
clients = c(5,8,13) # for particular clients, or unique(dt$client) for all clients
# get data for every client
lst = lapply(clients, function(x){dt[dt$client == x, "data"]})
# unify length and transform into a matrix
len = max(sapply(lst, length))
mat = do.call(cbind, lapply(lst, "[", seq_len(len)))
# Put some nice legend
colnames(mat) = paste("Client", clients)
# plot this matrix with barplot
barplot(mat, beside=TRUE, las=1)

You can plot everything on the same graph if there is a limited number of clients.
library(dplyr)
library(ggplot2)
df %>%
filter(Client %in% clients) %>%
group_by(Client) %>%
mutate(Id = factor(row_number())) %>%
ggplot() + aes(Client, data, fill = Id) +
geom_bar(stat = 'identity', position = 'dodge')
With facets :
df %>%
filter(Client %in% clients) %>%
group_by(Client) %>%
mutate(Id = factor(row_number())) %>%
ggplot() + aes(Client, data, fill = Id) +
geom_bar(stat = 'identity', position = 'dodge') +
facet_wrap(~Client, scales = 'free_x')
data
df <- structure(list(Id = 1:5, Client = c(5L, 8L, 13L, 5L, 8L), data = c(25L,
63L, 42L, 87L, 35L)), class = "data.frame", row.names = c(NA, -5L))
clients <- c(5,8)
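The question also mentions trying a for loop. A minimal base R sketch of that route, assuming the df and clients defined above (the names selected and by_client are introduced here only for illustration): filter with %in%, split the data column by client, and draw one bar plot per client.

```r
# Data from the question
df <- data.frame(Id = 1:5,
                 Client = c(5, 8, 13, 5, 8),
                 data = c(25, 63, 42, 87, 35))
clients <- c(5, 8)

# Keep only the rows whose Client value is listed in `clients`
selected <- df[df$Client %in% clients, ]

# One vector of data values per client (list names are "5" and "8")
by_client <- split(selected$data, selected$Client)

# One panel per client, side by side; restore the old layout afterwards
op <- par(mfrow = c(1, length(by_client)))
for (cl in names(by_client)) {
  barplot(by_client[[cl]], main = paste("Client", cl), las = 1)
}
par(op)
```

Client 13 is dropped by the %in% filter, so only two panels are drawn.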

Related

How do I construct a crosswalk table with multiple crosswalks with the ggplot2 package?

I want to present my crosswalk results for 5 different crosswalks in a combined table with the ggplot2 package.
I've created a data.frame with all results that need to be displayed:
crosswalk <- data.frame(subset(fsdiscDET,, (1:2)),(subset(fsdiscDIS,, 2)),
(subset(fsdiscANT,, 2)), (subset(fsdiscPSY,, 2)),(subset(fsdiscANPICD,, 2)),
(subset(fsdiscANPID5,,2)))
#Define column names for the data frame named "crosswalk"
colnames(crosswalk) <- c("SumScore", "ThetaDET", "ThetaDIS","ThetaANT", "ThetaPSY",
"ThetaANPiCD", "ThetaPID5BF+M")
The table is constructed like this:
SumScores ThetaDET ThetaDIS ThetaANT ThetaPSY ThetaANPiCD ThetaPID5BF+M
0 -2 ... ....
1
2
3
4
5
6
7
8
Unfortunately, I can't show my real results but the table is filled with scores, that should be displayed as a crosswalk from the sum scores, so here is some example data: (first row)
> dput(head(crosswalk, 1))
structure(list(SumScore = 0L, ThetaDET = -0.880871248855981,
ThetaDIS = -0.594351208632866, ThetaANT = -0.463495518249115,
ThetaPSY = -0.471562212797643, ThetaANPiCD =
-0.850865132469677,
`ThetaPID5BF+M` = -0.91391979254119), row.names = 0L, class =
"data.frame")
Here is an example of what I want to create: example
In my case, the different "columns" of the example would be the sum scores (0 to 8) of the table I have created above. The crosswalk would then place each sum score on the y-axis (Theta) at the position of its corresponding Theta score. So the different columns, like ThetaDET and ThetaDIS, are all filled with values from -3 to 4 and should be represented on the left y-axis of the graphic.
Does someone have an idea how to do that?
Here's an example with the mtcars dataset. We can reshape long, then scale within each variable, and plot:
library(tidyverse)
mtcars %>%
rownames_to_column() %>%
pivot_longer(-rowname) %>%
group_by(name) %>%
mutate(scaled = as.numeric(scale(value))) %>%
ungroup() %>%
ggplot(aes(name, scaled, label = value, color = name)) +
geom_point(shape = "-", size = 7) +
geom_text(hjust = -0.5, size = 2, alpha = 0.7, check_overlap =TRUE) +
scale_x_discrete(position = "top", name = NULL) +
guides(color = "none") +
theme_minimal()
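Applied to the crosswalk table, the same reshape-then-label idea might look like the sketch below. The Theta values here are hypothetical placeholders (the real results are not shown in the question) and only two of the Theta columns are included; the names crosswalk_long, scale and theta are introduced for illustration. Because all Theta columns already share one scale, the per-variable scaling step from the mtcars example is left out.

```r
library(tidyr)
library(ggplot2)

# Hypothetical placeholder values; only the column layout follows the question
crosswalk <- data.frame(
  SumScore = 0:8,
  ThetaDET = seq(-0.9, 2.3, length.out = 9),
  ThetaDIS = seq(-0.6, 2.6, length.out = 9)
)

# One row per (sum score, Theta column) pair
crosswalk_long <- pivot_longer(crosswalk, cols = -SumScore,
                               names_to = "scale", values_to = "theta")

# Print each sum score at the height of its Theta value, one column per scale
p <- ggplot(crosswalk_long, aes(x = scale, y = theta, label = SumScore)) +
  geom_text() +
  scale_x_discrete(position = "top", name = NULL) +
  labs(y = "Theta") +
  theme_minimal()
print(p)
```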

ggplot2 R charts

I'm new to R and started with basic plots. I have used a simple excel table which has the following columns- Family, Family Name, Product.ID, Sales, Time, Value
The table is similar to the below snapshot
I used the code below, but for some reason when I try to group the chart by Family Name it just shows a single color, and the legend does not show all the values in Family Name.
library("ggplot2")
library("data.table")
library("readxl")
Data<-read.xlsx("C:/Users/vc/Desktop/Sales Data.xlsx")
setDT(Data)
Plot<-ggplot(data = Data,
aes(x=Time,y=Value,group='Family Name',shape='Family Name',color='Family Name')) +
geom_line() +
geom_point()
I have attached the output graph and it doesn't make any sense.
Do you want this?
To show dates on the x axis, you first need to convert the variable with as.Date().
library(tidyverse)
df <- data.frame(
stringsAsFactors = FALSE,
Family = c("19A", "19A", "19A", "19A", "19B", "19B", "19B", "19B"),
Family.Name = c("A Box","A Box","A Box",
"A Box","B Box","B Box","B Box","B Box"),
Product.ID = c("AA980","AA980","AA980",
"AA980","AL345","AL345","AL345","AL345"),
Sales = c("actualsalesunit",
"actualsalesunit","actualsalesunit","actualsalesunit",
"actualsalesunit","actualsalesunit","actualsalesunit",
"actualsalesunit"),
Times = c("01-01-2021","01-02-2021",
"01-03-2021","01-04-2021","01-01-2021","01-02-2021",
"01-03-2021","01-04-2021"),
value = c(63L, 34L, 93L, 99L, 1L, 0L, 0L, 0L)
)
df %>%
mutate(Times = as.Date(Times, '%d-%m-%Y')) %>%
ggplot(aes(x= Times, y = value, color = Family.Name, label = value)) +
geom_line() +
geom_point() +
geom_text(size = 3.2, position =position_dodge(0.9), vjust = -0.5)
Created on 2021-05-27 by the reprex package (v2.0.0)
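A likely reason for the single color in the original code is that 'Family Name' is quoted inside aes(), which makes it a constant string rather than a reference to the column. The sketch below contrasts the two forms on a small stand-in data frame (sales is a hypothetical name and holds only a few of the values used above); backticks are what let aes() refer to a column whose name contains a space.

```r
library(ggplot2)

# Hypothetical stand-in for the spreadsheet, keeping the space in the column name
sales <- data.frame(
  Time = as.Date(c("2021-01-01", "2021-02-01", "2021-01-01", "2021-02-01")),
  Value = c(63, 34, 1, 0),
  `Family Name` = c("A Box", "A Box", "B Box", "B Box"),
  check.names = FALSE
)

# Quoted: a constant string, so every row falls into the same single group
p_constant <- ggplot(sales, aes(x = Time, y = Value, color = 'Family Name')) +
  geom_line() +
  geom_point()

# Backticks: the column itself, so each family gets its own color and line
p_column <- ggplot(sales, aes(x = Time, y = Value, color = `Family Name`)) +
  geom_line() +
  geom_point()
print(p_column)
```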

Draw a line between 2 zip codes on a US map in R

I have been trying to draw a straight line connecting 2 zip codes in the US. So far I've only been able to plot the points for each zip code, but I can't draw the line between them.
Here is what my data looks like. I am trying to draw 8 points on the US map and connect them with 4 lines. I tried using zip.plot, but it only plots points and doesn't draw lines.
df1 <- data.frame(trip = c(1,2,3,4), zip1 = c(55803,87112,55107,66006), zip2=c(12909,93703,12205,78210))
df1
trip zip1 zip2
1 1 55803 12909
2 2 87112 93703
3 3 55107 12205
4 4 66006 78210
Take a look at the code for the zip.plot function, and you'll see it is straightforward. It merges your zip code data with the longitude and latitude data from data(zips). You'll notice that it plots points but has no way to connect them, and it doesn't return the points it plotted.
You could write a similar function that meets your needs. If you load library(muRL), you can load the zip data with data(zips). After plotting the points, you can add lines to connect them based on the trip variable.
For example, create a new function zip.plot.new:
library(muRL)
data(zips)
zip.plot.new <- function(data, map.type = "state", ...){
data.z <- merge(data, zips[,c("zip", "lat", "lon")], by.x = "zip", by.y = "zip", all.x = TRUE)
maps::map(map.type, ...)
points(data.z$lon, data.z$lat, cex = 1, col = "black", pch = 20)
mapply(lines, split(data.z$lon, data.z$trip), split(data.z$lat, data.z$trip))
}
This includes mapply(lines... to connect points by trip.
Then, you can use your data frame, convert to longer form, and call this new function:
library(tidyverse)
df1 %>%
pivot_longer(cols = starts_with("zip_"), names_to = c(".value", "group"), names_sep = "_") %>%
zip.plot.new(.)
Note that zip code 12909 was not matched in the data (appears not valid?).
Data
df1 <- data.frame(trip = c(1,2,3,4),
zip_1 = c("55803","87112","55107","66006"),
zip_2 = c("12909","93703","12205","78210"))
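To make the reshaping step concrete, here is what the pivot_longer call does on its own, using the data frame above (the name long is introduced here for illustration). The ".value" entry in names_to tells tidyr to use the part of the column name before the underscore ("zip") as the output column name, while the part after it ("1" or "2") goes into group.

```r
library(tidyr)

df1 <- data.frame(trip = c(1, 2, 3, 4),
                  zip_1 = c("55803", "87112", "55107", "66006"),
                  zip_2 = c("12909", "93703", "12205", "78210"))

# zip_1 / zip_2 become two rows per trip, with a single zip column
long <- pivot_longer(df1,
                     cols = starts_with("zip_"),
                     names_to = c(".value", "group"),
                     names_sep = "_")
long
```

Each trip now has two rows, one per endpoint, which is the shape the trip-wise lines need.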
Edit: Here's a ggplot version:
library(ggmap)
library(maps)
library(ggplot2)
library(tidyverse)
MainStates <- map_data("state")
point_data <- df1 %>%
pivot_longer(cols = starts_with("zip_"), names_to = c(".value", "group"), names_sep = "_") %>%
mutate(zip = factor(zip, levels = levels(zips$zip))) %>%
left_join(zips)
ggplot() +
geom_polygon(data=MainStates, aes(x=long, y=lat, group=group), color = "black", fill = "white") +
geom_point(data = point_data, aes(x = lon, y = lat, group = trip)) +
geom_line(data = point_data, aes(x = lon, y = lat, group = trip)) +
coord_fixed(1.3) +
theme_nothing()

overlay/superimpose grouped bar plots in ggplot2

I'd like to make a bar plot featuring an overlay of data from two time points, 'before' and 'after'.
At each time point, participants were asked two questions ('pain' and 'fear'), which they would answer by stating a score of 1, 2, or 3.
My existing code plots the counts for the data from the 'before' time point nicely, but I can't seem to add the counts for the 'after' data.
This is a sketch of what I'd like the plot to look like with the 'after' data added, with the black bars representing the 'after' data:
I'd like to make the plot in ggplot2() and I've tried to adapt code from How to superimpose bar plots in R? but I can't get it to work for grouped data.
Many thanks!
#DATA PREP
library(dplyr)
library(ggplot2)
library(tidyr)
df <- data.frame(before_fear=c(1,1,1,2,3),before_pain=c(2,2,1,3,1),after_fear=c(1,3,3,2,3),after_pain=c(1,1,2,3,1))
df <- df %>% gather("question", "answer_option")
# Get the counts for each answer of each question
df2 <- df %>%
group_by(question,answer_option) %>%
summarise (n = n())
df2 <- as.data.frame(df2)
df3 <- df2 %>% mutate(time = factor(ifelse(grepl("before", question), "before", "after"),
c("before", "after")))
# change classes and split data into two data frames
df3$n <- as.numeric(df3$n)
df3$answer_option <- as.factor(df3$answer_option)
df3after <- df3[ which(df3$time=='after'), ]
df3before <- df3[ which(df3$time=='before'), ]
# CODE FOR 'BEFORE' DATA ONLY PLOT - WORKS
ggplot(df3before, aes(fill=answer_option, y=n, x=question)) + geom_bar(position="dodge", stat="identity")
# CODE FOR 'BEFORE' AND 'AFTER' DATA PLOT - DOESN'T WORK
ggplot(mapping = aes(x, y,fill)) +
geom_bar(data = data.frame(x = df3before$question, y = df3before$n, fill= df3before$index_value), width = 0.8, stat = 'identity') +
geom_bar(data = data.frame(x = df3after$question, y = df3after$n, fill=df3after$index_value), width = 0.4, stat = 'identity', fill = 'black') +
theme_classic() + scale_y_continuous(expand = c(0, 0))
I think the key is to set the width of the "after" bars, but to dodge them as if their width were 0.9 (i.e. the same (default) width as the "before" bars). In addition, because we don't map the fill of the "after" bars, we need to use the group aesthetic instead to achieve the dodging.
I prefer to have only one data set and just subset it in each call to geom_col.
ggplot(mapping = aes(x = question, y = n, fill = factor(ans))) +
geom_col(data = d[d$t == "before", ], position = "dodge") +
geom_col(data = d[d$t == "after", ], aes(group = ans),
fill = "black", width = 0.5, position = position_dodge(width = 0.9))
Data:
set.seed(2)
d <- data.frame(t = rep(c("before", "after"), each = 6),
question = rep(c("pain", "fear"), each = 3),
ans = 1:3, n = sample(12))
Alternative data preparation using data.table, starting with your original 'df':
library(data.table)
d <- melt(setDT(df), measure.vars = names(df), value.name = "ans")
d[ , c("t", "question") := tstrsplit(variable, "_")]
Either pre-calculate the counts (keeping the time point t, and naming the count n to match the plot code) and proceed as above with geom_col
# d2 <- d[ , .(n = .N), by = .(t, question, ans)]
Or let geom_bar do the counting:
ggplot(mapping = aes(x = question, fill = factor(ans))) +
geom_bar(data = d[d$t == "before", ], position = "dodge") +
geom_bar(data = d[d$t == "after", ], aes(group = ans),
fill = "black", width = 0.5, position = position_dodge(width = 0.9))
Data:
df <- data.frame(before_fear = c(1,1,1,2,3), before_pain = c(2,2,1,3,1),
after_fear = c(1,3,3,2,3),after_pain = c(1,1,2,3,1))
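For reference, a self-contained sketch of the pre-calculated-counts route with data.table. It assumes the counts must be grouped by time point as well as by question and answer, and it names the count column n so that it lines up with the geom_col code above; d2 is the resulting table.

```r
library(data.table)

df <- data.frame(before_fear = c(1, 1, 1, 2, 3), before_pain = c(2, 2, 1, 3, 1),
                 after_fear = c(1, 3, 3, 2, 3), after_pain = c(1, 1, 2, 3, 1))

# Long format: one row per individual answer
d <- melt(as.data.table(df), measure.vars = names(df), value.name = "ans")

# Split "before_fear" etc. into a time point and a question
d[, c("t", "question") := tstrsplit(variable, "_")]

# Count answers per time point, question and answer option
d2 <- d[, .(n = .N), by = .(t, question, ans)]
d2
```

d2 can then be passed to the geom_col version in place of the simulated d.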
My solution is very similar to @Henrik's, but I wanted to point out a few things.
First, you're building your data frames inside your geom_cols, which is probably messier than you need it to be. If you've already created df3after, etc., you might as well use it inside your ggplot.
Second, I had a hard time following your tidying. I think there are a couple tidyr functions that might make this task easier on you, so I went a different route, such as using separate to create the columns of time and measure, rather than essentially searching for them manually, making it more scalable. This also lets you put "pain" and "fear" on your x-axis, rather than still having "before_pain" and "before_fear", which are no longer accurate representations once you have "after" values on the plot as well. But feel free to disregard this and stick with your own method.
library(tidyverse)
df <- data.frame(before_fear = c(1,1,1,2,3),
before_pain = c(2,2,1,3,1),
after_fear = c(1,3,3,2,3),
after_pain = c(1,1,2,3,1))
df_long <- df %>%
gather(key = question, value = answer_option) %>%
mutate(answer_option = as.factor(answer_option)) %>%
count(question, answer_option) %>%
separate(question, into = c("time", "measure"), sep = "_", remove = F)
df_long
#> # A tibble: 12 x 5
#> question time measure answer_option n
#> <chr> <chr> <chr> <fct> <int>
#> 1 after_fear after fear 1 1
#> 2 after_fear after fear 2 1
#> 3 after_fear after fear 3 3
#> 4 after_pain after pain 1 3
#> 5 after_pain after pain 2 1
#> 6 after_pain after pain 3 1
#> 7 before_fear before fear 1 3
#> 8 before_fear before fear 2 1
#> 9 before_fear before fear 3 1
#> 10 before_pain before pain 1 2
#> 11 before_pain before pain 2 2
#> 12 before_pain before pain 3 1
I split this into before & after datasets, as you did, then plotted them with 2 geom_cols. I still put df_long into ggplot, treating it almost as a dummy to get uniform x and y aesthetics. Like @Henrik said, you can use different widths in the geom_col and in its position_dodge to dodge the bars at a width of 90% but give the bars themselves a width of only 40%.
df_before <- df_long %>% filter(time == "before")
df_after <- df_long %>% filter(time == "after")
ggplot(df_long, aes(x = measure, y = n)) +
geom_col(aes(fill = answer_option),
data = df_before, width = 0.9,
position = position_dodge(width = 0.9)) +
geom_col(aes(group = answer_option),
data = df_after, fill = "black", width = 0.4,
position = position_dodge(width = 0.9))
What you could do instead of making the two separate data frames is filter inside each geom_col. This is generally my preference unless the filtering is more complex. This code will produce the same plot as above.
ggplot(df_long, aes(x = measure, y = n)) +
geom_col(aes(fill = answer_option),
data = . %>% filter(time == "before"), width = 0.9,
position = position_dodge(width = 0.9)) +
geom_col(aes(group = answer_option),
data = . %>% filter(time == "after"), fill = "black", width = 0.4,
position = position_dodge(width = 0.9))

How to group data and then draw bar chart in ggplot2

I have data frame (df) with 3 columns e.g.
NUMERIC1: NUMERIC2: GROUP(CHARACTER):
100 1 A
200 2 B
300 3 C
400 4 A
I want to group NUMERIC1 by GROUP(CHARACTER), and then calculate mean for each group.
Something like that:
mean(NUMERIC1): GROUP(CHARACTER):
250 A
200 B
300 C
Finally I'd like to draw a bar chart using ggplot2 with GROUP(CHARACTER) on the x axis and mean(NUMERIC1) on the y axis.
It should look like:
I used
mean <- tapply(df$NUMERIC1, df$GROUP(CHARACTER), FUN=mean)
but I'm not sure if it's OK, and even if it is, I don't know what I'm supposed to do next.
This is what stat_summary(...) is designed for:
colnames(df) <- c("N1","N2","GROUP")
library(ggplot2)
ggplot(df) + stat_summary(aes(x=GROUP,y=N1),fun=mean,geom="bar",
fill="lightblue",col="grey50")
Try something like:
res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)
ggplot(res, aes(x = GROUP, y = NUMERIC1)) + geom_bar(stat = "identity")
data
df <- structure(list(NUMERIC1 = c(100L, 200L, 300L, 400L), NUMERIC2 = 1:4,
GROUP = structure(c(1L, 2L, 3L, 1L), .Label = c("A", "B",
"C"), class = "factor")), .Names = c("NUMERIC1", "NUMERIC2",
"GROUP"), class = "data.frame", row.names = c(NA, -4L))
I'd suggest something like:
#Imports: data.table, which allows for really convenient "apply a function to
#each part of a df, by unique value", and ggplot2
library(data.table)
library(ggplot2)
#Convert df to a data.table. It remains a data.frame, so any function that works
#on a data.frame can still work here.
data <- as.data.table(df)
#By each unique value in "GROUP", subset and calculate the mean of the
#NUMERIC1 values within that subset. You end up with a data.frame/data.table
#with the columns GROUP and mean_value
data <- data[, j = list(mean_value = mean(NUMERIC1)), by = "GROUP"]
#And now we play the plotting game (the plotting game is boring, let's
#play Hungry Hungry Hippos!). stat = "identity" is needed because the
#means are already computed and mapped to y.
plot <- ggplot(data, aes(GROUP, mean_value)) + geom_bar(stat = "identity")
#And that should do it.
Here's a solution using dplyr to create the summary. In this case, the summary is created on the fly within ggplot, but you can also create a separate summary data frame first and then feed that to ggplot.
library(dplyr)
library(ggplot2)
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_bar(stat="identity", fill=hcl(195,100,65))
Since you're plotting means rather than counts, it might make more sense to use points rather than bars. For example:
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_point(pch=21, size=5, fill="blue") +
coord_cartesian(ylim=c(0,310))
Why ggplot when you could do the same with your own code and barplot:
barplot(tapply(df$NUMERIC1, df$GROUP, FUN=mean))
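The answers above compute the same group means in different ways. A quick base R check, using the four rows from the question (with GROUP stored as a plain character column, which is an assumption of this sketch), confirms that tapply and aggregate agree with the expected table:

```r
df <- data.frame(NUMERIC1 = c(100, 200, 300, 400),
                 NUMERIC2 = 1:4,
                 GROUP = c("A", "B", "C", "A"))

# Named 1-d array: one mean per group
means_tapply <- tapply(df$NUMERIC1, df$GROUP, FUN = mean)

# Data frame with columns GROUP and NUMERIC1 (the per-group mean)
means_aggregate <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)

means_tapply
means_aggregate
```

The tapply result suits barplot() directly, while the aggregate result is a data frame and so fits ggplot.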

Resources