ggplot for multiple values in the same row - r

I have a data frame with multiple values in the same row
index price
1 1000,2000,3000
2 2000,500
The data frame has 12 rows, and the price entries do not all contain the same number of values. I want to plot index vs price, with index along the x-axis and price along the y-axis. I have the following code:
ggplot(data_m,
aes(x = 1:12,
y = data_m$price))
I get the error: Error: Aesthetics must be either length 1 or the same as the data (12): y
How do I plot every value in the price column?

Maybe you are looking for this. You have to reshape the data first and then plot it, as mentioned by @TheSciGuy. Here is a tidyverse approach that uses separate_rows() to split the comma-separated values into one row each, followed by a full_join() to add the remaining index values you want on the axis. The code:
library(tidyverse)
#Data and plot
df %>% separate_rows(price,sep=',') %>%
mutate(price=as.numeric(price)) %>%
full_join(data.frame(index=1:12)) %>%
ggplot(aes(x=factor(index),y=price))+
geom_point()+
xlab('index')
Some data used:
#Data
df <- structure(list(index = 1:2, price = c("1000,2000,3000", "2000,500"
)), class = "data.frame", row.names = c(NA, -2L))
And if you want some color per index:
#Data and plot 2
df %>% separate_rows(price,sep=',') %>%
mutate(price=as.numeric(price)) %>%
full_join(data.frame(index=1:12)) %>%
ggplot(aes(x=factor(index),y=price,color=factor(index)))+
geom_point()+
xlab('index')+
theme(legend.position = 'none')
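To see why the reshaping step matters, it can help to inspect the long data frame before plotting. This is a minimal sketch, assuming the two-row sample data above; the name `long` is only illustrative, and `convert = TRUE` is used here in place of the separate as.numeric() step:

```r
library(tidyr)

# Sample data from the answer above
df <- data.frame(index = 1:2,
                 price = c("1000,2000,3000", "2000,500"),
                 stringsAsFactors = FALSE)

# One row per price; convert = TRUE turns the split strings into numbers
long <- separate_rows(df, price, sep = ",", convert = TRUE)

print(long)
```

Each price now sits on its own row next to its index, so x and y have the same length, which is what the original error was complaining about.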

Related

Starting ggplot x-axis at 0 when there is no data with 0 values

I am trying to plot the number of unique detections per day throughout the year.
I had data that looked like this.
I summarized the number of unique detections per day using these codes
unique_day <- data %>% group_by(Day,tag) %>% filter(Date==min(Date)) %>% slice(1) %>% ungroup()
sum <- unique_day %>% group_by(Date) %>% summarise(Detections=n())
I then ended up with a dataframe like this
I then plot with this code
sum[year(sum$Date)==2016,] %>%
ggplot(aes(x=Date,y=Detections))+
theme_bw(base_size = 16,base_family = 'serif')+
theme(panel.grid.major = element_blank(),panel.grid.minor=element_blank())+
geom_line(size=1)+
scale_x_datetime(date_breaks = '1 month',date_labels = "%b",
limits = c(as.POSIXct('2016-01-01'),as.POSIXct("2016-12-01")))+
ggtitle('DIS 2016')+
theme(plot.title = element_text(hjust = 0.5))+
xlab("Date")+
scale_y_continuous(expand = c(0,0),limits = c(0,5))
And get a plot like this
I can't seem to get the plot to start at 0. I figure it always starts at 1 because there are no 0 values for detections: my data frame only summarizes the days when there were detections, not the days when there were none. I have tried using ylim, scale_y_continuous and coord_cartesian. Any ideas?
Here is a simple way to solve your problem:
df_null <- data.frame(Date = seq(as.Date("2015/01/01"), by = "day", length.out = 365),
Detections = 0)
For the year 2015 we create a data.frame containing all days with value 0. Suppose your data.frame looks like this:
df <- data.frame(Date = c(as.Date("2015/06/07"), as.Date("2015/06/08"), as.Date("2015/12/12")),
Detections = 1:3)
Using dplyr we combine those two data.frames and summarize the values:
df %>%
bind_rows(df_null) %>%
group_by(Date) %>%
summarise(Detections = sum(Detections))
Finally you can get your plot using your ggplot2-code.
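Put together, the steps above might look like the following sketch. It assumes the small 2015 sample data from this answer, not the asker's real data; the names `filled` and `p` are illustrative. Because the sample Date column is of class Date rather than POSIXct, scale_x_date() is used instead of the question's scale_x_datetime().

```r
library(dplyr)
library(ggplot2)

# Days with detections (sample data from the answer)
df <- data.frame(
  Date = as.Date(c("2015-06-07", "2015-06-08", "2015-12-12")),
  Detections = 1:3
)

# Every day of 2015 with zero detections
df_null <- data.frame(
  Date = seq(as.Date("2015-01-01"), by = "day", length.out = 365),
  Detections = 0
)

# Stack both and sum per day, so days without detections end up as 0
filled <- df %>%
  bind_rows(df_null) %>%
  group_by(Date) %>%
  summarise(Detections = sum(Detections))

# The line now drops to 0 on days without detections
p <- ggplot(filled, aes(x = Date, y = Detections)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 5))
```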

Plot multiple lists on the same graph in r (scatter plot)

I was trying to plot a graph that looks like the below figure based on the code under it:
xAxisName <- c("ML", "MN")
car1 <- c(5,6)
names(car1) <- xAxisName
car2 <- c(5.5,6.2)
names(car2) <- xAxisName
car3 <- c(4.9, 5.4)
names(car3) <- xAxisName
The plot plots 2 car properties on the x axis and each property has 3 car values. But these are separate lists. How could this plot be plotted?
Get all the 'car' objects into a list, bind them with bind_rows, pivot to 'long' format, and then plot with ggplot:
library(ggplot2)
library(dplyr)
library(tidyr)
mget(ls(pattern = '^car\\d+$')) %>%
bind_rows(.id = 'car') %>%
pivot_longer(cols = -car) %>%
ggplot(aes(x = name, y = value, color = car)) +
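geom_point()+
# expand = c(mult, add): pads each end of the y-axis by 5x the data range plus 6 units;
# drop this line to get the default padding
scale_y_continuous(expand = c(5, 6))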
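Since mget(ls(...)) picks up whatever matching objects happen to be in the workspace, an explicit list can be easier to reason about. The following is a sketch of that variant, not the answer's exact code: the names `cars`, `wide`, `long` and `p` are illustrative, and base rbind() is swapped in for bind_rows().

```r
library(tidyr)
library(ggplot2)

# The three named vectors from the question, collected explicitly
cars <- list(
  car1 = c(ML = 5,   MN = 6),
  car2 = c(ML = 5.5, MN = 6.2),
  car3 = c(ML = 4.9, MN = 5.4)
)

# One row per car, one column per property
wide <- as.data.frame(do.call(rbind, cars))
wide$car <- rownames(wide)

# Long format: one row per (car, property) pair, in columns name and value
long <- pivot_longer(wide, cols = -car)

p <- ggplot(long, aes(x = name, y = value, color = car)) +
  geom_point()
```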

Avoid converting numbers to dates in plotly

I have a matrix that I want to create a heatmap for in plotly. The row names are assays and the column names are CASRNs, in a format like "131-55-5".
My matrix looks like this (image: the data matrix for the heatmap).
For some reason plotly thinks these are dates, converts them to something like March 2000, and gives me an empty plot.
Before I converted my data frame to a matrix, I checked and all columns are factors.
Is there any way I can make sure my numbers won't turn into dates when I plot my matrix?
This is the code I am using for my heatmap:
plot_ly(x=colnames(dm_new2), y=rownames(dm_new2), z = dm_new2, type = "heatmap") %>%
layout(margin = list(l=120))
I used some random data to mimic your dataset. Simply put your matrix in a data frame in long format before plotting. Try this:
set.seed(42)
library(plotly)
library(dplyr)
library(tidyr)
dm_new2 <- matrix(runif(12), nrow = 4, dimnames = list(LETTERS[1:4], c("131-55-5", "113-48-4", "1582-09-8")))
# Put matrix in a dataframe
dm_new2 <- as.data.frame(dm_new2) %>%
# rownames to column
mutate(x = row.names(.)) %>%
# convert to long format
pivot_longer(-x, names_to = "y", values_to = "value")
dm_new2 %>%
plot_ly(x = ~x, y = ~y, z = ~value, type = "heatmap") %>%
layout(margin = list(l=120))
Created on 2020-04-08 by the reprex package (v0.3.0)
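Another option that may be worth trying, and which is not part of the answer above, is to keep the matrix as it is and tell plotly explicitly that the x-axis is categorical. This sketch assumes the axis `type = "category"` layout setting stops the date auto-detection for labels such as "131-55-5"; `dm` and `p` are illustrative names and the data is random.

```r
library(plotly)

set.seed(42)
# Random matrix with CASRN-like column names, mimicking the question
dm <- matrix(runif(12), nrow = 4,
             dimnames = list(LETTERS[1:4],
                             c("131-55-5", "113-48-4", "1582-09-8")))

# Force a categorical x-axis so the labels are not parsed as dates
p <- plot_ly(x = colnames(dm), y = rownames(dm), z = dm, type = "heatmap") %>%
  layout(xaxis = list(type = "category"),
         margin = list(l = 120))
```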

How to change values in a data.frame column into numbers?

I have the following (sample) data.frame
x <- data.frame(gene = 1:3, Sample1 = 5:7, Sample2 = 4:6, Sample3 = 6:8)
I want to change the column names and then use the numbers in the new titles as x-axis values for my plot
colnames(x) <- c("Gene", "HeLa_0.2", "HeLa_2.0", "HeLa_5.0")
x_gather <- x %>%
gather(key=treatment, value=values, -c(Gene)) %>%
tidyr::separate(treatment, into=c("Cell_line", "treatment"),sep="_")
ggplot()+
geom_line(x_gather, mapping=aes(treatment, y=values, group=Gene))
But I want the numbers to be spaced on an x-axis like this, instead of on an axis like this (which I get only if I copy my data to excel, format them as numbers, and then load it into R again...)
Any suggestions to how to solve this?
Thanks! :)
All you need to do is make the treatment variable numeric. For instance:
x_gather <- x %>%
gather(key=treatment, value=values, -c(Gene)) %>%
tidyr::separate(treatment, into=c("Cell_line", "treatment"),sep="_") %>%
mutate(treatment = as.numeric(treatment))
ggplot()+
geom_line(x_gather, mapping=aes(treatment, y=values, group=Gene))
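The conversion can also be done during the split itself. As a sketch under the same sample data, separate() accepts `convert = TRUE`, which should turn the "0.2", "2.0" and "5.0" pieces into numbers directly; pivot_longer() is used here in place of gather(), and `x_long` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data with the renamed columns from the question
x <- data.frame(Gene = 1:3, HeLa_0.2 = 5:7, HeLa_2.0 = 4:6, HeLa_5.0 = 6:8)

x_long <- x %>%
  pivot_longer(-Gene, names_to = "treatment", values_to = "values") %>%
  # convert = TRUE runs type conversion on the new columns,
  # so treatment becomes numeric while Cell_line stays character
  separate(treatment, into = c("Cell_line", "treatment"),
           sep = "_", convert = TRUE)

str(x_long)
```

With a numeric treatment column, ggplot places 0.2, 2 and 5 at their true positions on a continuous x-axis instead of spacing them evenly as categories.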

How to group data and then draw bar chart in ggplot2

I have data frame (df) with 3 columns e.g.
NUMERIC1: NUMERIC2: GROUP(CHARACTER):
100 1 A
200 2 B
300 3 C
400 4 A
I want to group NUMERIC1 by GROUP(CHARACTER), and then calculate mean for each group.
Something like that:
mean(NUMERIC1): GROUP(CHARACTER):
250 A
200 B
300 C
Finally I'd like to draw a bar chart using ggplot2, with GROUP(CHARACTER) on the x axis and mean(NUMERIC1) on the y axis.
It should look like:
I used
mean <- tapply(df$NUMERIC1, df$GROUP(CHARACTER), FUN=mean)
but I'm not sure if it's OK, and even if it is, I don't know what I'm supposed to do next.
This is what stat_summary(...) is designed for:
colnames(df) <- c("N1","N2","GROUP")
library(ggplot2)
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
ggplot(df) + stat_summary(aes(x=GROUP,y=N1),fun=mean,geom="bar",
fill="lightblue",col="grey50")
Try something like:
res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)
ggplot(res, aes(x = GROUP, y = NUMERIC1)) + geom_bar(stat = "identity")
data
df <- structure(list(NUMERIC1 = c(100L, 200L, 300L, 400L), NUMERIC2 = 1:4,
GROUP = structure(c(1L, 2L, 3L, 1L), .Label = c("A", "B",
"C"), class = "factor")), .Names = c("NUMERIC1", "NUMERIC2",
"GROUP"), class = "data.frame", row.names = c(NA, -4L))
I'd suggest something like:
#Imports; data.table, which allows for really convenient "apply a function to
#"each part of a df, by unique value", and ggplot2
library(data.table)
library(ggplot2)
#Convert df to a data.table. It remains a data.frame, so any function that works
#on a data.frame can still work here.
data <- as.data.table(df)
#By each unique value in "GROUP", subset and calculate the mean of the
#NUMERIC1 values within that subset. You end up with a data.frame/data.table
#with the columns GROUP and mean_value
data <- data[, j = list(mean_value = mean(NUMERIC1)), by = "GROUP"]
#And now we play the plotting game (the plotting game is boring, let's
#play Hungry Hungry Hippos!)
#stat = "identity" is needed because the bar heights are already computed
plot <- ggplot(data, aes(GROUP, mean_value)) + geom_bar(stat = "identity")
#And that should do it.
Here's a solution using dplyr to create the summary. In this case, the summary is created on the fly within ggplot, but you can also create a separate summary data frame first and then feed that to ggplot.
library(dplyr)
library(ggplot2)
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_bar(stat="identity", fill=hcl(195,100,65))
Since you're plotting means, rather than counts, it might make more sense to use points, rather than bars. For example:
ggplot(df %>% group_by(GROUP) %>%
summarise(`Mean NUMERIC1`=mean(NUMERIC1)),
aes(GROUP, `Mean NUMERIC1`)) +
geom_point(pch=21, size=5, fill="blue") +
coord_cartesian(ylim=c(0,310))
Why ggplot, when you could do the same with your own code and barplot?
barplot(tapply(df$NUMERIC1, df$GROUP, FUN=mean))
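To check any of these approaches against the expected means from the question (A = 250, B = 200, C = 300), a small self-contained sketch may help. It assumes the four-row sample data with the column simply named GROUP, and uses geom_col(), which is ggplot2's shorthand for geom_bar(stat = "identity"); `res` and `p` are illustrative names.

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  NUMERIC1 = c(100, 200, 300, 400),
  NUMERIC2 = 1:4,
  GROUP    = c("A", "B", "C", "A")
)

# Mean of NUMERIC1 per group
res <- aggregate(NUMERIC1 ~ GROUP, data = df, FUN = mean)
print(res)

# Bar heights are taken directly from the precomputed means
p <- ggplot(res, aes(x = GROUP, y = NUMERIC1)) +
  geom_col()
```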
