I have a dataframe of temperatures where each column represents a year from 1996 to 2015 and the rows are daily data from 1-Jul to 31-Oct:
head(df)
[![Dataframe head][1]][1]
I am trying to create a line plot with x = DAYS and y = temperature, with one line per year. When I use DAYS in the loop, either with aes() or aes_string(), it doesn't produce anything:
iterator <- c(colnames(df))[-1]
g <- ggplot(df, aes_string(x = 'DAY'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
So I added an index column, which is just the integers from 1 to 123. Now the same code plots a bunch of lines, but they look very strange:
df$index <- c(1:123)
iterator <- c(colnames(df))[-1]
iterator <- iterator[-21]
g <- ggplot(df, aes_string(x = 'index'))
for (i in iterator){
g <- g+ geom_line(aes_string(y=i))
}
print(g)
[![Final plot][2]][2]
As you can see, I get one line per column name, and all the column names are stacked on top of each other. This has compressed the vertical axis so much that the variations in temperature are not visible. I would like the y-axis to go from 50 to 100, with one line per column, all drawn on the same scale. How do I do that?
[1]: https://i.stack.imgur.com/ruF11.png
[2]: https://i.stack.imgur.com/gAvMe.png
I agree with Andrew's solution, with one minor change: remove the "df" argument on the 3rd line, because the data frame is already piped in at the beginning. The x aesthetic also has to use the actual column name, DAYS.
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(aes(x = DAYS, y = temp, group = column)) +
geom_line()
I think you could rearrange your data frame, e.g. using the tidyr package, so that you have a data frame with "year", "day" and "temp" columns:
library(ggplot2)
library(tidyr)
year1 = c(5,6,4,5)
year2 = c(6,5,5,6)
year3 = c(3,4,3,4)
date = c("a", "b", "c", "d")
data = data.frame(date, year1, year2, year3)
data2 = gather(data , "year", "temp", -date)
Then, you can easily plot the temperature per year.
ggplot(data2, aes(x = date, y = temp, group = year, color = year))+
geom_path()
If you're doing something with loops in R, especially with ggplot2, you are probably doing something wrong. I'm not 100% sure why you're looping at all, when you probably want to do something more like,
df %>%
tidyr::pivot_longer(!DAYS, names_to = "column", values_to = "temp") %>%
ggplot(df, aes(x = day, y = temp, group = column)) +
geom_line()
but without a reprex / data set I can't be sure if that's what you want.
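Since the question has no reproducible data, here is a minimal sketch of the long-format approach on made-up numbers. The data frame below, including the column names X1996 to X1998 and the random temperatures, is an assumption for illustration only; coord_cartesian() is one way to get the 50 to 100 y-axis range the question asks for.

```r
library(ggplot2)
library(tidyr)

# Synthetic stand-in for the real data (assumption): 123 days, three "years".
set.seed(1)
df <- data.frame(
  DAYS  = seq(as.Date("2015-07-01"), as.Date("2015-10-31"), by = "day"),
  X1996 = round(runif(123, 55, 95)),
  X1997 = round(runif(123, 55, 95)),
  X1998 = round(runif(123, 55, 95))
)

# One row per (day, year) pair instead of one column per year.
long_df <- pivot_longer(df, !DAYS, names_to = "year", values_to = "temp")

# A single geom_line() call draws one line per year; no loop is needed.
p <- ggplot(long_df, aes(x = DAYS, y = temp, colour = year, group = year)) +
  geom_line() +
  coord_cartesian(ylim = c(50, 100))  # zoom the y-axis without dropping data
p
```

With the real data, only the pivot_longer() and ggplot() calls should be needed, provided the temperature columns are numeric.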
Related
I am trying to plot the Id column against some other variables, which I have managed to do with geom_col, but in the resulting plot R treats the "Id" column as a number and I am not getting the result I am looking for.
How can I convert the column into a string so that the plot shows all 33 users who participated in the survey? Here is where I'm coming from:
activity_distance <-
merged_activity_calories %>%
group_by(Id) %>%
summarise(
mean_activity_distance= mean(VeryActiveDistance),
mean_ma_distance= mean(ModeratelyActiveDistance),
mean_la_distance= mean(LightActiveDistance),
mean_sa_distance= mean(SedentaryActiveDistance)
)
ggplot(data= activity_distance) +
geom_col(mapping= aes(x=Id , y= mean_activity_distance))
You can convert ID into a factor.
library(ggplot2)
df <- data.frame(id = c(1232121321321321,123213213213,123213213213213),
y = c(123,234,22.4))
ggplot(df) +
geom_col(mapping = aes(x = (id), y = y))
ggplot(df) +
geom_col(mapping = aes(x = factor(id), y = y))
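Since the question asks about a string specifically, the same idea can be sketched with as.character() applied before plotting. The Id values and means below are invented stand-ins for activity_distance, not the real survey data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for activity_distance (values made up for illustration).
activity_distance <- data.frame(
  Id = c(1000000001, 1000000002, 2000000005),
  mean_activity_distance = c(2.9, 0.9, 0.7)
)

# Convert the numeric Id to text so ggplot2 uses a discrete x-axis,
# giving each user their own bar.
plot_data <- activity_distance %>%
  mutate(Id = as.character(Id))

p <- ggplot(plot_data, aes(x = Id, y = mean_activity_distance)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))  # readable labels
p
```

Either factor() or as.character() makes the axis discrete; factor() additionally lets you control the bar order through its levels.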
I have a dataframe like this:
library(tidyverse)
my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
x = c(100, 50, 75),
y = c(2, 3, 6))
Each name is unique, and I want to make a bar graph for each person without having to do it line by line. I also want to save each plot as its own object because I'll be inserting it into a PowerPoint file using the officer package. Lastly, the names won't always be the same, but each name will always be unique.
For instance, I want one plot for Janet, one plot for Justin, and one plot for Marisa. I don't want them faceted but instead as their own objects.
Any thoughts?
We can get the data in long format first and for each individual name create the plot.
library(tidyverse)
long_data <- my_data %>% tidyr::pivot_longer(cols = -name, names_to = 'col')
plots_list <- map(unique(my_data$name), ~long_data %>%
filter(name == .x) %>%
ggplot() + aes(name, value, fill = col) +
geom_bar(stat = 'identity', position = 'dodge') +
scale_fill_manual(values = c('red', 'blue')) +
ggtitle(paste0('Plot for ', .x)))
This will return a list of plots, where individual plots can be accessed via plots_list[[1]], plots_list[[2]], etc.
plots_list[[1]]
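Because the names change between runs, it may be more convenient to look plots up by person rather than by position. The following is a sketch of one way to do that with purrr::set_names(); the object name plots_by_name and the simplified bar layout are my own choices, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)

my_data <- tibble(name = c("Justin", "Janet", "Marisa"),
                  x = c(100, 50, 75),
                  y = c(2, 3, 6))

long_data <- my_data %>%
  pivot_longer(cols = -name, names_to = "col")

# set_names() with no second argument names each element after itself,
# so map() returns a list whose names are the people.
plots_by_name <- unique(my_data$name) %>%
  set_names() %>%
  map(function(person) {
    long_data %>%
      filter(name == person) %>%
      ggplot(aes(x = col, y = value, fill = col)) +
      geom_col() +
      ggtitle(paste0("Plot for ", person))
  })

plots_by_name[["Janet"]]
```

Each element is an ordinary ggplot object, so it can be passed on individually to whatever writes the slides.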
I have a dataframe structured like this:
Elem. Category. SEZa SEZb SEZc
A. ONE. 1. 3. 4
B. TWO. 4. 5. 6
I want to plot three histograms in three different facets (SEZa, SEZb, SEZc) with ggplot, where the x values are the category values (ONE and TWO) and the y values are the numbers in columns SEZa, SEZb and SEZc.
How can I do this? Thank you for your suggestions!
Assume df is your data.frame, I would first convert from wide format to a long format:
new_df <- reshape2::melt(df, id.vars = c("Elem", "Category"))
And then make the plot using geom_col() instead of geom_histogram() because it seems you've precomputed the y-values and wouldn't need ggplot to calculate these values for you.
ggplot(new_df, aes(x = Category, y = value, fill = Elem)) +
geom_col() +
facet_grid(variable ~ .)
I think that what you are looking for is something like this:
library(ggplot2)
library(reshape2)
df <- data.frame(Category = c("One", "Two"),
SEZa = c(1, 4),
SEZb = c(3, 5),
SEZc = c(4, 6))
# Name the id column explicitly rather than letting melt() guess it
df <- melt(df, id.vars = "Category")
ggplot(df, aes(x = Category, y = value)) +
geom_col(aes(fill = variable)) +
facet_grid(variable ~ .)
My inspiration is:
http://felixfan.github.io/stacking-plots-same-x/
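For readers who prefer tidyr over reshape2, the same wide-to-long step can be sketched with pivot_longer(). This is an alternative to the melt() calls above, not what either answer used; the names wide_df and long_df are my own, and facet_wrap() with one row is just one possible layout.

```r
library(ggplot2)
library(tidyr)

wide_df <- data.frame(Category = c("One", "Two"),
                      SEZa = c(1, 4),
                      SEZb = c(3, 5),
                      SEZc = c(4, 6))

# Stack the three SEZ columns into a name/value pair of columns.
long_df <- pivot_longer(wide_df,
                        cols = starts_with("SEZ"),
                        names_to = "variable",
                        values_to = "value")

# geom_col() uses the precomputed values as bar heights; one facet per SEZ column.
p <- ggplot(long_df, aes(x = Category, y = value, fill = variable)) +
  geom_col() +
  facet_wrap(~ variable, nrow = 1)
p
```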
I want to plot the cumulative counts of level OK of factor X (*), over time (column Date). I am not sure what is the best strategy, whether or not I should create a new data frame with a summary column, or if there is a ggplot2 way of doing this.
Sample data
DF <- data.frame(
Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01", "2018-03-01", "2018-04-01") ),
X = factor(rep("OK", 6), levels = c("OK", "NOK")),
Group = factor(c(rep("A", 4), "B", "B"))
)
DF <- rbind(DF, list(as.Date("2018-02-01"), factor("NOK"), "A"))
From similar questions I tried this:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='bin')
Using stat='count' (as the answer to this question) is even worse:
ggplot(DF, aes(Date, col = Group)) + geom_line(stat='count')
which shows the counts for factor levels (*), but not the accumulation over time.
Desperate measure - count with table
I tried creating a new data frame with counts using table like this:
cum <- as.data.frame(table(DF$Date, DF$Group))
ggplot(cum, aes(Var1, cumsum(Freq), col = Var2, group = Var2)) +
geom_line()
Is there a way to do this with ggplot2? Do I need to create a new column with cumsum? If so, how should I cumsum the factor levels, by date?
(*) Note: I could just filter the data frame to keep only the intended level with DF[DF$X == "OK", ], but I am sure someone can find a smarter solution.
One option using dplyr and ggplot2 can be as:
library(dplyr)
library(ggplot2)
DF %>% group_by(Group) %>%
arrange(Date) %>%
mutate(Value = cumsum(X=="OK")) %>%
ggplot(aes(Date, y=Value, group = Group, col = Group)) + geom_line()
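The pipeline above keeps one row per observation, so dates with several rows produce several points at the same x position. A possible variation, sketched here as my own assumption about what is wanted, first collapses the data to one count per Group and Date and then accumulates. The sample data is rebuilt directly with all seven rows so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Same seven rows as the question's sample data, built in one step.
DF <- data.frame(
  Date = as.Date(c("2018-01-01", "2018-01-01", "2018-02-01", "2018-03-01",
                   "2018-03-01", "2018-04-01", "2018-02-01")),
  X = factor(c(rep("OK", 6), "NOK"), levels = c("OK", "NOK")),
  Group = factor(c(rep("A", 4), "B", "B", "A"))
)

ok_by_date <- DF %>%
  group_by(Group, Date) %>%
  summarise(n_ok = sum(X == "OK"), .groups = "drop") %>%  # OK count per day
  arrange(Group, Date) %>%
  group_by(Group) %>%
  mutate(cum_ok = cumsum(n_ok)) %>%                       # running total per group
  ungroup()

p <- ggplot(ok_by_date, aes(Date, cum_ok, colour = Group, group = Group)) +
  geom_step()
p
```

geom_step() is used because a cumulative count only changes on observation dates; geom_line() works as well if interpolated lines are preferred.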
I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y)) %>%
ggplot(aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
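As for the missing step in the original attempt: `$` pulls out the whole column regardless of any grouping, so mean(grp$p1_x) is a single overall mean and only one point is drawn. A small sketch using just the four rows shown in the question (the real file has 200+ rows) makes the difference visible:

```r
library(dplyr)

# Only the four sample rows from the question, read from inline text.
data <- read.csv(text = "gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937")

grp <- data %>% group_by(p1_id)

# `$` ignores the grouping: this is one number for the entire column.
overall_mean_x <- mean(grp$p1_x)

# summarise() respects the grouping: one row of means per p1_id.
means <- grp %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y))

print(overall_mean_x)
print(means)
```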