I'm an R newbie and need help with the following.
I have the following data:
# Simulate matrix of integers
set.seed(1)
df <- matrix(sample.int(5, size = 3*5, replace = TRUE), nrow = 3, ncol = 5)
print(df)
df <- as.data.frame(df) # convert to a data frame with columns V1..V5 (tbl_df() is deprecated)
df <- rbind(df, c(3,5,4,1,4))
print(df)
In a single command, I need to plot the data for each row, with the row's values on the y-axis (in my case, values from 1 to 5) and the column positions 1, 2, 3, 4, 5 on the x-axis. In effect, for each row I want to plot how the values change from column to column.
I have tried the following, which works but has two problems I need to resolve. First, it only plots one row at a time, which is inefficient when there are many rows. Second, I could not find a way to express the x-axis in terms of the number of columns, so I simply counted the columns (5) and used the vector c(1:5). I also tried ncol(df) for the x-axis, but that returns an error saying the variables have different lengths. Indeed, ncol(df) returns the single number 5, the number of columns, rather than the sequence 1, 2, 3, 4, 5 that I wanted.
plot(c(1:5),df[1,], type = "b", pch=19,
col = "blue", xlab = "number of columns", ylab = "response format")
Thank you, your help is much appreciated
You could do:
library(tidyverse)
df %>%
mutate(row_number = as.factor(row_number())) %>%
gather(columns, responses, V1:V5) %>%
ggplot(aes(x = columns, y = responses, group = row_number, color = row_number)) +
geom_line() + geom_point()
What this does:
Creates an id for each row (row_number);
Transforms the data frame into a long format with 1 column for columns, and another for responses;
Plots everything on 1 chart where each color represents one row.
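To see what the reshaping step produces, here is a minimal sketch that rebuilds the example data and stops before plotting. It uses `pivot_longer()`, which tidyr now recommends over the older `gather()`. The object name `long` is only for illustration, and `columns` and `responses` simply mirror the answer above.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (columns are named V1..V5)
set.seed(1)
df <- as.data.frame(matrix(sample.int(5, size = 3 * 5, replace = TRUE), nrow = 3, ncol = 5))
df <- rbind(df, c(3, 5, 4, 1, 4))

long <- df %>%
  mutate(row_number = as.factor(row_number())) %>%   # one id per original row
  pivot_longer(V1:V5, names_to = "columns", values_to = "responses")

head(long)  # one line per (row, column) pair
dim(long)   # 4 rows x 5 columns = 20 observations of 3 variables
```

Each original row now contributes five observations, which is what lets `group = row_number` draw one line per row.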
You could also slightly change the plot so that each line (row) has its own chart by adding facet_wrap, e.g.:
df %>%
mutate(row_number = as.factor(row_number())) %>%
gather(columns, responses, V1:V5) %>%
ggplot(aes(x = columns, y = responses, group = row_number, color = row_number)) +
geom_line() + geom_point() +
facet_wrap(~ row_number)
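If a base R plot is enough, both issues from the question (one row at a time, and the x-axis positions) can be sketched without ggplot2. This is an alternative to the answer above, not part of it. `seq_len(ncol(df))` produces 1, 2, ..., 5 where `ncol(df)` alone only returns 5, and `matplot()` draws one line per column of its input, hence the transpose.

```r
# Rebuild the example data from the question
set.seed(1)
df <- as.data.frame(matrix(sample.int(5, size = 3 * 5, replace = TRUE), nrow = 3, ncol = 5))
df <- rbind(df, c(3, 5, 4, 1, 4))

x <- seq_len(ncol(df))   # the sequence of column positions

# t(as.matrix(df)) has one column per original row, so each row becomes a line
matplot(x, t(as.matrix(df)), type = "b", pch = 19, lty = 1,
        xlab = "column", ylab = "response format")
legend("topright", legend = paste("row", seq_len(nrow(df))),
       col = seq_len(nrow(df)), lty = 1, pch = 19)
```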
I have been racking my brain over this question. The chart should look similar to this:
So I am basically trying to plot returns but "standardizing" them before. Is there any quick way to do this? I thought about dividing each row entry by the value of the first row respectively, e.g. if stock starts trading at 200, data point 1 will be 200/200=1, datapoint 2 say 210/200= 1.05 etc. - I could then also multiply that value by 100 so I would start the first one with 100, second 105 etc.
Does this make sense or is there a smarter way to do this?
Many thanks!
You may want seq_along(). I don't have your data, so here is an example with some dummy data:
set.seed(12345)
df <- data.frame(company = c(rep("A", 100), rep("B", 100), rep("C",100), rep("D",100)),
value = c(rnorm(100, 150, 25), rnorm(100, 250, 25), rnorm(100, 50, 25), rnorm(100, 300, 25)),
time = c(151:250, 100:199, 200:299, 251:350))
Add a new column to your data after grouping by the group/color variable. Use seq_along() to populate that column with a sequence of integers starting at 1 within each group. If you need to, transform that new column to whatever scale you need. Note that this only works if your horizontal-axis data is evenly spaced and sorted by time within each group; with uneven intervals the index no longer reflects elapsed time.
library(dplyr)
library(ggplot2)
df %>%
group_by(company) %>%
mutate(time2 = seq_along(time)) %>%
ggplot(aes(x = time2, y = value, color = company)) +
geom_line(linewidth = 2) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
xlab("relative time")
If your data is unevenly spaced, consider subtracting the per-group minimum from each time value instead. This has the bonus of preserving the interval widths. If you divided by the minimum value instead, the time intervals would be compressed differently in each group. Again, you could manipulate the new variable in other ways, like adding 100 so that all series start at 100.
df %>%
group_by(company) %>%
mutate(time2 = time - min(time)) %>%
ggplot(aes(x = time2, y = value, color = company)) +
geom_line(linewidth = 2) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
xlab("relative time")
To come back to the solution I proposed before, I've prepared the following dataset:
close <- aapl %>%
select(AAPL.Close) #selecting only the closing prices for apple
The first closing price for Apple is 32.1875. To make it comparable with another firm regardless of the share price, I want to "standardize" this value. Thus I divide each row in the dataset by 32.1875 and multiply the result by 100. This gives a new column, which I call Relative, that begins with 100 (the base value).
close$Relative <- (close$AAPL.Close/32.1875)*100
Now I do the same with AMZN; I'll spare you the code since the concept is the same. When done, I bind both data frames together:
close <- cbind(close,amzn)
And plot the data:
ggplot(close, aes(x=dates))+ # refer to the column by name inside aes(), not as close$dates
geom_line(aes(y=Relative), color="Red")+
geom_line(aes(y=Relative1), color="Blue")
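The rebasing above hard-codes the first closing price and repeats the work per stock. A minimal sketch of the same idea in long format follows. The data frame `prices` and its columns `ticker`, `day` and `close` are made up for illustration, not taken from the original data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical prices in long format: one row per ticker and day
prices <- data.frame(
  ticker = rep(c("AAPL", "AMZN"), each = 4),
  day    = rep(1:4, times = 2),
  close  = c(200, 210, 205, 220, 50, 55, 52, 60)
)

rebased <- prices %>%
  group_by(ticker) %>%
  arrange(day, .by_group = TRUE) %>%                   # make sure the first row is the first day
  mutate(relative = close / first(close) * 100) %>%    # every series starts at 100
  ungroup()

ggplot(rebased, aes(x = day, y = relative, color = ticker)) +
  geom_line()
```

Because the base value is taken with `first(close)` inside each group, adding a third stock only means adding rows, and the legend comes for free from the `color` mapping.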
I've tried about every variation I can find on Stack Exchange of for loops and lapply calls to create ggplots, and this code has worked well for me. My only problem is that I can't assign unique titles and labels. From what I can tell, inside the function i takes the values of my response variable, so I can't use it to pick the ith entry from a character vector of titles.
The example I've supplied creates plots with the correct values but the 2nd and 3rd plots in the plot lists don't have the correct titles or labels.
Mock dataset:
library(ggplot2)
nms=c("SampleA","SampleB","SampleC")
measr1=c(0.6,0.6,10)
measr2=c(0.6,10,0.8)
measr3=c(0.7,10,10)
qual1=c("U","U","")
qual2=c("U","","J")
qual3=c("J","","")
df=data.frame(nms,measr1,qual1,measr2,qual2,measr3,qual3,stringsAsFactors = FALSE)
identify columns in dataset that contain response variable
measrsindex=c(2,4,6)
Create list of plots that show all samples for each measurement
plotlist=list()
plotlist=lapply(df[,measrsindex], function(i) ggplot(df,aes_string(x="nms",y=i))+
geom_col()+
ggtitle("measr1")+
geom_text(aes(label=df$qual1)))
Create list of plots that show all measurements for each sample
plotlist2=list()
plotlist2=lapply(df[,measrsindex],function(i)ggplot(df,aes_string(x=measrsindex, y=i))+
geom_col()+
ggtitle("SampleA")+
geom_text(aes(label=df$qual1)))
The problem is that I can't create a unique title for each plot. (All plots in the example have the title "measr1" or "SampleA".)
Additionally, I can't apply unique labels (from the qual columns) to each bar (e.g. the letter from qual2 should appear on top of the bar for measr2 for each sample).
Additionally, in the second plot list the x-values aren't "measr1", "measr2", "measr3"; they're the index values for those columns, which isn't ideal.
I'm relatively new to R and have never posted on Stack Overflow before so any feedback about my problem or posting questions is welcomed.
I've found lots of questions and answers about this sort of topic but none that have a data structure or desired plot quite like mine. I apologize if this is a redundant question but I have tried to find the solution in previous answers and have been unable.
This is where I got the original code to make my loops, however this example doesn't include titles or labels:
Looping over ggplot2 with columns
You could loop over the names of the columns instead of the column itself and then use some non-standard evaluation to get column values from the names. Also, I have included label in aes.
library(ggplot2)
library(rlang)
plotlist3 <- purrr::map(names(df)[measrsindex],
~ggplot(df, aes(nms, !!sym(.x), label = qual1)) +
geom_col() + ggtitle(.x) + geom_text(vjust = -1))
plotlist3[[1]]
plotlist3[[2]]
The same can be achieved with lapply as well:
plotlist4 <- lapply(names(df)[measrsindex], function(x)
ggplot(df, aes(nms, !!sym(x), label = qual1)) +
geom_col() + ggtitle(x) + geom_text(vjust = -1))
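The versions above give each plot its own title but still label every plot from `qual1`. A hedged sketch that also pairs each measurement with its own qualifier column is below. It assumes the `qual` columns line up one-to-one with the `measr` columns, and the names `measr_cols`, `qual_cols` and `plotlist5` are introduced here for illustration.

```r
library(ggplot2)

# Mock data from the question
df <- data.frame(
  nms    = c("SampleA", "SampleB", "SampleC"),
  measr1 = c(0.6, 0.6, 10), qual1 = c("U", "U", ""),
  measr2 = c(0.6, 10, 0.8), qual2 = c("U", "", "J"),
  measr3 = c(0.7, 10, 10),  qual3 = c("J", "", ""),
  stringsAsFactors = FALSE
)

measr_cols <- c("measr1", "measr2", "measr3")
qual_cols  <- c("qual1", "qual2", "qual3")   # assumed to line up with measr_cols

# Map() walks both name vectors in parallel; .data[[name]] looks a column up by its name
plotlist5 <- Map(function(m, q) {
  ggplot(df, aes(x = nms, y = .data[[m]], label = .data[[q]])) +
    geom_col() +
    geom_text(vjust = -0.5) +
    labs(title = m, y = m)
}, measr_cols, qual_cols)

plotlist5[["measr2"]]
```

Since `Map()` names its result after the first character vector, the plots can be retrieved by measurement name as well as by position.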
I would recommend putting your data in long format before using ggplot2; it makes plotting a much simpler task. I also recoded some variables to make the plots easier to construct. Here is the code to build the plots with lapply.
library(tidyverse)
#Change from wide to long format
df1<-df %>%
pivot_longer(cols = -nms,
names_to = c(".value", "obs"),
names_sep = "r|l") %>% # one regex: "measr1" -> "meas" + "1", "qual1" -> "qua" + "1"
#Separate Sample column into letters
separate(col = nms,
sep = "Sample",
into = c("fill","Sample"))
#Change measures index to 1-3
measrsindex=c(1,2,3)
plotlist=list()
plotlist=lapply(measrsindex, function(i){
#Subset by measrsindex (numbers) and plot
df1 %>%
filter(obs == i) %>%
ggplot(aes(x = Sample, y = meas, label = qua))+ # aes_string() is deprecated; plain aes() is enough here
geom_col()+
labs(x = "Sample") +
ggtitle(paste("Measure",i, collapse = " "))+
geom_text()})
#Get the letters A : C
samplesvec<-unique(df1$Sample)
plotlist2=list()
plotlist2=lapply(samplesvec, function(i){
#Subset by samplesvec (letters) and plot
df1 %>%
filter(Sample == i) %>%
ggplot(aes(x = obs, y = meas, label = qua))+ # aes_string() is deprecated; plain aes() is enough here
geom_col()+
labs(x = "Measure") +
ggtitle(paste("Sample",i,collapse = ", "))+
geom_text()})
Looking at the final plots, I think it might be useful to use facet_wrap for these plots. I added the code to use it with your data.
#Plot for Measures
ggplot(df1, aes(x = Sample,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ obs) +
ggtitle("Measures")+
labs(x="Samples")+
geom_text()
#Plot for Samples
ggplot(df1, aes(x = obs,
y = meas,
label = qua)) +
geom_col()+
facet_wrap(~ Sample) +
ggtitle("Samples")+
labs(x="Measures")+
geom_text()
Here is a sample of the plots using facet_wrap.
I am trying to do the following. Consider the following dataset
trends <- c('Virtual Assistant', 'Citizen DS', 'Deep Learning', 'Speech Recognition',
'Handwritten Recognition', 'Machine Translation', 'Chatbots',
'NLP')
impact <- sample(5,8, replace = TRUE)
maturity <- sample(5,8, replace = TRUE)
strategy <- sample(5,8, replace = TRUE)
h <- sample(5,8, replace = TRUE)
df <- data.frame(trends, impact, maturity, strategy, h)
rownames(df) <- df$trends
I am trying to generate a heatmap. So far so good; that part is relatively easy. For example I can use
dftemp = df[,c("impact", "maturity", "strategy", "h")]
dt2 <- dftemp %>%
rownames_to_column() %>%
gather(colname, value, -rowname)
and then
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()
I know the labels on the x-axis are horizontal, but I know how to fix that. What I would like is to order the x-axis based on one specific row. For example, I would like the heatmap ordered so that the values in the row "impact" are ascending. Can anyone point me in the right direction?
Should I convert x into a factor and change the levels there?
Yes, you could convert it into a factor and specify the levels. To order it by the impact row we can do:
dt2$rowname <- factor(dt2$rowname, levels = df$trends[order(df$impact)])
library(ggplot2)
ggplot(dt2, aes(x = rowname, y = colname, fill = value)) +
geom_tile()
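To check that the factor levels really control the axis order, here is a small self-contained sketch. The three-row data frame is made up for illustration, and it uses `pivot_longer()` in place of `rownames_to_column()` plus `gather()`, so the long data frame and its column are named `long` and `trends` rather than `dt2` and `rowname`.

```r
library(tidyr)
library(ggplot2)

# Small hypothetical data set with the same shape as in the question
df <- data.frame(
  trends   = c("Chatbots", "NLP", "Deep Learning"),
  impact   = c(3, 1, 2),
  maturity = c(5, 4, 2)
)

long <- pivot_longer(df, cols = -trends, names_to = "colname", values_to = "value")

# Levels listed in ascending order of impact fix the left-to-right axis order
long$trends <- factor(long$trends, levels = df$trends[order(df$impact)])
levels(long$trends)

ggplot(long, aes(x = trends, y = colname, fill = value)) +
  geom_tile()
```

For descending order, `order(df$impact, decreasing = TRUE)` should work the same way.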
I often melt my data frames to show multiple variables on one bar plot. The goal is to create a geom_bar with one bar for each variable, and one summary label for each bar.
For example, I'll do this:
library(reshape2) # for melt()
library(ggplot2)
mtcars$id<-rownames(mtcars)
tt<-melt(mtcars,id.vars = "id",measure.vars = c("cyl","vs","carb"))
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(aes(label=value),color='blue')
The result is a barplot in which the label for each bar is repeated for each case (it seems):
What I want to have is one label for each bar, like this:
A common solution is to create aggregated values to place on the graph, like this:
aggr<-tt %>% group_by(variable) %>% summarise(aggrLABEL=mean(value))
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(aes(label=aggr$aggrLABEL),color='blue')
or
ggplot(tt,aes(variable,value))+geom_bar(stat="identity")+
geom_text(label=dplyr::distinct(tt,value),color='blue')
However, these attempts result in errors, respectively:
For solution 1: Error: Aesthetics must be either length 1 or the same as the data (96): label, x, y
For solution 2: Error in [<-.data.frame(*tmp*, aes_params, value = list(label = list( : replacement element 1 is a matrix/data frame of 7 rows, need 96
So, what to do? Setting geom_text to stat="identity" does not help either.
What I would do is create another dataframe with the summary values of your columns. I would then refer to that dataframe in the geom_text line. Like this:
library(tidyverse) # need this for the %>%
tt_summary <- tt %>%
group_by(variable) %>%
summarize(total = sum(value))
ggplot(tt, aes(variable, value)) +
geom_col() +
geom_text(data = tt_summary, aes(label = total, y = total), nudge_y = 1) # using nudge_y bc it looks better.
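The errors in the question most likely come from mapping a 3-element label vector onto a layer that still uses the 96-row data. Giving `geom_text()` its own summary data frame avoids that. Below is a self-contained sketch of the same approach without reshape2; `pivot_longer()` stands in for `melt()`, and the `id` column is dropped because it is not needed for the plot.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Long format: 32 cars x 3 variables = 96 rows
tt <- mtcars %>%
  select(cyl, vs, carb) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value")

# One row per variable, so geom_text() draws exactly one label per bar
tt_summary <- tt %>%
  group_by(variable) %>%
  summarize(total = sum(value))

nrow(tt)          # 96, the length mentioned in the error message
nrow(tt_summary)  # 3, one per bar

ggplot(tt, aes(variable, value)) +
  geom_col() +
  geom_text(data = tt_summary, aes(label = total, y = total),
            vjust = -0.5, color = "blue")
```

The text layer inherits `x = variable` from the main plot and overrides only `y` and `label`, which is why `tt_summary` needs a `variable` column with the same values.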
I have a large dataset with 30 different variables. I want to investigate some characteristics of each variable by making a histogram for each variable.
For example, for my variable A this now looks like:
hist = qplot(A, data = full_data_noNO, geom="histogram",
binwidth = 50, fill=I("lightblue"))+
theme_light()
Now, I want to do this for all my variables. Does anyone know how I can loop through the names of all variables of my dataframe (so A should change each iteration)?
Also, I want to loop through all variables in this code for the same purpose:
avg_price = full_data_noNO %>%
group_by(Month, Country) %>%
dplyr::summarize(total = mean(A, na.rm = TRUE))
You could reference your variables by column number:
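histograms = lapply(seq_along(full_data_noNO), function(i) {
# [[i]] extracts column i as a plain vector (works for data frames and tibbles);
# building the plot inside a function keeps each plot tied to its own i
qplot(full_data_noNO[[i]], geom="histogram",
binwidth = 50, fill=I("lightblue"))+
theme_light()
})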
If all your variables are numeric, then you can do the following to produce a list of all plots, which you can then explore one by one with list indexing:
library(tidyverse)
list_of_plots <-
full_data_noNO %>%
map(~ qplot(x = ., geom = "histogram"))
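The second part of the question (the grouped mean for every variable) is not covered above. A possible sketch, assuming dplyr 1.0 or later, uses `across()` to apply the same summary to all numeric columns at once. The data frame `full_data` below is a made-up stand-in for `full_data_noNO`.

```r
library(dplyr)

# Hypothetical stand-in for full_data_noNO
full_data <- data.frame(
  Month   = c(1, 1, 2, 2),
  Country = c("NL", "NL", "NL", "BE"),
  A       = c(10, 20, NA, 40),
  B       = c(1, 3, 5, 7)
)

# Grouping columns are left out of across() automatically
avg_all <- full_data %>%
  group_by(Month, Country) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

avg_all
```

This returns one row per Month and Country with one mean column per variable, instead of one summary table per variable. A group whose values are all missing yields NaN when `na.rm = TRUE`.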