I have this code.
a = c("a", 1)
b = c("b",2)
c = c('c',3)
d = c('d',4)
e = c('e',5)
z = data.frame(a,b,c,d,e)
hist = hist(as.numeric(z[2,]))
I am trying to get a histogram whose bins are a, b, c, d, e and whose frequencies are 1, 2, 3, 4, 5.
However, it gives me an empty plot (no bins at all).
You are plotting the factor codes of each column for row 2, which in this case are always 1.
When creating the data frame, add stringsAsFactors = FALSE so that the values are not converted to factors (this is already the default from R 4.0.0 onward). This should work:
z = data.frame(a,b,c,d,e,stringsAsFactors=FALSE)
hist(as.numeric(z[2,]))
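Note that hist() bins the values 1 to 5 themselves, so it does not draw one bar per letter. If the goal is one bar per label with the given heights, a bar plot may be closer to what was asked. A minimal sketch in base R, assuming the labels and heights from the question (the variable names labels, heights and mids are only illustrative):

```r
# Bar labels and heights taken from the question
labels <- c("a", "b", "c", "d", "e")
heights <- c(1, 2, 3, 4, 5)

# names.arg writes the letters under the bars;
# barplot() returns the x positions of the bar midpoints
mids <- barplot(heights, names.arg = labels,
                xlab = "Bin", ylab = "Frequency")
```

The returned midpoints can be reused later, for example to place text above each bar with text().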
Perhaps this would work for you: it creates a data frame with the x elements being the letters 'a' through 'e' and the y elements being the numbers 1 through 5. It then draws one bar per letter with geom_col(), which uses the y values as bar heights and performs no binning (older code often wrote this as geom_histogram(stat = "identity")).
library(ggplot2)
tmp <- data.frame(x = letters[1:5], y = 1:5)
ggplot(tmp, aes(x = x, y = y)) + geom_col()
I have 400 columns with dynamic names (t_namelist1_namelist2). There are 20 names in each namelist1 and namelist2. I have to create a histogram with facets 20 by 20, with labels -
along row - namelist1
along col - namelist2
Can someone please show how to transform the data with pivot_longer(), using the _ that separates the three parts t, namelist1 and namelist2?
In the sample problem, I have a tibble with 4 columns, and I want to create 4 individual histograms in 2 by 2 facets with labels -
along row - a and b
along col - x and y
Thanks
library(tidyverse)
t_a_x <- rnorm(100)
t_b_x <- rnorm(100)
t_a_y <- rnorm(100)
t_b_y <- rnorm(100)
tbl <- tibble(t_a_x, t_a_y, t_b_x, t_b_y)
# create a histogram in 2 by 2 facets with labels -
# along row - a and b
# along col - x and y
The first chunk of code below rearranges the example data into a three-column data frame: "ab" and "xy" hold the two label parts of each original column name (split on "_"), and "t" holds the values. You can then plot a histogram of "t" and facet by "ab" (rows) and "xy" (columns).
# Rearrange the table by splitting the column names on "_" with pivot_longer()
tbl_formatted <- tbl %>% pivot_longer(everything(),
  names_to = c(".value", "ab", "xy"),
  names_sep = "_"
)
# Plot one histogram per ab/xy combination
tbl_formatted %>% ggplot(aes(x = t)) +
  geom_histogram(bins = 30) +
  facet_grid(ab ~ xy)
Below is a very basic version of the plot; you can customize it with colors and more.
To get it, you first need to rearrange your data frame into long format:
library(tidyverse)
tbl <- tibble(value = c(t_a_x, t_b_x, t_a_y, t_b_y),
lab1 = rep(c("a", "b", "a", "b"), each = 100),
lab2 = rep(c("x", "x", "y", "y"), each = 100))
ggplot(tbl) +
geom_histogram(aes(value), binwidth = 0.5) +
facet_grid(lab1~lab2)
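For the full problem with many dynamically named columns, typing the labels by hand does not scale. A minimal sketch of the same idea driven by the column names, assuming every name has exactly the form t_<row label>_<column label> with no extra underscores (names1, names2, wide, long and p_grid are hypothetical names, and the 3 by 2 grid stands in for 20 by 20):

```r
library(tidyr)
library(ggplot2)

set.seed(1)
# Hypothetical label lists standing in for namelist1 and namelist2
names1 <- c("a", "b", "c")
names2 <- c("x", "y")

# Build column names such as "t_a_x" and fill them with random data
col_names <- as.vector(outer(names1, names2,
                             function(r, c) paste("t", r, c, sep = "_")))
wide <- as.data.frame(matrix(rnorm(100 * length(col_names)),
                             ncol = length(col_names)))
names(wide) <- col_names

# Split each column name on "_" into three parts
long <- pivot_longer(wide, cols = everything(),
                     names_to = c("measure", "row_lab", "col_lab"),
                     names_sep = "_", values_to = "value")

# One histogram per cell: rows from the second part, columns from the third
p_grid <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_grid(row_lab ~ col_lab)
```

If the real names contain underscores inside a label, names_sep would split them in the wrong place; the names_pattern argument of pivot_longer() with a regular expression is the usual alternative.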
I'm pretty new at R and coding so I don't know how to explain it well on this site but I couldn't find a better forum to ask.
Basically I have a 6x6 matrix with each row being a discrete gene and each column being a sample.
I want the genes as the x-axis and the y-axis being the values of the samples, so that each gene will have its 6 samples above at their respective value.
I have this matrix in Excel and when I highlight it and plot it it gives me exactly what I want.
But trying to reproduce it in R gives me a giant lattice plot at best.
I've tried boxplot(), scatterchart(), plot(), and ggplot().
I'm assuming I have to alter my matrix but I don't know how.
This may help:
library(tidyverse)
gene <- c("a", "b", "c", "d", "e", "f")
x1 <- c(1,2,3,4,5,6)
x2 <- c(2,3,4,5,-6,7)
x3 <- c(3,4,5,6,7,8)
x4 <- c(4,-5,6,7,8,9)
x5 <- c(9,8,7,6,5,4)
x6 <- c(5,4,3,2,-1,0)
df <- data.frame(gene, x1, x2, x3, x4, x5, x6) #creates data.frame
as_tibble(df) # convenient way to check data.frame values and column format types
df <- df %>% gather(sample, observation, 2:7) # here's the conversion to long format
as_tibble(df) #watch df change
#example plots
p1 <- ggplot(df, aes(x = gene, y = observation, color = sample)) + geom_point()
p1
p2 <- ggplot(df, aes(x = gene, y = observation, group = sample, color = sample)) +
geom_line()
p2
p3 <- p2 + geom_point()
p3
This is easy to solve. If your matrix is 6x6 with one gene per row and one sample per column (thus six observations per gene), you first need to flatten it into a single vector of 36 values and then plot that against a vector of numbers representing the genes. R stores matrices column by column, so as.vector() returns genes 1 to 6 for sample 1, then genes 1 to 6 for sample 2, and so on (unlist() leaves a matrix unchanged, so as.vector() is the more explicit choice):
# Here I make some dummy data - a 6x6 matrix of random numbers:
df1 <- matrix(rnorm(36,0,1), ncol = 6)
# To help show which way the data flattens, and make the
# genes different, I add 4 to gene 1:
df1[1,] <- df1[1,] + 4
#### TL;DR - HERE IS THE SOLUTION ####
# Then plot it, using rep to make the x-axis data vector
plot(x = rep(1:6, times = 6), y = as.vector(df1))
To improve readability, add axis labels:
# With axis labels
plot(x = rep(1:6, times = 6), y = as.vector(df1),
     xlab = 'Gene', ylab = 'Value')
You could also use ggplot2 with the geom_point() or geom_jitter() geom, e.g.:
library(ggplot2)
ggplot() +
  geom_jitter(mapping = aes(x = rep(1:6, times = 6), y = as.vector(df1)))
Note that you can also create a "jitter" effect in base R by adding rnorm() noise to the x values; the amount of jitter is set by the standard deviation, the last argument of rnorm():
plot(x = rep(1:6, times = 6) + rnorm(36, 0, 0.05), y = as.vector(df1), xlab = 'Gene', ylab = 'Value')
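If the matrix carries gene and sample names, it can be handy to keep them instead of using the numbers 1 to 6. A minimal sketch in base R, assuming made-up dimnames (gene1 to gene6, s1 to s6) and a hypothetical matrix m:

```r
set.seed(42)
# Hypothetical 6x6 matrix: one gene per row, one sample per column
m <- matrix(rnorm(36), nrow = 6,
            dimnames = list(gene = paste0("gene", 1:6),
                            sample = paste0("s", 1:6)))

# as.table() + as.data.frame() gives one row per (gene, sample) pair
long <- as.data.frame(as.table(m), responseName = "value")

# One column of points per gene, with the gene names on the x-axis
stripchart(value ~ gene, data = long, vertical = TRUE, pch = 1,
           xlab = "Gene", ylab = "Value")
```

The long data frame has the same shape that gather() produces in the other answer, so it can also be passed straight to ggplot().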
Using the plotly package in R, I would like to make a density plot. More precisely, I need to add one more density line to my graph. I have data with the income information of some public company by geographical region, something like this:
head(data)
id income region
1 4556 1
2 6545 1
3 65465 2
4 54555 1
5 71442 2
6 5645 6
At first, I analysed the income of regions 5 and 6 with the following density plot:
reg56<- data[data$region %in% c(5,6) , ]
dens <- with(reg56, tapply(income, INDEX = region, density))
df <- data.frame(
x = unlist(lapply(dens, "[[", "x")),
y = unlist(lapply(dens, "[[", "y")),
cut = rep(names(dens), each = length(dens[[1]]$x))
)
# plot the density
p<- plot_ly(df, x = x, y = y, color = cut)
But I want more than this: I would like to add the total income, i.e. the income of all regions. I tried something like this:
data$aux<- 1
dens2 <- with(data, tapply(income, INDEX = aux, density))
df2 <- data.frame(
x = unlist(lapply(dens2, "[[", "x")),
y = unlist(lapply(dens2, "[[", "y")),
cut = rep(names(dens2), each = length(dens2[[1]]$x)) )
p<- plot_ly(df, x = x, y = y, color = cut)
p<- add_trace(p, df2, x = x, y = y, color = cut)
p
Error in FUN(X[[i]], ...) :
'options' must be a fully named list, or have no names (NULL)
Is there a solution for this?
Because you are not naming the parameters that you pass to add_trace, it interprets them as corresponding to the default parameter order. The usage of add_trace is
add_trace(p = last_plot(), ..., group, color, colors, symbol, symbols,
size, data = NULL, evaluate = FALSE)
So, in your function call, where you provide the data.frame df2 as the 2nd parameter, it is assumed to correspond to the ... parameter, which must be a named list. You need to specify data = df2, so that add_trace understands what this parameter is.
Let's generate some dummy data to demonstrate on:
library(plotly)
set.seed(999)
data <- data.frame(id=1:500, income = round(rnorm(500,50000,15000)), region=sample(6,500,replace=TRUE) )
Now, after calculating df and df2 as in your example (note that plotly 4.0 and later expect column references as formulas, e.g. x = ~x):
p <- plot_ly(df, x = x, y = y, color = cut) %>%
add_trace(data=df2, x = x, y = y, color = cut)
p
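An alternative that avoids a second trace is to stack all three density curves in one data frame and let color split them. A minimal sketch, assuming the dummy data above and the formula syntax of plotly 4.0 or later; density_df() is a hypothetical helper, not part of plotly, and the plot is only built if plotly is installed:

```r
set.seed(999)
data <- data.frame(id = 1:500,
                   income = round(rnorm(500, 50000, 15000)),
                   region = sample(6, 500, replace = TRUE))

# Hypothetical helper: one density estimate as a labelled data frame
density_df <- function(values, label) {
  d <- density(values)
  data.frame(x = d$x, y = d$y, cut = label)
}

# Regions 5 and 6 plus the overall income, stacked row-wise
df_all <- rbind(
  density_df(data$income[data$region == 5], "5"),
  density_df(data$income[data$region == 6], "6"),
  density_df(data$income, "all")
)

# Draw one line per value of cut (skipped when plotly is not installed)
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(df_all, x = ~x, y = ~y, color = ~cut,
                       type = "scatter", mode = "lines")
}
```

Because density() evaluates each curve on its own grid of 512 points by default, the three curves do not share x values, which is fine for line traces.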
I want to create a stacked bar chart, with frequency counts printed for each fill factor.
Showing data values on stacked bar chart in ggplot2
That question places the counts in the center of each segment, but the user specifies the values. In this example we don't input the specific values, and I am seeking an R function that automatically calculates the counts.
Take the following data for example.
set.seed(123)
a <- sample(1:4, 50, replace = TRUE)
b <- sample(1:10, 50, replace = TRUE)
data <- data.frame(a,b)
data$c <- cut(data$b, breaks = c(0,3,6,10), right = TRUE,
labels = c ("M", "N", "O"))
head(data)
ggplot(data, aes(x = a, fill = c)) + geom_bar(position="fill")
So I want to print "n = .." for each of M, N and O within 1, 2, 3 and 4.
So the end result looks like
Similar to this question, however we do not have fr
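Try the following:
# store the plot from the question so it can be inspected and annotated
p <- ggplot(data, aes(x = a, fill = c)) + geom_bar(position="fill")
obj <- ggplot_build(p)$data[[1]]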
# some tricks for getting centered the y-positions:
library(dplyr)
y_cen <- obj[["y"]]
y_cen[is.na(y_cen)] <- 0
y_cen <- (y_cen - lag(y_cen))/2 + lag(y_cen)
y_cen[y_cen == 0 | y_cen == .5] <- NA
p + annotate("text", x = obj[["x"]], y = y_cen, label = paste("N = ", obj[["count"]]))
Which gives:
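A shorter route, offered here as a sketch rather than as the answer's own method, is to let ggplot2 compute the counts for the labels as well. It assumes ggplot2 3.3.0 or later (for after_stat()) and the example data from the question; p_counts is a hypothetical name:

```r
library(ggplot2)

set.seed(123)
a <- sample(1:4, 50, replace = TRUE)
b <- sample(1:10, 50, replace = TRUE)
data <- data.frame(a, b)
data$c <- cut(data$b, breaks = c(0, 3, 6, 10), labels = c("M", "N", "O"))

# stat = "count" computes the same counts as geom_bar();
# position_fill(vjust = 0.5) centres each label inside its segment
p_counts <- ggplot(data, aes(x = a, fill = c)) +
  geom_bar(position = "fill") +
  geom_text(aes(label = paste0("n = ", after_stat(count))),
            stat = "count", position = position_fill(vjust = 0.5))
```

Printing p_counts draws the chart; no manual computation of y positions is needed because the text layer uses the same fill position as the bars.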
I have a qplot that is showing 5 different groupings (denoted with colour = type) with two dependent variables each. The command looks like this:
qplot(data = data, x = day, y = var1, geom = "line", colour = type) +
  geom_line(aes(y = var2, colour = value))
I'd like to label the two different lines so that I can tell which five represent var1 and which five represent var2.
How do I do this?
You can convert the data to a "tall" format, with melt, and use another aesthetic, such as the line type, to distinguish the variables.
# Sample data
n <- 100
k <- 5
d <- data.frame(
day = rep(1:n,k),
type = factor(rep(1:k, each=n)),
var1 = as.vector( replicate(k, cumsum(rnorm(n))) ),
var2 = as.vector( replicate(k, cumsum(rnorm(n))) )
)
# Reshape the data to long ("tall") format: one row per day/type/variable
library(reshape2)
d <- melt(d, id.vars=c("day","type"))
# Plot
library(ggplot2)
ggplot(d) + geom_line(aes(x=day, y=value, colour=type, linetype=variable))
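The same reshaping can be done with tidyr instead of reshape2. A minimal sketch, assuming tidyr 1.0.0 or later and sample data built the same way as above (d_wide, d_long and p_lines are hypothetical names):

```r
library(tidyr)
library(ggplot2)

set.seed(1)
n <- 100
k <- 5
d_wide <- data.frame(
  day  = rep(1:n, k),
  type = factor(rep(1:k, each = n)),
  var1 = as.vector(replicate(k, cumsum(rnorm(n)))),
  var2 = as.vector(replicate(k, cumsum(rnorm(n))))
)

# Stack var1 and var2 into a "variable" / "value" pair of columns
d_long <- pivot_longer(d_wide, cols = c(var1, var2),
                       names_to = "variable", values_to = "value")

# Colour separates the five types, line type separates var1 from var2
p_lines <- ggplot(d_long) +
  geom_line(aes(x = day, y = value, colour = type, linetype = variable))
```

Mapping linetype to variable gives the plot a second legend, which is what labels the var1 and var2 lines.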