ggplot with three variables in R

I am trying to create a ggplot with three variables in R that compares H-1B visa support (y-axis; the variable is h1bvis.supp) with implicit bias (x-axis; the variable is impl.prejud), by gender. I have tried to create the plot with the following code:
ggplot(data = immigrant) + geom_histogram(mapping = aes(x = impl.prejud, y = h1bvis.supp))
It is not working and I don't know why.
The dataset is this one:
immigrant <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/immig.csv")

Is this what you need?
library(ggplot2)
immigrant <- read.csv("https://raw.githubusercontent.com/umbertomig/intro-prob-stat-FGV/master/datasets/immig.csv")
ggplot(immigrant, aes(x = impl.prejud, y = h1bvis.supp)) + geom_col()
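A note on why the original call fails: geom_histogram() bins a single variable and computes the counts itself, so it rejects a y aesthetic supplied alongside x. To compare two numeric variables across groups, a scatter plot with one panel per group is one option. The sketch below uses simulated data, not the real file; the column name gender, its values, and the discrete scale of h1bvis.supp are assumptions, since the question does not say how these are stored in immig.csv.

```r
library(ggplot2)

# Simulated stand-in for the immigrant data.
# `gender` is a hypothetical column name; check names(immigrant) for the real one.
set.seed(1)
toy <- data.frame(
  impl.prejud = runif(100),
  h1bvis.supp = sample(seq(0, 1, by = 0.25), 100, replace = TRUE),
  gender      = sample(c("female", "male"), 100, replace = TRUE)
)

# Jittered points reduce overplotting on the discrete y values;
# facet_wrap() draws one panel per gender, each with its own linear trend
p <- ggplot(toy, aes(x = impl.prejud, y = h1bvis.supp)) +
  geom_jitter(height = 0.03, width = 0, alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ gender)
print(p)
```

Mapping colour = gender inside aes() instead of faceting would put both groups in a single panel.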

Related

Boxplot (ggplot2) not working as expected

I'm trying to draw a simple plot with 15 boxplots of respondents' age, one for each of 15 "Cadernos" (15 surveys: A, B, C, ... to O). But it's not working as expected.
I have already tried swapping the "Caderno" (survey) and "Idade" (age) variables.
Any ideas? What I expected was 15 boxplots on the vertical axis.
The code I'm using is the following:
library(ggplot2)
select_base %>%
ggplot(aes(Idade,Caderno)) +
geom_boxplot()
The plot I get is the following (image not included):
I can't be sure because you did not provide your data, but you may try
select_base %>%
ggplot(aes(x = Caderno, y = Idade, group = Caderno)) +
geom_boxplot()
For example, using dummy data:
library(dplyr)    # provides %>%
library(ggplot2)
dummy <- data.frame(
  x = rnorm(50),
  y = rep(c("a","b","c","d","e"), 10)
)
# categorical variable on x, numeric variable on y, one box per category
dummy %>%
  ggplot(aes(x = y, y = x, group = y)) +
  geom_boxplot()

ggplot2 geom_qq change theoretical data

I have a set of p-values, i.e. 0 <= pval <= 1.
I want to draw a Q-Q plot using ggplot2.
As in the documentation, the following code draws a Q-Q plot; however, since my data are p-values, I want the theoretical values to be probabilities as well, i.e. 0 <= theoretical value <= 1.
df <- data.frame(y = rt(200, df = 5))
p <- ggplot(df, aes(sample = y))
p + stat_qq() + stat_qq_line()
I am aware of qqplot.pvalues from the gaston package; it does the job, but the plot is not as customizable as the ggplot version.
In the gaston package the theoretical data are plotted as -log10((n:1)/(n + 1)), where n is the number of p-values. How can I pass these values to ggplot as the theoretical data?
Assuming you have some p-values, say from a normal distribution, you could create the plot manually:
library(ggplot2)
data <- data.frame(outcome = rnorm(150))
data$pval <- pnorm(data$outcome)
# sort so the i-th observed p-value lines up with the i-th theoretical quantile
data <- data[order(data$pval),]
# pnorm(qnorm(p)) is just p, so ppoints() gives the theoretical values directly
ggplot(data = data, aes(y = pval, x = ppoints(nrow(data)))) +
geom_point() +
geom_abline(slope = 1) +
labs(x = 'theoretical p-val', y = 'observed p-val', title = 'qqplot (pval-scale)')
Although I am not sure this plot is a sensible basis for drawing conclusions.
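To use the -log10 scale quoted in the question, the theoretical values can be computed by hand and passed to ggplot as an ordinary x aesthetic. This is a minimal sketch with simulated uniform p-values; pairing the formula with the sorted observed -log10(p) values is my reading of the expression in the question, not something checked against gaston's source.

```r
library(ggplot2)

set.seed(42)
pvals <- runif(500)   # simulated p-values (assumption: stand-in for real data)
n <- length(pvals)

qq <- data.frame(
  expected = -log10((n:1) / (n + 1)),  # formula quoted in the question, ascending
  observed = sort(-log10(pvals))       # observed values, ascending
)

p <- ggplot(qq, aes(x = expected, y = observed)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "expected -log10(p)", y = "observed -log10(p)")
print(p)
```

Because the result is a regular ggplot object, themes, colours, and extra layers can be added in the usual way.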

Plot categorical data as histogram/ bar in R?

I am new to R and have been trying for a few days to plot a histogram / bar chart to view the trend. I have a categorical variable, countryx, which I coded as 1, 2, 3.
I have tried the 2 scripts below and got the following results:
Output 1: blank chart with x and y axis, no stack/bar trend
qplot(DI$countryx,geom = "histogram",ylab = "count",
xlab = "countryx",binwidth=5,colour=I("blue"),fill=I("wheat"))
Output 2: error message- ggplot2 doesn't know how to deal with data of class integer
ggplot(DI$countryX, aes(x=countryx))
+ geom_bar(aes(y=count), stat = "count",position ="stack",...,
width =5,aes=true)
I would appreciate any advice.
Thank you very much for your help!
There are multiple problems with your code. ggplot takes a data frame, not a vector, but you're supplying a vector. Try this:
ggplot(DI, aes(x=countryx, y = count)) + geom_col(width = 5)
As @yeedle mentioned, you need a data.frame (maybe use as.data.frame).
How about:
library(ggplot2)
df <- data.frame(countryx = rep(1:3), count = rbinom(3,10,0.3))
p <- ggplot2::ggplot(df, aes(x = countryx, y = count)) + ylab("count")
p + geom_col(aes(x = countryx, fill = factor(countryx)))
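If DI holds one row per observation rather than a precomputed count column, geom_bar() can do the counting. The sketch below assumes that layout and uses simulated data in place of DI; wrapping the codes in factor() makes ggplot treat 1, 2, 3 as categories.

```r
library(ggplot2)

# Simulated stand-in for DI: one row per observation, country coded 1/2/3
set.seed(7)
DI <- data.frame(countryx = sample(1:3, 60, replace = TRUE))

# factor() makes the codes discrete; geom_bar() counts the rows per level
p <- ggplot(DI, aes(x = factor(countryx))) +
  geom_bar(colour = "blue", fill = "wheat") +
  labs(x = "countryx", y = "count")
print(p)
```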

How to make a histogram from a matrix in R

I'm having trouble constructing a histogram from a matrix in R.
The matrix contains 3 treatments (lambda0.001, lambda0.002, lambda0.005) for 4 populations (rec1, rec2, rec3, con1). The matrix is:
     lambda0.001 lambda0.002 lambda.003
rec1   1.0881688   1.1890554  1.3653264
rec2   1.0119031   1.0687678  1.1751051
rec3   0.9540271   0.9540271  0.9540271
con1   0.8053506   0.8086985  0.8272758
My goal is to plot a histogram with lambda on the y-axis and four groups of three treatments on the x-axis. The four groups should be separated from each other by a small gap.
I need help; it doesn't matter whether it is in ggplot2 or just base R graphics.
Thanks a lot!
I agree with docendo discimus that a barplot is probably what you're looking for. Based on what you're asking, though, I would first reshape your data to make it a little easier to work with; you can still get it done with stat = "identity":
library(tidyr)    # provides gather()
library(ggplot2)
# b is assumed to be the matrix from the question
# convert from matrix to data frame and preserve row names as a column
b <- data.frame(population = row.names(b), as.data.frame(b), row.names = NULL)
# gather into a tidy (long) format for ease of use in ggplot2
b <- gather(b, lambda, value, -1)
# plot 1 as described in the question
ggplot(b, aes(x = population, y = value)) + geom_bar(aes(fill = lambda), stat = "identity", position = "dodge")
# plot 2 using facets to separate as an alternative
ggplot(b, aes(x = population, y = value)) + geom_bar(stat = "identity") + facet_grid(. ~ lambda)
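Since the question also allows base R graphics, here is a minimal sketch with barplot(), which works on the matrix directly. The column names are copied from the header row of the matrix as posted.

```r
# The matrix from the question, column names as shown in its header row
m <- matrix(
  c(1.0881688, 1.1890554, 1.3653264,
    1.0119031, 1.0687678, 1.1751051,
    0.9540271, 0.9540271, 0.9540271,
    0.8053506, 0.8086985, 0.8272758),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("rec1", "rec2", "rec3", "con1"),
                  c("lambda0.001", "lambda0.002", "lambda.003"))
)

# barplot() draws one group per column, so transpose to group by population;
# beside = TRUE places the three treatments side by side with a gap between groups
barplot(t(m), beside = TRUE, ylab = "lambda",
        legend.text = TRUE, args.legend = list(x = "topright"))
```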

ggplot2-line plotting with TIME series and multi-spline

This question's theme is simple but drives me crazy:
1. how to use melt()
2. how to deal with multi-lines in single one image?
Here is my raw data:
a 4.17125 41.33875 29.674375 8.551875 5.5
b 4.101875 29.49875 50.191875 13.780625 4.90375
c 3.1575 29.621875 78.411875 25.174375 7.8012
Q1:
I've learned from the post "Plotting two variables as lines using ggplot2 on the same graph" how to draw multiple lines for multiple variables.
The following code produces such a plot. However, the x-axis should actually be a time series.
df <- read.delim("~/Desktop/df.b", header=F)
colnames(df)<-c("sample",0,15,30,60,120)
df2<-melt(df,id="sample")
ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) + geom_line() + geom_point()
I want it to treat 0, 15, 30, 60, 120 as real numbers so that the x-axis shows the time series, rather than as category labels. I also tried the following, but it failed:
row.names(df)<-df$sample
df<-df[,-1]
df<-as.matrix(df)
df2 <- data.frame(sample = factor(rep(row.names(df),each=5)), Time = factor(rep(c(0,15,30,60,120),3)),Values = c(df[1,],df[2,],df[3,]))
ggplot(data = df2, aes(x=Time, y= Values, group = sample, colour=sample))
+ geom_line()
+ geom_point()
Looking forward to your help.
Q2:
I've learned that the following script can add a spline() curve for a single line. How can I apply spline() to all three lines in a single plot?
n <-10
d <- data.frame(x =1:n, y = rnorm(n))
ggplot(d,aes(x,y))+ geom_point()+geom_line(data=data.frame(spline(d, n=n*10)))
Your variable column is a factor (you can verify by calling str(df2)). Just convert it back to numeric:
df2$variable <- as.numeric(as.character(df2$variable))
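The as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes rather than the labels. A small base R illustration, using a factor built by hand to mimic the variable column that melt() produces:

```r
# Column names such as "0", "15", ... become factor levels after melt()
variable <- factor(c("0", "15", "30", "60", "120"),
                   levels = c("0", "15", "30", "60", "120"))

as.numeric(variable)                 # level codes: 1 2 3 4 5
as.numeric(as.character(variable))   # time values: 0 15 30 60 120
```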
For your other question, you might want to stick with using geom_smooth or stat_smooth, something like this:
p <- ggplot(data = df2, aes(x=variable, y= value, group = sample, colour=sample)) +
geom_line() +
geom_point()
library(splines)
p + geom_smooth(aes(group = sample),method = "lm",formula = y~bs(x),se = FALSE)
which gives me a smoothed line for each sample (plot not shown).
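As an alternative to the regression spline in the geom_smooth call, an interpolating spline() can be computed separately for each sample and drawn with geom_line(). This is a sketch in base R plus ggplot2, with the data typed in from the question; the helper name curves and the choice of 100 points per curve are arbitrary.

```r
library(ggplot2)

# Data from the question in long format, with time as a numeric column
time <- c(0, 15, 30, 60, 120)
df2 <- data.frame(
  sample   = rep(c("a", "b", "c"), each = 5),
  variable = rep(time, times = 3),
  value    = c(4.17125, 41.33875, 29.674375, 8.551875, 5.5,
               4.101875, 29.49875, 50.191875, 13.780625, 4.90375,
               3.1575, 29.621875, 78.411875, 25.174375, 7.8012)
)

# Run spline() once per sample and stack the interpolated curves
curves <- do.call(rbind, lapply(split(df2, df2$sample), function(d) {
  s <- spline(d$variable, d$value, n = 100)
  data.frame(sample = d$sample[1], variable = s$x, value = s$y)
}))

# Original points plus one interpolated curve per sample
p <- ggplot(df2, aes(x = variable, y = value, colour = sample)) +
  geom_point() +
  geom_line(data = curves)
print(p)
```

Unlike the fitted curve from geom_smooth, these curves pass through every observed point.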
