Problem with my code: univariate regression plot not showing lines

This will sound very basic, but I cannot find the solution to this problem with my code. I ran a univariate regression (regr1) between the two variables immigrate_policy and lrgen. When I plot it, the lines drawn by the lines() commands do not show up properly.
One problem could be the sequence: the range of lrgen should actually be 1 to 9, but I had to enter 1:8 manually because every other sequence I tried gives an error. With this sequence, however, the lines in the plot look weird and are definitely not right.
My code follows:
regr1 <- lm(formula = ITA$immigrate_policy ~ ITA$lrgen, data = ITA)
summary(regr1)
install.packages("stargazer")
library(stargazer)
help(stargazer)
stargazer(regr1, type ="html",out="project.html")
stargazer(regr1, type="text",out="project/regression.html")
plot(ITA$lrgen, ITA$immigrate_policy,
     xlab = "Political Stance of the party", ylab = "Position towards Immigration policies")
abline(regr1, col = "red", lwd = 2)
range(ITA$lrgen)
ci <- data.frame(lrgen = seq(1:8))
sim <- predict(regr1, newdata = ci, interval = "confidence", level = 0.99)
lines(c(1:8),sim[,2], lt = "dashed", lwd = 1, col = "yellow")
lines(c(1:8),sim[,3], lt = "dashed", lwd = 1, col = "yellow")
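One likely cause, offered as a guess since the data are not shown: the model formula names its variables as ITA$immigrate_policy and ITA$lrgen. When a formula is written that way, predict() cannot match the lrgen column in newdata to the model term ITA$lrgen, so it falls back to the original data and returns one fitted value per observation (usually with a warning about the number of rows). That would explain why only a sequence with exactly as many elements as the data has rows avoids a length error, and why the plotted bands look wrong. A minimal sketch of the usual fix, using simulated data as a stand-in for ITA (the values are made up; only the column names come from the question):

```r
# Simulated stand-in for the ITA data (hypothetical values, not the real data).
set.seed(1)
ITA <- data.frame(lrgen = runif(40, 1, 9))
ITA$immigrate_policy <- 1 + 0.8 * ITA$lrgen + rnorm(40)

# Use bare column names in the formula so predict() can find them in newdata.
regr1 <- lm(immigrate_policy ~ lrgen, data = ITA)

# Prediction grid spanning the observed range of the predictor.
ci <- data.frame(lrgen = seq(min(ITA$lrgen), max(ITA$lrgen), length.out = 50))
sim <- predict(regr1, newdata = ci, interval = "confidence", level = 0.99)

plot(ITA$lrgen, ITA$immigrate_policy,
     xlab = "Political Stance of the party",
     ylab = "Position towards Immigration policies")
abline(regr1, col = "red", lwd = 2)

# Draw the bounds against the same x values that were used for prediction.
lines(ci$lrgen, sim[, "lwr"], lty = "dashed", col = "blue")
lines(ci$lrgen, sim[, "upr"], lty = "dashed", col = "blue")
```

Plotting the bounds against ci$lrgen rather than a hand-typed 1:8 keeps the x and y vectors the same length whatever grid is chosen.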

Related

Add calculated mean value to vertical line in plot in R

I have created a density plot with a vertical line marking the mean. I would like to show the calculated mean value in the graph but don't know how (for example, if the mean is 1.2, then 1.2 should appear in the graph).
beta_budget[,2] is the column that contains the price values.
windows()
plot(density(beta_budget[,2]), xlim= c(-0.1,15), type ="l", xlab = "Beta Coefficients", main = "Preis", col = "black")
abline(v=mean(beta_budget[,2]), col="blue")
legend("topright", legend = c("Price", "Mean"), col = c("black", "blue"), lty=1, cex=0.8)
I tried it with the text command but it didn't work...
Thank you for your advice!
Something along these lines:
Data:
set.seed(123)
df <- data.frame(
v1 = rnorm(1000)
)
Draw histogram with density line:
hist(df$v1, freq = FALSE, main = "")
lines(density(df$v1, kernel = "cosine", bw = 0.5))
abline(v = mean(df$v1), col = "blue", lty = 3, lwd = 2)
Include the mean as a text element:
text(mean(df$v1), # position of text on x-axis
max(density(df$v1)[[2]]), # position of text on y-axis
mean(df$v1), # text to be plotted
pos = 4, srt = 270, cex = 0.8, col = "blue") # some graphical parameters
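The same idea can be applied to a plain density plot like the one in the question. The sketch below is only an illustration: beta is simulated data standing in for beta_budget[,2], and the mean is rounded before printing so the label stays short.

```r
# Simulated stand-in for beta_budget[, 2] (hypothetical values).
set.seed(123)
beta <- rexp(500, rate = 0.5)

d <- density(beta)   # kernel density estimate
m <- mean(beta)      # value to mark and print

plot(d, xlab = "Beta Coefficients", main = "Preis", col = "black")
abline(v = m, col = "blue")

# Round the mean to two decimals and place it just right of the line,
# at the height of the density peak.
label <- format(round(m, 2), nsmall = 2)
text(m, max(d$y), labels = label, pos = 4, cex = 0.8, col = "blue")
```

Here pos = 4 puts the text to the right of the given coordinates; any y value inside the plotting region would work in place of max(d$y).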

Is there a way to use R to break chart axis and break linear regression line?

I'm trying to figure out how to modify a scatter-plot that contains two groups of data along a continuum separated by a large gap. The graph needs a break on the x-axis as well as on the regression line.
This R code using the ggplot2 library accurately presents the data, but is unsightly due to the vast amount of empty space on the graph. Pearson's correlation is -0.1380438.
library(ggplot2)
p <- ggplot(, aes(x = dis, y = result[, 1])) + geom_point(shape = 1) +
xlab("X-axis") +
ylab("Y-axis") + geom_smooth(color = "red", method = "lm", se = F) + theme_classic()
p + theme(plot.title = element_text(hjust = 0.5, size = 14))
This R code uses gap.plot to produce the breaks needed, but the regression line doesn't contain a break and doesn't reflect the slope properly. As you can see, the slope of the regression line isn't as sharp as the graph above and there needs to be a visible distinction in the slope of the line between those disparate groups.
library(plotrix)
gap.plot(
x = dis,
y = result[, 1],
gap = c(700, 4700),
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733) , col = "white")
abline(lm(result[, 1] ~ dis), col = "red", lwd = 2)
axis.break(1, 716, style = "slash")
Using MS Paint, I created an approximation of what the graph should look like. Notice the break marks on the top as well as the discontinuity in the regression line between the two groups.
One solution is to plot the regression line in two pieces, using ablineclip to limit what's plotted each time. (This is similar to @tung's suggestion, although it's clear that you want the appearance of a single graph rather than the appearance of facets.) Here's how that would work:
library(plotrix)
# Simulate some data that looks roughly like the original graph.
dis = c(rnorm(100, 300, 50), rnorm(100, 5000, 100))
result = c(rnorm(100, 0.6, 0.1), rnorm(100, 0.5, 0.1))
# Store the location of the gap so we can refer to it later.
x.axis.gap = c(700, 4700)
# gap.plot() works internally by shifting the location of the points to be
# plotted based on the gap size/location, and then adjusting the axis labels
# accordingly. We'll re-compute the second half of the regression line in the
# same way; these are the new values for the x-axis.
dis.alt = dis - x.axis.gap[1]
# Plot (same as before).
gap.plot(
x = dis,
y = result,
gap = x.axis.gap,
gap.axis = "x",
xlab = "X-Axis",
ylab = "Y-Axis",
xtics = seq(0, 5575, by = 200)
)
abline(v = seq(700, 733), col = "white")
axis.break(1, 716, style = "slash")
# Add regression line in two pieces: from 0 to the start of the gap, and from
# the end of the gap to infinity.
ablineclip(lm(result ~ dis), col = "red", lwd = 2, x2 = x.axis.gap[1])
ablineclip(lm(result ~ dis.alt), col = "red", lwd = 2, x1 = x.axis.gap[1] + 33)

plotFit - data plotted as bars instead of points?

I am using the plotFit function in the investr package in R to display my data as follows:
Figure 1
The code I am using to generate this is simply:
plotFit(nls_model, interval = "confidence", level = 0.95, pch = 19, shade = TRUE,
col.conf = "seagreen2", col.fit = "green", lwd.fit = 2,
ylim = c(y1,y2), xlim = c(x1,x2),
xaxp = c(0,200,10), n = 100,
ylab = "", xlab = "",
main = "")
Is there a simple way that I could adapt the code to plot the data as bars, rather than points?
Yes, use type = "h". For example,
fit <- lm(dist ~ speed, data = cars)
library(investr)
plotFit(fit)
plotFit(fit, type = "h", lwd = 3)

How to fit a curve to a histogram

I've explored similar questions asked about this topic but I am having some trouble producing a nice curve on my histogram. I understand that some people may see this as a duplicate but I haven't found anything currently to help solve my problem.
Although the data isn't visible here, here are some of the variables I am using, so you can see what they represent in the code below.
Differences <- subset(Score_Differences, select = Difference, drop = T)
m = mean(Differences)
std = sqrt(var(Differences))
Here is the very first curve I produce (the code seems most common and easy to produce but the curve itself doesn't fit that well).
hist(Differences, density = 15, breaks = 15, probability = TRUE, xlab = "Score Differences", ylim = c(0,.1), main = "Normal Curve for Score Differences")
curve(dnorm(x,m,std),col = "Red", lwd = 2, add = TRUE)
I really like this but don't like the curve going into the negative region.
hist(Differences, probability = TRUE)
lines(density(Differences), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2), lwd = 2, col = "Blue")
This is the same histogram as the first, but with frequencies. Still doesn't look that nice.
h = hist(Differences, density = 15, breaks = 15, xlab = "Score Differences", main = "Normal Curve for Score Differences")
xfit = seq(min(Differences),max(Differences))
yfit = dnorm(xfit,m,std)
yfit = yfit*diff(h$mids[1:2])*length(Differences)
lines(xfit, yfit, col = "Red", lwd = 2)
Another attempt but no luck. Maybe because I am using qnorm, when the data obviously isn't normal. The curve goes into the negative direction again.
sample_x = seq(qnorm(.001, m, std), qnorm(.999, m, std), length.out = l)
binwidth = 3
breaks = seq(floor(min(Differences)), ceiling(max(Differences)), binwidth)
hist(Differences, breaks)
lines(sample_x, l*dnorm(sample_x, m, std)*binwidth, col = "Red")
The only curve that visually looks nice is the 2nd, but the curve falls into the negative direction.
My question is: is there a "standard way" to place a curve on a histogram? This data certainly isn't normal. Three of the procedures I presented here are from similar posts, but I am obviously having some trouble. I feel like every method of fitting a curve depends on the data you're working with.
Update with solution
Thanks to Zheyuan Li and others! I will leave this up for my own reference and hopefully others as well.
hist(Differences, probability = TRUE)
lines(density(Differences, cut = 0), col = "Red", lwd = 2)
lines(density(Differences, adjust = 2, cut = 0), lwd = 2, col = "Blue")
OK, so you are just struggling with the fact that the density goes beyond the "natural range". Well, just set cut = 0. For the reason, you may want to read the Q&A "plot.density extends “xlim” beyond the range of my data. Why and how to fix it?". In that answer I was using from and to, but here I am using cut.
## consider a mixture, that does not follow any parametric distribution family
## note, by construction, this is a strictly positive random variable
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)
## (kernel) density estimation offers a flexible nonparametric approach
d <- density(x, cut = 0)
## you can plot histogram and density on the density scale
hist(x, prob = TRUE, breaks = 50)
lines(d, col = 2)
Note, by cut = 0, density estimation is done strictly within range(x). Outside this range, density is 0.
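To see what cut changes, it can help to compare the evaluation grid of the two estimates. This is a small check on the same simulated mixture. It relies on density() evaluating by default from min(x) - cut * bw to max(x) + cut * bw, with cut = 3 for the default Gaussian kernel.

```r
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)

d_default <- density(x)           # default cut = 3
d_cut0    <- density(x, cut = 0)  # grid restricted to the data range

range(x)            # range of the data
range(d_default$x)  # extends 3 bandwidths past each end of the data
range(d_cut0$x)     # matches range(x)

# Both estimates use the same automatically chosen bandwidth;
# only the evaluation grid differs.
c(d_default$bw, d_cut0$bw)
```

One trade-off to keep in mind: the kernel mass that would fall outside the data range is simply not drawn, so the curve plotted with cut = 0 integrates to slightly less than 1.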

R: prcomp, Order of columns in data table matter?

I ran the prcomp function on a data table containing 91 columns and 2030 rows and obtained a PCA plot. However, when I re-ordered the same data table to make it easier to color-code the data points, I got an entirely different looking PCA plot.
Does the order of the columns matter in prcomp()?
Just a note, the code included was provided for me by someone previously in my lab, who is no longer here to ask. I have a moderate understanding of what it is doing.
Thanks for the help!
pcaPlotter3d <- function(fileName, startColumn, endColumn){
  x <- read.table(fileName, sep = '\t', header = TRUE, stringsAsFactors = FALSE)
  pcaData <- prcomp(~., x[,startColumn:endColumn], na.action=na.exclude, scale = TRUE)
  library(scatterplot3d)
  colorList <- c(rep("magenta", 2), rep("blue", 12), rep("red",33), rep("purple", 2), rep("green", 6), rep("black",36))
  shapeList <- c(rep(19, 91))#, rep(15, 24))
  with (pcaData, {
    pointsForPlot <- scatterplot3d(pcaData$rotation[,1:3], color=colorList,
                                   pch = shapeList, main = "TAP Proteins PCA", mar = c(3,3,3,5),
                                   xlab = "PC1 (16.5%)", ylab = "PC2 (3.67%)", zlab = "PC3 (2.79%)",
                                   col.grid = NULL)
    pointsForPlot.coords <- pointsForPlot$xyz.convert(pcaData$rotation[,1:3])
    legend(8,5, bty = "n", xpd = TRUE, cex = 0.75, inset = .1,
           title = "Groups", c("Bio", "EF", "IF", "RF", "Rib", "Unk"),
           col = c("magenta", "blue", "red", "purple", "green", "black") , pch = c(19,19,19,19,19,19));
  })
  print(summary(pcaData))
}
It's hard to see in the perspective plots, but it seems that all that has happened is that the signs of PC2 and PC3 have been flipped. (Eigenvectors/PCA directions are only defined up to a change in sign, and a trivial change like changing the order of the columns can indeed cause them to flip.) Given that the inertias/proportions of variance are the same and the ranges of the axes are inverted (e.g. PC2 goes from -0.1 to 0.5 in plot 1 and -0.5 to 0.1 in plot 2), this is the most likely explanation. You can simply multiply the PC2 and PC3 coordinates by -1 in the appropriate places if you want to recover the original plot ...
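The claim can be checked on a small example. The sketch below uses a simulated four-column matrix (not the 91-column table from the question), runs prcomp() on the original and on a column-permuted copy, and then compares the results after putting the loadings back in the same row order and aligning the signs.

```r
set.seed(42)
x <- matrix(rnorm(200 * 4), ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))
x[, "b"] <- x[, "a"] + 0.5 * x[, "b"]   # add some correlation

p1 <- prcomp(x, scale. = TRUE)
p2 <- prcomp(x[, c("d", "c", "b", "a")], scale. = TRUE)  # same data, columns reordered

# The standard deviations of the components do not depend on column order.
all.equal(p1$sdev, p2$sdev)

# Put the loadings of the permuted fit back into the original row order.
r2 <- p2$rotation[rownames(p1$rotation), ]

# Each component is defined only up to sign: flip the columns of r2
# that point in the opposite direction to those of p1.
flip <- sign(colSums(p1$rotation * r2))
r2_aligned <- sweep(r2, 2, flip, `*`)
all.equal(p1$rotation, r2_aligned)
```

Note also that a plot colored by position, as with colorList in the question, assigns colors in column order, so after reordering the columns the color vector has to be reordered in the same way.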
