Scatterplot - adding equation and R-squared value - r

I am a newbie at R. I want to plot data (two variables) showing the regression line, including the marginal boxplots. I can show everything except the R-squared value and the equation on the chart.
Below is the script I use to draw the graph:
library (car)
scatterplot(FIRST_S2A_NDVI, MEAN_DRONE_NDVI,
main = "NDVI Value from Sentinel and Drone",
xlab = "NDVI Value from Sentinel",
ylab = "NDVI Value from Drone",
pch = 15, col = "black",
regLine = list(col="green"), smooth = FALSE)
The figure is like this.
Now, the final touch is to add the regression equation and R-squared value to my figure. What code do I need to write? I tried the script from "Add regression line equation and R^2 on graph", but I still have no idea how to show them.
Thanks for reading, and I hope you can help me with this.
P.S. Contents of my data:
OBJECTID SAMPLE_GRID FIRST_S2A_NDVI MEAN_DRONE_NDVI
1 1 1 0.6411405 0.8676092
2 2 2 0.4335293 0.5697814
3 3 3 0.7350439 0.7321858
4 4 4 0.7268013 0.8271566
5 5 5 0.3638939 0.5682631
6 6 6 0.1953890 0.3168246
7 7 7 0.4841993 0.7380627
8 8 8 0.4137447 0.3239288
9 9 9 0.8219178 0.8676065
10 10 10 0.2647872 0.2296441
11 11 11 0.8126657 0.8519964
12 12 12 0.2648504 0.2465738
13 13 13 0.5992035 0.8016030
14 14 14 0.2420299 0.3933670
15 15 15 0.5059137 0.7593807
16 16 16 0.7713419 0.8026068
17 17 17 0.3762540 0.5941540
18 18 18 0.5876435 0.7763927
19 19 19 0.2491609 0.5095306
20 20 20 0.3213648 0.4456958
21 21 21 0.2101466 0.1960858
22 22 22 0.3749034 0.4956361
23 23 23 0.5712630 0.7350484
24 24 24 0.8444895 0.8577550
25 25 25 0.3331450 0.4390229
26 26 26 0.1851611 0.4573663
27 27 27 0.4914998 0.2750837
28 28 28 0.7121390 0.7780228

To add the equation and the R-squared value to your current plot, fit a linear model of y on x, format the equation as a text string, and write it onto the plot with the mtext function:
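m <- lm(MEAN_DRONE_NDVI ~ FIRST_S2A_NDVI)
# coefficients rounded to 3 decimals: a = intercept, b = slope
a <- round(coef(m)[1], 3)
b <- round(coef(m)[2], 3)
# write "y = bx + a", or "y = bx - |a|" when the intercept is negative
eq <- paste0("y = ", b, "x ", ifelse(a < 0, paste("-", abs(a)), paste("+", a)))
mtext(eq, side = 3, line = -1)
mtext(paste0("R^2 = ", round(summary(m)$r.squared, 3)), side = 3, line = -3)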
You can change the variables in the model, and move the text with the side and line arguments of mtext (its 2nd and 3rd arguments).
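As a self-contained sketch, the version below rebuilds the question's data, fits the same model and writes both lines into a legend. It draws the plot with base graphics (plot() and abline()) instead of car::scatterplot(), so it does not show the marginal boxplots, and it uses bquote() so that R² is rendered with a real superscript. summary(m)$r.squared is the same value as as.numeric(summary(m)[8]), just addressed by name. For a simple linear regression R² equals the squared Pearson correlation, which makes a handy sanity check.

```r
# NDVI values copied from the data listing in the question
ndvi <- data.frame(
  FIRST_S2A_NDVI = c(0.6411405, 0.4335293, 0.7350439, 0.7268013, 0.3638939,
                     0.1953890, 0.4841993, 0.4137447, 0.8219178, 0.2647872,
                     0.8126657, 0.2648504, 0.5992035, 0.2420299, 0.5059137,
                     0.7713419, 0.3762540, 0.5876435, 0.2491609, 0.3213648,
                     0.2101466, 0.3749034, 0.5712630, 0.8444895, 0.3331450,
                     0.1851611, 0.4914998, 0.7121390),
  MEAN_DRONE_NDVI = c(0.8676092, 0.5697814, 0.7321858, 0.8271566, 0.5682631,
                      0.3168246, 0.7380627, 0.3239288, 0.8676065, 0.2296441,
                      0.8519964, 0.2465738, 0.8016030, 0.3933670, 0.7593807,
                      0.8026068, 0.5941540, 0.7763927, 0.5095306, 0.4456958,
                      0.1960858, 0.4956361, 0.7350484, 0.8577550, 0.4390229,
                      0.4573663, 0.2750837, 0.7780228)
)

m <- lm(MEAN_DRONE_NDVI ~ FIRST_S2A_NDVI, data = ndvi)
a <- unname(round(coef(m)[1], 3))   # intercept
b <- unname(round(coef(m)[2], 3))   # slope
r2 <- round(summary(m)$r.squared, 3)

# equation as plain text, with the sign of the intercept handled explicitly
eq <- sprintf("y = %.3fx %s %.3f", b, if (a < 0) "-" else "+", abs(a))

plot(MEAN_DRONE_NDVI ~ FIRST_S2A_NDVI, data = ndvi, pch = 15,
     main = "NDVI Value from Sentinel and Drone",
     xlab = "NDVI Value from Sentinel", ylab = "NDVI Value from Drone")
abline(m, col = "green")
# c() of a character string and an expression yields an expression vector;
# bquote() inserts the value of r2 so that R^2 is drawn with a superscript
legend("topleft", bty = "n",
       legend = c(eq, as.expression(bquote(R^2 == .(r2)))))
```

To keep car's marginal boxplots, the eq string and the R² expression can instead be passed to mtext() after your scatterplot() call, as in the answer above.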

Related

Show only even numbers from a data set

I am trying to extract only the even numbers from the "cars" data set.
I know I need to create a new function.
I have come this far:
Is.even = function(x) x %% 2 == 0
When I enter:
Is.even(cars[1])
it gives me back a logical result. I want to display only the actual even numbers, as integers, and hide the odd numbers.
What am I doing wrong?
Apart from @neilfws's suggestion, if you pass your values as a vector (cars[, 1], rather than cars[1], which is a one-column data frame), you can also use Filter:
Filter(Is.even, cars[, 1])
#[1] 4 4 8 10 10 10 12 12 12 12 14 14 14 14 16 16 18 18 18 18 20 20 20 20 20 22 24 24 24 24
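Logical indexing is the other common base-R way to do this: the logical vector that Is.even() returns is used directly as an index, so only the positions where it is TRUE are kept. A minimal sketch with the built-in cars data, showing both the even speeds and the whole rows whose speed is even:

```r
Is.even <- function(x) x %% 2 == 0

speeds <- cars$speed                      # same as cars[, 1]: a numeric vector
even_speeds <- speeds[Is.even(speeds)]    # keep only positions where the test is TRUE
even_rows <- cars[Is.even(cars$speed), ]  # keep whole rows whose speed is even
print(even_speeds)
print(head(even_rows))
```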

Get the lag vector from variogram in gstat

I want to compute the variogram from a set of data in R. I am using the function "variogram" from the gstat package.
Now I want to get the lag vector from the variogram. The problem is that myvariogram$dist returns the average distance between the point pairs in each distance class.
How can I get the lag vector instead?
My data are two-dimensional: coordinates x and y, with z values:
x y z
1 -0.9000000 1.102146e-16 0.160000000
2 -0.8724602 2.209369e-01 0.284010236
3 -0.7915264 4.283527e-01 0.408020473
4 -0.5527914 -7.102265e-01 -0.294704200
5 -0.7102265 -5.527914e-01 -0.170693964
6 -0.8241960 -3.615259e-01 -0.046683727
7 -0.8877252 -1.481351e-01 0.077326509
8 -0.6464646 -3.877551e-01 -0.205706068
9 -0.4444444 -1.428571e-01 -0.154399515
10 -0.5959596 -3.469388e-01 -0.227651744
11 -0.5454545 -5.510204e-01 -0.319427844
12 -0.6464646 -2.040816e-02 0.005767136
13 -0.8484848 -1.836735e-01 0.028933625
14 -0.6969697 -4.285714e-01 -0.174407224
15 -0.4949495 2.040816e-02 0.020626174
16 -0.7474747 2.040816e-02 0.075029711
17 -0.4444444 -3.061224e-01 -0.300002910
18 -0.6464646 1.428571e-01 0.135007208
19 -0.5959596 6.122449e-02 0.061006799
20 -0.5454545 -3.061224e-01 -0.239963488
21 -0.5959596 1.836735e-01 0.164762622
22 -0.3434343 -2.653061e-01 -0.324516690
23 -0.3939394 -3.469388e-01 -0.360400339
24 -0.5454545 6.122449e-02 0.058277761
25 -0.6464646 -3.061224e-01 -0.174779340
library(gstat)
myvariog <- variogram(z ~ 1, locations = ~x + y, data = mydata)
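To make the distinction concrete, the base-R sketch below computes a classical (Matheron) empirical semivariogram by hand from the 25 points listed above. For each distance bin it reports the bin centre (lag), the number of pairs (np), the mean pair distance (dist, the quantity the question says myvariogram$dist returns) and the semivariance (gamma). The 10 equal-width bins are an arbitrary assumption, not gstat's default binning, so the numbers will not match gstat's output exactly. gstat's variogram() also accepts a boundaries argument; if you pass your own boundaries there, the lag centres can be computed from them the same way.

```r
# coordinates and values copied from the question (25 points)
x <- c(-0.9000000, -0.8724602, -0.7915264, -0.5527914, -0.7102265, -0.8241960,
       -0.8877252, -0.6464646, -0.4444444, -0.5959596, -0.5454545, -0.6464646,
       -0.8484848, -0.6969697, -0.4949495, -0.7474747, -0.4444444, -0.6464646,
       -0.5959596, -0.5454545, -0.5959596, -0.3434343, -0.3939394, -0.5454545,
       -0.6464646)
y <- c(1.102146e-16, 2.209369e-01, 4.283527e-01, -7.102265e-01, -5.527914e-01,
       -3.615259e-01, -1.481351e-01, -3.877551e-01, -1.428571e-01, -3.469388e-01,
       -5.510204e-01, -2.040816e-02, -1.836735e-01, -4.285714e-01, 2.040816e-02,
       2.040816e-02, -3.061224e-01, 1.428571e-01, 6.122449e-02, -3.061224e-01,
       1.836735e-01, -2.653061e-01, -3.469388e-01, 6.122449e-02, -3.061224e-01)
z <- c(0.160000000, 0.284010236, 0.408020473, -0.294704200, -0.170693964,
       -0.046683727, 0.077326509, -0.205706068, -0.154399515, -0.227651744,
       -0.319427844, 0.005767136, 0.028933625, -0.174407224, 0.020626174,
       0.075029711, -0.300002910, 0.135007208, 0.061006799, -0.239963488,
       0.164762622, -0.324516690, -0.360400339, 0.058277761, -0.174779340)

# every pair of points once: its distance and half its squared z difference
pairs <- t(combn(length(z), 2))
i <- pairs[, 1]
j <- pairs[, 2]
h <- sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2)
half_sq <- 0.5 * (z[i] - z[j])^2

# assumed binning: 10 equal-width distance classes from 0 to the largest distance
boundaries <- seq(0, max(h), length.out = 11)
bin <- cut(h, boundaries, include.lowest = TRUE)

emp <- data.frame(
  lag   = head(boundaries, -1) + diff(boundaries) / 2,  # bin centres: the lag vector
  np    = as.vector(table(bin)),                        # number of pairs per bin
  dist  = as.vector(tapply(h, bin, mean)),              # mean pair distance per bin
  gamma = as.vector(tapply(half_sq, bin, mean))         # semivariance per bin
)
print(emp)
```

Comparing the lag and dist columns shows why the two differ: dist is pulled towards wherever the pairs in a bin happen to cluster, while lag depends only on the bin boundaries (empty bins get NA for dist and gamma).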

Adding extreme value distributed noise (with µ=0,σ=10) to a vector of numbers in R

I have the following matrix
Measurement Treatment
38 A
14 A
54 A
69 A
20 B
36 B
35 B
10 B
11 C
98 C
88 C
14 C
I want to add extreme value distributed noise (with mean=0 and sd=10) to the Measurement values. How can I achieve that in R?
I found revd in the extRemes package, but it does not work as expected. Does devd from the same package do what I want? (It does not allow the mean and sd to be specified.)
If you want to use your measurement as the mean of the noise, you can do the following (note that rnorm() draws normally distributed noise, not extreme value noise):
measure = round(runif(10,0,30),0)
data = data.frame(measure)
for(i in 1:nrow(data)){
data$measure1[i] = rnorm(1,data$measure[i],10)
}
data
measure measure1
1 6 6.281557
2 12 -5.780177
3 18 13.529773
4 26 33.665584
5 14 12.666614
6 24 41.146132
7 5 -1.850390
8 14 16.728703
9 13 26.082601
10 13 14.066475
EDIT: You can avoid the for loop by drawing one noise value per row:
data$measure1 = data$measure + rnorm(nrow(data), 0, 10)
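For noise that is actually extreme value distributed, one base-R option is inverse-CDF sampling from a Gumbel (type I extreme value) distribution: if U is uniform on (0, 1), then mu - beta * log(-log(U)) is Gumbel with location mu and scale beta. Its mean is mu + beta * gamma (gamma being the Euler-Mascheroni constant, about 0.5772) and its sd is beta * pi / sqrt(6), so mean 0 and sd 10 require beta = 10 * sqrt(6) / pi (about 7.80) and mu = -beta * gamma (about -4.50). This is a sketch, not the extRemes API; the helper name rgumbel0() is hypothetical.

```r
set.seed(1)
dat <- data.frame(
  Measurement = c(38, 14, 54, 69, 20, 36, 35, 10, 11, 98, 88, 14),
  Treatment = rep(c("A", "B", "C"), each = 4)
)

# Gumbel parameters chosen so that the noise has mean 0 and sd 10
target_sd <- 10
euler_gamma <- -digamma(1)               # Euler-Mascheroni constant, ~0.5772
beta <- target_sd * sqrt(6) / pi         # scale, from sd = beta * pi / sqrt(6)
mu <- -beta * euler_gamma                # location, from mean = mu + beta * gamma

# hypothetical helper: n Gumbel draws via the inverse CDF
rgumbel0 <- function(n) mu - beta * log(-log(runif(n)))

dat$Noisy <- dat$Measurement + rgumbel0(nrow(dat))  # one independent draw per row
print(dat)
```

This is the right-skewed (maximum) Gumbel; for the left-skewed (minimum) version, subtract the noise instead of adding it, which keeps mean 0 and sd 10.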

ggplot2 is plotting a line strangely

I am trying to plot the time series x_t = A + (-1)^t B.
To do this I am using the following code, but the resulting ggplot looks wrong.
library(ggplot2)
set.seed(42)
N<-2
A<-sample(1:20,N)
B<-rnorm(N)
X<-c(A+B,A-B)
dat<-sapply(1:N,function(n) X[rep(c(n,N+n),20)],simplify=FALSE)
dat<-data.frame(t=rep(1:20,N),w=rep(A,each=20),val=do.call(c,dat))
ggplot(data=dat,aes(x=t, y=val, color=factor(w)))+
geom_line()+facet_grid(w~.,scales = "free")
Looking at the head of dat, everything looks right:
> head(dat)
t w val
1 1 12 10.5533
2 2 12 13.4467
3 3 12 10.5533
4 4 12 13.4467
5 5 12 10.5533
6 6 12 13.4467
So the lower (blue) line should only take the values 10.5533 and 13.4467, but it also takes other values. What is wrong with my code?
Thanks in advance for any help.
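You really should be more careful before asserting that something is "wrong". The way you are creating dat, each element of the sapply() result holds 40 values (rep(c(n,N+n),20)), so val has 80 entries while t and w have only 40. data.frame() silently recycles t and w, so every (t, w) pair appears twice, once with a value from each series. Because the rows are not ordered by dat$t, head(...) does not show the extra values: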
head(dat[order(dat$w,dat$t),],10)
# t w val
# 21 1 18 18.43530
# 61 1 18 18.36313
# 22 2 18 19.56470
# 62 2 18 17.63687
# 23 3 18 18.43530
# 63 3 18 18.36313
# 24 4 18 19.56470
# 64 4 18 17.63687
# 25 5 18 18.43530
# 65 5 18 18.36313
Note the row numbers.
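A minimal sketch of a fix, assuming the goal is one series of length 20 per value of A: build every column with exactly N * 20 entries so that data.frame() has nothing to recycle. Here val is computed straight from x_t = A + (-1)^t B, so at t = 1 it starts at A - B, whereas the question's X[rep(c(n,N+n),...)] starts at A + B. The plotting call is the question's, with scales spelled out in full.

```r
set.seed(42)
N <- 2
n_t <- 20                                  # length of each series
A <- sample(1:20, N)
B <- rnorm(N)
t_all <- rep(1:n_t, times = N)             # 1..20 for each series
dat <- data.frame(
  t = t_all,
  w = rep(A, each = n_t),
  val = rep(A, each = n_t) + (-1)^t_all * rep(B, each = n_t)
)
# one row per (series, t): nothing was recycled
stopifnot(nrow(dat) == N * n_t)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = dat, aes(x = t, y = val, color = factor(w))) +
    geom_line() +
    facet_grid(w ~ ., scales = "free")
  print(p)
}
```

With this data each facet alternates between exactly two values, which is the zig-zag the question expected.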

How to create a stacked bar chart from summarized data in ggplot2

I'm trying to create a stacked bar graph using ggplot2. My data, in its wide form, looks like this; the numbers in each cell are response frequencies.
activity yes no dontknow
Social events 27 3 3
Academic skills workshops 23 5 8
Summer research 22 7 7
Research fellowship 20 6 9
Travel grants 18 8 7
Resume preparation 17 4 12
RAs 14 11 8
Faculty preparation 13 8 11
Job interview skills 11 9 12
Preparation of manuscripts 10 8 14
Courses in other campuses 5 11 15
Teaching fellowships 4 14 16
TAs 3 15 15
Access to labs in other campuses 3 11 18
Interdisciplinary research 2 11 18
Interdepartamental projects 1 12 19
I melted this table using reshape2:
melted.data <- melt(wide.data, id.vars = c("activity"), measure.vars = c("yes","no","dontknow"), variable.name = "haveused", value.name = "responses")
That's as far as I can get. I want to create a stacked bar chart with activities on the x axis, frequency of responses on the y axis, and each bar showing the distribution of yes, no and dontknow answers.
I've tried
ggplot(melted.data,aes(x=activity,y=responses))+geom_bar(aes(fill=haveused))
but I'm afraid that's not the right solution
Any help is much appreciated.
You haven't said what it is that's not right about your solution. But some issues that could be construed as problems, and one possible solution for each, are:
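The x axis tick mark labels run into each other. SOLUTION - rotate the tick mark labels;
The order in which the labels (and their corresponding bars) appear is not the same as the order in the original data frame. SOLUTION - reorder the levels of the factor 'activity';
geom_bar() uses stat = "count" by default, which counts rows rather than plotting the values in responses. SOLUTION - use geom_col() (equivalent to geom_bar(stat = "identity"));
To position text labels in the middle of each bar segment, set the vjust parameter of position_stack() to 0.5.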
The following might be a start.
# Load required packages
library(ggplot2)
library(reshape2)
# Read in data
df = read.table(text = "
activity yes no dontknow
Social.events 27 3 3
Academic.skills.workshops 23 5 8
Summer.research 22 7 7
Research.fellowship 20 6 9
Travel.grants 18 8 7
Resume.preparation 17 4 12
RAs 14 11 8
Faculty.preparation 13 8 11
Job.interview.skills 11 9 12
Preparation.of.manuscripts 10 8 14
Courses.in.other.campuses 5 11 15
Teaching.fellowships 4 14 16
TAs 3 15 15
Access.to.labs.in.other.campuses 3 11 18
Interdisciplinary.research 2 11 18
Interdepartamental.projects 1 12 19", header = TRUE, sep = "")
# Melt the data frame
dfm = melt(df, id.vars=c("activity"), measure.vars=c("yes","no","dontknow"),
variable.name="haveused", value.name="responses")
# Reorder the levels of activity
dfm$activity = factor(dfm$activity, levels = df$activity)
# Draw the plot
ggplot(dfm, aes(x = activity, y = responses, group = haveused)) +
geom_col(aes(fill=haveused)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25)) +
geom_text(aes(label = responses), position = position_stack(vjust = .5), size = 3) # labels inside the bar segments
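If you would rather compare the distribution of answers across activities regardless of how many people answered each one, one option is a 100% stacked bar: position = "fill" in geom_col() rescales every bar to a total height of 1. The sketch below rebuilds the long data with base R instead of reshape2 so that it is self-contained, and computes each answer's share explicitly as a check.

```r
wide <- data.frame(
  activity = c("Social events", "Academic skills workshops", "Summer research",
               "Research fellowship", "Travel grants", "Resume preparation", "RAs",
               "Faculty preparation", "Job interview skills",
               "Preparation of manuscripts", "Courses in other campuses",
               "Teaching fellowships", "TAs", "Access to labs in other campuses",
               "Interdisciplinary research", "Interdepartamental projects"),
  yes = c(27, 23, 22, 20, 18, 17, 14, 13, 11, 10, 5, 4, 3, 3, 2, 1),
  no = c(3, 5, 7, 6, 8, 4, 11, 8, 9, 8, 11, 14, 15, 11, 11, 12),
  dontknow = c(3, 8, 7, 9, 7, 12, 8, 11, 12, 14, 15, 16, 15, 18, 18, 19)
)

# long format: one row per (activity, answer), activities kept in original order
answers <- c("yes", "no", "dontknow")
long <- data.frame(
  activity = factor(rep(wide$activity, times = length(answers)), levels = wide$activity),
  haveused = factor(rep(answers, each = nrow(wide)), levels = answers),
  responses = c(wide$yes, wide$no, wide$dontknow)
)
# share of each answer within its activity (each activity's shares sum to 1)
long$share <- long$responses / ave(long$responses, long$activity, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = activity, y = responses, fill = haveused)) +
    geom_col(position = "fill") +          # every bar rescaled to a total of 1
    labs(y = "share of responses") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.25))
  print(p)
}
```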
