Interpolating three columns - r

I have a set of data in ranges like:
x|y|z
-4|1|45
-4|2|68
-4|3|96
-2|1|56
-2|2|65
-2|3|89
0|1|45
0|2|56
0|3|75
2|1|23
2|2|56
2|3|75
4|1|42
4|2|65
4|3|78
Here I need to interpolate between x and y using the z value.
I tried interpolating z separately against x and against y with the code below:
interpol<-approx(x,z,method="linear")
interpol_1<-approx(y,z,method="linear")
Now I'm trying to use all three columns together, but the values are coming out wrong.

In your script you did not refer to the columns of your data.frame. Note the use of $ in the approx calls:
interpol <- approx(df$x,df$z,method="linear")
interpol_1 <- approx(df$y,df$z,method="linear")
Data:
df <- data.frame(
x = c(-4, -4, -4, -2, -2, -2, 0, 0, 0, 2, 2, 2, 4, 4, 4),
y = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
z = c(45, 68, 96, 56, 65, 89, 45, 56, 75, 23, 56, 75, 42, 65, 78)
)
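If the goal is to estimate z at an arbitrary (x, y) point using all three columns, one possible reading of the question, a two-step linear (bilinear) interpolation can be sketched in base R. This assumes the data form a complete regular grid, as they do here; the helper name interp_z and the query point are made up for illustration.

```r
df <- data.frame(
  x = c(-4, -4, -4, -2, -2, -2, 0, 0, 0, 2, 2, 2, 4, 4, 4),
  y = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
  z = c(45, 68, 96, 56, 65, 89, 45, 56, 75, 23, 56, 75, 42, 65, 78)
)

# Hypothetical helper: bilinear interpolation on a complete regular grid
interp_z <- function(df, x0, y0) {
  xs <- sort(unique(df$x))
  # Step 1: for every grid value of x, interpolate z along y at y0
  z_at_y0 <- sapply(xs, function(xv) {
    rows <- df[df$x == xv, ]
    approx(rows$y, rows$z, xout = y0, method = "linear")$y
  })
  # Step 2: interpolate those intermediate values along x at x0
  approx(xs, z_at_y0, xout = x0, method = "linear")$y
}

interp_z(df, x0 = -3, y0 = 1.5)  # 58.5
```

Points outside the grid return NA, which is the default behaviour of approx.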

Related

Zelen Exact Test - Trying to use a k 2x2 in the function zelen.test()

I am trying to use the zelen.test function from the NSM3 package. I am having difficulty passing my data to the function.
You can recreate my data using
data <- c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1, 12, 13,
74, 74, 77, 85, 31, 37, 11, 7, 18, 18, 96, 97, 48, 40)
events <- matrix(data, ncol = 2)
The documentation on CRAN gives the usage as zelen.test(z, example = F, r = 3), where z is an array of k 2 x 2 matrices, example is left as FALSE because TRUE returns a p-value for an example I cannot access, and r is the number of decimal places the user wants in the returned p-value.
I've tried:
zelen.test(events, r = 4)
I thought it may want the study number and the trial data, so I tried this:
studies <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7)
data <- c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1, 12, 13,
74, 74, 77, 85, 31, 37, 11, 7, 18, 18, 96, 97, 48, 40)
events <- matrix(cbind(studies, events), ncol = 3)
zelen.test(events, r = 4)
but it continues to return an error stating
"Error in z[1, 1, ] : incorrect number of dimensions" for both cases I tried above.
Any help would be greatly appreciated!
If we check the source code by typing zelen.test at the console, we can see that when example = TRUE the function constructs a 3D array:
...
if (example)
z <- array(c(2, 1, 2, 5, 1, 5, 4, 1), dim = c(2, 2, 2))
...
The input z dim is also specified in the documentation of ?zelen.test
z - data as an array of k 2x2 matrices. Small data sets only!
So, we may need to construct a three-dimensional array:
library(NSM3)
z1 <- array(c(4, 2, 3, 3, 8, 3, 4, 7), c(2, 2, 2))
zelen.test(z1, r = 4)
# Zelen's test:
# P = 1
Or with 3rd dimension of length 3
z1 <- array( c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1), c(2, 2, 3))
zelen.test(z1, r = 4)
# Zelen's test:
#P = 0.1238
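If the 14 x 2 matrix from the question holds two rows per study (an assumption based on the studies vector; the meaning of the two columns is not stated), one way to reshape it into a 2 x 2 x 7 array could look like the sketch below. Only base R is used, and the name z_all is made up for illustration.

```r
data <- c(4, 2, 3, 3, 8, 3, 4, 7, 0, 7, 1, 1, 12, 13,
          74, 74, 77, 85, 31, 37, 11, 7, 18, 18, 96, 97, 48, 40)
events <- matrix(data, ncol = 2)   # 14 rows: two rows per study (assumed)

k <- nrow(events) / 2
# t(events) lists each row's two values consecutively, so every block of
# four values is one study; aperm() then restores each original row as a row
z_all <- aperm(array(t(events), dim = c(2, 2, k)), c(2, 1, 3))

dim(z_all)     # 2 2 7
z_all[, , 1]   # identical to rows 1 and 2 of events
```

Because the documentation says "Small data sets only!", it may be safer to try zelen.test() on a few slices first (for example z_all[, , 1:2]) before passing the whole array.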

Add a group to one df based on values from another df

I have two df's:
df_1 <-
tribble(
~time, ~v1, ~v2,
-3, 213, 1,
-2, 124, 4,
-1, 532, 2,
0, 423, 5,
-3, 123, 3,
-2, 523, 2,
-1, 125, 5,
0, 515, 2,
-3, 321, 5
)
df_2 <-
tribble(
~trial, ~v4,
2, 12,
4, 23,
5, 34,
6, 53
)
The time column of df_1 resets to -3 at various points. All rows from one reset up to the next belong to one group, and that group is identified in the trial column of df_2. That is, each block of df_1 rows between two resets corresponds to a single row of df_2. I want to take the trial value from df_2 and copy it into all corresponding df_1 rows. The number of resets in df_1 matches the number of rows in df_2.
My target df would look like that:
df_final <-
tribble(
~time, ~v1, ~v2, ~trial,
-3, 213, 1, 2,
-2, 124, 4, 2,
-1, 532, 2, 2,
0, 423, 5, 2,
-3, 123, 3, 4,
-2, 523, 2, 4,
-1, 125, 5, 4,
0, 515, 2, 4,
-3, 321, 5, 5
)
Note that trial is not simply an enumeration: it jumps from 2 to 4 in this example. This would be easy with a left/right join, but there is no common key in this case. I have a general idea of how to do this with a for loop and if, but as my df's are huge this wouldn't be optimal. Any ideas for a more typical R solution, preferably but not necessarily using dplyr? I tried something with the lag and which functions, but got no results worth showing.
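A loop-free approach can be sketched with cumsum: count the resets seen so far and use that count as a row index into df_2. The sketch below uses plain data frames instead of tribble so that it runs in base R; it assumes a reset is the first row or any row where time drops, and that the blocks appear in the same order as the rows of df_2.

```r
df_1 <- data.frame(
  time = c(-3, -2, -1, 0, -3, -2, -1, 0, -3),
  v1   = c(213, 124, 532, 423, 123, 523, 125, 515, 321),
  v2   = c(1, 4, 2, 5, 3, 2, 5, 2, 5)
)
df_2 <- data.frame(trial = c(2, 4, 5, 6), v4 = c(12, 23, 34, 53))

# A reset is the first row, or any row where time is lower than in the row before
is_reset <- c(TRUE, diff(df_1$time) < 0)
# The running count of resets is the position of the matching row in df_2
block <- cumsum(is_reset)

df_final <- df_1
df_final$trial <- df_2$trial[block]
df_final
```

The same indexing expression should also work inside dplyr::mutate(), for example mutate(df_1, trial = df_2$trial[cumsum(c(TRUE, diff(time) < 0))]).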

Why are my error bars on my graph out of place?

I have a graph that I'm trying to make with ggplot and gridExtra, but my error bars are out of place. I want the error bars to be at the top of each bar, not where they are now. What can I do to correct them?
Also, which ggsave parameters will produce an image with the same pixel dimensions that I get from the base R png() function? ggsave seems to work more consistently than png(), so I need to use it.
Data:
###Open packages###
library(readxl)
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gridExtra)
#Dataframes
set1 <- data.frame(type = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3),
flowRate = c(24, 24, 24, 45, 45, 45, 58, 58, 58,
24, 24, 24, 45, 45, 45, 58, 58, 58,
24, 24, 24, 45, 45, 45, 58, 58, 58),
speed = c(0.563120137230256, 0.301721535875508, 0.170683367727845,
0.698874950490133, 0.158488731250147, 0.162788814307903,
0.105943103772245, 0.682354871986346, 0.17945825301837,
0.806637519498752, 0.599304186634932, 0.268788206619179,
0.518615600601962, 0.907628477211427, 0.144209408332705,
0.161586044320138, 0.946354993801663, 0.488881557759483,
0.497120443885793, 0.666120238846602, 0.264813203831783,
0.717007333314455, 0.95119232422312, 0.833669574933742,
0.450082932184122, 0.309570971522678, 0.732874401666482))
set2 <- data.frame(type = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3),
flowRate = c(24, 24, 24, 45, 45, 45, 58, 58, 58,
24, 24, 24, 45, 45, 45, 58, 58, 58,
24, 24, 24, 45, 45, 45, 58, 58, 58),
speed = c(0.489966876244169, 0.535542121502899, 0.265940150225231,
0.399521957817437, 0.0831661276630631, 0.302201301891001,
0.78194419406759, 0.202331797255324, 0.192182716686147,
0.163038660094618, 0.658020173938572, 0.735633308902771,
0.480982144690572, 0.749452781972296, 0.491759702396918,
0.459610541236644, 0.397660083986082, 0.939983924945833,
0.128956722185581, 0.998492083119223, 0.440514184126494,
0.242917958355044, 0.350643319960552, 0.02613674288471,
0.71625407018877, 0.589325978787179, 0.649116781211748))
Code:
#Standard error of the mean function
sem <- function(x) sd(x)/sqrt(length(x))
#Aggregate dataframes, mean and Standard Error
mean_set1 <- aggregate(set1, by=list(set1$flowRate, set1$speed), mean)
mean_set1 <- select(mean_set1, -Group.1, -Group.2)
mean_set1 <- arrange(mean_set1, type, flowRate)
sem_set1 <- aggregate(set1, by=list(set1$flowRate, set1$speed), sem)
sem_set1 <- as.data.frame(sem_set1)
sem_set1 <- cbind(mean_set1$type, mean_set1$flowRate, sem_set1$Group.2)
sem_set1 <- as.data.frame(sem_set1)
mean_set2 <- aggregate(set2, by=list(set2$flowRate, set2$speed), mean)
mean_set2 <- select(mean_set2, -Group.1, -Group.2)
mean_set2 <- arrange(mean_set2, type, flowRate)
sem_set2 <- aggregate(set2, by=list(set2$flowRate, set2$speed), sem)
sem_set2 <- as.data.frame(sem_set2)
sem_set2 <- cbind(mean_set2$type, mean_set2$flowRate, sem_set2$Group.2)
sem_set2 <- as.data.frame(sem_set2)
#Graph sets
set1_graph <- ggplot(mean_set1, aes(x=type, y=speed, fill=factor(flowRate)))+
geom_bar(stat="identity",width=0.6, position="dodge", col="black")+
scale_fill_discrete(name="Flow Rate")+
xlab("type")+ylab("Speed")+
geom_errorbar(aes(ymin= mean_set1$speed,ymax=mean_set1$speed+sem_set1$V3), width=0.2, position = position_dodge(0.6))
set2_graph <- ggplot(mean_set2, aes(x=type, y=speed, fill=factor(flowRate)))+
geom_bar(stat="identity",width=0.6, position="dodge", col="black")+
scale_fill_discrete(name="Speed")+
xlab("type")+ylab("Flow Rate")+
geom_errorbar(aes(ymin= mean_set2$speed,ymax=mean_set2$speed+sem_set2$V3), width=0.2, position = position_dodge(0.6))
#Grid.arrange and save image
png("image.png", width = 1000, height = 700)
grid.arrange(set1_graph, set2_graph,nrow=1, ncol=2)
dev.off()
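One likely cause, offered as a sketch rather than a definitive diagnosis: the aggregate() calls above group by flowRate and speed, so every group contains a single measurement, sd() of one value is NA, and sem_set1$Group.2 holds the speed values themselves rather than a standard error. Grouping by type and flowRate instead gives one mean and one SEM per bar. The toy data frame below uses made-up values with the same column names as set1; the names toy, stats, ymin and ymax are illustrative only.

```r
# Toy data (hypothetical values), same column names as set1
toy <- data.frame(
  type     = c(1, 1, 1, 1, 2, 2, 2, 2),
  flowRate = c(24, 24, 45, 45, 24, 24, 45, 45),
  speed    = c(0.2, 0.4, 0.5, 0.9, 0.1, 0.3, 0.6, 0.8)
)

# Standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# Group by the two design factors, not by the measured value
means <- aggregate(speed ~ type + flowRate, data = toy, FUN = mean)
sems  <- aggregate(speed ~ type + flowRate, data = toy, FUN = sem)
names(sems)[3] <- "sem"

# One row per bar, with the bar height and both error-bar limits
stats <- merge(means, sems, by = c("type", "flowRate"))
stats$ymin <- stats$speed - stats$sem
stats$ymax <- stats$speed + stats$sem
stats
```

With columns like these in the plotted data frame, geom_errorbar(aes(ymin = ymin, ymax = ymax), ...) can take both limits from the same data frame as the bars, which avoids mixing vectors from separate objects inside aes().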

Specialised Boxplot: Plotting Lines to the Error Bars to Highlight the Data Range in R

Overview
I have a data frame called ANOVA.Dataframe.1 (see below) containing the dependent variable called 'Canopy_Index', and the independent variable called 'Urbanisation_index'.
My aim is to produce a boxplot (exactly the same as the desired result below) for Canopy Cover (%) for each category of the Urbanisation Index with plotted lines pointing towards both the bottom and top of the error bars to highlight the data range.
I have searched intensively for code that produces the desired boxplot (please see the desired result), but I was unsuccessful, and I'm also unsure whether these boxplots have a specialised name.
Perhaps this can be achieved in either ggplot or Base R
If anyone can help, I would be deeply appreciative.
Desired Result ( Reference)
I can produce an ordinary boxplot with the R-code below, but I cannot figure out how to implement the lines pointing towards the ends of the error bars.
R-code
Boxplot.obs1.Canopy.Urban<-boxplot(ANOVA.Dataframe.1$Canopy_Index~ANOVA.Dataframe.1$Urbanisation_index,
main="Mean Canopy Index (%) for Categories of the Urbanisation Index",
xlab="Urbanisation Index",
ylab="Canopy Index (%)")
Boxplot produced from R-code
Data frame 1
structure(list(Urbanisation_index = c(2, 2, 4, 4, 3, 3, 4, 4,
4, 2, 4, 3, 4, 4, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2,
2, 2, 2, 4, 4, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 4, 4, 4,
4, 4, 4, 4), Canopy_Index = c(65, 75, 55, 85, 85, 85, 95, 85,
85, 45, 65, 75, 75, 65, 35, 75, 65, 85, 65, 95, 75, 75, 75, 65,
75, 65, 75, 95, 95, 85, 85, 85, 75, 75, 65, 85, 75, 65, 55, 95,
95, 95, 95, 45, 55, 35, 55, 65, 95, 95, 45, 65, 45, 55)), row.names = c(NA,
-54L), class = "data.frame")
Dataframe 2
structure(list(Urbanisation_index = c(2, 2, 4, 4, 3, 3, 4, 4,
4, 3, 4, 4, 4, 4, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2,
2, 2, 2, 4, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 4, 4, 4, 4, 4, 4, 4
), Canopy_Index = c(5, 45, 5, 5, 5, 5, 45, 45, 55, 15, 35, 45,
5, 5, 5, 5, 5, 5, 35, 15, 15, 25, 25, 5, 5, 5, 5, 5, 5, 15, 25,
15, 35, 25, 45, 5, 25, 5, 5, 5, 5, 55, 55, 15, 5, 25, 15, 15,
15, 15)), row.names = c(NA, -50L), class = "data.frame")
Alice, is this what you are looking for?
You can do everything with ggplot2, but for non-standard things you have to play with it for a while. My code (df stands for your data frame, e.g. ANOVA.Dataframe.1):
library(tidyverse)
library(wrapr)
df %.>%
ggplot(data = ., aes(
x = Urbanisation_index,
y = Canopy_Index,
group = Urbanisation_index
)) +
stat_boxplot(
geom = 'errorbar',
width = .25
) +
geom_boxplot() +
geom_line(
data = group_by(., Urbanisation_index) %>%
summarise(
bot = min(Canopy_Index),
top = max(Canopy_Index)
) %>%
gather(pos, val, bot:top) %>%
select(
x = Urbanisation_index,
y = val
) %>%
mutate(gr = row_number()) %>%
bind_rows(
tibble(
x = 0,
y = max(.$y) * 1.15,
gr = 1:8
)
),
aes(
x = x,
y = y,
group = gr
)) +
theme_light() +
theme(panel.grid = element_blank()) +
coord_cartesian(
xlim = c(min(.$Urbanisation_index) - .5, max(.$Urbanisation_index) + .5),
ylim = c(min(.$Canopy_Index) * .95, max(.$Canopy_Index) * 1.05)
) +
ylab('Canopy Index (%)') +
xlab('Urbanisation Index')
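Note that the lines in the code above are drawn to min() and max() of each group, which coincide with the whisker ends only when a group has no outliers. If the lines should end exactly at the whiskers, the whisker positions used by base R's boxplot() can be taken from boxplot.stats(), as in the sketch below. The toy data are made-up values with the question's column names, and the name whiskers is illustrative. ggplot2 computes its hinges slightly differently from boxplot(), so its whisker ends can differ marginally.

```r
# Toy data (hypothetical values); group 1 contains a low outlier (5)
toy <- data.frame(
  Urbanisation_index = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Canopy_Index       = c(5, 55, 65, 75, 85, 45, 65, 75, 85, 95)
)

# boxplot.stats()$stats holds: lower whisker, lower hinge, median,
# upper hinge, upper whisker; keep elements 1 and 5 for each category
whiskers <- t(sapply(
  split(toy$Canopy_Index, toy$Urbanisation_index),
  function(v) boxplot.stats(v)$stats[c(1, 5)]
))
colnames(whiskers) <- c("bot", "top")
whiskers
```

For group 1 the lower whisker ends at 55 rather than at the minimum of 5, because 5 lies more than 1.5 times the hinge spread below the lower hinge and is treated as an outlier.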

Calculating the peak function for group wise(Tag) in a data frame and further Rbinding it into new data frame

I am new at asking questions on Stack, so please pardon me if I get it wrong. Here is the scenario (I have tried to reproduce it with a simple example):
library("pracma")
Tag<- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5,5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6)
Temp<- c(43, 44, 45, 41, 43, 38, 40, 41, 39, 37, 37, 39, 45, 42, 41, 43, 44, 39,38,
37, 43, 44, 45, 41, 43, 38, 40, 41, 39, 37, 37, 39, 45, 42, 41, 43, 44,
39, 38, 37, 43, 44, 45, 41, 43, 38, 40, 41, 39, 37, 37, 39, 45, 42, 41,43,
44, 39, 38, 37)
dfr=data.frame(Tag=Tag,Temp=Temp)
DATA Description - We have two columns:
Tag [group wise variable]
Temp (numerical variable where peak function has to be performed)
for (i in 1:6) {
df=filter(dfr , dfr$Tag == i)
pik =findpeaks(df$Temp, nups = 1, ndowns = 0, zero = "+", peakpat = NULL,
minpeakheight = 33, minpeakdistance = 4,
threshold =0.42, npeaks = 11, sortstr = FALSE)#Peak Function
pik<- as.data.frame(pik)#Converting into data frame as it is in matrix form
names(pik) <- c("Temp","Peak_Mid","Peak_start","Peak_End")# renaming the header
pik <- arrange(pik , Peak_Mid)#Rearranging with Peak_Mid
attach(pik)#attaching pik df
j=1#initializing for loop
s=0#initializing for loop
for (j in 1:nrow(pik))#for loop for calculating slope individual points
s[[j]]=((Temp[j+1]-Temp[j])/(Peak_Mid[j+1]-Peak_Mid[j]))
pik$Trend <- 0#creating new column(Trend) filled with zero
pik$Trend <- s# inserting the calculated s variable onto pik df
w[[i]]=as.data.frame(pik)
}
I am trying to wrap the above code in a for loop so that, for each Tag value i (1 to 6 in this data), the peak function is run on that subset, the slope between consecutive peaks is calculated, and we get a new data frame with 4 columns.
This computation is performed on each Tag[i], which is a subset of the main data frame. So we get i different data frames, which should then be combined with rbind, with the tag number alongside each row.
This is a visual of the input with the expected output:
Using the tidyverse library we can do:
result <- dfr %>%
split(.$Tag) %>%
map(~findpeaks(.$Temp, nups = 1, ndowns = 0, zero = "+", peakpat = NULL, minpeakheight = 33, minpeakdistance = 4, threshold = 0.42, npeaks = 11, sortstr = FALSE)) %>%
map_df(~data_frame(Temp = parse_number(.x[,1]),
Peak_Mid = parse_number(.x[,2]),
Peak_start = parse_number(.x[,3]),
Peak_End= parse_number(.x[,4])),
.id = 'Tag') %>%
arrange(Tag, Peak_Mid) %>%
group_by(Tag) %>%
mutate(Trend= (lead(Temp)-Temp)/(lead(Peak_Mid)-Peak_Mid))
This will, in order:
Split the original dataset into a list of datasets, based on the Tag value. (split)
For each dataset in the list, execute the findpeaks function, with the provided arguments, the result is a matrix. (map)
For each matrix, cast as data.frame and rename the columns. (data_frame)
Reduce to a single data.frame. (map_df)
Arrange in desired order. (arrange)
Compute the Trend column. (mutate)
Hope this helps
Update
As of 2021, the map_df call should be rewritten as:
map_df(~tibble(
Temp = .x[,1],
Peak_Mid = .x[,2],
Peak_start = .x[,3],
Peak_End = .x[,4]),
.id = "Tag")
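The grouped slope step (the mutate call with lead) can also be sketched in base R, which may help when checking the Trend values by hand. The peak table below is hypothetical (in practice it would come from findpeaks()), and the helper name add_trend is made up for illustration.

```r
# Hypothetical peak table: in practice these rows would come from findpeaks()
peaks <- data.frame(
  Tag      = c(1, 1, 1, 2, 2),
  Temp     = c(45, 43, 41, 45, 44),
  Peak_Mid = c(3, 5, 8, 3, 7)
)

# Slope from each peak to the next one within a Tag;
# the last peak of a Tag has no successor, so its Trend is NA
add_trend <- function(d) {
  d <- d[order(d$Peak_Mid), ]
  d$Trend <- c(diff(d$Temp) / diff(d$Peak_Mid), NA)
  d
}

# Split by Tag, compute Trend per group, and bind the pieces back together
result <- do.call(rbind, lapply(split(peaks, peaks$Tag), add_trend))
rownames(result) <- NULL
result
```

This mirrors (lead(Temp) - Temp) / (lead(Peak_Mid) - Peak_Mid) computed within each Tag group.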

Resources