Fitting function y ~ a*x^b*z^c

I have the following data frame:
df <- structure(list(y = c(0.82, 0.77, 0.46, 0.7, 0.82, 0.92, 0.84, 0.88, 0.86, 0.92, 0.91, 0.96, 0.91, 0.92, 0.89, 0.95, 0.95, 0.88, 0.92, 0.88, 0.94, 0.72, 0.9, 0.95, 0.96, 0.92, 0.94, 0.93, 0.93, 0.94, 0.93, 0.89, 0.94, 0.94, 0.91, 0.88, 0.96, 0.91, 0.9, 0.95, 0.83, 0.95, 0.92, 0.91, 0.86, 0.94, 0.93, 0.83, 0.87, 0.76), x = c(0, 0.03, 0.07, 0.1, 2.2, 2.18, 2.33, 2.48, 2.63, 2.77, 2.92, 3.07, 3.22, 3.37, 3.52, 3.66, 3.81, 3.96, 4.11, 4.16, 4.21, 4.26, 4.31, 4.36, 4.41, 4.46, 4.51, 4.55, 4.6, 4.65, 4.7, 4.75, 4.8, 4.85, 4.9, 4.96, 5.01, 5.07, 5.12, 5.18, 5.24, 5.29, 5.35, 5.4, 5.46, 5.51, 5.57, 5.27, 4.98, 4.68), z = c(1.54, 1.48, 1.51, 1.05, 1.29, 0.6, 1.03, 0.95, 0.98, 0.89, 0.81, 0.91, 0.31, 0.69, 0.17, 0.48, 0.51, 0.74, 0.79, 0.77, 0.69, 0.5, 0.75, 0.85, 0.77, 0.7, 0.66, 1.02, 0.69, 0.51, 0.63, 0.45, 0.46, 0.7, 0.74, 0.68, 0.72, 0.84, 0.5, 0.62, 0.32, 0.74, 0.52, 0.65, 1.07, 0.96, 1.03, 1.41, 1.88, 0.83)), row.names = c(1L, 2L, 3L, 4L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L), class = "data.frame")
I need to fit the equation y ~ a*x^b*z^c to the data above. Taking logs turns it into log(y) ~ log(a) + b*log(x) + c*log(z), so a linear model can be fitted instead. I tried the code below:
x <- log(df$x)
y = log(df$y)
z = log(df$z)
x[!(is.finite(x))] <- NA
y[!(is.finite(y))] = NA
z[!(is.finite(z))] = NA
# Model fitting
m <- lm(y~x+z, na.action = na.exclude)
coeff <- list(a = coef(m)[1], b = coef(m)[2], c = coef(m)[3]) # note: coef(m)[1] estimates log(a), not a
# Prediction
y_pred <- coeff[[1]] + coeff[[2]]*x + coeff[[3]]*z # or predict(m)
CORR_1 <- cor(y,y_pred, use = "pairwise.complete.obs")
Converting back to the original scale:
x <- df$x
y = df$y
z = df$z
y_pred <- exp(coeff[[1]])*x^coeff[[2]]*z^coeff[[3]]
CORR_2 <- cor(y,y_pred, use = "pairwise.complete.obs")
I was expecting CORR_1 and CORR_2 to be the same, but their values differ. Why is that? What is the best way to fit y ~ a*x^b*z^c?
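A likely explanation, offered as a hedged sketch: Pearson correlation is unchanged only by linear (affine) transformations, and exp() is nonlinear, so the log-scale correlation (CORR_1) and the back-transformed one (CORR_2) need not agree. Least squares on the log scale also minimizes relative rather than absolute errors, and its intercept estimates log(a). A common way to fit on the original scale is nls(), started from the back-transformed log-linear estimates. The data below are simulated with hypothetical true values a = 2, b = 0.5, c = -0.3; they are not the question's df.

```r
# Hypothetical simulated data with known a = 2, b = 0.5, c = -0.3 (not the question's df)
set.seed(42)
n <- 200
sim <- data.frame(x = runif(n, 1, 6), z = runif(n, 0.2, 2))
sim$y <- 2 * sim$x^0.5 * sim$z^-0.3 * exp(rnorm(n, sd = 0.01))

# Step 1: log-linear fit; the intercept estimates log(a), not a
m_log <- lm(log(y) ~ log(x) + log(z), data = sim)
start <- list(a = exp(unname(coef(m_log)[1])),
              b = unname(coef(m_log)[2]),
              c = unname(coef(m_log)[3]))

# Step 2: nonlinear least squares on the original scale,
# started from the back-transformed log-linear estimates
m_nls <- nls(y ~ a * x^b * z^c, data = sim, start = start)
coef(m_nls)

# Correlation is invariant only under affine maps, so these generally differ
cor_log  <- cor(log(sim$y), fitted(m_log))
cor_orig <- cor(sim$y, exp(fitted(m_log)))
cor_nls  <- cor(sim$y, fitted(m_nls))
```

With the question's df, rows where x = 0 give log(0) = -Inf and would have to be left out of the log fit; whether nls() converges on that data has not been checked here.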


ggplot: How to color/fill area between ROC curves and diagonal?

I have this ROC curve, produced with the following code:
ggplot(a, aes(y = TPR, x = FPR, color = model)) +
geom_line() +
geom_segment(aes(y = 0, yend = 1, x = 0, xend = 1), color = "grey50")
I want to color the area between the red and green curves, and the area between the green curve and the diagonal.
I sketched the expected output by hand (apologies for my artistic skills).
I looked for solutions using geom_area() but could not get it to work.
How can I fill these areas?
Here is a sample of my data. Apologies for the many data points, but that was the only way I could reproduce the full curves reaching (0, 0) and (1, 1).
a <- structure(list(model = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("Null model",
"SSA+", "SSA-"), class = "factor"), risk = c(1, 1, 1, 1, 1, 0.99,
0.99, 0.99, 0.98, 0.98, 0.97, 0.97, 0.97, 0.96, 0.95, 0.95, 0.94,
0.93, 0.92, 0.91, 0.91, 0.91, 0.91, 0.9, 0.89, 0.89, 0.88, 0.87,
0.87, 0.85, 0.85, 0.81, 0.81, 0.8, 0.78, 0.77, 0.76, 0.76, 0.76,
0.76, 0.75, 0.74, 0.72, 0.69, 0.69, 0.69, 0.67, 0.66, 0.65, 0.65,
0.64, 0.63, 0.63, 0.6, 0.59, 0.58, 0.58, 0.57, 0.57, 0.57, 0.53,
0.53, 0.52, 0.5, 0.46, 0.46, 0.46, 0.45, 0.44, 0.42, 0.41, 0.4,
0.4, 0.39, 0.38, 0.37, 0.35, 0.31, 0.29, 0.27, 0.27, 0.26, 0.24,
0.23, 0.2, 0.19, 0.19, 0.18, 0.18, 0.16, 0.15, 0.15, 0.11, 0.11,
0.09, 0.07, 0.06, 0.04, 0.93, 0.92, 0.92, 0.91, 0.91, 0.9, 0.9,
0.9, 0.9, 0.89, 0.86, 0.86, 0.86, 0.86, 0.86, 0.85, 0.85, 0.84,
0.83, 0.82, 0.81, 0.81, 0.81, 0.8, 0.79, 0.78, 0.78, 0.77, 0.77,
0.76, 0.75, 0.74, 0.74, 0.74, 0.73, 0.72, 0.71, 0.7, 0.66, 0.65,
0.65, 0.64, 0.63, 0.61, 0.6, 0.59, 0.56, 0.54, 0.52, 0.51, 0.51,
0.5, 0.47, 0.45, 0.45, 0.43, 0.42, 0.42, 0.38, 0.36, 0.34, 0.32,
0.32, 0.31, 0.3, 0.3, 0.29, 0.28, 0.27, 0.27, 0.26, 0.24, 0.23,
0.18, 0.16, 0.14, 0.13, 0.13, 0.12, 0.09), TPR = c(0.02, 0.03,
0.05, 0.07, 0.08, 0.1, 0.11, 0.13, 0.15, 0.16, 0.18, 0.2, 0.21,
0.23, 0.25, 0.26, 0.28, 0.3, 0.31, 0.33, 0.34, 0.34, 0.36, 0.38,
0.38, 0.39, 0.41, 0.43, 0.44, 0.44, 0.44, 0.46, 0.48, 0.49, 0.49,
0.51, 0.52, 0.54, 0.56, 0.57, 0.59, 0.61, 0.62, 0.62, 0.64, 0.66,
0.67, 0.69, 0.7, 0.72, 0.74, 0.74, 0.75, 0.75, 0.77, 0.77, 0.79,
0.8, 0.8, 0.82, 0.82, 0.82, 0.84, 0.84, 0.84, 0.85, 0.85, 0.87,
0.89, 0.9, 0.92, 0.92, 0.93, 0.93, 0.95, 0.95, 0.95, 0.97, 0.98,
0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0.03, 0.05, 0.07, 0.08, 0.1, 0.11, 0.11,
0.13, 0.15, 0.15, 0.16, 0.18, 0.21, 0.23, 0.25, 0.25, 0.26, 0.26,
0.28, 0.31, 0.33, 0.33, 0.33, 0.34, 0.38, 0.39, 0.43, 0.49, 0.51,
0.56, 0.59, 0.61, 0.62, 0.66, 0.69, 0.7, 0.7, 0.72, 0.72, 0.74,
0.75, 0.75, 0.77, 0.77, 0.79, 0.79, 0.79, 0.8, 0.82, 0.84, 0.84,
0.85, 0.87, 0.89, 0.89, 0.89, 0.89, 0.9, 0.92, 0.93, 0.93, 0.93,
0.93, 0.93, 0.93, 0.95, 0.98, 0.98, 0.98, 0.98, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), FPR = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.03, 0.03, 0.03, 0.05, 0.05, 0.05, 0.05,
0.05, 0.08, 0.11, 0.11, 0.11, 0.11, 0.13, 0.13, 0.13, 0.16, 0.16,
0.16, 0.16, 0.16, 0.16, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18, 0.18,
0.18, 0.21, 0.21, 0.24, 0.24, 0.26, 0.26, 0.26, 0.29, 0.29, 0.32,
0.34, 0.34, 0.37, 0.39, 0.39, 0.42, 0.42, 0.42, 0.42, 0.42, 0.45,
0.45, 0.47, 0.47, 0.5, 0.53, 0.53, 0.53, 0.55, 0.58, 0.61, 0.63,
0.66, 0.68, 0.71, 0.74, 0.76, 0.76, 0.79, 0.82, 0.84, 0.87, 0.89,
0.92, 0.95, 0.97, 1, 0, 0, 0, 0, 0, 0, 0.03, 0.03, 0.03, 0.05,
0.05, 0.05, 0.05, 0.05, 0.05, 0.08, 0.08, 0.11, 0.11, 0.11, 0.11,
0.13, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.18, 0.18, 0.18, 0.18,
0.18, 0.18, 0.21, 0.24, 0.26, 0.26, 0.29, 0.29, 0.29, 0.32, 0.32,
0.34, 0.34, 0.37, 0.39, 0.39, 0.39, 0.39, 0.42, 0.42, 0.45, 0.45,
0.47, 0.5, 0.53, 0.53, 0.53, 0.53, 0.55, 0.58, 0.61, 0.63, 0.66,
0.66, 0.66, 0.71, 0.74, 0.76, 0.76, 0.79, 0.82, 0.84, 0.87, 0.89,
0.92, 0.95, 0.97, 1)), row.names = c(NA, -178L), class = c("data.table",
"data.frame"))
You can use geom_ribbon. The ymax will be TPR, and since the diagonal occurs at TPR = FPR, the ymin will be FPR.
library(ggplot2)
ggplot(a, aes(y = TPR, x = FPR)) +
geom_ribbon(aes(ymin = FPR, ymax = TPR, fill = model)) +
geom_line(aes(group = model), color = "black") +
geom_segment(aes(y = 0, yend = 1, x = 0, xend = 1), color = "grey50") +
scale_fill_manual(values = c("#ba6329", "#5f7c37")) +
coord_equal() +
theme_light(base_size = 16)
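The ribbon above fills each model's area against the diagonal, so the two fills overlap. To shade only the band between the two curves, both curves first need common x values. A sketch, assuming linear interpolation between ROC points is acceptable: the hypothetical helper between_curves() interpolates both curves onto a shared FPR grid with approx() and returns the pointwise lower and upper curve, which geom_ribbon() can then fill. The toy curves are made up for illustration.

```r
library(ggplot2)

# Hypothetical helper: put two ROC curves on a shared FPR grid (linear
# interpolation) so the band between them can be drawn with geom_ribbon().
# ties = max keeps the highest TPR where an FPR value repeats (vertical steps).
between_curves <- function(fpr1, tpr1, fpr2, tpr2, grid = seq(0, 1, by = 0.01)) {
  t1 <- approx(fpr1, tpr1, xout = grid, ties = max, rule = 2)$y
  t2 <- approx(fpr2, tpr2, xout = grid, ties = max, rule = 2)$y
  data.frame(FPR = grid, lower = pmin(t1, t2), upper = pmax(t1, t2))
}

# Hypothetical toy curves, not the question's data
band <- between_curves(fpr1 = c(0, 0.2, 0.5, 1), tpr1 = c(0, 0.6, 0.9, 1),
                       fpr2 = c(0, 0.3, 0.6, 1), tpr2 = c(0, 0.4, 0.7, 1))

p <- ggplot(band, aes(x = FPR)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "#ba6329", alpha = 0.6) + # between the curves
  geom_ribbon(aes(ymin = FPR, ymax = lower), fill = "#5f7c37", alpha = 0.6) +   # pointwise lower curve to diagonal
  geom_line(aes(y = upper)) +
  geom_line(aes(y = lower)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey50") +
  coord_equal()
```

For the question's data, the FPR and TPR columns of subset(a, model == "SSA+") and subset(a, model == "SSA-") could be passed in (not run here). If the curves cross, "lower" and "upper" switch between models at the crossing point.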

Extract and add to the data values of the probability density function based on a stan linear model

Given the sample data sampleDT and the models lm.fit and brm.fit below, I would like to estimate, extract and add to the data frame the values of the density function of a conditional normal distribution, evaluated at the observed level of the variable dollar.wage_1.
I can do this with a frequentist linear regression (lm.fit) and dnorm, but my attempt to do the same with a Bayesian brm.fit model fails. Any help would be much appreciated.
##sample data
sampleDT<-structure(list(id = 1:10, N = c(10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L), A = c(62L, 96L, 17L, 41L, 212L, 143L, 143L,
143L, 73L, 73L), B = c(3L, 1L, 0L, 2L, 170L, 21L, 0L, 33L, 62L,
17L), C = c(0.05, 0.01, 0, 0.05, 0.8, 0.15, 0, 0.23, 0.85, 0.23
), employer = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L), F = c(0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), G = c(1.94, 1.19, 1.16,
1.16, 1.13, 1.13, 1.13, 1.13, 1.12, 1.12), H = c(0.14, 0.24,
0.28, 0.28, 0.21, 0.12, 0.17, 0.07, 0.14, 0.12), dollar.wage_1 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_2 = c(1.93,
1.18, 3.15, 3.15, 1.12, 1.12, 2.12, 1.12, 1.11, 1.11), dollar.wage_3 = c(1.95,
1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.13, 1.13), dollar.wage_4 = c(1.94,
1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_5 = c(1.94,
1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_6 = c(1.94,
1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_7 = c(1.94,
1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_8 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_9 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_10 = c(1.94,
1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12)), row.names = c(NA,
-10L), class = "data.frame")
##frequentist model: this works
lm.fit <-lm(dollar.wage_1 ~ A + B + C + employer + F + G + H,
data=sampleDT)
sampleDT$dens1 <-dnorm(sampleDT$dollar.wage_1,mean=lm.fit$fitted,
sd=summary(lm.fit)$sigma)
##Bayesian model: this is my attempt - it does not work
library(brms)
# this works
brm.fit <-brm(dollar.wage_1 ~ A + B + C + employer + F + G + H,
data=sampleDT, iter = 4000, family = gaussian())
# this does not work
sampleDT$dens1_bayes <-dnorm(sampleDT$dollar.wage_1, mean = fitted(brm.fit), sd=summary(brm.fit)$sigma)
Error in dnorm(sampleDT$dollar.wage_1, mean = brm.fit$fitted, sd = summary(brm.fit)$sigma) : Non-numeric argument to mathematical function
Thanks in advance for any help.
Here fitted(brm.fit) is a matrix, so only its first column, the estimates, should be used. Also, there is no reason for the brms summary object to have the same structure as the lm one: summary(brm.fit)$sigma returns NULL. The residual SD is stored in the spec_pars table instead, in the row "sigma" and column "Estimate". Hence you may use
sampleDT$dens1_bayes <- dnorm(sampleDT$dollar.wage_1,
mean = fitted(brm.fit)[, "Estimate"],
sd = summary(brm.fit)$spec_pars["sigma", "Estimate"])
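Plugging posterior point estimates into dnorm() ignores posterior uncertainty. An alternative, offered as a hedged sketch, is to evaluate the normal density once per posterior draw and average over the draws. The hypothetical helper posterior_mean_density() does that in base R, given an S x N matrix of draws of the conditional mean and a vector of S draws of sigma.

```r
# Hypothetical helper: posterior mean of the normal density at each observation.
# mu_draws: S x N matrix of draws of the conditional mean (one row per draw)
# sigma_draws: length-S vector of draws of the residual SD
posterior_mean_density <- function(y, mu_draws, sigma_draws) {
  stopifnot(ncol(mu_draws) == length(y), nrow(mu_draws) == length(sigma_draws))
  S <- nrow(mu_draws)
  # rep(y, each = S) lines y up with the column-major layout of mu_draws;
  # sigma_draws (length S) is recycled down each column
  dens <- dnorm(rep(y, each = S), mean = as.vector(mu_draws), sd = sigma_draws)
  colMeans(matrix(dens, nrow = S))
}

# Toy check with two draws and two observations
posterior_mean_density(y = c(0, 1),
                       mu_draws = matrix(c(0, 0, 1, 1), nrow = 2),
                       sigma_draws = c(1, 1))
# both values equal dnorm(0), about 0.399
```

With brms, posterior_epred(brm.fit) should supply the mean draws for a Gaussian model and the sigma column of as.data.frame(brm.fit) the sigma draws; colMeans(exp(log_lik(brm.fit))) should give the same quantity. These calls were not run here, so check them against your brms version.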

R scatterplot loop using two variables

I'm trying to write a for loop that automatically generates ~50 scatterplots comparing two sets of data. It's a quality-control analysis of geochemical values that were analyzed twice (duplicates). I have 53 element columns (periodic-table elements) labeled Al1, Ag1, Au1, ... and another 53 labeled Al2, Ag2, and so on.
I've successfully gotten my loop to work for generating graphs that only need one variable, with the x axis being fixed, like below.
library(ggplot2)
library(ggthemes) # for theme_calc()
for(i in colNames){
plt <- ggplot(YGS_Dupes, mapping = aes_string(x=Dup_Num, y = i)) +
geom_bar() + theme_calc() + ggtitle(paste(i, "Duplicate Comparison", sep=" - "))
print(plt)
ggsave(paste0(i,".png"), plot = plt)
Sys.sleep(2)
}
I set colNames to be the element columns, and the function runs through the different elements and generates a bar plot for each, where it's just showing Sample 1 or Sample 2 as the X axis (so it produces two bar plots side by side).
What I need to make now is a scatterplot where I compare the data from Al1 to Al2 or Fe1 to Fe2, so I need the for loop to run using two parallel sets of changing variables. I made the function for a single graph like so:
ggplot(YGS_Dup_Scatter, mapping = aes(x = Fe_pct1, y = Fe_pct2)) + geom_point()
and it looks like this (figure: Fe vs Fe scatterplot).
So I made a similar pair of colNames vectors:
colNames_scatter_dup <- names(YGS_Dup_Scatter)[4:56]
colNames_scatter_dup2 <- names(YGS_Dup_Scatter)[57:109]
Columns 4 to 56 are the first set of element values and columns 57 to 109 the second set. Both are in the same order, so I want columns 4/57, 5/58, and so on, to be pairs.
How do I set up my for loop to do this?
Thank you for any help
Edit: Adding the dput data for people to try. I had too many observations and variables so I cut most of them out:
Edit 2: Ok, so I made a nested loop and it makes what I want, but it also makes way too many graphs, shown below:
for (j in colNames_scatter_dup2) {
for(i in colNames_scatter_dup){
plt <- ggplot(YGS_Dup_Scatter, mapping = aes_string(x=j, y = i)) +
geom_point()
print(plt)
ggsave(paste0(i,".png"))
Sys.sleep(2)
}
}
The issue now is that it plots Al1 vs Al2, then Ag1 vs Al2, ..., then Al1 vs Ag2, ..., and makes hundreds of graphs. I only want the 53 actual element pairs, and I can't figure out how to restrict the loop to those.
thanks
structure(list(DUP_COMP_ID = structure(c(1L, 12L, 23L, 34L, 45L,
56L, 67L, 78L, 89L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 24L, 25L, 26L,
27L, 28L, 29L, 30L, 31L, 32L, 33L, 35L, 36L, 37L, 38L, 39L, 40L,
41L, 42L, 43L, 44L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 68L, 69L,
70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 79L, 80L, 81L, 82L, 83L,
84L, 85L, 86L, 87L, 88L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L,
98L, 99L), .Label = c("DCI_1", "DCI_10", "DCI_11", "DCI_12",
"DCI_13", "DCI_14", "DCI_15", "DCI_16", "DCI_17", "DCI_18", "DCI_19",
"DCI_2", "DCI_20", "DCI_21", "DCI_22", "DCI_23", "DCI_24", "DCI_25",
"DCI_26", "DCI_27", "DCI_28", "DCI_29", "DCI_3", "DCI_30", "DCI_31",
"DCI_32", "DCI_33", "DCI_34", "DCI_35", "DCI_36", "DCI_37", "DCI_38",
"DCI_39", "DCI_4", "DCI_40", "DCI_41", "DCI_42", "DCI_43", "DCI_44",
"DCI_45", "DCI_46", "DCI_47", "DCI_48", "DCI_49", "DCI_5", "DCI_50",
"DCI_51", "DCI_52", "DCI_53", "DCI_54", "DCI_55", "DCI_56", "DCI_57",
"DCI_58", "DCI_59", "DCI_6", "DCI_60", "DCI_61", "DCI_62", "DCI_63",
"DCI_64", "DCI_65", "DCI_66", "DCI_67", "DCI_68", "DCI_69", "DCI_7",
"DCI_70", "DCI_71", "DCI_72", "DCI_73", "DCI_74", "DCI_75", "DCI_76",
"DCI_77", "DCI_78", "DCI_79", "DCI_8", "DCI_80", "DCI_81", "DCI_82",
"DCI_83", "DCI_84", "DCI_85", "DCI_86", "DCI_87", "DCI_88", "DCI_89",
"DCI_9", "DCI_90", "DCI_91", "DCI_92", "DCI_93", "DCI_94", "DCI_95",
"DCI_96", "DCI_97", "DCI_98", "DCI_99"), class = "factor"), Dup_Code = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "Sample 1", class = "factor"), Dup_Code.1 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "Sample 2", class = "factor"), Ag_ppb1 = c(56L,
58L, 52L, 59L, 68L, 318L, 50L, 70L, 398L, 114L, 38L, 52L, 63L,
64L, 65L, 81L, 66L, 62L, 86L, 146L, 67L, 70L, 49L, 69L, 74L,
55L, 55L, 47L, 109L, 41L, 78L, 115L, 65L, 373L, 59L, 47L, 85L,
72L, 86L, 72L, 77L, 554L, 68L, 85L, 105L, 70L, 67L, 127L, 69L,
67L, 38L, 59L, 284L, 94L, 57L, NA, 92L, 88L, 74L, 73L, 50L, NA,
63L, 57L, 111L, 71L, 47L, 69L, 81L, 45L, 52L, 42L, 34L, 176L,
73L, 140L, 87L, 41L, 36L, 204L, 272L, 52L, 37L, 45L, 187L, 180L,
100L, 60L, 39L, 71L, 92L, 29L, 308L, 157L, 78L, 91L, NA, 60L,
217L), As_ppm1 = c(4.3, 4.8, 4.6, 5, 1.9, 14.3, 3, 5.8, 49.7,
9.2, 3.8, 3.1, 5.9, 5.4, 5, 4.3, 5.3, 4.2, 3.8, 35, 5.8, 6.6,
3.3, 11.2, 3.5, 3.8, 3.8, 4.4, 8.8, 4.9, 3.6, 18.3, 3.6, 6.1,
4.2, 4.4, 9, 7.3, 3.7, 3.4, 13.7, 21.9, 3.9, 5.8, 3.6, 4.4, 2.9,
5.2, 4.9, 5.4, 4.4, 4.3, 5.5, 8.3, 3.4, NA, 6.2, 4.2, 3.5, 5.5,
5, NA, 3.4, 4.2, 7.1, 5.1, 3.8, 6.9, 6.7, 3.2, 4.8, 4.3, 2.6,
4.6, 4.8, 9.3, 7.5, 2.8, 4.2, 4.9, 17, 3.1, 3.9, 4.7, 9.7, 883.2,
7.8, 5.1, 2.4, 10.4, 7.2, 2.9, 6.7, 9.3, 3.7, 7.3, NA, 4.8, 21.5
), Au_ppb1 = c(0.7, 4.6, 1.5, 0.6, 11.9, 2.4, 0.8, 0.8, 2.2,
3.5, 0.4, 0.8, 0.9, 1.7, 1.2, 3.5, 1.4, 1.4, 2.2, 2.6, 3, 0.9,
0.6, 1.5, 0.9, 0.7, 1.4, 3.5, 8.7, 0.4, 0.6, 2.4, 1.1, 1.7, 1.5,
1.3, 0.1, 0.1, 4.5, 44.5, 0.8, 6.6, 48.7, 1.5, 0.7, 0.3, 0.8,
1.1, 1.2, 5.5, 1.4, 1.4, 2.7, 1.9, 1, NA, 0.4, 1, 1.6, 0.3, 0.4,
NA, 0.8, 1.8, 1.9, 0.1, 0.5, 1.4, 0.8, 0.2, 0.8, 0.6, 0.3, 1.1,
1, 2.1, 0.8, 0.4, 0.9, 0.9, 1.2, 1.2, 1.2, 1.3, 1.2, 1.6, 1.8,
0.5, 1.4, 1.3, 1.4, 0.1, 0.6, 1.9, 0.8, 1.5, NA, 0.6, 3.4), B_ppm1 = c(10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 21L, NA, 10L, 10L, 10L, 10L, 10L, NA, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, NA, 10L, 10L), Ba_ppm1 = c(141, 124.2, 171.9,
171, 246.8, 359.3, 96, 205.4, 187.4, 195.3, 115.2, 134.9, 162.9,
156.9, 186.7, 148.4, 164.9, 165.5, 329.1, 106.8, 137.3, 150.7,
180.9, 123.4, 150.6, 122.7, 230.4, 176.1, 208.9, 154.5, 147.2,
242.2, 184.2, 465.5, 217.2, 171.3, 286.6, 248, 243.1, 265.9,
273.3, 317.4, 150.7, 272.7, 332.1, 293.1, 185.7, 262.9, 203.4,
333, 185.2, 203.4, 300.8, 227.3, 193.2, NA, 328, 293.2, 225.7,
286.9, 237.6, NA, 193.5, 293.8, 294.5, 252.2, 160.5, 277, 349.2,
184.5, 231.3, 251.4, 150, 372.4, 237.7, 227.9, 271.8, 66.6, 92.8,
53.4, 112.5, 172.6, 188.5, 177, 315.5, 193.8, 300.2, 132.9, 199.4,
221.4, 375.6, 128.7, 82.7, 157.4, 175.5, 297.9, NA, 190.9, 206.4
), Be_ppm1 = c(0.3, 0.5, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.6, 0.3,
0.4, 0.4, 0.3, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.3, 0.9, 0.4, 0.6,
0.3, 0.5, 0.3, 0.3, 0.2, 0.3, 0.3, 0.4, 0.6, 0.3, 0.2, 0.3, 0.3,
0.3, 0.2, 0.6, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.4,
0.2, 0.3, 0.3, 0.2, 0.3, 0.3, NA, 0.3, 0.05, 0.3, 0.3, 0.2, NA,
0.3, 0.5, 0.3, 0.5, 0.3, 0.3, 0.3, 0.3, 0.2, 0.3, 0.2, 0.4, 0.3,
0.5, 0.4, 0.2, 0.1, 1.8, 1.8, 0.4, 0.2, 0.2, 0.8, 35.9, 0.3,
0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.4, 0.3, 0.4, NA, 0.4, 1.2), Bi_ppm1 = c(0.24,
0.29, 0.21, 0.19, 0.13, 0.28, 0.15, 0.16, 0.73, 0.14, 0.12, 0.39,
0.1, 0.12, 0.4, 0.42, 0.13, 0.13, 0.11, 6.67, 0.14, 0.22, 0.15,
0.18, 0.09, 0.06, 0.09, 0.1, 0.18, 0.08, 0.08, 0.14, 0.06, 0.23,
0.1, 0.09, 0.08, 0.14, 0.13, 0.06, 0.08, 0.13, 0.08, 0.15, 0.11,
0.1, 0.07, 0.11, 0.1, 0.06, 0.11, 0.08, 0.11, 0.11, 0.08, NA,
0.12, 0.22, 0.1, 0.13, 0.08, NA, 0.06, 0.18, 0.13, 0.1, 0.16,
0.15, 0.13, 0.07, 0.09, 0.08, 0.06, 0.14, 0.07, 0.21, 0.17, 0.01,
0.05, 2.07, 0.35, 0.13, 0.08, 0.09, 0.23, 0.55, 0.17, 1.1, 0.06,
0.07, 0.14, 0.04, 0.06, 0.15, 0.08, 0.12, NA, 0.09, 0.97), Ca_pct1 = c(0.69,
0.58, 0.46, 0.46, 0.42, 0.41, 0.51, 0.5, 0.6, 0.83, 0.42, 0.34,
0.69, 0.98, 0.51, 0.43, 0.78, 0.44, 0.38, 0.56, 1.07, 0.46, 0.72,
0.77, 1.08, 0.64, 0.46, 0.57, 0.5, 0.5, 0.88, 0.65, 0.67, 0.28,
0.75, 0.59, 0.49, 0.72, 0.31, 0.42, 0.71, 0.14, 0.42, 0.69, 0.29,
0.39, 0.31, 0.94, 0.7, 0.47, 0.71, 0.38, 0.31, 0.5, 0.47, NA,
0.47, 0.37, 0.67, 0.68, 0.32, NA, 0.64, 0.31, 0.83, 0.52, 0.33,
0.71, 0.91, 0.49, 0.58, 0.35, 0.34, 0.5, 0.54, 0.92, 0.4, 3.74,
1.69, 0.21, 0.4, 0.45, 0.66, 0.49, 0.56, 0.88, 0.41, 0.41, 0.31,
0.53, 0.96, 1.13, 0.35, 0.58, 0.33, 0.56, NA, 0.68, 0.32), Cd_ppm1 = c(0.13,
0.22, 0.12, 0.15, 0.09, 0.99, 0.13, 0.19, 0.88, 0.34, 0.1, 0.15,
0.17, 0.16, 0.14, 0.2, 0.14, 0.11, 0.15, 0.2, 0.14, 0.17, 0.1,
0.17, 0.18, 0.13, 0.11, 0.13, 0.2, 0.12, 0.13, 0.27, 0.13, 0.37,
0.21, 0.12, 0.18, 0.08, 0.14, 0.11, 0.15, 0.41, 0.19, 0.3, 0.23,
0.15, 0.1, 0.34, 0.13, 0.13, 0.09, 0.15, 0.25, 0.17, 0.12, NA,
0.17, 0.22, 0.14, 0.21, 0.11, NA, 0.1, 0.16, 0.27, 0.19, 0.13,
0.22, 0.26, 0.05, 0.17, 0.15, 0.1, 0.39, 0.16, 0.47, 0.21, 0.17,
0.14, 0.59, 1.11, 0.12, 0.13, 0.1, 0.63, 0.47, 0.33, 0.2, 0.11,
0.26, 0.28, 0.11, 0.1, 0.55, 0.37, 0.29, NA, 0.18, 0.82), Ag_ppb2 = c(59L,
73L, 69L, 75L, 85L, 319L, 43L, 73L, 405L, 121L, 33L, 45L, 71L,
67L, 67L, 80L, 50L, 45L, 68L, 140L, 56L, 69L, 51L, 71L, 79L,
51L, 36L, 52L, 93L, 31L, 98L, 134L, 67L, 386L, 47L, 46L, 90L,
63L, 86L, 54L, 59L, 478L, 61L, 114L, 108L, 74L, 72L, 147L, 60L,
74L, 40L, 56L, 256L, 112L, 62L, 87L, 71L, 104L, 109L, 55L, 45L,
84L, 69L, 63L, 107L, 70L, 57L, 73L, 100L, 45L, 43L, 36L, 39L,
161L, 108L, 100L, 93L, 32L, 45L, 187L, 267L, 68L, 37L, 57L, 228L,
74L, 69L, 47L, 65L, 101L, 33L, 32L, 139L, 77L, 78L, NA, 59L,
214L, 410L), As_ppm2 = c(3.9, 3.8, 4.4, 5.4, 1.7, 14.4, 3.1,
5.9, 52.3, 9.7, 3.5, 2.7, 6.7, 5.2, 5, 4.3, 4.8, 4, 3.9, 31.9,
5.3, 6.5, 3.6, 10.4, 3.5, 3.9, 3.6, 4.3, 8.9, 5.3, 3.8, 16.7,
3.7, 6.1, 3.7, 4, 9.6, 6.4, 4, 3.1, 13.2, 22.1, 4.3, 6.9, 3.6,
4.9, 3.4, 4.8, 4.1, 4.8, 4.2, 3.8, 5.3, 9.2, 3.3, 12.5, 5.3,
4.4, 4.8, 5.7, 5, 5.5, 3.4, 4.4, 6.5, 4.8, 4, 6.5, 6.2, 3.4,
4.5, 3.8, 2.6, 4.7, 8, 8.5, 7.6, 2.6, 4.7, 5.2, 15.8, 4, 3.1,
5.3, 343.7, 7.4, 5.1, 3, 11, 7.3, 3, 6.8, 21.1, 4.1, 9.1, NA,
4.4, 21, 122.1), Au_ppb2 = c(0.9, 1.6, 0.1, 1.3, 0.7, 1.8, 0.6,
0.8, 1.6, 2.7, 0.4, 0.9, 0.9, 1.8, 1.5, 1.6, 1.5, 0.9, 2, 1.3,
0.3, 3, 0.8, 2.5, 1.5, 0.4, 1.2, 1.4, 1, 1.1, 0.4, 113.3, 0.6,
2.2, 1.9, 0.7, 0.5, 0.1, 1.8, 0.9, 1.4, 4.3, 1.6, 0.8, 0.7, 0.9,
0.6, 2.4, 5.6, 1.2, 0.9, 1.1, 2.1, 1.1, 0.9, 0.8, 0.9, 1, 4,
0.3, 1.5, 0.5, 1.2, 1, 1.5, 0.1, 1.2, 19.8, 32.8, 0.1, 0.7, 0.7,
1, 0.5, 2.3, 1.6, 1.6, 0.6, 0.9, 1.7, 1.9, 1.3, 1.1, 1.1, 0.9,
4.8, 0.5, 0.4, 1.6, 1, 0.1, 0.9, 1.3, 0.8, 2.7, NA, 0.8, 4, 3.6
), B_ppm2 = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 22L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 23L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, NA, 10L, 10L, 10L), Ba_ppm2 = c(137.5,
128, 175, 205.6, 262.7, 356.1, 91.2, 212.8, 207, 217.4, 111,
132.4, 179.4, 139.8, 188.9, 164.4, 136, 158.7, 348.9, 96.6, 141.3,
143.7, 187, 121.2, 166.9, 131, 235.9, 189.5, 201.4, 158.7, 148.3,
227, 190, 415.9, 197.2, 178, 268, 221.1, 251.5, 243.3, 260.4,
310, 165.8, 308.2, 342.8, 317, 185, 241.7, 189.2, 291.4, 199.4,
214.7, 312.2, 273, 197.8, 265, 255, 315.2, 281.7, 326, 236.5,
229.7, 197.8, 308.4, 277.2, 258.7, 185.7, 261.2, 354.7, 177.7,
213.2, 226.7, 159.2, 369.5, 359.1, 224.9, 275.4, 54, 106.7, 53.4,
100.9, 194.7, 188.4, 187.4, 162.9, 237.7, 146.9, 189, 214.9,
368.1, 134.8, 82.4, 130.4, 187.8, 291.2, NA, 171.9, 209.5, 318.5
), Be_ppm2 = c(0.2, 0.3, 0.4, 0.3, 0.3, 0.4, 0.1, 0.3, 0.6, 0.4,
0.4, 0.5, 0.4, 0.3, 0.5, 0.7, 0.3, 0.3, 0.2, 0.4, 0.7, 0.4, 0.4,
0.3, 0.4, 0.2, 0.3, 0.3, 0.5, 0.6, 0.5, 0.4, 0.3, 0.3, 0.3, 0.2,
0.2, 0.2, 0.5, 0.2, 0.3, 0.3, 0.2, 0.4, 0.3, 0.2, 0.2, 0.2, 0.3,
0.2, 0.3, 0.2, 0.3, 0.5, 0.3, 0.4, 0.3, 0.3, 0.2, 0.3, 0.1, 0.5,
0.2, 0.6, 0.3, 0.4, 0.4, 0.2, 0.4, 0.3, 0.3, 0.2, 0.2, 0.3, 0.5,
0.3, 0.3, 0.2, 0.2, 1.6, 1.8, 0.5, 0.2, 0.6, 33.1, 0.1, 0.6,
0.05, 0.2, 0.3, 0.7, 0.2, 1.5, 0.3, 0.3, NA, 0.3, 1.2, 1.4),
Bi_ppm2 = c(0.23, 0.28, 0.23, 0.21, 0.12, 0.26, 0.14, 0.16,
0.69, 0.16, 0.12, 0.34, 0.11, 0.11, 0.41, 0.36, 0.12, 0.11,
0.11, 2.86, 0.14, 0.23, 0.19, 0.18, 0.1, 0.05, 0.08, 0.11,
0.15, 0.08, 0.09, 0.15, 0.06, 0.24, 0.08, 0.09, 0.09, 0.12,
0.14, 0.07, 0.07, 0.12, 0.09, 0.18, 0.1, 0.1, 0.09, 0.09,
0.11, 0.06, 0.1, 0.07, 0.1, 0.12, 0.08, 0.09, 0.1, 0.2, 0.09,
0.1, 0.09, 0.17, 0.06, 0.15, 0.12, 0.1, 0.17, 0.13, 0.12,
0.05, 0.08, 0.08, 0.07, 0.17, 0.12, 0.21, 0.17, 0.01, 0.05,
1.93, 0.33, 0.15, 0.05, 0.08, 0.68, 0.12, 0.3, 0.06, 0.06,
0.14, 0.05, 0.08, 0.4, 0.09, 0.12, NA, 0.07, 0.98, 2.21),
Ca_pct2 = c(0.6, 0.56, 0.48, 0.53, 0.4, 0.41, 0.47, 0.51,
0.58, 0.86, 0.41, 0.33, 0.7, 0.9, 0.51, 0.45, 0.67, 0.44,
0.39, 0.56, 1.05, 0.48, 1.21, 0.83, 1.1, 0.66, 0.45, 0.62,
0.5, 0.47, 1.04, 0.66, 0.64, 0.3, 0.74, 0.58, 0.49, 0.65,
0.31, 0.42, 0.62, 0.13, 0.42, 0.84, 0.29, 0.4, 0.32, 1.01,
0.6, 0.46, 0.71, 0.41, 0.3, 0.58, 0.5, 1.02, 0.4, 0.39, 0.87,
0.79, 0.34, 0.44, 0.67, 0.31, 0.79, 0.47, 0.33, 0.67, 0.86,
0.5, 0.49, 0.29, 0.35, 0.5, 0.87, 0.8, 0.39, 3.36, 1.78,
0.22, 0.36, 0.5, 0.57, 0.53, 0.58, 0.37, 0.43, 0.3, 0.46,
1.03, 1.12, 0.36, 0.48, 0.38, 0.52, NA, 0.52, 0.33, 1.21),
Cd_ppm2 = c(0.13, 0.19, 0.12, 0.15, 0.1, 0.97, 0.1, 0.21,
0.92, 0.35, 0.1, 0.09, 0.16, 0.18, 0.16, 0.17, 0.11, 0.11,
0.2, 0.16, 0.11, 0.16, 0.13, 0.17, 0.2, 0.13, 0.14, 0.15,
0.25, 0.05, 0.18, 0.28, 0.09, 0.3, 0.22, 0.09, 0.18, 0.12,
0.1, 0.1, 0.15, 0.3, 0.17, 0.33, 0.2, 0.15, 0.1, 0.59, 0.16,
0.16, 0.1, 0.13, 0.24, 0.21, 0.11, 0.46, 0.12, 0.24, 0.23,
0.17, 0.11, 0.22, 0.13, 0.18, 0.24, 0.16, 0.17, 0.18, 0.23,
0.09, 0.12, 0.1, 0.1, 0.35, 0.37, 0.43, 0.24, 0.16, 0.17,
0.62, 1, 0.13, 0.12, 0.11, 0.56, 0.23, 0.22, 0.15, 0.23,
0.28, 0.12, 0.1, 0.97, 0.36, 0.3, NA, 0.19, 0.89, 3.59)), class = "data.frame", row.names = c(NA,
-99L))
Consider Map (a wrapper around mapply), which iterates elementwise over equal-length vectors and collects the results in a list. This avoids the extraneous iterations of the nested for-loop approach.
library(ggplot2)
# EXTRACT NEEDED NAMES
samples1 <- names(YGS_Dupes)[grep("1$", names(YGS_Dupes))][-1] # -1 TO REMOVE Dup_Code.1
samples2 <- names(YGS_Dupes)[grep("2$", names(YGS_Dupes))]
# SET UP LOOPING FUNCTION
plot_fct <- function(s1, s2) {
s_title <- sub("1$", "", s1) # element name without the trailing duplicate number
p <- ggplot(YGS_Dupes, aes(x = .data[[s1]], y = .data[[s2]])) + geom_point(color="#0072B2") +
ggtitle(paste(s_title, "Duplicate Comparison", sep=" - ")) +
theme(plot.title = element_text(hjust = 0.5), legend.position="top",
axis.text.x = element_text(angle = 90, hjust = 1, vjust=0.5))
ggsave(paste0(s_title,".png"), plot = p)
return(p)
}
# BUILD LIST LOOPING ELEMENTWISE
plot_list2 <- Map(plot_fct, samples1, samples2)
# OUTPUT PLOTS BY NAME
plot_list2$Ag_ppb1
plot_list2$As_ppm1
plot_list2$Au_ppb1
Output (first three plots)
As a general approach to plotting scatterplots in a for loop, you can use the following steps.
Step 1: Create a plotting function
In the following code, I provide my data frame and the x-axis target variable as defaults. For the y-axis variable I pass the column number, so that the function can be called from a for loop later.
library(ggplot2)
sct_plot_function <- function(dataset = car.c2.num, target_x = "price", target_y_num){
# look up columns by name in `dataset` instead of passing whole vectors into aes()
y_name <- colnames(dataset)[target_y_num]
ggplot(dataset, aes(x = .data[[target_x]], y = .data[[y_name]])) +
geom_point() +
geom_smooth(level = 0.95) +
theme_bw() +
labs(title = paste("Scatter plot of Price vs", y_name), y = y_name, x = "Price") +
theme(plot.title = element_text(hjust = 0.5))
}
Step 2: Use a for loop to draw multiple scatter plots in one go.
dim(car.c2.num)[2] - 1 gives the number of columns minus one, and the loop runs over i in 1:(dim(car.c2.num)[2] - 1).
I leave out the last column because that 14th variable (price) is my target, fixed on the x-axis.
for(i in 1:(dim(car.c2.num)[2] - 1) ){
plot(sct_plot_function(target_y_num = i))
}
You can use this as a basic structure and adapt it to your own x and y variables.
A nested for loop works if you want to plot every combination of x and y variables.
Sample image: scatter plot of price vs compression_ratio for the UCI Automobile dataset.
Try this:
for (i in seq_along(colNames_scatter_dup)) {
print(ggplot(YGS_Dup_Scatter, mapping = aes(x = .data[[colNames_scatter_dup[i]]], y = .data[[colNames_scatter_dup2[i]]])) + geom_point())
}
OK, Parfait, thank you for helping; discussing your answer with a colleague got me where I needed to be.
The final result was the following:
library(ggplot2)
library(ggthemes) # for theme_calc()
YGS_Dup_Scatter = read.csv(file.choose(), header=TRUE, sep=",")
colNames_scatter_dup <- names(YGS_Dup_Scatter)[4:56]
colNames_scatter_dup2 <- names(YGS_Dup_Scatter)[57:109]
for (j in 1:length(colNames_scatter_dup)) {
plt <- ggplot(YGS_Dup_Scatter, mapping = aes_string(x=colNames_scatter_dup[j], y =colNames_scatter_dup2[j])) +
geom_point() + theme_calc() +
ggtitle(paste(colNames_scatter_dup[j], "Duplicate Comparison", sep=" - ")) +
theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
axis.text.x = element_text(face = "bold", size = 14),
axis.text.y = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 14),
plot.margin = margin(10, 30, 2, 2),
plot.background = element_rect(fill = "lightskyblue2"))
print(plt)
ggsave(paste0(colNames_scatter_dup[j], ".png"), plot = plt) # one file per element pair
Sys.sleep(2)
}
The key was using the length function and ordering my columns as A1, A2, ..., A53, then B1, B2, ..., and so on.
Because both name vectors have the same length and order, indexing them with the same j keeps each pair together.
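Positional pairing works as long as both name vectors are in the same order. A sketch of a sturdier variant, assuming every first-duplicate column ends in 1 and its partner has the same name ending in 2 (as in the dput data): derive the partner names with sub() and drop any column without a partner, such as Dup_Code.1. The toy data frame and the helper make_dup_plot() are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data following the question's column naming
dupes <- data.frame(Dup_Code.1 = "Sample 2",
                    Ag_ppb1 = c(56, 58, 52), As_ppm1 = c(4.3, 4.8, 4.6),
                    Ag_ppb2 = c(59, 73, 69), As_ppm2 = c(3.9, 3.8, 4.4))

# First-duplicate columns end in "1"; keep only those whose "2" partner exists
first  <- grep("1$", names(dupes), value = TRUE)
first  <- first[sub("1$", "2", first) %in% names(dupes)]
second <- sub("1$", "2", first)

# One plot per pair; each function call keeps its own copy of the column names
make_dup_plot <- function(col1, col2) {
  force(col2)
  element <- sub("1$", "", col1)
  ggplot(dupes, aes(x = .data[[col1]], y = .data[[col2]])) +
    geom_point() +
    ggtitle(paste(element, "Duplicate Comparison", sep = " - "))
}

plots <- Map(make_dup_plot, first, second)
names(plots) <- sub("1$", "", first)
# To write files: for (nm in names(plots)) ggsave(paste0(nm, ".png"), plot = plots[[nm]])
```

Building each plot inside a function also sidesteps a lazy-evaluation pitfall: aes() expressions that refer to a for-loop index are evaluated when the plot is drawn, so plots stored in a list inside a plain for loop can all end up showing the last pair.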
Thanks for the help everyone!

How to make a better surface 3dplot with discontinuities in data in R?

I would like to make a 3D surface plot of the data below in R. I have already made some figures, but the data contain discontinuities that affect the plots. Do you have any idea how to split it into two surfaces or otherwise make it look nice? Thank you! :)
library(scatterplot3d)
library(rgl)
library(car)
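x <- b[1:11, 2] # the 11 distinct c values (column 2 of b)
y <- seq(-1, 1, length.out = 21) # the 21 h values
z <- matrix(data = b$A, nrow = 11, ncol = 21) # rows = c, columns = h (b is sorted by h, then c)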
persp(x,y,z,theta=30, phi=30, col="lightblue",expand = 0.5,shade = 0.2,xlab="c", ylab="h", zlab="A")
scatterplot3d(b$c,b$h,b$A)
(Figures: surface plot and scatter plot.)
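One possible way to get two surfaces, sketched under assumptions: split the A matrix by sign, masking the other part with NA, and draw both parts in the same rgl scene, where NA cells are left as holes. The helper split_by_sign() and the toy grid are hypothetical. With the question's matrix z, isolated values of the opposite sign (for example A = 0.76 at h = -0.5, c = 1) would only leave small holes rather than a clean split, so some cleanup may still be needed.

```r
# Hypothetical helper: split one surface into two by the sign of z,
# masking the other part with NA so each surface can be drawn on its own
split_by_sign <- function(z) {
  upper <- z
  upper[z < 0] <- NA   # keep non-negative values only
  lower <- z
  lower[z >= 0] <- NA  # keep negative values only
  list(upper = upper, lower = lower)
}

# Hypothetical toy grid with a jump from about -1 to about +1 at h = 0
x <- seq(0, 1, length.out = 11)   # c values
y <- seq(-1, 1, length.out = 21)  # h values
z <- outer(x, y, function(cc, hh) ifelse(hh < 0, -1 + 0.2 * cc, 1 - 0.2 * cc))
parts <- split_by_sign(z)

# Draw both surfaces in one rgl scene (only in an interactive session with rgl installed)
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::persp3d(x, y, parts$lower, col = "steelblue", zlim = c(-1, 1),
               xlab = "c", ylab = "h", zlab = "A")
  rgl::persp3d(x, y, parts$upper, col = "tomato", add = TRUE)
}
```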
structure(list(h = c(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9,
-0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8,
-0.8, -0.7, -0.7, -0.7, -0.7, -0.7, -0.7, -0.7, -0.7, -0.7, -0.7,
-0.7, -0.6, -0.6, -0.6, -0.6, -0.6, -0.6, -0.6, -0.6, -0.6, -0.6,
-0.6, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5,
-0.5, -0.4, -0.4, -0.4, -0.4, -0.4, -0.4, -0.4, -0.4, -0.4, -0.4,
-0.4, -0.3, -0.3, -0.3, -0.3, -0.3, -0.3, -0.3, -0.3, -0.3, -0.3,
-0.3, -0.2, -0.2, -0.2, -0.2, -0.2, -0.2, -0.2, -0.2, -0.2, -0.2,
-0.2, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1,
-0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0.1, 0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2,
0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,
0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6,
0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.7, 0.7, 0.7, 0.7,
0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8,
0.8, 0.8, 0.8, 0.8, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9,
0.9, 0.9, 0.9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), c = c(0, 0.1,
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3,
0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1,
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3,
0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1,
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3,
0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1,
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3,
0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1), A = c(-0.75, -0.81, -0.87, -0.92, -0.915,
-0.92, -0.995, -0.985, -0.985, -0.99, -0.99, -0.72, -0.76, -0.82,
-0.9, -0.92, -0.97, -0.975, -0.98, -0.99, -1, -1, -0.685, -0.775,
-0.8, -0.87, -0.89, -0.965, -0.945, -0.98, -0.975, -0.995, -1,
-0.6, -0.695, -0.735, -0.76, -0.89, -0.89, -0.94, -0.93, -0.995,
-0.99, -0.99, -0.52, -0.595, -0.745, -0.785, -0.885, -0.945,
-0.935, -0.975, -0.98, -1, -0.99, -0.47, -0.525, -0.645, -0.76,
-0.785, -0.845, -0.91, -0.925, -0.965, -0.985, 0.76, -0.405,
-0.42, -0.53, -0.68, -0.775, -0.86, -0.915, -0.945, -0.975, -0.97,
-1, -0.21, -0.445, -0.465, -0.615, -0.665, -0.785, -0.875, -0.895,
-0.915, -0.97, -0.98, -0.175, -0.415, -0.365, -0.355, -0.655,
-0.68, -0.815, -0.86, -0.955, -0.945, 0.91, -0.155, -0.16, -0.145,
-0.215, -0.36, -0.675, -0.745, 0.76, 0.835, -0.93, 0.935, 0.105,
-0.055, 0, 0.03, -0.02, -0.075, -0.58, 0.805, -0.845, 0.95, -0.965,
0.085, 0.19, 0.135, 0.22, 0.42, 0.555, 0.755, 0.875, -0.87, 0.93,
-0.955, 0.265, 0.345, 0.29, 0.535, 0.535, 0.69, 0.855, 0.875,
0.915, 0.97, 0.975, 0.31, 0.37, 0.46, 0.595, 0.725, 0.825, 0.875,
0.92, 0.94, -0.82, 0.98, 0.415, 0.46, 0.495, 0.66, 0.745, 0.81,
0.92, 0.955, 0.955, 0.975, 0.985, 0.405, 0.575, 0.58, 0.7, 0.775,
0.89, 0.91, 0.955, 0.98, 0.985, 0.985, 0.515, 0.61, 0.73, 0.795,
0.865, 0.925, 0.935, 0.955, 0.95, 0.965, 0.995, 0.625, 0.7, 0.745,
0.845, 0.89, 0.94, 0.94, 0.96, 0.97, 0.98, 0.985, 0.6, 0.72,
0.745, 0.835, 0.925, 0.91, 0.94, 0.99, 0.98, 0.98, 0.99, 0.8,
0.705, 0.82, 0.91, 0.93, 0.945, 0.985, 0.98, 0.99, 0.975, 0.99,
0.805, 0.815, 0.885, 0.91, 0.935, 0.945, 0.98, 0.98, 0.985, 0.995,
1)), .Names = c("h", "c", "A"), row.names = c(15L, 23L, 43L,
63L, 82L, 104L, 126L, 148L, 168L, 192L, 212L, 4L, 34L, 46L, 66L,
87L, 114L, 128L, 147L, 169L, 190L, 211L, 1L, 22L, 47L, 71L, 86L,
107L, 129L, 149L, 174L, 195L, 213L, 7L, 24L, 42L, 65L, 88L, 108L,
131L, 151L, 171L, 191L, 214L, 2L, 25L, 45L, 64L, 93L, 109L, 132L,
152L, 172L, 193L, 217L, 5L, 27L, 44L, 69L, 89L, 113L, 130L, 150L,
175L, 194L, 216L, 6L, 26L, 48L, 68L, 91L, 110L, 135L, 154L, 173L,
196L, 215L, 3L, 29L, 55L, 67L, 99L, 112L, 133L, 153L, 176L, 198L,
218L, 9L, 28L, 53L, 72L, 94L, 111L, 139L, 155L, 178L, 197L, 219L,
8L, 30L, 50L, 74L, 90L, 116L, 134L, 157L, 184L, 199L, 220L, 10L,
32L, 54L, 73L, 92L, 115L, 136L, 158L, 177L, 202L, 221L, 14L,
31L, 51L, 76L, 95L, 117L, 137L, 159L, 182L, 200L, 223L, 17L,
37L, 52L, 75L, 96L, 118L, 141L, 161L, 179L, 201L, 222L, 12L,
35L, 56L, 77L, 97L, 119L, 140L, 162L, 180L, 207L, 224L, 13L,
33L, 70L, 84L, 98L, 120L, 138L, 160L, 183L, 203L, 226L, 11L,
36L, 57L, 78L, 100L, 121L, 143L, 163L, 185L, 205L, 225L, 16L,
38L, 59L, 79L, 101L, 127L, 142L, 170L, 181L, 204L, 227L, 19L,
39L, 58L, 81L, 102L, 122L, 144L, 165L, 186L, 206L, 228L, 18L,
40L, 61L, 80L, 103L, 123L, 145L, 164L, 187L, 208L, 229L, 20L,
41L, 60L, 85L, 105L, 125L, 146L, 166L, 188L, 209L, 230L, 21L,
49L, 62L, 83L, 106L, 124L, 156L, 167L, 189L, 210L, 231L), class = "data.frame")
First of all, you are not really making a wireframe; use wireframe() from the lattice package:
library(lattice)
wireframe(A ~ c + h, drape = TRUE, pretty = TRUE, data = df)
This is the result with your data
You still have some wonky data points in between, but this should fill in the empty spaces.
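A wireframe surface is drawn over a grid of (h, c) values, and holes can appear where a grid cell has no observation. One way to check for such gaps is to count observations per cell. The sketch below is a minimal base-R illustration on a hypothetical toy grid (the names grid, counts and gaps are made up here, not taken from the question's data):

```r
# Hypothetical toy grid standing in for the question's h/c/A data
grid <- expand.grid(h = seq(-1, 1, by = 0.5), c = seq(0, 1, by = 0.5))
grid$A <- with(grid, h * (1 - c))

# Drop one cell (h = 0, c = 0) to simulate a gap in the surface
grid <- grid[-3, ]

# Count observations per (h, c) cell; a 0 marks a hole in the grid
counts <- table(grid$h, grid$c)
missing_cells <- which(counts == 0, arr.ind = TRUE)

# List the (h, c) combinations that have no value
gaps <- data.frame(h = rownames(counts)[missing_cells[, 1]],
                   c = colnames(counts)[missing_cells[, 2]])
gaps
```

With the real data, an empty result suggests every (h, c) combination is present once, so remaining oddities come from the A values themselves rather than from missing cells.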

Summary of Dataset using lapply

This is a novice question, but I am finding it very difficult to understand how to use lapply correctly, especially when the ID used is not numeric.
There are possibly better methods for finding the summary I have in mind, but for now I'm trying to use lapply. Essentially, I have a large df with 17 columns. Two of the columns are ID and Date. Not all IDs have a recorded value in every column. What I am interested in is finding the total number of rows available for each column, and the number of unique IDs that exist for that column. I have included a dput example that makes things clearer. For example, Var8 has only 6 rows of data available, and as a result it has 6 unique IDs. Var15, on the other hand, has 20 rows and 12 unique IDs. But I want to know this for all 15 variables, Var1 to Var15. I can do this manually for a single column using
Var8=df[!(is.na(df$Var8)),]
length(Var8$ID)
length(unique(Var8$ID))
remove(Var8)
But trying to automate this:
lapply(COL.NAMES, function(x){
temp=df[!(is.na(df$paste(x))),]
rows=length(temp$ID)
num_comp=length(unique(temp$ID))
return(rows)
return(num_comp)
remove(temp)
})
fails with the error: attempt to apply non-function.
COL.NAMES<-c("Var1","Var2","Var3","Var4","Var5","Var6","Var7","Var8","Var9","Var10","Var11","Var12","Var13","Var14","Var15")
df <- structure(list(ID = structure(c(1L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 2L, 3L, 4L, 1L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("Comp1",
"Comp10", "Comp11", "Comp12", "Comp2", "Comp3", "Comp4", "Comp5",
"Comp6", "Comp7", "Comp8", "Comp9"), class = "factor"), Date = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("0/1/2014", "0/1/2015"), class = "factor"),
Var1 = c(0.57, 0.34, 0.38, 0.93, 0.54, 0.17, 0.08, 0.28,
0.99, 1, 0.61, 0.73, 0.15, 0.09, 0.64, 0.3, 0.12, 0.79, 0.79,
0.15), Var2 = c(0.7, 0.77, 0.93, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.46, 0.26), Var3 = c(0.65,
0.7, 0.83, 0.7, 0.43, 0.81, 0.21, 0.44, 0.25, 0.77, 0.24,
0.29, 0.87, 0.42, 1, NA, NA, NA, NA, 0.79), Var4 = c(1, 0.7,
0.69, NA, NA, NA, NA, 0.2, 0.61, 0.89, 0.45, 0.02, 0.97,
0.33, 0.34, 0.81, 0.99, 0.35, 0.48, 0.33), Var5 = c(0.47,
0.95, 0.38, 0.69, 0.84, 0.21, 0.62, 0.59, 0.45, 0.63, 0.18,
0.49, NA, NA, NA, NA, 0.17, 0.15, 0.6, 0.44), Var6 = c(NA,
NA, NA, NA, 0.24, 0.07, 0.75, 0.24, 0.82, 0.14, 0.86, 0.63,
0.82, 0.92, 0.55, 0.22, 0.87, 0.69, 0.64, 0.73), Var7 = c(0.2,
0.11, 0.82, 0.31, 0.97, NA, NA, NA, NA, 0.83, 0.84, 0.81,
0.72, 0.36, 0.09, 0.15, 0.46, 0.79, 0.75, 0.39), Var8 = c(0.28,
0.55, NA, NA, NA, NA, 0.56, 0.89, 0.92, 0.46, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), Var9 = c(0.11, 0.36, 1, 0.44,
0.53, 0.6, 0.24, 0.56, 0.6, 0.55, 0.55, 0.05, 0.77, 0.9,
NA, NA, NA, NA, 0.4, 0.33), Var10 = c(0.74, 0.13, 0.09, 0.61,
NA, NA, NA, NA, 0.27, 0.71, 0.56, 0.3, 0.36, 0.44, 0.78,
0.9, 0.46, 0.49, 0.87, 0.36), Var11 = c(0.58, 0.99, 0.07,
0.83, 0.45, 0.07, 0.16, 0.43, 0.34, 0.31, 0.06, 0.67, 0.02,
0.52, 0.19, 0.49, 0.31, 0.02, 0.62, 0.21), Var12 = c(0.93,
0.26, 0.77, 0.8, 0.67, 0.83, 0.12, 0.39, 0.78, 0.75, 0.44,
NA, NA, NA, NA, 0.42, 0.49, 0.06, 0.8, 0.54), Var13 = c(0.44,
0.75, NA, NA, NA, NA, 0.58, 0.3, 0.47, 0.88, 0.36, 0.21,
0.87, 0.33, 0.12, 0.31, 0.95, 0.59, 0.18, 0.43), Var14 = c(0.55,
0.03, 0.37, 0.66, NA, 0.91, 0.78, 0.84, 0.96, 0.34, 0.25,
0.92, 0.71, 0.41, 0.23, 0.54, 0.8, 0.87, 0.3, 0.37), Var15 = c(0.71,
0.66, 0.01, 0.7, 0.4, 0.04, 0.3, 1, 0.59, 0.69, 0.88, 0.28,
0.44, 0.51, 0.2, 0.17, 0.6, 0.11, 0.85, 0.04)), .Names = c("ID",
"Date", "Var1", "Var2", "Var3", "Var4", "Var5", "Var6", "Var7",
"Var8", "Var9", "Var10", "Var11", "Var12", "Var13", "Var14",
"Var15"), class = "data.frame", row.names = c(NA, -20L))
I would advise getting familiar with data wrangling using dplyr. The magrittr pipe %>% that it provides makes chains of lapply/sapply calls easier to read and reason about.
Here's how I would change your function:
library(dplyr)
# loop over the column names and extract 15 data.frames, each with 2 columns (ID and the variable); drop rows with missing values
tmp <- lapply(COL.NAMES, function(x) df[, c("ID", x)] %>% na.omit)
rows <- sapply(tmp, nrow)
# extract the ID column from each of the 15 data.frames, keep the unique values, and count them
num_comp <- lapply(tmp, '[[', "ID") %>% lapply(., unique) %>% sapply(., length)
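For comparison, here is a minimal base-R sketch of the asker's original lapply with its two problems fixed, run on a small hypothetical data frame rather than the question's data. df$paste(x) is replaced by df[[x]], and both counts are returned in one vector, because a function stops at its first return() and the second one is never reached.

```r
# Hypothetical toy data standing in for the question's df
df <- data.frame(
  ID   = c("Comp1", "Comp2", "Comp1", "Comp3", "Comp2"),
  Var1 = c(0.5, NA, 0.3, 0.9, NA),
  Var2 = c(NA, NA, 0.1, NA, 0.4),
  stringsAsFactors = FALSE
)
COL.NAMES <- c("Var1", "Var2")

# df$paste(x) fails because `$` does not evaluate its argument: R looks up a
# column literally named "paste", gets NULL, and then tries to call NULL(x).
# `[[` evaluates x, so df[[x]] returns the column whose name is stored in x.
counts <- lapply(COL.NAMES, function(x) {
  temp <- df[!is.na(df[[x]]), ]
  # return both results together in one named vector
  c(rows = length(temp$ID), num_comp = length(unique(temp$ID)))
})
names(counts) <- COL.NAMES

# one row per variable, columns rows and num_comp
result <- do.call(rbind, counts)
result
```

Here Var1 has 3 non-missing rows from 2 distinct IDs, and Var2 has 2 rows from 2 distinct IDs.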
Another approach would be:
df1 <- data.frame(n_rows = colSums(!is.na(df[, -(1:2)])),
                  unique_IDs = sapply(df[, -(1:2)], function(i) length(unique(df$ID[!is.na(i)]))))
head(df1)
# n_rows unique_IDs
#Var1 20 12
#Var2 5 5
#Var3 16 12
#Var4 16 12
#Var5 16 12
#Var6 16 12
I am not sure I have understood correctly, but this could be your solution.
Here x is your data frame:
try1 <- function(df){
  temp <- sum(!is.na(df))      # number of non-NA entries
  temp2 <- length(unique(df))  # number of unique entries (NA, if present, counts as one)
  temp <- list("x" = temp, "y" = temp2)
  temp
}
lapply(x, try1)
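Note that the second element of try1 counts distinct values of the column itself rather than distinct IDs, and unique() keeps NA as a value of its own. A small sketch on a hypothetical vector v shows how the counts differ:

```r
# Hypothetical column with repeated and missing values
v <- c(0.2, NA, 0.2, 0.7, NA)

n_non_missing       <- sum(!is.na(v))              # entries that are not NA
n_unique_with_na    <- length(unique(v))           # unique(v) is c(0.2, NA, 0.7)
n_unique_without_na <- length(unique(na.omit(v)))  # NA dropped before unique()

c(n_non_missing, n_unique_with_na, n_unique_without_na)
```

To count IDs instead, the column has to be used to subset the ID column, as in length(unique(x$ID[!is.na(v)])).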
Here is a data.table solution:
library(data.table)
dd <- as.data.table(x)
COL.NAMES<-c("Var1","Var2","Var3","Var4","Var5","Var6","Var7","Var8","Var9","Var10","Var11","Var12","Var13","Var14","Var15")
dd[,lapply(.SD, try1),.SDcols=COL.NAMES]
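I didn't use lapply, but this solution does work:
find.uniques <- function(df){
  # one row per column of df, created once before the loop
  uniques <- data.frame(V1 = integer(ncol(df)), V2 = integer(ncol(df)),
                        row.names = names(df))
  for(i in seq_len(ncol(df))){
    keep <- !is.na(df[, i])                         # rows with a value in column i
    uniques[i, 1] <- sum(keep)                      # number of rows available
    uniques[i, 2] <- length(unique(df$ID[keep]))    # number of unique IDs among them
  }
  return(uniques)
}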
The result is a data.frame in which V1 is the number of rows available and V2 is the number of unique IDs for each column.
You can also return(as.data.frame(t(uniques))) instead, which swaps rows and columns so that each variable appears as a column of the result.

Resources