operating on pairs of elements in a data frame

I have two data frames, x and weights, in which columns are paired. Here are example data frames:
x = read.table(text = "
yr1 yr2 yr3 yr4
10 15 6 8
10 20 30 NA
NA 5 2 3
100 100 NA NA",
sep = "", header = TRUE)
weights = read.table(text = "
yr1 yr2 yr3 yr4
2 4 1 3
2 2 4 2
3 2 2 3
4 2 2 4",
sep = "", header = TRUE)
The columns yr1 and yr2 are one pair and the columns yr3 and yr4 are another pair. With my actual data the columns go up to yr100 and there are 50 pairs of columns.
If yr1 or yr2 is missing in x I want to fill the missing observation with, for example:
(5 / 2) * 3
Likewise for yr3 or yr4:
(30 / 4) * 2
where 5 (or 30) is the element in the column in x that is not missing for a given pair of elements. The values 2 and 3 for the first example (and the values 4 and 2 in the second example) are the corresponding elements in the weights data frame for a given pair of elements in the x data frame. If both elements in a pair are missing in x I want to leave them as missing.
Here is R code that does the above operations using nested for loops. However, there are 2000 or 3000 rows in my actual data set and the nested for loops have been running now for >10 hours.
for(i in 1: (ncol(x)/2)) {
for(j in 1: nrow(x)) {
if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 0)] = (x[j,(1 + ((i-1)*2 + 1))] / weights[j,(1 + ((i-1)*2 + 1))]) * weights[j,(1 + (i-1)*2 + 0)]
if(!is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 1)] = (x[j,(1 + ((i-1)*2 + 0))] / weights[j,(1 + ((i-1)*2 + 0))]) * weights[j,(1 + (i-1)*2 + 1)]
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 0)] = NA
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 1)] = NA
}
}
I have realized that the third and fourth if statements probably are not necessary. Perhaps the time to run this code will be reduced substantially if I simply remove those two if statements.
However, I also came up with the following alternative solution that uses reshape instead of nested for loops:
n.years <- 4
x2 <- reshape(x , direction="long", varying = list(seq(1,(n.years-1),2), seq(2,n.years,2)), v.names = c("yr1", "yr2"), times = c("t1", "t2"))
wt2 <- reshape(weights, direction="long", varying = list(seq(1,(n.years-1),2), seq(2,n.years,2)), v.names = c("yr1", "yr2"), times = c("t1", "t2"))
x2$yr1 <- ifelse(is.na(x2$yr1), (x2$yr2 / wt2$yr2) * wt2$yr1, x2$yr1)
x2$yr2 <- ifelse(is.na(x2$yr2), (x2$yr1 / wt2$yr1) * wt2$yr2, x2$yr2)
x3 <- reshape(x2, direction="wide", varying = list(seq(1,3,2), seq(2,4,2)), v.names = c("yr1", "yr2"), times = c("t1", "t2"))
x3
Before I shut the current R session down and try one of the above approaches please suggest possible alternatives that might be more efficient. I have used microbenchmark a little bit, but have not yet attempted to do so here, partially because writing a function for each possible solution is a little intimidating to me. I also tried coming up with a solution using the apply family of functions, but could not come up with one.
My reshape solution was derived from this question:
Reshaping a data frame with more than one measure variable
In addition to computation time I am also concerned about possible memory exhaustion.
I try hard to stick with base R, but will consider using other options to obtain desired output. Thank you for any suggestions.
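As an editorial illustration, the fill rule described in this question can also be sketched in base R without loops or reshaping. This is a minimal sketch, assuming the paired columns are adjacent (odd column first, even column second) and that weights has the same shape as x. The helper name fill_pairs is hypothetical and does not appear in the original post.

```r
# Example data from the question
x <- read.table(text = "
yr1 yr2 yr3 yr4
10 15 6 8
10 20 30 NA
NA 5 2 3
100 100 NA NA", header = TRUE)
weights <- read.table(text = "
yr1 yr2 yr3 yr4
2 4 1 3
2 2 4 2
3 2 2 3
4 2 2 4", header = TRUE)

# fill_pairs is a hypothetical helper name used only for this sketch
fill_pairs <- function(x, weights) {
  xm <- as.matrix(x)
  wm <- as.matrix(weights)
  odd  <- seq(1, ncol(xm), by = 2)  # first column of each pair
  even <- odd + 1                   # second column of each pair
  a  <- xm[, odd,  drop = FALSE]
  b  <- xm[, even, drop = FALSE]
  wa <- wm[, odd,  drop = FALSE]
  wb <- wm[, even, drop = FALSE]
  # Masks are computed before any filling, so pairs with both values
  # missing are never touched and stay NA
  fill_a <- is.na(a) & !is.na(b)
  fill_b <- is.na(b) & !is.na(a)
  a[fill_a] <- (b / wb * wa)[fill_a]
  b[fill_b] <- (a / wa * wb)[fill_b]
  xm[, odd]  <- a
  xm[, even] <- b
  as.data.frame(xm)
}

filled <- fill_pairs(x, weights)
filled
```

Because every operation works on whole matrices, the cost grows with the number of cells rather than with the number of single-cell data frame assignments, which is typically what makes the nested loops slow.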

Does this work for you?
Note that I didn't use your replacement function because I found it a bit confusing, so you will have to fix how you replace the yr1 and yr2 variables with your formula. Also, you'll probably want to reshape the result if you need to be able to attach it to your original dataframe.
newx <-
reshape(x, direction="long",varying=list(1:50*2-1,1:50*2), v.names=c("v1","v2"))
newwt <-
reshape(weights, direction="long",varying=list(1:50*2-1,1:50*2), v.names=c("w1","w2"))
condwtmean <- function(x,y,wtx,wty){
if(xor(is.na(x),is.na(y))){
if(is.na(x))
x <- y # replacement function
if(is.na(y))
y <- x # replacement function
return(weighted.mean(c(x,y),c(wtx,wty)))
}
else if(!is.na(x) & !is.na(y))
return(weighted.mean(c(x,y),c(wtx,wty)))
else
return(NA)
}
newx$wtmean <- mapply(condwtmean, newx$v1, newx$v2, newwt$w1, newwt$w2)

Thomas's answer is much better than any of the three approaches I tried. Here I compare the four approaches with microbenchmark. I have not yet tried Thomas's answer with the actual data. My original nested for-loops approach is still running after 22 hours.
Unit: milliseconds
             expr       min        lq   median       uq      max neval
 fn.1(x, weights)  98.69133  99.47574 100.5313 101.7315 108.8757    20
 fn.2(x, weights) 755.51583 758.12175 762.3775 776.0558 801.9615    20
 fn.3(x, weights) 564.21423 567.98822 568.5322 571.0975 575.1809    20
 fn.4(x, weights) 367.05862 370.52657 371.7439 373.7367 395.0423    20
#########################################################################################
# create data
set.seed(1234)
n.rows <- 40
n.cols <- 40
n.sample <- n.rows * n.cols
x <- sample(20, n.sample, replace=TRUE)
x.NA <- sample(n.rows*n.cols, 10*(n.sample / n.rows), replace=FALSE)
x[x.NA] <- NA
x <- as.data.frame(matrix(x, nrow = n.rows))
weights <- sample(4, n.sample, replace=TRUE)
weights <- as.data.frame(matrix(weights, nrow = n.rows))
weights
#########################################################################################
# Thomas's function
fn.1 <- function(x, weights){
newx <- reshape(x, direction="long", varying = list(seq(1,(n.cols-1),2), seq(2,n.cols,2)), v.names=c("v1", "v2"))
newwt <- reshape(weights, direction="long", varying = list(seq(1,(n.cols-1),2), seq(2,n.cols,2)), v.names=c("w1", "w2"))
condwtmean <- function(x,y,wtx,wty){
if(xor(is.na(x),is.na(y))){
if(is.na(x))
x <- (y / wty) * wtx # replacement function
if(is.na(y))
y <- (x / wtx) * wty # replacement function
return(weighted.mean(c(x,y),c(wtx,wty)))
}
else if(!is.na(x) & !is.na(y))
return(weighted.mean(c(x,y),c(wtx,wty)))
else
return(NA)
}
newx$wtmean <- mapply(condwtmean, newx$v1, newx$v2, newwt$w1, newwt$w2)
newx2 <- reshape(newx[,c(1,4:5)], v.names = "wtmean", timevar = "time", direction = "wide")
newx2 <- newx2[,2:(n.cols/2+1)]
names(newx2) <- paste('X', 1:(n.cols/2), sep = "")
return(newx2)
}
fn.1.output <- fn.1(x, weights)
#########################################################################################
# nested for-loops with 4 if statements
fn.2 <- function(x, weights){
for(i in 1: (ncol(x)/2)) {
for(j in 1: nrow(x)) {
if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 0)] = (x[j,(1 + ((i-1)*2 + 1))] / weights[j,(1 + ((i-1)*2 + 1))]) * weights[j,(1 + (i-1)*2 + 0)]
if(!is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 1)] = (x[j,(1 + ((i-1)*2 + 0))] / weights[j,(1 + ((i-1)*2 + 0))]) * weights[j,(1 + (i-1)*2 + 1)]
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 0)] = NA
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 1)] = NA
}
}
x.weights = x * weights
numerator <- sapply(seq(1,ncol(x.weights),2), function(i) {
apply(x.weights[,c(i, i+1)], 1, sum, na.rm=T)
})
denominator <- sapply(seq(1,ncol(weights),2), function(i) {
apply(weights[,c(i, i+1)], 1, sum, na.rm=T)
})
weighted.x <- numerator/denominator
for(i in 1: (ncol(x)/2)) {
for(j in 1: nrow(x) ) {
if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = sum(c(x[j,(1 + ((i-1)*2))], x[j,(1 + ((i-1)*2 + 1))]), na.rm = TRUE)
if(!is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = sum(c(x[j,(1 + ((i-1)*2))], x[j,(1 + ((i-1)*2 + 1))]), na.rm = TRUE)
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = NA
}
}
return(weighted.x)
}
fn.2.output <- fn.2(x, weights)
fn.2.output <- as.data.frame(fn.2.output)
names(fn.2.output) <- paste('X', 1:(n.cols/2), sep = "")
#########################################################################################
# nested for-loops with 2 if statements
fn.3 <- function(x, weights){
for(i in 1: (ncol(x)/2)) {
for(j in 1: nrow(x)) {
if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 0)] = (x[j,(1 + ((i-1)*2 + 1))] / weights[j,(1 + ((i-1)*2 + 1))]) * weights[j,(1 + (i-1)*2 + 0)]
if(!is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) x[j,(1 + (i-1)*2 + 1)] = (x[j,(1 + ((i-1)*2 + 0))] / weights[j,(1 + ((i-1)*2 + 0))]) * weights[j,(1 + (i-1)*2 + 1)]
}
}
x.weights = x * weights
numerator <- sapply(seq(1,ncol(x.weights),2), function(i) {
apply(x.weights[,c(i, i+1)], 1, sum, na.rm=T)
})
denominator <- sapply(seq(1,ncol(weights),2), function(i) {
apply(weights[,c(i, i+1)], 1, sum, na.rm=T)
})
weighted.x <- numerator/denominator
for(i in 1: (ncol(x)/2)) {
for(j in 1: nrow(x) ) {
if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = sum(c(x[j,(1 + ((i-1)*2))], x[j,(1 + ((i-1)*2 + 1))]), na.rm = TRUE)
if(!is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = sum(c(x[j,(1 + ((i-1)*2))], x[j,(1 + ((i-1)*2 + 1))]), na.rm = TRUE)
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = NA
}
}
return(weighted.x)
}
fn.3.output <- fn.3(x, weights)
fn.3.output <- as.data.frame(fn.3.output)
names(fn.3.output) <- paste('X', 1:(n.cols/2), sep = "")
#########################################################################################
# my reshape solution
fn.4 <- function(x, weights){
new.x <- reshape(x , direction="long", varying = list(seq(1,(n.cols-1),2), seq(2,n.cols,2)), v.names = c("v1", "v2"))
wt <- reshape(weights, direction="long", varying = list(seq(1,(n.cols-1),2), seq(2,n.cols,2)), v.names = c("w1", "w2"))
new.x$v1 <- ifelse(is.na(new.x$v1), (new.x$v2 / wt$w2) * wt$w1, new.x$v1)
new.x$v2 <- ifelse(is.na(new.x$v2), (new.x$v1 / wt$w1) * wt$w2, new.x$v2)
x2 <- reshape(new.x, direction="wide", varying = list(seq(1,3,2), seq(2,4,2)), v.names = c("v1", "v2"))
x <- x2[,2:(n.cols+1)]
x.weights = x * weights
numerator <- sapply(seq(1,ncol(x.weights),2), function(i) {
apply(x.weights[,c(i, i+1)], 1, sum, na.rm=T)
})
denominator <- sapply(seq(1,ncol(weights),2), function(i) {
apply(weights[,c(i, i+1)], 1, sum, na.rm=T)
})
weighted.x <- numerator/denominator
for(i in 1: (ncol(x)/2)) {
for(j in 1: nrow(x) ) {
if( is.na(x[j,(1 + (i-1)*2)]) & !is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = sum(c(x[j,(1 + ((i-1)*2))], x[j,(1 + ((i-1)*2 + 1))]), na.rm = TRUE)
if(!is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = sum(c(x[j,(1 + ((i-1)*2))], x[j,(1 + ((i-1)*2 + 1))]), na.rm = TRUE)
if( is.na(x[j,(1 + (i-1)*2)]) & is.na(x[j,(1 + (i-1)*2 + 1)])) weighted.x[j,i] = NA
}
}
return(weighted.x)
}
fn.4.output <- fn.4(x, weights)
fn.4.output <- as.data.frame(fn.4.output)
names(fn.4.output) <- paste('X', 1:(n.cols/2), sep = "")
#########################################################################################
rownames(fn.1.output) <- NULL
rownames(fn.2.output) <- NULL
rownames(fn.3.output) <- NULL
rownames(fn.4.output) <- NULL
all.equal(fn.1.output, fn.2.output)
all.equal(fn.1.output, fn.3.output)
all.equal(fn.1.output, fn.4.output)
all.equal(fn.2.output, fn.3.output)
all.equal(fn.2.output, fn.4.output)
all.equal(fn.3.output, fn.4.output)
library(microbenchmark)
microbenchmark(fn.1(x, weights), fn.2(x, weights), fn.3(x, weights), fn.4(x, weights), times=20)
#########################################################################################
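As a further editorial sketch, the fill-then-average computation that the four functions above perform can be written with whole-matrix arithmetic. It assumes matrices rather than data frames and adjacent column pairs; the names a.filled, b.filled, wt.mean and reference are illustrative, and this version was not part of the benchmark reported in the post.

```r
# Simulated data in the spirit of the benchmark above
set.seed(1234)
n.rows <- 40
n.cols <- 40
x <- matrix(sample(20, n.rows * n.cols, replace = TRUE), nrow = n.rows)
x[sample(length(x), 400)] <- NA
w <- matrix(sample(4, n.rows * n.cols, replace = TRUE), nrow = n.rows)

odd  <- seq(1, n.cols, by = 2)  # first column of each pair
even <- odd + 1                 # second column of each pair
a <- x[, odd];  wa <- w[, odd]
b <- x[, even]; wb <- w[, even]

# Fill a missing member of a pair from its partner; both-missing stays NA
a.filled <- ifelse(is.na(a), b / wb * wa, a)
b.filled <- ifelse(is.na(b), a / wa * wb, b)

# Weighted mean of each pair, computed for all cells at once
wt.mean <- (a.filled * wa + b.filled * wb) / (wa + wb)

# Cross-check against weighted.mean() applied cell by cell
reference <- mapply(function(p, q, u, v) weighted.mean(c(p, q), c(u, v)),
                    a.filled, b.filled, wa, wb)
all.equal(as.vector(wt.mean), reference)
```

Wrapping the middle section in a function would let it be added to the microbenchmark() call alongside fn.1 to fn.4.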

Related

ggplot's stat_function() giving wrong result

I generated some data to perform a regression on it:
library(tidyverse)
library(nnet)
# Generating the data --------------------------
set.seed(100)
helicopter <- rnorm(20, mean = 35, sd = 3)
car <- rnorm(20, mean = 30, sd = 3)
bus <- rnorm(20, mean = 25, sd = 3)
bike <- rnorm(20, mean = 20, sd = 3)
transportation_data <- data.frame(helicopter, car, bus, bike) %>%
pivot_longer(cols = 1:4, values_to = "income", names_to = "mode")
# Setting up the regression -------------------
transportation_regression <- multinom(mode~income, data = transportation_data)
So far, so good. I now want to plot the regression results (probability of choosing a certain mode of transportation based on income) using stat_function:
ins <- coef(transportation_regression)[1:3]
betas <- coef(transportation_regression)[4:6]
transportation_data %>%
ggplot(aes(x = income))+
stat_function(fun = function(x) { 1 / (1 + sum(exp(ins + betas * x))) }, aes(color = "bike"))+
stat_function(fun = function(x) { exp(ins[1] + betas[1] * x) / (1 + sum(exp(ins + betas * x))) }, aes(color = "bus"))+
stat_function(fun = function(x) { exp(ins[2] + betas[2] * x) / (1 + sum(exp(ins + betas * x))) }, aes(color = "car"))+
stat_function(fun = function(x) { exp(ins[3] + betas[3] * x) / (1 + sum(exp(ins + betas * x))) }, aes(color = "helicopter"))
I get output that is obviously wrong, along with a warning whose meaning I don't understand: Warning: longer object length is not a multiple of shorter object length
When I use the same functions, but predict data points first, everything works just fine:
income <- seq(0,50,0.1)
result <- matrix( , nrow = length(income), ncol = 4)
i <- 1
for(x in income){
result[i,1] <- 1 / (1 + sum(exp(ins + betas * x))) # bike
result[i,2] <- exp(ins[1] + betas[1] * x) / (1 + sum(exp(ins + betas * x))) # bus
result[i,3] <- exp(ins[2] + betas[2] * x) / (1 + sum(exp(ins + betas * x))) # car
result[i,4] <- exp(ins[3] + betas[3] * x) / (1 + sum(exp(ins + betas * x))) # helicopter
i <- i + 1
}
cbind(income, as.data.frame(result)) %>%
pivot_longer(cols = V1:V4) %>%
ggplot(aes(x = income, y = value, color = name))+
geom_line()
Why doesn't stat_function() in ggplot work?

I think it's just a misunderstanding of how the function works. Here's an example of using stat_function() to generate the right result:
library(tidyverse)
library(nnet)
# Generating the data --------------------------
set.seed(100)
helicopter <- rnorm(20, mean = 35, sd = 3)
car <- rnorm(20, mean = 30, sd = 3)
bus <- rnorm(20, mean = 25, sd = 3)
bike <- rnorm(20, mean = 20, sd = 3)
transportation_data <- data.frame(helicopter, car, bus, bike) %>%
pivot_longer(cols = 1:4, values_to = "income", names_to = "mode")
# Setting up the regression -------------------
transportation_regression <- multinom(mode~income, data = transportation_data)
#> # weights: 12 (6 variable)
#> initial value 110.903549
#> iter 10 value 48.674542
#> iter 20 value 46.980349
#> iter 30 value 46.766625
#> iter 40 value 46.734782
#> iter 50 value 46.732249
#> final value 46.732163
#> converged
ins <- coef(transportation_regression)[1:3]
betas <- coef(transportation_regression)[4:6]
transportation_data %>%
ggplot(aes(x = income))+
stat_function(fun = function(x) { 1 / (1 + exp(ins[1] + betas[1] * x) + exp(ins[2] + betas[2] * x) + exp(ins[3] + betas[3] * x)) }, aes(color = "bike"))+
stat_function(fun = function(x) { exp(ins[1] + betas[1] * x) / (1 + exp(ins[1] + betas[1] * x) + exp(ins[2] + betas[2] * x) + exp(ins[3] + betas[3] * x)) }, aes(color = "bus"))+
stat_function(fun = function(x) { exp(ins[2] + betas[2] * x) / (1 + exp(ins[1] + betas[1] * x) + exp(ins[2] + betas[2] * x) + exp(ins[3] + betas[3] * x)) }, aes(color = "car"))+
stat_function(fun = function(x) { exp(ins[3] + betas[3] * x) / (1 + exp(ins[1] + betas[1] * x) + exp(ins[2] + betas[2] * x) + exp(ins[3] + betas[3] * x)) }, aes(color = "helicopter"))
There were a couple of problems originally. Take, for example, the first instance of stat_function(),
stat_function(fun = function(x) {
1 / (1 + sum(exp(ins + betas * x))) },
aes(color = "bike"))
You're expecting ins + betas * x to be equivalent to ins[1] + betas[1] * x + ins[2] + betas[2] * x + ins[3] + betas[3] * x, but it isn't. R is essentially recycling ins and betas to make them vectors as long as x, then multiplying betas by x element-wise and adding ins. That recycling is also what triggers the "longer object length is not a multiple of shorter object length" warning.
The other problem was the sum() around exp(ins ...). Rather than producing one sum for each value of x, it adds up every element of the output, making a single scalar value.
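The two problems just described can be reproduced outside ggplot. The sketch below uses made-up coefficients (not the fitted values from the question) and a short x vector; stat_function() passes fun a whole vector of x values at once, which is the situation imitated here.

```r
# Made-up coefficients for illustration only
ins   <- c(1, 2, 3)
betas <- c(0.1, 0.2, 0.3)
x     <- c(1, 2, 3, 4, 5, 6)  # stands in for the grid that stat_function() supplies

# Problem 1: recycling pairs ins[1], betas[1] with x[1], ins[2], betas[2]
# with x[2], and so on, instead of applying all three to every x
recycled <- ins + betas * x
length(recycled)   # same length as x, not one value per alternative

# Problem 2: sum() collapses the whole grid into a single number
collapsed <- sum(exp(recycled))
length(collapsed)

# What is needed: one denominator per value of x
eta   <- sweep(outer(x, betas), 2, ins, "+")  # rows = x values, columns = alternatives
denom <- 1 + rowSums(exp(eta))

# Cross-check against evaluating one x at a time, as the for loop in the question does
per.point <- sapply(x, function(xi) 1 + sum(exp(ins + betas * xi)))
all.equal(denom, per.point)
```

With an x whose length is not a multiple of 3, the recycled line would additionally emit the "longer object length" warning quoted in the question.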
You could also make it a bit more general using matrix calculations:
b <- coef(transportation_regression)
transportation_data %>%
ggplot(aes(x = income))+
stat_function(fun = function(x) { 1 / (1 + rowSums(exp(cbind(1, x) %*% t(b)))) }, aes(color = "bike"))+
stat_function(fun = function(x) { exp(ins[1] + betas[1] * x) / (1 + rowSums(exp(cbind(1, x) %*% t(b)))) }, aes(color = "bus"))+
stat_function(fun = function(x) { exp(ins[2] + betas[2] * x) / (1 + rowSums(exp(cbind(1, x) %*% t(b)))) }, aes(color = "car"))+
stat_function(fun = function(x) { exp(ins[3] + betas[3] * x) / (1 + rowSums(exp(cbind(1, x) %*% t(b)))) }, aes(color = "helicopter"))
Created on 2023-02-04 by the reprex package (v2.0.1)
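The matrix form can be taken one step further so that all four curves come from one function. This is a sketch under stated assumptions: the coefficient matrix b below is made up (same 3 x 2 layout that coef() returns for this model, but not the fitted values), bike is treated as the baseline category as in the question, and multinom_probs is a hypothetical helper name.

```r
# Made-up coefficient matrix: one row per non-baseline mode,
# columns are intercept and income slope
b <- matrix(c(-5, -12, -20, 0.2, 0.45, 0.7), ncol = 2,
            dimnames = list(c("bus", "car", "helicopter"),
                            c("(Intercept)", "income")))

# Hypothetical helper: probabilities for every mode at every x
multinom_probs <- function(x, b) {
  eta <- cbind(1, x) %*% t(b)   # linear predictors, one column per non-baseline mode
  num <- cbind(1, exp(eta))     # baseline has linear predictor 0, so exp(0) = 1
  p   <- num / rowSums(num)     # normalise each row to sum to 1
  colnames(p) <- c("bike", rownames(b))
  p
}

income <- seq(0, 50, by = 0.1)
p <- multinom_probs(income, b)
head(p)
range(rowSums(p))
```

A single curve can then be drawn with, for example, stat_function(fun = function(x) multinom_probs(x, b)[, "bus"]), since the helper returns one row per element of x.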

how to solve the NaNs produced when i use the log()

> LL_LN<-function(para){
+ ww=para[1] # w
+ alpha=para[2] # alpha
+ beta=para[3]# beta
+ sigma=para[4]
+ n=length(R)
+ psi <- rep(0, n)
+ # read data from file , named as RR
+ psi[1]=0.5 # replace with mean (R) or R(1)
+ sum1 = -n/2*log(2*pi*sigma^2)
+ sum4 = -log(R[1])
+ sum3 = -((log(R[1])-log(psi[1]))+0.5*(sigma^2))^2/(2*(sigma^2))
+ sum=0
+
+ for (i in 2:n) {
+ psi[i]=ww+alpha*psi[i-1]+beta*R[i-1]
+ sum4=sum4-log(R[i])
+ sum3=sum3-((log(R[i])-log(psi[i]))+0.5*(sigma^2))^2/(2*(sigma^2))
+ }
+ sum=sum1+sum3+sum4
+ return(-sum)
+
+ }
> n=10000
> k=5
> sigma=0.25
> mu=-(sigma^2)/2
> result=c(0,0,0,0)
> output<-matrix(0,k,4)
> output<-as.data.frame(output)
> for(j in 1:k){
+ #E<-rexp(n,1)
+ E=rlnorm(n,mu,sigma)
+ w=1
+ a=0.2
+ b=0.3
+ R <- rep(0, n) # To Store R1,R2....R1000
+ X <- rep(0, n) # To Store X1,X2....X1000
+ X[1] = 0.5
+ R[1]=X[1]*E[1]
+
+ for (i in 2:n) # To Find R1,R2....R1000
+ {
+ X[i]=w+a*X[i-1]+b*R[i-1]
+ R[i]=X[i]*E[i]
+ }
+ hist(R)
+ mean(E)
+
+ initial =c(1,0.2,0.3,2)
+ op<-optim(initial,LL_LN)
+ output$V1[j] <- op$par[1]
+ output$V2[j] <- op$par[2]
+ output$V3[j] <- op$par[3]
+ output$V4[j] <- op$par[4]
+
+ print(output)
+ }
V1 V2 V3 V4
1 1.0202418 0.1989128 0.2927711 0.2484346
2 0.9725745 0.2159796 0.2970536 0.2529665
3 1.0648460 0.1692719 0.2987034 0.2492417
4 1.0186746 0.1819587 0.3039954 0.2517418
5 1.0022230 0.2103271 0.2858050 0.2484962
There were 50 or more warnings (use warnings() to see the first 50)
> mean<-cbind(mean(output$V1),mean(output$V2),mean(output$V3),mean(output$V4))
> mean
[,1] [,2] [,3] [,4]
[1,] 1.015712 0.19529 0.2956657 0.2501762
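A likely source of the warnings, stated here as an assumption since the post does not show them: log() returns NaN with a "NaNs produced" warning for negative input, and optim() is free to try parameter values that make some psi[i] negative. One common workaround is to return a large penalty for such parameter values. The sketch below is a minimal illustration with a smaller n for speed; the function name LL_LN_guarded, the penalty value 1e10 and the extra sigma <= 0 check are choices made for this example, not part of the original code.

```r
# Simulate one series the same way as in the question (smaller n for speed)
set.seed(1)
n <- 500
sigma <- 0.25
mu <- -(sigma^2) / 2
E <- rlnorm(n, mu, sigma)
R <- numeric(n)
X <- numeric(n)
X[1] <- 0.5
R[1] <- X[1] * E[1]
for (i in 2:n) {
  X[i] <- 1 + 0.2 * X[i - 1] + 0.3 * R[i - 1]
  R[i] <- X[i] * E[i]
}

# Same negative log-likelihood as LL_LN, with a guard before any log()
LL_LN_guarded <- function(para, R) {
  ww <- para[1]; alpha <- para[2]; beta <- para[3]; sigma <- para[4]
  n <- length(R)
  psi <- numeric(n)
  psi[1] <- 0.5
  for (i in 2:n) psi[i] <- ww + alpha * psi[i - 1] + beta * R[i - 1]
  # Reject parameter values for which log(psi) would be undefined
  if (sigma <= 0 || any(!is.finite(psi)) || any(psi <= 0)) return(1e10)
  z  <- log(R) - log(psi) + 0.5 * sigma^2
  ll <- -n / 2 * log(2 * pi * sigma^2) - sum(log(R)) - sum(z^2) / (2 * sigma^2)
  -ll
}

op <- optim(c(1, 0.2, 0.3, 2), LL_LN_guarded, R = R)
op$par
```

Passing R as an argument instead of reading it from the global environment also makes the function easier to reuse inside the simulation loop. Box constraints via optim(method = "L-BFGS-B", lower = ...) are another option, though bounds on the parameters alone do not directly bound psi.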

ctree R not displaying levels

I am having trouble using ctree in R: sometimes the plotted tree does not display any levels at all. An example is below.
Can anyone explain why? Thanks.
library("party")
df <- data.frame(matrix(ncol = 3, nrow = 0))
colnames(df) <- c("a", "b", "c")
df[nrow(df) + 1,] = c("Y","M","N")
df[nrow(df) + 1,] = c("Y","F","N")
df[nrow(df) + 1,] = c("Y","M","Y")
df[nrow(df) + 1,] = c("Y","F","N")
df[nrow(df) + 1,] = c("N","F","Y")
df[nrow(df) + 1,] = c("N","M","N")
df[nrow(df) + 1,] = c("N","M","Y")
df[nrow(df) + 1,] = c("N","M","N")
df[nrow(df) + 1,] = c("N","F","N")
df[nrow(df) + 1,] = c("N","F","N")
df$a <- as.factor(df$a)
df$b <- as.factor(df$b)
df$c <- as.factor(df$c)
TMDT<-ctree(a ~ b + c,data=df)
plot(TMDT,type="simple")
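One possible explanation, offered as an assumption rather than a confirmed diagnosis: ctree only splits a node when a test of association between the response and a predictor passes a significance threshold, and the settings in party's ctree_control() (such as minsplit and mincriterion) are geared toward samples larger than 10 rows. The base R sketch below uses Fisher's exact test as a stand-in for ctree's own permutation tests, simply to show how little association these 10 rows contain.

```r
# Same 10 rows as in the question, built in one call
df <- data.frame(
  a = c("Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"),
  b = c("M", "F", "M", "F", "F", "M", "M", "M", "F", "F"),
  c = c("N", "N", "Y", "N", "Y", "N", "Y", "N", "N", "N"),
  stringsAsFactors = TRUE
)

# Cross-tabulate the response against each predictor
table(a = df$a, b = df$b)
table(a = df$a, c = df$c)

# Fisher's exact test is only a stand-in here; ctree uses its own tests
p.b <- fisher.test(table(df$a, df$b))$p.value
p.c <- fisher.test(table(df$a, df$c))$p.value
c(b = p.b, c = p.c)
```

If neither predictor shows a detectable association with a, a tree consisting of a single root node is the expected result rather than a plotting failure.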

ggplot2 making a line graph that records constantly changing variables

I have a graph and code like the following:
library(igraph)
g <- make_empty_graph (2) %>%
set_vertex_attr("a", value = 1) %>%
add_vertices(2, color = 2, "a" = 2) %>%
add_vertices(2, color = 4, "a" = 3) %>%
add_edges(c(1,2, 2,1, 1,5, 5,1, 1,4 ,4,1)) %>%
set_vertex_attr("xyz", value = 3)
i <- 1
repeat {
prev_value <- mean(V(g)$a == 1)
print(V(g)$a)
print(mean(V(g)$a == 1))
print(i)
V(g)$xyz = sapply(V(g), function(x) {
NeighborList = neighbors(g, x)
length(NeighborList[NeighborList$a == 2])
})
V(g)$a[V(g)$xyz == 1] = 2
i <- i + 1
aa <- mean(V(g)$a == 1)
if (aa == prev_value) {
break
}
}
df <- data.frame(time=i, prop=aa)
library(ggplot2)
ggplot(data=df, aes(x=time, y=prop, group=1)) +
geom_line() +
geom_point()
Whenever I try to run this, however, it doesn't work. Ideally, ggplot2 would plot a line graph showing how the proportion of nodes whose attribute "a" equals 1 (the value stored in aa) changes as the time variable i increases.

I am not sure what exactly you want. But if you want to depict the difference in aa for each iteration then you can add the ggplot inside the repeat.
repeat {
prev_value <- mean(V(g)$a == 1)
print(V(g)$a)
print(mean(V(g)$a == 1))
print(i)
V(g)$xyz = sapply(V(g), function(x) {
NeighborList = neighbors(g, x)
length(NeighborList[NeighborList$a == 2])
})
V(g)$a[V(g)$xyz == 1] = 2
i <- i + 1
aa <- mean(V(g)$a == 1)
# i starts at 1 and has already been incremented above, so the first pass arrives here with i == 2
if(i == 2){
df <- data.frame(time=i, prop=aa)
}else{
df <- rbind(df,data.frame(time=i, prop=aa))
}
print( ggplot(data=df, aes(x=time, y=prop, group=1)) +
geom_line() +
geom_point() )
if (aa == prev_value) {
break
}
}
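If a single plot at the end is enough, the usual pattern is to record the proportion on every pass and build the data frame once the loop has finished. The sketch below is a base R stand-in that avoids igraph: it encodes the question's six vertices and directed edges as an adjacency matrix, which is an assumption about how the igraph code behaves, and the names adj and props are illustrative.

```r
# Adjacency matrix for the edges 1->2, 2->1, 1->5, 5->1, 1->4, 4->1
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- matrix(c(1, 2,  2, 1,  1, 5,  5, 1,  1, 4,  4, 1), ncol = 2, byrow = TRUE)
adj[edges] <- 1

a <- c(1, 1, 2, 2, 3, 3)   # vertex attribute "a" as set up in the question
props <- mean(a == 1)      # record the starting proportion

repeat {
  prev_value <- mean(a == 1)
  # For each vertex, count the neighbours whose attribute equals 2
  xyz <- as.vector(adj %*% as.numeric(a == 2))
  a[xyz == 1] <- 2
  aa <- mean(a == 1)
  props <- c(props, aa)    # keep every value, not just the last one
  if (aa == prev_value) break
}

# One row per time step, ready for plotting
df <- data.frame(time = seq_along(props), prop = props)
df
```

The resulting df can be passed unchanged to the ggplot() call from the question, which then has several rows to connect instead of a single point.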

Trying to print individual plots outside loop in R

I am breaking my data frame into 3 other data frames. I iterate through each of them and plot a histogram based on it, storing each plot in a separate variable. However, outside the loop I am able to print only the plot from the last iteration, not the first two. Inside the loop I can print all three plots. Here is my code:
catCust1 <- myData[(myData$meanVal > 0 & myData$meanVal <= 20),]
catCust2 <- myData[(myData$meanVal > 20 & myData$meanVal <= 40),]
catCust3 <- myData[(myData$meanVal > 40 & myData$meanVal <= 60),]
for(i in 1:3) {
if(i == 1) {
catCust <- catCust1
} else if(i == 2) {
catCust <- catCust2
} else if(i == 3) {
catCust <- catCust3
}
catCust <- na.omit(catCust)
numOrdersCatCust <- ddply(catCust, .(ORDERDATE), nrow)
numOrdersCatCust$numDay <- 1:nrow(numOrdersCatCust)
catCust$orderCount <- numOrdersCatCust[match(catCust$ORDERDATE, numOrdersCatCust$ORDERDATE), 2]
catCust$numDay <- numOrdersCatCust[match(catCust$ORDERDATE, numOrdersCatCust$ORDERDATE), 3]
setDT(catCust)[, uniqueCounter := .GRP, by = CUSTOMERID]
apply(catCust, 2, function(x)length(unique(x)))
threshold25catOne <- catCust[(catCust$uniqueCounter == as.integer(0.25 * max(uniqueCounter))), ]
threshold25catOneVal <- threshold25catOne$numDay
catCustPlot <- ggplot(data=catCust, aes(catCust$numDay)) +
geom_histogram(binwidth = 1, col="black", fill="white", alpha=0.1) +
labs(x="Day Number (Since 01-09-2016)", y="Orders") +
ggtitle("GRAPH TITLE") +
theme(plot.title = element_text(hjust = 0.5), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
geom_vline(xintercept = threshold25catOneVal[1], color="purple") +
annotate("text", x = threshold25catOneVal[1]-7, y = max(catCust$orderCount) + 1000, angle = 0, label = threshold25catOneVal[1], vjust = 1.2, parse = TRUE)
# ABLE TO PRINT DIFFERENT PLOTS HERE
if(i == 1) {
catCustPlot1 <- catCustPlot
print(catCustPlot1)
} else if(i == 2) {
catCustPlot2 <- catCustPlot
print(catCustPlot2)
} else if(i == 3) {
catCustPlot3 <- catCustPlot
print(catCustPlot3)
}
}
# PRINTS ONLY catCustPlot3
print(catCustPlot1)
print(catCustPlot2)
print(catCustPlot3)
The other two plots give me an error:
Error: Aesthetics must be either length 1 or the same as the data
UPDATE: head(myData)
meanVal sumVal countCat
75.98 75.98000 (60,80]
36.37 80.55727 (80,100]
50.96 52.67500 (40,60]
15.33 15.33000 (0,20]
17.48 27.65000 (20,40]
51.35 101.64900 (100,1e+04]

I just simplified your loop and stored the ggplot objects in resPlots (without printing them). When the loop finishes you can access or print them.
myData <- na.omit(myData)
resPlots <- list()
for(i in 1:3) {
if (i == 1) {
catCust <- myData[(myData$meanVal > 0 & myData$meanVal <= 20), ]
} else if (i == 2) {
catCust <- myData[(myData$meanVal > 20 & myData$meanVal <= 40), ]
} else if (i == 3) {
catCust <- myData[(myData$meanVal > 40 & myData$meanVal <= 60), ]
}
numOrdersCatCust <- ddply(catCust, .(ORDERDATE), nrow)
numOrdersCatCust$numDay <- 1:nrow(numOrdersCatCust)
catCust$orderCount <- numOrdersCatCust[match(catCust$ORDERDATE, numOrdersCatCust$ORDERDATE), 2]
catCust$numDay <- numOrdersCatCust[match(catCust$ORDERDATE, numOrdersCatCust$ORDERDATE), 3]
setDT(catCust)[, uniqueCounter := .GRP, by = CUSTOMERID]
apply(catCust, 2, function(x)length(unique(x)))
threshold25catOne <- catCust[(catCust$uniqueCounter == as.integer(0.25 * max(uniqueCounter))), ]
threshold25catOneVal <- threshold25catOne$numDay
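# Use the bare column name: aes(catCust$numDay) is evaluated lazily, against whatever catCust holds when the plot is printed
resPlots[[i]] <- ggplot(catCust, aes(numDay)) +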
geom_histogram(binwidth = 1, col = "black", fill = "white", alpha = 0.1) +
labs(x="Day Number (Since 01-09-2016)", y="Orders") +
ggtitle("GRAPH TITLE") +
theme(plot.title = element_text(hjust = 0.5), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
geom_vline(xintercept = threshold25catOneVal[1], color="purple") +
annotate("text", x = threshold25catOneVal[1]-7, y = max(catCust$orderCount) + 1000, angle = 0, label = threshold25catOneVal[1], vjust = 1.2, parse = TRUE)
}
# First plot
resPlots[[1]]
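The same idea can be sketched more compactly with split() and lapply(), which removes the if/else chain. This is an illustration on made-up data (the real myData and its ORDERDATE and CUSTOMERID columns are not available here), and it keeps only the histogram layer. Referring to the column by its bare name inside aes() means each plot is tied to its own data rather than to a variable that later iterations overwrite.

```r
library(ggplot2)

# Made-up stand-in for myData; only the columns this sketch needs
set.seed(1)
myData <- data.frame(
  meanVal = runif(300, min = 0, max = 60),
  numDay  = sample(1:30, 300, replace = TRUE)
)

# cut() builds the (0,20], (20,40], (40,60] bins used in the question
groups <- split(myData, cut(myData$meanVal, breaks = c(0, 20, 40, 60)))

# One ggplot object per group, stored in a named list
resPlots <- lapply(groups, function(d) {
  ggplot(d, aes(x = numDay)) +
    geom_histogram(binwidth = 1, col = "black", fill = "white", alpha = 0.1) +
    labs(x = "Day Number", y = "Orders")
})

names(resPlots)
```

Any plot can then be printed after the fact, for example with resPlots[["(0,20]"]] or resPlots[[1]].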
