I have two vectors. I need to find the intersection between these two, and do a nice plot of it.
So, here is a very simple data frame example:
df <- data.frame( id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
p <-c(5,7,9,11,13,15,17,19,21,23,20,18,16,14,12,10,8,6,4,2 ),
q <-c(3,5,7,13,19,31,37,53,61,67,6,18,20,24,40,46,66,70,76,78))
colnames(df) <- c("id","price","quantity")
supply <- df[df$id == 1,]
demand <- df[df$id == 2,]
plot( x = supply$quantity, y = supply$price, type = "l", ylab = "price", xlab = "quantity")
lines(x = demand$quantity , y = demand$price, type = "l")
grid()
Now, I can plot them and find the intersection manually, but can you make R calculate the intersection between these two lines?
The data can take huge jumps, and the lines can go from very steep to nearly horizontal.
Be careful creating your data frame. You want =, not <-. Also, make id a factor, for clarity.
df <- data.frame(
id = factor(rep(c("supply", "demand"), each = 10)),
price = c(5,7,9,11,13,15,17,19,21,23,20,18,16,14,12,10,8,6,4,2 ),
quantity = c(3,5,7,13,19,31,37,53,61,67,6,18,20,24,40,46,66,70,76,78)
)
First, define a common, finely spaced grid of quantities at which to evaluate the price.
quantity_points <- with(
df,
seq(min(quantity), max(quantity), length.out = 500)
)
Now split the dataset into supply/demand parts.
by_id <- split(df[, c("price", "quantity")], df$id)
Then we use approx to calculate the price at each of these quantities, for supply and demand separately.
interpolated_price <- lapply(
by_id,
function(x)
{
with(
x,
approx(
quantity,
price,
xout = quantity_points
)
)$y
}
)
Finally, the crossing point is where the absolute value of the supply price minus the demand price is minimised.
index_of_equality <- with(interpolated_price, which.min(abs(supply - demand)))
quantity_points[index_of_equality]
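The grid search above only gets as close as the grid spacing allows. As a rough sketch of an alternative (not part of the original answer), the two piecewise-linear curves can be wrapped with approxfun and the crossing found with uniroot. This assumes the curves cross exactly once within the quantity range they share; the names supply_fun, demand_fun, eq_quantity and eq_price are made up for illustration.

```r
# Same data as above, with id as a factor
df <- data.frame(
  id = factor(rep(c("supply", "demand"), each = 10)),
  price = c(5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2),
  quantity = c(3, 5, 7, 13, 19, 31, 37, 53, 61, 67, 6, 18, 20, 24, 40, 46, 66, 70, 76, 78)
)
supply <- df[df$id == "supply", ]
demand <- df[df$id == "demand", ]

# Linear interpolators: price as a function of quantity (hypothetical helper names)
supply_fun <- approxfun(supply$quantity, supply$price)
demand_fun <- approxfun(demand$quantity, demand$price)

# Search only where both curves are defined
lo <- max(min(supply$quantity), min(demand$quantity))
hi <- min(max(supply$quantity), max(demand$quantity))

# Assumption: the price difference changes sign exactly once on [lo, hi]
root <- uniroot(function(q) supply_fun(q) - demand_fun(q),
                interval = c(lo, hi), tol = 1e-9)
eq_quantity <- root$root
eq_price <- supply_fun(eq_quantity)
```

For this data the crossing lies at a quantity of 172/7 (about 24.57) and a price of about 13.93. After the plot from the question, points(eq_quantity, eq_price, pch = 19) would mark it. If the curves could cross more than once, uniroot would need to be run on each sub-interval separately.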
I have written a function that runs a simulation demonstrating the central limit theorem. Currently it only stores the data frame containing the mean values of all the trials. I'm not sure whether it's possible to store more than that, or whether I am better off just making separate functions.
# create function to perform CLT simulation
# where n = sample size, t = number of trials, pop = which population is being used, popmean = population mean,
cltsim <- function(n, t, pop, popmean, popsd, poptitle){
popsim <- data.frame()
# Run the simulation
for(i in n) { # for each value of n
col <- c()
for(j in t) { #loop through each value of t
trial <- 1:j
counter <- j #set up an egg timer based on whichever t value we're on
value <- c()
while(counter > 0) { # and extract n samples from the population
bucket <- sample(pop, i, replace = TRUE)
xbar <- mean(bucket) #calculate the sample mean
value <- c(value, xbar) # and add it to a vector
counter <- counter - 1 #egg timer counts down and loops back until it hits 0
}
sbar <- sd(value) #calculate the sample standard deviation
col <- cbind(trial, value, sbar, i, j) #merge all info together
popsim <- rbind(popsim, col) # attach it to empty dataframe
}
}
#clean up so just the finished data frame is left
rm(col, bucket, value, counter, i, j, n, sbar, t, xbar, trial)
#tidy up data frame in order to graph it
names(popsim) <- c("trial#", "value", "sdev", "samples", "trials")
#view the rows of data in popsim data table
popsim
}
When I try to add any more code that requires creating data tables, it doesn't store them. Below are the blocks of code I wish to add to the function:
g1 <- ggplot(popsim, aes(x = value)) + geom_density(fill = "#09AB30") +
facet_grid(samples ~ trials, labeller = label_both) +
ggtitle(paste("Demonstrating The Central Limit Theorem with Simulation using", poptitle)) +
geom_vline(xintercept = popmean, linetype = "dashed")
g1
and
#create data frame of simulated sample standard deviations
sdmatrix <- matrix(unique(popsim$sdev), nrow = 4, ncol = 4)
sdf <- as.data.frame(sdmatrix, row.names = c("t10", "t100", "t1000", "t10000"))
names(sdf) <- c("s1", "s10", "s30", "s50")
sdf <- t(sdf)
rm(sdmatrix)
sdf
exvals <- pop1sd/sqrt(c(1, 10, 30, 50))
dfex <- as.data.frame(exvals, row.names = c("s1", "s10", "s30", "s50"))
names(dfex) <- "Predicted Standard Deviations"
dfex
I've had a look around and I can't find a solution anywhere. Am I better off just writing different functions for them? Any advice or input on how to make this lot of code more effective/efficient would be greatly appreciated.
Thanks in advance.
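One possible way to keep several results from a single function is to return them together in a named list, since an R function only hands back its final value. The sketch below is a simplified stand-in for cltsim, not a drop-in replacement: clt_sketch and its arguments are hypothetical names, and the standard deviations are computed with aggregate rather than the matrix reshaping shown above. A ggplot object could be added to the same list in the same way.

```r
# Hypothetical, simplified version of the simulation that returns two tables
clt_sketch <- function(n, t, pop) {
  # every combination of sample size and number of trials
  grid <- expand.grid(samples = n, trials = t)
  sims <- lapply(seq_len(nrow(grid)), function(k) {
    # draw `trials` sample means, each from a sample of size `samples`
    means <- replicate(
      grid$trials[k],
      mean(sample(pop, grid$samples[k], replace = TRUE))
    )
    data.frame(value = means,
               samples = grid$samples[k],
               trials = grid$trials[k])
  })
  popsim <- do.call(rbind, sims)
  # one standard deviation per (samples, trials) combination
  sdf <- aggregate(value ~ samples + trials, data = popsim, FUN = sd)
  names(sdf)[3] <- "sdev"
  # return everything that should survive the function call
  list(popsim = popsim, sdf = sdf)
}

set.seed(1)
res <- clt_sketch(n = c(1, 10), t = c(10, 100), pop = 1:6)
```

The pieces are then available as res$popsim and res$sdf. Objects created inside a function vanish when it returns unless they are part of the return value, which would explain why the extra tables seemed not to be stored.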
What I would like to do is create several boxplots (all displayed in a single plot) using only certain values of my original data frame.
My data frame looks as follows:
So now I want R to visualise Parameter ~ Station (Parameter are all variables coloured green and Station is the "station id")
Is there a way to tell R that I want all my Parameters on the x-axis ONLY for BB0028 for example, which would mean that I only take the first 6 values of mean_area, mean_area_exc, esd, feret, min and max into account in the boxplot?
That would look like this:
I tried a very complicated way, adding single boxplots one by one, but I am sure there must be a simpler way.
This is what I tried:
bb28 <- df[c(1:6),]
bb28area <- boxplot(bb28$mean_area ~ bb28$BBnr)
bb28area_exc <- boxplot(bb28$mean_area_exc ~ bb28$BBnr)
bb28esd <- boxplot(bb28$mean_esd ~ bb28$BBnr)
bb28feret <- boxplot(bb28$mean_feret ~ bb28$BBnr)
bb28min <- boxplot(bb28$mean_min ~ bb28$BBnr)
bb28max <- boxplot(bb28$mean_max ~ bb28$BBnr)
boxplot(bb28$mean_area ~ bb28$BBnr)
boxplot(bb28$mean_area_exc ~ bb28$BBnr, add=TRUE, at = 1:1+0.45)
Also, it doesn't look very nice, because the x-axis does not adjust to the new boxplot, which is then cut off.
I hope you can help me with simple, proper code to get my plot.
Thank you!
Cheers, Merle
Maybe the function multi.boxplot below is what you are looking for. It uses base R only.
Data.
First, make up a dataset, since you have not provided us with one in a copy&paste friendly format.
set.seed(1234)
n <- 50
BBnr <- sort(sprintf("BB%04d", sample(28:30, n, TRUE)))
bb28 <- data.frame(col1 = 1:n, col2 = n:1, BBnr = BBnr)
tmp <- matrix(runif(3*n), ncol = 3)
colnames(tmp) <- paste("mean", c("this", "that", "other"), sep = "_")
bb28 <- cbind(bb28, tmp)
rm(BBnr, tmp)
Code.
multi.boxplot <- function(x, by, col=0, ...){
x <- as.data.frame(x)
uniq.by <- unique(by)
len <- length(uniq.by) - 1
n <- ncol(x)
n1 <- n + 1
col <- rep(col, n)[seq_len(n)]
boxplot(x[[ 1 ]] ~ by, at = 0:len*n1 + 1,
xlim = c(0, (len + 1)*n1), ylim = range(unlist(x)), xaxt = "n", col=col[1], ...)
for(i in seq_len(n)[-1])
boxplot(x[[i]] ~ by, at = 0:len*n1 + i, xaxt = "n", add = TRUE, col=col[i], ...)
axis(1, at = 0:len*n1 + n1/2, labels = uniq.by, tick = TRUE)
}
inx <- grep("mean", names(bb28))
multi.boxplot(bb28[inx], by = bb28$BBnr, col = rainbow(length(inx)))
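If only a single station is needed, as in the BB0028 example from the question, a shorter route may be enough: boxplot draws one box per column when it is handed a data frame. The sketch below uses made-up data and column names (bb, mean_this and so on), in the same spirit as the data above, not the asker's real table.

```r
set.seed(1234)
n <- 50
# Made-up stand-in for the real data: a station id plus three "mean_" columns
bb <- data.frame(
  BBnr = sort(sprintf("BB%04d", sample(28:30, n, TRUE))),
  mean_this = runif(n),
  mean_that = runif(n),
  mean_other = runif(n)
)

# Keep the rows of one station and only the parameter columns
one_station <- bb[bb$BBnr == "BB0028", grep("mean", names(bb))]

# plot = FALSE returns the box statistics without drawing anything
stats <- boxplot(one_station, plot = FALSE)

# To draw the plot itself, one box per parameter:
# boxplot(one_station, main = "BB0028")
```

Because all boxes come from one call, the x-axis is sized for all of them and nothing is cut off.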
I'm trying to control the color of data points in a custom Lattice function. The idea is the response may be a function of one or two factors. The coloring of the data points should be determined by fac1. Here is the function:
compareCats <-
function(formula = NULL, data = NULL, cols = NULL, ...) {
TwoFac <- FALSE
res <- as.character(formula[[2]])
if (length(formula[[3]]) == 1) {
fac1 <- as.character(formula[[3]])
}
if (length(formula[[3]]) == 3) {
fac2 <- as.character(formula[[3]][3])
fac1 <- as.character(formula[[3]][2])
TwoFac <- TRUE
}
args <- as.list(match.call(expand.dots = FALSE)[-1]) # used a bit later
if (TwoFac) keep <- c(res, fac1, fac2) # Reduce the df and clean it of NAs
if (!TwoFac) keep <- c(res, fac1)
data <- data[, keep]
data <- na.omit(data)
# cols2 is used for the data points according to levels in fac1
if (!TwoFac) cols2 <- cols[data[,fac1]] # works fine
if (TwoFac) {
# In this case, the points and panels are drawn in an order I don't understand
cols2 <- rep(NA_character_, nrow(data))
for (i in 1:nlevels(data[,fac1])) { # make the colors correspond to the original order
tmp <- which(data[,fac1] == levels(data[,fac1])[i])
cols2[tmp] <- cols[i]
}
}
data$cols <- cols2
p <- lattice::xyplot(as.formula(args$formula), # now the plot
data = eval(args$data), ...,
panel = function(x, y, ...) {
lattice::panel.xyplot(x, y, col = data$cols, ...)
}
)
return(p)
}
And here is data and two function calls:
set.seed(13)
mydf <- data.frame(
resp = rnorm(40),
cat1 = sample(LETTERS[1:3], 40, replace = TRUE),
cat2 = sample(letters[1:2], 40, replace = TRUE))
library("lattice")
library("plyr")
# One factor / works fine
p <- compareCats(formula = resp~cat1, data = mydf,
cols = c("red", "orange", "blue"))
print(p)
# Two factors / colors not assigned correctly
p <- compareCats(formula = resp~cat1 | cat2, data = mydf,
cols = c("red", "orange", "blue"))
print(p)
The first produces this diagram:
and the second this one:
How do I get the colors in the 2nd example to be red, orange, blue, red, orange, blue, from left to right? I've tried numerous approaches and can get the first panel to behave sometimes but the 2nd panel seems random. Obviously I don't quite see what order Lattice uses in the 2 factor case, the docs suggest the interaction is used but that still leaves several possibilities, none of which I have been able to figure out.
If you want to change colors of points consistently across panels, I suggest you do that via the more standard groups= argument. I would change these two lines in your compareCats function:
cols2 <- factor(cols2, levels=cols) # no need to attach to data
p <- lattice::xyplot(as.formula(args$formula), groups=cols2,
data = eval(args$data), ...,
par.settings=list(superpose.symbol=list(col=cols))
)
Here we use groups= to assign points to different groups, and we use superpose.symbol to specify which color to assign to each group.
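To see the idea outside of compareCats, here is a minimal standalone sketch that calls xyplot directly. It assumes the lattice package is installed and groups by cat1 itself rather than by a vector of color names; the data mirror the question's mydf, with the categories made explicit factors.

```r
library(lattice)

set.seed(13)
mydf <- data.frame(
  resp = rnorm(40),
  cat1 = factor(sample(LETTERS[1:3], 40, replace = TRUE)),
  cat2 = factor(sample(letters[1:2], 40, replace = TRUE))
)
cols <- c("red", "orange", "blue")

# groups = cat1 ties each point to a level of cat1;
# superpose.symbol maps the levels, in order, to the colors in cols
p <- xyplot(resp ~ cat1 | cat2, data = mydf, groups = cat1,
            par.settings = list(superpose.symbol = list(col = cols)))

# print(p) draws the plot
```

Since the colors are looked up per group inside each panel, the level order of cat1 decides the mapping, and the row order of the data no longer matters.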
Using a simple data frame to illustrate this problem:
df <- data.frame(x=c(1,2,3), y1=c(1,2,3), y2=c(3,4,5))
Single time series plot is easy:
hPlot(y="y1", x="x", data=df)
I cannot figure out how to plot both y1 and y2 together. I tried this, but it returns an error:
> hPlot(x='x', y=c('y1','y2'), data=df)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Checking the code in hPlot where it uses [[ to extract one column from input data.frame, does it mean it only works for single time series?
hPlot <- highchartPlot <- function(..., radius = 3, title = NULL, subtitle = NULL, group.na = NULL){
rChart <- Highcharts$new()
# Get layers
d <- getLayer(...)
data <- data.frame(
x = d$data[[d$x]],
y = d$data[[d$y]]
)
Try using long-format data with a group:
hPlot(x = "x", y = "value", group = "variable", data = reshape2::melt(df, id.vars = "x"))
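For readers unfamiliar with the long layout, the sketch below builds by hand, in base R only, the kind of table that melt should produce for this df: one row per (x, series) pair. The name long is just for illustration, and the hPlot call itself is left out because it needs rCharts.

```r
df <- data.frame(x = c(1, 2, 3), y1 = c(1, 2, 3), y2 = c(3, 4, 5))

# Stack y1 and y2 on top of each other and record which series each row came from
long <- data.frame(
  x = rep(df$x, times = 2),
  variable = factor(rep(c("y1", "y2"), each = nrow(df))),
  value = c(df$y1, df$y2)
)
print(long)
```

With the data in this shape, a single y column ("value") holds both series and the group column ("variable") tells them apart, which is why the [[ extraction of one column in hPlot is no longer a problem.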
I have a data frame and I'm trying to create a new variable in the data frame that has the quantiles of a continuous variable var1, for each level of a factor strata.
# some data
set.seed(472)
dat <- data.frame(var1 = rnorm(50, 10, 3)^2,
strata = factor(sample(LETTERS[1:5], size = 50, replace = TRUE))
)
# function to get quantiles
qfun <- function(x, q = 5) {
quantile <- cut(x, breaks = quantile(x, probs = 0:q/q),
include.lowest = TRUE, labels = 1:q)
quantile
}
I tried using two methods, neither of which produce a usable result. Firstly, I tried using aggregate to apply qfun to each level of strata:
qdat <- with(dat, aggregate(var1, list(strata), FUN = qfun))
This returns the quantiles by factor level, but the output is hard to coerce back into a data frame (e.g., using unlist does not line the new variable values up with the correct rows in the data frame).
A second approach was to do this in steps:
tmp1 <- with(dat, split(var1, strata))
tmp2 <- lapply(tmp1, qfun)
tmp3 <- unlist(tmp2)
dat$quintiles <- tmp3
Again, this calculates the quantiles correctly for each factor level, but obviously, as with aggregate they aren't in the correct order in the data frame. We can check this by putting the quantile "bins" into the data frame.
# get quantile bins
qfun2 <- function(x, q = 5) {
quantile <- cut(x, breaks = quantile(x, probs = 0:q/q),
include.lowest = TRUE)
quantile
}
tmp11 <- with(dat, split(var1, strata))
tmp22 <- lapply(tmp11, qfun2)
tmp33 <- unlist(tmp22)
dat$quintiles2 <- tmp33
Many of the values of var1 are outside of the bins of quintiles2. I feel like I'm missing something simple. Any suggestions would be greatly appreciated.
I think your issue is that you don't really want aggregate here; use ave instead (or data.table or plyr).
qdat <- transform(dat, qq = ave(var1, strata, FUN = qfun))
#using plyr
library(plyr)
qdat <- ddply(dat, .(strata), mutate, qq = qfun(var1))
#using data.table (my preference)
library(data.table)
dat <- as.data.table(dat) # := needs a data.table, not a plain data.frame
dat[, qq := qfun(var1), by = strata]
Aggregate usually implies returning an object that is smaller than the original. (In this case you were getting a data.frame where x was a list of 1 element for each strata.)
Use ave on your dat data frame. Full example with your simulated data and qfun function:
# some data
set.seed(472)
dat <- data.frame(var1 = rnorm(50, 10, 3)^2,
strata = factor(sample(LETTERS[1:5], size = 50, replace = TRUE))
)
# function to get quantiles
qfun <- function(x, q = 5) {
quantile <- cut(x, breaks = quantile(x, probs = 0:q/q),
include.lowest = TRUE, labels = 1:q)
quantile
}
And my addition...
dat$q <- ave(dat$var1,dat$strata,FUN=qfun)
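As a quick sanity check on row alignment, here is a sketch that compares ave with the split/lapply route from the question, using unsplit instead of unlist to put the pieces back in the original row order. The column names q_ave and q_unsplit are made up. As far as I can tell, ave hands back the quantile group as a plain number while unsplit keeps the factor, so the comparison goes through as.integer.

```r
set.seed(472)
dat <- data.frame(
  var1 = rnorm(50, 10, 3)^2,
  strata = factor(sample(LETTERS[1:5], size = 50, replace = TRUE))
)

# Quantile group (1..q) of each value within x
qfun <- function(x, q = 5) {
  cut(x, breaks = quantile(x, probs = 0:q/q),
      include.lowest = TRUE, labels = 1:q)
}

# ave applies qfun within each stratum and keeps the original row order
dat$q_ave <- ave(dat$var1, dat$strata, FUN = qfun)

# unsplit reverses split, so the results land back on the right rows
dat$q_unsplit <- unsplit(lapply(split(dat$var1, dat$strata), qfun), dat$strata)

# Both routes should agree row by row
all(dat$q_ave == as.integer(dat$q_unsplit))
```

The unlist approach in the question fails because it concatenates the groups one after another, which only matches the data frame if the rows happen to be sorted by strata.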