The code below occasionally fails (please run it a few times to see): the data.frame() call reports that x and y differ by 2 rows.
How can I fix this intermittent error?
x = rnorm(1e2)
h = hist(x = x, plot = FALSE)
DF = data.frame(
x = unlist(sapply(1:length(h$mids), function(i) rep(h$mids[i], each = h$counts[i]))),
y = unlist(sapply(h$counts, function(c) 1:c)))
plot(DF$x, DF$y)
Error in data.frame(x = unlist(sapply(1:length(h$mids), function(i) rep(h$mids[i], :
arguments imply differing number of rows: 100, 102
Some of the h$counts are 0. For those bins, unlist(sapply(h$counts, function(c) 1:c)) evaluates 1:0, which counts down and returns c(1, 0) instead of an empty vector, so every empty bin adds two unwanted values to y while adding none to x. You can build the data frame in a different way and it should work:
DF1 <- data.frame(x = rep(h$mids, h$counts), y = sequence(h$counts))
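To see why the lengths drift apart, here is a minimal sketch with made-up bin counts (the vectors counts and mids are hypothetical stand-ins for h$counts and h$mids). It compares 1:n with sequence() when one bin is empty.

```r
counts <- c(3L, 0L, 2L)   # hypothetical bin counts with one empty bin
mids   <- c(-1, 0, 1)     # hypothetical bin midpoints

1:0          # counts down and returns two values: 1 0
seq_len(0)   # returns an empty integer vector

y_bad  <- unlist(lapply(counts, function(n) 1:n))  # the empty bin contributes 1, 0
y_good <- sequence(counts)                         # the empty bin contributes nothing
x      <- rep(mids, counts)

length(y_bad)   # 7
length(y_good)  # 5
length(x)       # 5
```

Each empty bin adds two extra values to y_bad, which matches the difference of 2 rows in the error message when exactly one bin is empty.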
I have a dataframe that has two columns, x and y (both populated with numbers). I am trying to look at a moving window within the data, and I've done it like this (source):
# Extract just x and y from the original data frame
df <- dat_fin %>% select(x, y)
# Moving window creation
nr <- nrow(df)
windowSize <- 10
windfs <- lapply(seq_len(nr - windowSize + 1), function(i) df[i:(i + windowSize - 1), ])
This lapply creates a list of tibbles that are each 10 (x, y) pairs. At this point, I am trying to compute a single quantity using each of the sets of 10 pairs; my current (not working) code looks like this:
library(shotGroups)
for (f in 1:length(windfs)) {
tsceps[f] = getCEP(windfs[f], accuracy = TRUE)
}
When I run this, I get the error:
Error in getCEP.default(windfs, accuracy = TRUE) : xy must be numeric
My goal is that the variable that I've called tsceps should be a 1 x length(windfs) data frame, each value in which comes from the getCEP calculation for each of the windowed subsets.
I've tried various things with unnest and unlist, all of which were unsuccessful.
What am I missing?
Working code:
df <- dat_fin %>% select(x, y)
nr <- nrow(df)
windowSize <- 10
windfs <- lapply(seq_len(nr - windowSize + 1), function(i) df[i:(i + windowSize - 1), ])
tsceps <- vector(mode = "numeric", length = length(windfs))
library(shotGroups)
for (j in 1:length(windfs)) {
tsceps[j] <- getCEP(windfs[[j]], type = "CorrNormal", CEPlevel = 0.50, accuracy = TRUE)
}
ults <- unlist(tsceps)
ults_cep <- vector(mode = "numeric", length = length(ults))
for (k in 1:length(ults)) {
ults_cep[k] <- ults[[k]]
}
To get this working with multiple type arguments to getCEP, just use additional code blocks for each type required.
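The key change in the working code is windfs[[j]] instead of windfs[f]. The sketch below illustrates the difference on toy data; the data frame and the spread() function are hypothetical stand-ins (spread() is not getCEP, it only plays the role of a function that needs the window itself).

```r
# Toy data standing in for dat_fin; the column names x and y follow the question.
set.seed(1)
df <- data.frame(x = rnorm(30), y = rnorm(30))

windowSize <- 10
starts <- seq_len(nrow(df) - windowSize + 1)
windfs <- lapply(starts, function(i) df[i:(i + windowSize - 1), ])

is.data.frame(windfs[1])    # FALSE: single brackets return a list of length one
is.data.frame(windfs[[1]])  # TRUE: double brackets return the window itself

# Hypothetical per-window statistic (not getCEP):
# mean distance of the points from their centroid.
spread <- function(w) mean(sqrt((w$x - mean(w$x))^2 + (w$y - mean(w$y))^2))

# vapply hands each list element to the function, as windfs[[j]] would,
# and collects the results in a plain numeric vector.
tsceps <- vapply(windfs, spread, numeric(1))
length(tsceps)  # 21
```

With a function that returns a single number per window, vapply() avoids both the preallocation and the later unlist() step.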
I am working on an assignment that asks me to generate a list of datasets using the code below.
## Use the make_data function to generate 25 different datasets, with mu_1 taken from the vector x
library(caret)  # createDataPartition()
library(dplyr)  # %>% and slice()
x <- seq(0, 3, len=25)
make_data <- function(a){
n = 1000
p = 0.5
mu_0 = 0
mu_1=a
sigma_0 = 1
sigma_1 = 1
y <- rbinom(n, 1, p)
f_0 <- rnorm(n, mu_0, sigma_0)
f_1 <- rnorm(n, mu_1, sigma_1)
x <- ifelse(y == 1, f_1, f_0)
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
list(train = data.frame(x = x, y = as.factor(y)) %>% slice(-test_index),
test = data.frame(x = x, y = as.factor(y)) %>% slice(test_index))
}
dat <- sapply(x,make_data)
The code runs fine, and 'dat' appears to be a table with 25 columns and 2 rows, each cell holding its own data frame.
Now, each data frame within a cell has 2 columns.
And this is where I get stuck.
While I can get to the data frame in row 1, column 1, just fine (i.e. just use dat[1,1]), I can't reach the column of 'x' values within dat[1,1]. I've experimented with
dat[1,1]$x
dat[1,1][1]
But they only give an error or NULL.
Any idea how I can pull the column? Thanks.
dat[1, 1] is a list.
class(dat[1, 1])
#[1] "list"
So to reach to x you can do
dat[1, 1]$train$x
Or
dat[1, 1][[1]]$x
As a side note, instead of having this 2 x 25 matrix as the output in dat, I would actually prefer a nested list.
dat <- lapply(x,make_data)
#Access `x` column of first list from `train` dataset.
dat[[1]]$train$x
However, this is quite subjective and you can choose whichever format you like best.
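As a self-contained illustration of the two layouts, here is a small sketch that uses a simplified, hypothetical stand-in for make_data (make_pair, with no random draws and no caret). It also shows that double brackets with two indices return the cell's content directly.

```r
# Simplified stand-in for make_data (hypothetical, for illustration only).
make_pair <- function(a) {
  list(train = data.frame(x = a + 0:1, y = factor(c(0, 1))),
       test  = data.frame(x = a + 2:3, y = factor(c(0, 1))))
}

m <- sapply(1:3, make_pair)   # list-matrix: 2 rows (train, test), 3 columns
dim(m)                        # 2 3
class(m[1, 1])                # "list": single brackets keep the list wrapper
class(m[[1, 1]])              # "data.frame": double brackets return the element
m[[1, 1]]$x                   # 1 2
m[[2, 2]]$x                   # 4 5 (row 2 is test, column 2 is a = 2)

l <- lapply(1:3, make_pair)   # nested list alternative
l[[2]]$test$x                 # 4 5
```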
I'd like to convert a tbl_cube with a list-valued measure into a tibble, but am running into problems.
For example
Y <- array(1:4, dim = c(2,2))
is a "cube" of integers (it only has two axes so it's really a matrix) which can be turned into a tbl_cube as follows
Y_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(y = Y))
which in turn can be converted into a tibble with as_tibble(Y_cube).
However, if we start with a list-valued array then we can create the tbl_cube:
X <- array(lapply(1:4, seq_len), dim = c(2,2))
X_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(x = X))
but as_tibble(X_cube) generates an error:
Error: Can't subset elements that don't exist.
x The locations 3 and 4 don't exist.
i There are only 2 elements.
I'm using dplyr version dplyr_0.8.5.
I have a function that uses matplot to plot some data. Data structure is like this:
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
matplot(test[,-1])
matlines(test[,1], test[,-1])
So far so good. However, if there are missing values in the data set, then there are gaps in the resulting plot, and I would like to avoid those by connecting the edges of the gaps.
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
In the real situation this is inside a function, the matrix is bigger, and the number of rows and columns and the positions of the non-overlapping missing values may change between calls, so I'd like a solution that handles this flexibly. I also need to use matlines.
I was thinking of filling in the gaps with interpolated data, but maybe there is a better solution.
I came across this exact situation today, but I didn't want to interpolate values - I just wanted the lines to "span the gaps", so to speak. I came up with a solution that, in my opinion, is more elegant than interpolating, so I thought I'd post it even though the question is rather old.
The problem causing the gaps is that there are NAs between consecutive values. So my solution is to 'shift' the column values so that there are no NA gaps. For example, a column consisting of c(1,2,NA,NA,5) would become c(1,2,5,NA,NA). I do this with a function called shift_vec_na() in an apply() loop. The x values also need to be adjusted, so we can make the x values into a matrix using the same principle, but using the columns of the y matrix to determine which values to shift.
Here's the code for the functions:
# x -> vector
# bool -> boolean vector; must be same length as x. The values of x where bool
# is TRUE will be 'shifted' to the front of the vector, and the back of the
# vector will be all NA (i.e. the number of NAs in the resulting vector is
# sum(!bool))
# returns the 'shifted' vector (will be the same length as x)
shift_vec_na <- function(x, bool){
n <- sum(bool)
if(n < length(x)){
x[seq_len(n)] <- x[bool]  # seq_len() also handles n == 0 (an all-NA column), where 1:n would be c(1, 0)
x[(n + 1):length(x)] <- NA
}
return(x)
}
# x -> vector
# y -> matrix, where nrow(y) == length(x)
# returns a list of two elements ('x' and 'y') that contain the 'adjusted'
# values that can be used with 'matplot()'
adj_data_matplot <- function(x, y){
y2 <- apply(y, 2, function(col_i){
return(shift_vec_na(col_i, !is.na(col_i)))
})
x2 <- apply(y, 2, function(col_i){
return(shift_vec_na(x, !is.na(col_i)))
})
return(list(x = x2, y = y2))
}
Then, using the sample data:
test <- data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] <- NA
test$b[7] <- NA
lst <- adj_data_matplot(test[,1], test[,-1])
matplot(lst$x, lst$y, type = "b")
You could use the na.interpolation function from the imputeTS package (renamed na_interpolation in imputeTS 3.0 and later):
test = data.frame(x = 1:10, a = 1:10, b = 11:20)
test$a[3:4] = NA
test$b[7] = NA
matplot(test[,-1])
matlines(test[,1], test[,-1])
library('imputeTS')
test <- na.interpolation(test, option = "linear")
matplot(test[,-1])
matlines(test[,1], test[,-1])
I had the same issue today. In my context I was not permitted to interpolate. Here is a minimal but sufficiently general working example of what I did. I hope it helps someone:
mymatplot <- function(data, main=NULL, xlab=NULL, ylab=NULL,...){
#graphical set up of the window
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
#plot the data
for(i in 1:nrow(data)){
nin.na <- !is.na(data[i,])
lines(x=which(nin.na), y=data[i,nin.na], col = i,...)
}
}
The core 'trick' is in x=which(nin.na). It aligns the data points of the line consistently with the indices of the x axis.
The lines
plot.new()
plot.window(xlim=c(1,ncol(data)), ylim=range(data, na.rm=TRUE))
mtext(text = xlab,side = 1, line = 3)
mtext(text = ylab,side = 2, line = 3)
mtext(text = main,side = 3, line = 0)
axis(1L)
axis(2L)
draw the graphical part of the window.
range(data, na.rm=TRUE) adapts the plot to a proper size being able to include all data points.
mtext(...) is used to label the axes and provides the main title. The axes themselves are drawn by the axis(...) command.
The following for-loop plots the data.
The signature of mymatplot includes the ... argument so that the usual graphical parameters such as lty, lwd, cex etc. can be supplied optionally. They are passed on to lines().
A last word on the choice of colors: they are up to your taste.
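The which(nin.na) idea can be checked on a single vector. This is a minimal sketch with a made-up series y; the name keep is only illustrative.

```r
y <- c(1, 2, NA, NA, 5)   # made-up series with a gap
keep <- !is.na(y)

which(keep)   # 1 2 5: x positions of the observed values
y[keep]       # 1 2 5: the observed values themselves

# Drawing only the observed points connects (2, 2) directly to (5, 5),
# so the line spans the gap without any interpolated values being stored.
plot(seq_along(y), y, type = "n")
lines(x = which(keep), y = y[keep])
```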
I have a df as follows:
t r
1 0 100.00000
2 1 135.86780
3 2 149.97868
4 3 133.77316
5 4 97.08129
6 5 62.15988
7 6 50.19177
and so on...
I want to apply a rolling regression using lm(r~t).
However, I want to estimate one model for each iteration, where the iterations occur over a set time window t+k. Essentially, the first model should be estimated with t=0,t=1,...t=5, if k = 5, and the second model estimated with t=1, t=2,...,t=6, and so on.
In other words, it iterates from a starting point with a set window t+k where k is some pre-specified window length and applies the lm function over that particular window length iteratively.
I have tried using lapply like this:
mdls = lapply(df, function(x) lm(r[x,]~t))
However, I got the following error:
Error in r[x, ] : incorrect number of dimensions
If I remove the [x,], each iteration gives me the same model, in other words using all the observations.
If I use rollapply:
coefs = rollapply(df, 3, FUN = function(x) coef(lm(r~t, data =
as.data.frame(x))), by.column = FALSE, align = "right")
res = rollapply(df, 3, FUN = function(z) residuals(lm(r~t, data =
as.data.frame(z))), by.column = FALSE, align = "right")
Where:
t = seq(0,15,1)
r = (100+50*sin(0.8*t))
df = as.data.frame(t,r)
I get 15 models, but they are all estimated over the entire dataset, providing the same intercepts and coefficients. This is strange as I managed to make rollapply work just before testing it in a new script. For some reason it does not work again, so I am perplexed as to whether R is playing tricks on me, or whether there is something wrong with my code.
How can I adjust these methods to make sure they iterate according to my wishes?
I enclose a possible solution. The idea is to pass the vector 1:nrow(df) to rollapply so that each window indicates which rows we want to select.
df = data.frame(t = 0:6, r = c(100.00000, 135.86780, 149.97868, 133.77316, 97.08129, 62.15988, 50.19177))
N = nrow(df)
require(zoo)
# Coefficients
coefs <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- coef(lm(r~t))
return(out)
})
# Residuals
res <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- residuals(lm(r~t))
return(out)
})
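The same row-index idea also works without zoo. The following is a base R sketch, assuming a window length of k = 3 as in the rollapply calls; the names k, starts and models are illustrative.

```r
df <- data.frame(t = 0:6,
                 r = c(100.00000, 135.86780, 149.97868, 133.77316,
                       97.08129, 62.15988, 50.19177))

k <- 3                                 # window length (assumed, as above)
starts <- seq_len(nrow(df) - k + 1)    # first row of each window

# One model per window: rows i, i + 1, ..., i + k - 1.
models <- lapply(starts, function(i) lm(r ~ t, data = df[i:(i + k - 1), ]))

coefs <- t(sapply(models, coef))        # one row per window: intercept and slope
res   <- t(sapply(models, residuals))   # one row per window: k residuals
dim(coefs)  # 5 2
```

Keeping the fitted models in a list makes it possible to extract other quantities later (for example with summary()) without refitting.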