I have vectors of different lengths.
For example,
a1 = c(1,2,3,4,5,6,7,8,9,10)
a2 = c(1,3,4,5)
a3 = c(1,2,5,6,9)
I want to stretch out a2 and a3 to the length of a1, so that I can run some algorithms on them that require the vectors to have the same length. I could truncate a1 to the length of a2 and a3, but I would lose valuable data.
For example, perhaps a2 could look something like 1 1 1 3 3 3 4 4 5 5?
Any suggestions would be great!
thanks
EDIT: I need it to work for vectors with duplicate values, such as c(1,1,2,2,2,2,3,3), and the stretched values should reflect how often each value occurs in the original vector. For example, if I stretched the example vector out to a length of 100, I would expect more twos than ones.
It sounds like you're looking for something like:
lengthen <- function(vec, length) {
vec[sort(rep(seq_along(vec), length.out = length))]
}
lengthen(a2, length(a1))
# [1] 1 1 1 3 3 3 4 4 5 5
lengthen(a3, length(a1))
# [1] 1 1 2 2 5 5 6 6 9 9
lengthen(a4, length(a1))
# [1] 5 5 5 1 1 1 3 3 4 4
lengthen(a5, length(a1))
# [1] 1 1 1 1 1 1 4 4 5 5
Where:
a1 = c(1,2,3,4,5,6,7,8,9,10)
a2 = c(1,3,4,5)
a3 = c(1,2,5,6,9)
a4 = c(5,1,3,4)
a5 = c(1,1,4,5)
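As a quick check against the EDIT in the question, here is a small sketch that applies the same lengthen function to the vector with duplicates and stretches it to length 100. The names dup, stretched and counts are only for illustration.

```r
# Same function as above, repeated so the example is self-contained
lengthen <- function(vec, length) {
  vec[sort(rep(seq_along(vec), length.out = length))]
}

dup <- c(1, 1, 2, 2, 2, 2, 3, 3)   # example vector from the question's EDIT
stretched <- lengthen(dup, 100)

# Count how often each value appears after stretching
counts <- table(stretched)
counts
```

Each position of the original vector is repeated about equally often, so the twos (four positions) end up roughly twice as frequent as the ones (two positions).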
One way could be to create a sequence between the minimum and maximum of each vector with a defined length.
#Put the data in a list
list_data <- list(a1 = a1, a2 = a2, a3 = a3)
#Get the max length
max_len <- max(lengths(list_data))
#Create a sequence
list_data <- lapply(list_data, function(x)
seq(min(x), max(x), length.out = max_len))
#$a1
# [1] 1 2 3 4 5 6 7 8 9 10
#$a2
# [1] 1.000 1.444 1.889 2.333 2.778 3.222 3.667 4.111 4.556 5.000
#$a3
# [1] 1.000 1.889 2.778 3.667 4.556 5.444 6.333 7.222 8.111 9.000
Get them in separate vectors if needed:
list2env(list_data, .GlobalEnv)
This, however, does not guarantee that your original data points remain in the data. For example, a2 contained 3 and 4, but neither value is present in the modified vector.
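To illustrate that caveat, the sketch below first checks that 3 and 4 are missing from the seq() result for a2. It then shows one possible alternative that is not part of the answer above: linear interpolation along the positions of the vector with approx() from the stats package. This is only a sketch under the assumption that interpolated (non-integer) values are acceptable for your algorithm; the names by_seq and by_approx are made up for the example.

```r
a1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
a2 <- c(1, 3, 4, 5)

# seq() only looks at min and max, so the interior values are not kept
by_seq <- seq(min(a2), max(a2), length.out = length(a1))
c(3, 4) %in% by_seq

# Assumed alternative: interpolate between the original points by position,
# so the overall shape of a2 is followed rather than a straight line
by_approx <- approx(seq_along(a2), a2, n = length(a1))$y
by_approx
```

Whether the original values land exactly on the interpolation grid depends on the two lengths, so this does not guarantee that they are preserved either.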
I have a list of factors and an initial base number (e.g. 100). I would like to multiply the base by factor 1 to fill the first position, then multiply that result by factor 2 to fill the second position, then multiply that result by factor 3, and so on. Please see the sample data and code below for details.
Sample Data:
base <- 100
myList3 <- list()
myList3[[1]]<- as.data.frame(matrix(c(1,1,1,1,2,3,1,0.9,0.8), nrow=3, ncol=3))
myList3[[2]]<- as.data.frame(matrix(c(2,2,2,1,2,3,1,0.8,0.7), nrow=3, ncol=3))
myList3[[3]]<- as.data.frame(matrix(c(3,3,3,1,2,3,1,0.8,0.9), nrow=3, ncol=3))
colnames <- c("path","month", "factor")
factor<-lapply(myList3, setNames,colnames)
print(factor)
> print(factor)
[[1]]
path month factor
1 1 1 1.0
2 1 2 0.9
3 1 3 0.8
[[2]]
path month factor
1 2 1 1.0
2 2 2 0.8
3 2 3 0.7
[[3]]
path month factor
1 3 1 1.0
2 3 2 0.8
3 3 3 0.9
I tried to write a function, but it is not working:
Function <- function(x,y) {
for (k in 2:100){
x[1,3] <- base
x[k,3] <- x[k-1,3]*y[k,3]
}
return(x)
}
x <- lapply(Function,x,y)
Desired Output:
myList3 <- list()
myList3[[1]]<- as.data.frame(matrix(c(1,1,1,1,2,3,100,90,72), nrow=3, ncol=3))
myList3[[2]]<- as.data.frame(matrix(c(2,2,2,1,2,3,100,80,56), nrow=3, ncol=3))
myList3[[3]]<- as.data.frame(matrix(c(3,3,3,1,2,3,100,80,72), nrow=3, ncol=3))
colnames <- c("path","month", "data")
data<-lapply(myList3, setNames,colnames)
print(data)
> print(data)
[[1]]
path month data
1 1 1 100
2 1 2 90
3 1 3 72
[[2]]
path month data
1 2 1 100
2 2 2 80
3 2 3 56
[[3]]
path month data
1 3 1 100
2 3 2 80
3 3 3 72
We can use transform with cumprod
lapply(factor, transform, factor = cumprod(factor) * 100)
#[[1]]
# path month factor
#1 1 1 100
#2 1 2 90
#3 1 3 72
#[[2]]
# path month factor
#1 2 1 100
#2 2 2 80
#3 2 3 56
#[[3]]
# path month factor
#1 3 1 100
#2 3 2 80
#3 3 3 72
Or another option is Reduce with *
lapply(factor, transform, factor = 100 * Reduce(`*`, factor, accumulate = TRUE))
The tidyverse option would be
library(dplyr)
library(purrr)
map(factor, ~ .x %>%
mutate(factor = cumprod(factor) * 100))
NOTE: It is better not to give objects or columns the names of existing functions (factor is a function in R).
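Following that note, here is a minimal sketch of the cumprod approach that avoids the name factor and uses the base variable instead of a hard-coded 100. The names factor_list, multiplier and result are hypothetical and chosen only for this example; the numbers are the sample data from the question.

```r
base <- 100

# Hypothetical names: 'factor_list' and 'multiplier' replace 'factor'
factor_list <- list(
  data.frame(path = 1, month = 1:3, multiplier = c(1, 0.9, 0.8)),
  data.frame(path = 2, month = 1:3, multiplier = c(1, 0.8, 0.7)),
  data.frame(path = 3, month = 1:3, multiplier = c(1, 0.8, 0.9))
)

# The running product of the multipliers, scaled by the base value
result <- lapply(factor_list, function(df) {
  df$data <- base * cumprod(df$multiplier)
  df
})
result
```

The data column reproduces the desired output from the question (100, 90, 72 for the first path, and so on), up to floating-point rounding.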
Based on my previous question, I need help with using the mapply function correctly.
x <- data.frame(a = seq(1,3), b = seq(2,4), c = seq(3,5), d = seq(4,6), b2 = seq(5,7), c2 = seq(6,8), d2 = seq(7,9))
# a b c d b2 c2 d2
# 1 2 3 4 5 6 7
# 2 3 4 5 6 7 8
# 3 4 5 6 7 8 9
My goal is to look at the columns b2 to d2 and, based on their values, change the values in columns b to d respectively. I can do this for a single column quite easily:
x[which(x$b2 == 7),]["b"] <- NA_real_
My problem is that I want this applied across all my columns but I don't know how to convert this single column formula to work on multiple columns. I tried:
onez <- c(2:4)
twoz <- c(5:7)
f <- function(df, ones, twos) {
df[which(df[,twos] == 7),][ones] <- NA_real_
}
mapply(f, df = x, ones = onez, twos = twoz)
But I'm getting error messages (incorrect dimensions etc.). I can see that my function is messy, but I don't know how to fix it.
One way to do it is to:
Get the subset of the data frame with columns 5, 6, 7: x[5:7]
Check from that subset which values satisfy your condition: x[5:7] == 7
Replace those values with NA: ... <- NA
This gives the following:
x[5:7][x[5:7] == 7] <- NA
x
# a b c d b2 c2 d2
#1 1 2 3 4 5 6 NA
#2 2 3 4 5 6 NA 8
#3 3 4 5 6 NA 8 9
If you instead want the NAs to be placed in the corresponding columns x[2:4] (starting again from the original x), you can do:
x[2:4][x[5:7] == 7] <- NA
x
# a b c d b2 c2 d2
#1 1 2 3 NA 5 6 7
#2 2 3 NA 5 6 7 8
#3 3 NA 5 6 7 8 9
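The same idea can be sketched with column names instead of positions, which may be easier to maintain when the real data has more columns. Note that mapply iterates over the columns of a data frame passed to it, and the function in the question only changes a local copy, which is why that attempt does not do what was intended. The names ones, twos and mask are only for illustration, and the sketch assumes that each target column has a partner column with the suffix "2".

```r
x <- data.frame(a = seq(1, 3), b = seq(2, 4), c = seq(3, 5), d = seq(4, 6),
                b2 = seq(5, 7), c2 = seq(6, 8), d2 = seq(7, 9))

ones <- c("b", "c", "d")      # columns to modify
twos <- paste0(ones, "2")     # assumed naming scheme for the condition columns

# Logical matrix marking where the condition columns equal 7
mask <- x[twos] == 7

# Set the matching cells of the target columns to NA in one step
x[ones][mask] <- NA
x
```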
I want to bind more than two tables together with rbind() in R.
Based on my experience with R problems, I am sure the solution is easy, but I don't get it. Please see this example data:
# create sample data
set.seed(0)
df <- data.frame(A = 0,
B1 = sample(c(1:3, NA), 10, replace=TRUE),
B2 = sample(c(1:3, NA), 10, replace=TRUE),
B3 = sample(c(1:3, NA), 10, replace=TRUE),
C = 0)
# names of relevant objects
n <- names(df)[startsWith(names(df), 'B')]
As you can see (in n), I only want to use a selection of the columns of the data.frame.
Now I create tables from them and bind their rows together for better presentation.
t1 <- table(df$B1, useNA="always")
t2 <- table(df$B2, useNA="always")
t3 <- table(df$B3, useNA="always")
# this is a workaround
print( rbind(t1, t2, t3) )
But I would like to simplify this code, because my real data has many more than three tables.
This doesn't work:
# this is what I "want" but doesn't work
print( rbind( table(df[,n])) )
# another try
do.call('rbind', list(table(df[,n])))
Where is the error in my thinking?
We can lapply over the selected columns, call table on each of them, and join the results together using rbind:
do.call("rbind", lapply(df[n], function(x) table(x, useNA = "always")))
# 1 2 3 <NA>
#B1 1 2 3 4
#B2 3 3 2 2
#B3 3 3 1 3
This can also be done using apply with margin = 2 (column-wise)
t(apply(df[n], 2, function(x) table(x, useNA = "always")))
# 1 2 3 <NA>
#B1 1 2 3 4
#B2 3 3 2 2
#B3 3 3 1 3
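One caveat with both variants: they rely on every column containing the same set of values, so that all the tables have the same length. If that might not hold in the real data, a possible sketch is to convert each column to a factor with a shared set of levels first. The small data frame and the names lv and res below are made up for this illustration.

```r
# Hypothetical data where B1 never contains 3 and B2 never contains 2
df <- data.frame(B1 = c(1, 2, NA), B2 = c(3, 3, 1))

# Shared levels across all selected columns (sort() drops the NA)
lv <- sort(unique(unlist(df)))

# With common levels, every table has the same columns, including zero counts
res <- do.call("rbind", lapply(df, function(x) {
  table(factor(x, levels = lv), useNA = "always")
}))
res
```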
You can do:
table(stack(df[n])[2:1],useNA = 'always')[-4,]
values
ind 1 2 3 <NA>
B1 1 2 3 4
B2 3 3 2 2
B3 3 3 1 3
If you do not want to reverse the columns using [2:1], you can transpose instead:
t(table(stack(df[n]),useNA = 'always'))[-4,]
values
ind 1 2 3 <NA>
B1 1 2 3 4
B2 3 3 2 2
B3 3 3 1 3
If you want it as a data.frame:
as.data.frame.matrix(table(stack(df[n])[2:1],useNA = 'always')[-4,])
1 2 3 NA
B1 1 2 3 4
B2 3 3 2 2
B3 3 3 1 3
I want to process a data frame as follows: for each row, I want the sum of two vectors, appended to a new data frame as a row. The two vectors are the row vector of the row under consideration and a column vector of fixed length that starts just below that row.
data
A b1 b2 b3
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
5 6 6 6
output (expected)
A b1 b2 b3
1 4 5 6
2 6 7 8
3 8 9 -
4 10 - -
5 - - -
In the example if 1st row is considered, two vectors are
row vector r- [2 2 2]
column vector c - [2,3,4]
After transposing the column vector, I can add the two vectors and append the result to a new data frame. This must be done for every row.
The easiest way to do this is with a loop, but loops in R are not efficient; an apply function could be used instead. However, in this scenario that requires knowing the current row number.
Is there a way to do this efficiently in R?
1) rollapply We can use rollapply to form the matrix of subvectors of A and then add that together with an initial column of zero to m. Note that we pad A with NA values so that the result of rollapply is the appropriate shape.
library(zoo)
m <- cbind(A = 1:5, b1 = 2:6, b2 = 2:6, b3 = 2:6) # input matrix
nc1 <- ncol(m) - 1
A <- c(m[, 1], rep(NA, nc1))
cbind(0, rollapply(A[-1], nc1, c)) + m
giving:
A b1 b2 b3
[1,] 1 4 5 6
[2,] 2 6 7 8
[3,] 3 8 9 NA
[4,] 4 10 NA NA
[5,] 5 NA NA NA
2) base This solution is similar but does not use any packages. The first two lines are the same as in (1).
nc1 <- ncol(m) - 1
A <- c(m[, 1], rep(NA, nc1))
cbind(0, embed(A[-1], nc1)[, seq(nc1, 1)]) + m
giving:
A b1 b2 b3
[1,] 1 4 5 6
[2,] 2 6 7 8
[3,] 3 8 9 NA
[4,] 4 10 NA NA
[5,] 5 NA NA NA
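For readers who prefer to see the index arithmetic spelled out, here is a sketch of the same computation without embed or rollapply. It relies on the observation that the value added to row i, column j is A[i + j], and that indexing a vector past its end returns NA. The names k, idx, shifted and out are only for illustration.

```r
m <- cbind(A = 1:5, b1 = 2:6, b2 = 2:6, b3 = 2:6)  # input matrix
A <- m[, "A"]
k <- ncol(m) - 1                                   # number of b columns

# idx[i, j] = i + j: position in A of the value to add at row i, column j
idx <- outer(seq_len(nrow(m)), seq_len(k), `+`)

# Positions beyond length(A) yield NA, which gives the ragged lower corner
shifted <- matrix(A[as.vector(idx)], nrow = nrow(m))

out <- m
out[, -1] <- m[, -1] + shifted
out
```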
I have a vector that tells me, for each row in a data frame, the column index for which the value in this row should be updated.
> set.seed(12008); n <- 10000; d <- data.frame(c1=1:n, c2=2*(1:n), c3=3*(1:n))
> i <- sample.int(3, n, replace=TRUE)
> head(d); head(i)
c1 c2 c3
1 1 2 3
2 2 4 6
3 3 6 9
4 4 8 12
5 5 10 15
6 6 12 18
[1] 3 2 2 3 2 1
This means that for rows 1 and 4, c3 should be updated; for rows 2, 3 and 5, c2 should be updated (among others). What is the cleanest way to achieve this in R using vectorized operations, i.e, without apply and friends? EDIT: And, if at all possible, without R loops?
I have thought about transforming d into a matrix and then addressing the matrix elements using a one-dimensional vector. But I haven't found a clean way to compute the one-dimensional address from the row and column indexes.
With your example data, and using only the first few rows (D and I below) you can easily do what you want via a matrix as you surmise.
set.seed(12008)
n <- 10000
d <- data.frame(c1=1:n, c2=2*(1:n), c3=3*(1:n))
i <- sample.int(3, n, replace=TRUE)
## just work with small subset
D <- head(d)
I <- head(i)
First, convert D into a matrix:
dmat <- data.matrix(D)
Next, compute the indices of the vector representation of the matrix corresponding to the rows and columns indicated by I. The column indices are given by I itself, and the row indices are easy to generate using seq_along(I), which in this simple example is the vector 1:6. To compute the vector indices we can use:
(I - 1) * nrow(D) + seq_along(I)
where the first part ( (I - 1) * nrow(D) ) gives us the correct multiple of the number of rows (6 here) to index the start of the Ith column. We then add on the row index to get the index for the n-th element in the Ith column.
Using this we just index into dmat using "[", treating it like a vector. The replacement version of "[" ("[<-") allows us to do the replacement in a single line. Here I replace the indicated elements with NA to make it easier to see that the correct elements were identified:
> dmat
c1 c2 c3
1 1 2 3
2 2 4 6
3 3 6 9
4 4 8 12
5 5 10 15
6 6 12 18
> dmat[(I - 1) * nrow(D) + seq_along(I)] <- NA
> dmat
c1 c2 c3
1 1 2 NA
2 2 NA 6
3 3 NA 9
4 4 8 NA
5 5 NA 15
6 NA 12 18
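To make the index arithmetic concrete, here is a small self-contained sketch that uses the six column indices printed by head(i) in the question (3 2 2 3 2 1) instead of sampling them. It computes the linear indices and checks that they pick the same cells as a two-column (row, column) index matrix. The names lin and by_pair are only for illustration.

```r
D <- data.frame(c1 = 1:6, c2 = 2 * (1:6), c3 = 3 * (1:6))
I <- c(3, 2, 2, 3, 2, 1)      # column index for each row, as in head(i)
dmat <- data.matrix(D)

# Column-major layout: column I starts at offset (I - 1) * nrow
lin <- (I - 1) * nrow(dmat) + seq_along(I)
lin

# The same cells addressed by (row, column) pairs
by_pair <- dmat[cbind(seq_along(I), I)]
identical(dmat[lin], by_pair)
```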
If you are willing to first convert your data.frame to a matrix, you can index elements-to-be-replaced using a two-column matrix. (Beginning with R-2.16.0, this will be possible with data.frames directly.) The indexing matrix should have row indices in its first column and column indices in its second column.
Here's an example:
## Create a subset of your data
set.seed(12008); n <- 6
D <- data.frame(c1=1:n, c2=2*(1:n), c3=3*(1:n))
i <- seq_len(nrow(D)) # vector of row indices
j <- sample(3, n, replace=TRUE) # vector of column indices
ij <- cbind(i, j) # a 2-column matrix to index a 2-D array
# (This extends smoothly to higher-D arrays.)
## Convert it to a matrix
Dmat <- as.matrix(D)
## Replace the elements indexed by 'ij'
Dmat[ij] <- NA
Dmat
# c1 c2 c3
# [1,] 1 2 NA
# [2,] 2 NA 6
# [3,] 3 NA 9
# [4,] 4 8 NA
# [5,] 5 NA 15
# [6,] NA 12 18
Beginning with R-2.16.0, you will be able to use the same syntax for dataframes (i.e. without having to first convert dataframes to matrices).
From the R-devel NEWS file:
Matrix indexing of dataframes by two column numeric indices is now supported for replacement as well as extraction.
Using the current R-devel snapshot, here's what that looks like:
D[ij] <- NA
D
# c1 c2 c3
# 1 1 2 NA
# 2 2 NA 6
# 3 3 NA 9
# 4 4 8 NA
# 5 5 NA 15
# 6 NA 12 18
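The two-column index matrix works for extraction as well as replacement, so the selected cells can be updated from their old values rather than just set to NA. The sketch below assumes, purely for illustration, that the update is "multiply by 10", and it fixes the column indices to 3 2 2 3 2 1 instead of sampling them. It goes through a matrix and converts back to a data frame at the end; D_new is a made-up name.

```r
D <- data.frame(c1 = 1:6, c2 = 2 * (1:6), c3 = 3 * (1:6))
j <- c(3, 2, 2, 3, 2, 1)              # fixed column indices for the example
ij <- cbind(seq_len(nrow(D)), j)      # (row, column) pairs

Dmat <- as.matrix(D)

# Read the selected cells, transform them, and write them back
Dmat[ij] <- Dmat[ij] * 10

# Convert back to a data frame if that is what the rest of the code expects
D_new <- as.data.frame(Dmat)
D_new
```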
Here's one way:
d[which(i == 1), "c1"] <- "one"
d[which(i == 2), "c2"] <- "two"
d[which(i == 3), "c3"] <- "three"
c1 c2 c3
1 1 2 three
2 2 two 6
3 3 two 9
4 4 8 three
5 5 two 15
6 one 12 18
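If there are many columns, the three assignments above can be folded into a short loop over the columns (not over the rows), which stays cheap even for many rows. This is only a sketch: it uses the first six rows, the column indices 3 2 2 3 2 1 from the question, and a placeholder replacement value of 0.

```r
d <- data.frame(c1 = 1:6, c2 = 2 * (1:6), c3 = 3 * (1:6))
i <- c(3, 2, 2, 3, 2, 1)   # column to update in each row

# One vectorized assignment per column; 0 is a placeholder value
for (k in seq_along(d)) {
  d[i == k, k] <- 0
}
d
```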