R error when calling a function with lapply

I have a dataset with a column of numbers:
dati$score[10:15]
[1] 7576 6362 764663 676164 764676 6364
I have this function, which I found here on Stack Overflow, that calculates the sum of the digits of the number in a cell. It works when I apply it to a single value:
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
I can't apply it to the whole column dati$score. I've tried a for loop and lapply, and I get an error. The for loop:
for (i in 1:lunghscore){
f <- dati[i,"score"]
post <- sum(floor(f / 10^(0:(nchar(f) - 1))) %% 10)
dati[i,"score"] <- post
i <- i + 1
}
With lapply:
dati[,"score"] <- lapply(X = dati[,"score"],FUN = digitsum)
I get this message:
2: In `[<-.data.frame`(`*tmp*`, , "score", value = list(20, 17, 26, :
provided 66121 variables to replace 1 variables
How can I apply the function digitsum to every cell in that column?

The problem is that lapply() always returns a list, and you are trying to fill a column (a vector) with that list. Your code works if you unlist the result of lapply(), as shown in the toy example below:
> digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
> dati <- data.frame(matrix(250:255, ncol = 2))
> dati
X1 X2
1 250 253
2 251 254
3 252 255
> lapply(dati[, "X2"], digitsum)
[[1]]
[1] 10
[[2]]
[1] 11
[[3]]
[1] 12
> dati[, "X2"]<-lapply(dati[, "X2"], digitsum)
Warning message:
In `[<-.data.frame`(`*tmp*`, , "X2", value = list(10, 11, 12)) :
provided 3 variables to replace 1 variables
And the solution:
> dati[, "X2"]<-unlist(lapply(dati[, "X2"], digitsum))
Best, Thomas
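As an alternative to unlist(lapply(...)), here is a minimal sketch using vapply(), which returns an atomic vector directly. The small data frame is a stand-in built from the six values shown in the question, and the column name digit_sum is introduced here only for illustration.

```r
# Digit-sum helper taken from the question.
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)

# Stand-in for the real data: the six scores shown in the question.
dati <- data.frame(score = c(7576, 6362, 764663, 676164, 764676, 6364))

# vapply() returns a plain numeric vector (one value per element),
# so it can be assigned to a data frame column without unlist().
dati$digit_sum <- vapply(dati$score, digitsum, numeric(1))
dati$digit_sum
# [1] 25 17 32 30 36 19
```

vapply() also checks that every call returns exactly one numeric value, which can catch surprises that sapply() would silently pass through.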

Related

reading 0 in front of numeric

vec1 <- c(26, 12, 13, 20, 9)
vac1 <- decode_vec(vec1)
The result :
vac1 : "11010" "01100" "01101" "10100" "01001"
I'm converting vac1 to numeric, but the leading 0 keeps disappearing.
test_1 <- as.numeric(vac1)
result
11010 1100 1101 10100 1001
I tried to recover the leading 0 with sprintf(), but that converted the vector back to character.
test_2 <- sprintf("%05d", test_1)
"11010" "01100" "01101" "10100" "01001"
I want to turn the morse codes into numeric form without losing any leading 0.
Numeric vectors will not print with 0 prefixes but we can define our own S3 class that does. We have defined as.bin.numeric, as.data.frame.bin, format.bin and print.bin methods. We have not defined a [.bin method so subscripted values will have to be cast back as shown below. Internally the bin class is stored as ordinary numbers. Define other methods as needed.
library(dst)
library(zoo)
as.bin <- function(x, ...) UseMethod("as.bin")
as.bin.numeric <- function(x, ...) structure(x, class = "bin")
as.data.frame.bin <- zoo:::as.data.frame.yearmon
format.bin <- function(x, ...) {
x <- unclass(x)
n <- max(floor(log2(x)) + 1)
base <- rep(2, n)
sapply(x, function(y) paste0(encode(base, y), collapse = ""))
}
print.bin <- function(x, ...) print(format(x), ...)
Now test these
v0 <- c(26, 12, 13, 20, 9)
v <- as.bin(v0)
as.numeric(v)
## [1] 26 12 13 20 9
v
## [1] "11010" "01100" "01101" "10100" "01001"
v + 1
## [1] "11011" "01101" "01110" "10101" "01010"
as.bin(v[1])
## [1] "11010"
data.frame(v = v)
## v
## 1 11010
## 2 01100
## 3 01101
## 4 10100
## 5 01001
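If the extra packages are not available, the same idea (keep the numbers numeric and add the leading zeros only when formatting) can be sketched in base R. The helper name to_bits and its width argument are hypothetical and not part of the answer above; the sketch assumes non-negative integers that fit in the given width.

```r
# Hypothetical helper: format non-negative integers as fixed-width binary strings.
to_bits <- function(x, width = 5) {
  vapply(x, function(y) {
    # intToBits() yields 32 bits, least significant first; keep the lowest `width`.
    bits <- as.integer(intToBits(y))[seq_len(width)]
    paste(rev(bits), collapse = "")
  }, character(1))
}

v0 <- c(26, 12, 13, 20, 9)   # stays numeric for arithmetic
bits <- to_bits(v0)          # character, only for display
bits
# [1] "11010" "01100" "01101" "10100" "01001"

# Round trip back to numbers when needed.
strtoi(bits, base = 2)
# [1] 26 12 13 20  9
```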

Sum all integers > 9 individually in R. E.g. 10 = 1+0, 11 = 1+1

I'm trying to write a function based on the Luhn algorithm (mod 10 algorithm), and I need a function that sums the digits of every integer > 9 in my number vector individually. E.g. 10 should sum to 1+0=1, and 19 should sum to 1+9=10. Example code:
nmr <- ("1_9_8_2_0_5_0_1_3_3_4_8")
nmr <- strsplit(nmr, "_")
nmr <- as.numeric(as.character(unlist(nmr[[1]])))
luhn_alg <- c(0,0,2,1,2,1,2,1,2,1,2,0)
x <- nmr*luhn_alg
x
[1] 0 0 16 2 0 5 0 1 6 3 8 0
sum(x)
[1] 41
I don't want the sum of x to equal 41. Instead I want the sum to equal 0+0+1+6+2+0+5+0+1+6+3+8+0=32. I tried a for loop but can't seem to get it right. Any help is much appreciated.
You may need to split the data again after multiplying it with luhn_alg.
Luhn_sum <- function(x, y) {
nmr <- as.numeric(unlist(strsplit(x, "_")))
x1 <- nmr*y
x1 <- as.numeric(unlist(strsplit(as.character(x1), '')))
sum(x1)
}
nmr <- ("1_9_8_2_0_5_0_1_3_3_4_8")
luhn_alg <- c(0,0,2,1,2,1,2,1,2,1,2,0)
Luhn_sum(nmr, luhn_alg)
#[1] 32
You can use substring and seq to create a vector of single digit numbers, then you only need to do a sum over them:
sum(
as.numeric(
substring(
paste(x, collapse = ""),
seq(1, sum(nchar(x)), 1),
seq(1, sum(nchar(x)), 1)
)
)
)
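Because every product in this setup is at most 2 * 9 = 18, the digit sum can also be computed arithmetically, without converting to character. This is a sketch under that assumption (values below 100); it reuses the vectors from the question, and the name digit_total is introduced here for illustration.

```r
nmr <- c(1, 9, 8, 2, 0, 5, 0, 1, 3, 3, 4, 8)
luhn_alg <- c(0, 0, 2, 1, 2, 1, 2, 1, 2, 1, 2, 0)
x <- nmr * luhn_alg

# For values below 100: tens digit (x %/% 10) plus units digit (x %% 10).
digit_total <- sum(x %/% 10 + x %% 10)
digit_total
# [1] 32
```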

Multiply values of column with itself in R

I am trying to multiply the elements of a column with each other, but I am unable to do it.
I have column A with values a, b, c, and I want the answer (a*b + a*c + b*c).
For example, with
A <- c(2, 3, 5) the expected output is sum(6 + 10 + 15) = 31.
I tried to do this with a for loop but failed. Can anyone please provide R code to do this?
example data :
df1 <- data.frame(A=c(2,3,5))
combn will give you the combinations
combinations <- combn(df1$A,2)
# [,1] [,2] [,3]
# [1,] 2 2 3
# [2,] 3 5 5
apply with margin 2 (by columns), will do the multiplication
multiplied_terms <- apply(combinations,2,function(x) x[1]*x[2])
# [1] 6 10 15
Or shorter and more general, thanks to @zacdav:
multiplied_terms <- apply(combinations,2,prod)
then we can sum them
output <- sum(multiplied_terms)
# [1] 31
Piped for a compact solution:
library(magrittr)
df1$A %>% combn(2) %>% apply(2,prod) %>% sum
Here's another way. The approach by @Moody_Mudskipper may be easier to extend to groups of 3 etc., but I think this should be much faster since there is no need to actually build the combinations.
Using for loop
It just goes through the vector A multiplying the rest of the elements until the last one.
A <- df1$A
len <- length(A)
res <- 0
for (j in seq_len(len - 1))
res <- res + sum(A[j] * A[(j+1) : len])
res
#[1] 31
Using lapply or sapply
The for loop can be replaced by using lapply
res <- sum(unlist(lapply(1 : (len - 1), function(j) sum(A[j] * A[(j+1) : len]))))
or sapply,
res <- sum(sapply(1 : (len - 1), function(j) sum(A[j] * A[(j+1) : len])))
I didn't check which of these is the fastest.
# If you need to store the pairwise multiplications, then use the following;
# res <- NULL
# for (j in 1 : (len-1))
# res <- c(res, A[j] * A[(j+1) : len])
# res
# [1] 6 10 15
# sum(res)
# [1] 31
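For the sum alone, no loop is needed. Expanding (a + b + c)^2 gives the sum of squares plus twice the sum of all pairwise products, so the pairwise sum is (sum(A)^2 - sum(A^2)) / 2. A minimal sketch of that identity, using the example vector from the question (the name pair_sum is introduced here for illustration):

```r
A <- c(2, 3, 5)

# (sum of all)^2 = sum of squares + 2 * (sum of pairwise products)
pair_sum <- (sum(A)^2 - sum(A^2)) / 2
pair_sum
# [1] 31

# Cross-check against the combn() approach.
sum(combn(A, 2, FUN = prod))
# [1] 31
```

This runs in linear time, whereas the number of pairs grows quadratically with the length of A.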

R: find consecutive occurrence of a number

First, define some functions to bind a list row-wise and column-wise:
# a function to append vectors row wise
rbindlist <- function(list) {
n <- length(list)
res <- NULL
for (i in seq(n)) res <- rbind(res, list[[i]])
return(res)
}
cbindlist <- function(list) {
n <- length(list)
res <- NULL
for (i in seq(n)) res <- cbind(res, list[[i]])
return(res)
}
# generate sample data
sample.dat <- list()
set.seed(123)
for(i in 1:365){
vec1 <- sample(c(0,1), replace=TRUE, size=5)
sample.dat[[i]] <- vec1
}
dat <- rbindlist(sample.dat)
dat has five columns. Each column is a location and has 365 days of the year (365 rows) with values 1 or 0.
I have another dataframe (see below) which has certain days of the year for each column (location) in dat.
# generate second sample data
set.seed(123)
sample.dat1 <- list()
for(i in 1:5){
vec1 <- sort(sample(c(258:365), replace=TRUE, size=4), decreasing = F)
sample.dat1[[i]] <- vec1
}
dat1 <- cbindlist(sample.dat1)
I need to use dat1 to subset days in dat to do a calculation. An example below:
1) For location 1 (first column in both dat1 and dat):
In column 1 of dat, select the days from 289 till 302 (using dat1), find the longest consecutive occurrence of 1.
Repeat it and this time select the days from 303 (302 + 1) till 343 from dat, find the longest consecutive occurrence of 1.
Repeat it for 343 till 353: select the days from 344 (343 + 1) till 353, find the longest consecutive occurrence of 1.
2) Do this for all the columns
If I want to do sum of 1s, I can do this:
dat <- as.tibble(dat)
dat1 <- as.tibble(dat1)
pmap(list(dat,dat1), ~ {
range1 <- ..2[1]
range2 <- ..2[2]
range3 <- ..2[3]
range4 <- ..2[4]
sum.range1 <- sum(..1[range1:range2]) # this will generate sum between range 1 and range 2
sum.range2 <- sum(..1[range2:range3]) # this will generate sum between range 2 and range 3
sum.range3 <- sum(..1[range3:range4]) # this will generate sum between range 3 and range 4
c(sum.range1=sum.range1,sum.range2=sum.range2,sum.range3=sum.range3)
})
For the longest consecutive occurrence of 1 within each range, I thought of using the rle function. Example below:
pmap(list(dat,dat1), ~ {
range1 <- ..2[1]
range2 <- ..2[2]
range3 <- ..2[3]
range4 <- ..2[4]
spell.range1 <- rle(..1[range1:range2]) # run-length encode the range; this covers runs of ANY value (0 OR 1)
spell.1.range1 <- tapply(spell.range1$lengths, spell.range1$values, max)[2] # this should select the maximum consecutive run of 1
spell.range2 <- rle(..1[range2:range3]) # run-length encode the range; this covers runs of ANY value (0 OR 1)
spell.1.range2 <- tapply(spell.range2$lengths, spell.range2$values, max)[2] # this should select the maximum consecutive run of 1
spell.range3 <- rle(..1[range3:range4]) # run-length encode the range; this covers runs of ANY value (0 OR 1)
spell.1.range3 <- tapply(spell.range3$lengths, spell.range3$values, max)[2] # this should select the maximum consecutive run of 1
c(spell.1.range1 = spell.1.range1, spell.1.range2 = spell.1.range2, spell.1.range3 = spell.1.range3)
})
I get an error, which I think is because I am not using the rle function properly here. I would really like to keep the code as above, since my other code follows the same pattern and the format of the output suits my needs, so I would appreciate it if someone could suggest how to fix it.
The OP's code does work for me, so without a specific error message it is impossible to tell why it fails for the OP.
However, the sample datasets created by the OP are matrices (before they were coerced to tibbles), and I felt challenged to find a way to solve the task in base R without using purrr.
To find the length of the longest run of a particular value val in a vector x, we can use the following function:
max_rle <- function(x, val) {
y <- rle(x)
len <- y$lengths[y$values == val]
if (length(len) > 0) max(len) else NA
}
Examples:
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 1)
[1] 4
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 0)
[1] 2
# find consecutive occurrences in column batches
lapply(seq_len(ncol(dat1)), function(col_num) {
start <- head(dat1[, col_num], -1L)
end <- tail(dat1[, col_num], -1L) - 1
sapply(seq_along(start), function(range_num) {
max_rle(dat[start[range_num]:end[range_num], col_num], 1)
})
})
[[1]]
[1] 8 4 5
[[2]]
[1] 4 5 2
[[3]]
[1] NA 3 4
[[4]]
[1] 5 5 4
[[5]]
[1] 3 2 3
The outer lapply() loops over the columns of dat and dat1. The inner sapply() loops over the row ranges stored in dat1 and subsets dat accordingly.
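To make the range logic easier to follow, here is a small deterministic sketch for a single column. The vector x and the break days in breaks are made up for illustration. As in the answer's code, each break starts the next range, so a range ends one day before the following break and the final break day itself is not included.

```r
# Longest run of `val` in `x`, or NA if `val` does not occur.
max_rle <- function(x, val) {
  y <- rle(x)
  len <- y$lengths[y$values == val]
  if (length(len) > 0) max(len) else NA
}

x <- c(1, 1, 0, 1, 1, 1, 0, 0, 1, 1)  # hypothetical daily 0/1 values
breaks <- c(1, 4, 8, 10)              # hypothetical break days

start <- head(breaks, -1L)    # 1 4 8
end <- tail(breaks, -1L) - 1  # 3 7 9

# Ranges are 1:3, 4:7 and 8:9.
res <- sapply(seq_along(start), function(i) max_rle(x[start[i]:end[i]], 1))
res
# [1] 2 3 1

# A range without any 1 yields NA.
max_rle(c(0, 0, 0), 1)
# [1] NA
```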

R Sum every k columns in matrix

I have a matrix temp1 (dimensions Nx16) (generally, NxM)
I would like to sum every k columns in each row to one value.
Here is what I got to so far:
cbind(rowSums(temp1[,c(1:4)]), rowSums(temp1[,c(5:8)]), rowSums(temp1[,c(9:12)]), rowSums(temp1[,c(13:16)]))
There must be a more elegant (and generalized) method to do it.
I noticed a similar question here:
sum specific columns among rows
but I couldn't make it work with Ananda's solution. I got the following error:
sapply(split.default(temp1, 0:(length(temp1)-1) %/% 4), rowSums)
Error in FUN(X[[1L]], ...) :
'x' must be an array of at least two dimensions
Please advise.
You can use by:
do.call(cbind, by(t(temp1), (seq(ncol(temp1)) - 1) %/% 4, FUN = colSums))
If the dimensions are equal for the sub matrices, you could change the dimensions to an array and then do the rowSums
m1 <- as.matrix(temp1)
n <- 4
dim(m1) <- c(nrow(m1), ncol(m1)/n, n)
res <- matrix(rowSums(apply(m1, 2, I)), ncol=n)
identical(res[,1],rowSums(temp1[,1:4]))
#[1] TRUE
Or if the dimensions are unequal
t(sapply(seq(1,ncol(temp2), by=4), function(i) {
indx <- i:(i+3)
rowSums(temp2[indx[indx <= ncol(temp2)]])}))
data
set.seed(24)
temp1 <- as.data.frame(matrix(sample(1:20, 16*4, replace=TRUE), ncol=16))
set.seed(35)
temp2 <- as.data.frame(matrix(sample(1:20, 17*4, replace=TRUE), ncol=17))
Another possibility:
x1<-sapply(1:(ncol(temp1)/4),function(x){rowSums(temp1[,1:4+(x-1)*4])})
## check
x0<-cbind(rowSums(temp1[,c(1:4)]), rowSums(temp1[,c(5:8)]), rowSums(temp1[,c(9:12)]), rowSums(temp1[,c(13:16)]))
identical(x1,x0)
# TRUE
Here's another approach. Convert the matrix to an array and then use apply with sum.
n <- 4
apply(array(temp1, dim=c(dim(temp1)/c(1,n), n)), MARGIN=c(1,3), FUN=sum)
Using @akrun's data
set.seed(24)
temp1 <- matrix(sample(1:20, 16*4, replace=TRUE), ncol=16)
Here is a function that sums the matrix columns in groups of n columns, returning one total per group:
set.seed(1618)
mat <- matrix(rnorm(24 * 16), 24, 16)
f <- function(mat, n = 4) {
if (ncol(mat) %% n != 0)
stop("ncol(mat) must be a multiple of n")
cols <- split(colSums(mat), rep(1:(ncol(mat) / n), each = n))
## or use this to have n mean the number of groups you want
# cols <- split(colSums(mat), rep(1:n, each = ncol(mat) / n))
sapply(cols, sum)
}
f(mat, 4)
# 1 2 3 4
# -17.287137 -1.732936 -5.762159 -4.371258
c(sum(mat[,1:4]), sum(mat[,5:8]), sum(mat[,9:12]), sum(mat[,13:16]))
# [1] -17.287137 -1.732936 -5.762159 -4.371258
More examples:
## first 8 and last 8 cols
f(mat, 8)
# 1 2
# -19.02007 -10.13342
## each group is 16 cols, ie, the entire matrix
f(mat, 16)
# 1
# -29.15349
sum(mat)
# [1] -29.15349
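Another option for the original per-row question is base R's rowsum(), which sums the rows of a matrix by group. Applied to the transposed matrix, it sums every k columns within each row. The sketch below uses the matrix version of temp1 from above; the names k and grp are introduced here for illustration.

```r
set.seed(24)
temp1 <- matrix(sample(1:20, 16 * 4, replace = TRUE), ncol = 16)

k <- 4
# Group label per column: 0 for columns 1..k, 1 for the next k, and so on.
grp <- (seq_len(ncol(temp1)) - 1) %/% k

# rowsum() sums rows by group, so transpose, sum, and transpose back.
res <- t(rowsum(t(temp1), grp))
dim(res)
# [1] 4 4

# Cross-check the first group against the explicit version.
all(res[, 1] == rowSums(temp1[, 1:4]))
# [1] TRUE
```

Since the grouping is built with integer division, a final group with fewer than k columns is handled without special cases.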
