Converting an integer column to 3 factor columns based on digits in R

I have a column of ints like this:
idNums
2
101
34
25
8
...
I need to convert them to 3 factor columns like this:
digit1 digit2 digit3
0 0 2
1 0 1
0 3 4
0 2 5
0 0 8
... ... ...
Any suggestions?

Here's a fun solution using the modular arithmetic operators %% and %/%:
d <- c(2, 101, 34, 25, 8)
res <- data.frame(digit1 = d %/% 100,
digit2 = (d %% 100) %/% 10,
digit3 = d %% 10)
# digit1 digit2 digit3
# 1 0 0 2
# 2 1 0 1
# 3 0 3 4
# 4 0 2 5
# 5 0 0 8
Note that it has the minor -- but nice -- side benefit of returning numeric values for each of the columns. If you do, however, want factor columns instead, just follow up with this command:
res[] <- lapply(res, as.factor)
all(sapply(res, class)=="factor")
#[1] TRUE
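The same arithmetic extends to any fixed number of digits. The following is only a sketch and not part of the answer above: split_digits and its argument k are hypothetical names, and it assumes non-negative whole numbers with at most k digits.

```r
# Hypothetical helper: split non-negative whole numbers into k decimal digits.
split_digits <- function(x, k = 3) {
  powers <- 10^((k - 1):0)                        # 100, 10, 1 when k = 3
  # For every (number, power) pair, shift the wanted digit to the units place
  # and keep only that digit.
  digits <- outer(x, powers, function(v, p) (v %/% p) %% 10)
  colnames(digits) <- paste0("digit", seq_len(k))
  res <- as.data.frame(digits)
  res[] <- lapply(res, as.factor)                 # factor columns, as the question asks
  res
}

split_digits(c(2, 101, 34, 25, 8))
```

Drop the lapply line if numeric columns are preferred.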

Use formatC and strsplit.
idNums <- c(2, 101, 34, 25, 8)
idChars <- formatC(idNums, width = 3, flag = "0")
idChars <- strsplit(idChars, "")
data.frame(
  digit1 = sapply(idChars, function(x) x[1]),
  digit2 = sapply(idChars, function(x) x[2]),
  digit3 = sapply(idChars, function(x) x[3])
)
It's a little cleaner using the stringr package. Replace the call to strsplit with the following, which returns a character matrix with one column per digit:
str_split_fixed(idChars, "", 3)
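If adding a package is not an option, a similar matrix can be built in base R by row-binding the pieces that strsplit returns. This is a sketch under the assumption that every padded string has exactly three characters; digitMat and res are illustrative names.

```r
idNums <- c(2, 101, 34, 25, 8)
idChars <- formatC(idNums, width = 3, flag = "0")    # zero-padded strings

# strsplit gives one character vector per number; rbind stacks them as rows.
digitMat <- do.call(rbind, strsplit(idChars, ""))

res <- as.data.frame(digitMat, stringsAsFactors = TRUE)
names(res) <- paste0("digit", 1:3)
res
```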

I thought Richie Cotton's use of formatC was kewl, so I incorporated it:
testdat <- read.fwf(textConnection(formatC(idNums, width = 3, flag = "0") ),
widths=c(1,1,1),
col.names=c("digit1", "digit2", "digit3")
)
testdat
#------------
digit1 digit2 digit3
1 0 0 2
2 1 0 1
3 0 3 4
4 0 2 5
5 0 0 8

Related

Change the column of same values to column of all zeros in R

Assume I have a list called LS1 that contains 20 matrices, each 100 by 5. Some columns might hold a single repeated value, for example a column that is all 100. I want to change such columns to all zeros. I can write a for loop to do that, but I want to do it more efficiently with lapply and apply. For example, one of these matrices might look like this:
1 2 3 4 5
1 3 4 5 6
1 5 6 8 9
I want the first column, which is all ones, to be changed to all zeros.
This is what I have done :
A = lapply(LS1, function(x) {apply(x, 2, function(x1) {if (max(x1) == min(x1))
{0}})})
but this makes all the values NULL. Can anyone suggest doing this with lapply and apply?
This should work, especially for integer matrices.
lapply(LS1,
function(mat) {
all_dupes = apply(mat, 2, function(x) length(unique(x)) ==1)
mat[, all_dupes] = 0L
return(mat)
}
)
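To see why the attempt in the question produces NULLs: an if without an else returns NULL whenever the condition is FALSE, so every non-constant column is lost. The small matrix below is made up for illustration; adding an else branch that returns the column unchanged is one way to keep the apply-based approach.

```r
m <- matrix(c(1, 1, 1,
              2, 3, 5,
              3, 4, 6), nrow = 3)   # columns: constant, varying, varying

# No else branch: non-constant columns come back as NULL.
broken <- apply(m, 2, function(col) if (max(col) == min(col)) 0)

# With an else branch, every column comes back at full length,
# so apply can reassemble a matrix.
fixed <- apply(m, 2, function(col) {
  if (max(col) == min(col)) rep(0, length(col)) else col
})
fixed
```

Wrapping the second call in lapply(LS1, ...) would apply it to every matrix in the list.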
This is my solution:
df <- data.frame(a = c(1, 1, 1),
b = c(2, 3, 5),
c = c(4, 5, 8),
d = c(5, 6, 9),
e = c(5, 5, 5))
A = data.frame(lapply(df, function(x) (max(x) != min(x)) * x))
A
a b c d e
1 0 2 4 5 0
2 0 3 5 6 0
3 0 5 8 9 0
If you use sapply instead, the result is a matrix:
A = sapply(df, function(x) (max(x) != min(x)) * x)
A
a b c d e
[1,] 0 2 4 5 0
[2,] 0 3 5 6 0
[3,] 0 5 8 9 0

Efficiently repeat data.table in a list, sequentially replacing columns with the same names from another data.table in a loop

I have two data.tables:
x <- data.table(a = c(1, 2, 3, 4, 1), b = c(2, 3, 4, 1, 2), c = c(3, 4, 1, 2, 3))
y <- data.table(a = c(1, 0, 0, 0, 1), b = c(0, 1, 0, 0, 0), c = c(0, 0, 0, 0, 1))
What I am trying to achieve is a list of copies of y, one per column, where in the i-th copy the i-th column is replaced by the values of the same column in x. The desired result should look like this:
[[1]]
a b c
1: 1 0 0
2: 2 1 0
3: 3 0 0
4: 4 0 0
5: 1 0 1
[[2]]
a b c
1: 1 2 0
2: 0 3 0
3: 0 4 0
4: 0 1 0
5: 1 2 1
[[3]]
a b c
1: 1 0 3
2: 0 1 4
3: 0 0 1
4: 0 0 2
5: 1 0 3
What I tried:
z <- lapply(names(x), function(i) {
x[ , i, with = FALSE]
})
w <- rep(list(y), ncol(y))
myfun <- function(obj1, obj2) {
cbind(obj1, obj2)
}
u <- Map(myfun, obj1 = z, obj2 = w)
u <- lapply(u, function(i) {
setcolorder(i[ , unique(names(i)), with = FALSE], names(x))
})
It gives me the desired result, but it is very clumsy and requires too many steps, so it is probably inefficient with larger data.tables. I would like to do it in a more data.table way. I tried something which I assumed would work:
lapply(names(x), function(i) {
y[ , (i) := x[ , i, with = FALSE]]
})
However, it returns the first list component empty and copies all the values of x into the next list components.
Can someone help?
Here, we need copies of 'y' when creating the list 'w', instead of
w <- rep(list(y), ncol(y))
It is tempting to wrap 'y' in copy inside rep, as below. However, the elements of 'w' would then still all point to the same location in memory:
w <- rep(list(copy(y)), ncol(y))
Assignment by reference (:=) changes the column values in every list element on each iteration, because they all refer to the same object in memory. In the first case, the assignment changes 'y' as well as the elements of 'w'. In the second case, it changes only the elements of 'w' and leaves 'y' alone, because we copied it. To see this behavior, do a set assignment in a for loop:
for(j in seq_along(x)) {
  print(w[[j]][[j]])
  set(w[[j]], i = NULL, j = j, value = x[[j]])
  print("----")
  print(w[[j]])
}
In order to avoid that, use replicate, which evaluates copy(y) once per list element:
w <- replicate(ncol(y), copy(y), simplify = FALSE)
and then do the for loop (after recreating the objects again as the values were replaced from the previous run)
for(j in seq_along(x)) {
  print(w[[j]][[j]])
  set(w[[j]], i = NULL, j = j, value = x[[j]])
  print("----")
  print(w[[j]])
}
Or a Map based assignment
Map(function(u, v) u[, (v) := x[[v]]][], w, names(x))
#[[1]]
# a b c
#1: 1 0 0
#2: 2 1 0
#3: 3 0 0
#4: 4 0 0
#5: 1 0 1
#[[2]]
# a b c
#1: 1 2 0
#2: 0 3 0
#3: 0 4 0
#4: 0 1 0
#5: 1 2 1
#[[3]]
# a b c
#1: 1 0 3
#2: 0 1 4
#3: 0 0 1
#4: 0 0 2
#5: 1 0 3
Instead of assignment by reference, it can also be done with a simple Map from base R if we have not copied the 'y' object while creating 'w':
Map(function(u, v) {
  u[[v]] <- x[[v]]
  u
}, w, names(x))
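The copy-then-assign idea can also be folded into a single lapply, so that no intermediate list 'w' is needed. This is a sketch, assuming the data.table package is installed; res, nm and out are illustrative names.

```r
library(data.table)

x <- data.table(a = c(1, 2, 3, 4, 1), b = c(2, 3, 4, 1, 2), c = c(3, 4, 1, 2, 3))
y <- data.table(a = c(1, 0, 0, 0, 1), b = c(0, 1, 0, 0, 0), c = c(0, 0, 0, 0, 1))

res <- lapply(names(x), function(nm) {
  out <- copy(y)                      # fresh copy, so y itself is never modified
  set(out, j = nm, value = x[[nm]])   # overwrite one column by reference
  out
})
res[[2]]
```

Each element of res owns its own memory, so later assignments by reference to one element leave the others and 'y' untouched.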

Converting counts to individual observations in r

I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# b 1 0 0
# b 0 1 0
# b 0 0 1
And so on. I have seen that untable is recommended on some other threads, but it does not appear to work with the current version of R. Is there a package that can convert summarised counts into individual observations?
You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
cols <- grep("^judgement", colnames(df), value = TRUE)
counts <- as.matrix(df[cols])
counts[is.na(counts)] <- 0 # treat missing counts as zero
idx <- rep(seq_along(counts), as.vector(counts)) # one entry per observation, indexing the cells of counts
want <- data.frame(name = df$name[row(counts)[idx]], diag(length(cols))[col(counts)[idx], , drop = FALSE])
colnames(want)[-1] <- cols
want
I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
df[is.na(df)] <- 0 # replace NAs
out <- lapply(count.cols, function(x) { # for each column with counts
z <- df[rep(1:nrow(df), df[,x]), ] # replicate rows
z[, -c(id.col)] <- 0 # set all other columns to zero
z[, x] <- 1 # replace the count values with 1
z
})
out <- do.call(rbind, out) # combine the list
out <- out[order(out[,c(id.col)]),] # reorder (you can change this)
rownames(out) <- NULL # return to simple row numbers
out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
# name judgement1 judgement2 judgement3
#1 a 1 0 0
#2 a 1 0 0
#3 a 1 0 0
#4 a 1 0 0
#5 a 1 0 0
#6 a 0 1 0
#7 a 0 0 1
#8 a 0 0 1
#9 b 0 1 0
#10 b 0 0 1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
df[rep(1:nrow(df), num), ]
}
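The whole trick is row indexing with a repeated index: rep() repeats each row number as many times as its count, and rows with a count of zero disappear. A minimal sketch on made-up data (counts, colour and n are illustrative names):

```r
counts <- data.frame(colour = c("red", "blue", "green"), n = c(3, 1, 0))

# Repeat row i exactly counts$n[i] times; keep only the label column.
long <- counts[rep(seq_len(nrow(counts)), counts$n), "colour", drop = FALSE]
rownames(long) <- NULL   # replace the 1, 1.1, 1.2, ... row names
long
```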

Make a vector with counts of rows that meet criteria

I want to make a vector that contains the number of rows that meet my criteria for each interval:
leftE0 <- c(900,2000,4000,9000,15000,30000,53000,100000,160000)
rightE0 <- c(2000,4000,9000,15000,30000,53000,100000,160000,300000)
sum(datap$CF > 0 & (datap$E0.keV > leftE0[1]) & (datap$E0.keV < rightE0[1]), na.rm=TRUE)
I don't understand how to vectorise this action.
Use cut and table:
#some example data
set.seed(42)
datap <- data.frame(CF = rnorm(100), E0.keV = exp(runif(100, 0, log(4e6))))
breaks <- c(-Inf, 900,2000,4000,9000,15000,30000,53000,100000,160000, 300000, Inf)
table(cut(datap$E0.keV, breaks), datap$CF > 0)
# FALSE TRUE
# (-Inf,900] 21 32
# (900,2e+03] 6 3
# (2e+03,4e+03] 3 3
# (4e+03,9e+03] 6 0
# (9e+03,1.5e+04] 1 1
# (1.5e+04,3e+04] 0 1
# (3e+04,5.3e+04] 1 0
# (5.3e+04,1e+05] 2 0
# (1e+05,1.6e+05] 1 0
# (1.6e+05,3e+05] 2 1
# (3e+05, Inf] 3 13
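If you would rather keep the exact condition from the question (strict inequalities at both ends, positive CF only), one option is to loop over the paired bounds with mapply. This is a sketch that reuses the simulated datap from above, not the real data:

```r
set.seed(42)
datap <- data.frame(CF = rnorm(100), E0.keV = exp(runif(100, 0, log(4e6))))

leftE0  <- c(900, 2000, 4000, 9000, 15000, 30000, 53000, 100000, 160000)
rightE0 <- c(2000, 4000, 9000, 15000, 30000, 53000, 100000, 160000, 300000)

# One count per (left, right) pair, using the condition from the question.
counts <- mapply(function(lo, hi) {
  sum(datap$CF > 0 & datap$E0.keV > lo & datap$E0.keV < hi, na.rm = TRUE)
}, leftE0, rightE0)
counts
```

Note that cut() uses intervals that are closed on the right by default, so the two approaches can differ for values that fall exactly on a break.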

Create counter within consecutive runs of certain values

I have an hourly value. I want to count how many consecutive hours the value has been zero since the last time it was not zero. This is an easy job for a spreadsheet or for loop, but I am hoping for a snappy vectorized one-liner to accomplish the task.
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
df <- data.frame(x, zcount = NA)
df$zcount[1] <- ifelse(df$x[1] == 0, 1, 0)
for(i in 2:nrow(df))
df$zcount[i] <- ifelse(df$x[i] == 0, df$zcount[i - 1] + 1, 0)
Desired output:
R> df
x zcount
1 1 0
2 0 1
3 1 0
4 0 1
5 0 2
6 0 3
7 1 0
8 1 0
9 0 1
10 0 2
William Dunlap's posts on R-help are the place to look for all things related to run lengths. His f7 from this post is
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }
and in the current situation you would call f7(!x). In terms of performance:
> x <- sample(0:1, 1000000, TRUE)
> system.time(res7 <- f7(!x))
user system elapsed
0.076 0.000 0.077
> system.time(res0 <- cumul_zeros(x))
user system elapsed
0.345 0.003 0.349
> identical(res7, res0)
[1] TRUE
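It may help to trace f7 on the small vector from the question. The intermediate names below (z, running, baseline) are only for illustration: the running total of zeros keeps growing, and cummax remembers the total as of the most recent non-zero hour, so the difference is the length of the current run.

```r
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }

x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
z <- !x                               # TRUE where the value is zero

running  <- cumsum(z)                 # zeros seen so far: 0 1 1 2 3 4 4 4 5 6
baseline <- cummax((!z) * running)    # total at the last non-zero: 0 0 1 1 1 1 4 4 4 4
running - baseline                    # 0 1 0 1 2 3 0 0 1 2
```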
Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)
> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
UPDATE. Just for kicks, here's another way to do it, around 5 times faster:
cumul_zeros <- function(x) {
x <- !x
rl <- rle(x)
len <- rl$lengths
v <- rl$values
cumLen <- cumsum(len)
z <- x
# replace the 0 at the end of each zero-block in z by the
# negative of the length of the preceding 1-block....
iDrops <- c(0, diff(v)) < 0
z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
# ... to ensure that the cumsum below does the right thing.
# We zap the cumsum with x so only the cumsums for the 1-blocks survive:
x*cumsum(z)
}
Try an example:
> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
Now compare times on a million-length vector:
> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75
Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
rle will "count how many consecutive hours the value has been zero since the last time it was not zero", but not in the format of your "desired output".
Note the lengths for the elements where the corresponding values are zero:
rle(x)
# Run Length Encoding
# lengths: int [1:6] 1 1 1 3 2 2
# values : num [1:6] 1 0 1 0 1 0
A simple base R approach:
ave(!x, cumsum(x), FUN = cumsum)
#[1] 0 1 0 1 2 3 0 0 1 2
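This works because cumsum(x) starts a new group at every 1, and cumsum(!x) then counts the zeros inside each group. It relies on x containing only 0 and 1. For hourly values that can be any number, a variant that groups on x != 0 might look like this (a sketch with made-up data):

```r
x <- c(5, 0, -2, 0, 0, 3, 0)

# A new group starts at every non-zero value; count zeros within each group.
zcount <- ave(x == 0, cumsum(x != 0), FUN = cumsum)
zcount   # 0 1 0 1 2 0 1
```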
One-liner, not exactly super elegant:
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
unlist(lapply(split(x, c(0, cumsum(abs(diff(x == 0))))), function(g) (g[1] == 0) * seq_along(g)))
Using purrr::accumulate() is very straightforward, so this tidyverse solution may add some value here. I must acknowledge it is definitely not the fastest, as it calls the same function length(x) times.
library(purrr)
accumulate(x==0, ~ifelse(.y!=0, .x+1, 0))
[1] 0 1 0 1 2 3 0 0 1 2
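The same running-counter idea can be written in base R with Reduce and accumulate = TRUE, which avoids the purrr dependency. This is a sketch; step is an illustrative name, and the explicit initial value of 0 is dropped from the result with [-1].

```r
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# Add one to the counter on a zero, reset it on anything else.
step <- function(acc, is_zero) if (is_zero) acc + 1 else 0

zcount <- Reduce(step, x == 0, 0, accumulate = TRUE)[-1]
zcount   # 0 1 0 1 2 3 0 0 1 2
```

Like the accumulate version, this calls step once per element, so it is easy to read but slow on long vectors.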
