I want to make every element in the data frame (except for the ID column) become a 0 if it is any number other than 1.
I have:
ID A B C D E
abc 5 3 1 4 1
def 4 1 3 2 5
I want:
ID A B C D E
abc 0 0 1 0 1
def 0 1 0 0 0
I am having trouble figuring out how to apply this to every entry in every column and row.
Here is my code:
apply(dat.lec, 2 , function(y)
if(!is.na(y)){
if(y==1){y <- 1}
else{y <-0}
}
else {y<- NA}
)
Thank you for your help!
No need for implicit or explicit looping.
# Sample data
set.seed(2016);
df <- as.data.frame(matrix(sample(10, replace = TRUE), nrow = 2));
df <- cbind.data.frame(id = sample(letters, 2), df);
df;
# id V1 V2 V3 V4 V5
#1 k 2 9 5 7 1
#2 g 2 2 2 9 1
# Replace all entries != 1 with 0's
df[, -1][df[, -1] != 1] <- 0;
df;
# id V1 V2 V3 V4 V5
#1 k 0 0 0 0 1
#2 g 0 0 0 0 1
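The code in the question tried to keep NA entries as NA. A minimal sketch of one way to get that behaviour is to turn the comparison itself into 0/1. The data frame below is the one from the question, with one NA added purely for illustration (an assumption, not part of the original data):

```r
# Data from the question; the NA in column C is added only for illustration
dat <- data.frame(ID = c("abc", "def"),
                  A = c(5, 4), B = c(3, 1), C = c(1, NA),
                  D = c(4, 2), E = c(1, 5))

# Compare every non-ID column with 1: TRUE/FALSE become 1/0, NA stays NA
dat[-1] <- lapply(dat[-1], function(x) as.integer(x == 1))
dat
```

This still loops over columns through lapply, so it is an alternative to the direct indexing above rather than a replacement for it.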
I want to do the equivalent of find and replace 1=0;2=0;3=0;4=1;5=2;6=3 for many different variables in my data set.
Things I've tried:
Turning 1=0;2=0;3=0;4=1;5=2;6=3 into a function and using sapply. I changed the ; to , and the = to <-, but no combination of these was recognized as a function. I also tried creating a function with that definition and passing it to sapply, and it didn't work.
I tried using recode and it did not work:
wdata[ ,cols2] = recode(wdata[ ,cols2], 1=0;2=0;3=0;4=1;5=2;6=3)
Assuming you are working with a data.frame or matrix you can use direct indexing:
# Sample data
set.seed(2017);
df <- as.data.frame(matrix(sample(1:6, 20, replace = TRUE), ncol = 4));
df;
#V1 V2 V3 V4
#1 6 5 5 3
#2 4 1 1 3
#3 3 3 1 5
#4 2 3 3 6
#5 5 2 3 5
df[df == 1 | df == 2 | df == 3] <- 0;
df[df == 4] <- 1;
df[df == 5] <- 2;
df[df == 6] <- 3;
df;
# V1 V2 V3 V4
#1 3 2 2 0
#2 1 0 0 0
#3 0 0 0 2
#4 0 0 0 3
#5 2 0 0 2
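Note that the order of the substitutions matters. For example, df[df == 4] <- 1; df[df == 1] <- 0; will give a different output from df[df == 1] <- 0; df[df == 4] <- 1; because in the first order the values that were just recoded from 4 to 1 are then recoded again to 0.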
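A small sketch of the ordering issue on a two-element vector (the vector x is made up for illustration):

```r
x <- c(1, 4)

# Order A: recode 4 -> 1 first, then 1 -> 0; the freshly created 1 is hit again
a <- x
a[a == 4] <- 1
a[a == 1] <- 0

# Order B: recode 1 -> 0 first, then 4 -> 1
b <- x
b[b == 1] <- 0
b[b == 4] <- 1

a  # 0 0
b  # 0 1
```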
Alternative solution using recode from dplyr with sapply or mutate_all:
set.seed(2017);
df <- as.data.frame(matrix(sample(1:6, 20, replace = TRUE), ncol = 4));
df
library(dplyr)
f = function(x) recode(x, `1`=0, `2`=0, `3`=0, `4`=1, `5`=2, `6`=3)
sapply(df, f)
# V1 V2 V3 V4
# [1,] 3 2 2 0
# [2,] 1 0 0 0
# [3,] 0 0 0 2
# [4,] 0 0 0 3
# [5,] 2 0 0 2
df %>% mutate_all(f)
# V1 V2 V3 V4
# 1 3 2 2 0
# 2 1 0 0 0
# 3 0 0 0 2
# 4 0 0 0 3
# 5 2 0 0 2
A looping alternative with lapply and match is as follows:
dat[] <- lapply(dat, function(x) c(0, 0, 0, 1, 2, 3)[match(x, 1:6)])
This uses a lookup table on the vector c(0,0,0,1,2,3) with match selecting the indices. Using the data.frame created by Maurits Evers, we get
dat
V1 V2 V3 V4
1 3 2 2 0
2 1 0 0 0
3 0 0 0 2
4 0 0 0 3
5 2 0 0 2
To do this for a subset of the columns, just select them on each side, like
dat[, cols2] <-
lapply(dat[, cols2], function(x) c(0, 0, 0, 1, 2, 3)[match(x, 1:6)])
or
dat[cols2] <- lapply(dat[cols2], function(x) c(0, 0, 0, 1, 2, 3)[match(x, 1:6)])
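To see what the lookup does on its own, here is a small sketch on a plain vector. The input x is made up for illustration and deliberately includes a value outside 1:6 and an NA, both of which match() maps to NA:

```r
lookup <- c(0, 0, 0, 1, 2, 3)   # new values for old values 1..6, in order
x <- c(4, 6, 1, 7, NA)          # 7 and NA have no entry in 1:6

# match() gives the position of each x in 1:6 (NA if absent),
# and that position is used to index the lookup vector
recoded <- lookup[match(x, 1:6)]
recoded  # 1 3 0 NA NA
```

Because all old values are looked up at once, the result does not depend on the order of substitutions.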
I have a sequence which looks like this
SEQENCE
1 A
2 B
3 B
4 C
5 A
Now from this sequence, I want to get a matrix like the one below, where the element in the ith row and jth column denotes how many times a move occurred from the ith row node to the jth column node
A B C
A 0 1 0
B 0 1 1
C 1 0 0
How can I get this in R?
1) Use table like this:
s <- DF[, 1]
table(tail(s, -1), head(s, -1))
giving:
A B C
A 0 0 1
B 1 1 0
C 0 1 0
2) or like this. Since embed does not work with factors we convert the factor to character,
s <- as.character(DF[, 1])
do.call(table, data.frame(embed(s, 2)))
giving:
X2
X1 A B C
A 0 0 1
B 1 1 0
C 0 1 0
3) xtabs also works:
s <- as.character(DF[, 1])
xtabs(data = data.frame(embed(s, 2)))
giving:
X2
X1 A B C
A 0 0 1
B 1 1 0
C 0 1 0
Note: The input DF in reproducible form is:
Lines <- " SEQENCE
1 A
2 B
3 B
4 C
5 A"
DF <- read.table(text = Lines, header = TRUE)
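Note that in the three results above the rows are the node moved to and the columns are the node moved from, which is the transpose of the layout asked for in the question. A minimal sketch that puts the origin in the rows, assuming the same sequence, is to swap the two arguments of table. The dimension names from and to, and the shared factor levels (which keep the matrix square even if some node never appears as an origin or a destination), are additions for illustration:

```r
s <- c("A", "B", "B", "C", "A")
lv <- sort(unique(s))

# Rows: node moved from; columns: node moved to
trans <- table(from = factor(head(s, -1), levels = lv),
               to   = factor(tail(s, -1), levels = lv))
trans
```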
I have a data frame such as this:
df <- data.frame(
ID = c('123','124','125','126'),
Group = c('A', 'A', 'B', 'B'),
V1 = c(1,2,1,0),
V2 = c(0,0,1,0),
V3 = c(1,1,0,3))
which returns:
ID Group V1 V2 V3
1 123 A 1 0 1
2 124 A 2 0 1
3 125 B 1 1 0
4 126 B 0 0 3
and I would like to return a table that indicates if a variable is represented in the group or not:
Group V1 V2 V3
A 1 0 1
B 1 1 1
The goal is to count the number of distinct variables present in each group.
We can do this with base R
aggregate(.~Group, df[-1], function(x) as.integer(sum(x)>0))
# Group V1 V2 V3
#1 A 1 0 1
#2 B 1 1 1
Or using rowsum from base R
+(rowsum(df[-(1:2)], df$Group)>0)
# V1 V2 V3
#A 1 0 1
#B 1 1 1
Or with by from base R
+(do.call(rbind, by(df[3:5], df['Group'], FUN = colSums))>0)
# V1 V2 V3
#A 1 0 1
#B 1 1 1
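Since the stated goal is the number of distinct variables per group, one possible final step is to sum the indicator table across columns. This is only a sketch built on the rowsum approach above; the names present and n_vars are made up for illustration:

```r
df <- data.frame(
  ID = c('123', '124', '125', '126'),
  Group = c('A', 'A', 'B', 'B'),
  V1 = c(1, 2, 1, 0),
  V2 = c(0, 0, 1, 0),
  V3 = c(1, 1, 0, 3))

# TRUE where a variable has a non-zero total within the group
present <- rowsum(df[-(1:2)], df$Group) > 0

# Number of variables represented in each group
n_vars <- rowSums(present)
n_vars
```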
Have you tried:
unique(group_by(mtcars, cyl)$cyl)
Output:
[1] 6 4 8
(preamble)
I don't know if this is the right place for this. I have a problem-solving/optimization issue with counting over a table. If this is not the right place, I'm very sorry and deserve the downvotes.
Here's the data frame
dat <- data.frame(id=letters[1:5],matrix(c(0,0,1,0,0, 0,1,0,1,1, 0,0,2,1,0, 1,0,2,1,1, 0,0,2,0,0, 0,1,2,1,0),5,6))
#
# id X1 X2 X3 X4 X5 X6
# 1 a 0 0 0 1 0 0
# 2 b 0 1 0 0 0 1
# 3 c 1 0 2 2 2 2
# 4 d 0 1 1 1 0 1
# 5 e 0 1 0 1 0 0
I would like to count, along every row, how many times the value reaches 1 (N1) and how many times it goes from 1 to 0 (N0), so the final result should be
# id N1 N0
# a 1 1
# b 2 1
# c 1 1
# d 2 1
# e 2 2
I actually found an algorithm, but it's more C/FORTRAN style (see below), and I can't believe there isn't an easier and more elegant way to do this in R. Thanks a lot for any help or hint.
nr <- nrow(dat)
nc <- ncol(dat)
rownames(dat) <- seq(1,nr,1)
colnames(dat) <- seq(1,nc,1)
dat$N1 <- NULL
dat$N0 <- NULL
for (i in 1:nr) {
n1 <- 0
n0 <- 0
j <- 2
while (!(j>nc)) {
k <- j
if (dat[i,k] == 1) {
n1 <- n1 + 1
k <- j + 1
while (!(k>nc)) {
if (dat[i,k] == 0) {
n0 <- n0 + 1
break
}
k <- k + 1
}
}
j <- k
j <- j + 1
}
dat$N1[i] <- n1
dat$N0[i] <- n0
}
Not sure if I totally got it, but you can try:
cbind(dat["id"], N1 = rowSums(dat[, 3:7] == 1 & dat[, 2:6] != 1) + (dat[, 2] == 1),
N0 = rowSums(dat[, 3:7] == 0 & dat[, 2:6] == 1))
# id N1 N0
#1 a 1 1
#2 b 2 1
#3 c 1 1
#4 d 2 1
#5 e 2 2
Here's another way, using rle wrapped in data.table syntax:
library(data.table)
setDT(dat)
melt(dat, id="id")[, with(rle(value), list(
n1 = sum(values==1),
n1to0 = sum("10" == do.call(paste0, shift(values, 1:0, fill=0)))
)), by=id]
# id n1 n1to0
# 1: a 1 1
# 2: b 2 1
# 3: c 1 1
# 4: d 2 1
# 5: e 2 2
Notes.
shift with n=1:0 returns the lagged vector (lag of 1) and the vector itself (lag of 0).
melt creates a value column; and rle contains a values vector.
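The same run-length idea can be sketched in base R without data.table. The helper name count_runs is made up for illustration. It counts the runs of 1 and the runs of 1 that are immediately followed by a run of 0, which is my reading of the question:

```r
dat <- data.frame(id = letters[1:5],
                  matrix(c(0,0,1,0,0, 0,1,0,1,1, 0,0,2,1,0,
                           1,0,2,1,1, 0,0,2,0,0, 0,1,2,1,0), 5, 6))

# Hypothetical helper: collapse a row into runs, then count
# runs equal to 1 (N1) and runs of 1 directly followed by a run of 0 (N0)
count_runs <- function(x) {
  r <- rle(unname(x))$values
  c(N1 = sum(r == 1),
    N0 = sum(head(r, -1) == 1 & tail(r, -1) == 0))
}

res <- cbind(dat["id"], t(apply(dat[-1], 1, count_runs)))
res
```

On the sample data this reproduces the N1 and N0 columns requested in the question.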
I'm trying to multiply columns together and name each result after the columns that were multiplied.
I have a data frame:
v1 v2 v3 v4 v5
0 1 1 1 1
0 1 1 0 1
1 0 1 1 0
I'm trying to multiply each column with every other column, like:
v1v2
v1v3
v1v4
v1v5
and
v2v3
v2v4
v2v5
etc, and
v1v2v3
v1v2v4
v1v2v5
v2v3v4
v2v3v5
and likewise for combinations of 4 and 5 columns. In general, with n columns I want every combination of up to n columns.
I tried to use the following code in a while loop, but it is not working:
i<-1
while(i<=ncol(data)
{
results<-data.frame()
v<-i
results<- t(apply(data,1,function(x) combn(x,v,prod)))
comb <- combn(colnames(data),v)
colnames(results) <- apply(comb,v,function(x) paste(x[1],x[2],sep="*"))
results <- colSums(results)
}
Sample output:
if n=3
v1v2 v1v3 v2v3
0 0 1
0 0 1
0 1 0
and colsum
v1v2 v1v3 v2v3
0 1 2
then
v1v2=0
v1v3=1
v2v3=2
This is what I'm trying to get.
Try this:
df <- read.table(text = "v1 v2 v3 v4 v5
0 1 1 1 1
0 1 1 0 1
1 0 1 1 0", skip = 1)
df
ll <- lapply(2:ncol(df), function(ncols) {
  # Row-wise products for every combination of `ncols` columns
  tmp <- t(apply(df, 1, function(rows) combn(x = rows, m = ncols, prod)))
  if (ncols < ncol(df)) {
    tmp <- colSums(tmp)
  } else {
    # Using all columns gives a single combination, so sum the row products
    tmp <- sum(tmp)
  }
  # Name each result after the columns that were multiplied
  names1 <- t(combn(x = colnames(df), m = ncols))
  names(tmp) <- apply(names1, 1, function(rows) paste0(rows, collapse = ""))
  tmp
})
ll
# [[1]]
# V1V2 V1V3 V1V4 V1V5 V2V3 V2V4 V2V5 V3V4 V3V5 V4V5
# 0 1 1 0 2 1 2 2 2 1
#
# [[2]]
# V1V2V3 V1V2V4 V1V2V5 V1V3V4 V1V3V5 V1V4V5 V2V3V4 V2V3V5 V2V4V5 V3V4V5
# 0 0 0 1 0 0 1 2 1 1
#
# [[3]]
# V1V2V3V4 V1V2V3V5 V1V2V4V5 V1V3V4V5 V2V3V4V5
# 0 0 0 0 1
#
# [[4]]
# V1V2V3V4V5
# 0
Edit following comment
The results for the different sets of column combinations can then be accessed by indexing (subsetting) the list. E.g. to access the "2 combinations", select the first element of the list; to access the "3 combinations", select the second element of the list, etc.
ll[[1]]
# V1V2 V1V3 V1V4 V1V5 V2V3 V2V4 V2V5 V3V4 V3V5 V4V5
# 0 1 1 0 2 1 2 2 2 1
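As a cross-check for the pairwise case only, the column sums of the two-column products are the off-diagonal entries of the cross-product matrix t(m) %*% m. The sketch below is a different technique from the combn approach above and does not extend to combinations of three or more columns. The matrix m re-enters the sample data by hand, and the names cp, pairs and res are made up for illustration:

```r
m <- matrix(c(0, 1, 1, 1, 1,
              0, 1, 1, 0, 1,
              1, 0, 1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, paste0("V", 1:5)))

# cp[i, j] is sum(m[, i] * m[, j])
cp <- crossprod(m)

# Pick out the entry for each pair of distinct columns, in combn order
pairs <- combn(colnames(m), 2)
res <- setNames(cp[t(pairs)], apply(pairs, 2, paste0, collapse = ""))
res
```

On the sample data this gives the same values as ll[[1]].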