Replacing a column with 0 and saving it to a different name - r

I am having a problem changing a specific column in a matrix and saving the result under a new name. Let's say
A =
2 2 2
2 2 2
2 2 2
I would like to make three matrices, which are
A1 =
0 2 2
0 2 2
0 2 2
A2 =
2 0 2
2 0 2
2 0 2
and so on.
I tried a for loop with A[,i] <- 0, but since A itself is modified on every iteration, all the elements of A end up as 0.
I have also tried A - A[,i], but this subtracts the vector A[,i] from every column of A. Please help me!

Building on @lmo's comment, you can do:
a <- matrix(2, 3, 3)
# each call zeroes column i in a local copy of a and returns that copy
x <- lapply(seq_len(ncol(a)), function(i) {a[, i] <- 0; a})
names(x) <- paste0("a", seq_along(x))
list2env(x, envir=.GlobalEnv)
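If separate global variables are not strictly needed, the named list can be used directly. A minimal sketch in base R, reusing the answer's a and x, that also shows why the original matrix is left untouched:

```r
a <- matrix(2, 3, 3)

# Inside the function, a[, i] <- 0 modifies a copy local to that call,
# so each list element has exactly one zeroed column.
x <- lapply(seq_len(ncol(a)), function(i) { a[, i] <- 0; a })
names(x) <- paste0("a", seq_along(x))

x$a2          # second column is 0, the other columns are still 2
all(a == 2)   # TRUE: the original matrix is unchanged
```

Keeping the matrices in a list avoids filling the global environment and makes it easy to loop over them later.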

Related

nested for loop in R, where the second index counts inside the first one

I have, for example, a dataset like this:
data <- data.frame(matrix(c(1,2,2,3,4,5,5,"a","a","b","a","a","a","b"), nrow = 7, ncol = 2, byrow = FALSE))
X1 X2
1 a
2 a
2 b
3 a
4 a
5 a
5 b
then I add another variable "tag", initially set to 0.
data$tag <- 0
X1 X2 tag
1 a 0
2 a 0
2 b 0
3 a 0
4 a 0
5 a 0
5 b 0
I'd like to have "tag" equal to 1 for each row whose X1 value is repeated, like:
X1 X2 tag
1 a 0
2 a 1
2 b 1
3 a 0
4 a 0
5 a 1
5 b 1
I used the following code:
for (i in data$X1) {
for (j in 1:length(data$X1)) {
if (j==2) {data$tag[j] <- 1}
}
}
but it doesn't work the way I would like. I'd like the second loop (j) to run inside the first one, with j restarting from 1 every time X1 changes.
How can I manage it?
Thanks a lot
Maybe you can try ave
within(
data,
tag <- +(ave(X1, X1, FUN = length) > 1)
)
which gives
X1 X2 tag
1 1 a 0
2 2 a 1
3 2 b 1
4 3 a 0
5 4 a 0
6 5 a 1
7 5 b 1
You can use duplicated from both the ends in base R :
data$tag <- as.integer(duplicated(data$X1) |
duplicated(data$X1, fromLast = TRUE))
data
# X1 X2 tag
#1 1 a 0
#2 2 a 1
#3 2 b 1
#4 3 a 0
#5 4 a 0
#6 5 a 1
#7 5 b 1
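To see why both directions are needed, here is a small sketch on a plain vector with the same values as X1 (the name x1 is only for illustration). duplicated() alone flags the second and later occurrences, not the first one:

```r
x1 <- c(1, 2, 2, 3, 4, 5, 5)

from_start <- duplicated(x1)                   # flags later occurrences only
from_end   <- duplicated(x1, fromLast = TRUE)  # flags earlier occurrences only

from_start
# FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
from_end
# FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE

# Combining both marks every member of a repeated group.
tag <- as.integer(from_start | from_end)
tag
# 0 1 1 0 0 1 1
```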
An option with add_count from dplyr, storing the result in tag and dropping the helper count column n:
library(dplyr)
data %>%
  add_count(X1) %>%
  mutate(tag = +(n > 1)) %>%
  select(-n)

Add X number of columns to a data.frame

I would like to add a varying number (X) of columns with 0 to an existing data.frame within a function.
Here is an example data.frame:
dt <- data.frame(x=1:3, y=4:6)
I would like to get this result if X=1 :
a x y
1 0 1 4
2 0 2 5
3 0 3 6
And this if X=3 :
a b c x y
1 0 0 0 1 4
2 0 0 0 2 5
3 0 0 0 3 6
What would be an efficient way to do this?
We can assign 0 to multiple new columns based on the value of X, then move the new columns to the front:
X <- 3
nm1 <- names(dt)
dt[letters[seq_len(X)]] <- 0
dt[c(setdiff(names(dt), nm1), nm1)]
Also, we can use add_column from tibble and create columns at a specific location
library(tibble)
add_column(dt, .before = 1, !!!setNames(as.list(rep(0, X)),
letters[seq_len(X)]))
A second option is cbind
f <- function(x, n = 3) {
cbind.data.frame(matrix(
0,
ncol = n,
nrow = nrow(x),
dimnames = list(NULL, letters[1:n])
), x)
}
f(dt, 5)
# a b c d e x y
#1 0 0 0 0 0 1 4
#2 0 0 0 0 0 2 5
#3 0 0 0 0 0 3 6
NOTE: because letters has a length of 26 the function would need some adjustment regarding the naming scheme if n > 26.
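One possible adjustment for n > 26, as a sketch only: generate names from a prefix and an index instead of letters. The helper name add_zero_cols and the prefix "z" are made up for this illustration and are not part of the answer above.

```r
# Hypothetical helper: prepends n zero columns named z1, z2, ..., zn.
add_zero_cols <- function(x, n, prefix = "z") {
  zeros <- matrix(
    0,
    nrow = nrow(x),
    ncol = n,
    dimnames = list(NULL, paste0(prefix, seq_len(n)))
  )
  cbind.data.frame(zeros, x)
}

dt <- data.frame(x = 1:3, y = 4:6)
res <- add_zero_cols(dt, 30)
dim(res)             # 3 32
head(names(res), 3)  # "z1" "z2" "z3"
```

This assumes the data frame does not already contain columns named z1, z2, and so on; a different prefix would be needed in that case.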
You can try the code below
dt <- cbind(`colnames<-`(t(rep(0,X)),letters[seq_len(X)]),dt)
If you don't care about the names of the added columns, you can just use
dt <- cbind(t(rep(0,X)),dt)
which is much shorter

removing columns equal to 0 from multiple data frames in a list; lapply not actually removing columns when applying function to a list

I have a list of three data frames that are similar (same number of columns but different number of rows), and were split from a larger data set.
Here is some example code to make three data frames and put them in a list. It is really hard to make an exact replicate of my data since the files are so large (over 400 columns and the first 6 columns are not numerical)
a <- c(0,1,0,1,0,0,0,0,0,1,0,1)
b <- c(0,0,0,0,0,0,0,0,0,0,0,0)
c <- c(1,0,1,1,1,1,1,1,1,1,0,1)
d <- c(0,0,0,0,0,0,0,0,0,0,0,0)
e <- c(1,1,1,1,0,1,0,1,0,1,1,1)
f <- c(0,0,0,0,0,0,0,0,0,0,0,0)
g <- c(1,0,1,0,1,1,1,1,1,1)
h <- c(0,0,0,0,0,0,0,0,0,0)
i <- c(1,0,0,0,0,0,0,0,0,0)
j <- c(0,0,0,0,1,1,1,1,1,0)
k <- c(0,0,0,0,0)
l <- c(1,0,1,0,1)
m <- c(1,0,1,0,0)
n <- c(0,0,0,0,0)
o <- c(1,0,1,0,1)
df1 <- data.frame(a,b,c,d,e,f)
df2 <- data.frame(g,h,i,j)
df3 <- data.frame(k,l,m,n,o)
my.list <- list(df1,df2,df3)
I want to remove every column whose sum is 0 from each data frame. Here are my two attempts:
list2 <- lapply(my.list, function(x) {x[, colSums(x) != 0];x})
list2 <- lapply(my.list, function(x) {x[, colSums(x != 0) > 0];x})
Both of these run, but neither actually removes the all-zero columns.
I am not sure why that is; any tips are greatly appreciated.
The OP found a solution by exchanging comments with me, but I want to leave the explanation here. In lapply(my.list, function(x) {x[, colSums(x) != 0];x}), the OP was asking R to do two things. The first was subsetting each data frame in my.list. The second was returning each data frame.
I think he assumed that each data frame was updated by the subsetting. But the result of the subset was never assigned to anything, so the second command simply returned each data frame as it was, and that is what R showed. (On the surface, he did not see any change.) Following his approach, I would do something like this:
lapply(my.list, function(x) {foo <- x[, colSums(x) != 0]; foo})
That is, create a temporary object in the anonymous function and return it. Alternatively, the subset can be returned directly:
lapply(my.list, function(x) x[, colSums(x) != 0])
For each data frame in my.list, this runs a logical check on each column: if colSums(x) != 0 is TRUE, the column is kept; otherwise it is removed. Hope this will help future readers.
[[1]]
a c e
1 0 1 1
2 1 0 1
3 0 1 1
4 1 1 1
5 0 1 0
6 0 1 1
7 0 1 0
8 0 1 1
9 0 1 0
10 1 1 1
11 0 0 1
12 1 1 1
[[2]]
g i j
1 1 1 0
2 0 0 0
3 1 0 0
4 0 0 0
5 1 0 1
6 1 0 1
7 1 0 1
8 1 0 1
9 1 0 1
10 1 0 0
[[3]]
l m o
1 1 1 1
2 0 0 0
3 1 1 1
4 0 0 0
5 1 0 1
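The point about the returned value can be reproduced on a tiny data frame. This is only a sketch with made-up data (df, unchanged and trimmed are illustrative names); it also adds drop = FALSE, which is not in the answer above but keeps the result a data frame if only one column survives.

```r
df <- data.frame(a = c(0, 1), b = c(0, 0), c = c(1, 1))

# The subset is computed and then discarded; the function returns x as it was.
unchanged <- (function(x) { x[, colSums(x) != 0]; x })(df)

# Here the subset is the last expression, so it is what the function returns.
trimmed <- (function(x) x[, colSums(x) != 0, drop = FALSE])(df)

names(unchanged)  # "a" "b" "c"
names(trimmed)    # "a" "c"
```

Since the real data has non-numeric leading columns, colSums() would have to be restricted to the numeric columns there; that part is not covered by this sketch.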

rewriting variable using for loop R

I've got a column in my dataset that contains a collection of 0,1 and 2. The 2's are a weird leftover from some previous transformation, and I need to convert them to 1. I've written a simple loop to do this
for (i in my.cl.accept$enroll){
if (i==2){
i=1
}
}
However, this doesn't change the actual contents of the data frame. ifelse() doesn't work, because I don't need to change the other digits at all; just the number 2.
I've been using R a little more after coming from Python; what simple thing am I misunderstanding here?
Let's generate a sample set:
set.seed(10)
DF <- data.frame(
  a = 1:10,
  b = sample(0:2, 10, replace = TRUE))
DF
Now, replace every entry corresponding to 2 with 1:
DF$b[DF$b==2] <- 1
DF
Note: This is a vectorized method, and it will typically run much faster than iterating with a loop.
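As for what the original loop misses: in for (i in v), i receives a copy of each value, so assigning to i never touches v. A minimal sketch with a made-up vector enroll standing in for my.cl.accept$enroll:

```r
enroll <- c(0, 1, 2, 2, 1)

for (i in enroll) {
  if (i == 2) {
    i <- 1  # only the loop variable changes, not enroll
  }
}
enroll
# 0 1 2 2 1

# Logical indexing assigns into the vector itself.
enroll[enroll == 2] <- 1
enroll
# 0 1 1 1 1
```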
Dunno whether this is what you want?
> A<- 1:10
> B<- c(rep(0,5), rep(1,3), rep(2,2))
> data <- data.frame(A,B)
> data
A B
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 1
7 7 1
8 8 1
9 9 2
10 10 2
> data[data$B==2,]$B <- 1
> data
A B
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
Are you sure you're using ifelse correctly? It actually does allow you to only change one value to another. Here's an example:
> x <- sample(c(0, 1, 2), 10, TRUE)
> x
## [1] 2 1 1 0 2 2 0 0 2 1
> ifelse(x == 2, 1, x)
## [1] 1 1 1 0 1 1 0 0 1 1
For future reference, your good old-fashioned for loop should go something like this...
# loop over positions rather than values, so the assignment reaches the data frame
for (i in seq_along(my.cl.accept$enroll)){
  if (my.cl.accept$enroll[i] == 2){
    my.cl.accept$enroll[i] <- 1
  }
}

Identify and label the largest number in each group

Hi, I want to identify and label the largest number in each group. Can someone tell me how to get this done in R (or maybe Excel would be easier)?
The following is example data. The original data contains only the two left columns, and I want to generate the third one. In the third column, I want to label the largest value in each group as 1 and everything else as 0; e.g., in group 1 the largest is 0.02874, so it's marked as 1. Thank you!
x <- read.table(header=T, text="group value largest
1 0.02827 0
1 0.02703 0
1 0.02874 1
2 0.03255 0
2 0.10394 1
2 0.03417 0
3 0.13858 0
3 0.16084 0
3 0.99830 1
3 0.24563 0")
UPDATE: Thank you all for your help! They all are great solutions!
Finally, the base (no package required) approach:
is.largest <- function(x) as.integer(seq_along(x) == which.max(x))
x <- transform(x, largest = ave(value, group, FUN = is.largest))
Note that if I were you, I would remove the as.integer and just store a logical (TRUE/FALSE) vector.
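One detail worth knowing: which.max() returns only the first maximum, so a tied group gets a single 1. If every tied row should be flagged, comparing against max() is one option. A sketch on made-up data (x_ties, first_only and all_ties are illustrative names):

```r
x_ties <- data.frame(
  group = c(1, 1, 2, 2, 2),
  value = c(0.5, 0.5, 0.1, 0.9, 0.3)
)

# Flags only the first maximum in each group.
first_only <- ave(x_ties$value, x_ties$group,
                  FUN = function(v) seq_along(v) == which.max(v))

# Flags every row that equals the group maximum.
all_ties <- ave(x_ties$value, x_ties$group,
                FUN = function(v) v == max(v))

first_only  # 1 0 0 1 0
all_ties    # 1 1 0 1 0
```

Because ave() writes the results back into a numeric vector, the logical values come out as 0 and 1.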
With data.table:
library(data.table)
x <- data.table(x)
y <- x[,list(value = max(value), maxindicator = TRUE), by = c('group')]
z <- merge(x,y, by = c('group','value'), all = TRUE)
Output
> z
group value largest maxindicator
1: 1 0.02703 0 NA
2: 1 0.02827 0 NA
3: 1 0.02874 1 TRUE
4: 2 0.03255 0 NA
5: 2 0.03417 0 NA
6: 2 0.10394 1 TRUE
7: 3 0.13858 0 NA
8: 3 0.16084 0 NA
9: 3 0.24563 0 NA
10: 3 0.99830 1 TRUE
Here is a solution with plyr:
library(plyr)
x$largest <- 0
x <- ddply(x, .(group), function(df) {
  df$largest[which.max(df$value)] <- 1
  df
})
And one with base R :
x$largest <- 0
l <- split(x, x$group)
l <- lapply(l, function(df) {
df$largest[which.max(df$value)] <- 1
df
})
x <- do.call(rbind, l)
Here's a less cool base approach:
FUN <- function(x) {y <- rep(0, length(x)); y[which.max(x)] <- 1; y}
x$largest <- unlist(tapply(x$value, x$group, FUN))
## group value largest
## 1 1 0.02827 0
## 2 1 0.02703 0
## 3 1 0.02874 1
## 4 2 0.03255 0
## 5 2 0.10394 1
## 6 2 0.03417 0
## 7 3 0.13858 0
## 8 3 0.16084 0
## 9 3 0.99830 1
## 10 3 0.24563 0
It was more difficult to do in base than I had anticipated.
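A caveat on the tapply version: unlist(tapply(...)) returns the results ordered by group, so it lines up with the rows only when the data is already sorted by group, as it is in the question. A sketch with deliberately unsorted, made-up data (x_unsorted is an illustrative name), compared against ave(), which preserves row order:

```r
FUN <- function(x) {y <- rep(0, length(x)); y[which.max(x)] <- 1; y}

x_unsorted <- data.frame(group = c(2, 1, 2, 1), value = c(5, 1, 3, 9))

# Results come back grouped (group 1 first, then group 2), not in row order.
via_tapply <- unname(unlist(tapply(x_unsorted$value, x_unsorted$group, FUN)))

# ave() puts each result back at the position of its original row.
via_ave <- ave(x_unsorted$value, x_unsorted$group, FUN = FUN)

via_tapply  # 0 1 1 0  (misaligned with the rows)
via_ave     # 1 0 0 1  (row 1 is the max of group 2, row 4 the max of group 1)
```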

Resources