Subtracting columns in a loop - r

I've got a data frame like this:
df:
A B C
1 1 2 3
2 2 2 4
3 2 2 3
I would like to subtract from each column the next smaller one (A-0, B-A, C-B). So my results should look like this:
df:
A B C
1 1 1 1
2 2 0 2
3 2 0 1
I tried the following loop, but it didn't work.
for (i in 1:3) {
j <- data[,i+1] - data[,i]
}

Try this
df - cbind(0, df[-ncol(df)])
# A B C
# 1 1 1 1
# 2 2 0 2
# 3 2 0 1
Data
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))

We can also drop the first column on one side and the last column on the other, then do the subtraction
df[-1] <- df[-1]-df[-length(df)]
data
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))
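For completeness, here is a hedged sketch of a loop version. The loop in the question overwrites `j` on every pass and `i + 1` runs past the last column when `i` is 3. The sketch assumes the data frame is called `df`, as in the answers; `result` is a name introduced here for illustration only.

```r
df <- data.frame(A = c(1, 2, 2), B = c(2, 2, 2), C = c(3, 4, 3))

# Work on a copy so every subtraction uses the original values
result <- df
for (i in seq_len(ncol(df))[-1]) {
  # column i minus the column to its left; column 1 is left unchanged (A - 0)
  result[[i]] <- df[[i]] - df[[i - 1]]
}
result
```

The vectorized answers above do the same thing without a loop and are usually preferable.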

Related

Change the column of same values to column of all zeros in R

Assume I have a list called: LS1 and within the list I have 20 matrix of 100 by 5. Now some columns might have just one value repeated like one column is all 100. I want to make these all 100 to all zeros. I can write a for loop to do that but I want to do it more efficiently with lapply and apply. For example one example of this matrix is
1 2 3 4 5
1 3 4 5 6
1 5 6 8 9
I want the first column which is all ones is changed to all zeros.
This is what I have done :
A = lapply(LS1, function(x) {apply(x, 2, function(x1) {if (max(x1) == min(x1))
{0}})})
but this makes all the values NULL. Can anyone suggest doing this with lapply and apply?
This should work, especially for integer matrices (lst is your list, LS1 in the question).
lapply(lst,
function(mat) {
all_dupes = apply(mat, 2, function(x) length(unique(x)) ==1)
mat[, all_dupes] = 0L
return(mat)
}
)
This is my solution:
df <- data.frame(a = c(1, 1, 1),
b = c(2, 3, 5),
c = c(4, 5, 8),
d = c(5, 6, 9),
e = c(5, 5, 5))
A = data.frame(lapply(df, function(x) x = (max(x)!=min(x))*x ))
A
> A
a b c d e
1 0 2 4 5 0
2 0 3 5 6 0
3 0 5 8 9 0
If you use sapply, you get a matrix instead of a data frame:
A = sapply(df, function(x) x = (max(x)!=min(x))*x)
A
a b c d e
[1,] 0 2 4 5 0
[2,] 0 3 5 6 0
[3,] 0 5 8 9 0
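As for why the attempt in the question produced NULL: in R, an `if` without an `else` returns NULL when the condition is false, so the inner function returned nothing for non-constant columns. A minimal sketch that keeps the lapply/apply structure is shown below. The small `LS1` here is a hypothetical stand-in for the real list of twenty 100 by 5 matrices.

```r
# Hypothetical stand-in for LS1: two small matrices
LS1 <- list(
  matrix(c(1, 1, 1, 2, 3, 5, 3, 4, 6), nrow = 3),
  matrix(c(7, 8, 9, 100, 100, 100), nrow = 3)
)

A <- lapply(LS1, function(x) {
  apply(x, 2, function(x1) {
    # always return a full-length column: zeros if constant, unchanged otherwise
    if (max(x1) == min(x1)) rep(0, length(x1)) else x1
  })
})
A[[1]]
```

Because every call returns a vector of the same length, apply() reassembles the columns into a matrix with the original dimensions.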

Is there a function to know how many times a column has the best value?

I have a data.frame like this:
A B C
4 8 2
1 3 5
5 7 6
It could have more columns and rows.
So what I'd like to know is, for each column, how many times it holds the lowest value in a row (in my example the result should be 2 for A and 1 for C).
d = data.frame(a = c(4, 1, 5), b = c(8, 3, 7), c = c(2, 5, 6))
row_mins = apply(d, 1, min)
# alternately, slightly more efficient
row_mins = do.call(pmin, d)
colSums(d == row_mins)
# a b c
# 2 0 1
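The comparison `d == row_mins` works because `row_mins` has one entry per row and R recycles it down each column, so every cell is compared with its own row's minimum. With that approach a tie counts for every tied column. If you would rather count only the first minimum in each row, a sketch using which.min() might look like this (`min_col` and `counts` are names made up for the example):

```r
d <- data.frame(a = c(4, 1, 5), b = c(8, 3, 7), c = c(2, 5, 6))

# position of the (first) minimum in each row
min_col <- apply(d, 1, which.min)

# tabulate against all column names so columns that never win still show 0
counts <- table(factor(names(d)[min_col], levels = names(d)))
counts
```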

How do I create a simple table in R using for loops?

I was asked to create a table with three columns, A, B and C and eight rows. Column A must go 1, 1, 1, 1, 2, 2, 2, 2. Column B must alternate 1, 2, 1, 2, 1, 2, 1, 2. And column C must go 1, 1, 2, 2, 1, 1, 2, 2. I am able to produce the A column data fine, but don't know how to get B or C. This is the code I have so far:
dataSheet <- matrix(nrow = 0, ncol = 3)
colnames(dataSheet) <- c('A', 'B', 'C')
A <- 1
B <- 1
C <- 1
for (A in 1:4){
A=1
dataSheet <- rbind(dataSheet, c(A, B, C))
}
for (A in 5:8){
A=2
dataSheet <- rbind(dataSheet, c(A, B, C))
}
This seems like a good excuse to get familiar with the rep() function. It handles this question easily, and with a little care it handles many more complicated ones too:
dt <- data.frame(A = rep(1:2, each = 4),
B = rep(1:2, times = 4),
C = rep(1:2, each = 2))
dt
#> A B C
#> 1 1 1 1
#> 2 1 2 1
#> 3 1 1 2
#> 4 1 2 2
#> 5 2 1 1
#> 6 2 2 1
#> 7 2 1 2
#> 8 2 2 2
Created on 2019-01-26 by the reprex package (v0.2.1)
Simply use R's vectorization for this task, i.e.
A <- c(1, 1, 1, 1, 2, 2, 2, 2)
B <- c(1, 2, 1, 2, 1, 2, 1, 2) # or rep(1:2, 4)
C <- c(1, 1, 2, 2, 1, 1, 2, 2)
cbind(A,B,C)
Maybe something along the lines of the following will be acceptable to your professor.
# dataSheet starts as the empty 0 x 3 matrix from the question
for (i in 1:8){
A <- if(i <= 4) 1 else 2
B <- if(i %% 2) 1 else 2
# C repeats 1, 1, 2, 2 within each block of four rows
C <- if((i - 1) %% 4 < 2) 1 else 2
dataSheet <- rbind(dataSheet, c(A, B, C))
}
dataSheet
# A B C
#[1,] 1 1 1
#[2,] 1 2 1
#[3,] 1 1 2
#[4,] 1 2 2
#[5,] 2 1 1
#[6,] 2 2 1
#[7,] 2 1 2
#[8,] 2 2 2
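Since the requested table is every combination of A, B and C in {1, 2}, another option (not a for loop, so it may not satisfy the assignment) is expand.grid(). This is only a sketch; `grid` is a name introduced for the example.

```r
# expand.grid varies its first argument fastest, so list B, then C, then A
grid <- expand.grid(B = 1:2, C = 1:2, A = 1:2)

# reorder the columns to the requested A, B, C layout
dataSheet <- grid[, c("A", "B", "C")]
dataSheet
```

Note that this gives a data frame rather than a matrix; as.matrix(dataSheet) converts it if a matrix is required.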

exchange two columns and remove the duplicates in a data frame using R

Here is an example to explain what I want to do. I have a data frame like:
X Y
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
I want to change it to another format:
X1 Y1 X2 Y2
1 1 1 1
1 2 2 1
1 3 3 1
......
Take two rows in the first table, say X=1, Y=2 and X=2, Y=1. They just exchange each other's values. So I want to put such rows in one row, as shown in the second table, and then remove the duplicates. So, the 'thin and long' table is turned into a 'short and fat' one. I know how to do it using two for loops, but in R such an operation takes forever. So, can anyone help me with a quick way?
Here is a minimal example:
The original table is:
X Y
1 2
2 1
The transformed table that I want looks like:
X1 Y1 X2 Y2
1 2 2 1
So, the rows in the first table that just exchange values are merged into one row in the second table, and the extra row from the first table is removed.
Maybe the code below in base R can work
dfout <- `names<-`(cbind(r <- subset(df,df$Y>=df$X),rev(r)),
c("X1","Y1","X2","Y2"))
such that
> dfout
X1 Y1 X2 Y2
1 1 1 1 1
2 1 2 2 1
3 1 3 3 1
5 2 2 2 2
6 2 3 3 2
9 3 3 3 3
DATA
df <- structure(list(X = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Y = c(1, 2,
3, 1, 2, 3, 1, 2, 3)), class = "data.frame", row.names = c(NA,
-9L))
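library(tidyverse)
# c() is needed so that each column is a single vector
df <- tibble(x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
y1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3))
# keep one row of each mirrored pair, then add the swapped columns
df <- df %>% filter(x1 <= y1) %>% mutate(x2 = y1, y2 = x1) %>% distinct()
I think this does the trick.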
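As a quick sanity check, the base R idea (keep rows with Y >= X, then append the swapped columns) can be tried on the two-row minimal example from the question. This is a sketch only; `df_min` and `r` are names chosen for the illustration.

```r
# the minimal example from the question
df_min <- data.frame(X = c(1, 2), Y = c(2, 1))

# keep one row of each mirrored pair
r <- subset(df_min, Y >= X)

# append the swapped columns and rename
dfout <- setNames(cbind(r, r[c("Y", "X")]), c("X1", "Y1", "X2", "Y2"))
dfout
```

The result has a single row, with the pair (1, 2) next to its mirror (2, 1).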

Sort list of vectors into single frequency table with a factor column

I have a data frame containing a list vector with jagged entries:
df = data.frame(x = rep(c(1,2), 2), y = rep(c("a", "b"), each = 2))
L = list()
for (each in round(runif(4, 1,5))) L = c(L, list(1:each))
df$L = L
For example,
x y L
1 a 1
2 a 1, 2, 3, 4
1 b 1, 2, 3
2 b 1, 2, 3
How could I create a table which counts the values of L for each x, across the values of y? So, in this example it would output something like,
1 2 3 4
X
1 2 1 1 0
2 2 2 2 1
I had some luck using
tablist = function(L) table(unlist(L))
tapply(df$L, df$x, tablist)
which produces,
$`1`
1 2 3
2 1 1
$`2`
1 2 3 4
2 2 2 1
However, I'm not sure how to go from here to a single table. Also, I'm beginning to suspect that this approach might start taking an unruly amount of time for large data frames. Any thoughts / suggestions would be greatly appreciated!
Using plyr
library(plyr)
df = data.frame(x = rep(c(1,2), 2), y = rep(c("a", "b"), each = 2))
L = list()
set.seed(2)
for (each in round(runif(4, 1,5))) L = c(L, list(1:each))
df$L = L
> df
x y L
1 1 a 1, 2
2 2 a 1, 2, 3, 4
3 1 b 1, 2, 3
4 2 b 1, 2
table(ddply(df,.(x),summarize,unlist(L)))
> table(ddply(df,.(x),summarize,unlist(L)))
..1
x 1 2 3 4
1 2 2 1 0
2 2 2 1 1
If you're not into plyr...
vals <- unique(unlist(df$L))
names(vals) <- vals
do.call("rbind",
lapply(split(df,df$x),function(byx){
sapply(vals, function(i){
sum(unlist(sapply(byx$L,"==",i)))
})
})
)
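Another base R possibility, offered as a sketch rather than a benchmarked solution, is to repeat each x once per element of its list entry and cross-tabulate directly. The list column below is written out by hand to match the seeded example printed above; `tab` is a name introduced here.

```r
df <- data.frame(x = c(1, 2, 1, 2), y = c("a", "a", "b", "b"))
# fixed list column matching the seeded example printed above
df$L <- list(1:2, 1:4, 1:3, 1:2)

# repeat each x once per element of its list entry, then cross-tabulate
tab <- table(x = rep(df$x, lengths(df$L)), value = unlist(df$L))
tab
```

This avoids looping over the unique values and yields a single table with zeros filled in for combinations that never occur.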
