How to loop over a column in a data frame in R?

I'm quite new to R. I have a df with a V1 column. I would like to create a loop to calculate the ratio (cut-off values).
I want to take the first number, 1, divide it by 301, and put the value in df$V2. Then I want to sum the first two numbers and divide by 301, and so on.
For example:
V2
1/301 (first value of df) 0.0033
2/301 (sum of the first two values of df) 0.0066
2/301 (sum of the first three values of df) 0.0066
df
V1
1
1
0
0
1
0
1
1
1
0

You can take cumulative sum of V1 values and divide it by 301.
df$V2 <- cumsum(df$V1)/301
df
# V1 V2
#1 1 0.00332
#2 1 0.00664
#3 0 0.00664
#4 0 0.00664
#5 1 0.00997
#6 0 0.00997
#7 1 0.01329
#8 1 0.01661
#9 1 0.01993
#10 0 0.01993
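
As a quick check of the arithmetic, the sketch below rebuilds the example column and looks at the running totals before the division. The divisor 301 comes from the question; the name running is only used here for illustration.

```r
# Rebuild the example column from the question
df <- data.frame(V1 = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0))

# cumsum() returns the running total of V1: 1 2 2 2 3 3 4 5 6 6
running <- cumsum(df$V1)
running

# Dividing the whole vector by 301 gives every ratio in one step
df$V2 <- running / 301
round(df$V2[1:3], 4)
# 0.0033 0.0066 0.0066
```

The first three rounded values match the 1/301, 2/301, 2/301 pattern described in the question.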

We can use transform to create 'V2' by dividing the cumulative sum of 'V1' by 301
df <- transform(df, V2 = cumsum(V1)/301)
-output
df
# V1 V2
#1 1 0.003322259
#2 1 0.006644518
#3 0 0.006644518
#4 0 0.006644518
#5 1 0.009966777
#6 0 0.009966777
#7 1 0.013289037
#8 1 0.016611296
#9 1 0.019933555
#10 0 0.019933555
Or another option is Reduce with accumulate = TRUE
transform(df, V2 = Reduce(`+`, V1, accumulate = TRUE)/301)
Or, if we need a loop, iterate over the row indices and set each element of 'V2' to the sum of 'V1' from row 1 through that row, divided by 301
df$V2 <- 0
for(i in seq_len(nrow(df))) {
df$V2[i] <- sum(df$V1[1:i])/301
}
Or using tidyverse
library(dplyr)
df %>%
mutate(V2 = cumsum(V1)/301)
Or using accumulate
library(purrr)
df %>%
mutate(V2 = accumulate(V1, `+`)/301)
data
df <- structure(list(V1 = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0)), class = "data.frame",
row.names = c(NA,
-10L))
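
A minimal base R sketch, assuming the same example data, that checks that the cumsum, Reduce and loop variants agree. The loop here is a variation on the one above: it carries a running total (the hypothetical variable total) instead of re-summing 1:i at every step, which avoids repeated work on long columns.

```r
df <- data.frame(V1 = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0))

# Vectorised versions
v_cumsum <- cumsum(df$V1) / 301
v_reduce <- Reduce(`+`, df$V1, accumulate = TRUE) / 301

# Loop version with a pre-allocated result and a running total
v_loop <- numeric(nrow(df))
total <- 0
for (i in seq_len(nrow(df))) {
  total <- total + df$V1[i]
  v_loop[i] <- total / 301
}

# Both comparisons should report TRUE
all.equal(v_cumsum, v_reduce)
all.equal(v_cumsum, v_loop)
```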

Related

How to change specific values in a dataframe

Could anyone explain how to change the negative values in the dataframe below?
We have been asked to create a data structure that gives the output below.
# > df
# x y z
# 1 a -2 3
# 2 b 0 4
# 3 c 2 -5
# 4 d 4 6
Then we have to use control flow operators and/or vectorisation to multiply only the negative values by 10.
I have tried many different ways but cannot get this to work. I get an error when I try to use a loop, because of the letters in column x.
Create indices of the negative values and multiply by 10, i.e.
i1 <- which(df < 0, arr.ind = TRUE)
df[i1] <- as.numeric(df[i1]) * 10
# x y z
#1 a -20 3
#2 b 0 4
#3 c 2 -50
#4 d 4 6
First find out the numeric columns of the dataframe and multiply the negative values by 10.
cols <- sapply(df, is.numeric)
#Multiply negative values by 10 and positive with 1
df[cols] <- df[cols] * ifelse(sign(df[cols]) == -1, 10, 1)
df
# x y z
#1 a -20 3
#2 b 0 4
#3 c 2 -50
#4 d 4 6
Using dplyr -
library(dplyr)
df <- df %>% mutate(across(where(is.numeric), ~. * ifelse(sign(.) == -1, 10, 1)))
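
Since the question also mentions control flow, here is a hedged sketch of a plain loop over the columns. It assumes the data frame from the question, rebuilt here by hand, and it skips the character column with an is.numeric check so the letters no longer cause an error.

```r
# Data frame from the question (rebuilt here as an assumption)
df <- data.frame(x = c("a", "b", "c", "d"),
                 y = c(-2, 0, 2, 4),
                 z = c(3, 4, -5, 6),
                 stringsAsFactors = FALSE)

for (col in names(df)) {
  # Only touch numeric columns; column x holds letters
  if (is.numeric(df[[col]])) {
    neg <- df[[col]] < 0                  # logical mask of negative entries
    df[[col]][neg] <- df[[col]][neg] * 10
  }
}
df
```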

Dispatch values in list column to separate columns

I have a data.table with a list column "c":
df <- data.table(a = 1:3, c = list(1L, 1:2, 1:3))
df
a c
1: 1 1
2: 2 1,2
3: 3 1,2,3
I want to create separate columns for the values in "c".
I create a set of new columns F_1, F_2, F_3:
mmax <- max(df$a)
flux <- paste("F", 1:mmax, sep = "_")
df[, (flux) := 0]
df
a c F_1 F_2 F_3
1: 1 1 0 0 0
2: 2 1,2 0 0 0
3: 3 1,2,3 0 0 0
I want to dispatch values in "c" to columns F_1, F_2, F_3 like this:
df
a c F_1 F_2 F_3
1: 1 1 1 0 0
2: 2 1,2 1 2 0
3: 3 1,2,3 1 2 3
What I have tried:
comp_vect <- function(vec, mmax){
vec <- vec %>% unlist()
n <- length(vec)
answr <- c(vec, rep(0, l = mmax -n))
}
df[ , ..flux := mapply(comp_vect, c, mmax)]
The expected data.table is :
> df
a c F_1 F_2 F_3
1: 1 1 1 0 0
2: 2 1,2 1 2 0
3: 3 1,2,3 1 2 3
I followed a radically different approach: I row-bound the list column with rbind and then reshaped it with dcast, which gives the desired result. The last step is to set the names.
library(data.table)
df <- data.table(a = 1:3, d = list(1L, c(1L, 2L), c(1L, 2L, 3L)))
df2 <- df[, rbind(d), by = a][, dcast(.SD, a ~ V1, fill = 0)]
setnames(df2, 2:4, flux)[]
a F_1 F_2 F_3
1: 1 1 0 0
2: 2 1 2 0
3: 3 1 2 3
where flux is the vector of names that you defined in your question.
Please note that I avoided using the column name c, as it may be confused with the function c().
Solution :
for(idx in seq(max(sapply(df$c, length)))){ # maximum number of values according to all the elements of the list
set(x = df,
i = NULL,
j = paste0("F_",idx), # column's name
value = sapply(df$c, function(x){
if(is.na(x[idx])){
return(0) # 0 instead of NA
} else {
return(x[idx])
}
})
)
}
Explanation:
We can extract the values from a list like this:
sapply(df$c, function(ll) return(ll[1])) # first value
[1] 1 1 1
sapply(df$c, function(ll) return(ll[2])) # second value
[1] NA 2 2
sapply(df$c, function(ll) return(ll[3])) # third value
[1] NA NA 3
We see that if there is no value at a given position, we get NA.
We need an iterator to extract all values at position idx. For that, we find the number of values in each element of df$c (the list) and keep the maximum.
max(sapply(df$c, length))
[1] 3
If we want zeros instead of NAs, we need a function inside sapply that converts them:
vec <- c(NA, 5, 1, NA)
sapply(vec, function(x) if(is.na(x)) return(0) else return(x))
[1] 0 5 1 0
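
The same padding idea can be sketched without data.table, assuming the list column is available as a plain list (called lst here, a hypothetical name). Each element is padded with zeros up to the longest length and the results are stacked into a matrix whose columns could then be bound to the table.

```r
# Stand-in for the list column from the question
lst <- list(1L, 1:2, 1:3)

# Longest element decides how many F_ columns are needed
mmax <- max(lengths(lst))

# Pad each element with zeros, then transpose so that rows match list elements
# (assumes mmax > 1; with mmax == 1 vapply returns a plain vector instead)
padded <- t(vapply(lst,
                   function(v) c(v, rep(0L, mmax - length(v))),
                   integer(mmax)))
colnames(padded) <- paste("F", seq_len(mmax), sep = "_")
padded
```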

Converting table with missing values to matrix of counts

I have a table with an unequal number of elements in each row, with each element having a count of 1 or 2 appended to a string. I want to create a matrix of presence/absence of each string, but including the count (1, 2) and placing a zero if the string is not found.
From this:
V1 V2 V3 V4 V5
1 A cat:2 dog:1 mouse:1 horse:2
2 B dog:2 mouse:2 dolphin:2
3 C horse:2
4 D cat:1 mouse:2 dolphin:2
To this:
cat dog mouse horse dolphin
A 2 1 1 2 0
B 0 2 2 0 2
C 0 0 0 2 0
D 1 0 2 0 2
I have looked up previous solutions to similar problems, such as:
Convert a dataframe to presence absence matrix
but they create a 0/1 presence/absence matrix that does not include the count.
sample data:
structure(list(V1 = c("A", "B", "C", "D"),
V2 = c("cat:2", "dog:2", "horse:2", "cat:1"),
V3 = c("dog:1", "mouse:2", "", "mouse:2"),
V4 = c("mouse:1", "dolphin:2", "", "dolphin:2"),
V5 = c("horse:2", "", "", "")),
.Names = c("V1", "V2", "V3", "V4", "V5"),
class = "data.frame", row.names = c(NA, -4L))
Maybe some package could make this easier, but here is a solution. It won't be fast for large data, but it does the job:
#split the strings
tmp <- apply(DF[,-1], 1, strsplit, ":")
#extract the first strings
names <- lapply(tmp,function(x) c(na.omit(sapply(x, "[", 1))))
uniquenames <- unique(unlist(names))
#extract the numbers
reps <- lapply(tmp,function(x) as.numeric(na.omit(sapply(x, "[", 2))))
#make the numbers named vectors
res <- mapply(setNames, reps, names)
#subset the named vectors and combine result in a matrix
res <- do.call(rbind, lapply(res, "[",uniquenames))
#cosmetics
colnames(res) <- uniquenames
rownames(res) <- DF$V1
res[is.na(res)] <- 0
# cat dog mouse horse dolphin
#A 2 1 1 2 0
#B 0 2 2 0 2
#C 0 0 0 2 0
#D 1 0 2 0 2
You can separate the animals from the counts with separate from tidyr, right after melting the data into long format, and then use dcast to go back to wide format with the counts as values (the counts must first be converted from character to numeric).
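# Assumed packages: melt/dcast from reshape2, separate from tidyr, select from dplyr
library(reshape2)
library(tidyr)
library(dplyr)
data %>%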
melt("V1") %>%
separate(value, c("animal", "count"), ":", fill = "left") %>%
transform(count = as.numeric(count)) %>%
dcast(V1 ~ animal, value.var = "count", fun.aggregate = sum) %>%
select(-"NA")
# V1 cat dog dolphin horse mouse
# 1 A 2 1 0 2 1
# 2 B 0 2 2 0 2
# 3 C 0 0 0 2 0
# 4 D 1 0 2 0 2
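
For comparison, a dependency-free sketch of the same long-then-wide idea using only base R. It assumes the sample data from the question is stored as DF; the helper names vals, ids, parts and long are hypothetical. xtabs sums the counts per row label and animal and fills missing combinations with 0.

```r
DF <- data.frame(V1 = c("A", "B", "C", "D"),
                 V2 = c("cat:2", "dog:2", "horse:2", "cat:1"),
                 V3 = c("dog:1", "mouse:2", "", "mouse:2"),
                 V4 = c("mouse:1", "dolphin:2", "", "dolphin:2"),
                 V5 = c("horse:2", "", "", ""),
                 stringsAsFactors = FALSE)

# Flatten the value columns (column by column) and repeat the row labels to match
vals <- unlist(DF[-1], use.names = FALSE)
ids  <- rep(DF$V1, times = ncol(DF) - 1)

# Drop the empty cells, then split "animal:count" into its two parts
keep  <- nzchar(vals)
parts <- strsplit(vals[keep], ":", fixed = TRUE)
long  <- data.frame(id     = ids[keep],
                    animal = vapply(parts, `[`, character(1), 1),
                    count  = as.numeric(vapply(parts, `[`, character(1), 2)),
                    stringsAsFactors = FALSE)

# Cross-tabulate the counts; absent combinations become 0
res <- xtabs(count ~ id + animal, data = long)
res
```

The columns come out in alphabetical order rather than in order of first appearance, as in the tidyr answer above.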

Forming a co-occurrence matrix from a data frame

I have a data frame which looks something like this:
id val
1 a
1 b
2 a
2 c
2 d
3 a
3 a
Think of each row as a label, val, that was given to some observation with an id.
What I ultimately want to get to is a "co-occurrence" matrix that looks something like this, where I get a count of how many times each letter appears within the same id with each other letter:
a b c d
a 1 1 1 1
b 1 0 0 0
c 1 0 0 1
d 1 0 1 0
I've been wracking my brain looking for ways to do this, but have come up empty so far. Any hints? Preferably using tidyverse tools, but open to other options as well at this point.
EDIT: the solutions to the question linked as a possible duplicate do not work in this case. I'm not sure why, but I suspect it has to do with that question having a data frame with 3 columns.
Here's a solution in base R. It is not very elegant, but it seems to work:
temp = data.frame(do.call(cbind, lapply(split(df, df$id), function(a)
combn(a$val, 2))), stringsAsFactors = FALSE)
sapply(sort(unique(df$val)), function(rows)
sapply(sort(unique(df$val)), function(cols)
sum(sapply(temp, function(x)
identical(sort(x), sort(c(rows, cols)))))))
# a b c d
#a 1 1 1 1
#b 1 0 0 0
#c 1 0 0 1
#d 1 0 1 0
OR with igraph
temp = t(do.call(cbind, lapply(split(df, df$id), function(a) combn(a$val, 2))))
library(igraph)
as.matrix(get.adjacency(graph(temp, directed = FALSE)))
# a c b d
#a 1 1 1 1
#c 1 0 0 1
#b 1 0 0 0
#d 1 1 0 0
DATA
df = structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
val = c("a", "b", "a", "c", "d", "a", "a")),
.Names = c("id", "val"),
class = "data.frame",
row.names = c(NA, -7L))
A solution with dplyr + purrr:
library(dplyr)
library(purrr)
df %>%
split(.$id) %>%
map_dfr(function(x){
t(combn(x$val, 2)) %>%
data.frame(stringsAsFactors = FALSE)
}) %>%
mutate_all(funs(factor(., levels = c("a", "b", "c", "d")))) %>%
table() %>%
pmax(., t(.))
Result:
X2
X1 a b c d
a 1 1 1 1
b 1 0 0 0
c 1 0 0 1
d 1 0 1 0
Notes:
I first split df by id, then used map_dfr from purrr to apply combn to each id group.
combn finds all combinations of m elements within a vector (here m = 2, so length(vec) choose 2 pairs) and returns them as a matrix.
The _dfr suffix of map_dfr means that the results are row-bound into a single data frame, so this is effectively do.call(rbind, lapply()).
mutate_all makes sure that table retains all the levels needed, even if a letter does not appear in a column.
After the table step the result is an upper triangular matrix, so I fed that matrix and its transpose into pmax.
pmax takes the element-wise (parallel) maxima of the two inputs and returns a symmetric matrix, as desired.
Data:
df = read.table(text= "id val
1 a
1 b
2 a
2 c
2 d
3 a
3 a", header = TRUE, stringsAsFactors = FALSE)
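
The steps described in the notes can also be sketched in base R alone. This is an illustration rather than a replacement for the answers above: it assumes every id has at least two rows (combn stops with an error otherwise), and the names lev, pairs, tab and co are hypothetical.

```r
df <- data.frame(id  = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
                 val = c("a", "b", "a", "c", "d", "a", "a"),
                 stringsAsFactors = FALSE)

# All labels, so that every letter gets a row and a column
lev <- sort(unique(df$val))

# One row per unordered pair of labels that share an id
pairs <- do.call(rbind,
                 lapply(split(df$val, df$id), function(v) t(combn(v, 2))))

# Count the pairs; fixing the factor levels keeps letters that never
# appear in one of the two positions
tab <- unclass(table(first  = factor(pairs[, 1], levels = lev),
                     second = factor(pairs[, 2], levels = lev)))

# Mirror the counts across the diagonal with element-wise maxima
co <- pmax(tab, t(tab))
co
```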

Take certain value in a data frame

I have a data.frame and would like to take a value from one of two cells in each row, depending on the value of a third cell.
I tried the apply function.
n <- c(2, 3, 0 ,1)
s <- c(0, 1, 1, 2)
b <- c("THIS", "FALSE", "NOT", "THIS")
df <- data.frame(n, s, b)
df <- sapply(df$Vals, FUN=function(x){ if(b[x]=="THIS") ? n[x] : s[x] } )
My logic is:
if(b at position x is equal to "This") {
add n[x] to the column df$Vals
} else {
add s[x] to the column df$Vals
}
Here x is a single row.
Can anyone tell me what I am doing wrong?
I appreciate your reply!
Like this:
df$Vals = with(df, ifelse(b=="THIS", n, s))
Or, to return the resulting data.frame directly:
transform(df, Vals=with(df, ifelse(b=="THIS", n, s)))
# n s b Vals
#1 2 0 THIS 2
#2 3 1 FALSE 1
#3 0 1 NOT 1
#4 1 2 THIS 1
With your additional conditions:
func=Vectorize(function(b, s, n){if(b=='THIS') return(n);if(b==F) return(n+s);s})
df$Vals = with(df, func(b,s,n))
Or you could use the row/column indexing
df$Vals <- df[1:2][cbind(1:nrow(df),(df$b!='THIS')+1)]
df
# n s b Vals
#1 2 0 THIS 2
#2 3 1 FALSE 1
#3 0 1 NOT 1
#4 1 2 THIS 1
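
To make the row/column indexing easier to follow, the sketch below spells it out next to the ifelse version, assuming the data from the question. A two-column matrix of (row, column) pairs picks exactly one cell per row: column 1 is n, column 2 is s. The names idx, vals_ifelse and vals_index are only for illustration.

```r
df <- data.frame(n = c(2, 3, 0, 1),
                 s = c(0, 1, 1, 2),
                 b = c("THIS", "FALSE", "NOT", "THIS"),
                 stringsAsFactors = FALSE)

# Vectorised choice: n where b is "THIS", otherwise s
vals_ifelse <- ifelse(df$b == "THIS", df$n, df$s)

# Matrix indexing: one (row, column) pair per row of df
idx <- cbind(seq_len(nrow(df)), ifelse(df$b == "THIS", 1, 2))
vals_index <- as.matrix(df[c("n", "s")])[idx]

# Both approaches select the same values
all.equal(vals_ifelse, vals_index)
```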

Resources