using table() with for loop - r

df <- data.frame(class=c('A', 'B'),
var2=c(1, 0),
var3=c(0, 1))
for (i in colnames(df)[2:3]) {
#print(i)
table(paste0('df$', i), df$class)
}
results in
Error in table(paste0("df$", i), df$class) :
all arguments must have the same length
Also tried putting
get(paste0('df$',i))
Is there a way to loop through these columns and tabulate?

The issue with your code is that paste0('df$', i) returns a character string, e.g. 'df$var2', not the column itself. That string has length 1 while df$class has length 2, so table() complains that its arguments differ in length. You can use the double bracket '[[' to extract the columns by name:
# create a list to save the results from loop
tl<-vector(mode = 'list')
# run the loop and add the results for each column in the corresponding element of 'tl'
for (i in colnames(df)[2:3]) {
tl[[i]]<-table(df[[i]], df$class)
}
output
tl
$var2
A B
0 0 1
1 1 0
$var3
A B
0 1 0
1 0 1
Alternatively, you can use lapply():
lapply(df[, 2:3], function(x) table(x, df$class))
$var2
x A B
0 0 1
1 1 0
$var3
x A B
0 1 0
1 0 1
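To make the difference concrete, here is a minimal sketch comparing what paste0() produces with what '[[' extracts. The names as_text, as_column and tab are only illustrative and do not come from the question.

```r
df <- data.frame(class = c("A", "B"),
                 var2 = c(1, 0),
                 var3 = c(0, 1))
i <- "var2"

# paste0() only builds a piece of text; it never looks anything up in df
as_text <- paste0("df$", i)
as_text          # the single string "df$var2"
length(as_text)  # 1, while df$class has length 2

# '[[' uses the string in i as a column name and returns the column
as_column <- df[[i]]
length(as_column)  # 2, same length as df$class

# both arguments now have the same length, so table() works
tab <- table(as_column, df$class)
tab
```

The same reasoning explains why get(paste0('df$', i)) fails: get() looks for an object literally named "df$var2", and no such object exists.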

Not much info on what exactly your preferred output is other than it tabulates the columns, but here's a potential solution:
# This is your df:
class<- c('A','B')
var2<- c(1,0)
var3 <- c(0,1)
df<- data.frame(class,var2,var3)
# Using lapply to tabulate each column. The output is a list of tables:
dftb <- lapply(df, table)
The output looks like this:
> dftb
$class
A B
1 1
$var2
0 1
1 1
$var3
0 1
1 1
The map() function from the purrr package (part of the tidyverse) can also be used:
library(tidyverse)
dftb <- map(df, table)

Related

How do I transform a 1 column data frame into a vector while the column name stays the vector name in R

I have a dataframe with one column and some rows which I want to transform into a vector. The name of the column should be the name of the vector as well. Usually, I create another object doing this:
new_object <- as.vector(df$variable_name)
new_object
Is there a way to keep the variable name as the name of the vector?
(I am asking as I try to build this in a function and need it therefore)
Thank you!
You can use list2env -
df <- data.frame(a = 1:5)
list2env(df, .GlobalEnv)
a
#[1] 1 2 3 4 5
sub <-c("A","A","A","A","B","B","B","B","C","C","C","C")
n<-c(0,1,1,1,0,1,0,1,0,1,0,1)
df <- data.frame(sub, n)
n <- df$n
[1] 0 1 1 1 0 1 0 1 0 1 0 1
You should be able to just simply call the variable name to get the vector.
df <- data.frame(x = c(1, 2, 3))
assign(names(df), df[,1])
x
# [1] 1 2 3
We could use attach
attach(df)
-output
> a
[1] 1 2 3 4 5
data
df <- data.frame(a = 1:5)
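Since the question mentions wrapping this in a function, here is a hedged sketch of how the assign() idea could be packaged. The function name col_to_var and its envir argument are hypothetical, not part of any answer above. The sketch assumes the data frame has exactly one column and writes into an environment the caller chooses rather than always into the global environment.

```r
# Hypothetical helper: create a variable named after the single column of df
col_to_var <- function(df, envir = parent.frame()) {
  stopifnot(is.data.frame(df), ncol(df) == 1)
  # df[[1]] is the column as a plain vector; names(df) is its name
  assign(names(df), df[[1]], envir = envir)
  invisible(names(df))
}

df <- data.frame(a = 1:5)

# Write into a dedicated environment to avoid cluttering the workspace
e <- new.env()
col_to_var(df, envir = e)
get("a", envir = e)
```

Calling col_to_var(df) with no second argument would create a in the calling frame, which is the behaviour the question seems to ask for.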

Create a list of vectors from a vector where n consecutive values are not 0 in R

So I have this vector:
a = sample(0:3, size=30, replace = T)
[1] 0 1 3 3 0 1 1 1 3 3 2 1 1 3 0 2 1 1 2 0 1 1 3 2 2 3 0 1 3 2
What I want to have is a list of vectors with all the elements that are separated by n 0s. So in this case, with n = 0 (there can't be any 0 between the consecutive values), this would give:
res = c([1,3,3], [1,1,1,3,3,2,1,1,3], [2,1,1,2]....)
However, I would like to control the n parameter flexibly, so that if I set it to 2, for example, a vector like this:
b = c(1,2,0,3,0,0,4)
would still result in a result like this
res = c([1,2,3],[4])
I tried a lot of approaches with while loops in for-loops while trying to count the number of 0s. But I just could not achieve it.
Update
I tried to post the question in a more real-world setting here:
Flexibly calculate column based on consecutive counts in another column in R
Thank you all for the help. I just don't seem to manage to put your help into practice with my limited knowledge.
Here is a base R option using rle + split for general cases, i.e., values in b are not limited to 0 to 3.
with(
rle(with(rle(b == 0), rep(values & lengths == n, lengths))),
Map(
function(x) x[x != 0],
unname(split(b, cut(seq_along(b), c(0, cumsum(lengths))))[!values])
)
)
which gives (assuming n=2)
[[1]]
[1] 1 2 3
[[2]]
[1] 4
If you have values within the range 0 to 9, you can try the code below
lapply(
unlist(strsplit(paste0(b, collapse = ""), strrep(0, n))),
function(x) {
as.numeric(
unlist(strsplit(gsub("0", "", x), ""))
)
}
)
which also gives
[[1]]
[1] 1 2 3
[[2]]
[1] 4
I also wanted to paste a somehow useful solution with the function SplitAt from DescTools:
SplitAt(a, which(a==0)) %>% lapply(., function(x) x[which(x != 0)])
where a is your initial vector. It gives you a list where every entry contains the numbers between zeros:
If you then add another SplitAt, you can create sublist after sublist and split it into as many sublists as you want, e.g.:
n <- 4
SplitAt(a, which(a==0)) %>% lapply(., function(x) x[which(x != 0)]) %>% SplitAt(., n)
gives you:
set.seed(1)
a <- sample(0:3, size=30, replace = T)
a
[1] 0 3 2 0 1 0 2 2 1 1 2 2 0 0 0 1 1 1 1 2 0 2 0 0 0 0 1 0 0 1
a2 <- paste(a, collapse = "") # Turns into a character vector, making it easier to handle patterns.
a3 <- unlist(strsplit(a2, "0")) # Change to whatever pattern you want, like "00".
a3 <- a3[a3 != ""] # Remove empty elements
a3 <- as.numeric(a3) # Turn back to numeric
a3
[1] 32 1 221122 11112 2 1 1
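The string-based approach above returns concatenated numbers rather than a list of vectors. As a hedged alternative, here is a small base R sketch built on rle(). The function name split_on_zero_runs and the parameter min_zeros are made up for illustration, and the sketch assumes one particular reading of the question: a run of at least min_zeros zeros separates groups, and shorter runs of zeros are simply dropped.

```r
# Hypothetical helper: split x wherever a run of >= min_zeros zeros occurs
split_on_zero_runs <- function(x, min_zeros = 1) {
  r <- rle(x == 0)
  # TRUE at every position inside a zero run long enough to act as a separator
  is_sep <- rep(r$values & r$lengths >= min_zeros, r$lengths)
  # TRUE only at the first position of each separator run
  starts <- is_sep & !c(FALSE, head(is_sep, -1))
  # group id grows by one each time a separator run begins
  grp <- cumsum(starts)
  keep <- x != 0
  unname(split(x[keep], grp[keep]))
}

b <- c(1, 2, 0, 3, 0, 0, 4)
split_on_zero_runs(b, min_zeros = 2)  # groups: 1 2 3 | 4
split_on_zero_runs(b, min_zeros = 1)  # groups: 1 2 | 3 | 4
```

Unlike the paste/strsplit variants, this works for values above 9 because the vector is never converted to text.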

Vectorize for-loop

I have the following data:
Letters <- c("A","B","C")
Numbers <- c(1,0,1)
Numbers <- as.integer(Numbers)
Data.Frame <- data.frame(Letters,Numbers)
I want to create a Dummy Variable for the Letters and wrote the following for-loop:
for (level in unique(Data.Frame$Letters)) {
Data.Frame[paste("", level, sep = "")] <- ifelse(Data.Frame$Letters == level, 1, 0)
}
Is there a way to vectorize this for-loop? Is the following use of dcast already vectorized?
dt <- data.table(Letters,Numbers)
dcast.data.table(dt, Letters+Numbers~Letters,fun.aggregate=length)
You could use outer
cbind(Data.Frame, +outer(Letters, setNames(nm=Letters), "=="))
# Letters Numbers A B C
# 1 A 1 1 0 0
# 2 B 0 0 1 0
# 3 C 1 0 0 1
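Another loop-free route in base R is model.matrix(), which builds indicator columns from a factor or character column. This is only a sketch: the object name dummies is illustrative, and note that the resulting columns are named LettersA, LettersB, LettersC rather than A, B, C.

```r
Letters <- c("A", "B", "C")
Numbers <- as.integer(c(1, 0, 1))
Data.Frame <- data.frame(Letters, Numbers)

# "- 1" drops the intercept so every level gets its own 0/1 column
dummies <- model.matrix(~ Letters - 1, data = Data.Frame)
colnames(dummies)

# attach the indicator columns to the original data
cbind(Data.Frame, dummies)
```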

R - setting a value based on a function applied to other values in the same row

I have a dataframe containing (surprise) data. I have one column which I wish to populate on a per-row basis, calculated from the values of other columns in the same row.
From googling, it seems like I need 'apply', or one of its close relatives. Unfortunately I haven't managed to make it actually work.
Example code:
#Example function
getCode <- function (ar1, ar2, ar3){
if(ar1==1 && ar2==1 && ar3==1){
return(1)
} else if(ar1==0 && ar2==0 && ar3==0){
return(0)
}
return(2)
}
#Create data frame
a = c(1,1,0)
b = c(1,0,0)
c = c(1,1,0)
df <- data.frame(a,b,c)
#Add column for new data
df[,"x"] <- 0
#Apply function to new column
df[,"x"] <- apply(df[,"x"], 1, getCode(df[,"a"], df[,"b"], df[,"c"]))
I would like df to be taken from:
a b c x
1 1 1 1 0
2 1 0 1 0
3 0 0 0 0
to
a b c x
1 1 1 1 1
2 1 0 1 2
3 0 0 0 0
Unfortunately running this spits out:
Error in match.fun(FUN) : 'getCode(df[, "a"], df[, "b"], df[,
"c"])' is not a function, character or symbol
I'm new to R, so apologies if the answer is blindingly simple. Thanks.
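A few things: apply should be called on the dataframe itself (i.e. apply(df, 1, someFunc)), and its third argument must be a function, not the result of calling one, which is what triggers the match.fun error. It is also more idiomatic to access columns by name using the $ operator: if I have a dataframe named df with a column named a, I access a with df$a.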
In this case, I like to do an sapply along the index of the dataframe, and then use that index to get the appropriate elements from the dataframe.
df$x <- sapply(1:nrow(df), function(i) getCode(df$a[i], df$b[i], df$c[i]))
As @devmacrile mentioned above, I would just modify the function to be able to get a vector with 3 elements as input and use it within an apply command as you mentioned.
#Example function
getCode <- function (x){
ifelse(x[1]==1 & x[2]==1 & x[3]==1,
1,
ifelse(x[1]==0 & x[2]==0 & x[3]==0,
0,
2)) }
#Create data frame
a = c(1,1,0)
b = c(1,0,0)
c = c(1,1,0)
df <- data.frame(a,b,c)
df
# a b c
# 1 1 1 1
# 2 1 0 1
# 3 0 0 0
# create your new column of results
df$x = apply(df, 1, getCode)
df
# a b c x
# 1 1 1 1 1
# 2 1 0 1 2
# 3 0 0 0 0
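If you would rather keep the original three-argument getCode unchanged, mapply() is one more option: it calls the function once per row, taking one element from each column. This is a sketch along the same lines as the sapply answer, not a replacement for it.

```r
# Original three-argument function from the question
getCode <- function(ar1, ar2, ar3) {
  if (ar1 == 1 && ar2 == 1 && ar3 == 1) {
    return(1)
  } else if (ar1 == 0 && ar2 == 0 && ar3 == 0) {
    return(0)
  }
  return(2)
}

df <- data.frame(a = c(1, 1, 0),
                 b = c(1, 0, 0),
                 c = c(1, 1, 0))

# mapply walks the three columns in parallel, one row at a time
df$x <- mapply(getCode, df$a, df$b, df$c)
df
```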

Loop to create dummy variable R

I am trying to generate dummy variables (must be 1/0) using a loop based on the most frequent response of a variable. After lots of googling, I haven't managed to come up with a solution. I have extracted the most frequent responses (strings, say the top 5 are "A","B",...,"E") using
top5<-names(head(sort(table(data$var1), decreasing = TRUE),5))
I would like the loop to check if another variable ("var2") equals A, if so set =1, OW =0, then give a summary using aggregate(). In Stata, I can refer to the looped variable i using `i' but not in R... The code that does not work is:
for(i in top5) {
data$i.dummy <- ifelse(data$var2=="i",1,0)
aggregate(data$i.dummy~data$age+data$year,data,mean)
}
Any suggestions?
If you want one column per item in your top 5, then I would use sapply along the elements in top5. There is no need for ifelse, because == already returns TRUE where the comparison holds and FALSE otherwise, and those convert directly to 1 and 0.
Here we cbind a matrix of 5 columns, one each for each element of top5 containing 1 if the row in data$var2 equals the respective element of 'top5':
data <- cbind( data , sapply( top5 , function(x) as.integer( data$var2 == x ) ) )
If you want one column for matches of any of top5 it's even easier:
data$dummies <- as.integer( data$var2 %in% top5 )
The as.integer() in both cases is used to turn TRUE or FALSE to 1 and 0 respectively.
A cut down example to illustrate how it works:
set.seed(123)
top2 <- c("A","B")
data <- data.frame( var2 = sample(LETTERS[1:4],6,repl=TRUE) )
# Make dummy variables, one column for each element in topX vector
data <- cbind( data , sapply( top2 , function(x) as.integer( data$var2 == x ) ) )
data
# var2 A B
#1 B 0 1
#2 D 0 0
#3 B 0 1
#4 D 0 0
#5 D 0 0
#6 A 1 0
# Make single column for all elements in topX vector
data$ANY <- as.integer( data$var2 %in% top2 )
data
# var2 A B ANY
#1 B 0 1 1
#2 D 0 0 0
#3 B 0 1 1
#4 D 0 0 0
#5 D 0 0 0
#6 A 1 0 1
See fortune(312), then read the help ?"[[" and possibly the help for paste0.
Then possibly consider using other tools like model.matrix and sapply rather than doing everything yourself using loops.
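For completeness, here is a hedged sketch of what the loop from the question might look like once '[[' and paste0() are used to build the column name. The toy data, the vector top and the list results are invented for illustration. reformulate() is used to build the aggregate() formula from the column name as a string.

```r
# Toy data standing in for the asker's data frame (values are made up)
data <- data.frame(var2 = c("B", "D", "B", "A"),
                   age  = c(30, 30, 40, 40),
                   year = c(2020, 2020, 2020, 2020))
top <- c("A", "B")

results <- list()
for (i in top) {
  col <- paste0(i, ".dummy")               # e.g. "A.dummy"
  # data$i.dummy would create a column literally called "i.dummy";
  # '[[' evaluates col and uses its value as the column name
  data[[col]] <- as.integer(data$var2 == i)
  # build the formula A.dummy ~ age + year from strings
  f <- reformulate(c("age", "year"), response = col)
  results[[i]] <- aggregate(f, data = data, FUN = mean)
}
results
```

Storing each aggregate() result in a list matters because a value computed inside a for loop is not printed automatically.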
