Grouping data with missing value - r

I need to compare questions of two different surveys (t1, t2).
Therefore, I have two dataframes like those below:
t1 t2
x  x
x  y
y  z
z  w
y  z
x  x
z  y
z  w
w  x
   z
   v
This data needs to be grouped by v, w, x, y and z.
Unfortunately, the value v does not occur in the first dataframe, and the two dataframes have a different number of rows, so I cannot put them together in one dataframe.
When I use "group_by" and "summarise", I get two columns, but one has 4 rows and the other has 5. As before, I cannot put them together.
I don't want to add an extra row to the first dataframe, as I don't want to manipulate the original dataset.
At the end I need a table that looks like the following one:
  t1 t2
v  0  1
w  1  2
x  3  3
y  2  2
z  3  3
I hope you can help me!
Thank you!

One way would be:
library(tidyverse)
bind_rows(
  gather(t1),
  gather(t2)
) %>% {table(.$value, .$key)}
Output:
  t1 t2
v  0  1
w  1  2
x  3  3
y  2  2
z  3  3
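Note that gather() is superseded in newer tidyr (pivot_longer() is the replacement). A base R alternative is to fix the factor levels by hand so the absent value v still gets a zero count (a sketch, assuming the survey columns are named t1 and t2 as above):

```r
lvls <- c("v", "w", "x", "y", "z")
cbind(t1 = table(factor(t1$t1, levels = lvls)),
      t2 = table(factor(t2$t2, levels = lvls)))
#   t1 t2
# v  0  1
# w  1  2
# x  3  3
# y  2  2
# z  3  3
```

Setting the levels explicitly is what guarantees v appears with a count of 0 even though it never occurs in t1.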

Related

Change values smoothly over time in R

I have a variable called "exposed" and I already know the sum of exposed people over time; have a look to understand:
i exposed
1 y
2 y
3 y
4 n
5 n
So 3 individuals are exposed and 2 are not.
t <- 5
# for each time step i in 1:t, the number of exposed individuals is known:
exposed_sums <- c(3, 4, 1, 4, 5)
I created this line of code to capture the change in data:
evol <- list()
for (i in 1:t) { evol[[i]] <- df }
for (i in 2:t) {
  # condition
}
My question is: what is the condition I have to write so that evol[[1]] contains data that looks like this:
i exposed
1 y
2 y
3 y
4 n
5 n
evol[[2]] should look like this:
i exposed
1 y
2 y
3 y
4 y
5 n
and evol[[3]] like this:
i exposed
1 y
2 n
3 n
4 n
5 n
I hope I made it clear. Any ideas, please?
Kind regards.
If I'm understanding you correctly, you want a list of dataframes based on the exposed sums.
Using lapply you can do:
exposed <- c(3, 4, 1, 4, 5)
evol <- lapply(exposed, \(x) data.frame(
  i = seq_along(exposed),
  exposed = c(rep("y", x), rep("n", length(exposed) - x))
))
evol[[1]]
i exposed
1 1 y
2 2 y
3 3 y
4 4 n
5 5 n
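As a quick check against the question's expected data, the third element (built from exposed[3] == 1) comes out as:

```r
evol[[3]]
#   i exposed
# 1 1       y
# 2 2       n
# 3 3       n
# 4 4       n
# 5 5       n
```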

select row based on value of another row in R

EDIT: to make myself clear, I know how to select individual rows, and I know there are many different ways of doing it. I want to write code that works no matter what the actual values in the rows are, so that it works over a larger dataframe and I don't have to change the code based on the content. So instead of saying "select row 1, then row 3", it should say "select row 1, then row [value in row 1, column Z], then row [value in column Z of the row just selected]", and so on. My question is: how do I tell R to read that value as a row number?
I'm trying to figure out how to select and save a row based on a value in another row, so that I get a new df with row 1 (aA), then go to row 3 and save it (cC), then go to row 2, etc.
X Y Z
a A 3
b B 5
c C 2
d D 1
e E NA
Knowing the row numbers, I can use rbind, which gives me the following:
rbind(df[1, ], df[3, ])
  X Y Z
1 a A 3
3 c C 2
But I want R to extract the number 3 from the column, rather than being told explicitly which row to pick - how do I do that?
Thanks
You can use a while loop that keeps selecting rows until an NA occurs or all the rows of the dataframe have been selected:
all_rows <- 1
next_row <- df$Z[all_rows]
while (!is.na(next_row) && length(all_rows) < nrow(df)) {
  all_rows <- c(all_rows, next_row)
  next_row <- df$Z[all_rows[length(all_rows)]]
}
result <- df[all_rows, ]
# X Y Z
#1 a A 3
#3 c C 2
#2 b B 5
#5 e E NA
If you know which rows of which column you want, you can use:
df <- read.table(textConnection('X Y Z
a A 3
b B 5
c C 2
d D 1
e E NA'),
header=T)
desired_rows <- c('a','c')
df2 <- df[df$X %in% desired_rows,]
df2
Output:
  X Y Z
1 a A 3
2 c C 2

"Weighted" counts at each combination of factor levels

I have the following dataframe:
> df=data.frame(from = c("x","y","x","z"), to=c("w","x","w","y"),weight=c(1,1,3,4))
> df
from to weight
1 x w 1
2 y x 1
3 x w 3
4 z y 4
If I want to calculate how many times an element of column from appears in the dataframe, I need to use:
> table(df$from)
x y z
2 1 1
This is not a weighted sum. Anyway, how could I also take the column weight into account? E.g., in my example the correct answer should be:
x y z
4 1 4
You can use tapply and calculate the sum for each unique value in from:
tapply(df$weight, df$from, sum)
#x y z
#4 1 4
We can use count from dplyr
library(dplyr)
df %>%
  count(from, wt = weight)
# from n
#1 x 4
#2 y 1
#3 z 4
In base R, we can use xtabs
xtabs(weight ~ from, df)
#from
#x y z
#4 1 4
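rowsum() is yet another base R option worth knowing; a sketch (unlike table()/xtabs(), it returns a one-column matrix keyed by group rather than a table):

```r
rowsum(df$weight, df$from)  # sums of weight per value of from, as a matrix
```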

group by count when count is zero in r

I use the aggregate function to get counts by group. The aggregate function only returns counts for groups where the count is > 0. This is what I have:
dt <- data.frame(
  n = c(1, 2, 3, 4, 5, 6),
  id = c('A', 'A', 'A', 'B', 'B', 'B'),
  group = c("x", "x", "y", "x", "x", "x"))
Applying the aggregate function:
my.count <- aggregate(n ~ id+group, dt, length)
now see the results
my.count[order(my.count$id),]
I get following
id group n
1 A x 2
3 A y 1
2 B x 3
I need the following (the last row has the zero that I need):
id group n
1 A x 2
3 A y 1
2 B x 3
4 B y 0
Thanks for your help in advance.
We can create another column 'ind' and then use dcast to reshape from 'long' to 'wide', specifying the fun.aggregate as length and drop=FALSE.
library(reshape2)
dcast(transform(dt, ind = 'n'), id + group ~ ind,
      value.var = 'n', length, drop = FALSE)
# id group n
#1 A x 2
#2 A y 1
#3 B x 3
#4 B y 0
Or a base R option is
as.data.frame(table(dt[-1]))
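For reference, with the dt above that table() route already keeps the empty combination, just with the count column named Freq instead of n:

```r
as.data.frame(table(dt[-1]))
#   id group Freq
# 1  A     x    2
# 2  B     x    3
# 3  A     y    1
# 4  B     y    0
```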
You can merge your "my.count" object with the complete set of "id" and "group" columns:
merge(my.count, expand.grid(lapply(dt[c("id", "group")], unique)), all = TRUE)
## id group n
## 1 A x 2
## 2 A y 1
## 3 B x 3
## 4 B y NA
There are several questions on SO that show you how to replace NA with 0 if that is required.
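For completeness, a sketch of that NA-to-0 replacement applied to the merged result (res is just a made-up name for the merge above):

```r
res <- merge(my.count, expand.grid(lapply(dt[c("id", "group")], unique)), all = TRUE)
res$n[is.na(res$n)] <- 0
res
#   id group n
# 1  A     x 2
# 2  A     y 1
# 3  B     x 3
# 4  B     y 0
```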
aggregate with drop=FALSE worked for me.
my.count <- aggregate(n ~ id+group, dt, length, drop=FALSE)
my.count[is.na(my.count)] <- 0
my.count
# id group n
# 1 A x 2
# 2 B x 3
# 3 A y 1
# 4 B y 0
If you are interested in frequencies only, you can create a frequency table with your formula and turn it into a dataframe:
as.data.frame(xtabs(formula = ~ id + group, dt))
Obviously this won't work for other aggregate functions. I'm still waiting for dplyr's summarise function to let the user decide whether zero-groups are kept or not. Maybe you can vote for this improvement here: https://github.com/hadley/dplyr/issues/341
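(Since this answer was written, dplyr has grown a .drop argument on count()/group_by(); a sketch, assuming dplyr >= 0.8 and that the grouping columns are converted to factors so empty levels can be kept:)

```r
library(dplyr)
dt %>%
  mutate(id = factor(id), group = factor(group)) %>%
  count(id, group, .drop = FALSE)
```

With .drop = FALSE, the empty B/y combination is reported with n = 0 instead of being silently dropped.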

In R, extract rows based on strings in different columns

Sorry if the solution to my problem is already out there and I overlooked it. There are a lot of similar topics, which all helped me understand the basics of what I'm trying to do, but did not quite solve my exact problem.
I have a data frame df:
> type = c("A","A","A","A","A","A","B","B","B","B","B","B")
> place = c("x","y","z","x","y","z","x","y","z","x","y","z")
> value = c(1:12)
>
> df=data.frame(type,place,value)
> df
type place value
1 A x 1
2 A y 2
3 A z 3
4 A x 4
5 A y 5
6 A z 6
7 B x 7
8 B y 8
9 B z 9
10 B x 10
11 B y 11
12 B z 12
>
(my real data has 3 different values in type and 10 in place, if that makes a difference)
I want to extract rows based on the strings in columns type and place.
E.g. I want to extract all rows that contain A in type and x and z in place, or all rows with A and B in type and y in place.
This works perfectly with subset, but I want to run my scripts on different combinations of extracted rows, and adjusting the subset command every time isn't very effective.
I thought of using a vector containing as elements what to get from type and place, respectively.
I tried:
v=c("A","x","z")
df.extract <- df[df$type&df$place %in% v]
but this returns an error.
I'm a total beginner with R and programming, so please bear with me.
You could try
df[df$type=='A' & df$place %in% c('x','y'),]
# type place value
#1 A x 1
#2 A y 2
#4 A x 4
#5 A y 5
For the second case
df[df$type %in% c('A', 'B') & df$place=='y',]
Update
Suppose you have many columns and need to subset the dataset based on values from several of them. For example:
set.seed(24)
df1 <- cbind(df, df[sample(1:nrow(df)),], df[sample(1:nrow(df)),])
colnames(df1) <- paste0(c('type', 'place', 'value'), rep(1:3, each=3))
row.names(df1) <- NULL
You can create a list of the values from the columns of interest
v1 <- setNames(list('A', 'x', c('A', 'B'), 'x', 'B', 'z'),
               paste0(c('type', 'place'), rep(1:3, each = 2)))
and then use Reduce
df1[Reduce(`&`,Map(`%in%`, df1[names(v1)], v1)),]
You can make a function extract:
extract <- function(df, type, place) {
  df[df$type %in% type & df$place %in% place, ]
}
that will work for the different subsets you want to do :
df.extract<-extract(df=df,type="A",place=c("x","y")) # or just extract(df,"A",c("x","y"))
> df.extract
type place value
1 A x 1
2 A y 2
4 A x 4
5 A y 5
df.extract<-extract(df=df,type=c("A","B"),place="y") # or just extract(df,c("A","B"),"y")
> df.extract
type place value
2 A y 2
5 A y 5
8 B y 8
11 B y 11
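To run scripts over several combinations without editing the call each time, the extract() function can be driven from a list of combinations (a sketch; combos and results are made-up names):

```r
combos <- list(
  list(type = "A", place = c("x", "z")),
  list(type = c("A", "B"), place = "y")
)
results <- lapply(combos, function(cc) extract(df, cc$type, cc$place))
```

Each element of results is then one extracted dataframe, ready to be passed to whatever downstream script is needed.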
