Change values smoothly over time in R

I have a variable called "exposed", and I already know the number of exposed people at each time step. Here is the data:
i exposed
1 y
2 y
3 y
4 n
5 n
So 3 individuals are exposed and 2 are not.
t <- 5
# I know the number of exposed individuals at each time step 1..t:
n_exposed <- c(3, 4, 1, 4, 5)
I wrote this code to capture how the data changes over time:
evol <- list()
for(i in 1:t){evol[[i]]<- df}
for (i in 2:t) {
# condition
}
My question is: what condition do I have to write so that
evol[[1]]
contains data that looks like this:
i exposed
1 y
2 y
3 y
4 n
5 n
evol[[2]]
looks like this:
i exposed
1 y
2 y
3 y
4 y
5 n
evol[[3]]
looks like this:
i exposed
1 y
2 n
3 n
4 n
5 n
I hope I have made this clear.
Any ideas, please?
Kind regards.

If I'm understanding you correctly, you want a list of dataframes based on the exposed sums.
Using lapply you can do
exposed <- c(3, 4, 1, 4, 5)
# One data frame per time step: the first x rows are "y", the rest are "n"
evol <- lapply(exposed, \(x) data.frame(
  i = seq_along(exposed),
  exposed = c(rep("y", x), rep("n", length(exposed) - x))
))
evol[[1]]
i exposed
1 1 y
2 2 y
3 3 y
4 4 n
5 5 n
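As a quick check of the lapply approach above, the sketch below rebuilds the list and counts the "y" rows in each element. It assumes, as the answer does, that the number of individuals equals the number of time steps. It uses function(x) instead of \(x) so that it also runs on R versions older than 4.1. The names n and counts are only illustrative.

```r
# Number of exposed individuals per time step (from the question)
exposed <- c(3, 4, 1, 4, 5)
n <- length(exposed)  # assumption: as many individuals as time steps

# One data frame per time step: first x rows "y", remaining rows "n"
evol <- lapply(exposed, function(x) {
  data.frame(i = seq_len(n),
             exposed = c(rep("y", x), rep("n", n - x)),
             stringsAsFactors = FALSE)
})

# Count the "y" rows in each data frame; this should reproduce `exposed`
counts <- vapply(evol, function(d) sum(d$exposed == "y"), integer(1))
counts
```

If counts matches exposed, every element of evol has the requested number of exposed individuals.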

Related

Change in state based on sorted variable in R

I have a variable called "exposed", and I already know the number of exposed people at each time step. "index" is how many people the individual meets in a week. Here is the data:
i exposed index
1 y 22
2 y 12
3 y 6
4 n 54
5 n 3
So 3 individuals are exposed and 2 are not.
t <- 5
# I know the number of exposed individuals at each time step 1..t:
n_exposed <- c(3, 4, 1, 4, 5)
I wrote this code to capture how the data changes over time:
evol <- list()
for(i in 1:t){evol[[i]]<- df}
for (i in 2:t) {
# condition
}
If the number of exposed at [t] is higher than the number of exposed at [t-1], the individuals with exposed==n and the highest index change their exposed value from n to y.
If the number of exposed at [t] is lower than the number of exposed at [t-1], the individuals with exposed==y and the lowest index change their exposed value from y to n.
My question is: what condition do I have to write so that a data set that looks like this:
evol[[1]]
i exposed index
1 y 22
2 y 12
3 y 6
4 n 54
5 n 3
changes to data that looks like this:
evol[[2]]
i exposed index
1 y 22
2 y 12
3 y 6
4 y 54
5 n 3
and then to data that looks like this:
evol[[3]]
i exposed index
1 n 22
2 n 12
3 n 6
4 y 54
5 n 3
I hope I have made this clear.
Any ideas, please?
Kind regards.
Here is a function to change the vector exposed.
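change_exposed <- function(exposed, index) {
  stopifnot(length(exposed) == length(index))
  # Walk through rows 2..n; whenever index rises relative to the previous row,
  # toggle that row's exposed value ("y" <-> "n")
  for(i in seq_along(index)[-1L]) {
    if(index[i] > index[i - 1L]) {
      exposed[i] <- if(exposed[i] == "y") "n" else "y"
    }
  }
  exposed
}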
change_exposed(evol[[1]]$exposed, evol[[1]]$index)
#[1] "y" "y" "y" "y" "n"
Assign the result to exposed to actually change the data set.
evol[[1]]$exposed <- change_exposed(evol[[1]]$exposed, evol[[1]]$index)
identical(evol[[1]], evol[[2]])
# [1] TRUE
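The question states its rule in terms of the known counts per time step, so it may help to see that rule written out directly. The sketch below is one possible reading, not the answer's method. update_exposed and n_exposed are hypothetical names introduced here. It assumes that the first data frame already matches the first count, and that ties in index may be broken by row order.

```r
# Hypothetical helper (not from the answer): move a data frame to `target`
# exposed individuals, following the rule stated in the question.
update_exposed <- function(df, target) {
  stopifnot(target >= 0, target <= nrow(df))
  current <- sum(df$exposed == "y")
  if (target > current) {
    # More exposed needed: flip the "n" rows with the highest index to "y"
    cand <- which(df$exposed == "n")
    flip <- cand[order(df$index[cand], decreasing = TRUE)][seq_len(target - current)]
    df$exposed[flip] <- "y"
  } else if (target < current) {
    # Fewer exposed needed: flip the "y" rows with the lowest index to "n"
    cand <- which(df$exposed == "y")
    flip <- cand[order(df$index[cand])][seq_len(current - target)]
    df$exposed[flip] <- "n"
  }
  df
}

# Starting data and known counts from the question
df <- data.frame(i = 1:5,
                 exposed = c("y", "y", "y", "n", "n"),
                 index = c(22, 12, 6, 54, 3),
                 stringsAsFactors = FALSE)
n_exposed <- c(3, 4, 1, 4, 5)

# Each time step is derived from the previous one
evol <- vector("list", length(n_exposed))
evol[[1]] <- df
for (k in 2:length(n_exposed)) {
  evol[[k]] <- update_exposed(evol[[k - 1]], n_exposed[k])
}
evol[[3]]
```

With the question's data, evol[[2]] and evol[[3]] from this sketch match the tables shown in the question.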
Data
evol <- list()
x <- 'i exposed index
1 y 22
2 y 12
3 y 6
4 n 54
5 n 3'
evol[[1]] <- read.table(textConnection(x), header = TRUE)
x <- 'i exposed index
1 y 22
2 y 12
3 y 6
4 y 54
5 n 3'
evol[[2]] <- read.table(textConnection(x), header = TRUE)

Grouping data with missing value

I need to compare questions of two different surveys (t1, t2).
Therefore, I have two dataframes like those below:
t1 t2
x  x
x  y
y  z
z  w
y  z
x  x
z  y
z  w
w  x
   z
   v
This data needs to be grouped by v, w, x, y and z.
Unfortunately, the value v does not occur in the first dataframe, and the two dataframes have different numbers of rows, so I cannot put them together in one dataframe.
When I use "group_by" and "summarise", I get two columns, one with 4 rows and one with 5, so again I cannot put them together.
I don't want to add an extra row to the first dataframe, as I don't want to manipulate the original dataset.
In the end I need a table that looks like the following:
t1 t2
v 0 1
w 1 2
x 3 3
y 2 2
z 3 3
I hope you can help me!
Thank you!
One way would be:
library(tidyverse)
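# gather() stacks each data frame into key/value columns
# (it is superseded by pivot_longer() in current tidyr, but still works)
bind_rows(
  gather(t1),
  gather(t2)
) %>% {table(.$value, .$key)}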
Output:
t1 t2
v 0 1
w 1 2
x 3 3
y 2 2
z 3 3
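A base R alternative is sketched below, in case loading the tidyverse is not wanted. It assumes t1 and t2 are single-column data frames with the values shown in the question. The names lev, count_levels and res are only illustrative. Counting against a shared set of levels is what makes v appear with a count of 0 for t1, and the original data frames are left unchanged.

```r
# Example data, reconstructed from the question
t1 <- data.frame(t1 = c("x", "x", "y", "z", "y", "x", "z", "z", "w"),
                 stringsAsFactors = FALSE)
t2 <- data.frame(t2 = c("x", "y", "z", "w", "z", "x", "y", "w", "x", "z", "v"),
                 stringsAsFactors = FALSE)

# All values that occur in either survey, in sorted order
lev <- sort(union(t1$t1, t2$t2))

# Count each value against the shared levels; absent values are counted as 0
count_levels <- function(v) as.vector(table(factor(v, levels = lev)))

res <- cbind(t1 = count_levels(t1$t1), t2 = count_levels(t2$t2))
rownames(res) <- lev
res
```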

create a new variable in r using for loop based on the condition of an existing variable

Suppose the data frame is like this:
df <- data.frame(x = c(1,7,8,15,24,100,9,19,128))
How do I create a new variable that satisfies the following condition:
y = 1 if 1<=x<=7
y = 2 if 8<=x<=14
y = 3 if 15<=x<=21
...
y = k if 1+7*(k-1)<= x<= 7+7*(k-1)
so that I can have the new data frame like this
df <- data.frame(y = c(1,1,2,3,4,15, 2,3, 19))
I am wondering if a for loop can be applied in this case.
Via simple algebra, you can do:
df$y <- floor((df$x+6)/7)
df
# x y
# 1 1 1
# 2 7 1
# 3 8 2
# 4 15 3
# 5 24 4
# 6 100 15
# 7 9 2
# 8 19 3
# 9 128 19
In R you will often find it easier (less typing and less thinking) to use vectorized operators than for loops for simple computations like this. In this case we performed calls to +, /, and floor over a whole vector instead of looping and using them on each element.
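To make the comparison concrete, here is a small sketch that computes y with an explicit for loop and with vectorized expressions, then checks that they agree. The names y_loop, y_vec and y_ceil are only illustrative. The ceiling variant assumes x holds positive whole numbers, as in the question.

```r
df <- data.frame(x = c(1, 7, 8, 15, 24, 100, 9, 19, 128))

# Explicit loop, shown only for comparison
y_loop <- numeric(nrow(df))
for (j in seq_len(nrow(df))) {
  y_loop[j] <- floor((df$x[j] + 6) / 7)
}

# Vectorized version from the answer: one call over the whole column
y_vec <- floor((df$x + 6) / 7)

# Equivalent form when x contains positive whole numbers
y_ceil <- ceiling(df$x / 7)

df$y <- y_vec
df
```

The loop works, but the vectorized forms are shorter and do not need a result vector to be preallocated.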

In R, extract rows based on strings in different columns

Sorry if the solution to my problem is already out there, and I overlooked it. There are a lot of similar topics which all helped me understand the basics of what I'm trying to do, but did not quite solve my exact problem.
I have a data frame df:
> type = c("A","A","A","A","A","A","B","B","B","B","B","B")
> place = c("x","y","z","x","y","z","x","y","z","x","y","z")
> value = c(1:12)
>
> df=data.frame(type,place,value)
> df
type place value
1 A x 1
2 A y 2
3 A z 3
4 A x 4
5 A y 5
6 A z 6
7 B x 7
8 B y 8
9 B z 9
10 B x 10
11 B y 11
12 B z 12
>
(my real data has 3 different values in type and 10 in place, if that makes a difference)
I want to extract rows based on the strings in two columns (here, type and place).
E.g. I want to extract all rows that contain A in type and x and z in place, or all rows with A and B in type and y in place.
This works perfectly with subset, but I want to run my scripts on different combinations of extracted rows, and adjusting the subset command every time isn't very effective.
I thought of using a vector containing as elements what to get from type and place, respectively.
I tried:
v=c("A","x","z")
df.extract <- df[df$type&df$place %in% v]
but this returns an error.
I'm a total beginner with R and programming, so please bear with me.
You could try
df[df$type=='A' & df$place %in% c('x','y'),]
# type place value
#1 A x 1
#2 A y 2
#4 A x 4
#5 A y 5
For the second case
df[df$type %in% c('A', 'B') & df$place=='y',]
Update
Suppose you have many columns and need to subset the dataset based on values from several of them. For example:
set.seed(24)
df1 <- cbind(df, df[sample(1:nrow(df)),], df[sample(1:nrow(df)),])
colnames(df1) <- paste0(c('type', 'place', 'value'), rep(1:3, each=3))
row.names(df1) <- NULL
You can create a list of the values from the columns of interest
v1 <- setNames(list('A', 'x', c('A', 'B'),
'x', 'B', 'z'), paste0(c('type', 'place'), rep(1:3, each=2)))
and then use Reduce
df1[Reduce(`&`,Map(`%in%`, df1[names(v1)], v1)),]
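The same Map/Reduce pattern can be tried on the original two-column df, which may be easier to follow. In the sketch below, keep is a hypothetical named list with one entry per column to filter on. Map() builds one logical vector per column, and Reduce() combines them with &.

```r
# Data frame from the question
df <- data.frame(type  = rep(c("A", "B"), each = 6),
                 place = rep(c("x", "y", "z"), times = 4),
                 value = 1:12,
                 stringsAsFactors = FALSE)

# One entry per column: the values to keep in that column
keep <- list(type = "A", place = c("x", "z"))

# Map() tests each column against its allowed values;
# Reduce() ANDs the resulting logical vectors together
rows <- Reduce(`&`, Map(`%in%`, df[names(keep)], keep))

df.extract <- df[rows, ]
df.extract
```

To extract a different combination, only keep needs to change, for example list(type = c("A", "B"), place = "y").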
You can also write a function, extract:
extract<-function(df,type,place){
df[df$type %in% type & df$place %in% place,]
}
which works for the different subsets you want:
df.extract<-extract(df=df,type="A",place=c("x","y")) # or just extract(df,"A",c("x","y"))
> df.extract
type place value
1 A x 1
2 A y 2
4 A x 4
5 A y 5
df.extract<-extract(df=df,type=c("A","B"),place="y") # or just extract(df,c("A","B"),"y")
> df.extract
type place value
2 A y 2
5 A y 5
8 B y 8
11 B y 11

Creating a new column to a data frame using a formula from another variable

I want to add a new column to a data frame using a formula stored in another variable.
Example:
I have a data set "aa":
x y
2 3
4 5
6 7
My R code is;
>bb <- "x+y-2"
>attach(aa)
>aa$z<- bb
>detach(aa)
the result is;
x y z
2 3 x+y-2
4 5 x+y-2
6 7 x+y-2
but I want this:
x y z
2 3 3
4 5 7
6 7 11
Could you please help me..
If you want to evaluate an expression in the context of a data frame, you can use with and within.
aa$z <- with(aa, x + y - 2)
or
aa <- within(aa, z <- x + y - 2)
Or, if your expression is in the form of a text string (you should see if there are other ways to write your code; evaluating arbitrary text strings can lead to lots of problems):
aa$z <- eval(parse(text="x + y - 2"), aa)
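The sketch below applies this to the variable bb from the question. It also shows one possible alternative: storing the formula as an unevaluated expression with quote() instead of as text. bb_expr and z2 are illustrative names, and the data frame is rebuilt from the values in the question.

```r
aa <- data.frame(x = c(2, 4, 6), y = c(3, 5, 7))

# Formula stored as text, as in the question
bb <- "x + y - 2"
aa$z <- eval(parse(text = bb), envir = aa)

# Alternative: keep the formula as an unevaluated expression, not a string
bb_expr <- quote(x + y - 2)
aa$z2 <- eval(bb_expr, envir = aa)

aa
```

Both columns contain the same values. The quote() form avoids parsing text at run time, which is the part that tends to cause problems.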
You can use mutate from the dplyr package:
library(dplyr)
aa <- aa %>% mutate(z = x+y-2)
Hope it helps.
You should probably read some basic tutorials on R other than An Introduction to R: despite what is written there, the $ notation is more sensible and easier to understand than attach/detach. Try this in the meantime:
aa <- data.frame(x = c(2, 4, 6), y = c(3, 5, 7))
Which gives:
> aa
x y
1 2 3
2 4 5
3 6 7
Then enter:
aa$z <- (aa$x + aa$y) - 2
Which gives:
> aa
x y z
1 2 3 3
2 4 5 7
3 6 7 11
