I have a variable called "exposed", and I already know the total number of exposed people at each time step. The data look like this:
i  exposed
1  y
2  y
3  y
4  n
5  n
So I have 3 exposed individuals and 2 who are not.
t <- 5
# I know the number of exposed individuals at each time step 1..t:
n_exposed <- c(3, 4, 1, 4, 5)
I wrote this code to capture how the data change over time:
evol <- list()
for(i in 1:t){evol[[i]]<- df}
for (i in 2:t) {
# condition
}
My question is: what condition do I have to write so that
evol[[1]]
contains data that look like this:
i  exposed
1  y
2  y
3  y
4  n
5  n
evol[[2]]
the data looks like this:
i  exposed
1  y
2  y
3  y
4  y
5  n
evol[[3]]
a data that looks like this:
i  exposed
1  y
2  n
3  n
4  n
5  n
I hope I made it clear.
Any ideas, please?
Kind regards.
If I'm understanding you correctly, you want a list of dataframes based on the exposed sums.
Using lapply you can do
exposed <- c(3,4,1,4,5)
evol <- lapply(exposed, \(x) data.frame(
  i = seq_along(exposed),
  exposed = c(rep("y", x), rep("n", length(exposed) - x))
))
evol[[1]]
i exposed
1 1 y
2 2 y
3 3 y
4 4 n
5 5 n
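If you would rather keep the for loop from the question, the same idea can be sketched as follows. This is only a sketch: it assumes, like the lapply version, that at each step the first individuals in the table are the exposed ones, and the names n_exposed and n are introduced here for illustration.

```r
n_exposed <- c(3, 4, 1, 4, 5)  # known number of exposed individuals per time step
n <- 5                         # assumption: 5 individuals, as in the question

evol <- vector("list", length(n_exposed))
for (k in seq_along(n_exposed)) {
  # The first n_exposed[k] individuals are "y", the remaining ones "n"
  evol[[k]] <- data.frame(
    i = seq_len(n),
    exposed = ifelse(seq_len(n) <= n_exposed[k], "y", "n")
  )
}

evol[[3]]$exposed
# [1] "y" "n" "n" "n" "n"
```

Preallocating the list with vector("list", ...) avoids growing it inside the loop.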
I have a variable called "exposed", and I already know the total number of exposed people at each time step. "index" is how many people the individual meets in a week. The data look like this:
i  exposed  index
1  y        22
2  y        12
3  y        6
4  n        54
5  n        3
So I have 3 exposed individuals and 2 who are not.
t <- 5
# I know the number of exposed individuals at each time step 1..t:
n_exposed <- c(3, 4, 1, 4, 5)
I wrote this code to capture how the data change over time:
evol <- list()
for(i in 1:t){evol[[i]]<- df}
for (i in 2:t) {
# condition
}
If the number of exposed at [t] is higher than the number of exposed at [t-1], then the individuals with exposed == "n" and the highest index change their exposed value from n to y.
If the number of exposed at [t] is lower than the number of exposed at [t-1], then the individuals with exposed == "y" and the lowest index change their exposed value from y to n.
My question is: what is the condition that I have to write?
A data set that looks like this:
evol[[1]]
i  exposed  index
1  y        22
2  y        12
3  y        6
4  n        54
5  n        3
should change to data that look like this:
evol[[2]]
i  exposed  index
1  y        22
2  y        12
3  y        6
4  y        54
5  n        3
and then, for evol[[3]], to data that look like this:
evol[[3]]
i  exposed  index
1  n        22
2  n        12
3  n        6
4  y        54
5  n        3
I hope I made it clear.
Any ideas, please?
Kind regards.
Here is a function to change the vector exposed.
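change_exposed <- function(exposed, index) {
  stopifnot(length(exposed) == length(index))
  # Walk through individuals 2..n and compare each index with the previous one
  for (i in seq_along(index)[-1L]) {
    if (index[i] > index[i - 1L]) {
      # Flip the status when the index is higher than the preceding individual's
      exposed[i] <- if (exposed[i] == "y") "n" else "y"
    }
  }
  exposed
}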
change_exposed(evol[[1]]$exposed, evol[[1]]$index)
#[1] "y" "y" "y" "y" "n"
Assign the result to exposed to actually change the data set.
evol[[1]]$exposed <- change_exposed(evol[[1]]$exposed, evol[[1]]$index)
identical(evol[[1]], evol[[2]])
# [1] TRUE
Data
evol <- list()
x <- 'i exposed index
1 y 22
2 y 12
3 y 6
4 n 54
5 n 3'
evol[[1]] <- read.table(textConnection(x), header = TRUE)
x <- 'i exposed index
1 y 22
2 y 12
3 y 6
4 y 54
5 n 3'
evol[[2]] <- read.table(textConnection(x), header = TRUE)
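The rule as worded in the question (raise the count by switching the unexposed individuals with the highest index, lower it by switching the exposed individuals with the lowest index) can also be sketched directly from the known counts. This is a hedged alternative, not the function above: step_exposed and n_exposed are hypothetical names, and ties in index are broken by row order.

```r
# Data from the question (first time step)
df <- data.frame(i = 1:5,
                 exposed = c("y", "y", "y", "n", "n"),
                 index = c(22, 12, 6, 54, 3))
n_exposed <- c(3, 4, 1, 4, 5)  # known number of exposed per time step

# Hypothetical helper: move one data frame to the target number of exposed
step_exposed <- function(df, target) {
  current <- sum(df$exposed == "y")
  if (target > current) {
    # Unexposed individuals, highest index first, become exposed
    cand <- which(df$exposed == "n")
    cand <- cand[order(df$index[cand], decreasing = TRUE)]
    df$exposed[head(cand, target - current)] <- "y"
  } else if (target < current) {
    # Exposed individuals, lowest index first, become unexposed
    cand <- which(df$exposed == "y")
    cand <- cand[order(df$index[cand])]
    df$exposed[head(cand, current - target)] <- "n"
  }
  df
}

evol <- vector("list", length(n_exposed))
evol[[1]] <- df
for (k in 2:length(n_exposed)) {
  evol[[k]] <- step_exposed(evol[[k - 1]], n_exposed[k])
}

evol[[3]]$exposed
# [1] "n" "n" "n" "y" "n"
```

Each step starts from the previous data frame, so only the difference between consecutive counts is switched.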
I need to compare questions of two different surveys (t1, t2).
Therefore, I have two dataframes like those below:
t1   t2
x    x
x    y
y    z
z    w
y    z
x    x
z    y
z    w
w    x
     z
     v
This data needs to be grouped by v, w, x, y and z.
Unfortunately, the value v does not occur in the first dataframe, and the two dataframes have a different number of rows, so I cannot put them together in one dataframe.
When I use "group_by" and "summarise", I get two columns, one with 4 rows and one with 5 rows, so again I cannot put them together.
I don't want to add an extra row to the first dataframe, because I don't want to alter the original dataset.
At the end I need a table, which should look like the following one:
t1 t2
v 0 1
w 1 2
x 3 3
y 2 2
z 3 3
I hope you can help me!
Thank you!
One way would be:
library(tidyverse)
# gather() stacks each data frame into two columns named key and value
bind_rows(
  gather(t1),
  gather(t2)
) %>% {table(.$value, .$key)}
Output:
t1 t2
v 0 1
w 1 2
x 3 3
y 2 2
z 3 3
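The same table can also be built in base R without stacking the data. The sketch below assumes each data frame has a single column named t1 or t2 as in the question; the names lv and counts are introduced here for illustration. Giving both columns the same factor levels is what makes the missing value v show up as 0.

```r
t1 <- data.frame(t1 = c("x", "x", "y", "z", "y", "x", "z", "z", "w"))
t2 <- data.frame(t2 = c("x", "y", "z", "w", "z", "x", "y", "w", "x", "z", "v"))

# All values that occur in either survey, sorted
lv <- sort(union(t1$t1, t2$t2))

# Count each survey against the shared set of levels
counts <- sapply(list(t1 = t1$t1, t2 = t2$t2),
                 function(v) table(factor(v, levels = lv)))
counts
#   t1 t2
# v  0  1
# w  1  2
# x  3  3
# y  2  2
# z  3  3
```

Neither original data frame is modified, and the two columns may have different lengths.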
Suppose the data frame is like this:
df <- data.frame(x = c(1,7,8,15,24,100,9,19,128))
How do I create a new variable that satisfies the following condition:
y = 1 if 1<=x<=7
y = 2 if 8<=x<=14
y = 3 if 15<=x<=21
...
y = k if 1+7*(k-1)<= x<= 7+7*(k-1)
so that I can have the new data frame like this
df <- data.frame(y = c(1,1,2,3,4,15, 2,3, 19))
I am wondering if a for loop can be applied in this case.
Via simple algebra, you can do:
df$y <- floor((df$x+6)/7)
df
# x y
# 1 1 1
# 2 7 1
# 3 8 2
# 4 15 3
# 5 24 4
# 6 100 15
# 7 9 2
# 8 19 3
# 9 128 19
In R you will often find it easier (less typing and less thinking) to use vectorized operators than for loops for simple computations like this. In this case we performed calls to +, /, and floor over a whole vector instead of looping and using them on each element.
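As a quick check of the algebra, here is a sketch comparing the formula with two equivalent vectorized forms and with an explicit for loop. It assumes x holds positive whole numbers, as in the question; for non-integer x the floor and ceiling variants can differ.

```r
x <- c(1, 7, 8, 15, 24, 100, 9, 19, 128)

y_floor   <- floor((x + 6) / 7)   # formula from the answer
y_ceiling <- ceiling(x / 7)       # same result for whole numbers
y_intdiv  <- (x - 1) %/% 7 + 1    # integer division form of the definition

# Equivalent for loop, shown only for comparison
y_loop <- numeric(length(x))
for (j in seq_along(x)) {
  y_loop[j] <- floor((x[j] + 6) / 7)
}

y_floor
# [1]  1  1  2  3  4 15  2  3 19
```

All four vectors agree on this input; the vectorized forms simply avoid the bookkeeping of the loop.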
Sorry if the solution to my problem is already out there, and I overlooked it. There are a lot of similar topics which all helped me understand the basics of what I'm trying to do, but did not quite solve my exact problem.
I have a data frame df:
> type = c("A","A","A","A","A","A","B","B","B","B","B","B")
> place = c("x","y","z","x","y","z","x","y","z","x","y","z")
> value = c(1:12)
>
> df=data.frame(type,place,value)
> df
type place value
1 A x 1
2 A y 2
3 A z 3
4 A x 4
5 A y 5
6 A z 6
7 B x 7
8 B y 8
9 B z 9
10 B x 10
11 B y 11
12 B z 12
>
(my real data has 3 different values in type and 10 in place, if that makes a difference)
I want to extract rows based on the strings in the columns type and place.
E.g. I want to extract all rows that contain A in type and x and z in place, or all rows with A and B in type and y in place.
This works perfectly with subset, but I want to run my scripts on different combinations of extracted rows, and adjusting the subset command every time isn't very effective.
I thought of using a vector containing as elements what to get from type and place, respectively.
I tried:
v=c("A","x","z")
df.extract <- df[df$type&df$place %in% v]
but this returns an error.
I'm a total beginner with R and programming, so please bear with me.
You could try
df[df$type=='A' & df$place %in% c('x','y'),]
# type place value
#1 A x 1
#2 A y 2
#4 A x 4
#5 A y 5
For the second case
df[df$type %in% c('A', 'B') & df$place=='y',]
Update
Suppose you have many columns and need to subset the dataset based on values from several of them. For example:
set.seed(24)
df1 <- cbind(df, df[sample(1:nrow(df)),], df[sample(1:nrow(df)),])
colnames(df1) <- paste0(c('type', 'place', 'value'), rep(1:3, each=3))
row.names(df1) <- NULL
You can create a list of the values from the columns of interest
v1 <- setNames(list('A', 'x', c('A', 'B'),
'x', 'B', 'z'), paste0(c('type', 'place'), rep(1:3, each=2)))
and then use Reduce
df1[Reduce(`&`,Map(`%in%`, df1[names(v1)], v1)),]
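The same Reduce/Map pattern also works on the original two-column problem. The sketch below rebuilds df from the question and uses a named list, here called filters (a name introduced for illustration), so that changing the subset only means changing the list.

```r
df <- data.frame(
  type  = rep(c("A", "B"), each = 6),
  place = rep(c("x", "y", "z"), times = 4),
  value = 1:12
)

# One entry per column to filter on; each entry lists the allowed values
filters <- list(type = "A", place = c("x", "z"))

# Map() tests each column against its allowed values,
# Reduce() combines the resulting logical vectors with &
keep <- Reduce(`&`, Map(`%in%`, df[names(filters)], filters))
df[keep, ]
#   type place value
# 1    A     x     1
# 3    A     z     3
# 4    A     x     4
# 6    A     z     6
```

Note the comma in df[keep, ]: without it, R would try to select columns rather than rows.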
You can write a function extract:
extract <- function(df, type, place) {
  df[df$type %in% type & df$place %in% place, ]
}
It works for the different subsets you want:
df.extract<-extract(df=df,type="A",place=c("x","y")) # or just extract(df,"A",c("x","y"))
> df.extract
type place value
1 A x 1
2 A y 2
4 A x 4
5 A y 5
df.extract<-extract(df=df,type=c("A","B"),place="y") # or just extract(df,c("A","B"),"y")
> df.extract
type place value
2 A y 2
5 A y 5
8 B y 8
11 B y 11
I want to add a new column to a data frame using a formula stored in another variable.
Example:
My data set "aa" is:
x y
2 3
4 5
6 7
My R code is;
>bb <- "x+y-2"
>attach(aa)
>aa$z<- bb
>detach(aa)
the result is;
x y z
2 3 x+y-2
4 5 x+y-2
6 7 x+y-2
but I want this:
x y z
2 3 3
4 5 7
6 7 11
Could you please help me?
If you want to evaluate an expression in the context of a data frame, you can use with and within.
aa$z <- with(aa, x + y - 2)
or
aa <- within(aa, z <- x + y - 2)
Or, if your expression is in the form of a text string (you should see if there are other ways to write your code; evaluating arbitrary text strings can lead to lots of problems):
aa$z <- eval(parse(text="x + y - 2"), aa)
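Applied to the string variable bb from the question, a minimal self-contained sketch looks like this. It assumes bb contains a trusted expression that refers only to columns of aa; evaluating untrusted text this way is unsafe.

```r
aa <- data.frame(x = c(2, 4, 6), y = c(3, 5, 7))
bb <- "x+y-2"  # formula stored as text, as in the question

# parse() turns the text into an expression; eval() looks up x and y in aa
aa$z <- eval(parse(text = bb), envir = aa)
aa$z
# [1]  3  7 11
```

No attach/detach is needed, because the data frame is passed as the evaluation environment.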
You can use mutate from the dplyr package:
library(dplyr)
aa <- aa %>% mutate(z = x+y-2)
Hope it helps.
You should probably read some basic tutorials on R other than An Introduction to R because, despite what is written there, the $ notation is more sensible and easier to understand than attach/detach. Try this in the meantime:
aa <- data.frame(x = c(2, 4, 6), y = c(3, 5, 7))
Which gives:
> aa
x y
1 2 3
2 4 5
3 6 7
Then enter:
aa$z <- (aa$x + aa$y) - 2
Which gives:
> aa
x y z
1 2 3 3
2 4 5 7
3 6 7 11