Goal
I want to see how random profits/losses can accumulate over time.
library(dplyr)
households <- seq_len(1000)
wealth <- rep(0, 1000)
df <- data.frame(households = households, wealth = wealth)
View(df)
Attempt
df2 <- df %>%
mutate(new_wealth = runif(1000,-10,10),
new_wealth2 = runif(1000,-10,10),
final_wealth = wealth + new_wealth+new_wealth2) %>%
select(households, final_wealth)
Solution Wanted
Instead of creating loads of "new_wealth" columns, I want to add a vector of random numbers to the final_wealth column, say 100 times, and then see the result. Ideally, I would do it 1000 times.
I don't care about the new_wealth columns, and just want the final_wealth column. Can I use lapply to do this? If not, do any of you have a better solution?
Thanks!
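We can use the tidyverse. (Note that rerun() is deprecated as of purrr 1.0.0; map(seq_len(100), ~ runif(n(), -10, 10)) is the current equivalent.)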
library(dplyr)
library(purrr)
df1 <- df %>%
mutate(final_wealth = rerun(100, runif(n(), -10, 10)) %>%
reduce(`+`))
We can use replicate to repeat the runif code any number of times. Use rowSums to perform the sum and add a new column.
n <- nrow(df)
n_repeat <- 100
df$final_wealth <- rowSums(replicate(n_repeat, runif(n,-10,10)))
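As a minimal sketch of the replicate/rowSums idea (the names n_households, n_rounds, shocks and wealth_path are illustrative, not from the question), the block below also keeps the running totals, in case the path of accumulation over time matters and not just the end result:

```r
set.seed(1)                      # assumption: fixed seed so the run is repeatable
n_households <- 1000
n_rounds <- 100

# One column of random profits/losses per round: a 1000 x 100 matrix
shocks <- replicate(n_rounds, runif(n_households, -10, 10))

# Final wealth is the row-wise sum over all rounds (starting wealth is 0)
final_wealth <- rowSums(shocks)

# Running wealth after each round: one row per household, one column per round
wealth_path <- t(apply(shocks, 1, cumsum))
```

The last column of wealth_path should match final_wealth up to floating-point rounding; raising n_rounds to 1000 only widens the matrix.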
Related
For example, my df now is:
person <- c("a","a","a","b","b","b","c","c","c")
score <- c(31,2,13,5,6,7,8,9,4)
df <- data.frame(person,score)
What I want is a two-column table with three rows:
[1,1]="a", [1,2]= a vector of c(31,2,13)
[2,1]="b", [2,2]= a vector of c(5,6,7)
[3,1]="c", [3,2]= a vector of c(8,9,4)
Actually, I just want to pass the three vectors to another function, but when I tried something like the following code, it didn't work. (The actual function is much more complex, but it takes two vectors of the same length, one of which is provided.)
f <- function(x,y){x-y}
df <- df %>%
group_by(person) %>%
summarise(diff = f(c(1,2,3), score))
Thanks so much in advance!
Base R solution:
aggregate(
score ~ person,
df,
list
)
Tidyverse solution:
library(dplyr)
df %>%
group_by(person) %>%
summarise(score = list(score))
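Once the scores are grouped per person, a two-argument function can be applied to each group's vector. A hedged sketch in base R, using split() and lapply() instead of a list column (ref is a name introduced here for the c(1,2,3) vector from the question):

```r
person <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
score <- c(31, 2, 13, 5, 6, 7, 8, 9, 4)
df <- data.frame(person, score)

f <- function(x, y) { x - y }
ref <- c(1, 2, 3)                # the fixed vector supplied by the caller

# One numeric vector per person, named "a", "b", "c"
score_by_person <- split(df$score, df$person)

# Apply f to the fixed vector and each person's scores
diffs <- lapply(score_by_person, function(s) f(ref, s))
```

diffs is a named list with one result vector per person, so a more complex f can be dropped in as long as it accepts two vectors of equal length.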
I would like to create lagged values for multiple columns in R.
First, I used a function to create lead/lag like this:
mleadlag <- function(x, n, ts_id) {
pos <- match(as.numeric(ts_id) + n, as.numeric(ts_id))
x[pos]
}
Second, I would like to apply this function to several columns. firm.characteristics is the list of columns for which I would like to compute lagged values.
library(dplyr)
firm.characteristics <- colnames(df)[4:6]
for(i in 1:length(firm.characteristics)){
df <- df %>%
group_by(company) %>%
mutate(!!paste0("lag_", i) := mleadlag(df[[i]] ,-1, fye)) %>%
ungroup()
}
However, I didn't get the correct values. The output for all companies in year t is the last row in year t-1. It didn't group by company and compute the lagged values.
Can anyone tell me what is wrong in the loop? Or what should I do to get the correct lagged values?
Thank you so much for your help.
Reproducible sample could be like this:
set.seed(42) ## for sake of reproducibility
n <- 6
dat <- data.frame(company=1:n,
fye=2009,
x=rnorm(n),
y=rnorm(n),
z=rnorm(n),
k=rnorm(n),
m=rnorm(n))
dat2 <- data.frame(company=1:n,
fye=2010,
x=rnorm(n),
y=rnorm(n),
z=rnorm(n),
k=rnorm(n),
m=rnorm(n))
dat3 <- data.frame(company=1:n,
fye=2011,
x=rnorm(n),
y=rnorm(n),
z=rnorm(n),
k=rnorm(n),
m=rnorm(n))
df <- rbind(dat,dat2,dat3)
I would try to stay away from loops in the tidyverse. Most operations that would traditionally need a loop already have a vectorised tidyverse equivalent, which is fast and, in my opinion, more readable. This is a great use case for dplyr's across() functionality. I first changed the df to a tibble.
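library(dplyr)
df %>%
  as_tibble() %>%
  arrange(company, fye) %>%  # lag() relies on row order within each company
  group_by(company) %>%
  mutate(
    # all_of() selects the columns named in the character vector;
    # .names keeps the originals and adds lag_x, lag_y, lag_z
    across(all_of(firm.characteristics), ~ lag(.x, 1L), .names = "lag_{.col}")
  ) %>%
  ungroup()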
This generates the required lagged values. For more information see dplyr's across documentation.
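For reference, the loop in the question appears to go wrong because df[[i]] pulls the i-th column of the whole, ungrouped data frame (and i runs over 1:3, that is company, fye and x, not the names in firm.characteristics), so the grouping never applies to the values. As a dependency-free cross-check of what a per-company lag should look like, here is a small base R sketch using ave() rather than across(); the toy data and the helper lag1 are illustrative assumptions:

```r
# Toy panel (illustrative values, not the question's data)
toy <- data.frame(
  company = rep(1:2, each = 3),
  fye     = rep(2009:2011, times = 2),
  x       = c(1, 2, 3, 10, 20, 30)
)

# Sort so that "previous row" means "previous year" within each company
toy <- toy[order(toy$company, toy$fye), ]

# Hypothetical helper: shift a vector down by one, padding with NA
lag1 <- function(v) c(NA, head(v, -1))

# ave() applies lag1 separately within each company
toy$lag_x <- ave(toy$x, toy$company, FUN = lag1)
```

Each company's first year gets NA, and later years carry the previous year's x. Like lag(), this assumes consecutive years with no gaps.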
I have a data frame where two columns mark the beginning and end of regions I need to manipulate in another data frame. Instead of using a for loop, I decided to create a logical vector marking the rows I'm interested in:
df <- data.frame(b=c(7,25,32,44),e=c(11,27,39,48),n=c('a','b','c','d'))
logint <- rep(F,50)
log_vec <- apply(df[,c('b','e')],1, function(x){logint[x['b']:x['e']] <- T;return(logint)})
However, the result is a matrix with one column for each row of df. I know I can solve this with
log_vec <- Reduce(`|`,as.data.frame(log_vec))
but if the number of rows in df is too large, there is not enough memory to allocate the matrix resulting from apply.
Do you have a better solution?
Thanks!
We can use Map (or mapply) to build the sequence from each b value to its e value, and then set those positions to TRUE.
logint <- rep(FALSE,50)
logint[unlist(Map(`:`, df$b, df$e))] <- TRUE
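A quick sketch to confirm that the Map approach gives the same vector as the apply/Reduce version from the question, without building the intermediate matrix (template, mat and reduced are names introduced here for the comparison):

```r
df <- data.frame(b = c(7, 25, 32, 44), e = c(11, 27, 39, 48),
                 n = c("a", "b", "c", "d"))

# Map approach: one index vector per region, no 50 x nrow(df) matrix
logint <- rep(FALSE, 50)
logint[unlist(Map(`:`, df$b, df$e))] <- TRUE

# Approach from the question, kept only for comparison
template <- rep(FALSE, 50)
mat <- apply(df[, c("b", "e")], 1, function(x) {
  template[x["b"]:x["e"]] <- TRUE
  template
})
reduced <- Reduce(`|`, as.data.frame(mat))
```

Both logint and reduced mark the same 21 positions; the Map version only ever holds the index vectors, so its memory use grows with the total region length instead of with 50 times the number of rows.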
We can also do this with map2
library(dplyr)
library(purrr)
df %>%
transmute(new = map2(b, e, `:`)) %>%
pull(new) %>%
flatten_int %>%
replace(logint, ., TRUE)
I'd like to apply a function to a range of columns of a data frame. The function calculates the difference between two values from adjacent columns and scores that difference based on one of the input values. The score should appear as a new column next to one of the columns used for the calculation. I wrote a function that does the job for single vectors/columns, but I got stuck when I tried to use it with mutate_at over a range of columns.
Here is what I tried so far:
# data
set.seed(123)
df <-data.frame(d1= 20,
d2= seq(20,15,-0.1)[1:50],
d3= seq(20,15,-0.1)[1:50]+ rnorm(50,0,3))
# scoring function
f_score <- function(a,b){
ifelse(a-b>=a*0.2,"high",
ifelse(a*0.2>a-b & a-b>=a*0.15,"mid",
ifelse(a*0.15>a-b & a-b>=a*0.1,"low","ok")))
}
# scoring function works for single columns
f_score(df$d1,df$d2) %>% setNames(round(df$d1-df$d2,2))
# and scoring function works this way,too
f_score(df[,1:2],df[,2:3])
# I can easily do this
df1 <- mutate(df,score=f_score(d1,d2))
df1
# this comes close to what I want to achieve
df2 <- df %>% mutate_at(vars(names(.)[2:3]), .funs= funs(score= f_score(d1,.)))
df2
#but the second calculation should use the values from d2 instead of d1
#I would like to do something like this
df3 <- df %>% mutate_at(vars(names(.)[2:3]), .funs= funs(score=f_score(c(1:2),.)))
#but this is not working
# or
df3 <- df %>% mutate_at(vars(names(.)[2:3]), .funs= funs(score=f_score(df[1:2],.)))
# I would like to end up with something like this
df4 <- mutate_at(df, vars(c(d2)), .funs= funs(score_d2= f_score(d1,.)))
df4 <- mutate_at(df4, vars(c(d3)), .funs= funs(score_d3= f_score(d2,.)))
df4 <- select(df4,d1,d2, score_d2, d3, score_d3)
I am quite new to R and SO, and I hope I have made my problem clear. Any help with, and explanation of, the problem in my code is highly appreciated.
Thanks to all in advance.
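One possible way to get the layout of df4 for any number of adjacent columns, sketched in base R with Map() rather than mutate_at (prev_cols, next_cols, scores and out are names introduced here; this is an illustration under those assumptions, not a tested general solution):

```r
set.seed(123)
df <- data.frame(d1 = 20,
                 d2 = seq(20, 15, -0.1)[1:50],
                 d3 = seq(20, 15, -0.1)[1:50] + rnorm(50, 0, 3))

f_score <- function(a, b) {
  ifelse(a - b >= a * 0.2, "high",
         ifelse(a * 0.2 > a - b & a - b >= a * 0.15, "mid",
                ifelse(a * 0.15 > a - b & a - b >= a * 0.1, "low", "ok")))
}

prev_cols <- names(df)[-ncol(df)]   # d1, d2: the reference column of each pair
next_cols <- names(df)[-1]          # d2, d3: the column being scored

# Score each column against the column immediately to its left
scores <- Map(f_score, df[prev_cols], df[next_cols])
names(scores) <- paste0("score_", next_cols)

# Attach the scores and interleave them: d1, d2, score_d2, d3, score_d3
out <- cbind(df, as.data.frame(scores, stringsAsFactors = FALSE))
out <- out[c(names(df)[1], as.vector(rbind(next_cols, names(scores))))]
```

Map() walks the two column lists in parallel, which is what mutate_at cannot do directly, since it only passes the current column to the function.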
I have the following data:
set.seed(123)
data <- data.frame (name=LETTERS[sample(1:26, 500, replace=T)],present=sample(0:1,500,replace = T))
And I want to quickly calculate the percentage of present observations (1's) for each letter. I can do it manually, but I believe there is an easier way to do this:
library(dplyr)
A <- filter(data, name=="A" & present==1)
A2 <- filter(data, name=="A")
data$Percentage[data$name=="A"] <- nrow(A)/nrow(A2)
And so on until I arrive to "Z".
Can I automate this task without having to change the values of the "name" column manually?
Best regards,
We can use prop.table with table to get the proportion
prop.table(table(data), 1)[,2]
To add it as a column, we can expand it by indexing with the 'name' values
data$Percentage <- prop.table(table(data), 1)[,2][as.character(data$name)]
Or, as @Lars Lau Raket suggested, we don't need to convert to character
prop.table(table(data), 1)[,2][data$name]
If we need to create a column
library(dplyr)
data %>%
group_by(name) %>%
mutate(Percentage = mean(present==1))
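Because present only takes the values 0 and 1, the share of 1's per letter is simply the group mean. A small base R sketch that cross-checks the prop.table result against tapply() and adds the column with ave() (share_table and share_mean are names introduced here):

```r
set.seed(123)
data <- data.frame(name = LETTERS[sample(1:26, 500, replace = TRUE)],
                   present = sample(0:1, 500, replace = TRUE))

# Row-wise proportions of the name x present table; column "1" is the share present
share_table <- prop.table(table(data), 1)[, "1"]

# Same quantity as a plain group mean
share_mean <- tapply(data$present, data$name, mean)

# ave() defaults to the mean and returns one value per row, ready to use as a column
data$Percentage <- ave(data$present, data$name)
```

The values are proportions between 0 and 1; multiply by 100 if an actual percentage is wanted.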